zahere/README.md

Zaher Khateeb | AI/ML Engineer

AGNTCY-native agent infrastructure • Multi-agent systems • LLM infrastructure • Formal methods

Founder of AgentiCraft — an AGNTCY-native, 8-layer service mesh for production multi-agent AI systems. Solo-built end-to-end across Rust, Python, Go, and TypeScript.

I work at the intersection of formal methods and production engineering — building systems that are provably correct, not just empirically okay.

Founder, open to the right opportunities · Website · LinkedIn


What I've Built

AgentiCraft — AGNTCY-native, 8-layer production platform for multi-agent AI:

  • AGNTCY-native across the full stack (AGNTCY is the Linux Foundation's "Internet of Agents" standard, contributed by Cisco): OASF agent schemas, a federated discovery directory, W3C Verifiable Credentials identity, and SLIM messaging, implemented end-to-end in Rust + Python
  • Unified inference across 18 LLM providers with Thompson Sampling routing and per-destination circuit breakers — cutting inference cost through provider routing + caching
  • MCP/A2A protocol interoperability layer with native codec handlers — cutting redundant LLM calls through lossless bidirectional translation
  • Full-stack platform — FastAPI control plane, Next.js dashboard on Vercel, Python and TypeScript SDKs (Apache 2.0)
  • Formal verification foundation — CSP process algebra, multiparty session types, CTL model checking, probabilistic verification (open source on PyPI, Apache 2.0)
  • Shipped multi-agent Telegram bot — 6 domain agents, persistent memory, human-in-the-loop approvals, deployed to production via Docker Compose with CI/CD
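
The Thompson Sampling routing mentioned above can be illustrated with a minimal Beta-Bernoulli sketch. This is not AgentiCraft's implementation: the `ThompsonRouter` class, its method names, and the binary success/failure reward are assumptions made for illustration only.

```python
import random


class ThompsonRouter:
    """Hypothetical Beta-Bernoulli Thompson Sampling router (illustrative only)."""

    def __init__(self, providers, seed=None):
        self._rng = random.Random(seed)
        # One [alpha, beta] pair per provider, starting from a uniform Beta(1, 1) prior.
        self.stats = {name: [1, 1] for name in providers}

    def choose(self):
        # Draw a plausible success rate for each provider from its posterior
        # and route to the provider with the highest draw.
        draws = {
            name: self._rng.betavariate(alpha, beta)
            for name, (alpha, beta) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, success):
        # A success increments alpha, a failure increments beta.
        self.stats[name][0 if success else 1] += 1
```

Each request calls `choose()`, then reports the outcome with `update(name, success)`. Providers with little history still get sampled occasionally because their posteriors are wide, which is what lets the router keep exploring without a separate exploration parameter.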

Architecture spans Layers 0–7: formal verification, NATS transport, Rust data plane, Python control plane, Kubernetes operator, developer SDK, app framework, end-user products.


Current Research

"When Does Topology Matter? Reliability Polynomials for Stochastic Service Meshes" (in progress)

An if-and-only-if characterization of when network topology affects multi-agent resilience. Core result: under crash-stop faults all mesh topologies are equivalent (a mathematical identity), while Byzantine faults break that equivalence in ways determined by the coordination protocol, not the graph structure.

Validated across ~34,000 LLM experiments spanning 13 coordination topologies, two fault regimes, two task domains, and two model generations.


Technical Focus

Multi-Agent Systems — mesh coordination, fault-dependent topology selection, Byzantine fault tolerance for LLM systems, MCP/A2A protocol integration

Formal Methods — session type theory for deadlock-freedom guarantees, runtime property verification, CSP process algebra, refinement checking

LLM Infrastructure — provider-agnostic inference abstraction, statistical circuit breakers with CUSUM-optimal change detection, quality-weighted reliability theory

Distributed Systems — consensus protocols, fault injection, observability, Kubernetes-native deployment
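
To give a feel for the CUSUM change detection named under LLM Infrastructure, here is a minimal one-sided CUSUM sketch for a Bernoulli failure stream. It covers only the detection statistic, not a full circuit-breaker state machine, and the class name, the rates `p0` and `p1`, and the threshold are illustrative assumptions rather than values from any of the projects listed here.

```python
import math


class CusumDetector:
    """Hypothetical one-sided CUSUM detector for a rise in failure rate."""

    def __init__(self, p0=0.05, p1=0.5, threshold=5.0):
        # Log-likelihood ratio of "degraded" (p1) versus "healthy" (p0)
        # for a single failed or successful call.
        self.llr_fail = math.log(p1 / p0)
        self.llr_ok = math.log((1 - p1) / (1 - p0))
        self.threshold = threshold
        self.stat = 0.0

    def observe(self, failed):
        # Accumulate evidence for degradation, clamping at zero so that
        # long healthy stretches do not build up negative credit.
        step = self.llr_fail if failed else self.llr_ok
        self.stat = max(0.0, self.stat + step)
        return self.stat >= self.threshold
```

`observe` returns `True` once the accumulated evidence crosses the threshold, which is the point at which a breaker built on it would trip. Raising the threshold trades slower detection for fewer false alarms.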


Tech Stack

Languages: Python, Rust, Go, TypeScript, SQL, Bash

AI/ML: PyTorch, Transformers, vLLM, LangChain, RAG, Fine-tuning (LoRA/QLoRA), Inference Optimization, Agentic Workflows

Infrastructure: FastAPI, Pydantic, Next.js, Kubernetes, Docker, CI/CD, gRPC/Protobuf, OpenTelemetry, Prometheus, PostgreSQL, Redis, Qdrant

Cloud: AWS, GCP, Vercel, Nebius Cloud


Background

  • Founder & Lead Architect, AgentiCraft (May 2025 – Present)
  • AI & Infrastructure Engineer, Visual Arena, Gothenburg (Nov 2023 – Oct 2024)
  • AI Performance Engineering, Nebius Academy, Tel Aviv University (Mar 2026 – Present)
  • Advanced Data Science & AI (Y-DATA), Nebius Academy, Tel Aviv University (Nov 2023 – Aug 2024)
  • B.Sc. Industrial Engineering & Management (Data Science concentration), Tel Aviv University (2017 – 2022)

Let's Connect

Email · LinkedIn · AgentiCraft

Languages

English (Fluent) · Hebrew (Fluent) · Arabic (Native)

Last updated: May 2026

Pinned

  1. stochastic-circuit-breaker (Public)

    Statistically optimal circuit breaker for stochastic systems. 4-state CUSUM-based FSM with provably minimax detection delay (Moustakides 1986). Zero dependencies.

    Python

  2. reliability-polynomials (Public)

    Generalized reliability polynomials for quality-weighted network analysis. Every reliability library assumes binary survival — this one doesn't. Zero dependencies.

    Python
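
For context on the "binary survival" assumption mentioned in the reliability-polynomials description, the sketch below computes the classical all-terminal reliability of a small graph by brute-force enumeration. It shows the standard binary model only, not the quality-weighted generalization and not the reliability-polynomials API; the function names and the independent-edge failure model are assumptions for illustration.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Return True if the surviving edges connect every node."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)


def all_terminal_reliability(nodes, edges, p):
    """Probability the graph stays connected when each edge
    survives independently with probability p (binary survival)."""
    m = len(edges)
    total = 0.0
    # Sum p^k (1 - p)^(m - k) over every edge subset that keeps the graph connected.
    for k in range(m + 1):
        for subset in combinations(edges, k):
            if is_connected(nodes, subset):
                total += p**k * (1 - p) ** (m - k)
    return total
```

For a triangle this evaluates the polynomial 3p^2 - 2p^3, and for a three-node path it gives p^2. The enumeration is exponential in the number of edges, so it is only practical for toy graphs.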