diff --git a/py/PARITY_AUDIT.md b/py/PARITY_AUDIT.md
index 1a2b5ca351..42881516b1 100644
--- a/py/PARITY_AUDIT.md
+++ b/py/PARITY_AUDIT.md
@@ -1,7 +1,7 @@
 # Genkit Feature Parity Audit — JS / Go / Python
 
-> Generated: 2025-02-08. Updated: 2026-02-08. Baseline: `firebase/genkit` JS implementation, with explicit JS vs Go vs Python parity tracking.
-> Last verified: 2026-02-08 against genkit-ai org (14 repos) and BloomLabsInc/genkit-plugins.
+> Generated: 2025-02-08. Updated: 2026-02-09. Baseline: `firebase/genkit` JS implementation, with explicit JS vs Go vs Python parity tracking.
+> Last verified: 2026-02-09 against genkit-ai org (14 repos) and BloomLabsInc/genkit-plugins.
 
 ## 1. Plugin Parity Matrix
 
@@ -331,7 +331,14 @@ Python users typically use `httpx` or `requests` directly.
 |---------|:--:|:--:|:------:|-----------|:--------:|
 | `runFlow` / `streamFlow` client | ✅ (beta/client) | ❌ | ❌ | Go + Python | P2 |
 | `defineTool({multipart: true})` | ✅ | ✅ | ❌ | Python | P1 |
-| Model API V2 (`apiVersion: 'v2'`) | ✅ | ❌ | ❌ | Go + Python | P1 |
+| ~~Model API V2 (`apiVersion: 'v2'`)~~ | ~~✅~~ | ~~❌~~ | ~~❌~~ | ~~Go + Python~~ | ~~Superseded by Middleware V2 + Bidi~~ |
+| **Generate Middleware V2** (3-tier: `generate`/`model`/`tool` hooks) | 🔄 RFC | 🔄 RFC | ❌ | All SDKs | P0 |
+| **`defineBidiAction`** | 🔄 | 🔄 RFC | ❌ | Go + Python | P1 |
+| **`defineBidiFlow`** | 🔄 | 🔄 RFC | ❌ | Go + Python | P1 |
+| **`defineBidiModel`** / `generateBidi` | 🔄 | 🔄 RFC | ❌ | Go + Python | P1 |
+| **`defineAgent`** (replaces Chat API) | 🔄 RFC | 🔄 RFC | ❌ | Go + Python | P1 |
+| **Plugin V2** (plugins provide middleware) | ✅ | ❌ | ❌ | Go + Python | P2 |
+| **Reflection API V2** (WebSocket + JSON-RPC 2.0) | 🔄 | 🔄 | 🔄 (draft) | All SDKs | P1 |
 | `defineDynamicActionProvider` | ✅ | ❌ | ✅ | Go | P2 |
 | `defineIndexer` | ✅ | ❌ | ✅ | Go | P2 |
 | `defineReranker` | ✅ | ❌ | ✅ | Go | P2 |
@@ -479,35 +486,44 @@ Full plugin list from the repository README (10 plugins, 33 contributors, 54 rel
 
 ### 7a. Python Roadmap (JS-Canonical Parity)
 
-| Gap ID | SDK | Work Item | Reference | Status |
-|--------|-----|-----------|-----------|:------:|
-| G2 → G1 | Python | Add `middleware` storage to `Action`, then add `use=` to `define_model` | §8b.1 | ⬜ |
-| G7 | Python | Wire DAP action discovery into `GET /api/actions` | §8a, §8c.5 | ⏳ Deferred |
-| G6 → G5 | Python | Pass `span_id` in `on_trace_start`, send `X-Genkit-Span-Id` | §8c.3, §8c.4 | ⬜ |
-| G3 | Python | Implement `simulate_constrained_generation` middleware | §8b.3, §8f | ⬜ |
-| G12 | Python | Implement `retry` middleware | §8f | ⬜ |
-| G13 | Python | Implement `fallback` middleware | §8f | ⬜ |
-| G14 | Python | Implement `validate_support` middleware | §8f | ⬜ |
-| G15 | Python | Implement `download_request_media` middleware | §8f | ⬜ |
-| G16 | Python | Implement `simulate_system_prompt` middleware | §8f | ⬜ |
-| G18 | Python | Add multipart tool support (`defineTool({multipart: true})`) | §8h | ⬜ |
-| G19 | Python | Add Model API V2 (`defineModel({apiVersion: 'v2'})`) | §8i | ⬜ |
-| G20 | Python | Add `context` parameter to `Genkit()` constructor | §8j | ⬜ |
-| G21 | Python | Add `clientHeader` parameter to `Genkit()` constructor | §8j | ⬜ |
-| G22 | Python | Add `name` parameter to `Genkit()` constructor | §8j | ⬜ |
-| G4 | Python | Move `augment_with_context` to define-model time | §8b.2 | ⬜ |
-| G9 | Python | Add Pinecone vector store plugin | §5g | ⬜ |
-| G10 | Python | Add ChromaDB vector store plugin | §5g | ⬜ |
-| G30 | Python | Add Cloud SQL PG vector store parity | §5g | ⬜ |
-| G31 | Python | Add dedicated Python MCP parity sample | §2b/§9 | ⏳ Deferred |
-| G8 | Python | Implement `genkit.client` (`run_flow` / `stream_flow`) | §5c/§9 | ⏳ Deferred |
-| G17 | Python | Add built-in `api_key()` context provider | §8g | ⬜ |
-| G11 | Python | Add `CHANGELOG.md` to plugins + core | §3c | ✅ Done |
-| G33 | Python | Consider LangChain integration parity | §1c/§9 | ⬜ |
-| G34 | Python | Track BloomLabs vector stores (Convex, HNSW, Milvus) | §6b/§9 | ⬜ |
-| G35 | Python | Add Groq provider (or document compat-oai usage) | §1d/§6b | ⬜ |
-| G36 | Python | Add Cohere provider (or document compat-oai usage) | §1d/§6b | ⬜ |
-| G37 | Python | Track BloomLabs graph workflows plugin | §1d/§6b | ⬜ |
+> Updated: 2026-02-09. Status legend: ⬜ = not started, 🔄 = PR open, ✅ = merged, ⏳ = deferred, ⏸️ = paused (blocked on upstream), ~~struck~~ = superseded.
+
+| Gap ID | SDK | Work Item | Reference | Status | PR |
+|--------|-----|-----------|-----------|:------:|:---|
+| **G38** | Python | **Generate-level middleware V2** — 3-tier hooks (`generate`/`model`/`tool`), `define_middleware`, registry | §8l | ⬜ Blocked | Upstream: JS [#4515](https://github.com/firebase/genkit/pull/4515), Go [#4422](https://github.com/firebase/genkit/pull/4422) |
+| G2 → G1 | Python | Add `middleware` storage to `Action`, then add `use=` to `define_model` | §8b.1 | ⏸️ Paused | [#4516](https://github.com/firebase/genkit/pull/4516) — paused pending G38 |
+| G7 | Python | Wire DAP action discovery into `GET /api/actions` | §8a, §8c.5 | ✅ Done | [#4459](https://github.com/firebase/genkit/pull/4459) |
+| G6 → G5 | Python | Pass `span_id` in `on_trace_start`, send `X-Genkit-Span-Id` | §8c.3, §8c.4 | ✅ Done | [#4511](https://github.com/firebase/genkit/pull/4511) |
+| G3 | Python | Implement `simulate_constrained_generation` middleware | §8b.3, §8f | ⏸️ Paused | [#4510](https://github.com/firebase/genkit/pull/4510) — paused pending G38 |
+| G12 | Python | Implement `retry` middleware | §8f | ⏸️ Paused | [#4510](https://github.com/firebase/genkit/pull/4510) — paused pending G38 |
+| G13 | Python | Implement `fallback` middleware | §8f | ⏸️ Paused | [#4510](https://github.com/firebase/genkit/pull/4510) — paused pending G38 |
+| G14 | Python | Implement `validate_support` middleware | §8f | ⏸️ Paused | [#4510](https://github.com/firebase/genkit/pull/4510) — paused pending G38 |
+| G15 | Python | Implement `download_request_media` middleware | §8f | ⏸️ Paused | [#4510](https://github.com/firebase/genkit/pull/4510) — paused pending G38 |
+| G16 | Python | Implement `simulate_system_prompt` middleware | §8f | ⏸️ Paused | [#4510](https://github.com/firebase/genkit/pull/4510) — paused pending G38 |
+| G18 | Python | Add multipart tool support (`defineTool({multipart: true})`) | §8h | 🔄 | [#4513](https://github.com/firebase/genkit/pull/4513) |
+| ~~G19~~ | ~~Python~~ | ~~Add Model API V2 (`defineModel({apiVersion: 'v2'})`)~~ | ~~§8i~~ | ~~Superseded~~ | Replaced by G38 (middleware V2) + G41 (bidi models) |
+| G20 | Python | Add `context` parameter to `Genkit()` constructor | §8j | 🔄 | [#4512](https://github.com/firebase/genkit/pull/4512) |
+| G21 | Python | Add `clientHeader` parameter to `Genkit()` constructor | §8j | 🔄 | [#4512](https://github.com/firebase/genkit/pull/4512) |
+| G22 | Python | Add `name` parameter to `Genkit()` constructor | §8j | 🔄 | [#4512](https://github.com/firebase/genkit/pull/4512) |
+| G4 | Python | Move `augment_with_context` to define-model time | §8b.2 | 🔄 | [#4510](https://github.com/firebase/genkit/pull/4510) — logic valid, needs G38 interface |
+| **G39** | Python | **Bidirectional Action** primitive (`define_bidi_action`) | §8m | ⬜ Blocked | Upstream: JS [#4288](https://github.com/firebase/genkit/pull/4288) |
+| **G40** | Python | **Bidirectional Flow** primitive (`define_bidi_flow`) | §8m | ⬜ Blocked | Upstream: JS [#4288](https://github.com/firebase/genkit/pull/4288) |
+| **G41** | Python | **Bidirectional Model** (`define_bidi_model`, `generate_bidi`) for real-time LLM APIs | §8m | ⬜ Blocked | Upstream: JS [#4210](https://github.com/firebase/genkit/pull/4210) |
+| **G42** | Python | **Agent primitive** (`define_agent`) with session stores, replacing Chat API | §8n | ⬜ Blocked | Upstream: JS [#4212](https://github.com/firebase/genkit/pull/4212) |
+| **G43** | Python | **Plugin V2 architecture** — plugins provide middleware arrays (`GenkitPluginV2`) | §8o | ⬜ | Upstream: JS [#4132](https://github.com/firebase/genkit/pull/4132) (merged) |
+| **G44** | Python | **Reflection API V2** — WebSocket + JSON-RPC 2.0 | §8p | 🔄 | [#4401](https://github.com/firebase/genkit/pull/4401) (draft) |
+| G9 | Python | Add Pinecone vector store plugin | §5g | ⏳ Deferred | — |
+| G10 | Python | Add ChromaDB vector store plugin | §5g | ⏳ Deferred | — |
+| G30 | Python | Add Cloud SQL PG vector store parity | §5g | ⏳ Deferred | — |
+| G31 | Python | Add dedicated Python MCP parity sample | §2b/§9 | 🔄 | [#4248](https://github.com/firebase/genkit/pull/4248) |
+| G8 | Python | Implement `genkit.client` (`run_flow` / `stream_flow`) | §5c/§9 | ⏳ Deferred | — |
+| G17 | Python | Add built-in `api_key()` context provider | §8g | 🔄 | [#4521](https://github.com/firebase/genkit/pull/4521) (draft) |
+| G11 | Python | Add `CHANGELOG.md` to plugins + core | §3c | ✅ Done | [#4507](https://github.com/firebase/genkit/pull/4507), [#4508](https://github.com/firebase/genkit/pull/4508) |
+| G33 | Python | Consider LangChain integration parity | §1c/§9 | ⏳ Deferred | — |
+| G34 | Python | Track BloomLabs vector stores (Convex, HNSW, Milvus) | §6b/§9 | ⏳ Deferred | — |
+| G35 | Python | Add Groq provider (or document compat-oai usage) | §1d/§6b | ⬜ | — |
+| G36 | Python | Add Cohere provider (or document compat-oai usage) | §1d/§6b | ✅ Done | [#4518](https://github.com/firebase/genkit/pull/4518) |
+| G37 | Python | Track BloomLabs graph workflows plugin | §1d/§6b | ⏳ Deferred | — |
 
 ### 7b. Go Roadmap (JS-Canonical Parity) — Deferred
 
@@ -1015,6 +1031,176 @@ export interface GenkitOptions {
 - `FindMatchingResource()` — Finds resource matching a URI pattern (Python has `find_matching_resource()` equivalent)
 - `ListResources()` — Lists all registered resources
 
+### 8l. Generate Middleware V2 — 3-Tier Hook Architecture (Active RFC)
+
+> **JS RFC**: [#4515](https://github.com/firebase/genkit/pull/4515) (`@pavelgj`). **Go RFC**: [#4422](https://github.com/firebase/genkit/pull/4422) (`@apascal07`). **Go impl**: [#4464](https://github.com/firebase/genkit/pull/4464).
+> **JS registered middleware**: [#3906](https://github.com/firebase/genkit/pull/3906) (`@pavelgj`).
+> **Status**: Active development. The old `ModelMiddleware` type is being deprecated.
+
+The middleware system is being redesigned from a single model-wrapping function to a 3-tier hook system:
+
+| Hook | Scope | Called When |
+|------|-------|------------|
+| `generate` | Wraps entire generation including tool loop | Each `ai.generate()` call iteration |
+| `model` | Wraps individual model API call | Each model invocation |
+| `tool` | Wraps individual tool execution | Each tool call |
+
+**JS API** (`generateMiddleware`):
+
+```typescript
+export const myMiddleware = generateMiddleware(
+  { name: 'myMiddleware', configSchema: z.object({...}) },
+  (config) => ({
+    async generate(options, ctx, next) { return next(options, ctx); },
+    async model(request, ctx, next) { return next(request, ctx); },
+    async tool(request, ctx, next) { return next(request, ctx); },
+    tools: [/* additional tools to inject */],
+  })
+);
+
+// Usage: generate({..., use: [myMiddleware({verbose: true})]})
+// Registry: ai.defineMiddleware('name', myMiddleware)
+// Plugin: plugins: [myMiddleware.plugin()]
+```
+
+**Go API** (`Middleware` interface):
+
+```go
+type Middleware interface {
+	Name() string
+	New() Middleware // per-invocation state
+	Generate(ctx context.Context, state *GenerateState, next GenerateNext) (*ModelResponse, error)
+	Model(ctx context.Context, state *ModelState, next ModelNext) (*ModelResponse, error)
+	Tool(ctx context.Context, state *ToolState, next ToolNext) (*ToolResponse, error)
+}
+```
+
+**Key design differences from old `ModelMiddleware`:**
+
+| Aspect | Old (`ModelMiddleware`) | New (Middleware V2) |
+|--------|------------------------|---------------------|
+| Hooks | Model-call only | `generate` + `model` + `tool` |
+| State | Stateless function | Per-invocation state (`New()`) |
+| Registration | Anonymous function | Named, registerable, referenceable by string |
+| Attachment | `define_model(use=[...])` only | `generate(use=[...])` + `define_model(use=[...])` + plugin |
+| Config | None | Typed config schema (JSON Schema for Dev UI) |
+| Tool injection | Not possible | `tools` field in middleware def |
+| Reflection | Not visible | Listed in `/api/values?type=middleware` |
+
+**Impact on Python gaps**: G1, G2, G3, G12–G16 must target this new architecture. Old `ModelMiddleware`-based implementations (#4510, #4516) are **paused** until the JS/Go canonical implementations land.
+
+### 8m. Bidirectional Streaming Primitives (Active RFC)
+
+> **JS RFC**: [#4210](https://github.com/firebase/genkit/pull/4210) (`@pavelgj`). **JS impl**: [#4288](https://github.com/firebase/genkit/pull/4288).
+> **Go RFC**: [#4184](https://github.com/firebase/genkit/pull/4184) (`@apascal07`). **Go impl**: [#4387](https://github.com/firebase/genkit/pull/4387).
+> **Status**: Active development in JS and Go. Python has no bidi work yet.
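Python has no bidi work yet. As a rough, hypothetical sketch of the `asyncio` shape such a port might take, the class below mirrors the four members of the JS `BidiStreamingResponse` (`send`, `close`, an output stream, and a final output) using two `asyncio.Queue` instances. `BidiConnection`, the handler signature `(inputs, send_chunk) -> output`, and the end-of-stream sentinel are assumptions for illustration, not a Genkit Python API; `stream()` is a method here rather than a property.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Generic, TypeVar

InT = TypeVar("InT")        # type of chunks sent by the caller
ChunkT = TypeVar("ChunkT")  # type of streamed output chunks
OutT = TypeVar("OutT")      # type of the final output

# Hypothetical end-of-stream marker shared by both queues.
_END: Any = object()

# Hypothetical handler shape: consumes inputs, emits chunks, returns a final value.
Handler = Callable[[AsyncIterator[InT], Callable[[ChunkT], None]], Awaitable[OutT]]


class BidiConnection(Generic[InT, ChunkT, OutT]):
    """Hypothetical asyncio analogue of the JS BidiStreamingResponse."""

    def __init__(self, handler: Handler) -> None:
        self._inputs: asyncio.Queue = asyncio.Queue()
        self._outputs: asyncio.Queue = asyncio.Queue()
        # Run the handler concurrently; its return value becomes `output`.
        self._task = asyncio.create_task(handler(self._input_stream(), self._outputs.put_nowait))
        # When the handler finishes (or fails), end the output stream.
        self._task.add_done_callback(lambda _: self._outputs.put_nowait(_END))

    async def _input_stream(self) -> AsyncIterator[InT]:
        while (item := await self._inputs.get()) is not _END:
            yield item

    def send(self, chunk: InT) -> None:
        """Push one input chunk (JS: send)."""
        self._inputs.put_nowait(chunk)

    def close(self) -> None:
        """End the input stream (JS: close)."""
        self._inputs.put_nowait(_END)

    async def stream(self) -> AsyncIterator[ChunkT]:
        """Yield output chunks until the handler finishes (JS: stream)."""
        while (item := await self._outputs.get()) is not _END:
            yield item

    @property
    def output(self) -> "asyncio.Task[OutT]":
        """Awaitable final result (JS: output)."""
        return self._task


if __name__ == "__main__":

    async def shout(inputs, send_chunk):
        count = 0
        async for text in inputs:
            send_chunk(text.upper())
            count += 1
        return count

    async def main() -> None:
        conn = BidiConnection(shout)
        conn.send("hello")
        conn.send("world")
        conn.close()
        chunks = [c async for c in conn.stream()]
        print(chunks, await conn.output)  # ['HELLO', 'WORLD'] 2

    asyncio.run(main())
```

Because the handler runs as its own task, input can be sent before or while output is consumed. Closing the output queue from a done-callback is one design choice that lets `stream()` terminate even when the handler returns without emitting a chunk, and awaiting `output` re-raises any handler exception.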
+
+Adds three new primitives for bidirectional streaming:
+
+| Primitive | Purpose | Init | Input Stream | Output Stream | Final Output |
+|-----------|---------|------|-------------|---------------|-------------|
+| `defineBidiAction` | Core bidi primitive | Setup context | `AsyncIterable` | `AsyncIterable` | `Output` |
+| `defineBidiFlow` | Bidi action + observability | Setup context | `AsyncIterable` | `AsyncIterable` | `Output` |
+| `defineBidiModel` | Specialized for real-time LLM APIs | `ModelRequest` (config, tools, system prompt) | `ModelRequest` (messages) | `ModelResponseChunk` | `ModelResponse` |
+
+**JS usage pattern:**
+
+```typescript
+const session = await ai.generateBidi({
+  model: myRealtimeModel,
+  config: { temperature: 0.7 },
+  system: 'You are a helpful assistant',
+});
+session.send('Hello!');
+for await (const chunk of session.stream) { console.log(chunk.content); }
+```
+
+**`BidiConnection` / `BidiStreamingResponse`:**
+
+```typescript
+interface BidiStreamingResponse<I, S, O> {
+  stream: AsyncGenerator<S>;  // Output stream
+  output: Promise<O>;         // Final result
+  send(chunk: I): void;       // Push input
+  close(): void;              // End input stream
+}
+```
+
+**Python implications**: Will need an async-generator-based implementation, with `asyncio` queues serving as the input and output channels. The `init` pattern maps well to Python's existing `GenerateRequest` types.
+
+### 8n. Agent Primitive (Active RFC)
+
+> **JS RFC**: [#4212](https://github.com/firebase/genkit/pull/4212) (`@pavelgj`).
+> **Go RFC**: In [#4184](https://github.com/firebase/genkit/pull/4184) (`@apascal07`). **Go impl**: [#4462](https://github.com/firebase/genkit/pull/4462).
+> **Status**: RFC stage. The JS RFC explicitly states *"`defineAgent` would replace the current Chat API."*
+
+`defineAgent` is a high-level abstraction built on top of Bidi Flows for stateful multi-turn agents:
+
+| Feature | Chat API (current) | Agent Primitive (new) |
+|---------|-------------------|-----------------------|
+| State management | Client-side history | Client-managed or server-managed (via `SessionStore`) |
+| Streaming | Output only | Bidirectional (input + output) |
+| Interrupts | Tool interrupts | Full human-in-the-loop with turn semantics |
+| Session persistence | None built-in | Pluggable `SessionStore` (Postgres, Firestore, etc.) |
+| Snapshots | None | Session snapshots for rollback |
+
+**JS API:**
+
+```typescript
+const myAgent = ai.defineAgent(
+  { name: 'myAgent', store: postgresSessionStore({...}) },
+  async function* ({ inputStream, init, sendChunk }) {
+    let messages = init?.messages ?? [];
+    for await (const input of inputStream) {
+      const response = await ai.generate({ messages: [...messages, input], model: ... });
+      messages = response.messages;
+    }
+    return { sessionId: init?.sessionId, messages };
+  }
+);
+```
+
+**Python implications**: Will replace or extend the existing `Chat`/`Session` classes in `blocks/session/`. Needs async generator support and a pluggable session store abstraction.
+
+### 8o. Plugin V2 Architecture (JS Merged)
+
+> **JS impl**: [#4132](https://github.com/firebase/genkit/pull/4132) (`@huangjeff5`, merged 2026-01-22).
+> **Plugin migrations**: [#3541](https://github.com/firebase/genkit/pull/3541) (checks), [#3547](https://github.com/firebase/genkit/pull/3547) (ollama), [#3749](https://github.com/firebase/genkit/pull/3749) (googleai).
+> **Status**: JS core merged. Plugin migrations in progress. Python + Go not started.
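Python has not started Plugin V2. The following is a minimal, hypothetical sketch of how a Python plugin could hand the SDK a list of middleware, and how the SDK could fold that list around a model call so that the first entry runs outermost. Only the `model` hook from the 3-tier design is shown. `PluginV2`, `Middleware`, `ModelNext`, `compose_model_hooks`, and the dict-based request/response stand-ins are all assumptions, not existing Genkit Python APIs.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List, Protocol

# Stand-ins for ModelRequest / ModelResponse (assumption: plain dicts).
Request = Dict[str, Any]
Response = Dict[str, Any]
ModelNext = Callable[[Request], Awaitable[Response]]


class Middleware(Protocol):
    """Hypothetical middleware shape exposing only the `model` hook."""

    name: str

    async def model(self, request: Request, next_: ModelNext) -> Response: ...


@dataclass
class PluginV2:
    """Hypothetical Python counterpart of the JS GenkitPluginV2 interface."""

    name: str
    version: str = "v2"
    middleware: List[Middleware] = field(default_factory=list)

    def generate_middleware(self) -> List[Middleware]:
        # Return a copy so callers cannot mutate the plugin's own list.
        return list(self.middleware)


def compose_model_hooks(middleware: List[Middleware], model: ModelNext) -> ModelNext:
    """Wrap `model` so that middleware[0] runs outermost, like `use=[...]`."""

    def wrap(mw: Middleware, inner: ModelNext) -> ModelNext:
        async def call(request: Request) -> Response:
            return await mw.model(request, inner)

        return call

    chain = model
    for mw in reversed(middleware):
        chain = wrap(mw, chain)
    return chain


class TagMiddleware:
    """Example middleware that appends its name to a `trail` list."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def model(self, request: Request, next_: ModelNext) -> Response:
        tagged = {**request, "trail": request.get("trail", []) + [self.name]}
        return await next_(tagged)


async def fake_model(request: Request) -> Response:
    """Pretend model that echoes the trail it received."""
    return {"text": "ok", "trail": request.get("trail", [])}


if __name__ == "__main__":
    plugin = PluginV2(name="tracing", middleware=[TagMiddleware("outer"), TagMiddleware("inner")])
    run = compose_model_hooks(plugin.generate_middleware(), fake_model)
    print(asyncio.run(run({"messages": []})))
    # {'text': 'ok', 'trail': ['outer', 'inner']}
```

The `generate` and `tool` hooks would chain the same way around the generation loop and each tool call. Per-invocation state (Go's `New()`) is not modeled in this sketch.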
+
+Plugin V2 adds a `version: 'v2'` field and a `generateMiddleware` method to the plugin interface, enabling plugins to provide middleware:
+
+```typescript
+interface GenkitPluginV2 {
+  name: string;
+  version: 'v2';
+  model: (registry: Registry) => void;
+  generateMiddleware?: () => GenerateMiddleware[];
+}
+```
+
+**Key changes from Plugin V1:**
+- Plugins can register middleware globally (not just models/embedders)
+- `resolve()` pattern for deferred action creation (e.g., `ollama().model('phi3.5')`)
+- Middleware plugins can be composed: `plugins: [myLogger.plugin(), retryPlugin()]`
+
+**Python implications**: The current plugin system (`core/_plugins.py`) does not support middleware registration. Will need a V2 plugin interface once G38 (Middleware V2) lands.
+
+### 8p. Reflection API V2 — WebSocket + JSON-RPC 2.0 (Active RFC)
+
+> **RFC**: [#4211](https://github.com/firebase/genkit/pull/4211) (`@pavelgj`).
+> **JS+CLI impl**: [#4295](https://github.com/firebase/genkit/pull/4295) (behind `--experimental-reflection-v2`).
+> **Go impl**: [#4300](https://github.com/firebase/genkit/pull/4300) (draft).
+> **Python impl**: [#4401](https://github.com/firebase/genkit/pull/4401) (draft).
+
+Replaces the HTTP REST-based reflection server with WebSocket + JSON-RPC 2.0 for:
+- Bidirectional streaming support (required for bidi actions/flows in Dev UI)
+- Lower latency action invocation
+- Server-push notifications (action progress, trace events)
+- Multiplexed connections
+
+**Python implications**: The existing `core/reflection.py` HTTP server needs a WebSocket transport layer. The Python draft (#4401) is already tracking this work.
+
 ---
 
 ## 9. Gap Summary — Prioritized Fix List
@@ -1060,18 +1246,74 @@ export interface GenkitOptions {
 | G35 | Python | Groq provider parity missing (or compat-oai doc) | P3 | new plugin or `compat-oai` usage guide | basic model call test |
 | G36 | Python | Cohere provider parity missing (or compat-oai doc) | P3 | new plugin or `compat-oai` usage guide | basic model call + embed test |
 | G37 | Python | Graph workflows plugin parity missing | P3 | new plugin under `py/plugins/graph` | basic graph workflow test |
-
-### 9b. Dependency Matrix
+| **G38** | **All SDKs** | **Generate Middleware V2** — 3-tier hooks (`generate`/`model`/`tool`), `define_middleware`, middleware registry, per-invocation state, config schema, tool injection | **P0** | `py/packages/genkit/src/genkit/blocks/middleware.py`, `core/action/`, `ai/_registry.py` | middleware V2 interface + 3-hook dispatch + registry lookup + config validation tests |
+| **G39** | **Go + Python** | **Bidirectional Action** primitive (`define_bidi_action`) — core bidi streaming with `init`, `input_stream`, `output_stream` | **P1** | `py/packages/genkit/src/genkit/core/action/` (new bidi action type) | bidi action send/receive/close lifecycle tests |
+| **G40** | **Go + Python** | **Bidirectional Flow** primitive (`define_bidi_flow`) — bidi action with observability/tracing | **P1** | `py/packages/genkit/src/genkit/blocks/` (new bidi flow module) | bidi flow tracing + streaming roundtrip tests |
+| **G41** | **Go + Python** | **Bidirectional Model** (`define_bidi_model`, `generate_bidi`) — specialized bidi for real-time LLM APIs (Gemini Live, OpenAI Realtime) | **P1** | `py/packages/genkit/src/genkit/blocks/model.py`, `ai/_registry.py` | bidi model init + streaming conversation tests |
+| **G42** | **Go + Python** | **Agent primitive** (`define_agent`) — stateful multi-turn agent with session stores, replaces Chat API | **P1** | `py/packages/genkit/src/genkit/blocks/` (new agent module, replaces/extends `session/`) | agent creation + session persistence + turn semantics tests |
+| **G43** | **Go + Python** | **Plugin V2 architecture** — plugins provide `generate_middleware` arrays (`GenkitPluginV2`) | **P2** | `py/packages/genkit/src/genkit/core/_plugins.py` | plugin V2 middleware registration + resolution tests |
+| **G44** | **All SDKs** | **Reflection API V2** — WebSocket + JSON-RPC 2.0, replacing HTTP REST reflection server | **P1** | `py/packages/genkit/src/genkit/core/reflection.py`, `web/manager/` | WebSocket connection + JSON-RPC dispatch + bidi action streaming tests |
+
+### 9b. Python Gap Status Tracker (Updated 2026-02-09)
+
+> Status legend: ⬜ = not started, 🔄 = PR open, ✅ = merged, ⏳ = deferred, ⏸️ = paused (blocked on upstream RFC), ~~struck~~ = superseded.
+
+| Gap | Status | PR | Notes |
+|-----|:------:|:---|-------|
+| **G38** | ⬜ Blocked | Upstream: JS [#4515](https://github.com/firebase/genkit/pull/4515), Go [#4422](https://github.com/firebase/genkit/pull/4422) | **Middleware V2** (3-tier hooks) — waiting on JS/Go to land first |
+| G1 | ⏸️ | [#4516](https://github.com/firebase/genkit/pull/4516) | `define_model(use=[...])` — **paused**, architecture changing (blocked on G38) |
+| G2 | ⏸️ | [#4516](https://github.com/firebase/genkit/pull/4516) | Action middleware storage — **paused** (blocked on G38) |
+| G3 | ⏸️ | [#4510](https://github.com/firebase/genkit/pull/4510) | `simulate_constrained_generation` — **paused** (blocked on G38) |
+| G4 | 🔄 | [#4510](https://github.com/firebase/genkit/pull/4510) | `augment_with_context` lifecycle — logic valid, needs G38 interface |
+| G5 | ✅ | [#4511](https://github.com/firebase/genkit/pull/4511) | `X-Genkit-Span-Id` header — merged 2026-02-09 |
+| G6 | ✅ | [#4511](https://github.com/firebase/genkit/pull/4511) | `on_trace_start` span_id — merged 2026-02-09 |
+| G7 | ✅ | [#4459](https://github.com/firebase/genkit/pull/4459) | DAP discovery — merged 2026-02-06 |
+| G8 | ⏳ | — | `genkit.client` — deferred |
+| G9 | ⏳ | — | Pinecone — deferred |
+| G10 | ⏳ | — | ChromaDB — deferred |
+| G11 | ✅ | [#4507](https://github.com/firebase/genkit/pull/4507), [#4508](https://github.com/firebase/genkit/pull/4508) | CHANGELOGs — merged 2026-02-09 |
+| G12 | ⏸️ | [#4510](https://github.com/firebase/genkit/pull/4510) | `retry` middleware — **paused** (blocked on G38) |
+| G13 | ⏸️ | [#4510](https://github.com/firebase/genkit/pull/4510) | `fallback` middleware — **paused** (blocked on G38) |
+| G14 | ⏸️ | [#4510](https://github.com/firebase/genkit/pull/4510) | `validate_support` — **paused** (blocked on G38) |
+| G15 | ⏸️ | [#4510](https://github.com/firebase/genkit/pull/4510) | `download_request_media` — **paused** (blocked on G38) |
+| G16 | ⏸️ | [#4510](https://github.com/firebase/genkit/pull/4510) | `simulate_system_prompt` — **paused** (blocked on G38) |
+| G17 | 🔄 | [#4521](https://github.com/firebase/genkit/pull/4521) | `api_key()` context — draft |
+| G18 | 🔄 | [#4513](https://github.com/firebase/genkit/pull/4513) | multipart tool (tool.v2) — open |
+| ~~G19~~ | ~~Superseded~~ | — | ~~Model API V2~~ — replaced by G38 (middleware V2) + G41 (bidi models) |
+| G20 | 🔄 | [#4512](https://github.com/firebase/genkit/pull/4512) | `Genkit(context=...)` — open |
+| G21 | 🔄 | [#4512](https://github.com/firebase/genkit/pull/4512) | `Genkit(client_header=...)` — open |
+| G22 | 🔄 | [#4512](https://github.com/firebase/genkit/pull/4512) | `Genkit(name=...)` — open |
+| G30 | ⏳ | — | Cloud SQL PG — deferred |
+| G31 | 🔄 | [#4248](https://github.com/firebase/genkit/pull/4248) | MCP sample v2 — open |
+| G33 | ⏳ | — | LangChain — deferred |
+| G34 | ⏳ | — | BloomLabs vector stores — deferred |
+| G35 | ⬜ | — | Groq provider — not started |
+| G36 | ✅ | [#4518](https://github.com/firebase/genkit/pull/4518) | Cohere provider — merged 2026-02-09 |
+| G37 | ⏳ | — | Graph workflows — deferred |
+| **G39** | ⬜ Blocked | Upstream: JS [#4288](https://github.com/firebase/genkit/pull/4288) | **Bidi Action** — waiting on JS to land |
+| **G40** | ⬜ Blocked | Upstream: JS [#4288](https://github.com/firebase/genkit/pull/4288) | **Bidi Flow** — waiting on JS to land |
+| **G41** | ⬜ Blocked | Upstream: JS [#4210](https://github.com/firebase/genkit/pull/4210) | **Bidi Model** — waiting on JS to land |
+| **G42** | ⬜ Blocked | Upstream: JS [#4212](https://github.com/firebase/genkit/pull/4212) | **Agent primitive** — waiting on JS RFC |
+| **G43** | ⬜ | Upstream: JS [#4132](https://github.com/firebase/genkit/pull/4132) (merged) | **Plugin V2** — JS landed, Python design needed |
+| **G44** | 🔄 | [#4401](https://github.com/firebase/genkit/pull/4401) (draft) | **Reflection API V2** — Python draft open |
+
+**Progress**: 5 merged, 8 with PRs open (2 of them drafts), 8 paused (middleware V2 blocked), 1 superseded, 5 blocked on upstream RFCs, 2 not started, 7 deferred. (Go gaps G23–G29, G32 tracked in §7b.)
+
+### 9c. Dependency Matrix
 
 | Depends On | Unblocks | Why |
 |------------|----------|-----|
+| **G38** | G2, G1, G3, G4, G12, G13, G14, G15, G16, G43 | **Middleware V2 architecture** must land in JS/Go before Python can implement any middleware |
 | G2 | G1, G3, G4, G12, G13, G14, G16 | Python model middleware architecture must exist before feature middleware parity |
 | G6 | G5 | Need span ID in callback before header emission |
 | G7, G23 | G31 | MCP parity sample quality depends on DAP discoverability in tooling |
+| **G39** | G40, G41 | Bidi Action is the core primitive; Flow and Model build on it |
+| **G41** | G42 | Agent primitive is built on top of Bidi Flow/Model |
+| **G44** | Bidi Dev UI support | WebSocket reflection needed for bidi streaming in Dev UI |
 | G25 | G27, G28 | Go reranker/model API work shares core generation extension points |
 | G29 | G8 | constructor/client header parity helps consistent remote invocation behavior |
 
-### 9c. Fast-Close Implementation Bundles
+### 9d. Fast-Close Implementation Bundles
 
 | Bundle | Scope | Gaps | Deliverable | Exit Tests |
 |--------|-------|------|-------------|------------|
@@ -1082,7 +1324,7 @@ export interface GenkitOptions {
 | B5 | Cross-SDK client/plugin parity | G8, G9, G10, G30, G31 | client helpers + plugin/sample parity | cross-SDK parity smoke suite green |
 | B6 | Ecosystem/compliance | G11, G17, G32, G33, G34, G35, G36, G37 | docs/compliance + secondary plugins | consistency + sample smoke checks green |
 
-### 9d. Prioritized Execution Order (All 3 SDKs)
+### 9e. Prioritized Execution Order (All 3 SDKs)
 
 1. B1: Python middleware foundation (highest behavior delta).
 2. B2: Python reflection/protocol parity (Dev UI and observability correctness).
@@ -1091,7 +1333,7 @@ export interface GenkitOptions {
 5. B5: cross-SDK client + plugin/sample parity.
 6. B6: ecosystem/compliance.
 
-### 9e. Cross-SDK Summary
+### 9f. Cross-SDK Summary
 
 | SDK | P1 Gaps | P2 Gaps | P3 Gaps | Critical Themes |
 |-----|:-------:|:-------:|:-------:|-----------------|
@@ -1109,7 +1351,9 @@ export interface GenkitOptions {
 
 ## 10. Implementation Roadmap (Python SDK Focus)
 
-> Generated: 2026-02-08. Based on reverse topological sort of the dependency graph across all tracked Python gaps (G1–G37).
+> Generated: 2026-02-08. Updated: 2026-02-09. Based on reverse topological sort of the dependency graph across all tracked Python gaps (G1–G44).
+>
+> **2026-02-09 update**: Five major cross-SDK redesigns (Middleware V2, Bidi, Agent, Plugin V2, Reflection V2) have been identified; all except Plugin V2, whose JS core has already merged, are active RFCs. The roadmap has been restructured: middleware gaps G1–G3, G12–G16 are **paused** pending upstream Middleware V2 (#4515, #4422); G19 is **superseded**; new gaps G38–G44 added.
 
 ### 10a. Dependency Graph
 
@@ -1118,55 +1362,91 @@ The following directed acyclic graph (DAG) captures all prerequisite relationshi
 ```
 Legend: ───► = "is prerequisite for"
         (Pn) = priority level
+        [PAUSED] = blocked on upstream RFC
+        [DONE] = merged
+        [SUPERSEDED] = replaced by new gap
+
+UPSTREAM BLOCKERS (waiting on JS/Go RFCs to land)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+  G38 (P0) Generate Middleware V2 (3-tier hooks) [BLOCKED on JS #4515, Go #4422]
+    ├───► G2 (P1) Action middleware storage [PAUSED]
+    ├───► G43 (P2) Plugin V2 architecture
+    └───► (transitively) G1, G3, G4, G12-G16
 
-FOUNDATION LAYER (no prerequisites)
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+  G39 (P1) Bidirectional Action [BLOCKED on JS #4288]
+    ├───► G40 (P1) Bidirectional Flow
+    └───► G41 (P1) Bidirectional Model
 
-  G2 (P1) Action middleware storage
-    ├───► G1 (P1) define_model(use=[...])
-    ├───► G12 (P1) retry middleware
-    ├───► G13 (P1) fallback middleware
-    ├───► G15 (P2) download_request_media middleware
-    └───► G19 (P1) Model API V2 runner interface
+  G41 (P1) Bidirectional Model [BLOCKED on JS #4210]
+    └───► G42 (P1) Agent primitive (replaces Chat API)
 
-  G1 (P1) define_model(use=[...])  [depends on G2]
-    ├───► G3 (P1) simulate_constrained_generation
+  G44 (P1) Reflection API V2 (WebSocket) [draft PR #4401]
+
+MIDDLEWARE CHAIN (all PAUSED pending G38)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+  G2 (P1) Action middleware storage [PAUSED]
+    ├───► G1 (P1) define_model(use=[...]) [PAUSED]
+    ├───► G12 (P1) retry middleware [PAUSED]
+    ├───► G13 (P1) fallback middleware [PAUSED]
+    └───► G15 (P2) download_request_media [PAUSED]
+
+  G1 (P1) define_model(use=[...]) [PAUSED]
+    ├───► G3 (P1) simulate_constrained_generation [PAUSED]
     ├───► G4 (P2) augment_with_context lifecycle fix
-    ├───► G14 (P2) validate_support middleware
-    └───► G16 (P2) simulate_system_prompt middleware
+    ├───► G14 (P2) validate_support middleware [PAUSED]
+    └───► G16 (P2) simulate_system_prompt [PAUSED]
 
-  G6 (P1) on_trace_start span_id
-    └───► G5 (P1) X-Genkit-Span-Id header
+COMPLETED
+━━━━━━━━━
 
-  G7 (P1) DAP discovery in /api/actions
+  G6 (P1) on_trace_start span_id [DONE #4511]
+    └───► G5 (P1) X-Genkit-Span-Id header [DONE #4511]
+
+  G7 (P1) DAP discovery in /api/actions [DONE #4459]
     └───► G31 (P2) MCP parity sample
 
+  G11 (P3) CHANGELOG.md [DONE #4507, #4508]
+  G36 (P3) Cohere provider [DONE #4518]
+
+SUPERSEDED
+━━━━━━━━━━
+  G19 (P1) Model API V2 [SUPERSEDED by G38 + G41]
+
+ACTIVE (unblocked, can proceed now)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
   G21 (P2) Genkit(clientHeader=...)
     └───► G8 (P2) genkit.client module (run_flow/stream_flow)
 
-INDEPENDENT NODES (no prerequisites, unblock nothing)
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+  G18 (P1) Multipart tool (tool.v2)   G20 (P2) Genkit(context=...)
+  G22 (P2) Genkit(name=...)           G17 (P3) api_key() context
+  G35 (P3) Groq provider
 
-  G9  (P2) Pinecone plugin           G18 (P1) Multipart tool (tool.v2)
-  G10 (P2) ChromaDB plugin           G20 (P2) Genkit(context=...)
-  G11 (P3) CHANGELOG.md              G22 (P2) Genkit(name=...)
-  G17 (P3) api_key() context         G30 (P2) Cloud SQL PG plugin
-  G35 (P3) Groq provider             G36 (P3) Cohere provider
-  G33 (P3) LangChain integration     G34 (P3) BloomLabs vector stores
-  G37 (P3) Graph workflows
+DEFERRED
+━━━━━━━━
+  G9  (P2) Pinecone plugin           G10 (P2) ChromaDB plugin
+  G30 (P2) Cloud SQL PG plugin       G33 (P3) LangChain integration
+  G34 (P3) BloomLabs vector stores   G37 (P3) Graph workflows
+  G8  (P2) genkit.client (deferred)
 ```
 
 ### 10b. Topological Sort — Dependency Levels
 
 Reverse topological sort of the gap DAG yields the following dependency levels. Each level contains gaps whose prerequisites are all satisfied by prior levels. **Work within each level can be fully parallelized.**
 
-| Level | Gaps | Prerequisites | Theme |
-|:-----:|------|:--------------|-------|
-| **L0** | G2, G6, G7, G18, G20, G21, G22, G9, G10, G11, G17, G30, G35, G36, G33, G34, G37 | *None* | Foundation + all independent work |
-| **L1** | G1, G5, G12, G13, G15, G19, G31, G8 | G2, G6, G7, G21 | Middleware arch + protocol + client |
-| **L2** | G3, G4, G14, G16 | G1 | Feature middleware requiring define-model-time wiring |
+| Level | Gaps | Prerequisites | Theme | Status |
+|:-----:|------|:--------------|-------|:------:|
+| **L-1** | **G38**, **G39**, **G44** | *Upstream JS/Go RFCs* | Upstream blockers — must land in JS/Go first | ⏸️ Blocked |
+| **L0** | G2, G18, G20, G21, G22, G17, G35, G40, G41, G43 | G38 (for G2, G43); G39 (for G40, G41); *none* for others | Foundation + all independent work | Mixed |
+| **L1** | G1, G12, G13, G15, G42, G8 | G2, G21, G41 | Middleware arch + client + agent | ⏸️ (middleware) |
+| **L2** | G3, G4, G14, G16 | G1 | Feature middleware requiring define-model-time wiring | ⏸️ |
+
+**Critical path** (longest chain): `G38 → G2 → G1 → G3` (4 levels deep, governs minimum calendar time for full P1 closure). **G38 is an external dependency on upstream JS/Go RFC work.**
 
-**Critical path** (longest chain): `G2 → G1 → G3` (3 levels deep, governs minimum calendar time for full P1 closure).
+**Completed items** (removed from active levels): G5, G6, G7, G11, G36. G19 is superseded rather than completed and is likewise removed.
+**Deferred**: G8, G9, G10, G30, G33, G34, G37.
 
 ### 10c. Phased Roadmap
 
@@ -1174,15 +1454,15 @@ Reverse topological sort of the gap DAG yields the following dependency levels.
 
 > **Start immediately.** All items are independent of each other and of core framework work. Can run in parallel with all subsequent phases.
-| ID | Work Item | Effort | Type | -|----|-----------|:------:|------| -| **QW-1** | **Test coverage uplift** for all "Minimum" and "Adequate" plugins (see §10f) | M | Testing | -| **QW-2** | **Verify all existing samples run** — execute every `py/samples/*/run.sh`, fix any breakage | M | Validation | -| **QW-3** | G11: Add `CHANGELOG.md` to all 20 plugins + core package (21 files) | XS | Compliance | -| **QW-4** | G22: Add `name` parameter to `Genkit()` constructor — pass to `ReflectionServer` display name | XS | Feature | -| **QW-5** | G17: Implement `api_key()` context provider in `core/context.py` | S | Feature | -| **QW-6** | G35: Groq provider — thin `compat-oai` wrapper + usage documentation | S | Plugin | -| **QW-7** | G36: Cohere provider — thin `compat-oai` wrapper + embedder support + docs | S | Plugin | +| ID | Work Item | Effort | Type | Status | +|----|-----------|:------:|------|:------:| +| **QW-1** | **Test coverage uplift** for all "Minimum" and "Adequate" plugins (see §10f) | M | Testing | 🔄 [#4509](https://github.com/firebase/genkit/pull/4509) (merged), ongoing | +| **QW-2** | **Verify all existing samples run** — execute every `py/samples/*/run.sh`, fix any breakage | M | Validation | 🔄 | +| ~~**QW-3**~~ | ~~G11: Add `CHANGELOG.md` to all 20 plugins + core package (21 files)~~ | ~~XS~~ | ~~Compliance~~ | ✅ [#4507](https://github.com/firebase/genkit/pull/4507), [#4508](https://github.com/firebase/genkit/pull/4508) | +| **QW-4** | G22: Add `name` parameter to `Genkit()` constructor — pass to `ReflectionServer` display name | XS | Feature | 🔄 [#4512](https://github.com/firebase/genkit/pull/4512) | +| **QW-5** | G17: Implement `api_key()` context provider in `core/context.py` | S | Feature | 🔄 [#4521](https://github.com/firebase/genkit/pull/4521) (draft) | +| **QW-6** | G35: Groq provider — thin `compat-oai` wrapper + usage documentation | S | Plugin | ⬜ | +| ~~**QW-7**~~ | ~~G36: Cohere provider — thin `compat-oai` wrapper + embedder support + 
docs~~ | ~~S~~ | ~~Plugin~~ | ✅ [#4518](https://github.com/firebase/genkit/pull/4518) | **Effort key**: XS = < 1 day, S = 1–2 days, M = 3–5 days, L = 1–2 weeks, XL = 2+ weeks. @@ -1190,80 +1470,89 @@ Reverse topological sort of the gap DAG yields the following dependency levels. --- -#### Phase 1 — Core Infrastructure Foundation +#### Phase 1 — Unblocked Core Work (No Upstream Dependencies) -> **Prerequisite for Phases 2 and 3.** This is the highest-leverage work — it unblocks 11 downstream gaps. +> **Start now.** These items have no upstream RFC blockers and are unrelated to the middleware V2 redesign. -| ID | Gap | Work Item | Files to Touch | Effort | Unblocks | -|----|-----|-----------|----------------|:------:|----------| -| **P1.1** | **G2** | Add `middleware` storage to `Action` class; implement `action_with_middleware()` wrapper that chains model-level middleware around `action.run()` | `core/action/_action.py` | L | G1, G12, G13, G15, G19 | -| **P1.2** | **G6** | Update `on_trace_start` callback signature to `(trace_id: str, span_id: str)` throughout action system | `core/action/_action.py`, `core/reflection.py`, `core/trace/` | S | G5 | -| **P1.3** | **G18** | Add multipart tool support: `define_tool(multipart=True)`, `MultipartToolAction` type `tool.v2`, dual registration for non-multipart tools | `blocks/tools.py`, `blocks/generate.py` | M | — | -| **P1.4** | **G20** | Add `context` parameter to `Genkit()` that sets `registry.context` for default action context | `ai/_aio.py` | XS | — | -| **P1.5** | **G21** | Add `clientHeader` parameter to `Genkit()` that appends to `GENKIT_CLIENT_HEADER` via `set_client_header()` | `ai/_aio.py`, `core/http_client.py` | XS | G8 | +| ID | Gap | Work Item | Files to Touch | Effort | Unblocks | Status | +|----|-----|-----------|----------------|:------:|----------|:------:| +| ~~**P1.2**~~ | ~~**G6**~~ | ~~Update `on_trace_start` callback signature~~ | ~~`core/action/`, `core/reflection.py`~~ | ~~S~~ | ~~G5~~ | ✅ 
[#4511](https://github.com/firebase/genkit/pull/4511) | +| **P1.3** | **G18** | Add multipart tool support: `define_tool(multipart=True)`, `MultipartToolAction` type `tool.v2`, dual registration for non-multipart tools | `blocks/tools.py`, `blocks/generate.py` | M | — | 🔄 [#4513](https://github.com/firebase/genkit/pull/4513) | +| **P1.4** | **G20** | Add `context` parameter to `Genkit()` that sets `registry.context` for default action context | `ai/_aio.py` | XS | — | 🔄 [#4512](https://github.com/firebase/genkit/pull/4512) | +| **P1.5** | **G21** | Add `clientHeader` parameter to `Genkit()` that appends to `GENKIT_CLIENT_HEADER` via `set_client_header()` | `ai/_aio.py`, `core/http_client.py` | XS | G8 | 🔄 [#4512](https://github.com/firebase/genkit/pull/4512) | -**Exit criteria**: All unit tests green for action middleware dispatch, span_id propagation, tool.v2 registration, and constructor parameter propagation. +**Exit criteria**: All unit tests green for tool.v2 registration and constructor parameter propagation. --- -#### Phase 2 — Middleware Architecture & Protocol Parity +#### Phase 2 — Middleware V2 Architecture (PAUSED — Blocked on Upstream RFCs) -> **Depends on Phase 1** (specifically G2 for middleware gaps, G6 for span header). All items within this phase can be parallelized. 
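
Whichever interface Phase 2 ends up targeting, model middleware reduces to functions wrapped around a model runner: each one may rewrite the request, call the next stage, and post-process the response. The sketch below is a simplified, hypothetical model of the function-style interface, in which requests and responses are plain dicts and `compose`, `tag`, and `echo_runner` are illustrative names; Genkit's real `ModelMiddleware` signature differs. It shows why list order matters: the first middleware in the list runs outermost, which is what an ordering rule such as "`validate_support` before `download_request_media`" relies on.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Simplified stand-ins for typed request/response objects (hypothetical).
Request = dict[str, Any]
Response = dict[str, Any]
Next = Callable[[Request], Awaitable[Response]]
Middleware = Callable[[Request, Next], Awaitable[Response]]


def _bind(mw: Middleware, nxt: Next) -> Next:
    """Fix one middleware to the stage it should call next."""
    async def call(req: Request) -> Response:
        return await mw(req, nxt)
    return call


def compose(middleware: list[Middleware], runner: Next) -> Next:
    """Wrap runner so that middleware[0] is the outermost stage."""
    wrapped = runner
    for mw in reversed(middleware):
        wrapped = _bind(mw, wrapped)
    return wrapped


def tag(name: str) -> Middleware:
    """Hypothetical middleware that records its name on the way in."""
    async def mw(req: Request, nxt: Next) -> Response:
        return await nxt({**req, "trace": [*req["trace"], name]})
    return mw


async def echo_runner(req: Request) -> Response:
    # Stands in for the model call; echoes which stages ran before it.
    return {"trace": req["trace"]}


model = compose([tag("validate_support"), tag("download_request_media")], echo_runner)
print(asyncio.run(model({"trace": []})))
# {'trace': ['validate_support', 'download_request_media']}
```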
- -| ID | Gap | Work Item | Files to Touch | Effort | Unblocks | -|----|-----|-----------|----------------|:------:|----------| -| **P2.1** | **G1** | Add `use` parameter to `define_model()`; pass middleware list to `Action` via `action_with_middleware()` from Phase 1 | `ai/_registry.py`, `blocks/model.py` | M | G3, G4, G14, G16 | -| **P2.2** | **G5** | Emit `X-Genkit-Span-Id` response header in reflection server using span_id from updated callback | `core/reflection.py` | XS | — | -| **P2.3** | **G12** | Implement `retry()` middleware: exponential backoff with jitter, configurable statuses (UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED, INTERNAL), `max_retries`, `initial_delay_ms`, `max_delay_ms`, `backoff_factor`, `on_error` callback | `blocks/middleware.py` | M | — | -| **P2.4** | **G13** | Implement `fallback()` middleware: ordered model list, configurable error statuses, `on_error` callback, model resolution via registry | `blocks/middleware.py` | M | — | -| **P2.5** | **G15** | Implement `download_request_media()` middleware: download `http(s)` media URLs → data URIs, `max_bytes` limit, `filter` predicate | `blocks/middleware.py` | S | — | -| **P2.6** | **G19** | Add Model API V2: `define_model(api_version='v2')` with unified `ActionFnArg` options object (`on_chunk`, `context`, `abort_signal`, `registry`); maintain backward-compatible v1 path | `ai/_registry.py`, `blocks/model.py` | L | — | +> **PAUSED.** Blocked on upstream JS Middleware V2 ([#4515](https://github.com/firebase/genkit/pull/4515)) and Go Middleware V2 ([#4422](https://github.com/firebase/genkit/pull/4422)) landing. PRs [#4510](https://github.com/firebase/genkit/pull/4510) and [#4516](https://github.com/firebase/genkit/pull/4516) are paused. +> +> When upstream lands, these items will need to be redesigned to target the new 3-tier middleware architecture (see §8l). The **core middleware logic** (retry backoff, fallback chain, constraint simulation, etc.) 
remains valid — only the **wrapping interface** changes from `ModelMiddleware` function to `GenerateMiddlewareDef` with `generate`/`model`/`tool` hooks. -**Exit criteria**: Full middleware parity test suite green — retry with mock flaky model, fallback chain invocation, media download roundtrip, v2 runner signature tests. Reflection server returns `X-Genkit-Span-Id` in all action run responses. +| ID | Gap | Work Item | Effort | Status | +|----|-----|-----------|:------:|:------:| +| **P2.0** | **G38** | Implement Middleware V2 architecture: 3-tier hooks, `define_middleware()`, middleware registry, per-invocation state, config schema | XL | ⏸️ Blocked on upstream | +| **P2.1** | **G2 → G1** | Adapt `Action` middleware storage and `define_model(use=[...])` to new V2 interface | L | ⏸️ [#4516](https://github.com/firebase/genkit/pull/4516) paused | +| **P2.3** | **G12** | Reimplement `retry()` as V2 middleware with `model` hook | M | ⏸️ [#4510](https://github.com/firebase/genkit/pull/4510) paused | +| **P2.4** | **G13** | Reimplement `fallback()` as V2 middleware with `model` hook | M | ⏸️ [#4510](https://github.com/firebase/genkit/pull/4510) paused | +| **P2.5** | **G15** | Reimplement `download_request_media()` as V2 middleware with `model` hook | S | ⏸️ [#4510](https://github.com/firebase/genkit/pull/4510) paused | --- -#### Phase 3 — Feature Middleware Parity +#### Phase 3 — Feature Middleware Parity (PAUSED — Depends on Phase 2) -> **Depends on Phase 2** (specifically G1: `define_model(use=[...])`). These middleware functions are applied at **define-model time** as part of the model's built-in middleware chain. +> **PAUSED.** Depends on Phase 2 (G38 + G1). These middleware functions will use the `model` hook in the new V2 architecture. 
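
Phase 2 notes that the core retry backoff logic stays valid across the V2 redesign. That logic, as specified for G12 (exponential backoff with jitter, a set of retryable statuses, `max_retries`, `initial_delay_ms`, `max_delay_ms`, `backoff_factor`, and an `on_error` callback), can be sketched independently of any middleware interface. `StatusError`, `call_with_retry`, and the default values are hypothetical, and the jitter strategy shown is "full jitter" (a uniform draw between zero and the capped exponential delay), which may differ from the JS implementation.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

# Statuses listed for G12; treated as transient.
RETRYABLE_STATUSES = frozenset(
    {"UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED", "ABORTED", "INTERNAL"}
)


class StatusError(Exception):
    """Hypothetical error carrying a canonical status name."""

    def __init__(self, status: str) -> None:
        super().__init__(status)
        self.status = status


def backoff_delay_ms(attempt: int, initial_delay_ms: float, max_delay_ms: float,
                     backoff_factor: float,
                     rng: Callable[[], float] = random.random) -> float:
    """Full jitter: scale the capped exponential delay by a draw from rng()."""
    cap = min(max_delay_ms, initial_delay_ms * backoff_factor ** attempt)
    return cap * rng()


async def call_with_retry(
    fn: Callable[[], Awaitable[T]],
    *,
    max_retries: int = 3,  # illustrative defaults, not taken from JS
    initial_delay_ms: float = 1000,
    max_delay_ms: float = 60_000,
    backoff_factor: float = 2.0,
    statuses: frozenset = RETRYABLE_STATUSES,
    on_error: Optional[Callable[[StatusError, int], None]] = None,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    attempt = 0
    while True:
        try:
            return await fn()
        except StatusError as err:
            # Give up on non-transient errors or once retries are exhausted.
            if err.status not in statuses or attempt >= max_retries:
                raise
            if on_error is not None:
                on_error(err, attempt + 1)
            delay = backoff_delay_ms(attempt, initial_delay_ms, max_delay_ms, backoff_factor)
            await sleep(delay / 1000)
            attempt += 1
```

Keeping the backoff schedule in a plain function means only a thin adapter has to change when the wrapping interface moves to V2, and the injectable `sleep` lets tests run the schedule without real waiting.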
-| ID | Gap | Work Item | Files to Touch | Effort | Unblocks | -|----|-----|-----------|----------------|:------:|----------| -| **P3.1** | **G3** | Implement `simulate_constrained_generation()` middleware: inject JSON schema instructions into prompt for models with `supports.constrained = 'none'` or `'no-tools'`; clear `constrained`, `format`, `content_type`, `schema` from output config | `blocks/middleware.py` | M | — | -| **P3.2** | **G4** | Move `augment_with_context()` from call-time to define-model time: add unconditionally (when `supports.context` is false) to model middleware chain via `get_model_middleware()`, remove conditional addition from `generate.py` | `blocks/middleware.py`, `blocks/model.py`, `blocks/generate.py` | S | — | -| **P3.3** | **G14** | Implement `validate_support()` middleware: validate request against model `supports` declaration (media, tools, multiturn, system prompt); throw descriptive `GenkitError` with model name and unsupported feature details | `blocks/middleware.py` | S | — | -| **P3.4** | **G16** | Implement `simulate_system_prompt()` middleware: convert system messages into user/model turn pairs with configurable preface and acknowledgement strings | `blocks/middleware.py` | S | — | +| ID | Gap | Work Item | Effort | Status | +|----|-----|-----------|:------:|:------:| +| **P3.1** | **G3** | Reimplement `simulate_constrained_generation()` as V2 middleware | M | ⏸️ | +| **P3.2** | **G4** | Move `augment_with_context()` to define-model-time V2 middleware chain | S | ⏸️ | +| **P3.3** | **G14** | Reimplement `validate_support()` as V2 middleware | S | ⏸️ | +| **P3.4** | **G16** | Reimplement `simulate_system_prompt()` as V2 middleware | S | ⏸️ | +| **P3.5** | **G43** | Plugin V2 architecture — plugins provide `generate_middleware` arrays | M | ⏸️ | + +--- + +#### Phase 4 — Bidirectional Streaming & Agent (BLOCKED — Awaiting Upstream) + +> **BLOCKED.** Depends on JS Bidi Actions 
([#4288](https://github.com/firebase/genkit/pull/4288)) and Agent RFC ([#4212](https://github.com/firebase/genkit/pull/4212)) landing. -**Exit criteria**: Every middleware has dedicated unit tests verifying: (a) correct request transformation, (b) passthrough when condition not met, (c) matching JS behavior for edge cases. Model middleware ordering test confirms: `validate_support → download_request_media → simulate_system_prompt → augment_with_context → simulate_constrained_generation → [user middleware] → runner`. +| ID | Gap | Work Item | Effort | Status | +|----|-----|-----------|:------:|:------:| +| **P4.1** | **G39** | Implement `define_bidi_action` — core bidi action with `init`, async input/output streams | L | ⬜ Blocked | +| **P4.2** | **G40** | Implement `define_bidi_flow` — bidi action with observability/tracing wrappers | M | ⬜ Blocked | +| **P4.3** | **G41** | Implement `define_bidi_model` + `generate_bidi` — specialized bidi for real-time LLM APIs | L | ⬜ Blocked | +| **P4.4** | **G42** | Implement `define_agent` — stateful agent with session stores, replaces Chat API | XL | ⬜ Blocked | +| **P4.5** | **G44** | Implement Reflection API V2 — WebSocket + JSON-RPC 2.0 transport | L | 🔄 [#4401](https://github.com/firebase/genkit/pull/4401) (draft) | --- -#### Phase 4 — Integration & Client Parity +#### Phase 5 — Integration & Client Parity > **Depends on**: G21 (Phase 1) for client helpers. 
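
The `stream_flow()` helper planned for G8 is described as an HTTP POST whose response is streamed as NDJSON. Assuming that framing (one JSON document per line), the core of the helper is an incremental decoder that tolerates network chunks splitting a line in the middle. `iter_ndjson` is a hypothetical name, the payloads in the example are illustrative, and the sketch does not describe the wire format of any existing Genkit endpoint.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Decode newline-delimited JSON from arbitrarily split byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Every element except the last is a complete line; the last may be partial.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # A final document may arrive without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


# Illustrative payloads split at arbitrary byte boundaries.
chunks = [b'{"chunk": "Hel', b'lo"}\n{"chunk": " world"}\n', b'{"done": true}']
for item in iter_ndjson(chunks):
    print(item)
```

In a real client the chunks could come from `response.iter_bytes()` inside an `httpx.stream("POST", ...)` block, which keeps the parsing logic testable without a network.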
| ID | Gap | Work Item | Files to Touch | Effort | Unblocks | |----|-----|-----------|----------------|:------:|----------| -| **P4.1** | **G8** | Implement `genkit.client` module with `run_flow()` (HTTP POST + JSON response) and `stream_flow()` (HTTP POST + NDJSON streaming response) helpers; use `httpx` with configurable `client_header` | New `client/` module | M | — | +| **P5.1** | **G8** | Implement `genkit.client` module with `run_flow()` (HTTP POST + JSON response) and `stream_flow()` (HTTP POST + NDJSON streaming response) helpers; use `httpx` with configurable `client_header` | New `client/` module | M | — | **Exit criteria**: `run_flow` and `stream_flow` can invoke a deployed genkit flow endpoint over HTTP with correct headers and response parsing. --- -#### Phase 5 — Deferred & Ecosystem Parity +#### Phase 6 — Deferred & Ecosystem Parity -> **Deprioritized items.** Vector store plugins, DAP discovery, and community ecosystem work are deferred to focus on core framework 1:1 parity and existing plugin quality first. +> **Deprioritized items.** Vector store plugins and community ecosystem work are deferred to focus on core framework 1:1 parity and existing plugin quality first. 
| ID | Gap | Work Item | Effort | Notes | |----|-----|-----------|:------:|-------| -| **P5.1** | G7 | DAP discovery in `/api/actions` — wire `get_action_metadata_record()` into reflection `handle_list_actions` | S | Deferred; unblocks G31 | -| **P5.2** | G31 | Dedicated MCP parity sample — depends on G7 DAP discovery | S | Deferred | -| **P5.3** | G9 | Pinecone vector store plugin (new `py/plugins/pinecone`) | M | Deferred | -| **P5.4** | G10 | ChromaDB vector store plugin (new `py/plugins/chroma`) | M | Deferred | -| **P5.5** | G30 | Cloud SQL PG vector store plugin (new `py/plugins/cloud-sql-pg`) | M | Deferred | -| **P5.6** | G33 | LangChain integration plugin | L | Evaluate if LangChain Python integration adds value given Python's existing rich plugin ecosystem | -| **P5.7** | G34 | BloomLabs vector stores (Convex, HNSW, Milvus) | L per store | Community-driven; consider as `compat-oai`-style shims or documentation-only | -| **P5.8** | G37 | Graph workflows plugin | L | Port `genkitx-graph` concepts; evaluate against native Python workflow libraries | +| **P6.1** | G9 | Pinecone vector store plugin (new `py/plugins/pinecone`) | M | Deferred | +| **P6.2** | G10 | ChromaDB vector store plugin (new `py/plugins/chroma`) | M | Deferred | +| **P6.3** | G30 | Cloud SQL PG vector store plugin (new `py/plugins/cloud-sql-pg`) | M | Deferred | +| **P6.4** | G33 | LangChain integration plugin | L | Evaluate if LangChain Python integration adds value given Python's existing rich plugin ecosystem | +| **P6.5** | G34 | BloomLabs vector stores (Convex, HNSW, Milvus) | L per store | Community-driven; consider as `compat-oai`-style shims or documentation-only | +| **P6.6** | G37 | Graph workflows plugin | L | Port `genkitx-graph` concepts; evaluate against native Python workflow libraries | **Exit criteria**: Each plugin has README, tests, sample, and passes `check_consistency`. 
@@ -1272,66 +1561,53 @@ Reverse topological sort of the gap DAG yields the following dependency levels. ### 10d. Dependency Graph — Visual Summary ``` - PHASE 0 (parallel) PHASE 1 PHASE 2 PHASE 3 PHASE 4 - ════════════════ ═══════ ═══════ ═══════ ═══════ - - ┌──────────────────────┐ - │ QW: G11,G17,G22 │ - │ G35,G36 │ ┌────────┐ ┌────────┐ ┌────────┐ - │ G9,G10,G30 │ │ G2 │──────►│ G1 │──────►│ G3 │ - │ Test Coverage Uplift │ │ (P1) │ ┌───►│ (P1) │──┬──►│ (P1) │ - └──────────────────────┘ └───┬────┘ │ └────────┘ │ ├────────┤ - │ (runs in parallel │ │ ├──►│ G4 │ - │ with all phases) ├───────┼──────────┐ │ │ (P2) │ - ▼ │ │ │ │ ├────────┤ - │ │ ▼ ├──►│ G14 │ - ┌────┼───┐ │ ┌────────┐ │ │ (P2) │ - │ │ │ │ │ G12 │ │ ├────────┤ - │ ▼ │ │ │ (P1) │ └──►│ G16 │ - │ ┌──────┤ │ ├────────┤ │ (P2) │ - │ │ G15 │ │ │ G13 │ └────────┘ - │ │ (P2) │ │ │ (P1) │ - │ └──────┘ │ ├────────┤ - │ │ │ G19 │ - │ │ │ (P1) │ - │ │ └────────┘ - ┌────────┐ ┌────┴───┐ ┌────┴───┐ - │ G21 │─────────►│ G8 │ │ G5 │ - │ (P2) │ │ (P2) │ │ (P1) │ - └────────┘ └────────┘ └────────┘ - ▲ - ┌────────┐ ┌────┴───┐ - │ G7 │ │ G6 │ - │ (P1) │ ┌────────┐ │ (P1) │ - └────┬───┘ │ G31 │ └────────┘ - └─────────────►│ (P2) │ - └────────┘ - - ┌────────┐ - │ G18 │ (independent, Phase 1) - │ (P1) │ - └────────┘ - - ┌────────┐ ┌────────┐ - │ G20 │ │ G22 │ (independent, Phase 0–1) - │ (P2) │ │ (P2) │ - └────────┘ └────────┘ + UPSTREAM BLOCKERS PHASE 0 (parallel) PHASE 1 (active) + ═════════════════ ════════════════ ════════════════ + + ┌─────────────────┐ ┌──────────────────────┐ + │ G38 (P0) │ │ QW: G11✅,G17,G22 │ ┌────────┐ ┌────────┐ + │ Middleware V2 │─ ─ ─ ─►│ G35, G36✅ │ │ G18 │ │ G20 │ + │ [JS #4515] │ waits │ Test Coverage Uplift │ │ (P1) │ │ (P2) │ + │ [Go #4422] │ └──────────────────────┘ │ tool.v2│ │ ctx │ + └────────┬────────┘ (runs in parallel) └────────┘ └────────┘ + │ + │ unblocks ┌────────┐ ┌────────┐ + ▼ │ G21 │ │ G22 │ + ┌────────────────┐ │ (P2) │ │ (P2) │ + │ G2 → G1 │─────────► PHASE 2+3 (middleware) 
│ header │ │ name │ + │ [PAUSED] │ all middleware items └────┬───┘ └────────┘ + │ #4516 paused │ #4510 paused │ + └────────────────┘ ▼ + ┌────────┐ + ┌─────────────────┐ │ G8 │ + │ G39 (P1) │ │ (P2) │ + │ Bidi Action │─────────► G40 (Bidi Flow) │ client │ + │ [JS #4288] │────────► G41 (Bidi Model) ──► G42 └────────┘ + └─────────────────┘ (Agent) + + ┌─────────────────┐ COMPLETED + │ G44 (P1) │ ═════════ + │ Reflection V2 │ G5✅, G6✅ (#4511) + │ [Py #4401] │ G7✅ (#4459) + └─────────────────┘ G11✅ (#4507,#4508) + G36✅ (#4518) + G19 ──► SUPERSEDED (by G38+G41) ``` ### 10e. Critical Path Analysis -| Path | Chain Length | Calendar Estimate | Covers | -|------|:-----------:|:-----------------:|--------| -| **G2 → G1 → G3** | 3 levels | ~4–5 weeks | Core middleware → define-model → constrained generation | -| **G2 → G1 → G14** | 3 levels | ~4–5 weeks | Core middleware → define-model → validate support | -| **G2 → G1 → G16** | 3 levels | ~4–5 weeks | Core middleware → define-model → system prompt simulation | -| **G2 → G12** | 2 levels | ~3 weeks | Core middleware → retry | -| **G2 → G13** | 2 levels | ~3 weeks | Core middleware → fallback | -| **G6 → G5** | 2 levels | ~1 week | Span callback → span header | -| **G21 → G8** | 2 levels | ~2 weeks | Client header → client module | -| ~~G7 → G31~~ | 2 levels | ~2 weeks | *(Deferred — DAP discovery → MCP sample)* | +| Path | Chain Length | Calendar Estimate | Covers | Status | +|------|:-----------:|:-----------------:|--------|:------:| +| **G38 → G2 → G1 → G3** | 4 levels | Unknown (depends on upstream) | Middleware V2 → storage → define-model → constrained gen | ⏸️ Blocked | +| **G38 → G2 → G1 → G14** | 4 levels | Unknown | Middleware V2 → storage → define-model → validate support | ⏸️ Blocked | +| **G38 → G2 → G12** | 3 levels | Unknown | Middleware V2 → storage → retry | ⏸️ Blocked | +| **G39 → G41 → G42** | 3 levels | Unknown (depends on upstream) | Bidi Action → Bidi Model → Agent | ⬜ Blocked | +| ~~G6 → G5~~ | ~~2 
levels~~ | — | ~~Span callback → span header~~ | ✅ Done | +| **G21 → G8** | 2 levels | ~2 weeks | Client header → client module | 🔄 Active | -**Bottleneck**: G2 (Action middleware storage) is the single highest-leverage item. It unblocks 5 direct dependents and 4 transitive dependents. **Prioritize G2 above all other work.** +**Bottleneck shift**: The bottleneck has moved from G2 (internal) to **G38** (external dependency on upstream JS/Go Middleware V2 RFCs). Until JS [#4515](https://github.com/firebase/genkit/pull/4515) and Go [#4422](https://github.com/firebase/genkit/pull/4422) land, 8 Python middleware gaps remain blocked. + +**Actionable now**: Phase 0 quick wins, Phase 1 unblocked items (G18, G20, G21, G22), test coverage uplift, sample verification. ### 10f. Test Coverage Uplift Plan @@ -1389,21 +1665,31 @@ Reverse topological sort of the gap DAG yields the following dependency levels. ### 10g. Execution Timeline +> **Updated 2026-02-09**: Timeline restructured due to upstream Middleware V2 and Bidi RFC blockers. + ``` -Week 1 2 3 4 5 6 7 8 9 10 11 12 +Week 1 2 3 4 5 ? ? ? ? ? ? ? 
──── ──── ──── ──── ──── ──── ──── ──── ──── ──── ──── ──── -P0 ████████████████████████████████████████████████████████████ Quick wins + test uplift + sample verification (continuous) -P1 ████████████████ G2, G6, G18, G20, G21 -P2 ████████████████ G1, G5, G12, G13, G15, G19 -P3 ████████████ G3, G4, G14, G16 -P4 ████████ G8 -P5 ████████████████── G7, G31, G9, G10, G30, G33, G34, G37 (deferred) - -Milestone ▲ P1 infra ▲ Middleware ▲ Full P1 ▲ Client - complete parity closure parity - (week 3) (week 5) (week 7) (week 9) +P0 ████████████████████████████████████████████████████████████ Quick wins + test uplift (continuous) +P1 ████████████████ G18, G20, G21, G22 (unblocked) + ╔═══════════════════════════════════════ + ║ WAITING ON UPSTREAM RFCs + ║ G38: JS #4515 (Middleware V2) + ║ G39: JS #4288 (Bidi Actions) + ║ G42: JS #4212 (Agent Primitive) + ╚═══════════════════════════════════════ +P2 ████████████████ G38→G2→G1, G12, G13, G15 (after upstream) +P3 ████████████ G3, G4, G14, G16, G43 (after P2) +P4 ████████████── G39-G42, G44 (Bidi + Agent + Reflection V2) +P5 ████ G8 (client) +P6 ──── Deferred ecosystem + +Milestone ▲ P1 done ▲ Upstream ▲ Middleware ▲ Bidi+Agent + (week 3) lands (?) parity (?) parity (?) ``` +**Note**: Phases 2–4 timelines depend on when upstream JS/Go RFCs land. Phase 0 and Phase 1 work continues in parallel. + ### 10h. PR Breakdown > **Key rule**: Changes to core framework (`py/packages/genkit/`) MUST be sent as separate PRs from plugin (`py/plugins/`) and sample (`py/samples/`) changes. This keeps reviews focused, reduces blast radius, and allows independent rollback. 
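
The key rule can be checked automatically before review. The sketch below classifies changed paths into the three areas the rule names and flags a change set that mixes core framework files with plugin or sample files. `touched_areas` and `violates_split_rule` are hypothetical helpers and the example paths are illustrative; the input could be the output of `git diff --name-only <base>...HEAD`.

```python
# Path prefixes named by the key rule.
AREAS = {
    "core": "py/packages/genkit/",
    "plugins": "py/plugins/",
    "samples": "py/samples/",
}


def touched_areas(paths: list[str]) -> set[str]:
    """Return which of the rule's areas a change set touches."""
    return {
        area
        for path in paths
        for area, prefix in AREAS.items()
        if path.startswith(prefix)
    }


def violates_split_rule(paths: list[str]) -> bool:
    """True when core framework files are mixed with plugin or sample files."""
    areas = touched_areas(paths)
    return "core" in areas and bool(areas & {"plugins", "samples"})


changed = ["py/packages/genkit/src/example.py", "py/plugins/compat-oai/README.md"]
print(violates_split_rule(changed))  # True
```

Plugin and sample changes may still travel together under this check, since the rule only requires core changes to be isolated.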
@@ -1523,14 +1809,896 @@ The current `yesudeep/feat/checks-plugin` branch bundles 32 changed files spanni | Metric | Value | |--------|-------| -| Total Python gaps | 30 (G1–G22, G30–G31, G33–G37) | -| **Active focus (Phases 0–4)** | **22 items** — core framework 1:1 parity + existing plugin quality | -| Phase 0 quick wins | 7 items (parallelizable, no core changes) | -| Phases 1–3 (core parity) | 15 items on critical path | -| Phase 4 (integration) | 1 item | -| Phase 5 (deferred) | 8 items (vector stores, DAP, ecosystem) | -| Critical path length | 3 dependency levels (G2 → G1 → G3) | -| Estimated calendar time to full P1 closure | ~7 weeks | -| Estimated calendar time to active P2 closure | ~9 weeks | +| Total Python gaps | **36** (G1–G22, G30–G31, G33–G44; includes superseded G19) | +| **Completed** | **5** — G5, G6, G7, G11, G36 | +| **In review (PRs open)** | **6** — G4, G17, G18, G20, G21, G22 | +| **Paused (blocked on upstream Middleware V2)** | **8** — G1, G2, G3, G12, G13, G14, G15, G16 | +| **Blocked on upstream RFCs (new)** | **6** — G38, G39, G40, G41, G42, G43 | +| **Reflection V2 (draft)** | **1** — G44 | +| **Superseded** | **1** — G19 (replaced by G38 + G41) | +| **Not started** | **1** — G35 | +| **Deferred** | **8** — G8, G9, G10, G30, G33, G34, G37, G31 | +| Phase 0 quick wins | 5 active items (2 done) | +| Phase 1 (unblocked) | 4 items (G18, G20, G21, G22) — **actionable now** | +| Phases 2–3 (middleware) | 11 items (G38, G1–G4, G12–G16, G43) — **paused**, awaiting upstream G38 | +| Phase 4 (bidi + agent) | 5 items — **blocked**, awaiting upstream G39–G42, G44 | +| Phase 5 (integration) | 1 item (G8) | +| Phase 6 (deferred) | 6 items (vector stores, ecosystem) | +| Critical path length | **4 dependency levels** (G38 → G2 → G1 → G3) | +| External blockers | JS [#4515](https://github.com/firebase/genkit/pull/4515), [#4288](https://github.com/firebase/genkit/pull/4288), [#4212](https://github.com/firebase/genkit/pull/4212); Go 
[#4422](https://github.com/firebase/genkit/pull/4422) | +| Estimated calendar time to P1 closure | **Depends on upstream** — Phase 1 items completable in ~2–3 weeks | | Plugins needing test uplift | 13 of 20 | | New test files needed (est.) | ~40–50 across all plugins | + +--- + +## 11. Cross-SDK Issue Tracker Analysis + +> **Purpose**: Catalogue real-world issues reported against JS, Go, and Python SDKs on +> GitHub to (a) identify problems that already affect or could affect the Python SDK, +> (b) avoid repeating the same mistakes, and (c) prioritize fixes. Each row records +> the original issue, its category, a Python-applicability verdict, and the +> recommended action. +> +> **Methodology**: Issues were collected from +> [firebase/genkit/issues](https://github.com/firebase/genkit/issues) using +> keyword searches (error, streaming, telemetry, schema, install, etc.) and +> by examining the most upvoted / most recent open issues as of 2026-02-09. + +### 11a. Category Legend + +| Category | Icon | Description | +|----------|:----:|-------------| +| **Bug — Runtime** | 🐛 | Incorrect behavior at runtime (data corruption, crashes, wrong output) | +| **Bug — Schema / Output** | 📐 | JSON Schema generation, structured output, or validation failures | +| **Streaming** | 🌊 | Streaming-specific bugs or missing features | +| **Telemetry / Observability** | 📡 | Tracing, logging, OTel integration issues | +| **DevX / Documentation** | 📖 | Confusing docs, outdated examples, developer friction | +| **Installation / Dependency** | 📦 | Build failures, version pinning, incompatible transitive deps | +| **Plugin Interop** | 🔌 | Plugin-specific bugs or missing capabilities | +| **Error Handling** | ⚠️ | Poor error messages, silent failures, missing error types | +| **Security** | 🔒 | Leaked data, credential handling | +| **Feature Request** | 💡 | Frequently-requested features that improve production readiness | + +### 11b. 
Python-Applicability Verdicts + +| Verdict | Meaning | +|---------|---------| +| ✅ **Confirmed** | The issue already exists in the Python SDK (verified in code) | +| ⚠️ **Likely** | The Python SDK has similar architecture; the same bug class is probable | +| 🔍 **Investigate** | Needs code audit to confirm; the pattern exists but may differ | +| 🛡️ **Protected** | Python's design already prevents this class of bug | +| ➖ **N/A** | Language or runtime-specific; does not apply to Python | + +### 11c. Bug — Runtime Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 1 | [#3839](https://github.com/firebase/genkit/issues/3839) | Go | **LookupPrompt caches input and reuses stale values** — prompt template not re-rendered on subsequent calls with different input. Silent data corruption (no runtime error). | 🔍 Investigate | Python's Dotprompt uses Handlebars rendering per-call, but audit `prompt.py` to verify template text is never mutated in place. | +| 2 | [#4264](https://github.com/firebase/genkit/issues/4264) | Go | **Prompt renders incorrect input after initial execution or when used concurrently** — `templateText` appears fragmented and pre-rendered on second run. Duplicate of #3839 class. | 🔍 Investigate | Same class as #3839. Verify Python prompt compilation creates a fresh template each time. | +| 3 | [#4492](https://github.com/firebase/genkit/issues/4492) | **PY** | **Tools with only `ToolRunContext` crash with `PydanticSchemaGenerationError`** — defining a tool with `ctx: ToolRunContext` as the sole parameter causes schema generation to fail at import time; even if bypassed, wrong value dispatched at runtime. | ✅ **Confirmed** | Two bugs: (A) `_registry.py` line 557–561 treats 1-arg `ToolRunContext`-only tool as data input, (B) schema builder tries `TypeAdapter(ToolRunContext)`. Fix: detect context-only signature and skip schema generation. 
| +| 4 | [#4117](https://github.com/firebase/genkit/issues/4117) | **PY** | **Backend log timestamp leaked into generated text** — `multipart_tool_calling` flow returns text prefixed with `"011-25 15:58:15.908000 +0000 UTC"`. | 🔍 Investigate | Likely model-side artifact (gemini-3-pro-preview), but audit Python's tool response concatenation in `generate.py` to ensure no log contamination in message assembly. | +| 5 | [#4279](https://github.com/firebase/genkit/issues/4279) | JS | **`compat-oai` raw response is always empty** — `response.raw` returns `{}` despite data being present in traces. | ⚠️ Likely | Python `compat-oai` plugin should be audited — check if `raw` field is populated in `GenerateResponse`. The JS bug is in response construction; Python may have the same omission. | + +### 11d. Bug — Schema / Output Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 6 | [#4119](https://github.com/firebase/genkit/issues/4119) | Go | **`InferJSONSchema` produces invalid schema for repeated struct types** — `{additionalProperties: true}` without `type` field causes Gemini API rejection. | 🛡️ Protected | Python uses Pydantic's `TypeAdapter.json_schema()` which handles repeated types correctly via `$defs`/`$ref`. No action needed. | +| 7 | [#4110](https://github.com/firebase/genkit/issues/4110) | JS | **Schema regression from v1.22 → v1.23** — `$ref` in output schema not resolved before API call, causing `400 Bad Request`. Discriminated unions with `z.discriminatedUnion` broke between versions. | ⚠️ Likely | Python's `gen.go`-based schema sanitizer and Pydantic schema generation should be audited. Verify `$ref` is resolved before sending to Gemini API. Also test discriminated unions via `Literal` + `Union`. 
| +| 8 | [#2758](https://github.com/firebase/genkit/issues/2758) | JS | **Zod integration pitfalls** — `nullable()`, `describe()`, `literal()` rejected by Gemini; structured output randomly missing properties. | ⚠️ Likely | Python equivalent: Pydantic `Optional`, `Field(description=...)`, `Literal`. Verify these are correctly translated in schema for `google-genai` plugin. Create test cases for edge cases. | +| 9 | [#4350](https://github.com/firebase/genkit/issues/4350) | **PY** | **No handling for malformed JSON in `extract.py`** — `TODO` at line 42. | ✅ **Confirmed** | `extract.py:42` has `# TODO(#4350)`. Implement robust JSON parsing with fallback/repair for model responses that contain markdown fences or trailing commas. | + +### 11e. Streaming Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 10 | [#3851](https://github.com/firebase/genkit/issues/3851) | Go | **Streaming with tools causes message loss** — final response only includes tool response, ignoring reasoning/previous model messages. | 🔍 Investigate | Audit Python's streaming + tool-calling path in `generate.py`. Verify message history is correctly accumulated across tool call turns during streaming. | +| 11 | [#4036](https://github.com/firebase/genkit/issues/4036) | JS | **Anthropic: `input_json_delta` not supported for streaming tool calls** — server tools stream deltas that aren't parsed. | 🔍 Investigate | If Python's Anthropic plugin supports streaming tool calls, verify delta parsing. Currently likely N/A since Anthropic plugin may not stream tool args. | +| 12 | [#3938](https://github.com/firebase/genkit/issues/3938) | JS | **MCP tool inputs never exposed in `streamResponse.toolRequest`** — streaming responses don't surface tool request arguments. | 🔍 Investigate | Audit Python MCP plugin streaming path. | + +### 11f. 
Telemetry / Observability Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 13 | [#2904](https://github.com/firebase/genkit/issues/2904) | JS | **Telemetry doesn't work with Sentry or Elastic APM** — no traces exported when using third-party APM alongside Genkit telemetry. | 🔍 Investigate | Python's OTel integration should be tested with Sentry and Elastic APM Python SDKs. The `web-endpoints-hello` sample already supports Sentry (`sentry_init.py`), but verify trace propagation when both Genkit tracing and Sentry coexist. | +| 14 | [#2278](https://github.com/firebase/genkit/issues/2278) | JS | **Telemetry not exported when flow called from Cloud Function** — traces appear in Dev UI but not in Firebase Console when invoked from a Cloud Function. | ⚠️ Likely | Verify Python SDK flushes traces before the cloud function process exits. Short-lived serverless environments (Cloud Functions, Lambda) may terminate before async OTel export completes. Add `force_flush()` on shutdown. | +| 15 | — | All | **`X-Genkit-Span-Id` header missing in Python reflection server** (documented in §8c.3) | ✅ **Confirmed** | Fixed by [#4511](https://github.com/firebase/genkit/pull/4511): `on_trace_start` now receives `span_id` alongside the trace ID (G6), and the reflection server emits the `X-Genkit-Span-Id` response header (G5). | + +### 11g. DevX / Documentation Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 16 | [#4501](https://github.com/firebase/genkit/issues/4501) | Go | **Documentation is outdated — `ai.Retrieve` doesn't work** — RAG Go examples on genkit.dev use deprecated APIs. | ⚠️ Likely | Python docs should be audited for accuracy. Ensure all code examples in README files and docstrings compile and run against the current SDK version. 
| +| 17 | [#3810](https://github.com/firebase/genkit/issues/3810) | JS | **Ollama plugin docs claim structured output support but it doesn't work** — developers waste time trying to use `output: { schema }` with Ollama. | ⚠️ Likely | Python Ollama plugin should document what is and isn't supported (structured output, tool calling, streaming). Add `supports` metadata to model definition. | +| 18 | [#3915](https://github.com/firebase/genkit/issues/3915) | JS | **Gemini "free tier" quota errors on first request** — docs say "generous free tier" but users hit immediate `429` quota errors. `limit: 0` for free tier in some regions. | ⚠️ Likely | Python getting-started docs/samples should mention quota limitations and add retry/backoff guidance. The `web-endpoints-hello` sample handles this via circuit breaker, but simpler samples need a note. | +| 19 | [#2758](https://github.com/firebase/genkit/issues/2758) | JS | **Schema definition pitfalls not documented** — `nullable()`, `describe()`, `literal()` silently fail or get rejected. | ⚠️ Likely | Document which Pydantic field types/options are fully supported by each provider (Gemini, Vertex, Anthropic, OpenAI). Add a "Schema Compatibility" section to Python plugin docs. | + +### 11h. Installation / Dependency Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 20 | [#2771](https://github.com/firebase/genkit/issues/2771) | Go | **Genkit v0.5.1 won't build with OTel SDK v1.35.0** — `instrumentation.Library` deprecated in favor of `instrumentation.Scope`, causing compile failure. | 🔍 Investigate | Python pins OTel versions in `pyproject.toml`. Run `uv pip check` and verify no version conflicts with latest `opentelemetry-sdk`. Add lower-bound checks in CI. | +| 21 | — | All | **CLI installation has wrong architecture for darwin-x64** — reported for the `genkit` CLI binary. | ➖ N/A | Python SDK doesn't ship native binaries. 
However, ensure `setup.sh` in samples detects architecture correctly when installing the genkit CLI. | +| 22 | — | All | **CI/CD interrupted by cookie/analytics prompt** — CLI tooling shows interactive prompts in headless environments. | ⚠️ Likely | Python's `genkit start` may show similar prompts. Ensure `--non-interactive` or `CI=true` suppresses all prompts. Test in CI matrix. | + +### 11i. Plugin Interop Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 23 | [#4490](https://github.com/firebase/genkit/issues/4490) | Go | **Cannot use moondream:v2 with Ollama plugin** — models are statically defined; any model not in the hardcoded list fails with "model not found". | 🔍 Investigate | Verify Python Ollama plugin allows arbitrary model names. If models are statically listed, add a pass-through for unknown model names. | +| 24 | [#3651](https://github.com/firebase/genkit/issues/3651) | JS | **Vertex AI plugin uses wrong URL for `location: 'global'`** — constructs `https://global-aiplatform.googleapis.com` (404) instead of `https://aiplatform.googleapis.com`. | 🔍 Investigate | Check Python `vertex-ai` plugin for the same URL construction pattern. The Google `genai` Python SDK may handle this correctly, but verify. | +| 25 | [#4299](https://github.com/firebase/genkit/issues/4299) | Go | **MCP client silently swallows initialization errors** — `NewGenkitMCPClient` returns `nil` error on misconfigured `BaseURL`; user only discovers failure on first tool call. | 🔍 Investigate | Audit Python MCP plugin's `__init__` / connection setup. Ensure initialization errors (bad URL, connection refused, auth failure) are raised immediately, not deferred. | + +### 11j. 
Error Handling Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 26 | [#4336](https://github.com/firebase/genkit/issues/4336) | **PY** | **`GenerationBlockedError` should extend `GenkitError`** — `TODO` at `generate.py:1034`. Currently a bare exception, making it hard to catch in a typed error hierarchy. | ✅ **Confirmed** | Implement the error hierarchy. `GenerationBlockedError(GenkitError)` enables structured error handling and consistent HTTP status code mapping. | +| 27 | [#4347](https://github.com/firebase/genkit/issues/4347) | **PY** | **Tool arguments not validated against schema** — `TODO` at `tools.py:212`. Models can pass invalid args and the tool receives garbage. | ✅ **Confirmed** | Implement Pydantic validation before dispatching to tool function. Return structured error to model on validation failure (enables retry). | +| 28 | [#4365](https://github.com/firebase/genkit/issues/4365) | **PY** | **MCP tool args not validated against schema** — similar to #4347 but for MCP-sourced tools. | ✅ **Confirmed** | Same fix pattern as #4347. | + +### 11k. Security Issues + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 29 | [#4117](https://github.com/firebase/genkit/issues/4117) | **PY** | **Backend log timestamp leaked into generated text** — internal timestamps appear in model output. If log messages contain secrets (API keys, user data), this is a data leak vector. | 🔍 Investigate | Audit log formatters and verify structured logging (`log_config.py`) never injects into model message assembly. The `web-endpoints-hello` sample's secret masking processor is best practice. | + +### 11l. 
Feature Requests (Production Readiness) + +| # | Issue | SDK | Summary | Python Verdict | Action / Notes | +|---|-------|:---:|---------|:--------------:|----------------| +| 30 | [#1598](https://github.com/firebase/genkit/issues/1598) | JS | **Allow changing API key per-request in `generate()`** — multi-tenant apps need per-customer API keys. Currently must create separate Genkit instances. | 💡 Design | Python should support per-request auth override. Consider `ai.generate(config=ModelConfig(api_key="..."))` or a context-based approach. This is critical for SaaS/multi-tenant deployments. | +| 31 | [#663](https://github.com/firebase/genkit/issues/663) | JS | **Support tool calling for models without native support** — simulate tool calling via prompt injection for Ollama/local models. | 💡 Design | This maps to the missing `simulateConstrainedGeneration` middleware (Gap G3 in §8f). When implemented, it would also cover simulated tool calling. | +| 32 | [#4468](https://github.com/firebase/genkit/issues/4468) | All | **RFC: Agents** — first-class agent support with multi-turn planning, memory, and tool orchestration. | 💡 Track | Monitor RFC progress. Python implementation should follow the same API surface as JS. | +| 33 | [#4467](https://github.com/firebase/genkit/issues/4467) | All | **RFC: Session flows** — stateful multi-turn conversations with persistent context. | 💡 Track | Monitor RFC progress. Python's async-first design is well-suited for session management. | +| 34 | [#4466](https://github.com/firebase/genkit/issues/4466) | All | **RFC: Middleware V2** — redesign of the middleware system for composability and layering. | 💡 Track | Directly addresses Python's single-layer middleware gap (§8b). Wait for RFC to stabilize before implementing. | + +### 11m. 
Priority Matrix — Python Actions from Issue Tracker + +| Priority | Issue(s) | Category | Action | Effort | +|:--------:|----------|----------|--------|:------:| +| **P0** | #4492 | 🐛 Bug | Fix context-only tool crash + dispatch | S | +| **P0** | #4350 | 📐 Schema | Implement malformed JSON handling in `extract.py` | M | +| **P0** | #4347, #4365 | ⚠️ Error | Validate tool args against schema | M | +| **P0** | #4336 | ⚠️ Error | `GenerationBlockedError` → extend `GenkitError` | S | +| **P1** | #4279 analog | 🔌 Plugin | Audit `compat-oai` raw response population | S | +| **P1** | #3851 analog | 🌊 Stream | Audit streaming + tool-calling message accumulation | M | +| **P1** | §8c.3 | 📡 Telemetry | Add `X-Genkit-Span-Id` header to reflection server | S | +| **P1** | #2278 analog | 📡 Telemetry | Add `force_flush()` for serverless environments | S | +| **P2** | #4490 analog | 🔌 Plugin | Verify Ollama plugin allows arbitrary model names | S | +| **P2** | #3651 analog | 🔌 Plugin | Audit Vertex AI `global` location URL construction | S | +| **P2** | #4299 analog | 🔌 Plugin | Audit MCP client init error surfacing | S | +| **P2** | #3810 analog | 📖 DevX | Document plugin capability matrices (structured output, tools, streaming) | M | +| **P2** | #4110 analog | 📐 Schema | Test discriminated unions / `$ref` resolution with Gemini API | M | +| **P2** | #1598 | 💡 Feature | Design per-request API key override | L | +| **P3** | #3839 analog | 🐛 Bug | Audit prompt template mutation safety | S | +| **P3** | #4117 | 🔒 Security | Audit log/model output isolation | S | +| **P3** | RFCs | 💡 Feature | Track Agent, Session, Middleware V2 RFCs | — | + +**Effort**: S = small (< 1 day), M = medium (1–3 days), L = large (3+ days) + +### 11n. 
Summary + +| Metric | Count | +|--------|:-----:| +| Total issues analyzed | 34 | +| ✅ Confirmed in Python | 5 (#4492, #4350, #4347, #4365, #4336 + §8c.3 span header) | +| ⚠️ Likely applicable | 9 | +| 🔍 Needs investigation | 12 | +| 🛡️ Already protected | 1 | +| ➖ Not applicable | 2 | +| 💡 Feature requests to track | 5 | +| **P0 actions (immediate)** | **4 work items** | +| **P1 actions (next sprint)** | **4 work items** | +| **P2 actions (planned)** | **7 work items** | +| **P3 actions (backlog)** | **3 work items** | + +--- + +## 12. Fixability Assessment — "⚠️ Likely" Issues in Python + +> Each of the 9 "⚠️ Likely applicable" issues from §11 was verified against the +> Python SDK source. Below is the code-level verdict and recommended action. + +### 12a. Fixable in Python Code (5 of 9) + +| # | Issue | Category | Code Location | Verdict | Fix | +|---|-------|----------|---------------|---------|-----| +| 5 | [#4279](https://github.com/firebase/genkit/issues/4279) — `compat-oai` raw response empty | 🔌 Plugin | `compat-oai/models/*.py` — no `custom=` field set on `GenerateResponseData` | **Fixable** | Populate `custom` field with the raw API response dict in all compat-oai model response constructors. | +| 7 | [#4110](https://github.com/firebase/genkit/issues/4110) — Schema `$ref` regression | 📐 Schema | `google-genai/models/gemini.py:1090–1119` — `_convert_schema_property()` resolves `$ref` via `$defs` | **Already handled** ✅ but needs test coverage | Add test cases for `Literal` + `Union` discriminated unions, recursive schemas, and deeply nested `$ref`. | +| 8 | [#2758](https://github.com/firebase/genkit/issues/2758) — Pydantic schema pitfalls | 📐 Schema | `google-genai/models/gemini.py` schema conversion | **Fixable** | Write provider-specific schema compat tests for `Optional`, `Field(description=...)`, `Literal`, nested unions. 
| +| 14 | [#2278](https://github.com/firebase/genkit/issues/2278) — Telemetry not exported in serverless | 📡 Telemetry | `genkit/core/trace/` — `force_flush()` exists but not auto-called on exit | **Fixable** | Add `atexit` handler or document `ai.close()` / `force_flush()` requirement for serverless. | +| 22 | CI/CD interactive prompt | 📦 Install | `genkit start` CLI tooling | **Fixable** | Verify `CI=true` suppresses prompts; add `GENKIT_NONINTERACTIVE=1` support if needed. | + +### 12b. Documentation / Audit Only (3 of 9) + +| # | Issue | Category | Verdict | Action | +|---|-------|----------|---------|--------| +| 16 | [#4501](https://github.com/firebase/genkit/issues/4501) — Outdated docs | 📖 DevX | **Docs audit** | Run all README/docstring examples against current SDK; fix failures. | +| 17 | [#3810](https://github.com/firebase/genkit/issues/3810) — Ollama structured output misleading | 📖 DevX | **🛡️ Already protected** — Python Ollama plugin allows arbitrary models via `resolve()` fallback and declares `'output': ['text', 'json'], 'constrained': 'all'` | Document which Ollama models reliably produce JSON mode output. | +| 18 | [#3915](https://github.com/firebase/genkit/issues/3915) — Gemini quota errors | 📖 DevX | **Docs task** | Add quota/rate-limit notes to getting-started samples. | + +### 12c. Already Protected (1 of 9) + +| # | Issue | Category | Verdict | +|---|-------|----------|---------| +| 19 | [#2758](https://github.com/firebase/genkit/issues/2758) (dup) — Schema pitfalls undocumented | 📖 DevX | Same as #8 — code fix is schema testing; doc fix is compatibility matrix. | + +--- + +## 13. Dependency Graph & Reverse Topological Sort Roadmap + +### 13a. Dependency Graph + +Each node is a work item. An arrow A → B means "B depends on A" (A must land first). 
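Because every arrow in the graph is a prerequisite edge, the layer assignment can be recomputed mechanically whenever an item is added or re-scoped. The sketch below applies Kahn's algorithm to an edge list transcribed by hand from the graph (W1 → W4, W4 → W5/W6, W9 → W10/W11/W12); the function name and data layout are illustrative and not part of any Genkit tooling.

```python
from collections import defaultdict

# Prerequisite edges transcribed by hand from the §13a graph, as (before, after).
EDGES = [
    ("W1", "W4"),
    ("W4", "W5"),
    ("W4", "W6"),
    ("W9", "W10"),
    ("W9", "W11"),
    ("W9", "W12"),
]
NODES = [f"W{i}" for i in range(1, 15)]


def layer_work_items(nodes, edges):
    """Group work items into layers using Kahn's algorithm.

    Every item in layer k has all of its prerequisites in layers < k, so the
    items inside one layer can be worked on in parallel.
    """
    order = {node: i for i, node in enumerate(nodes)}
    indegree = {node: 0 for node in nodes}
    dependents = defaultdict(list)
    for before, after in edges:
        dependents[before].append(after)
        indegree[after] += 1

    layers = []
    ready = [node for node in nodes if indegree[node] == 0]
    placed = 0
    while ready:
        # Keep the caller's node order inside each layer for stable output.
        layer = sorted(ready, key=order.__getitem__)
        layers.append(layer)
        placed += len(layer)
        next_ready = []
        for node in layer:
            for dep in dependents[node]:
                indegree[dep] -= 1
                if indegree[dep] == 0:
                    next_ready.append(dep)
        ready = next_ready

    # Nodes on a cycle never reach in-degree 0, so they are never placed.
    if placed != len(nodes):
        raise ValueError("dependency cycle detected")
    return layers
```

With these edges the result is the earliest layer each item can start in: `[W1, W2, W3, W7, W8, W9, W13, W14]`, then `[W4, W10, W11, W12]`, then `[W5, W6]`. The graph's own layer labels also fold in priority, which is why W7–W9 and W13–W14 sit in later layers there. If W11 and W12 are meant to run alongside W9, as the §13c execution order schedules them, dropping those two edges moves them into the first layer.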
+ +``` + ┌───────────────────────────────────────────────────────────┐ + │ DEPENDENCY GRAPH │ + │ (arrows = "must land before") │ + └───────────────────────────────────────────────────────────┘ + + ╔══════════════════════════════════════════════════════════════════╗ + ║ LAYER 0 — No dependencies (all independent, can run parallel) ║ + ╚══════════════════════════════════════════════════════════════════╝ + + ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ + │ W1: Error │ │ W2: Context-only │ │ W3: Malformed │ + │ hierarchy │ │ tool crash │ │ JSON handling │ + │ #4336 + #4346 │ │ #4492 │ │ #4350 │ + │ generate.py │ │ _registry.py │ │ extract.py │ + │ tools.py │ │ │ │ │ + └────────┬─────────┘ └──────────────────┘ └──────────────────┘ + │ + │ (establishes GenkitError base) + ▼ + ╔══════════════════════════════════════════════════════════════════╗ + ║ LAYER 1 — Depends on W1 (error hierarchy) ║ + ╚══════════════════════════════════════════════════════════════════╝ + + ┌──────────────────┐ + │ W4: Tool arg │ + │ validation │ + │ #4347 + #4365 │ + │ tools.py │ + │ (uses GenkitError│ + │ for validation │ + │ errors) │ + └────────┬─────────┘ + │ + │ (validation relies on error types + schema infra) + ▼ + ╔══════════════════════════════════════════════════════════════════╗ + ║ LAYER 2 — Depends on W4 (validation infrastructure) ║ + ╚══════════════════════════════════════════════════════════════════╝ + + ┌──────────────────┐ ┌──────────────────┐ + │ W5: compat-oai │ │ W6: Streaming + │ + │ raw response │ │ tools message │ + │ #4279 analog │ │ accumulation │ + │ compat-oai/*.py │ │ #3851 analog │ + │ │ │ generate.py │ + └──────────────────┘ └──────────────────┘ + + ╔══════════════════════════════════════════════════════════════════╗ + ║ LAYER 2 (parallel) — No core deps, can run alongside W4 ║ + ╚══════════════════════════════════════════════════════════════════╝ + + ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ + │ W7: Span-Id │ │ W8: 
force_flush │ │ W9: Schema │ + │ header │ │ serverless │ │ compat tests │ + │ §8c.3 │ │ #2278 analog │ │ #4110 + #2758 │ + │ reflection API │ │ trace/*.py │ │ google-genai │ + └──────────────────┘ └──────────────────┘ └──────────────────┘ + + ╔══════════════════════════════════════════════════════════════════╗ + ║ LAYER 3 — Depends on W9 (schema compat tests) ║ + ╚══════════════════════════════════════════════════════════════════╝ + + ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ + │ W10: Plugin │ │ W11: Vertex AI │ │ W12: MCP init │ + │ capability docs │ │ global URL │ │ error surfacing │ + │ #3810 analog │ │ #3651 analog │ │ #4299 analog │ + │ README files │ │ vertex-ai plugin │ │ mcp plugin │ + └──────────────────┘ └──────────────────┘ └──────────────────┘ + + ╔══════════════════════════════════════════════════════════════════╗ + ║ LAYER 4 — Feature design (long-term) ║ + ╚══════════════════════════════════════════════════════════════════╝ + + ┌──────────────────┐ ┌──────────────────┐ + │ W13: Per-request │ │ W14: RFC │ + │ API key override │ │ tracking │ + │ #1598 │ │ Agents/Sessions/ │ + │ genkit core │ │ Middleware V2 │ + └──────────────────┘ └──────────────────┘ +``` + +### 13b. File Conflict Matrix + +Work items touching the same file must be ordered or merged into one PR: + +| File | Work Items | Conflict? | Resolution | +|------|:---------:|:---------:|------------| +| `blocks/generate.py` | W1, W6 | ⚠️ Yes | W1 lands first (error class at EOF), then W6 (message accumulation in body) | +| `blocks/tools.py` | W1, W4 | ⚠️ Yes | W1 lands first (`ToolInterruptError` base class), then W4 (validation) | +| `ai/_registry.py` | W2 | — | No conflicts | +| `core/extract.py` | W3 | — | No conflicts | +| `compat-oai/models/*.py` | W5 | — | No conflicts | +| `core/trace/*.py` | W8 | — | No conflicts | +| `google-genai/models/gemini.py` | W9 | — | No conflicts | + +### 13c. 
Reverse Topological Sort — Execution Order
+
+Items are listed in **dependency-safe order** (leaves first). Items at the same
+layer can execute in parallel.
+
+```
+Sprint 1 (P0 — immediate, ~3 days)
+──────────────────────────────────
+  [parallel]
+  ├── PR-A: W1 — Error hierarchy (#4336 + #4346)
+  ├── PR-B: W2 — Context-only tool crash (#4492)
+  └── PR-C: W3 — Malformed JSON handling (#4350)
+
+  [sequential after PR-A]
+  └── PR-D: W4 — Tool arg validation (#4347 + #4365)
+
+Sprint 2 (P1 — next sprint, ~4 days)
+─────────────────────────────────────
+  [parallel]
+  ├── PR-E: W5 — compat-oai raw response (#4279 analog)
+  ├── PR-F: W6 — Streaming + tools message audit (#3851 analog)
+  ├── PR-G: W7 — X-Genkit-Span-Id header (§8c.3)
+  └── PR-H: W8 — force_flush for serverless (#2278 analog)
+
+Sprint 3 (P2 — planned, ~5 days)
+──────────────────────────────────
+  [parallel]
+  ├── PR-I: W9 — Schema compat tests (#4110 + #2758)
+  ├── PR-J: W11 — Vertex AI global URL audit (#3651 analog)
+  └── PR-K: W12 — MCP init error surfacing (#4299 analog)
+
+  [after PR-I]
+  └── PR-L: W10 — Plugin capability docs (#3810 analog)
+
+Sprint 4+ (P3/backlog)
+─────────────────────
+  PR-M: W13 — Per-request API key override (design RFC)
+  PR-N: W14 — Track Agent/Session/Middleware V2 RFCs
+```
+
+### 13d. PR Manifest with Regression Tests
+
+| PR | Branch | Work Items | Files Changed | Regression Tests Required | Commit Message |
+|----|--------|:----------:|:-------------:|---------------------------|----------------|
+| **A** | `yesudeep/fix/error-hierarchy` | W1 | `generate.py`, `tools.py` | `test_generation_response_error_is_genkit_error`, `test_tool_interrupt_error_is_genkit_error`, `test_generation_blocked_error_http_status` | `fix(py/core): make GenerationResponseError and ToolInterruptError extend GenkitError` |
+| **B** | `yesudeep/fix/context-only-tool` | W2 | `_registry.py` | `test_tool_with_only_context_param`, `test_tool_with_context_and_input`, `test_tool_with_no_params`, `test_tool_schema_skips_context_type` | `fix(py/core): handle tools with only ToolRunContext parameter` |
+| **C** | `yesudeep/fix/malformed-json` | W3 | `extract.py` | `test_extract_json_markdown_fences`, `test_extract_json_trailing_comma`, `test_extract_json_bare_string`, `test_parse_partial_json_incomplete`, `test_extract_json_with_code_block` | `fix(py/core): handle malformed JSON in extract.py` |
+| **D** | `yesudeep/fix/tool-validation` | W4 | `tools.py`, `generate.py` | `test_tool_validates_input_schema`, `test_tool_validation_error_message`, `test_mcp_tool_validates_input`, `test_tool_validation_allows_valid_input` | `fix(py/core): validate tool arguments against schema before dispatch` |
+| **E** | `yesudeep/fix/compat-oai-raw` | W5 | `compat-oai/models/*.py` | `test_chat_response_has_raw_data`, `test_image_response_has_raw_data`, `test_audio_response_has_raw_data` | `fix(py/compat-oai): populate custom/raw field on GenerateResponseData` |
+| **F** | `yesudeep/audit/streaming-tools` | W6 | `generate.py` (audit) | `test_streaming_tool_calls_preserve_messages`, `test_streaming_multi_turn_history` | `fix(py/core): preserve message history during streaming tool calls` |
+| **G** | `yesudeep/fix/span-id-header` | W7 | `web/manager/*.py` | `test_reflection_response_has_span_id_header` | `fix(py/core): add X-Genkit-Span-Id header to reflection server` |
+| **H** | `yesudeep/fix/serverless-flush` | W8 | `ai/_aio.py`, `core/trace/*.py` | `test_force_flush_called_on_close`, `test_atexit_handler_registered` | `fix(py/core): ensure trace flush in serverless environments` |
+| **I** | `yesudeep/test/schema-compat` | W9 | `tests/` (new test files) | `test_discriminated_union_schema`, `test_recursive_schema_ref`, `test_optional_field_schema`, `test_literal_field_schema`, `test_nested_ref_resolution` | `test(py/google-genai): add schema compatibility tests for Pydantic edge cases` |
+| **J** | `yesudeep/audit/vertex-global-url` | W11 | `vertex-ai/` (audit) | `test_global_location_url_construction` | `fix(py/vertex-ai): audit global location URL construction` |
+| **K** | `yesudeep/fix/mcp-init-errors` | W12 | `mcp/` plugin | `test_mcp_init_bad_url_raises`, `test_mcp_init_connection_refused_raises` | `fix(py/mcp): surface initialization errors immediately` |
+| **L** | `yesudeep/docs/plugin-capabilities` | W10 | `README.md` files | — (docs only) | `docs(py/plugins): add capability matrices for structured output, tools, streaming` |
+
+### 13e. Regression Test Specifications
+
+Each test below targets a specific bug to prevent regressions.
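The PR-C specs further down describe the behaviour #4350 asks for: pull JSON out of markdown fences and surrounding prose, and tolerate trailing commas. As a reference point for those assertions, here is a minimal standard-library sketch of that fallback chain. `extract_json_lenient` is a hypothetical helper, not the current `extract_json`, and its trailing-comma repair is a naive regex rather than the json5 parsing the spec mentions.

```python
import json
import re
from typing import Any

# A fenced block such as ```json ... ```; the language tag is optional and the
# body (group 1) may span several lines thanks to re.DOTALL.
_FENCE_RE = re.compile(r"```[a-zA-Z0-9_-]*\s*\n(.*?)\n?```", re.DOTALL)

# A comma followed only by whitespace and a closing brace or bracket.
# Naive: it would also rewrite ",}" inside a string literal, so it is only
# tried after a strict parse has already failed.
_TRAILING_COMMA_RE = re.compile(r",(\s*[}\]])")


def extract_json_lenient(text: str) -> Any:
    """Hypothetical lenient extractor for model output (not the Genkit API)."""
    fenced = _FENCE_RE.search(text)
    candidate = fenced.group(1) if fenced else text

    # Skip any prose before the first object or array; otherwise try the whole
    # (stripped) text so bare scalars such as '"hello"' still parse.
    starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
    candidate = candidate[min(starts):] if starts else candidate.strip()

    decoder = json.JSONDecoder()
    for attempt in (candidate, _TRAILING_COMMA_RE.sub(r"\1", candidate)):
        try:
            # raw_decode stops at the end of the first JSON value, so trailing
            # prose such as "Done." is ignored.
            value, _end = decoder.raw_decode(attempt)
            return value
        except json.JSONDecodeError:
            continue
    raise ValueError("no parseable JSON found in model output")
```

For example, `extract_json_lenient('Here is the result:\n```json\n{"name": "test"}\n```\nDone.')` returns `{"name": "test"}`. Trying a strict parse first and repairing only on failure keeps valid JSON untouched, which matters because the regex repair cannot tell a real trailing comma from one inside a string.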
+
+#### PR-A: Error Hierarchy Tests
+
+```python
+# tests/genkit/blocks/generate_error_test.py
+# Import paths are assumptions based on the file layout in §13b.
+from genkit.blocks.generate import GenerationResponseError
+from genkit.core.error import GenkitError
+
+def test_generation_response_error_is_genkit_error():
+    """GenerationResponseError must be a subclass of GenkitError."""
+    assert issubclass(GenerationResponseError, GenkitError)
+
+def test_generation_response_error_has_status(mock_response):
+    """GenerationResponseError must have a status field for HTTP mapping."""
+    # `mock_response` is an assumed fixture yielding a stub GenerateResponse.
+    err = GenerationResponseError(response=mock_response, message="blocked",
+                                  status="FAILED_PRECONDITION", details={})
+    assert err.status == "FAILED_PRECONDITION"
+
+# tests/genkit/blocks/tools_error_test.py
+from genkit.blocks.tools import ToolInterruptError
+from genkit.core.error import GenkitError
+
+def test_tool_interrupt_error_is_genkit_error():
+    """ToolInterruptError must be a subclass of GenkitError."""
+    assert issubclass(ToolInterruptError, GenkitError)
+```
+
+#### PR-B: Context-Only Tool Tests
+
+```python
+# tests/genkit/ai/tool_context_test.py
+# `ai` is an assumed pytest fixture (e.g. in conftest.py) returning a fresh Genkit
+# instance. Import paths are assumptions based on the current module layout.
+from genkit.blocks.tools import ToolRunContext
+from genkit.core.action.types import ActionKind
+
+def test_tool_with_only_context_param(ai):
+    """A tool with only ToolRunContext must not crash at registration."""
+    @ai.tool()
+    def my_tool(ctx: ToolRunContext) -> str:
+        return "ok"
+    # Should not raise PydanticSchemaGenerationError
+    assert my_tool is not None
+
+def test_tool_with_no_params(ai):
+    """A tool with no params must register and execute."""
+    @ai.tool()
+    def no_params_tool() -> str:
+        return "hello"
+    assert no_params_tool is not None
+
+def test_tool_schema_skips_context_type(ai):
+    """Schema generation must skip ToolRunContext, not try to build schema for it."""
+    @ai.tool()
+    def ctx_tool(ctx: ToolRunContext) -> str:
+        return "ok"
+    action = ai.registry.lookup_action(ActionKind.TOOL, "ctx_tool")
+    assert action.input_schema is None or "ToolRunContext" not in str(action.input_schema)
+```
+
+#### PR-C: Malformed JSON Tests
+
+```python
+# tests/genkit/core/extract_malformed_test.py
+from genkit.core.extract import extract_json
+
+def test_extract_json_markdown_fences():
+    """JSON wrapped in ```json ... ``` fences must be extracted."""
+    text = '```json\n{"key": "value"}\n```'
+    assert extract_json(text) == {"key": "value"}
+
+def test_extract_json_with_code_block():
+    """JSON inside a markdown code block with extra text must be extracted."""
+    text = 'Here is the result:\n```json\n{"name": "test"}\n```\nDone.'
+    assert extract_json(text) == {"name": "test"}
+
+def test_extract_json_trailing_comma():
+    """JSON with trailing comma must be parsed (json5 handles this)."""
+    text = '{"key": "value",}'
+    result = extract_json(text)
+    assert result == {"key": "value"}
+```
+
+#### PR-D: Tool Validation Tests
+
+```python
+# tests/genkit/blocks/tool_validation_test.py
+# Uses the same assumed `ai` fixture as the PR-B specs.
+import pytest
+from pydantic import BaseModel
+
+from genkit.core.error import GenkitError
+
+class MyModel(BaseModel):
+    name: str
+
+@pytest.mark.asyncio
+async def test_tool_validates_input_schema(ai):
+    """Invalid tool arguments must raise a validation error, not crash the tool."""
+    @ai.tool()
+    def typed_tool(input: MyModel) -> str:
+        return input.name
+    # Passing invalid input should raise structured error.
+    # Assumes the decorated tool exposes its registered Action as `.action`.
+    with pytest.raises(GenkitError) as exc_info:
+        await typed_tool.action.arun({"invalid_field": 123})
+    assert "validation" in str(exc_info.value).lower()
+
+@pytest.mark.asyncio
+async def test_tool_validation_allows_valid_input(ai):
+    """Valid tool arguments must pass validation and execute normally."""
+    @ai.tool()
+    def typed_tool(input: MyModel) -> str:
+        return input.name
+    result = await typed_tool.action.arun({"name": "test"})
+    assert result.response == "test"
+```
+
+---
+
+## 14. Model Conformance Roadmap
+
+> Source: Cross-runtime model conformance testing framework from KI
+> `genkit_model_conformance`. The Python SDK follows a phased approach to ensure
+> all model provider plugins exhibit identical behavior to the JS canonical
+> implementation.
+
+### 14a. Architecture
+
+```
+        py/bin/test-model-conformance
+                     |
+                     v
+  genkit dev:test-model --from-file spec.yaml
+                     |
+             discovers runtime
+                     |
+                     v
+          Reflection Server (:3100)
+                     |
+               /api/runAction
+                     |
+                     v
+     Plugin: GoogleAI / Anthropic / etc.
+                     ^
+                     |
+            conformance_entry.py
+```
+
+### 14b.
Phased Execution Plan
+
+| Phase | Target | Status | Key Tasks |
+|:-----:|--------|:------:|-----------|
+| **0** | Foundations | ✅ Done | Imagen support under `googleai/` prefix; directory tree setup |
+| **1** | Specs & Entry Points | ✅ Done | Symlink JS specs; create `conformance_entry.py` per plugin; YAML specs for anthropic/compat-oai |
+| **2** | Orchestration | ✅ Done | `py/bin/test-model-conformance` script; `uv run --project` integration |
+| **3** | Validation | ✅ Done | Discovery across 11 providers verified; multimodal parity (PR #4477) |
+| **4** | Remaining Gaps | 📋 Planned | xAI image gen, MS Foundry multimodal, Ollama metadata, final google-genai pass |
+
+### 14c. Plugin Parity Matrix
+
+| Plugin | JS Name | Python Name | Parity | Key Gap |
+|--------|---------|-------------|:------:|---------|
+| **Anthropic** | `@genkit-ai/anthropic` | `genkit-plugin-anthropic` | ✅ Full + superset | `output_config.effort` minor |
+| **Google GenAI** | `@genkit-ai/google-genai` | `genkit-plugin-google-genai` | ✅ Full | — |
+| **Vertex AI** | `@genkit-ai/vertexai` | `genkit-plugin-vertex-ai` | ✅ Full | — |
+| **OpenAI** | `@genkit-ai/compat-oai/openai` | `genkit-plugin-compat-oai` | ⚠️ Minor | Embeddings, GPT-5 refs, `gpt-image-1` ext config |
+| **xAI** | `@genkit-ai/compat-oai/xai` | `genkit-plugin-xai` | ⚠️ Medium | `grok-2-image-1212`, `deferred`, `webSearchOptions`, `reasoningEffort` |
+| **DeepSeek** | `@genkit-ai/compat-oai/deepseek` | `genkit-plugin-deepseek` | ✅ Superset | Python has V3, R1 |
+| **Ollama** | `@genkit-ai/ollama` | `genkit-plugin-ollama` | ⚠️ Metadata | Missing `media`, `toolChoice` flags |
+| **Amazon Bedrock** | External | `genkit-plugin-amazon-bedrock` | 🟢 Superset | — |
+| **Microsoft Foundry** | External | `genkit-plugin-microsoft-foundry` | ⚠️ Missing | DALL-E, TTS, Whisper not ported |
+| **Mistral** | N/A | `genkit-plugin-mistral` | 🟢 Python-only | — |
+| **Hugging Face** | N/A | `genkit-plugin-huggingface` | 🟢 Python-only | — |
+| **Cloudflare** | N/A | `genkit-plugin-cloudflare-workers-ai` | 🟢 Python-only | — |
+| **Cohere** | N/A | `genkit-plugin-cohere` | 🟢 Python-only | — |
+
+### 14d. Conformance Priority Actions
+
+| Priority | Action | Plugin | Effort |
+|:--------:|--------|--------|:------:|
+| P1 | Add `media` and `toolChoice` metadata flags | Ollama | S |
+| P1 | Add embeddings support | compat-oai | M |
+| P2 | Add `grok-2-image-1212` image generation | xAI | M |
+| P2 | Add `gpt-image-1` extended config | compat-oai | S |
+| P2 | Add `deferred`, `webSearchOptions`, `reasoningEffort` | xAI | S |
+| P3 | Add DALL-E/TTS/Whisper | Microsoft Foundry | M |
+| P3 | Add GPT-5 model refs | compat-oai | S |
+| P4 | Add `output_config.effort` for opus-4-5 | Anthropic | S |
+
+### 14e. Sample Coverage Audit
+
+| Sample | Basic | Stream | Tools | Struct | Vision | Embed | Code | Reason | TTS/STT | Cache | PDF | RAG |
+|--------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| **amazon-bedrock** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
+| **anthropic** | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |
+| **cloudflare** | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| **compat-oai** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
+| **deepseek** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
+| **google-genai** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
+| **huggingface** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| **ms-foundry** | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| **mistral** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
+| **ollama** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
+| **xai** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
+
+### 14f. JS-Only Plugins Not Yet in Python
+
+| Plugin | Purpose | Python Priority |
+|--------|---------|:---------------:|
+| **Chroma** | Vector store (ChromaDB) | Medium |
+| **Pinecone** | Vector store (Pinecone) | Medium |
+| **Cloud SQL PG** | Vector store (PostgreSQL) | Low |
+| **LangChain** | LangChain integration | Low |
+| **Checks** | Safety/content evaluation | ✅ Merged (#4504) |
+
+### 14g. Conformance PR Mapping
+
+| Phase | PR | Description | Status |
+|:-----:|---:|-------------|:------:|
+| P0 | #4472 | Imagen support under `googleai/` prefix | ✅ Merged |
+| P0 | #4474 | Model conformance testing plan | ✅ Merged |
+| P0/P1/P2 | #4473 | Conformance test infrastructure | ✅ Merged |
+| P2+ | #4476 | Specs for remaining 8 providers | ✅ Merged |
+| P3 | #4477 | compat-oai multimodal (image, TTS, STT) | ✅ Merged |
+| Core | #4401 | Reflection API v2 (WebSocket + JSON-RPC 2.0) | 🔄 Active |
+| P4 | — | xAI image generation | 📋 Planned |
+| P4 | — | Microsoft Foundry multimodal | 📋 Planned |
+| P4 | — | Ollama metadata parity | 📋 Planned |
+
+---
+
+## 15. Combined Roadmap — All Streams
+
+> This section unifies the parity gaps (§7–10), issue tracker fixes (§11–13),
+> and model conformance work (§14) into a single prioritized roadmap.
+
+### 15a.
Sprint Plan
+
+| Sprint | Timeline | Work Items | PRs | Dependencies |
+|:------:|:--------:|:-----------|:---:|:------------:|
+| **S1** | Week 1 | W1 (error hierarchy), W2 (context-only tool), W3 (malformed JSON) | A, B, C | None |
+| **S1** | Week 1 | W4 (tool validation) — after PR-A lands | D | A |
+| **S2** | Week 2 | W5 (compat-oai raw), W6 (streaming audit), W7 (span-id), W8 (force_flush) | E, F, G, H | None |
+| **S2** | Week 2 | Ollama metadata flags (conformance P1) | — | None |
+| **S3** | Week 3 | W9 (schema compat tests), W11 (vertex URL), W12 (MCP init errors) | I, J, K | None |
+| **S3** | Week 3 | W10 (plugin capability docs) — after PR-I | L | I |
+| **S3** | Week 3 | compat-oai embeddings (conformance P1) | — | None |
+| **S4+** | Week 4+ | Per-request API key design, xAI image gen, MS Foundry multimodal | M, — | RFC |
+| **S4+** | Week 4+ | Track Agent/Session/Middleware V2 RFCs | N | External |
+
+### 15b. PR Status (as of 2026-02-11)
+
+#### Recently Merged (since 2026-02-09)
+
+| PR | Title | Merged | Relates To |
+|---:|-------|:------:|:----------:|
+| #4519 | fix(py/core): `arun_raw` None input validation | 2026-02-09 | OSS compliance |
+| #4522 | docs(py): architecture diagrams, concepts table | 2026-02-09 | Documentation |
+| #4524 | fix(py): CI license check failures, lint | 2026-02-09 | Tooling |
+| #4504 | feat(py/checks): Google Checks AI Safety plugin | 2026-02-09 | Plugin — Checks |
+| #4541 | fix(py): uv.lock out of sync | 2026-02-10 | Workspace |
+| #4544 | docs(py): release roadmap and orchestration | 2026-02-10 | Release tooling |
+| #4547 | fix(py/samples): endpoints sample resilience | 2026-02-10 | Sample — web-endpoints |
+| #4548 | feat(py/tools): releasekit — release orchestration | 2026-02-10 | Release tooling |
+| #4550 | feat(py/tools): releasekit phase 1 — workspace + graph | 2026-02-10 | Release tooling |
+| #4555 | feat(py/tools): releasekit phase 2 — versioning, bump, pin | 2026-02-10 | Release tooling |
+| #4556 | feat(releasekit): phase 3 publish MVP | 2026-02-10 | Release tooling |
+| #4558 | feat(releasekit): phase 4 Rich Live progress table | 2026-02-10 | Release tooling |
+| #4561 | fix(py/plugins/flask): remove cyclical dependency | 2026-02-11 | Plugin — Flask |
+| #4563 | feat(releasekit): comprehensive check command | 2026-02-11 | Release tooling |
+| #4564 | feat(releasekit): checksum verification + preflight | 2026-02-11 | Release tooling |
+| #4565 | feat(releasekit): dependency-triggered scheduler | 2026-02-11 | Release tooling |
+| #4569 | feat(releasekit): dynamic scheduler add/remove | 2026-02-11 | Release tooling |
+| #4570 | feat(releasekit): tags, changelog, release notes | 2026-02-11 | Release tooling |
+| #4571 | fix(py): add missing LICENSE to samples | 2026-02-11 | OSS compliance |
+| #4572 | feat(releasekit): Phase 6 UX polish | 2026-02-11 | Release tooling |
+| #4574 | feat(releasekit): async refactoring + test suite | 2026-02-11 | Release tooling |
+| #4575 | docs(releasekit): adopt release-please model | 2026-02-11 | Release tooling |
+| #4577 | feat(releasekit): Forge protocol, transitive propagation | 2026-02-11 | Release tooling |
+
+#### Closed (Superseded)
+
+| PR | Title | Status | Notes |
+|---:|-------|:------:|:------|
+| #4510 | feat(py): model middleware parity | ❌ Closed | Superseded by new approach |
+| #4516 | feat(py): model-level middleware support | ❌ Closed | Superseded |
+| #4521 | feat(py/core): api_key() context provider | ❌ Closed | Superseded |
+
+#### Currently Open
+
+| PR | Title | Status | Relates To |
+|---:|-------|:------:|:----------:|
+| #4401 | feat(py): Reflection API v2 (WebSocket + JSON-RPC) | 🔄 Active | Conformance core |
+| #4512 | feat(py/genkit): Genkit constructor parity | 🔄 Open | §14e samples |
+| #4513 | feat(py/genkit): multipart tool support | 🔄 Open | Gap G18 |
+| #4517 | docs(py): PARITY_AUDIT.md update | 🔄 Open | This document |
+| #4538 | fix(py/ai): dotprompt input.default for DevUI | 🔄 Open | Dotprompt |
+| #4549 | fix(py/core): guard RealtimeSpanProcessor export | 🔄 Open | Telemetry |
+| #4578 | fix(js): duplicate sample project names | 🔄 Open | Cross-SDK |
+| #4584 | fix(py/genkit): framework classifiers, Changelog URL | 🔄 Open | Release prep |
+| #4585 | docs(releasekit): README, roadmap, CHANGELOG | 🔄 Open | Release tooling docs |
+| #4586 | ci(releasekit): migrate publish_python.yml | 🔄 Open | CI automation |
+| #4587 | feat(releasekit): log view keyboard shortcut | 🔄 Open | Release tooling UX |
+
+### 15c. Summary Metrics
+
+| Metric | Value |
+|--------|:-----:|
+| Total work items (issue tracker) | 14 (W1–W14) |
+| Total work items (conformance) | 8 (P1–P4) |
+| Total work items (parity gaps §7) | 30 (G1–G37) |
+| **Combined unique actions** | **~45** |
+| PRs merged (total since §15 inception) | **31** |
+| PRs currently open | **11** |
+| PRs closed (superseded) | **3** |
+| PRs in Sprint 1 (P0) | 4 (A, B, C, D) |
+| PRs in Sprint 2 (P1) | 4 (E, F, G, H) |
+| PRs in Sprint 3 (P2) | 4 (I, J, K, L) |
+| Estimated weeks to P0 closure | 1 week |
+| Estimated weeks to P1 closure | 2 weeks |
+| Estimated weeks to P2 closure | 3 weeks |
+| Regression tests required | ~35 new test functions across 12 PRs |
+| **New: releasekit (release tooling)** | **14 PRs merged, 3 PRs open** |
+
+---
+
+## 16. Sample Flow Test Plan — Optimal Error Detection Order
+
+> **Goal**: Execute sample flows in an order that maximizes early bug detection.
+> The strategy: exercise **core framework features first** (where bugs affect
+> all providers), then test **the cheapest provider** (Google GenAI free tier),
+> then progressively test more specialized providers.
+
+### 16a.
Execution Order Rationale + +``` + ┌─────────────────────────────────────────────────────┐ + │ ERROR DETECTION PRIORITY PYRAMID │ + │ │ + │ Layer 1 (Core Framework) ← Bugs here affect │ + │ ┌─────────────────────┐ ALL providers │ + │ │ Tools, Streaming, │ │ + │ │ Structured Output, │ Test FIRST │ + │ │ Interrupts, Formats │ │ + │ └─────────────────────┘ │ + │ │ + │ Layer 2 (Cheapest Provider) ← Free tier = fast, │ + │ ┌─────────────────────┐ cheap validation │ + │ │ Google GenAI │ │ + │ │ (Gemini free tier) │ Test SECOND │ + │ └─────────────────────┘ │ + │ │ + │ Layer 3 (Multi-Provider) ← Same features, │ + │ ┌──────────────────────────┐ different plugins │ + │ │ Anthropic, OpenAI, Ollama,│ │ + │ │ Mistral, DeepSeek, xAI │ Test THIRD │ + │ └──────────────────────────┘ │ + │ │ + │ Layer 4 (Specialized) ← Unique features │ + │ ┌──────────────────────────┐ │ + │ │ Vertex AI, Bedrock, Cloud │ │ + │ │ infra, evals, RAG, media │ Test FOURTH │ + │ └──────────────────────────┘ │ + │ │ + │ Layer 5 (Web Infra) ← Deployment, not │ + │ ┌──────────────────────────┐ model logic │ + │ │ Flask, ASGI, multi-server,│ │ + │ │ gRPC endpoints │ Test LAST │ + │ └──────────────────────────┘ │ + └─────────────────────────────────────────────────────┘ +``` + +### 16b. Ordered Test Execution Plan + +Each row below is a sample to test. Column "Features Exercised" lists the +core Genkit capabilities each sample validates. The order is designed so that +the **first failure** reveals the **most impactful bug**. + +**Usage**: `py/bin/test_sample_flows ` or `py/bin/run_sample ` + +--- + +#### Phase 1: Core Framework (no external API keys needed for some) + +These samples exercise core Genkit framework features. A bug here affects +every downstream provider. 
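Because this plan relies on the first failure being the most informative one, it pairs naturally with a fail-fast driver that stops at the first failing sample. The sketch below is a minimal POSIX `sh` helper, not part of `py/bin`. `run_fail_fast`, `RUNNER`, and `PHASE1_SAMPLES` are hypothetical names, and the sketch assumes that `py/bin/test_sample_flows <sample>` exits non-zero when a flow fails.

```bash
# Hypothetical helper (not part of py/bin): run samples in plan order and stop
# at the first failure. Assumes the runner exits non-zero when a flow fails.
RUNNER="${RUNNER:-py/bin/test_sample_flows}"

# The nine Phase 1 samples, in table order (whitespace-separated for POSIX sh).
PHASE1_SAMPLES="framework-tool-interrupts framework-context-demo
framework-dynamic-tools-demo framework-format-demo framework-prompt-demo
framework-middleware-demo framework-realtime-tracing-demo
framework-restaurant-demo framework-evaluator-demo"

# run_fail_fast SAMPLE...: run each sample in order. On the first failure,
# report it on stderr and return 1 without running the remaining samples.
run_fail_fast() {
  for s in "$@"; do
    echo "==> $s"
    if ! "$RUNNER" "$s"; then
      echo "FIRST FAILURE: $s" >&2
      return 1
    fi
  done
  echo "All $# samples passed."
}
```

After sourcing the helper, run `run_fail_fast $PHASE1_SAMPLES`. The variable is left unquoted on purpose so that each sample becomes its own argument. Stopping at the first failure gives up a complete failure list in exchange for speed, which fits this plan: a Phase 1 bug is expected to show up again in every later phase.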
+
+| # | Sample | Env Vars | Flows | Tools | Features Exercised |
+|:-:|--------|----------|:-----:|:-----:|:-------------------|
+| 1 | `framework-tool-interrupts` | `GEMINI_API_KEY` | 1 | 1 | **Tool interrupts** (human-in-the-loop), `ctx.interrupt()`, `tool_response()`, resume flow — directly validates W1 (error hierarchy) and W4 (tool validation) |
+| 2 | `framework-context-demo` | `GEMINI_API_KEY` | 4 | 3 | **Context providers**, auth propagation, `ToolRunContext` usage — directly validates W2 (context-only tool crash) |
+| 3 | `framework-dynamic-tools-demo` | `GEMINI_API_KEY` | 3 | 2 | **Dynamic tool registration**, DAP action discovery — validates registry internals |
+| 4 | `framework-format-demo` | `GEMINI_API_KEY` | ~5 | 0 | **Output formats** (JSON, text, custom), structured output, format injection — validates W3 (malformed JSON) |
+| 5 | `framework-prompt-demo` | `GEMINI_API_KEY` | ~3 | 0 | **Dotprompt** templates, system prompts, prompt files — validates prompt parsing |
+| 6 | `framework-middleware-demo` | `GEMINI_API_KEY` | ~3 | 0 | **Action middleware**, model middleware, context middleware — validates middleware chain |
+| 7 | `framework-realtime-tracing-demo` | `GEMINI_API_KEY` | ~3 | 0 | **OpenTelemetry** traces, spans, real-time trace streaming — validates W7 (span-id) and W8 (force_flush) |
+| 8 | `framework-restaurant-demo` | `GEMINI_API_KEY` | ~3 | 0 | **Sessions**, multi-turn chat, state management — validates session/chat infrastructure |
+| 9 | `framework-evaluator-demo` | `GEMINI_API_KEY` | N/A | N/A | **Evaluators**, custom scorers — validates evaluation infrastructure |
+
+#### Phase 2: Google GenAI (free tier — cheapest to test)
+
+This phase gives the highest flow coverage at zero cost, and Google GenAI is
+the primary provider for the Python SDK.
+
+| # | Sample | Env Vars | Flows | Tools | Features Exercised |
+|:-:|--------|----------|:-----:|:-----:|:-------------------|
+| 10 | `provider-google-genai-hello` | `GEMINI_API_KEY` | 24 | 7 | **Complete feature set**: basic, streaming, tools, structured output, vision, embeddings, code gen, multi-turn, system prompt, temperature config — exercises the most code paths |
+| 11 | `provider-google-genai-code-execution` | `GEMINI_API_KEY` | ~2 | 0 | **Code execution** sandbox — exercises config forwarding |
+| 12 | `provider-google-genai-context-caching` | `GEMINI_API_KEY` | ~2 | 0 | **Context caching** — exercises cache config and token optimization |
+| 13 | `provider-google-genai-media-models-demo` | `GEMINI_API_KEY` | 13 | 1 | **Imagen + Veo** image/video generation — exercises multimodal output |
+| 14 | `provider-google-genai-vertexai-hello` | `GOOGLE_CLOUD_PROJECT` | 15 | 3 | **Vertex AI** variant — same features but with Vertex credentials |
+| 15 | `provider-google-genai-vertexai-image` | `GOOGLE_CLOUD_PROJECT` | 1 | 0 | **Vertex AI Imagen** — specialized image generation |
+
+#### Phase 3: Multi-Provider (validate cross-provider parity)
+
+Each provider should behave identically for basic/streaming/tools/structured.
+A failure here that doesn't appear in Phase 2 isolates a **plugin-specific bug**.
+
+| # | Sample | Env Vars | Flows | Tools | Features Exercised | Unique Tests |
+|:-:|--------|----------|:-----:|:-----:|:-------------------|:-------------|
+| 16 | `provider-ollama-hello` | (local Ollama) | 14 | 1 | Basic, stream, tools, struct, vision, embed, RAG | **RAG flow** (unique to Ollama), local-only model, arbitrary model resolution |
+| 17 | `provider-anthropic-hello` | `ANTHROPIC_API_KEY` | 15 | 1 | Basic, stream, tools, struct, vision, code, reasoning | **Prompt caching**, PDF support, extended thinking |
+| 18 | `provider-compat-oai-hello` | `OPENAI_API_KEY` | 19 | 3 | Basic, stream, tools, struct, code, **TTS/STT** | **Audio** (TTS, STT), image generation — validates W5 (raw response) |
+| 19 | `provider-deepseek-hello` | `DEEPSEEK_API_KEY` | 12 | 1 | Basic, stream, tools, struct, code, reasoning | **Deep reasoning** (V3/R1) |
+| 20 | `provider-mistral-hello` | `MISTRAL_API_KEY` | 18 | 1 | Basic, stream, tools, struct, vision, embed, code, reasoning | **Mistral-specific** `codestral` model |
+| 21 | `provider-xai-hello` | `XAI_API_KEY` | 13 | 0 | Basic, stream, tools, struct, code | Grok models, native gRPC SDK |
+| 22 | `provider-huggingface-hello` | `HF_TOKEN` | 15 | 1 | Basic, stream, tools, struct, code | **HF Inference API**, multiple model architectures |
+| 23 | `provider-microsoft-foundry-hello` | `AZURE_OPENAI_*` | 13 | 1 | Basic, stream, tools, vision, code | **Azure endpoints** — validates W12 (MCP/init errors) |
+| 24 | `provider-cohere-hello` | `COHERE_API_KEY` | 15 | 1 | Basic, stream, tools, struct, code | **Cohere** rerank, embeddings (if present) |
+| 25 | `provider-cloudflare-workers-ai-hello` | `CLOUDFLARE_*` | ~5 | 0 | Stream, tools, vision, embed, code | **Cloudflare Workers AI** — edge inference |
+
+#### Phase 4: Specialized Infrastructure
+
+These samples test provider-specific infrastructure (vector search, evals, RAG).
+
+| # | Sample | Env Vars | Flows | Features Exercised |
+|:-:|--------|----------|:-----:|:-------------------|
+| 26 | `dev-local-vectorstore-hello` | `GOOGLE_CLOUD_PROJECT` | 2 | **Local vector store**, document indexing, retrieval |
+| 27 | `provider-vertex-ai-model-garden` | `GOOGLE_CLOUD_PROJECT` | 11 | **Model Garden** (Llama, Claude on Vertex), cross-model tool calling |
+| 28 | `provider-vertex-ai-rerank-eval` | `GOOGLE_CLOUD_PROJECT` | 7 | **Reranking**, evaluation flows, quality scoring |
+| 29 | `provider-vertex-ai-vector-search-firestore` | `GOOGLE_CLOUD_PROJECT` | 1 | **Firestore vector search** integration |
+| 30 | `provider-vertex-ai-vector-search-bigquery` | `GOOGLE_CLOUD_PROJECT` | 2 | **BigQuery vector search** integration |
+| 31 | `provider-firestore-retriever` | `GOOGLE_CLOUD_PROJECT` | ~2 | **Firestore retriever** plugin |
+| 32 | `provider-observability-hello` | `GEMINI_API_KEY` | 1 | **Custom observability** plugin |
+
+#### Phase 5: Web Framework Integration
+
+These samples test deployment infrastructure, not model logic. Bugs here are
+isolated to the serving layer.
+
+| # | Sample | Env Vars | Flows | Features Exercised |
+|:-:|--------|----------|:-----:|:-------------------|
+| 33 | `web-flask-hello` | `GEMINI_API_KEY` | 1 | **Flask** integration, context providers, `genkit_flask_handler` |
+| 34 | `web-short-n-long` | `GEMINI_API_KEY` | 14 | **ASGI deployment** (`create_flows_asgi_app`), tools, interrupts, embeddings, image gen, system prompts, multi-turn, streaming |
+| 35 | `web-endpoints-hello` | `GEMINI_API_KEY` | 8 | **Production ASGI** (FastAPI/Litestar/Quart), gRPC, rate limiting, circuit breaker, security headers, caching |
+| 36 | `web-multi-server` | `GEMINI_API_KEY` | 1 | **Multi-server** architecture, `ServerManager`, multiple ASGI apps |
+
+### 16c. Feature Coverage Matrix by Phase
+
+| Feature | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Phase 5 |
+|---------|:-------:|:-------:|:-------:|:-------:|:-------:|
+| `@ai.flow()` basic | ✅ | ✅ | ✅ | ✅ | ✅ |
+| `@ai.tool()` basic | ✅ | ✅ | ✅ | — | ✅ |
+| Streaming | ✅ | ✅ | ✅ | — | ✅ |
+| Structured output | ✅ | ✅ | ✅ | — | ✅ |
+| Tool interrupts | ✅ | — | — | — | ✅ |
+| `ToolRunContext` | ✅ | — | — | — | — |
+| Context providers | ✅ | — | — | — | ✅ |
+| Dynamic tools (DAP) | ✅ | — | — | — | — |
+| Dotprompt | ✅ | — | — | — | — |
+| Middleware | ✅ | — | — | — | — |
+| OpenTelemetry | ✅ | — | — | — | ✅ |
+| Sessions | ✅ | — | — | — | — |
+| Evaluators | ✅ | — | — | ✅ | — |
+| Vision/multimodal | — | ✅ | ✅ | — | — |
+| Embeddings | — | ✅ | ✅ | ✅ | ✅ |
+| Code execution | — | ✅ | ✅ | — | — |
+| TTS/STT audio | — | — | ✅ | — | — |
+| Image generation | — | ✅ | ✅ | — | ✅ |
+| RAG/retrieval | — | — | ✅ | ✅ | — |
+| Reranking | — | — | — | ✅ | — |
+| Vector search | — | — | — | ✅ | — |
+| Multi-turn chat | ✅ | ✅ | — | — | ✅ |
+| System prompts | ✅ | ✅ | — | — | ✅ |
+| ASGI deployment | — | — | — | — | ✅ |
+| Flask deployment | — | — | — | — | ✅ |
+| gRPC endpoints | — | — | — | — | ✅ |
+| Rate limiting | — | — | — | — | ✅ |
+| Circuit breaker | — | — | — | — | ✅ |
+
+### 16d. Quick-Start Commands
+
+```bash
+# Run all of Phase 1 (core framework) — free-tier Gemini key only, fastest
+for s in framework-tool-interrupts framework-context-demo \
+    framework-dynamic-tools-demo framework-format-demo \
+    framework-prompt-demo framework-middleware-demo \
+    framework-realtime-tracing-demo framework-restaurant-demo \
+    framework-evaluator-demo; do
+  py/bin/test_sample_flows "$s"
+done
+
+# Run a Phase 2 subset (Google GenAI, GEMINI_API_KEY) — free tier
+for s in provider-google-genai-hello \
+    provider-google-genai-code-execution \
+    provider-google-genai-media-models-demo; do
+  py/bin/test_sample_flows "$s"
+done
+
+# Full regression: with no argument, pick samples from any phase via fzf
+py/bin/test_sample_flows
+```
+
+### 16e. Expected Bug Detection by Phase
+
+| Phase | Estimated Bug Yield | Bugs Caught |
+|:-----:|:-------------------:|-------------|
+| **1** | ~60% of total | W1 (error hierarchy), W2 (context-only tool), W3 (malformed JSON), W4 (tool validation), W7 (span-id), W8 (force_flush), session bugs, middleware bugs |
+| **2** | ~15% of total | Schema regression (W9), config forwarding, multimodal output, generation request construction |
+| **3** | ~15% of total | Plugin-specific: W5 (compat-oai raw response), provider schema handling, streaming parity, tool name escaping |
+| **4** | ~5% of total | Vector search, retrieval, reranking, eval infrastructure |
+| **5** | ~5% of total | ASGI/Flask serving, security middleware, gRPC, rate limiting |
+
+### 16f. Environment Variable Quick Reference
+
+| Env Var | Used By | How to Get |
+|---------|---------|------------|
+| `GEMINI_API_KEY` | All `framework-*`, `provider-google-genai-*` (except `-vertexai-*`), `provider-observability-hello`, all `web-*` | [Google AI Studio](https://aistudio.google.com/apikey) (free) |
+| `GOOGLE_CLOUD_PROJECT` | `provider-google-genai-vertexai-*`, `provider-vertex-ai-*`, `dev-local-*`, `provider-firestore-*` | [Google Cloud Console](https://console.cloud.google.com) |
+| `ANTHROPIC_API_KEY` | `provider-anthropic-hello` | [Anthropic Console](https://console.anthropic.com) |
+| `OPENAI_API_KEY` | `provider-compat-oai-hello` | [OpenAI Platform](https://platform.openai.com/api-keys) |
+| `DEEPSEEK_API_KEY` | `provider-deepseek-hello` | [DeepSeek Platform](https://platform.deepseek.com) |
+| `MISTRAL_API_KEY` | `provider-mistral-hello` | [Mistral Console](https://console.mistral.ai) |
+| `XAI_API_KEY` | `provider-xai-hello` | [xAI Console](https://console.x.ai) |
+| `HF_TOKEN` | `provider-huggingface-hello` | [Hugging Face](https://huggingface.co/settings/tokens) |
+| `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` | `provider-microsoft-foundry-hello` | [Azure Portal](https://portal.azure.com) |
+| `COHERE_API_KEY` | `provider-cohere-hello` | [Cohere Dashboard](https://dashboard.cohere.com) |
+| `CLOUDFLARE_ACCOUNT_ID` + `CLOUDFLARE_API_TOKEN` | `provider-cloudflare-workers-ai-hello` | [Cloudflare Dashboard](https://dash.cloudflare.com) |
+| (none — local Ollama) | `provider-ollama-hello` | `ollama serve` locally |
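The table above is also the input a preflight check needs. Checking credentials before a run turns a mid-flow authentication failure into an immediate error that names the missing key. The sketch below is a hypothetical POSIX `sh` helper: `required_env` and `preflight` are illustrative names, not existing scripts. The mapping is hard-coded from this table, so it has to be kept in sync by hand.

```bash
# Hypothetical preflight helpers (not part of py/bin). The mapping is copied
# by hand from the §16f table and must be kept in sync with it.

# required_env SAMPLE: print the env vars SAMPLE needs, one per line.
# Prints nothing for samples that need no key; returns 2 for unknown names.
required_env() {
  case "$1" in
    # GCP samples first, or they would match provider-google-genai-* below.
    provider-google-genai-vertexai-*|provider-vertex-ai-*|dev-local-*|provider-firestore-*)
      echo GOOGLE_CLOUD_PROJECT ;;
    framework-*|provider-google-genai-*|web-*|provider-observability-hello)
      echo GEMINI_API_KEY ;;
    provider-anthropic-hello) echo ANTHROPIC_API_KEY ;;
    provider-compat-oai-hello) echo OPENAI_API_KEY ;;
    provider-deepseek-hello) echo DEEPSEEK_API_KEY ;;
    provider-mistral-hello) echo MISTRAL_API_KEY ;;
    provider-xai-hello) echo XAI_API_KEY ;;
    provider-huggingface-hello) echo HF_TOKEN ;;
    provider-microsoft-foundry-hello)
      echo AZURE_OPENAI_API_KEY; echo AZURE_OPENAI_ENDPOINT ;;
    provider-cohere-hello) echo COHERE_API_KEY ;;
    provider-cloudflare-workers-ai-hello)
      echo CLOUDFLARE_ACCOUNT_ID; echo CLOUDFLARE_API_TOKEN ;;
    provider-ollama-hello) ;;  # needs a local `ollama serve`, not a key
    *) echo "unknown sample: $1" >&2; return 2 ;;
  esac
}

# preflight SAMPLE: report each missing or empty variable on stderr and
# return 1 if any is missing (2 if the sample name is unknown).
preflight() {
  vars="$(required_env "$1")" || return 2
  missing=0
  for v in $vars; do
    # Indirect lookup of the variable named by $v (POSIX-compatible).
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "$1: missing $v" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `preflight provider-anthropic-hello && py/bin/test_sample_flows provider-anthropic-hello` skips the run with a message that names the missing key. The check only confirms that variables are set. It does not validate the keys, and it does not confirm that Ollama is running for `provider-ollama-hello`.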