Please read this first
- Have you read the docs? Yes — Agents SDK docs.
- Have you searched for related issues? Yes — couldn't find an existing request for overlapping tool execution with model streaming.
Describe the feature
Proposal: add a hook (or extension point) in the Agents SDK runner that lets a tool execute the moment its `tool_use` block finishes streaming, while the rest of the assistant response is still being generated. This pattern, eager tool calling, is strictly faster than today's parallel tool dispatch (which waits for `message_stop` before firing any tool) and produces identical model outputs.
Mechanism
The model's streaming response emits `tool_use` blocks sequentially. As soon as a block-stop event lands, the block is sealed, its arguments are JSON-parseable, and the tool can run. Today's runners wait until `message_stop` to fire any tool, leaving the model's stream and tool execution sequential when they could overlap.
Total latency drops from `stream + max(tool)` to `max(stream, max(tool))`.
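To make the arithmetic concrete, here is the formula with hypothetical numbers (all timings are invented for illustration):

```python
# Hypothetical timings in seconds; purely illustrative numbers.
stream = 4.0             # time for the model to finish streaming
tools = [1.0, 2.0, 3.0]  # per-tool execution times

# Today: every tool waits for message_stop, then all run in parallel.
parallel_dispatch = stream + max(tools)

# Eager: each tool overlaps the remaining stream; the slowest path wins.
eager_dispatch = max(stream, max(tools))

print(parallel_dispatch, eager_dispatch)  # 7.0 4.0
```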
Code shape (in our reference implementation):
```python
# Provider-agnostic core watches the stream, seals each tool_use block
# the moment its tool_call_id changes, and dispatches to the executor.
class EagerRunner:
    async def stream_and_dispatch(self, response_stream):
        async for event in response_stream:
            if event.type == "tool_use_block_stop":
                # block sealed; arguments are parseable
                self.executor.submit(self.run_tool(event.block))
            yield event
        # results joined back into the agent loop here
```
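The sketch above can be exercised end to end with a self-contained simulation (the `Event` class, event names, and timings below are stand-ins for illustration, not the SDK's or the library's real types):

```python
import asyncio

class Event:
    """Stand-in for a stream event; not a real SDK type."""
    def __init__(self, type, block=None):
        self.type = type
        self.block = block

async def fake_stream():
    # Two tool_use blocks seal mid-stream, then the message finishes.
    yield Event("tool_use_block_stop", block="tool_a")
    await asyncio.sleep(0.05)
    yield Event("tool_use_block_stop", block="tool_b")
    await asyncio.sleep(0.05)
    yield Event("message_stop")

async def run_tool(block, results):
    await asyncio.sleep(0.01)  # simulated tool latency
    results.append(block)

async def main():
    results, tasks = [], []
    async for event in fake_stream():
        if event.type == "tool_use_block_stop":
            # Eager dispatch: fire the moment the block seals,
            # while the rest of the stream is still arriving.
            tasks.append(asyncio.create_task(run_tool(event.block, results)))
    await asyncio.gather(*tasks)  # join results back into the agent loop
    return results

print(asyncio.run(main()))
```

Under these timings, `tool_a` completes while the stream is still emitting `tool_b`'s block, which is exactly the overlap the proposal is after.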
Numbers
- Synthetic harness (16 workloads, 3–15 tools, deterministic): 1.20×–1.50× over parallel dispatch, median ~1.28×
- CloudThinker production (real workloads, real provider jitter): ~50% median latency cut
Full benchmark: https://github.com/cloudthinker-ai/eager-tools/blob/main/bench/results.md
Blog (production story): https://cloudthinker.io/blogs/eager-tool-calling-50-percent-faster-agents
What we have today
We've open-sourced eager-tools (MIT) with:
- A provider-agnostic core, including an OpenAI stream adapter that detects per-block seal events
- A one-line LangGraph middleware
- Benchmark harness, full mechanism doc, HITL / `idempotent` safety hooks
What we don't have yet — and where we'd value your input
We do not have a native Agents SDK adapter. The cleanest integration would be a hook in the streaming path that:
- Receives per-block stream events
- Returns a coroutine/Future for tool execution
- Joins results back into the runner without changing the public `Tool` / `Agent` contract
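One possible shape for such a hook, as a sketch only (`EagerToolHook`, `on_tool_block_sealed`, and every other name here are our proposal, not existing SDK API):

```python
import asyncio
from typing import Any, Awaitable, Callable, Protocol

class EagerToolHook(Protocol):
    """Hypothetical extension point; all names are a proposal."""

    def on_tool_block_sealed(self, block: Any) -> Awaitable[Any]:
        """Called by the runner when a tool_use block finishes streaming.

        Returns an awaitable for the tool result; the runner joins it
        back into the loop without changing the Tool / Agent contract.
        """
        ...

class InlineHook:
    """Trivial conforming implementation, for illustration only."""

    def __init__(self, run_tool: Callable[[Any], Awaitable[Any]]):
        self._run_tool = run_tool

    def on_tool_block_sealed(self, block: Any) -> Awaitable[Any]:
        # Schedule eagerly; the runner awaits this task later.
        return asyncio.ensure_future(self._run_tool(block))
```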
Is there an existing extension point we should target, or would a new hook be welcome? Happy to:
- Open a draft PR with a proposed shape
- Write up an RFC with the integration design
- Iterate based on whatever interface the team prefers
Safety note
Our implementation includes a `gate` callable for non-idempotent tools (case-by-case HITL with the parsed args visible) and an `idempotent: bool` flag for blanket safety. A "When NOT to use it" doc covers sub-50 ms tools, sequentially dependent tools, non-idempotent tools, and non-streaming backends: https://github.com/cloudthinker-ai/eager-tools/blob/main/docs/when-not-to-use.md
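The gate pattern described above can be sketched like so (the gate signature and the `Tool` fields are drawn from the description, not the library's exact API):

```python
class Tool:
    """Minimal illustrative tool record."""
    def __init__(self, name: str, idempotent: bool = False):
        self.name = name
        self.idempotent = idempotent

def may_dispatch_eagerly(tool: Tool, parsed_args: dict, gate) -> bool:
    """Decide whether a sealed tool_use block may fire immediately."""
    if tool.idempotent:
        return True  # blanket-safe: re-running causes no harm
    # Non-idempotent: case-by-case approval (e.g. HITL) with the
    # parsed arguments visible before anything executes.
    return gate(tool.name, parsed_args)

# Example gate: auto-approve read-only tools, hold everything else.
def reads_only_gate(name: str, args: dict) -> bool:
    return name.startswith("read_")

print(may_dispatch_eagerly(Tool("read_file"), {"path": "/tmp/x"}, reads_only_gate))           # True
print(may_dispatch_eagerly(Tool("delete_row"), {"id": 7}, reads_only_gate))                   # False
print(may_dispatch_eagerly(Tool("delete_row", idempotent=True), {"id": 7}, reads_only_gate))  # True
```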
Thanks for the SDK — it's been great to build on.