
Feature request: hook point for eager tool dispatch (overlap tool execution with model streaming) #3404

@bmd1905

Description

Please read this first

  • Have you read the docs? Yes — Agents SDK docs.
  • Have you searched for related issues? Yes — couldn't find an existing request for overlapping tool execution with model streaming.

Describe the feature

Proposal: add a hook (or extension point) in the Agents SDK runner that lets a tool execute the moment its tool_use block finishes streaming, while the rest of the assistant response is still being generated. This pattern — eager tool calling — is strictly faster than today's parallel tool dispatch (which waits for message_stop before firing any tool) and produces identical model outputs.

Mechanism

The model's streaming response emits tool_use blocks sequentially. As soon as a block-stop event lands, the block is sealed, its arguments are JSON-parseable, and the tool can run. Today's runners wait until message_stop to fire any tool, leaving the model's stream and tool execution sequential when they could overlap.

Per-call latency drops from stream + max(tool) to max(stream, max(tool)).
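To make the formula concrete, here is a toy calculation with illustrative numbers (not taken from the benchmark). The eager figure is a lower bound: a tool whose block seals late in the stream can still finish after the stream does.

```python
# Illustrative latency model: a 4 s stream and three tools taking 1-3 s.
# Parallel dispatch waits for the full stream, then runs tools concurrently;
# eager dispatch overlaps tool execution with the remaining stream time.

stream_s = 4.0
tools_s = [1.0, 2.0, 3.0]

parallel = stream_s + max(tools_s)               # 4 + 3 = 7.0 s
eager_lower_bound = max(stream_s, max(tools_s))  # max(4, 3) = 4.0 s

print(parallel, eager_lower_bound)  # prints: 7.0 4.0
```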

Code shape (in our reference implementation):

# Provider-agnostic core: watches the stream, seals each tool_use block
# on its block-stop event (or, for providers without explicit stop
# events, when the tool_call_id changes), and dispatches to the executor.

class EagerRunner:
    async def stream_and_dispatch(self, response_stream):
        async for event in response_stream:
            if event.type == "tool_use_block_stop":
                # Block sealed: arguments are fully JSON-parseable,
                # so the tool can start while the model keeps streaming.
                self.executor.submit(self.run_tool(event.block))
            yield event
        # Tool results are awaited and joined back into the agent loop here.

Numbers

  • Synthetic harness (16 workloads, 3–15 tools, deterministic): 1.20×–1.50× over parallel dispatch, median ~1.28×
  • CloudThinker production (real workloads, real provider jitter): ~50% median latency cut

Full benchmark: https://github.com/cloudthinker-ai/eager-tools/blob/main/bench/results.md
Blog (production story): https://cloudthinker.io/blogs/eager-tool-calling-50-percent-faster-agents

What we have today

We've open-sourced eager-tools (MIT) with:

  • A provider-agnostic core, including an OpenAI stream adapter that detects per-block seal events
  • A one-line LangGraph middleware
  • Benchmark harness, full mechanism doc, HITL / idempotent safety hooks

What we don't have yet — and where we'd value your input

We do not have a native Agents SDK adapter. The cleanest integration would be a hook in the streaming path that:

  1. Receives per-block stream events
  2. Returns a coroutine/Future for tool execution
  3. Joins results back into the runner without changing the public Tool / Agent contract

Is there an existing extension point we should target, or would a new hook be welcome? Happy to:

  • Open a draft PR with a proposed shape
  • Write up an RFC with the integration design
  • Iterate based on whatever interface the team prefers

Safety note

Our implementation includes a gate callable for non-idempotent tools (case-by-case HITL approval with the parsed arguments visible) and an idempotent: bool flag for blanket safety. There is also a "when NOT to use it" doc covering sub-50 ms tools, sequentially dependent tools, non-idempotent tools, and non-streaming backends: https://github.com/cloudthinker-ai/eager-tools/blob/main/docs/when-not-to-use.md
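To make the gate semantics concrete, here is a minimal sketch of the dispatch decision as described above. All names are illustrative; the actual eager-tools API may differ:

```python
# Hypothetical gating logic: idempotent tools fire eagerly without review;
# non-idempotent tools go through a case-by-case gate (e.g. HITL approval)
# that sees the fully parsed arguments before the tool runs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolSpec:
    name: str
    idempotent: bool = False

def should_dispatch_eagerly(
    spec: ToolSpec,
    args: dict,
    gate: Callable[[str, dict], bool],
) -> bool:
    if spec.idempotent:
        return True               # blanket safety: safe to fire early
    return gate(spec.name, args)  # case-by-case decision with parsed args

def deny_all(name: str, args: dict) -> bool:
    return False  # stand-in for a human reviewer who approves nothing

print(should_dispatch_eagerly(ToolSpec("search", idempotent=True), {}, deny_all))
print(should_dispatch_eagerly(ToolSpec("send_email"), {"to": "x"}, deny_all))
```

A denied tool simply falls back to normal post-stream dispatch, so the gate only ever trades away the latency win, never correctness.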

Thanks for the SDK — it's been great to build on.
