Add around-call plugin hooks for durable model/tool execution #6224

Description

@strickvl

Use case

We are adding Google ADK support to Kitaru, an OSS durable execution/checkpointing runtime for AI workflows.

For ADK, we can already support two useful paths:

  1. Whole-runner checkpointing: wrap an ADK runner invocation as one durable operation.
  2. Explicit model/tool checkpointing: users pass wrapped ADK objects such as KitaruADKModel and KitaruADKTool, and ADK calls those wrapped objects directly.

The second path works because the actual model/tool operation happens inside the wrapper method:

ADK runner
  -> ADK LlmAgent
  -> KitaruADKModel.generate_content_async(...)
       -> Kitaru checkpoint
       -> real model.generate_content_async(...)

ADK runner
  -> ADK tool dispatch
  -> KitaruADKTool.run_async(...)
       -> Kitaru checkpoint
       -> real tool.run_async(...)

That gives replay-safe behavior: if a process crashes after a model/tool call completes, Kitaru can replay from the stored checkpoint instead of repeating the provider call or tool side effect.
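
A minimal sketch of that wrapper pattern, under stated assumptions: `InMemoryCheckpointStore`, `FakeEmailTool`, and `CheckpointedTool` are hypothetical names invented for illustration, not Kitaru or ADK classes, and the dict-backed store only imitates a durable one. The point is that the real call runs inside the store's `run` method, so a replay with the same step id returns the stored result instead of repeating the side effect.

```python
import asyncio
from typing import Any, Awaitable, Callable


class InMemoryCheckpointStore:
    """Hypothetical stand-in for a durable store (a real one would persist to disk or a database)."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    async def run(self, key: str, operation: Callable[[], Awaitable[Any]]) -> Any:
        # Replay path: a stored result means the operation already completed once.
        if key in self._results:
            return self._results[key]
        # First execution: run the real operation and record its result.
        result = await operation()
        self._results[key] = result
        return result


class FakeEmailTool:
    """Toy tool with a visible side-effect counter; not an ADK class."""

    def __init__(self) -> None:
        self.calls = 0

    async def run_async(self, *, args: dict[str, Any]) -> dict[str, Any]:
        self.calls += 1
        return {"status": "sent", "to": args["to"]}


class CheckpointedTool:
    """Wrapper whose method body runs the real call inside the checkpoint."""

    def __init__(self, inner: FakeEmailTool, store: InMemoryCheckpointStore, step_id: str) -> None:
        self._inner = inner
        self._store = store
        self._step_id = step_id

    async def run_async(self, *, args: dict[str, Any]) -> dict[str, Any]:
        return await self._store.run(
            self._step_id, lambda: self._inner.run_async(args=args)
        )


async def demo() -> tuple[dict[str, Any], dict[str, Any], int]:
    store = InMemoryCheckpointStore()
    tool = FakeEmailTool()
    args = {"to": "a@example.com"}
    first = await CheckpointedTool(tool, store, "send_email#1").run_async(args=args)
    # Simulated replay after a restart: same step id, same store, fresh wrapper.
    replayed = await CheckpointedTool(tool, store, "send_email#1").run_async(args=args)
    return first, replayed, tool.calls
```

Running `asyncio.run(demo())` should show the replayed result matching the first one while the underlying tool ran only once.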

What is missing

For arbitrary ADK runners, we do not see a public plugin hook that lets a plugin wrap the actual model/tool call body.

The current plugin/callback shape appears to be before/after/error:

before_model_callback(callback_context, llm_request)
ADK calls the model
after_model_callback(callback_context, llm_response)

and similarly for tools:

before_tool_callback(tool, tool_args, tool_context)
ADK runs the tool
after_tool_callback(tool, tool_args, tool_context, result)

Those hooks are useful for logging, guardrails, short-circuiting, and response modification. But they do not let a durability runtime run the actual provider/tool call inside its own checkpoint.

The failure case is:

1. before_model_callback runs.
2. It returns None, so ADK calls the provider.
3. The provider returns successfully.
4. The process crashes before after_model_callback persists the response.
5. Replay calls the provider again.

For a durability system, that means paying for the same model call twice. The same gap on the tool path means repeating the tool's side effect.

Returning a cached response from before_model_callback helps when the cache already exists, but it does not solve the first successful uncached call unless the call itself can happen inside the durable checkpoint.
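
The failure case can be reproduced with a small simulation. This is only a sketch: `FakeProvider`, `SimulatedCrash`, and `run_model_step` are hypothetical stand-ins that mimic the before/call/after ordering described above, not ADK code, and the crash is modeled as an exception raised after the provider returns but before the response is persisted.

```python
class SimulatedCrash(Exception):
    """Stands in for the process dying between the provider call and the after hook."""


class FakeProvider:
    """Toy provider that counts how many times it was actually invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(self, request: str) -> str:
        self.calls += 1
        return f"response to {request!r}"


def run_model_step(
    request: str,
    provider: FakeProvider,
    cache: dict[str, str],
    crash_before_after_hook: bool,
) -> str:
    # "before" hook: short-circuit only if a response was already persisted.
    if request in cache:
        return cache[request]
    # The provider call happens outside any durable boundary.
    response = provider.generate(request)
    if crash_before_after_hook:
        raise SimulatedCrash
    # "after" hook: persist the response for future replays.
    cache[request] = response
    return response


def demo() -> int:
    provider = FakeProvider()
    cache: dict[str, str] = {}
    try:
        # First attempt: the provider succeeds, then the process "crashes".
        run_model_step("hello", provider, cache, crash_before_after_hook=True)
    except SimulatedCrash:
        pass
    # Replay: nothing was persisted, so the provider is called again.
    run_model_step("hello", provider, cache, crash_before_after_hook=False)
    return provider.calls
```

In this simulation `demo()` reports two provider calls for one logical step, which is the duplication the before/after shape cannot rule out.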

Requested API shape

Would ADK consider adding middleware-style around-call plugin hooks that receive both the call input and a continuation/proceed callable?

For example, model calls:

async def wrap_model_call(
    self,
    *,
    callback_context: CallbackContext,
    llm_request: LlmRequest,
    proceed: Callable[[LlmRequest], Awaitable[LlmResponse]],
) -> LlmResponse:
    ...

Tool calls:

async def wrap_tool_call(
    self,
    *,
    tool: BaseTool,
    tool_args: dict[str, Any],
    tool_context: ToolContext,
    proceed: Callable[[], Awaitable[dict]],
) -> dict:
    ...

Then a durability plugin can do this:

open checkpoint
  -> call proceed(...)
  -> persist returned response/result as checkpoint output
return response/result to ADK

This differs from before/after callbacks because the plugin controls when the side effect happens and can place that side effect inside its own durable boundary.
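
As a rough illustration of that flow, here is a sketch of a durability plugin written against the proposed hook. Everything in it is an assumption: `wrap_model_call` does not exist in ADK today, `DurableModelPlugin` and `FakeProvider` are hypothetical, the request and response are plain strings instead of `LlmRequest`/`LlmResponse`, `callback_context` is reduced to a dict carrying a made-up `step_id`, and the checkpoint store is an in-memory dict.

```python
import asyncio
from typing import Awaitable, Callable


class DurableModelPlugin:
    """Hypothetical plugin that runs the model call inside its own checkpoint."""

    def __init__(self, checkpoints: dict[str, str]) -> None:
        # A real runtime would use durable storage; a dict is enough for the sketch.
        self._checkpoints = checkpoints

    async def wrap_model_call(
        self,
        *,
        callback_context: dict[str, str],
        llm_request: str,
        proceed: Callable[[str], Awaitable[str]],
    ) -> str:
        step_id = callback_context["step_id"]
        # Replay: the checkpoint already holds the output, so skip the provider.
        if step_id in self._checkpoints:
            return self._checkpoints[step_id]
        # First run: the plugin decides when the side effect happens.
        response = await proceed(llm_request)
        self._checkpoints[step_id] = response
        return response


class FakeProvider:
    """Toy provider that counts real invocations."""

    def __init__(self) -> None:
        self.calls = 0

    async def generate(self, llm_request: str) -> str:
        self.calls += 1
        return llm_request.upper()


async def demo() -> tuple[str, str, int]:
    checkpoints: dict[str, str] = {}
    provider = FakeProvider()
    context = {"step_id": "agent/model-call-1"}
    first = await DurableModelPlugin(checkpoints).wrap_model_call(
        callback_context=context, llm_request="hi", proceed=provider.generate
    )
    # Simulated replay: a fresh plugin instance sharing the same checkpoint data.
    replayed = await DurableModelPlugin(checkpoints).wrap_model_call(
        callback_context=context, llm_request="hi", proceed=provider.generate
    )
    return first, replayed, provider.calls
```

The replayed call returns the stored response without invoking `proceed`, so the toy provider is hit only once.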

Why this matters

This would enable replay-safe integrations for:

  • durable model-call checkpointing
  • durable tool-call checkpointing
  • exact retry/replay behavior after process crashes
  • provider-cost deduplication after successful calls
  • avoiding duplicated tool side effects when idempotency is enforced only at the runtime/checkpoint layer, not by the tool itself

It would also let ADK support durable execution systems without requiring users to manually wrap every model/tool object before constructing their agents.

Current workaround

The current safe workaround is explicit wrapping:

user passes wrapped model/tool objects into ADK
ADK calls the wrappers
wrappers checkpoint their own method bodies

That works, but it is opt-in per model/tool and does not give a runner-level integration for arbitrary ADK runners.
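
For contrast, a runner-level integration could apply around-call hooks from registered plugins to every tool call, with no per-object wrapping by the user. The sketch below shows one possible dispatch shape and is not how ADK works today: `dispatch_tool`, `RecordingPlugin`, and the simplified `wrap_tool_call(tool_args, proceed)` signature are hypothetical, and the nesting order (first registered plugin outermost) is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable

ToolCall = Callable[[], Awaitable[dict[str, Any]]]


class RecordingPlugin:
    """Hypothetical plugin that records when it enters and exits the call."""

    def __init__(self, name: str, log: list[str]) -> None:
        self._name = name
        self._log = log

    async def wrap_tool_call(self, *, tool_args: dict[str, Any], proceed: ToolCall) -> dict[str, Any]:
        self._log.append(f"{self._name}:enter")
        result = await proceed()
        self._log.append(f"{self._name}:exit")
        return result


def _bind(plugin: RecordingPlugin, tool_args: dict[str, Any], inner: ToolCall) -> ToolCall:
    # Bind each plugin in its own scope so the loop variable is not shared.
    async def call() -> dict[str, Any]:
        return await plugin.wrap_tool_call(tool_args=tool_args, proceed=inner)

    return call


async def dispatch_tool(
    plugins: list[RecordingPlugin],
    tool_args: dict[str, Any],
    real_call: ToolCall,
) -> dict[str, Any]:
    # Build the chain from the inside out so the first plugin ends up outermost.
    call = real_call
    for plugin in reversed(plugins):
        call = _bind(plugin, tool_args, call)
    return await call()


async def demo() -> tuple[dict[str, Any], list[str]]:
    log: list[str] = []

    async def real_tool() -> dict[str, Any]:
        log.append("tool")
        return {"ok": True}

    plugins = [RecordingPlugin("outer", log), RecordingPlugin("inner", log)]
    result = await dispatch_tool(plugins, {"x": 1}, real_tool)
    return result, log
```

With this ordering, a durability plugin registered first would enclose both the other plugins and the real tool call inside its checkpoint.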

We intentionally do not treat before/after metadata callbacks as replay-safe checkpointing, because that would make users think provider/tool side effects are protected when they may still be repeated on replay.

Questions

  1. Is there already a public around-call hook or continuation-style plugin API that we missed?
  2. If not, would the ADK team be open to a wrap_model_call / wrap_tool_call style plugin API?
  3. Are there constraints in ADK's runner/model/tool execution model that would make this unsafe or hard to support?

Labels

core: [Component] This issue is related to the core interface and implementation