Add a plugin hook to observe the final LlmRequest immediately before generate_content_async #6222

Description

@haiyuan-eng-google

🔴 Required Information

Is your feature request related to a specific problem?

Plugin before_model_callbacks run before the agent's own before_model_callbacks and before ADK's post-callback request finalization. As a result, observability plugins that snapshot the LlmRequest in before_model_callback capture the request too early — they cannot observe the request that is actually sent to the model. There is currently no plugin hook positioned after all callbacks/finalization but immediately before the model call.

This surfaced in #4202 (discussion comment): the BigQueryAgentAnalyticsPlugin's LLM_REQUEST event does not reflect context modifications made in a user's before_model_callback.

Current behavior (flows/llm_flows/base_llm_flow.py):

  • _handle_before_model_callback runs plugin before_model_callbacks first, then agent.canonical_before_model_callbacks.
  • After the callbacks return, _call_llm_async finalizes the request (e.g. injects config.labels, including adk_agent_name) and only then calls llm.generate_content_async(llm_request).

So a plugin snapshotting the request in before_model_callback captures:

  • ✅ Everything assembled in _preprocess_async — request processors, system instruction, contents, tools.
  • ❌ Misses anything mutated afterward: the agent's before_model_callback, any plugin registered after this one, and ADK's own config.labels injection.

The LlmRequest is mutated in place through this whole chain, so the gap is purely about when a plugin is allowed to observe it — there is no hook at the actual send point.
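The ordering described above can be reproduced with a small stand-alone simulation. This is only a sketch: `FakeRequest`, `plugin_before_model`, `agent_before_model`, and `call_llm` are hypothetical stand-ins, not ADK classes or functions, and the label value is made up for illustration.

```python
import asyncio
import copy
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for LlmRequest (not the ADK class)."""
    contents: list = field(default_factory=list)
    labels: dict = field(default_factory=dict)


snapshots = {}


async def plugin_before_model(req: FakeRequest) -> None:
    # Observability plugin: snapshots the request when its callback runs.
    snapshots["plugin"] = copy.deepcopy(req)


async def agent_before_model(req: FakeRequest) -> None:
    # The agent's own callback mutates the same object afterwards.
    req.contents.append("context added by agent callback")


async def call_llm(req: FakeRequest) -> None:
    await plugin_before_model(req)             # 1. plugin callbacks run first
    await agent_before_model(req)              # 2. then the agent's callbacks
    req.labels["adk_agent_name"] = "my_agent"  # 3. request finalization
    snapshots["sent"] = copy.deepcopy(req)     # 4. what the model would receive


asyncio.run(call_llm(FakeRequest(contents=["user prompt"])))
print("plugin saw:", snapshots["plugin"])
print("model got: ", snapshots["sent"])
```

The two printed requests differ in both `contents` and `labels`, which is the gap this issue is about.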

Describe the Solution You'd Like

Add an additive plugin lifecycle hook that fires in _call_llm_async after all before_model_callbacks and request finalization, immediately before llm.generate_content_async, receiving the final LlmRequest. For example:

async def on_model_request_callback(
    self, *, callback_context: CallbackContext, llm_request: LlmRequest
) -> Optional[LlmResponse]:
    """Called with the final LlmRequest, right before it is sent to the model."""
    ...

This is correct by construction: it observes exactly what is sent, at the right time, and preserves LLM_REQUEST → LLM_RESPONSE event ordering and span/latency semantics. It is backward-compatible (no-op for existing plugins) and benefits every observability consumer, not just BigQueryAgentAnalyticsPlugin.
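As a rough illustration of how an observability plugin might use the proposed hook, here is a minimal sketch. `RequestAuditPlugin` and `simulate_call` are hypothetical, requests are plain dicts rather than `LlmRequest`, and `on_model_request_callback` does not exist in ADK today; a real plugin would presumably subclass the ADK plugin base class instead.

```python
import asyncio
import copy
from typing import Any, Optional


class RequestAuditPlugin:
    """Hypothetical plugin that logs the request from the proposed hook."""

    def __init__(self) -> None:
        self.events: list = []

    async def on_model_request_callback(
        self, *, callback_context: Any, llm_request: Any
    ) -> Optional[Any]:
        # Proposed hook (assumption): called with the final request.
        self.events.append(("LLM_REQUEST", copy.deepcopy(llm_request)))
        return None

    async def after_model_callback(
        self, *, callback_context: Any, llm_response: Any
    ) -> Optional[Any]:
        self.events.append(("LLM_RESPONSE", llm_response))
        return None


async def simulate_call(plugin: RequestAuditPlugin) -> None:
    request = {"contents": ["user prompt"]}
    # Mutations that happen after plugin before_model_callbacks today:
    request["contents"].append("added by agent callback")
    request["labels"] = {"adk_agent_name": "my_agent"}
    # Proposed hook fires here, immediately before the model call.
    await plugin.on_model_request_callback(callback_context=None, llm_request=request)
    response = "model reply"  # stand-in for llm.generate_content_async(...)
    await plugin.after_model_callback(callback_context=None, llm_response=response)


audit = RequestAuditPlugin()
asyncio.run(simulate_call(audit))
print(audit.events)
```

In this sketch the request event is recorded before the response event and already contains the agent callback's changes and the injected label.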

Impact on your work

Any plugin doing request-level observability/auditing (BigQueryAgentAnalyticsPlugin, logging_plugin, custom plugins) currently logs a request that can differ from what the model received. This makes the logged prompt/context untrustworthy for debugging, auditing, and eval/replay — exactly the use cases these plugins exist for. Without a shared hook, each integration has to re-implement its own fragile workaround.

Willingness to contribute

Yes — happy to help with the design and/or a PR.


🟡 Recommended Information

Describe Alternatives You've Considered

Plugin-local workaround (not recommended long-term): a plugin can stash the llm_request reference in before_model_callback and serialize it in after_model_callback / on_model_error_callback, relying on in-place mutation to read the final state. Downsides: the LLM_REQUEST event lands after the response; it needs per-in-flight-call keying; and short-circuited calls (a callback returning an LlmResponse) never reach after_model_callback, so nothing is logged in that case.
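For reference, the plugin-local workaround could look roughly like the sketch below. `StashingPlugin` is hypothetical, requests are plain dicts, and `call_id` is an assumed per-call key; the real callbacks receive a `callback_context`, from which a plugin would have to derive such a key itself.

```python
import asyncio
import json


class StashingPlugin:
    """Hypothetical workaround: stash a reference now, serialize it later."""

    def __init__(self) -> None:
        self._pending = {}  # per-in-flight-call keying
        self.events = []

    async def before_model_callback(self, *, call_id, llm_request):
        # Keep a reference (not a copy) so later in-place mutations are visible.
        self._pending[call_id] = llm_request
        return None

    async def after_model_callback(self, *, call_id, llm_response):
        request = self._pending.pop(call_id, None)
        if request is not None:
            # The request is only serialized once the response has arrived.
            self.events.append(("LLM_REQUEST", json.dumps(request, sort_keys=True)))
        self.events.append(("LLM_RESPONSE", llm_response))
        return None


async def demo() -> StashingPlugin:
    plugin = StashingPlugin()

    # Normal call: the final state is captured, but late.
    request = {"contents": ["user prompt"]}
    await plugin.before_model_callback(call_id="call-1", llm_request=request)
    request["contents"].append("added by agent callback")
    await plugin.after_model_callback(call_id="call-1", llm_response="model reply")

    # Short-circuited call: after_model_callback never runs, nothing is logged.
    await plugin.before_model_callback(
        call_id="call-2", llm_request={"contents": ["short-circuited"]}
    )
    return plugin


workaround = asyncio.run(demo())
print(workaround.events)
print("still pending:", list(workaround._pending))
```

The second call leaves an entry in `_pending` and produces no event, which illustrates the short-circuit downside noted above.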

Expose the final llm_request to after_model_callback: simpler, but it keeps the timing wrong (request observed after the response). A dedicated pre-send hook keeps event ordering correct.

Proposed API / Implementation

In _call_llm_async, after callbacks + label injection and immediately before the model call:

# after _handle_before_model_callback(...) and config.labels finalization
await invocation_context.plugin_manager.run_on_model_request_callback(
    callback_context=callback_context,
    llm_request=llm_request,
)
# ... then:
llm = self.__get_llm(invocation_context)
async for llm_response in llm.generate_content_async(llm_request, ...):
    ...

BigQueryAgentAnalyticsPlugin would then emit its LLM_REQUEST event from on_model_request_callback instead of before_model_callback.
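On the plugin-manager side, the dispatch might be sketched as follows. All names here (`BasePluginSketch`, `LegacyPlugin`, `RecordingPlugin`, `PluginManagerSketch`) are hypothetical, and returning the first non-None result is an assumption about how the new hook would behave, not something the proposal specifies.

```python
import asyncio
from typing import Any, Optional


class BasePluginSketch:
    """Hypothetical base class; the default makes the new hook a no-op."""

    async def on_model_request_callback(
        self, *, callback_context: Any, llm_request: Any
    ) -> Optional[Any]:
        return None


class LegacyPlugin(BasePluginSketch):
    """An existing plugin that does not override the new hook."""


class RecordingPlugin(BasePluginSketch):
    def __init__(self) -> None:
        self.seen = []

    async def on_model_request_callback(self, *, callback_context, llm_request):
        self.seen.append(llm_request)
        return None


class PluginManagerSketch:
    def __init__(self, plugins) -> None:
        self.plugins = list(plugins)

    async def run_on_model_request_callback(self, *, callback_context, llm_request):
        # Call plugins in registration order; stop at the first non-None
        # result (assumed behavior, see lead-in).
        for plugin in self.plugins:
            result = await plugin.on_model_request_callback(
                callback_context=callback_context, llm_request=llm_request
            )
            if result is not None:
                return result
        return None


recorder = RecordingPlugin()
manager = PluginManagerSketch([LegacyPlugin(), recorder])
final_request = {"contents": ["user prompt"], "labels": {"adk_agent_name": "my_agent"}}
result = asyncio.run(
    manager.run_on_model_request_callback(callback_context=None, llm_request=final_request)
)
print(result, recorder.seen)
```

Note that the call-site snippet above discards the return value, while the proposed signature returns `Optional[LlmResponse]`; whether a non-None result should short-circuit the model call is a design point the proposal leaves open.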

Metadata

Labels

core: [Component] This issue is related to the core interface and implementation