
feat: project finish_reason from LlmResponse to attributes in BigQueryAgentAnalyticsPlugin LLM_RESPONSE events #5644

@roanny

Description


Summary

BigQueryAgentAnalyticsPlugin does not project the Vertex AI finish_reason field to the attributes column of LLM_RESPONSE events written to BigQuery. This makes it impossible to classify model failure modes (MAX_TOKENS, SAFETY, MALFORMED_FUNCTION_CALL, RECITATION, etc.) via SQL queries against the analytics table — operators must instead parse unstructured Cloud Logging output to recover this signal.

Adding the field would be a small change (~5 lines) with significant observability value for any project using the plugin.

Current behavior (verified empirically 2026-05-08)

SELECT
  JSON_VALUE(attributes,'$.finishReason')                AS camel,
  JSON_VALUE(attributes,'$.finish_reason')               AS snake,
  JSON_VALUE(attributes,'$.usage_metadata.finish_reason') AS nested,
  COUNT(*) AS row_count
FROM `<project>.<dataset>.events`
WHERE event_type = 'LLM_RESPONSE'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY camel, snake, nested
ORDER BY row_count DESC;

Result against a 7-day window in production:

camel   snake   nested   row_count
NULL    NULL    NULL     309

finish_reason is not present at any of the probed paths (camelCase, snake_case, or nested under usage_metadata). Source inspection confirms why: in google/adk/plugins/bigquery_agent_analytics_plugin.py, _EVENT_VIEW_DEFS["LLM_RESPONSE"] (around lines 1813–1846 in 1.32.0) extracts only response, usage_*_tokens, cached_content_token_count, context_cache_hit_rate, total_ms, ttft_ms, model_version, usage_metadata, and cache_metadata. There are zero references to finish_reason / finishReason anywhere in the 3500+ lines of the plugin file. The EventData dataclass and after_model_callback likewise do not capture it, even though the LlmResponse the callback receives includes it from Vertex.

Use case / motivation

  1. Failure mode breakdown via SQL: today, distinguishing "this LLM call ended with MAX_TOKENS" from "this LLM call ended with MALFORMED_FUNCTION_CALL" via the analytics table is impossible. The information lives only in unstructured stderr logs, which are not joinable to the structured event stream.

  2. Cache hit ratio segmented by finish_reason: the existing cached_content_token_count projection enables computing cache hit ratio (we use this heavily). But understanding the quality of the cached responses (did the cached prefix lead to clean STOPs or to MALFORMED retries?) requires the finish_reason dimension.

  3. Model migration health checks: when migrating between model families (e.g., Gemini 2.5 → 3.x), operators need to compare MALFORMED rate, SAFETY rate, etc., across revisions side by side. Without finish_reason in BQ, the comparison must be reconstructed from logs — slow, brittle, and not amenable to alert policies.

  4. Alerting on regression: a Cloud Monitoring alert like "MALFORMED_FUNCTION_CALL rate > 5% sustained over 1h" is straightforward to write against the BQ events table if finish_reason is projected (see the sketch after this list). Without it, the same alert requires a log-based metric pipeline (more moving parts, more cost).
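To make item 4 concrete: once the field is projected, the alert condition reduces to a scheduled query. A minimal sketch, assuming the snake_case spelling proposed below (the $.finish_reason path is hypothetical until the projection lands):

-- MALFORMED_FUNCTION_CALL rate over the trailing hour
-- (assumes the proposed, not-yet-existing $.finish_reason projection)
SELECT
  SAFE_DIVIDE(
    COUNTIF(JSON_VALUE(attributes,'$.finish_reason') = 'MALFORMED_FUNCTION_CALL'),
    COUNT(*)
  ) AS malformed_rate
FROM `<project>.<dataset>.events`
WHERE event_type = 'LLM_RESPONSE'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

Wiring malformed_rate > 0.05 into a Cloud Monitoring alert then becomes a scheduled-query threshold rather than a log-parsing pipeline.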

Proposed solution

Add finish_reason (and optionally finish_message) to the LLM_RESPONSE view definition. The field is already present on LlmResponse from Vertex; the plugin only needs to read and project it. Roughly (against 1.32 source structure):

# In _EVENT_VIEW_DEFS["LLM_RESPONSE"]:
"finish_reason": lambda llm_response: (
    llm_response.finish_reason.name
    if llm_response.finish_reason is not None
    else None
),
"finish_message": lambda llm_response: llm_response.finish_message,

Backwards-compatible: existing consumers reading the documented fields are unaffected; new consumers can opt in via JSON_VALUE(attributes,'$.finish_reason').

If a different snake_case vs camelCase convention is preferred to match other ADK projections, happy to adjust — the empirical probe checked both spellings to be safe.
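For illustration, the side-by-side comparison from use case 3 also collapses to a single query. This is a sketch under the same assumption that finish_reason lands next to the existing model_version projection:

-- finish_reason distribution per model version over the last 7 days
-- ($.finish_reason is the proposed path; $.model_version exists today)
SELECT
  JSON_VALUE(attributes,'$.model_version') AS model_version,
  JSON_VALUE(attributes,'$.finish_reason') AS finish_reason,
  COUNT(*) AS calls
FROM `<project>.<dataset>.events`
WHERE event_type = 'LLM_RESPONSE'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY model_version, finish_reason
ORDER BY model_version, calls DESC;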

Alternatives considered

  • after_model_callback writing to state_delta: feasible as a downstream workaround, but every consumer of the plugin needs to re-implement it, and it bloats the event stream with one extra row per LLM call.
  • Custom plugin subclass overriding LLM_RESPONSE EventData: surgical but creates maintenance burden when the upstream plugin schema evolves.
  • Cloud Logging sink → BigQuery via Logs Router: works but introduces a second pipeline (sink config, log-based filters, parsing) for data that is already flowing through the analytics plugin one level up.

All three alternatives are strictly more code and more complexity than projecting the field in the plugin where the data already lives.

Environment

  • google-adk 1.32.0
  • Python 3.12
  • Vertex AI backend (`vertexai=True`)
  • Plugin: BigQueryAgentAnalyticsPlugin (default config, single dataset, single events table)
  • Models tested: gemini-2.5-flash

Happy to send a PR if the team agrees this is in scope. Thanks for the great library.
