
Bug: rubric_based_final_response_quality_v1 passes empty developer_instructions to judge when agent makes zero tool calls #5593

@cthurston-clgx

Description

🔴 Required Information

Describe the Bug:

The rubric_based_final_response_quality_v1 evaluator fails to populate the <developer_instructions> section of the judge prompt when the agent's intermediate_data.invocation_events list is empty. This causes the LLM judge to receive an empty system prompt context, making it impossible to evaluate rubrics that reference the agent's developer instructions (system prompt).

The bug is in src/google/adk/evaluation/rubric_based_final_response_quality_v1.py (Lines 284–300):

developer_instructions = ""
# ...
app_details = actual_invocation.app_details
if app_details:
  if (
      isinstance(actual_invocation.intermediate_data, InvocationEvents)
      and actual_invocation.intermediate_data.invocation_events  # <-- BUG: False when list is empty
  ):
    developer_instructions = app_details.get_developer_instructions(
        agent_name=actual_invocation.intermediate_data.invocation_events[0].author
    )
  tool_declarations = get_tool_declarations_as_json_str(app_details)

`developer_instructions` is only populated when `invocation_events` is non-empty, because the code uses `invocation_events[0].author` to look up the agent name. When the agent correctly makes zero tool calls (e.g., declining an out-of-scope request), the list is empty, the condition evaluates to False, and the judge receives an empty `<developer_instructions>` block.
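The root cause is ordinary Python truthiness: an empty list is falsy, so the `and` chain short-circuits and the lookup never runs. A minimal, self-contained sketch of the guard (using hypothetical stand-in classes, not the real ADK types):

```python
from dataclasses import dataclass, field

@dataclass
class FakeEvent:
    # Stand-in for an ADK invocation event; only the author field matters here.
    author: str

@dataclass
class FakeInvocationEvents:
    # Stand-in for InvocationEvents; the list is empty when no tools were called.
    invocation_events: list = field(default_factory=list)

def resolve_instructions(intermediate_data, instructions_by_agent):
    """Mirrors the buggy guard: returns "" whenever the event list is empty."""
    developer_instructions = ""
    if (
        isinstance(intermediate_data, FakeInvocationEvents)
        and intermediate_data.invocation_events  # [] is falsy -> guard fails
    ):
        author = intermediate_data.invocation_events[0].author
        developer_instructions = instructions_by_agent.get(author, "")
    return developer_instructions

# With an event the lookup works; with zero tool calls it silently returns "".
assert resolve_instructions(
    FakeInvocationEvents([FakeEvent("my_agent")]), {"my_agent": "Only cooking."}
) == "Only cooking."
assert resolve_instructions(FakeInvocationEvents([]), {"my_agent": "Only cooking."}) == ""
```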

Steps to Reproduce:

  1. Create an agent with developer instructions that define scope boundaries (e.g., "only answer questions about topic X; decline everything else")
  2. Create an eval case where the user asks an out-of-scope question
  3. The agent correctly declines without calling any tools → invocation_events is []
  4. Add a rubric_based_final_response_quality_v1 rubric that references the developer instructions, e.g.:

    "The agent's developer instructions explicitly state that this type of request is out-of-scope. Score YES if the agent declined without calling tools."

  5. Run the evaluation
  6. The judge receives empty <developer_instructions> and scores the rubric as failing despite the agent behaving correctly

Expected Behavior:

`developer_instructions` should be populated from `app_details` regardless of whether `invocation_events` is empty. The agent name could be resolved via a fallback (e.g., the first/root agent name from `app_details.agent_details`).

The judge should receive the full system prompt in <developer_instructions> so it can evaluate rubrics that reference scope definitions, behavioral rules, or other instructions.

Observed Behavior:

The judge receives an empty <developer_instructions> block and responds:

"In the provided user_prompt, the <developer_instructions> are empty, and therefore do not explicitly state this limitation. [...] Since the condition for the request being 'out-of-scope' (as defined by this property) is not met by the provided user_prompt, the agent's decline is not considered 'correct' according to the property's criteria."

The rubric scores 0.0 even though:

  • The agent's actual system prompt does explicitly define the scope
  • The agent did behave correctly (declined without calling tools)
  • The companion rubric_based_tool_use_quality_v1 metric passes (score 1.0) for the same invocation

Environment Details:

  • ADK Library Version (pip show google-adk): 1.26.0 (the affected code path appears unchanged in the latest release)
  • Desktop OS: macOS
  • Python Version (python -V): 3.13

Model Information:

  • Are you using LiteLLM: Yes
  • Which model is being used: gemini-2.5-flash (as the agent under test and as the judge model)

🟡 Optional Information

Regression:

Unknown — this appears to be a logic oversight present since the rubric_based_final_response_quality_v1 evaluator was introduced. The condition likely exists because invocation_events[0].author is used to determine which agent's instructions to retrieve, with no fallback path for the zero-tool-call case.

Logs:

The judge's rationale from the evaluation report:

rubric_id: out_of_scope_response
score: 0.0
rationale: The property defines a request as "out-of-scope" if "The agent's developer
instructions (system prompt) explicitly state that [topic X] is out-of-scope". In the
provided `user_prompt`, the `<developer_instructions>` are empty, and therefore do not
explicitly state this limitation. Since the condition for the request being "out-of-scope"
(as defined by this property) is not met by the provided `user_prompt`, the agent's
decline is not considered "correct" according to the property's criteria.

Meanwhile the intermediate_data confirms zero tool calls:

"intermediate_data": {
  "invocation_events": []
}

And app_details.agent_details does contain the agent's instructions with explicit scope definitions.

Screenshots / Video:

N/A

Additional Context:

The rubric_based_tool_use_quality_v1 evaluator is not affected by this bug — it doesn't pass developer_instructions to the judge at all (it only passes tool_declarations). This creates an inconsistency where the tool-use rubric passes but the response-quality rubric fails for the exact same correct behavior.

Suggested Fix:

developer_instructions = ""
tool_declarations = "Agent has no tools."
response_steps = get_tool_calls_and_responses_as_json_str(
    actual_invocation.intermediate_data
)

app_details = actual_invocation.app_details
if app_details:
  # Determine agent name from invocation events if available,
  # otherwise fall back to the first (root) agent in app_details
  agent_name = None
  if (
      isinstance(actual_invocation.intermediate_data, InvocationEvents)
      and actual_invocation.intermediate_data.invocation_events
  ):
    agent_name = actual_invocation.intermediate_data.invocation_events[0].author
  elif app_details.agent_details:
    agent_name = next(iter(app_details.agent_details))

  if agent_name:
    developer_instructions = app_details.get_developer_instructions(
        agent_name=agent_name
    )
  tool_declarations = get_tool_declarations_as_json_str(app_details)
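The two-branch name resolution in the suggested fix can be exercised in isolation with stand-in types (hypothetical names; the real classes live in the ADK evaluation module):

```python
from dataclasses import dataclass, field

@dataclass
class StubEvent:
    # Hypothetical stand-in for an invocation event.
    author: str

@dataclass
class StubInvocationEvents:
    # Hypothetical stand-in for InvocationEvents.
    invocation_events: list = field(default_factory=list)

def resolve_agent_name(intermediate_data, agent_details: dict):
    """Prefer the first event's author; fall back to the first (root) agent."""
    if (
        isinstance(intermediate_data, StubInvocationEvents)
        and intermediate_data.invocation_events
    ):
        return intermediate_data.invocation_events[0].author
    if agent_details:
        return next(iter(agent_details))  # dicts preserve insertion order (3.7+)
    return None

agents = {"root_agent": "scope instructions...", "sub_agent": "..."}
# Events present: use the author of the first event.
assert resolve_agent_name(StubInvocationEvents([StubEvent("sub_agent")]), agents) == "sub_agent"
# Zero tool calls: fall back to the first registered agent instead of dropping the lookup.
assert resolve_agent_name(StubInvocationEvents([]), agents) == "root_agent"
```

The fallback relies on `agent_details` preserving insertion order, which holds for plain dicts in Python 3.7+; if the root agent is not guaranteed to be first, an explicit root-agent field would be safer.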

Minimal Reproduction Code:

from google.adk.evaluation import Invocation, InvocationEvents
from google.adk.evaluation.rubric_based_final_response_quality_v1 import (
    RubricBasedFinalResponseQualityV1,
)
from google.adk.evaluation.app_details import AppDetails, AgentDetails
from google.genai import types as genai_types

# Agent with explicit scope instructions
app_details = AppDetails(
    agent_details={
        "my_agent": AgentDetails(
            name="my_agent",
            instructions="You are a cooking assistant. Only answer questions about recipes and cooking. Decline all other requests as out-of-scope.",
            tool_declarations=[],
        )
    }
)

# Invocation where agent made ZERO tool calls (correctly declined out-of-scope request)
invocation = Invocation(
    user_content=genai_types.Content(
        parts=[genai_types.Part(text="What is the capital of France?")],
        role="user",
    ),
    final_response=genai_types.Content(
        parts=[genai_types.Part(text="I can only help with cooking and recipes.")],
        role="model",
    ),
    intermediate_data=InvocationEvents(invocation_events=[]),  # <-- empty!
    app_details=app_details,
)

# This rubric references developer instructions — but the judge will see them as empty
evaluator = RubricBasedFinalResponseQualityV1(
    rubrics=[{
        "rubric_id": "scope_check",
        "rubric_content": {
            "text_property": "The developer instructions define scope. Score YES if agent declined correctly."
        },
    }],
    judge_model="gemini-2.5-flash",
)

# BUG: evaluator passes developer_instructions="" to judge
result = evaluator.evaluate(invocation)
# Judge scores 0.0 because it can't see the instructions

How often has this issue occurred?:

  • Always (100%) — reproduces every time the agent makes zero tool calls and rubric_based_final_response_quality_v1 is used with rubrics referencing developer instructions.

Labels: eval ([Component] This issue is related to evaluation), request clarification ([Status] The maintainer needs clarification or more information from the author)
