Text accumulation issue for output_key #5590

@vuth-seb

Description

🔴 Required Information

Describe the Bug:

When using LlmAgent with output_key parameter and StreamingMode.SSE, text streamed before tool calls is lost. Only text that streams after the final tool execution is saved to the output_key state. This results in 60-70% of agent responses being discarded in production scenarios where agents make tool calls.

Steps to Reproduce:

  1. Install the package: pip install google-adk==1.32.0
  2. Run the reproduction code below
  3. Observe that only text after the final tool call is saved

Minimal Reproduction Code:

#!/usr/bin/env python3
"""
Minimal reproduction of ADK text accumulation bug.
When an agent streams text before tool calls, that text is lost from output_key.
"""

import asyncio

from dotenv import load_dotenv
load_dotenv()

from google.adk.agents import LlmAgent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.runners import InMemoryRunner
from google.adk.tools.agent_tool import AgentTool
from google.genai import types

MODEL = "gemini-2.5-pro"


def create_chart_generator_agent() -> LlmAgent:
    """Create a simple agent that generates chart specifications."""
    return LlmAgent(
        model=MODEL,
        name="chart_generator_agent",
        instruction=(
            "Generate a simple Vega-Lite chart specification. "
            "Create a JSON with 5 random data points for a bar chart."
        ),
        output_key="chart_result",
        generate_content_config=types.GenerateContentConfig(
            response_mime_type="application/json"
        ),
    )


async def test_text_accumulation():
    """Test that text before tool calls is preserved in output_key."""

    # Create chart generator and wrap as tool
    chart_generator_agent = create_chart_generator_agent()
    chart_tool = AgentTool(agent=chart_generator_agent)

    # Create main agent with output_key (triggers the bug)
    agent = LlmAgent(
        model=MODEL,
        name="chart_test_agent",
        instruction=(
            "You will create 2 chart specifications. Follow this EXACT flow:\n"
            "1. Write 2 sentences explaining what you'll do (introduction)\n"
            "2. Call chart_generator_agent tool\n"
            "3. Write 1 sentence about progress\n"
            "4. Call chart_generator_agent tool again\n"
            "5. Write 2 sentences summarizing (conclusion)\n"
            "Keep each text section distinct and clear."
        ),
        tools=[chart_tool],
        output_key="final_output",  # BUG: Only saves text after final tool call
    )

    # Setup runner with streaming (InMemoryRunner provides its own session service)
    runner = InMemoryRunner(agent=agent, app_name="test_app")
    session_service = runner.session_service

    session = await session_service.create_session(
        app_name="test_app",
        user_id="test_user"
    )

    # Track all text parts as they stream
    accumulated_text_parts = []
    tool_calls = []

    user_message = types.Content(
        role="user",
        parts=[types.Part(text="Create 2 chart specifications with random data")],
    )

    print("Testing Text Accumulation Across Tool Calls")
    print("=" * 80)
    print()

    # Run with streaming enabled
    async for event in runner.run_async(
        user_id="test_user",
        session_id=session.id,
        new_message=user_message,
        run_config=RunConfig(streaming_mode=StreamingMode.SSE),
    ):
        content = getattr(event, "content", None)
        for part in (content.parts if content and content.parts else []):
            # Skip thoughts
            if getattr(part, "thought", False):
                continue

            # Collect text parts
            if getattr(part, "text", None):
                accumulated_text_parts.append(part.text)
                print(f"  Text: {part.text[:80]}...")

            # Track tool calls
            if getattr(part, "function_call", None):
                tool_calls.append(part.function_call.name)
                print(f"  Tool: {part.function_call.name}")

    print()
    print("=" * 80)

    # Get the final state
    final_session = await session_service.get_session(
        app_name="test_app", user_id="test_user", session_id=session.id
    )
    final_output = final_session.state.get("final_output", "")

    # Combine all text parts (what we expect)
    expected_combined = "".join(accumulated_text_parts)

    # Show results
    print(f"Expected: {len(expected_combined)} chars ({len(accumulated_text_parts)} text parts)")
    print(f"Actual:   {len(final_output)} chars (from output_key)")
    
    if expected_combined:
        match_ratio = len(final_output) / len(expected_combined)
        print(f"Match:    {match_ratio:.1%}")
        print()
        
        if match_ratio < 0.3:
            print("❌ FAIL: Text before tool calls was lost")
            return False
        elif match_ratio >= 0.8:
            print("✅ PASS: Text accumulation working")
            return True
        else:
            print(f"⚠️  PARTIAL: Lost ~{100 - match_ratio * 100:.0f}%")
            return False
    else:
        print("❌ FAIL: No text accumulated")
        return False


async def main():
    """Run the test."""
    print("ADK Text Accumulation Bug Reproduction")
    print("Package: google-adk==1.32.0")
    print(f"Model:   {MODEL}")
    print()

    await test_text_accumulation()


if __name__ == "__main__":
    asyncio.run(main())

Expected: All text parts (intro + progress + conclusion) saved to output_key.
Actual: Only conclusion text (after final tool call) is saved.

Expected Behavior:

All streamed text parts (both before and after tool calls) should be accumulated and saved to the output_key state parameter. The final output should contain:

  • Introduction text (before tool calls)
  • Progress updates (between tool calls)
  • Conclusion text (after tool calls)

Observed Behavior:

Only text streamed after the final tool call is saved to output_key. All text before tool executions is discarded.

Example from test output:

Expected: 662 chars (6 text parts)
Actual:   231 chars (from output_key)
Match:    34.9%

⚠️  PARTIAL: Lost ~65%

The agent streams:

  1. Intro text: "I will create a chart..." → LOST
  2. Tool call: chart_generator_agent
  3. Progress text: "I will create a chart..." → LOST
  4. Tool call: chart_generator_agent
  5. Conclusion: "The chart has been generated..." → SAVED

Only part 5 appears in the final output.
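Until output_key accumulates partials, the streamed text can be collected on the client side instead. A minimal sketch: accumulate_text is a hypothetical helper that mirrors the part-handling loop from the reproduction above, shown here against simple stub events rather than real ADK objects:

```python
from types import SimpleNamespace


def accumulate_text(events) -> str:
    """Join every non-thought text part across a stream of events."""
    chunks = []
    for event in events:
        content = getattr(event, "content", None)
        for part in getattr(content, "parts", None) or []:
            if getattr(part, "thought", False):
                continue  # skip model "thought" parts
            text = getattr(part, "text", None)
            if text:
                chunks.append(text)
    return "".join(chunks)


# Stub events mimicking the intro -> tool call -> conclusion flow:
events = [
    SimpleNamespace(content=SimpleNamespace(
        parts=[SimpleNamespace(thought=False, text="Intro. ")])),
    SimpleNamespace(content=SimpleNamespace(
        parts=[SimpleNamespace(thought=False, text=None)])),  # tool-call part, no text
    SimpleNamespace(content=SimpleNamespace(
        parts=[SimpleNamespace(thought=False, text="Conclusion.")])),
]
print(accumulate_text(events))  # -> Intro. Conclusion.
```

Calling such a helper on the events yielded by runner.run_async recovers the intro and progress text that output_key drops, at the cost of duplicating state management outside the session.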

Environment Details:

  • ADK Library Version: 1.32.0 (pip show google-adk)
  • Desktop OS: macOS (also reproduced on Linux)
  • Python Version: Python 3.13.7 (python -V)

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used: gemini-2.5-pro (also affects flash)

🟡 Optional Information

Regression:

This is not a regression: the behavior reproduces on every version tested and appears to be a long-standing issue in the __maybe_save_output_to_state() method of LlmAgent, which only processes events for which is_final_response() returns True.

Logs:

Test output showing the bug:

Testing Text Accumulation Across Tool Calls
================================================================================

  Text: I will create a chart
  Text:  for you with random data. Let me use the `chart_generator_agent` to build it.

  Tool: chart_generator_agent
  Text: I will create a chart for you with random data. Let me use the `chart_generator_...
  Tool: chart_generator_agent
  Text: The chart specification has been generated successfully. It includes random data...
  Text:  for a bar chart.

All chart specifications have been generated successfully. Yo...
  Text: The chart specification has been generated successfully. It includes random data...

================================================================================
Expected: 662 chars (6 text parts)
Actual:   231 chars (from output_key)
Match:    34.9%

⚠️  PARTIAL: Lost ~65%

Minimal Reproduction Code:

The full reproduction code is provided in the "Steps to Reproduce" section above. The test creates an agent that:

  1. Streams introduction text
  2. Calls a tool (chart_generator_agent)
  3. Streams progress text
  4. Calls the tool again
  5. Streams conclusion text

Only step 5 (conclusion) is saved to output_key, losing steps 1 and 3 (60-70% of content).

How often has this issue occurred?:

  • Always (100%) - Reproducible on every run with the provided test case

Additional Context:

This bug severely impacts production usage where agents use tools (AgentTool or FunctionTool). Users lose critical context:

  • Explanations of what the agent is doing
  • Reasoning before tool calls
  • Progress updates during multi-step operations

The root cause appears to be in google/adk/agents/llm_agent.py in the __maybe_save_output_to_state() method, which only saves output when is_final_response() returns True. This happens only after all tool executions complete, so earlier text is never persisted.
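The effect of that gating can be illustrated with a simplified sketch. This is hypothetical paraphrase, not the actual ADK source (the real method is private to LlmAgent and operates on Event objects); it only demonstrates why partial events streamed before tool calls never reach the state:

```python
class FakeEvent:
    """Minimal stand-in for an ADK event (hypothetical shape)."""

    def __init__(self, texts, final):
        self.texts = texts
        self._final = final

    def is_final_response(self) -> bool:
        return self._final


def maybe_save_output_to_state(event, state: dict, output_key: str) -> None:
    # Suspected behavior: only the final response event is persisted,
    # so any text streamed before the last tool call is dropped.
    if event.is_final_response():
        state[output_key] = "".join(event.texts)


state = {}
maybe_save_output_to_state(
    FakeEvent(["Intro before tool calls. "], final=False), state, "final_output")
maybe_save_output_to_state(
    FakeEvent(["Conclusion after tools."], final=True), state, "final_output")
print(state["final_output"])  # -> Conclusion after tools.
```

The intro event never satisfies the is_final_response() check, so only the conclusion survives, matching the ~35% retention seen in the logs above.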

Impact:

  • Severe: 60-70% of agent responses lost in streaming scenarios with tools
  • Production agents appear broken to users (missing explanations)
  • Affects all LlmAgent instances using output_key + streaming + tools

Labels: core [Component: core interface and implementation], request clarification [Status: maintainer needs clarification or more information from the author]