
[FEATURE] Tool Execution Content Not Visible in result.message - Design Question #1156

@sundargthb

Description

Problem Statement

Summary

Tool execution content (e.g., executed code) is stored in result.metrics.tool_metrics but does not appear in result.message, requiring users to implement custom extraction logic. Is this intentional?


Current Behavior

When using tools like Code Interpreter:

result = agent("Calculate 5 + 3")

# ❌ Users expect this to show the code:
print(result.message)  
# Output: "I've calculated the result..." (LLM commentary only)

# ✅ Code is actually HERE:
print(result.metrics.tool_metrics['code_interpreter'].tool['input'])
# Output: Shows the actual Python code executed

Evidence from actual logs:

'tool_metrics': {
    'code_interpreter': ToolMetrics(
        tool={
            'toolUseId': 'tooluse_7rOk7xfMS864-IUMx6oliQ', 
            'name': 'code_interpreter', 
            'input': {
                'code_interpreter_input': {
                    'action': {
                        'type': 'executeCode', 
                        'language': 'python', 
                        'code': '# Your dataset\ndata = [23, 45, 67, 89, 12, 34, 56]...'
                    }
                }
            }
        }
    )
}

The code IS there - just not in result.message where users naturally look!


Impact

  1. Reduced transparency - Users can't see what code was executed without diving into metrics
  2. Poor debugging experience - Requires navigating nested data structures
  3. Security/audit concerns - Executed actions aren't visible in the primary response
  4. Requires boilerplate - Every user must implement custom extraction:
# Current workaround needed:
def format_response(result):
    try:
        tool_metrics = result.metrics.tool_metrics.get('code_interpreter')
        code = tool_metrics.tool['input']['code_interpreter_input']['action']['code']
        return f"Code: {code}\n\nResult: {str(result)}"
    except (AttributeError, KeyError):
        return str(result)

Question

Is this the intended design? Should tool execution content:

Option A: Stay in metrics only (current behavior)
Option B: Also appear in result.message (expected behavior)
Option C: Be accessible via a helper method like result.get_tool_inputs() (sketched under "Alternative Solutions" below)


Expected Behavior

Users expect tool execution details to be visible in the primary response:

result = agent("Calculate 5 + 3")
print(result.message)
# Should show both:
# - Executed code
# - LLM commentary

Environment

  • Strands SDK: Latest (sdk-python)
  • Tool: strands_tools.code_interpreter.AgentCoreCodeInterpreter
  • Model: Claude Sonnet 4.5 (Bedrock)
  • Use Case: AWS Bedrock AgentCore integration

Request

Could the Strands team clarify:

  1. Is tool content intentionally separated from result.message?
  2. What's the recommended pattern for accessing tool execution details?
  3. Would you consider adding helper methods or documentation for this?

This affects user experience, especially for educational content, debugging, and audit trails.

Proposed Solution

Automatically append tool execution info to result.message content blocks:

# In event_loop_cycle or _execute_event_loop_cycle:
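# (Sketch only: tool_result, tool_name, tool_input, and tool_output are
# illustrative names, not actual SDK variables.)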
if tool_result:
    message['content'].append({
        "text": f"[Executed: {tool_name}]\nInput: {tool_input}\nOutput: {tool_output}"
    })
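
If adopted, the earlier example might then print something like this (illustrative output only; the exact formatting would depend on the implementation):

result = agent("Calculate 5 + 3")
print(result.message)
# [Executed: code_interpreter]
# Input: {'code': 'print(5 + 3)'}
# Output: 8
# I've calculated the result: 5 + 3 = 8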

Pros:

  • Natural user experience
  • Consistent with message-based paradigm
  • No API changes needed

Cons:

  • Increases message token count
  • May clutter conversation history

Use Case

Educational/Getting Started Experience:

  • Users learning with Code Interpreter expect to see what code was executed
  • Currently they only see LLM commentary: "I've calculated the result..."
  • The actual executed code is hidden in nested metrics structure

Production Debugging:

  • Developers need quick visibility into what actions the agent took
  • Audit trails require clear records of executed commands
  • Security reviews need to verify what code actually ran

Example:

result = agent("Calculate average of [23, 45, 67, 89, 12]")
print(result.message)  # ❌ Only shows: "The average is 47.2"
                       # ✅ Should also show: The Python code that calculated it

Alternative Solutions

Option 1: Convenience Methods

Add convenience methods to extract tool data:

class AgentResult:
    def get_tool_executions(self) -> list[dict]:
        """Extract all tool execution details."""
        return [
            {
                'name': tool_name,
                'input': metrics.tool['input'],
                'output': metrics.tool_result
            }
            for tool_name, metrics in self.metrics.tool_metrics.items()
        ]
    
    def get_executed_code(self) -> str | None:
        """Convenience method for code_interpreter."""
        ci = self.metrics.tool_metrics.get('code_interpreter')
        if ci:
            return ci.tool['input']['code_interpreter_input']['action'].get('code')
        return None
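
If these helpers existed, the extraction above would collapse to a couple of lines (hypothetical usage of the proposed API):

result = agent("Calculate 5 + 3")
for execution in result.get_tool_executions():
    print(execution['name'], execution['input'])
print(result.get_executed_code())  # the Python source that ran, or None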

Pros:

  • Clean API
  • Doesn't modify message history
  • Easy to document

Cons:

  • Still requires extra step
  • Not as discoverable

Option 2: Configuration Flag

Let users choose:

agent = Agent(
    model=MODEL_ID,
    tools=[code_interpreter],
    include_tool_details_in_messages=True  # Default: False
)

Pros:

  • Backward compatible
  • User control

Cons:

  • More configuration complexity
  • Split behavior patterns
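
Until such a flag exists, callers can approximate it with a thin wrapper built on the metrics layout shown above (a sketch; agent_with_tool_details is a hypothetical helper, not an SDK API):

def agent_with_tool_details(agent, prompt):
    """Run the agent, then append each tool's input to the text output."""
    result = agent(prompt)
    lines = [str(result)]
    for name, metrics in result.metrics.tool_metrics.items():
        lines.append(f"[Executed: {name}]\nInput: {metrics.tool['input']}")
    return "\n\n".join(lines)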

Additional Context

Current Workaround (Required by All Users):

def format_response(result):
    """Extract code from metrics - 15 lines of boilerplate"""
    try:
        tool_metrics = result.metrics.tool_metrics.get('code_interpreter')
        if tool_metrics and hasattr(tool_metrics, 'tool'):
            action = tool_metrics.tool['input']['code_interpreter_input']['action']
            if 'code' in action:
                code = action['code']
                return f"Executed:\n```python\n{code}\n```\n\nResult: {str(result)}"
    except (AttributeError, KeyError):
        pass
    return str(result)
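
And every call site then needs to remember to use it:

print(format_response(agent("Calculate average of [23, 45, 67, 89, 12]")))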

Impact:

  • Every user must implement this extraction
  • Not documented in getting started guides
  • Reduces transparency for security/audit
  • Poor debugging experience

Related Code References:

  • agent_result.py - AgentResult structure
  • agent.py - _execute_event_loop_cycle() - where tool results are processed
  • agent.py - _record_tool_execution() - shows message recording pattern for direct tool calls
