
Python: Gemini connector drops cached-content and thinking token counts from usage details #6637

Description

@he-yufeng

Describe the bug

The Gemini chat client only surfaces input, output, and total token counts in usage_details. Gemini's GenerateContentResponseUsageMetadata also reports cached_content_token_count (tokens served from context cache) and thoughts_token_count (tokens spent on thinking by reasoning models), but _parse_usage drops both. So for cached prompts and thinking models, cache and reasoning usage silently read as zero, which throws off cost and token accounting.

UsageDetails already has canonical fields for these (cache_read_input_token_count, reasoning_output_token_count), and the OpenAI and Anthropic connectors already populate them; Gemini is the odd one out.

Where

python/packages/gemini/agent_framework_gemini/_chat_client.py, RawGeminiChatClient._parse_usage.

Expected behavior

When the API returns cached_content_token_count / thoughts_token_count, map them to cache_read_input_token_count / reasoning_output_token_count in usage_details, matching the OpenAI and Anthropic connectors.
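The mapping described above could look roughly like the sketch below. It is not the connector's actual code: `FakeUsageMetadata` is a hypothetical stand-in for GenerateContentResponseUsageMetadata, `parse_usage` is an illustrative free function rather than the real `RawGeminiChatClient._parse_usage`, and the result is a plain dict rather than the framework's UsageDetails type. The `input_token_count` / `output_token_count` / `total_token_count` key names are assumed; only the cache and reasoning keys are named in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeUsageMetadata:
    """Hypothetical stand-in for Gemini's usage metadata object."""

    prompt_token_count: Optional[int] = None
    candidates_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    cached_content_token_count: Optional[int] = None
    thoughts_token_count: Optional[int] = None


# Gemini field name -> usage_details key.
# The first three target keys are assumptions; the last two come from this issue.
_FIELD_MAP = {
    "prompt_token_count": "input_token_count",
    "candidates_token_count": "output_token_count",
    "total_token_count": "total_token_count",
    "cached_content_token_count": "cache_read_input_token_count",
    "thoughts_token_count": "reasoning_output_token_count",
}


def parse_usage(metadata) -> dict:
    """Copy every count the API actually reported; omit fields that are None."""
    details = {}
    for source_field, target_key in _FIELD_MAP.items():
        value = getattr(metadata, source_field, None)
        if value is not None:  # keep an explicit 0, skip only missing values
            details[target_key] = value
    return details


# Example: a cached prompt answered by a thinking model.
example = parse_usage(
    FakeUsageMetadata(
        prompt_token_count=1200,
        candidates_token_count=80,
        total_token_count=1580,
        cached_content_token_count=1000,
        thoughts_token_count=300,
    )
)
```

Skipping only `None` values means a response without caching or thinking yields the same three keys as today, while an explicit zero from the API is still passed through.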

Metadata

Labels

python (Usage: [Issues, PRs], Target: Python)
reproduced (Usage: [Issues], Target: all issues that can be reproduced by the triage workflow)

Projects

Status
Done
