feat(acp): report cached and thought tokens in PromptResponse.usage #27986
VascoSch92 wants to merge 2 commits into
The ACP server only reported input/output tokens (via the non-standard _meta.quota.token_count), dropping cachedContentTokenCount and thoughtsTokenCount from Gemini's usageMetadata. ACP clients that estimate cost from token counts therefore treat all input as uncached, overstating cost (~3x for cache-heavy agentic sessions) even though the real spend already benefits from caching. Capture cachedContentTokenCount/thoughtsTokenCount and populate the standard ACP PromptResponse.usage field (inputTokens, outputTokens, cachedReadTokens, thoughtTokens, totalTokens), mirroring Claude Agent ACP and Codex ACP. The existing _meta.quota payload is kept for backward compatibility.
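To illustrate why dropping `cachedContentTokenCount` inflates client-side estimates, here is a minimal sketch of a cost estimator that an ACP client might run over `PromptResponse.usage`. The `Usage` and `Prices` interfaces, the function names, and the prices (cached input at 25% of the uncached rate, output at 4×) are hypothetical and not taken from any provider's price list. Thought tokens are deliberately left out, because whether they are already part of `outputTokens` is the question raised in review.

```typescript
// Local stand-in using the ACP usage field names from this PR (hypothetical type).
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedReadTokens?: number;
  thoughtTokens?: number;
  totalTokens: number;
}

// Hypothetical per-token prices in arbitrary units.
interface Prices {
  input: number; // uncached input
  cachedInput: number; // input served from cache
  output: number;
}

// Naive estimate: every input token is billed at the uncached rate,
// which is what a client sees when cachedReadTokens is missing.
function naiveCost(u: Usage, p: Prices): number {
  return u.inputTokens * p.input + u.outputTokens * p.output;
}

// Cache-aware estimate: cachedReadTokens is a subset of inputTokens,
// so only the remainder is billed at the uncached rate.
function cacheAwareCost(u: Usage, p: Prices): number {
  const cached = u.cachedReadTokens ?? 0;
  const uncached = u.inputTokens - cached;
  return uncached * p.input + cached * p.cachedInput + u.outputTokens * p.output;
}
```

With 90% of a 100,000-token input served from cache and 1,000 output tokens, the naive estimate (104,000 units) is about 2.85× the cache-aware one (36,500 units), the same order as the ~3x overstatement described above.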
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.

📊 PR Size: size/M
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the ACP server's token usage reporting by including cached and thought tokens in the standard PromptResponse.usage field. Previously, these metrics were omitted, leading to inaccurate cost reporting for cache-heavy sessions. The changes ensure that ACP clients receive a comprehensive view of token consumption while maintaining full backward compatibility with existing metadata structures.
Code Review
This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to the session prompt response. The review feedback correctly points out that thought tokens are already included in the output token count in the Gemini API, meaning they should not be added again when calculating totalTokens. Correcting this calculation in both the implementation and the test assertions prevents double-counting.
```typescript
const buildUsage = (): acp.Usage => ({
  inputTokens: totalInputTokens,
  outputTokens: totalOutputTokens,
  cachedReadTokens: totalCachedTokens,
  thoughtTokens: totalThoughtTokens,
  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
});
```
In the Gemini API, candidatesTokenCount (which maps to outputTokens) already includes thoughtsTokenCount (which maps to thoughtTokens). Therefore, adding totalThoughtTokens to totalInputTokens + totalOutputTokens results in double-counting the thought tokens in totalTokens. The correct calculation for totalTokens should be totalInputTokens + totalOutputTokens.
Suggested change:

```diff
 const buildUsage = (): acp.Usage => ({
   inputTokens: totalInputTokens,
   outputTokens: totalOutputTokens,
   cachedReadTokens: totalCachedTokens,
   thoughtTokens: totalThoughtTokens,
-  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+  totalTokens: totalInputTokens + totalOutputTokens,
 });
```
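A minimal sketch of the per-turn accumulation described in this PR, using the unit test's numbers split across two hypothetical turns. `UsageMetadata` and `Usage` are local stand-ins that reuse the field names quoted here (they are not the real `@google/genai` or `acp` types), and `UsageAccumulator` with its `thoughtsIncludedInOutput` flag is hypothetical. The flag switches between the original formula (input + output + thought) and the reviewer's (input + output); which one is right depends on whether `candidatesTokenCount` already counts `thoughtsTokenCount`, and this sketch does not decide that.

```typescript
// Hypothetical subset of Gemini's usageMetadata; any field may be absent.
interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  cachedContentTokenCount?: number;
  thoughtsTokenCount?: number;
}

// Local stand-in for acp.Usage with the field names from this PR.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedReadTokens: number;
  thoughtTokens: number;
  totalTokens: number;
}

class UsageAccumulator {
  private input = 0;
  private output = 0;
  private cached = 0;
  private thought = 0;

  // Add one turn's counts; a missing field counts as zero.
  add(m: UsageMetadata): void {
    this.input += m.promptTokenCount ?? 0;
    this.output += m.candidatesTokenCount ?? 0;
    this.cached += m.cachedContentTokenCount ?? 0;
    this.thought += m.thoughtsTokenCount ?? 0;
  }

  // true: thoughts are assumed to be inside output, total = input + output.
  // false: thoughts are assumed separate, total = input + output + thought.
  build(thoughtsIncludedInOutput: boolean): Usage {
    const extra = thoughtsIncludedInOutput ? 0 : this.thought;
    return {
      inputTokens: this.input,
      outputTokens: this.output,
      cachedReadTokens: this.cached,
      thoughtTokens: this.thought,
      totalTokens: this.input + this.output + extra,
    };
  }
}
```

Fed the two turns `{600, 150, 500, 30}` and `{400, 50, 300, 20}` (prompt, candidates, cached, thoughts), the accumulator reaches the test's 1000/200/800/50; `build(false).totalTokens` is 1250 and `build(true).totalTokens` is 1200, matching the original and the suggested assertions respectively. Keeping the choice as an explicit flag makes the assumption visible at the call site instead of hiding it in a sum.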
```typescript
// cachedReadTokens is a subset of inputTokens, so totalTokens sums
// input + output + thought (1000 + 200 + 50).
expect(result.usage).toEqual({
  inputTokens: 1000,
  outputTokens: 200,
  cachedReadTokens: 800,
  thoughtTokens: 50,
  totalTokens: 1250,
});
```
Update the test assertion to reflect that totalTokens is inputTokens + outputTokens (1200) without double-counting the thought tokens, since thought tokens are already a subset of output tokens.
Suggested change:

```diff
-// cachedReadTokens is a subset of inputTokens, so totalTokens sums
-// input + output + thought (1000 + 200 + 50).
+// cachedReadTokens is a subset of inputTokens, and thoughtTokens is a subset of outputTokens,
+// so totalTokens is inputTokens + outputTokens (1000 + 200).
 expect(result.usage).toEqual({
   inputTokens: 1000,
   outputTokens: 200,
   cachedReadTokens: 800,
   thoughtTokens: 50,
-  totalTokens: 1250,
+  totalTokens: 1200,
 });
```
Code Review
This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to the ACP session implementation and adds a corresponding unit test. The review feedback correctly identifies a double-counting issue where thought tokens are added to the total token count despite already being included in the output token count. Suggestions are provided to fix this calculation in both the session logic and the test assertions.
```typescript
const buildUsage = (): acp.Usage => ({
  inputTokens: totalInputTokens,
  outputTokens: totalOutputTokens,
  cachedReadTokens: totalCachedTokens,
  thoughtTokens: totalThoughtTokens,
  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
});
```
In Gemini's usageMetadata, candidatesTokenCount (which maps to totalOutputTokens) already includes thoughtsTokenCount (which maps to totalThoughtTokens) because thoughts are generated as part of the response candidates. Adding totalThoughtTokens to totalTokens results in double-counting the thought tokens, leading to inflated token usage reports. totalTokens should simply be the sum of totalInputTokens and totalOutputTokens.
Suggested change:

```diff
 const buildUsage = (): acp.Usage => ({
   inputTokens: totalInputTokens,
   outputTokens: totalOutputTokens,
   cachedReadTokens: totalCachedTokens,
   thoughtTokens: totalThoughtTokens,
-  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+  totalTokens: totalInputTokens + totalOutputTokens,
 });
```
Code Review
This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to session prompt responses and adds a corresponding unit test. The review feedback correctly points out that Gemini's thought tokens are already a subset of the output tokens, meaning that adding them separately to the total token count results in double-counting. The reviewer suggests correcting the total token calculation in both the session implementation and the test assertions.
```typescript
const buildUsage = (): acp.Usage => ({
  inputTokens: totalInputTokens,
  outputTokens: totalOutputTokens,
  cachedReadTokens: totalCachedTokens,
  thoughtTokens: totalThoughtTokens,
  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
});
```
In Gemini, thought/reasoning tokens (thoughtsTokenCount) are generated by the model and are already included as a subset of the total output tokens (candidatesTokenCount). Adding totalThoughtTokens to totalInputTokens + totalOutputTokens results in double-counting the thought tokens in totalTokens. The total tokens should simply be the sum of inputTokens and outputTokens.
Suggested change:

```diff
 const buildUsage = (): acp.Usage => ({
   inputTokens: totalInputTokens,
   outputTokens: totalOutputTokens,
   cachedReadTokens: totalCachedTokens,
   thoughtTokens: totalThoughtTokens,
-  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+  totalTokens: totalInputTokens + totalOutputTokens,
 });
```
```typescript
// cachedReadTokens is a subset of inputTokens, so totalTokens sums
// input + output + thought (1000 + 200 + 50).
expect(result.usage).toEqual({
  inputTokens: 1000,
  outputTokens: 200,
  cachedReadTokens: 800,
  thoughtTokens: 50,
  totalTokens: 1250,
});
```
Update the test assertion to reflect that totalTokens is the sum of inputTokens and outputTokens (1000 + 200 = 1200), as thought tokens are already included in the output tokens.
Suggested change:

```diff
-// cachedReadTokens is a subset of inputTokens, so totalTokens sums
-// input + output + thought (1000 + 200 + 50).
+// cachedReadTokens is a subset of inputTokens, and thoughtTokens is a subset of outputTokens,
+// so totalTokens is input + output (1000 + 200).
 expect(result.usage).toEqual({
   inputTokens: 1000,
   outputTokens: 200,
   cachedReadTokens: 800,
   thoughtTokens: 50,
-  totalTokens: 1250,
+  totalTokens: 1200,
 });
```
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary
When running as an ACP server (`gemini --acp`), per-turn token usage only included input and output tokens. The cached and thought/reasoning counts were dropped, so ACP clients that estimate cost from token counts treat all input as uncached, overstating cost by ~3× for cache-heavy agentic sessions (the real spend already benefits from caching; only the reporting was wrong).

This populates the standard ACP `PromptResponse.usage` field with cached and thought tokens, aligning Gemini CLI with Claude Agent ACP and Codex ACP.

Details
`packages/cli/src/acp/acpSession.ts` read only `promptTokenCount`/`candidatesTokenCount` from Gemini's `usageMetadata` and emitted only `input_tokens`/`output_tokens` in the non-standard `_meta.quota.token_count`. `cachedContentTokenCount` and `thoughtsTokenCount` were available (they're already read in `event-translator.ts`, `telemetry/types.ts`, `services/chatRecordingService.ts`) but never forwarded over ACP.

This PR:

- captures `usageMetadata.cachedContentTokenCount` and `usageMetadata.thoughtsTokenCount` per turn and accumulates them;
- populates `PromptResponse.usage` (`inputTokens`, `outputTokens`, `cachedReadTokens`, `thoughtTokens`, `totalTokens`) on every completion path;
- leaves the existing `_meta.quota` payload untouched for backward compatibility.

`cachedReadTokens` is a subset of `inputTokens`, so `totalTokens = input + output + thought`.

Related Issues
Closes #27985
Related to #24280
How to Validate
```shell
npm run typecheck --workspace packages/cli
npx vitest run src/acp/acpSession.test.ts  # from packages/cli
npx eslint packages/cli/src/acp/acpSession.ts packages/cli/src/acp/acpSession.test.ts --max-warnings 0
```

A new unit test asserts that a `Finished` event with `cachedContentTokenCount`/`thoughtsTokenCount` is surfaced in `result.usage`. All 29 tests in `acpSession.test.ts` pass.

Pre-Merge Checklist
`_meta.quota` preserved)