feat(acp): report cached and thought tokens in PromptResponse.usage by VascoSch92 · Pull Request #27986 · google-gemini/gemini-cli

VascoSch92 · 2026-06-17T14:35:37Z

Summary

When running as an ACP server (gemini --acp), per-turn token usage only included input and output tokens. The cached and thought/reasoning counts were dropped, so ACP clients that estimate cost from token counts treat all input as uncached — overstating cost by ~3× for cache-heavy agentic sessions (the real spend already benefits from caching; only the reporting was wrong).
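To make the overstatement concrete, here is a minimal sketch of how a client might estimate cost from token counts. The `Prices` and `TurnUsage` shapes, the `estimateCost` helper, and the price values are all hypothetical and chosen only for illustration; real pricing depends on the model and provider.

```typescript
// Hypothetical relative prices per token; not real Gemini pricing.
interface Prices {
  inputPerToken: number; // uncached input token
  cachedInputPerToken: number; // discounted cached input token
  outputPerToken: number; // output token
}

// Hypothetical client-side view of one turn's usage.
interface TurnUsage {
  inputTokens: number; // includes cached tokens
  outputTokens: number;
  cachedReadTokens?: number; // subset of inputTokens; absent if the server drops it
}

function estimateCost(usage: TurnUsage, prices: Prices): number {
  // A missing cached count means every input token is priced as uncached.
  const cached = usage.cachedReadTokens ?? 0;
  const uncached = usage.inputTokens - cached;
  return (
    uncached * prices.inputPerToken +
    cached * prices.cachedInputPerToken +
    usage.outputTokens * prices.outputPerToken
  );
}

const prices: Prices = {
  inputPerToken: 1,
  cachedInputPerToken: 0.25,
  outputPerToken: 4,
};

// Same turn, reported without and with the cached count.
const withoutCacheInfo = estimateCost(
  { inputTokens: 100000, outputTokens: 1000 },
  prices,
);
const withCacheInfo = estimateCost(
  { inputTokens: 100000, outputTokens: 1000, cachedReadTokens: 90000 },
  prices,
);
const overstatement = withoutCacheInfo / withCacheInfo;
```

With these made-up numbers the estimate without cache information is about 2.85 times the cache-aware one; the exact factor depends on the cache hit rate and the discount.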

This populates the standard ACP PromptResponse.usage field with cached and thought tokens, aligning Gemini CLI with Claude Agent ACP and Codex ACP.

Details

packages/cli/src/acp/acpSession.ts read only promptTokenCount / candidatesTokenCount from Gemini's usageMetadata and emitted only input_tokens / output_tokens in the non-standard _meta.quota.token_count. cachedContentTokenCount and thoughtsTokenCount were available (they're already read in event-translator.ts, telemetry/types.ts, services/chatRecordingService.ts) but never forwarded over ACP.

This PR:

- captures usageMetadata.cachedContentTokenCount and usageMetadata.thoughtsTokenCount per turn and accumulates them;
- populates the standard PromptResponse.usage (inputTokens, outputTokens, cachedReadTokens, thoughtTokens, totalTokens) on every completion path;
- keeps the existing _meta.quota payload untouched for backward compatibility.

cachedReadTokens is a subset of inputTokens, so totalTokens = input + output + thought.
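The accumulation described above could look roughly like the following sketch. The field names come from this description; the `UsageAccumulator` class and the local `Usage` interface are hypothetical stand-ins and are not the code in acpSession.ts. The total follows the formula stated in this description, which assumes thought tokens are not already counted in candidatesTokenCount.

```typescript
// Subset of Gemini's usageMetadata referenced in this PR description.
interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  cachedContentTokenCount?: number;
  thoughtsTokenCount?: number;
}

// Local stand-in for the usage object described in this PR.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedReadTokens: number;
  thoughtTokens: number;
  totalTokens: number;
}

// Hypothetical helper: sums per-turn metadata and builds the usage object.
class UsageAccumulator {
  private input = 0;
  private output = 0;
  private cached = 0;
  private thought = 0;

  // Missing metadata or missing fields count as zero.
  add(meta: UsageMetadata | undefined): void {
    this.input += meta?.promptTokenCount ?? 0;
    this.output += meta?.candidatesTokenCount ?? 0;
    this.cached += meta?.cachedContentTokenCount ?? 0;
    this.thought += meta?.thoughtsTokenCount ?? 0;
  }

  build(): Usage {
    return {
      inputTokens: this.input,
      outputTokens: this.output,
      cachedReadTokens: this.cached,
      thoughtTokens: this.thought,
      // Cached tokens are already part of input, so they are not added again.
      // Adding thought tokens assumes they are separate from output tokens.
      totalTokens: this.input + this.output + this.thought,
    };
  }
}

const accumulator = new UsageAccumulator();
accumulator.add({
  promptTokenCount: 1000,
  candidatesTokenCount: 200,
  cachedContentTokenCount: 800,
  thoughtsTokenCount: 50,
});
accumulator.add(undefined); // an event without usage metadata changes nothing
const accumulatedUsage = accumulator.build();
```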

Related Issues

Closes #27985
Related to #24280

How to Validate

npm run typecheck --workspace packages/cli
npx vitest run src/acp/acpSession.test.ts   # from packages/cli
npx eslint packages/cli/src/acp/acpSession.ts packages/cli/src/acp/acpSession.test.ts --max-warnings 0

A new unit test asserts that a Finished event with cachedContentTokenCount / thoughtsTokenCount is surfaced in result.usage. All 29 tests in acpSession.test.ts pass.

Pre-Merge Checklist

- Updated relevant documentation and README (if needed)
- Added/updated tests (if needed)
- Noted breaking changes (none; additive, _meta.quota preserved)
- Validated on required platforms/methods:
  - macOS: npm run typecheck, vitest, eslint

The ACP server only reported input/output tokens (via the non-standard _meta.quota.token_count), dropping cachedContentTokenCount and thoughtsTokenCount from Gemini's usageMetadata. ACP clients that estimate cost from token counts therefore treat all input as uncached, overstating cost (~3x for cache-heavy agentic sessions) even though the real spend already benefits from caching. Capture cachedContentTokenCount/thoughtsTokenCount and populate the standard ACP PromptResponse.usage field (inputTokens, outputTokens, cachedReadTokens, thoughtTokens, totalTokens), mirroring Claude Agent ACP and Codex ACP. The existing _meta.quota payload is kept for backward compatibility.
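Because both payloads are sent, a client can prefer the standard field and fall back to the older one. The sketch below is a hypothetical client-side reader: `PromptResponseLike`, `ClientUsage`, and `readUsage` are illustrative names, and the optionality of the fields is an assumption rather than the ACP schema.

```typescript
// Assumed shape of the standard usage field (optionality is a guess).
interface ClientUsage {
  inputTokens: number;
  outputTokens: number;
  cachedReadTokens?: number;
  thoughtTokens?: number;
  totalTokens: number;
}

// Legacy payload named in this PR: _meta.quota.token_count.
interface LegacyTokenCount {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical, simplified view of a prompt response.
interface PromptResponseLike {
  usage?: ClientUsage;
  _meta?: { quota?: { token_count?: LegacyTokenCount } };
}

function readUsage(response: PromptResponseLike): ClientUsage | undefined {
  // Prefer the standard field when the server provides it.
  if (response.usage) return response.usage;

  // Otherwise fall back to the legacy payload, which has no cached/thought counts.
  const legacy = response._meta?.quota?.token_count;
  if (!legacy) return undefined;
  return {
    inputTokens: legacy.input_tokens,
    outputTokens: legacy.output_tokens,
    totalTokens: legacy.input_tokens + legacy.output_tokens,
  };
}

const fromLegacy = readUsage({
  _meta: { quota: { token_count: { input_tokens: 10, output_tokens: 5 } } },
});
const fromStandard = readUsage({
  usage: { inputTokens: 10, outputTokens: 5, cachedReadTokens: 8, totalTokens: 15 },
  _meta: { quota: { token_count: { input_tokens: 10, output_tokens: 5 } } },
});
const fromNothing = readUsage({});
```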

google-cla · 2026-06-17T14:35:50Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

github-actions · 2026-06-17T14:35:58Z

📊 PR Size: size/M

Lines changed: 62
Additions: +62
Deletions: -0
Files changed: 2

gemini-code-assist · 2026-06-17T14:37:55Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the ACP server's token usage reporting by including cached and thought tokens in the standard PromptResponse.usage field. Previously, these metrics were omitted, leading to inaccurate cost reporting for cache-heavy sessions. The changes ensure that ACP clients receive a comprehensive view of token consumption while maintaining full backward compatibility with existing metadata structures.

Highlights

Enhanced Token Reporting: Updated the ACP session logic to include cached and thought tokens in the standard PromptResponse.usage field, ensuring accurate cost estimation for ACP clients.
Accumulation Logic: Implemented per-turn tracking and accumulation of cached and thought tokens, maintaining backward compatibility by preserving the existing _meta.quota payload.
Testing: Added a new unit test in acpSession.test.ts to verify that cached and thought tokens are correctly surfaced in the usage report.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to the session prompt response. The review feedback correctly points out that thought tokens are already included in the output token count in the Gemini API, meaning they should not be added again when calculating totalTokens. Correcting this calculation in both the implementation and the test assertions prevents double-counting.

gemini-code-assist · 2026-06-17T14:41:47Z

+    const buildUsage = (): acp.Usage => ({
+      inputTokens: totalInputTokens,
+      outputTokens: totalOutputTokens,
+      cachedReadTokens: totalCachedTokens,
+      thoughtTokens: totalThoughtTokens,
+      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+    });


In the Gemini API, candidatesTokenCount (which maps to outputTokens) already includes thoughtsTokenCount (which maps to thoughtTokens). Therefore, adding totalThoughtTokens to totalInputTokens + totalOutputTokens results in double-counting the thought tokens in totalTokens. The correct calculation for totalTokens should be totalInputTokens + totalOutputTokens.

Suggested change

     const buildUsage = (): acp.Usage => ({
       inputTokens: totalInputTokens,
       outputTokens: totalOutputTokens,
       cachedReadTokens: totalCachedTokens,
       thoughtTokens: totalThoughtTokens,
-      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+      totalTokens: totalInputTokens + totalOutputTokens,
     });
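Whether thought tokens are already inside candidatesTokenCount can be checked against the totalTokenCount that usageMetadata also reports. The following is a minimal sketch of such a check; `classifyThoughtAccounting` is a hypothetical helper, and the sample values are invented to exercise both branches rather than taken from real API responses.

```typescript
// Subset of usageMetadata needed for the check.
interface UsageMetadataForCheck {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
  totalTokenCount?: number;
}

type ThoughtAccounting =
  | "included-in-candidates"
  | "separate-from-candidates"
  | "inconclusive";

// Compares the reported total with both candidate formulas.
function classifyThoughtAccounting(meta: UsageMetadataForCheck): ThoughtAccounting {
  const prompt = meta.promptTokenCount ?? 0;
  const candidates = meta.candidatesTokenCount ?? 0;
  const thoughts = meta.thoughtsTokenCount ?? 0;
  const total = meta.totalTokenCount;

  // Without a total, or without any thought tokens, the formulas cannot be told apart.
  if (total === undefined || thoughts === 0) return "inconclusive";
  if (total === prompt + candidates) return "included-in-candidates";
  if (total === prompt + candidates + thoughts) return "separate-from-candidates";
  // Other contributions (for example tool-use tokens) may be part of the total.
  return "inconclusive";
}

// Invented samples, one per outcome.
const sampleIncluded = classifyThoughtAccounting({
  promptTokenCount: 1000,
  candidatesTokenCount: 200,
  thoughtsTokenCount: 50,
  totalTokenCount: 1200,
});
const sampleSeparate = classifyThoughtAccounting({
  promptTokenCount: 1000,
  candidatesTokenCount: 200,
  thoughtsTokenCount: 50,
  totalTokenCount: 1250,
});
const sampleUnknown = classifyThoughtAccounting({
  promptTokenCount: 1000,
  candidatesTokenCount: 200,
  thoughtsTokenCount: 50,
});
```

Running a check like this on metadata from a real thinking-enabled response would show which totalTokens formula matches the API's own accounting.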

gemini-code-assist · 2026-06-17T14:41:48Z

+    // cachedReadTokens is a subset of inputTokens, so totalTokens sums
+    // input + output + thought (1000 + 200 + 50).
+    expect(result.usage).toEqual({
+      inputTokens: 1000,
+      outputTokens: 200,
+      cachedReadTokens: 800,
+      thoughtTokens: 50,
+      totalTokens: 1250,
+    });


Update the test assertion to reflect that totalTokens is inputTokens + outputTokens (1200) without double-counting the thought tokens, since thought tokens are already a subset of output tokens.

Suggested change

-    // cachedReadTokens is a subset of inputTokens, so totalTokens sums
-    // input + output + thought (1000 + 200 + 50).
+    // cachedReadTokens is a subset of inputTokens, and thoughtTokens is a subset of outputTokens,
+    // so totalTokens is inputTokens + outputTokens (1000 + 200).
     expect(result.usage).toEqual({
       inputTokens: 1000,
       outputTokens: 200,
       cachedReadTokens: 800,
       thoughtTokens: 50,
-      totalTokens: 1250,
+      totalTokens: 1200,
     });

gemini-code-assist

Code Review

This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to the ACP session implementation and adds a corresponding unit test. The review feedback correctly identifies a double-counting issue where thought tokens are added to the total token count despite already being included in the output token count. Suggestions are provided to fix this calculation in both the session logic and the test assertions.

gemini-code-assist · 2026-06-17T14:42:25Z

+    const buildUsage = (): acp.Usage => ({
+      inputTokens: totalInputTokens,
+      outputTokens: totalOutputTokens,
+      cachedReadTokens: totalCachedTokens,
+      thoughtTokens: totalThoughtTokens,
+      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+    });


In Gemini's usageMetadata, candidatesTokenCount (which maps to totalOutputTokens) already includes thoughtsTokenCount (which maps to totalThoughtTokens) because thoughts are generated as part of the response candidates. Adding totalThoughtTokens to totalTokens results in double-counting the thought tokens, leading to inflated token usage reports. totalTokens should simply be the sum of totalInputTokens and totalOutputTokens.

Suggested change

     const buildUsage = (): acp.Usage => ({
       inputTokens: totalInputTokens,
       outputTokens: totalOutputTokens,
       cachedReadTokens: totalCachedTokens,
       thoughtTokens: totalThoughtTokens,
-      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+      totalTokens: totalInputTokens + totalOutputTokens,
     });

gemini-code-assist

Code Review

This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to session prompt responses and adds a corresponding unit test. The review feedback correctly points out that Gemini's thought tokens are already a subset of the output tokens, meaning that adding them separately to the total token count results in double-counting. The reviewer suggests correcting the total token calculation in both the session implementation and the test assertions.

gemini-code-assist · 2026-06-17T14:43:07Z

+    const buildUsage = (): acp.Usage => ({
+      inputTokens: totalInputTokens,
+      outputTokens: totalOutputTokens,
+      cachedReadTokens: totalCachedTokens,
+      thoughtTokens: totalThoughtTokens,
+      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+    });


In Gemini, thought/reasoning tokens (thoughtsTokenCount) are generated by the model and are already included as a subset of the total output tokens (candidatesTokenCount). Adding totalThoughtTokens to totalInputTokens + totalOutputTokens results in double-counting the thought tokens in totalTokens. The total tokens should simply be the sum of inputTokens and outputTokens.

Suggested change

     const buildUsage = (): acp.Usage => ({
       inputTokens: totalInputTokens,
       outputTokens: totalOutputTokens,
       cachedReadTokens: totalCachedTokens,
       thoughtTokens: totalThoughtTokens,
-      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
+      totalTokens: totalInputTokens + totalOutputTokens,
     });

gemini-code-assist · 2026-06-17T14:43:08Z

+    // cachedReadTokens is a subset of inputTokens, so totalTokens sums
+    // input + output + thought (1000 + 200 + 50).
+    expect(result.usage).toEqual({
+      inputTokens: 1000,
+      outputTokens: 200,
+      cachedReadTokens: 800,
+      thoughtTokens: 50,
+      totalTokens: 1250,
+    });


Update the test assertion to reflect that totalTokens is the sum of inputTokens and outputTokens (1000 + 200 = 1200), as thought tokens are already included in the output tokens.

Suggested change

-    // cachedReadTokens is a subset of inputTokens, so totalTokens sums
-    // input + output + thought (1000 + 200 + 50).
+    // cachedReadTokens is a subset of inputTokens, and thoughtTokens is a subset of outputTokens,
+    // so totalTokens is input + output (1000 + 200).
     expect(result.usage).toEqual({
       inputTokens: 1000,
       outputTokens: 200,
       cachedReadTokens: 800,
       thoughtTokens: 50,
-      totalTokens: 1250,
+      totalTokens: 1200,
     });

VascoSch92 requested a review from a team as a code owner June 17, 2026 14:35

github-actions Bot added the size/m A medium sized PR label Jun 17, 2026

gemini-code-assist Bot reviewed Jun 17, 2026

Update packages/cli/src/acp/acpSession.test.ts

bd54a30

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-cli Bot added the area/non-interactive Issues related to GitHub Actions, SDK, 3P Integrations, Shell Scripting, Command line automation label Jun 17, 2026

VascoSch92 wants to merge 2 commits into google-gemini:main from VascoSch92:acp-report-cached-thought-tokens
