feat(acp): report cached and thought tokens in PromptResponse.usage #27986

Open
VascoSch92 wants to merge 2 commits into google-gemini:main from VascoSch92:acp-report-cached-thought-tokens

@VascoSch92

Summary

When running as an ACP server (gemini --acp), per-turn token usage only included input and output tokens. The cached and thought/reasoning counts were dropped, so ACP clients that estimate cost from token counts treat all input as uncached — overstating cost by ~3× for cache-heavy agentic sessions (the real spend already benefits from caching; only the reporting was wrong).
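To make the ~3× figure concrete, here is a minimal sketch of a cache-aware input-cost estimate on the client side. The prices, the 90% cached share, and the names `ReportedUsage`, `InputPrices`, and `estimateInputCost` are illustrative assumptions, not values or code from this PR; only the field names `inputTokens` and `cachedReadTokens` come from the description.

```typescript
// Fields a client might read from PromptResponse.usage (names from this PR).
interface ReportedUsage {
  inputTokens: number;
  cachedReadTokens?: number; // subset of inputTokens; absent before this change
}

// Hypothetical prices per million tokens; real prices depend on the model.
interface InputPrices {
  inputPerMTok: number;
  cachedInputPerMTok: number;
}

// Hypothetical helper: bills cached input at the cached rate, the rest at the full rate.
function estimateInputCost(usage: ReportedUsage, prices: InputPrices): number {
  const cached = usage.cachedReadTokens ?? 0;
  const uncached = usage.inputTokens - cached;
  return (
    (uncached * prices.inputPerMTok + cached * prices.cachedInputPerMTok) / 1e6
  );
}

const assumedPrices: InputPrices = { inputPerMTok: 1.0, cachedInputPerMTok: 0.25 };

// Same session, reported with and without the cached count.
const cacheAwareCost = estimateInputCost(
  { inputTokens: 1_000_000, cachedReadTokens: 900_000 },
  assumedPrices,
);
const naiveCost = estimateInputCost({ inputTokens: 1_000_000 }, assumedPrices);

// How much the naive estimate overstates the cache-aware one.
const overstatement = naiveCost / cacheAwareCost;
```

Under these assumed numbers the naive estimate is roughly three times the cache-aware one; the exact factor depends on the cached share and the price ratio.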

This populates the standard ACP PromptResponse.usage field with cached and thought tokens, aligning Gemini CLI with Claude Agent ACP and Codex ACP.

Details

packages/cli/src/acp/acpSession.ts read only promptTokenCount / candidatesTokenCount from Gemini's usageMetadata and emitted only input_tokens / output_tokens in the non-standard _meta.quota.token_count. cachedContentTokenCount and thoughtsTokenCount were available (they're already read in event-translator.ts, telemetry/types.ts, services/chatRecordingService.ts) but never forwarded over ACP.

This PR:

  • captures usageMetadata.cachedContentTokenCount and usageMetadata.thoughtsTokenCount per turn and accumulates them;
  • populates the standard PromptResponse.usage (inputTokens, outputTokens, cachedReadTokens, thoughtTokens, totalTokens) on every completion path;
  • keeps the existing _meta.quota payload untouched for backward compatibility.

cachedReadTokens is a subset of inputTokens, so it is not added again: totalTokens = inputTokens + outputTokens + thoughtTokens.
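The capture-and-accumulate step can be sketched as below. This is a simplified illustration under assumptions, not the code in acpSession.ts: `accumulateUsage`, `TurnUsageMetadata`, and `AccumulatedUsage` are hypothetical names, and the total follows the formula stated in this description (input + output + thought).

```typescript
// Subset of Gemini's usageMetadata fields named in this PR description.
interface TurnUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  cachedContentTokenCount?: number;
  thoughtsTokenCount?: number;
}

// Fields of PromptResponse.usage as listed in this PR description.
interface AccumulatedUsage {
  inputTokens: number;
  outputTokens: number;
  cachedReadTokens: number;
  thoughtTokens: number;
  totalTokens: number;
}

// Hypothetical helper: sums each counter over the turns, treating missing fields as 0.
function accumulateUsage(turns: TurnUsageMetadata[]): AccumulatedUsage {
  let inputTokens = 0;
  let outputTokens = 0;
  let cachedReadTokens = 0;
  let thoughtTokens = 0;

  for (const meta of turns) {
    inputTokens += meta.promptTokenCount ?? 0;
    outputTokens += meta.candidatesTokenCount ?? 0;
    cachedReadTokens += meta.cachedContentTokenCount ?? 0;
    thoughtTokens += meta.thoughtsTokenCount ?? 0;
  }

  return {
    inputTokens,
    outputTokens,
    cachedReadTokens,
    thoughtTokens,
    // cachedReadTokens is already part of inputTokens, so it is not added here.
    totalTokens: inputTokens + outputTokens + thoughtTokens,
  };
}

const sessionUsage = accumulateUsage([
  {
    promptTokenCount: 1000,
    candidatesTokenCount: 200,
    cachedContentTokenCount: 800,
    thoughtsTokenCount: 50,
  },
  // A turn without thought tokens: the missing field counts as 0.
  { promptTokenCount: 1500, candidatesTokenCount: 100, cachedContentTokenCount: 1200 },
]);
```

Treating absent fields as 0 keeps the payload well-formed for models that report no cached or thought counts.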

Related Issues

Closes #27985
Related to #24280

How to Validate

npm run typecheck --workspace packages/cli
npx vitest run src/acp/acpSession.test.ts   # from packages/cli
npx eslint packages/cli/src/acp/acpSession.ts packages/cli/src/acp/acpSession.test.ts --max-warnings 0

A new unit test asserts that a Finished event with cachedContentTokenCount / thoughtsTokenCount is surfaced in result.usage. All 29 tests in acpSession.test.ts pass.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (none — additive; _meta.quota preserved)
  • Validated on required platforms/methods:
    • MacOS — npm run typecheck, vitest, eslint

The ACP server only reported input/output tokens (via the non-standard
_meta.quota.token_count), dropping cachedContentTokenCount and
thoughtsTokenCount from Gemini's usageMetadata. ACP clients that estimate
cost from token counts therefore treat all input as uncached, overstating
cost (~3x for cache-heavy agentic sessions) even though the real spend
already benefits from caching.

Capture cachedContentTokenCount/thoughtsTokenCount and populate the
standard ACP PromptResponse.usage field (inputTokens, outputTokens,
cachedReadTokens, thoughtTokens, totalTokens), mirroring Claude Agent ACP
and Codex ACP. The existing _meta.quota payload is kept for backward
compatibility.

VascoSch92 requested a review from a team as a code owner on June 17, 2026 14:35

google-cla Bot commented on Jun 17, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

github-actions Bot added the size/m label (A medium sized PR) on Jun 17, 2026

github-actions Bot commented on Jun 17, 2026

📊 PR Size: size/M

  • Lines changed: 62
  • Additions: +62
  • Deletions: -0
  • Files changed: 2

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the ACP server's token usage reporting by including cached and thought tokens in the standard PromptResponse.usage field. Previously, these metrics were omitted, leading to inaccurate cost reporting for cache-heavy sessions. The changes ensure that ACP clients receive a comprehensive view of token consumption while maintaining full backward compatibility with existing metadata structures.

Highlights

  • Enhanced Token Reporting: Updated the ACP session logic to include cached and thought tokens in the standard PromptResponse.usage field, ensuring accurate cost estimation for ACP clients.
  • Accumulation Logic: Implemented per-turn tracking and accumulation of cached and thought tokens, maintaining backward compatibility by preserving the existing _meta.quota payload.
  • Testing: Added a new unit test in acpSession.test.ts to verify that cached and thought tokens are correctly surfaced in the usage report.

gemini-code-assist Bot left a comment

Code Review

This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to the session prompt response. The review feedback correctly points out that thought tokens are already included in the output token count in the Gemini API, meaning they should not be added again when calculating totalTokens. Correcting this calculation in both the implementation and the test assertions prevents double-counting.

Comment on lines +372 to +378

    const buildUsage = (): acp.Usage => ({
      inputTokens: totalInputTokens,
      outputTokens: totalOutputTokens,
      cachedReadTokens: totalCachedTokens,
      thoughtTokens: totalThoughtTokens,
      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
    });

high

In the Gemini API, candidatesTokenCount (which maps to outputTokens) already includes thoughtsTokenCount (which maps to thoughtTokens). Therefore, adding totalThoughtTokens to totalInputTokens + totalOutputTokens results in double-counting the thought tokens in totalTokens. The correct calculation for totalTokens should be totalInputTokens + totalOutputTokens.

Suggested change

    -const buildUsage = (): acp.Usage => ({
    -  inputTokens: totalInputTokens,
    -  outputTokens: totalOutputTokens,
    -  cachedReadTokens: totalCachedTokens,
    -  thoughtTokens: totalThoughtTokens,
    -  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
    -});
    +const buildUsage = (): acp.Usage => ({
    +  inputTokens: totalInputTokens,
    +  outputTokens: totalOutputTokens,
    +  cachedReadTokens: totalCachedTokens,
    +  thoughtTokens: totalThoughtTokens,
    +  totalTokens: totalInputTokens + totalOutputTokens,
    +});
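
The double-counting question turns on whether candidatesTokenCount already contains thoughtsTokenCount. A minimal sketch of one way to check this against an actual response rather than assume it: Gemini's usageMetadata also carries a totalTokenCount, so comparing it with the two candidate sums indicates which accounting applies. The function name `classifyThoughtAccounting`, the type names, and the sample numbers are hypothetical and are not part of this PR.

```typescript
// Subset of Gemini's usageMetadata relevant to the check.
interface CheckedUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
  totalTokenCount?: number;
}

type ThoughtAccounting =
  | "included-in-output"
  | "separate-from-output"
  | "undetermined";

// Hypothetical helper: infers how thought tokens relate to candidatesTokenCount
// by comparing totalTokenCount with the two possible sums.
function classifyThoughtAccounting(meta: CheckedUsageMetadata): ThoughtAccounting {
  const prompt = meta.promptTokenCount ?? 0;
  const candidates = meta.candidatesTokenCount ?? 0;
  const thoughts = meta.thoughtsTokenCount ?? 0;
  const total = meta.totalTokenCount;

  // Without a total, or without any thought tokens, the two sums cannot be told apart.
  if (total === undefined || thoughts === 0) return "undetermined";
  if (total === prompt + candidates) return "included-in-output";
  if (total === prompt + candidates + thoughts) return "separate-from-output";
  // Other components (for example tool-use prompt tokens) may also contribute.
  return "undetermined";
}

// Synthetic inputs for illustration only; these are not observed API responses.
const separateCase = classifyThoughtAccounting({
  promptTokenCount: 1000,
  candidatesTokenCount: 200,
  thoughtsTokenCount: 50,
  totalTokenCount: 1250,
});
const includedCase = classifyThoughtAccounting({
  promptTokenCount: 1000,
  candidatesTokenCount: 200,
  thoughtsTokenCount: 50,
  totalTokenCount: 1200,
});
```

Running such a check on a recorded response from a thinking-enabled model would show which totalTokens formula matches the API's own total.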

Comment thread packages/cli/src/acp/acpSession.test.ts (Outdated)
Comment on lines +277 to +285

    // cachedReadTokens is a subset of inputTokens, so totalTokens sums
    // input + output + thought (1000 + 200 + 50).
    expect(result.usage).toEqual({
      inputTokens: 1000,
      outputTokens: 200,
      cachedReadTokens: 800,
      thoughtTokens: 50,
      totalTokens: 1250,
    });

high

Update the test assertion to reflect that totalTokens is inputTokens + outputTokens (1200) without double-counting the thought tokens, since thought tokens are already a subset of output tokens.

Suggested change

    -// cachedReadTokens is a subset of inputTokens, so totalTokens sums
    -// input + output + thought (1000 + 200 + 50).
    -expect(result.usage).toEqual({
    -  inputTokens: 1000,
    -  outputTokens: 200,
    -  cachedReadTokens: 800,
    -  thoughtTokens: 50,
    -  totalTokens: 1250,
    -});
    +// cachedReadTokens is a subset of inputTokens, and thoughtTokens is a subset of outputTokens,
    +// so totalTokens is inputTokens + outputTokens (1000 + 200).
    +expect(result.usage).toEqual({
    +  inputTokens: 1000,
    +  outputTokens: 200,
    +  cachedReadTokens: 800,
    +  thoughtTokens: 50,
    +  totalTokens: 1200,
    +});

gemini-code-assist Bot left a comment

Code Review

This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to the ACP session implementation and adds a corresponding unit test. The review feedback correctly identifies a double-counting issue where thought tokens are added to the total token count despite already being included in the output token count. Suggestions are provided to fix this calculation in both the session logic and the test assertions.

Comment on lines +372 to +378

    const buildUsage = (): acp.Usage => ({
      inputTokens: totalInputTokens,
      outputTokens: totalOutputTokens,
      cachedReadTokens: totalCachedTokens,
      thoughtTokens: totalThoughtTokens,
      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
    });

high

In Gemini's usageMetadata, candidatesTokenCount (which maps to totalOutputTokens) already includes thoughtsTokenCount (which maps to totalThoughtTokens) because thoughts are generated as part of the response candidates. Adding totalThoughtTokens to totalTokens results in double-counting the thought tokens, leading to inflated token usage reports. totalTokens should simply be the sum of totalInputTokens and totalOutputTokens.

Suggested change

    -const buildUsage = (): acp.Usage => ({
    -  inputTokens: totalInputTokens,
    -  outputTokens: totalOutputTokens,
    -  cachedReadTokens: totalCachedTokens,
    -  thoughtTokens: totalThoughtTokens,
    -  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
    -});
    +const buildUsage = (): acp.Usage => ({
    +  inputTokens: totalInputTokens,
    +  outputTokens: totalOutputTokens,
    +  cachedReadTokens: totalCachedTokens,
    +  thoughtTokens: totalThoughtTokens,
    +  totalTokens: totalInputTokens + totalOutputTokens,
    +});

Comment thread packages/cli/src/acp/acpSession.test.ts Outdated

gemini-code-assist Bot left a comment

Code Review

This pull request introduces standard ACP token usage reporting (including cached and thought tokens) to session prompt responses and adds a corresponding unit test. The review feedback correctly points out that Gemini's thought tokens are already a subset of the output tokens, meaning that adding them separately to the total token count results in double-counting. The reviewer suggests correcting the total token calculation in both the session implementation and the test assertions.

Comment on lines +372 to +378

    const buildUsage = (): acp.Usage => ({
      inputTokens: totalInputTokens,
      outputTokens: totalOutputTokens,
      cachedReadTokens: totalCachedTokens,
      thoughtTokens: totalThoughtTokens,
      totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
    });

high

In Gemini, thought/reasoning tokens (thoughtsTokenCount) are generated by the model and are already included as a subset of the total output tokens (candidatesTokenCount). Adding totalThoughtTokens to totalInputTokens + totalOutputTokens results in double-counting the thought tokens in totalTokens. The total tokens should simply be the sum of inputTokens and outputTokens.

Suggested change

    -const buildUsage = (): acp.Usage => ({
    -  inputTokens: totalInputTokens,
    -  outputTokens: totalOutputTokens,
    -  cachedReadTokens: totalCachedTokens,
    -  thoughtTokens: totalThoughtTokens,
    -  totalTokens: totalInputTokens + totalOutputTokens + totalThoughtTokens,
    -});
    +const buildUsage = (): acp.Usage => ({
    +  inputTokens: totalInputTokens,
    +  outputTokens: totalOutputTokens,
    +  cachedReadTokens: totalCachedTokens,
    +  thoughtTokens: totalThoughtTokens,
    +  totalTokens: totalInputTokens + totalOutputTokens,
    +});

Comment thread packages/cli/src/acp/acpSession.test.ts (Outdated)
Comment on lines +277 to +285

    // cachedReadTokens is a subset of inputTokens, so totalTokens sums
    // input + output + thought (1000 + 200 + 50).
    expect(result.usage).toEqual({
      inputTokens: 1000,
      outputTokens: 200,
      cachedReadTokens: 800,
      thoughtTokens: 50,
      totalTokens: 1250,
    });

high

Update the test assertion to reflect that totalTokens is the sum of inputTokens and outputTokens (1000 + 200 = 1200), as thought tokens are already included in the output tokens.

Suggested change

    -// cachedReadTokens is a subset of inputTokens, so totalTokens sums
    -// input + output + thought (1000 + 200 + 50).
    -expect(result.usage).toEqual({
    -  inputTokens: 1000,
    -  outputTokens: 200,
    -  cachedReadTokens: 800,
    -  thoughtTokens: 50,
    -  totalTokens: 1250,
    -});
    +// cachedReadTokens is a subset of inputTokens, and thoughtTokens is a subset of outputTokens,
    +// so totalTokens is input + output (1000 + 200).
    +expect(result.usage).toEqual({
    +  inputTokens: 1000,
    +  outputTokens: 200,
    +  cachedReadTokens: 800,
    +  thoughtTokens: 50,
    +  totalTokens: 1200,
    +});

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-cli Bot added the area/non-interactive label (Issues related to GitHub Actions, SDK, 3P Integrations, Shell Scripting, Command line automation) on Jun 17, 2026
Labels

area/non-interactive: Issues related to GitHub Actions, SDK, 3P Integrations, Shell Scripting, Command line automation
size/m: A medium sized PR

Development

Successfully merging this pull request may close these issues.

ACP: PromptResponse usage omits cached/thought tokens, inflating client cost estimates

1 participant