.Net: Fix Gemini streaming token usage metrics #13944
Open
MohamedOthman1 wants to merge 4 commits into microsoft:main from
Pull request overview
Fixes inflated OpenTelemetry token usage metrics for the Gemini streaming connector by ensuring accumulated usage metadata is emitted only once per streamed response (instead of once per chunk), aligning streaming behavior with Gemini’s accumulated usage reporting.
Changes:
- Update Gemini streaming response processing to suppress per-chunk usage logging and log usage once after streaming completes, using the last chunk that contains usage metadata.
- Refactor `ProcessChatResponse` to optionally skip usage logging so it can be reused for streaming without side effects.
- Add a unit test that validates prompt/completion/total token counters are each emitted exactly once for a multi-chunk stream (including a final chunk without usage metadata).
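The `ProcessChatResponse` refactor amounts to putting the usage side effect behind a switch, so the same parsing code can run per chunk without touching metrics. A minimal Python sketch of that pattern follows; `process_chat_response`, its `log_usage` parameter, and the data classes are hypothetical stand-ins, not the connector's real C# signature.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Usage:
    # Hypothetical stand-in for the token counts in Gemini's usageMetadata.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class ChatResponse:
    # Hypothetical parsed response: generated text plus optional usage.
    text: str
    usage: Optional[Usage] = None


@dataclass
class TokenCounters:
    # Stores each recorded value so a test can count how often it was emitted.
    prompt: List[int] = field(default_factory=list)
    completion: List[int] = field(default_factory=list)
    total: List[int] = field(default_factory=list)

    def record(self, usage: Usage) -> None:
        self.prompt.append(usage.prompt_tokens)
        self.completion.append(usage.completion_tokens)
        self.total.append(usage.total_tokens)


def process_chat_response(
    response: ChatResponse, counters: TokenCounters, log_usage: bool = True
) -> str:
    """Convert a response to content; record usage only if log_usage is True."""
    if log_usage and response.usage is not None:
        counters.record(response.usage)
    return response.text


counters = TokenCounters()
reply = ChatResponse("Hi", Usage(10, 5, 15))
process_chat_response(reply, counters)                   # non-streaming: records usage
process_chat_response(reply, counters, log_usage=False)  # streaming chunk: skips it
print(counters.prompt)  # [10]
```

Defaulting the flag to `True` in this sketch leaves existing non-streaming callers unchanged; only the streaming path opts out.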
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `dotnet/src/Connectors/Connectors.Google/Core/Gemini/Clients/GeminiChatCompletionClient.cs` | Prevents duplicate token metrics in streaming by deferring usage logging until stream completion. |
| `dotnet/src/Connectors/Connectors.Google.UnitTests/Core/Gemini/Clients/GeminiChatStreamingTests.cs` | Adds regression coverage to ensure streaming usage metrics are emitted once even when the final chunk lacks usage metadata. |
Author
@microsoft-github-policy-service agree
Motivation and Context
Fixes #13382.
Gemini streaming responses include cumulative usage metadata. The current connector records that metadata for every chunk, so each token counter is incremented once per chunk rather than once per response, and the reported totals grow with the number of chunks in the stream.
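To make the inflation concrete, here is a minimal Python sketch (Python only for brevity; the token numbers are illustrative, not real Gemini output). Adding each chunk's cumulative usage multiplies the constant prompt count by the chunk count and sums the running completion counts, while adding only the final chunk's usage yields the true totals.

```python
# Cumulative usage reported on each chunk of ONE streamed response.
# Illustrative numbers only; not captured from the Gemini API.
CHUNK_USAGE = [
    {"prompt": 10, "completion": 5, "total": 15},
    {"prompt": 10, "completion": 12, "total": 22},
    {"prompt": 10, "completion": 20, "total": 30},
]


def record_per_chunk(usages):
    """Buggy pattern: add every chunk's cumulative usage to the counters."""
    counters = {"prompt": 0, "completion": 0, "total": 0}
    for usage in usages:
        for key in counters:
            counters[key] += usage[key]
    return counters


def record_final_only(usages):
    """Fixed pattern: add only the last cumulative usage, once per response."""
    counters = {"prompt": 0, "completion": 0, "total": 0}
    if usages:
        for key in counters:
            counters[key] += usages[-1][key]
    return counters


print(record_per_chunk(CHUNK_USAGE))   # {'prompt': 30, 'completion': 37, 'total': 67}
print(record_final_only(CHUNK_USAGE))  # {'prompt': 10, 'completion': 20, 'total': 30}
```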
Description
This update keeps streaming chunks flowing as before, but suppresses usage logging while each chunk is parsed. Once the stream finishes normally, the connector records usage a single time from the latest chunk that actually included token usage metadata.
That keeps the non-streaming path unchanged and avoids losing metrics if Gemini ever sends a final chunk without `usageMetadata`. If the stream is cancelled or fails before normal completion, usage is not emitted for the partial response.

The added regression test uses repeated token counts across multiple stream chunks, followed by a final chunk without usage metadata, and verifies that each Google token counter records the expected value exactly once.
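The completion logic described above can be sketched as a Python generator. This is an illustration under assumptions, not the connector's C# code: chunks are simplified to dicts with a `text` entry and an optional `usage` entry, and `record` is a hypothetical callback standing in for the token counters.

```python
from typing import Callable, Iterable, Iterator, Optional


def stream_with_single_usage(
    chunks: Iterable[dict], record: Callable[[dict], None]
) -> Iterator[str]:
    """Yield each chunk's text; record usage once, after normal completion.

    Hypothetical chunk shape: {"text": str, "usage": dict or None}.
    """
    last_usage: Optional[dict] = None
    for chunk in chunks:
        # Per-chunk parsing never touches the counters; it only remembers
        # the latest chunk that actually carried usage metadata.
        if chunk.get("usage") is not None:
            last_usage = chunk["usage"]
        yield chunk["text"]
    # Reached only when the source is exhausted normally. If the consumer
    # closes the generator early or the source raises, this never runs.
    if last_usage is not None:
        record(last_usage)


chunks = [
    {"text": "Hel", "usage": {"prompt": 10, "completion": 2, "total": 12}},
    {"text": "lo", "usage": {"prompt": 10, "completion": 4, "total": 14}},
    {"text": "!", "usage": None},  # final chunk without usage metadata
]

recorded = []
print("".join(stream_with_single_usage(chunks, recorded.append)))  # Hello!
print(recorded)  # [{'prompt': 10, 'completion': 4, 'total': 14}]

cancelled = []
stream = stream_with_single_usage(chunks, cancelled.append)
next(stream)    # consume one chunk, then abandon the stream
stream.close()  # raises GeneratorExit at the suspended yield; nothing is recorded
print(cancelled)  # []
```

Closing the generator early raises `GeneratorExit` at the suspended `yield`, so the line after the loop never runs; C# iterator methods behave analogously on early disposal, running only `finally` blocks. In this sketch, placing the call after the loop rather than in a `finally` block is what keeps partial streams out of the metrics.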
Pre-submit Notes
git diff --check.