
fix(openai): fix stop_reason null and zero usage fields in stream adapter#226

Open
guunergooner wants to merge 1 commit into claude-code-best:main from guunergooner:fix/openai-stop-reason-usage

Conversation

Contributor

@guunergooner guunergooner commented Apr 9, 2026

Summary

  • Fix stop_reason always being null in assembled AssistantMessage — the value captured from message_delta was never applied to the partial message
  • Fix partialMessage not being reset to null after message_stop, causing a safety fallback that would yield a second identical AssistantMessage (doubled content in next API request)
  • Fix zero usage fields in stream adapter output
  • Add comprehensive unit tests for queryModelOpenAI and streamAdapter
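The two assembly fixes in the first bullets can be sketched as follows. The event and message shapes here are simplified placeholders, not the repository's actual types:

```typescript
// Hypothetical, simplified shapes for illustration; the real types live in
// src/services/api/openai/index.ts and differ in detail.
type StreamEvent =
  | { type: 'message_delta'; delta: { stop_reason: string | null } }
  | { type: 'message_stop' }

interface PartialMessage {
  content: string
  stop_reason: string | null
}

function assemble(events: StreamEvent[]): PartialMessage[] {
  const emitted: PartialMessage[] = []
  let partial: PartialMessage | null = { content: '', stop_reason: null }
  let stopReason: string | null = null

  for (const event of events) {
    if (event.type === 'message_delta') {
      // Fix 1: capture stop_reason so it can be applied at message_stop,
      // instead of leaving the assembled message's stop_reason null.
      stopReason = event.delta.stop_reason
    } else if (event.type === 'message_stop' && partial) {
      emitted.push({ ...partial, stop_reason: stopReason })
      // Fix 2: reset partial so the post-loop safety fallback cannot
      // emit a second, identical AssistantMessage.
      partial = null
    }
  }
  // Safety fallback: only fires when the stream ended without message_stop.
  if (partial) {
    emitted.push({ ...partial, stop_reason: stopReason })
  }
  return emitted
}
```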

Test plan

  • Run bun test src/services/api/openai/__tests__/ to verify new tests pass
  • Test with an OpenAI-compatible endpoint (e.g. Ollama) and verify stop_reason is correctly populated
  • Verify no duplicate messages appear in conversation turns

Summary by CodeRabbit

  • Bug Fixes

    • More accurate max-output token computation and explicit truncation error when outputs are cut.
    • Consolidated streamed assistant messages to prevent duplicates and ensure a single complete final message (including a safety fallback if the stream ends early).
    • Improved streaming usage accounting to include cache-read/creation fields and correctly incorporate trailing usage arriving after finish.
  • Tests

    • Added extensive streaming and assembly regression tests plus a locale-stable timestamp test setup.

Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

Deferred OpenAI finish handling and consolidated AssistantMessage assembly: stream adapter now emits finalized usage (including cache fields) at stream end, query assembly yields a single assistant message at stop or stream end (with max_tokens truncation error), plus expanded streaming tests.

Changes

Cohort / File(s) Summary
Query & assembly
src/services/api/openai/index.ts
Compute model max output tokens (override support), capture stopReason, accumulate content blocks and usage, emit one consolidated AssistantMessage at message_stop or on stream end, emit apiError: "max_output_tokens" when truncated, and avoid duplicate final messages.
Stream adapter
src/services/api/openai/streamAdapter.ts
Defer finish handling until stream completion, map OpenAI usage → Anthropic fields (input/output + cache_read_input_tokens and cache_creation_input_tokens), preserve/compute stop_reason (length→max_tokens; tool calls → tool_use), and emit final message_delta with full usage.
Streaming tests
src/services/api/openai/__tests__/queryModelOpenAI.test.ts, src/services/api/openai/__tests__/streamAdapter.test.ts
New/extended tests exercising end-to-end streaming assembly and adapter behavior: stop_reason propagation, trailing usage handling (including cached_tokens variants), usage accumulation (four fields), no duplicate assistant emissions, and forwarding of upstream stream events.
Utils: timestamp
src/utils/__tests__/formatBriefTimestamp.test.ts, src/utils/formatBriefTimestamp.ts
Tests now set/restore process.env.LC_ALL for deterministic locale; added JSDoc for internal startOfDay helper (no behavioral change).
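The finish-reason and usage mappings described for the stream adapter can be sketched as below. The exact field names and the `stop` → `end_turn` case are assumptions based on this summary, not the adapter's actual code:

```typescript
// Maps an OpenAI finish_reason onto an Anthropic stop_reason, per the
// summary above (length → max_tokens, tool calls → tool_use). The
// 'stop' → 'end_turn' case is an assumption for illustration.
function mapStopReason(finishReason: string | null): string | null {
  switch (finishReason) {
    case 'length':
      return 'max_tokens'
    case 'tool_calls':
      return 'tool_use'
    case 'stop':
      return 'end_turn'
    default:
      return finishReason
  }
}

// Maps OpenAI usage (including prompt_tokens_details.cached_tokens) onto
// the four Anthropic usage fields. Whether cached tokens should be
// subtracted from input_tokens depends on the accounting convention;
// this sketch keeps them included.
function mapUsage(u: {
  prompt_tokens: number
  completion_tokens: number
  prompt_tokens_details?: { cached_tokens?: number }
}) {
  const cached = u.prompt_tokens_details?.cached_tokens ?? 0
  return {
    input_tokens: u.prompt_tokens,
    output_tokens: u.completion_tokens,
    cache_read_input_tokens: cached,
    cache_creation_input_tokens: 0,
  }
}
```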

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant QueryService as Query (queryModelOpenAI)
  participant Adapter as StreamAdapter (adaptOpenAIStreamToAnthropic)
  participant OpenAI
  participant Assembler as MessageAssembler

  Client->>QueryService: start streaming request
  QueryService->>OpenAI: open streaming connection
  OpenAI-->>Adapter: raw streaming chunks
  Adapter-->>QueryService: adapted events (message_start, content_block_*, message_delta, message_stop) [finish_reason deferred until post-loop]
  QueryService->>Assembler: accumulate content blocks and usage from events
  alt message_stop emitted
    Assembler-->>Client: emit consolidated AssistantMessage (usage, stop_reason, stop_sequence=null)
    alt stop_reason == max_tokens
      Assembler-->>Client: emit AssistantMessage apiError: "max_output_tokens" (contains computed maxTokens)
    end
  else stream ends without message_stop
    Assembler-->>Client: safety-assembled AssistantMessage (same finalization and possible apiError)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"A rabbit tapped the stream with gentle paws,
counted cache crumbs and stitched the pause.
One final message, no echoes more,
max tokens warned at the trailing shore.
Hooray — the stream yields tidy, neat applause!" 🐇✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed — The title directly addresses the main fixes: capturing stop_reason from message_delta and fixing zero usage fields in the stream adapter, which are the core changes reflected across the modified files.
  • Docstring Coverage — ✅ Passed — Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/api/openai/index.ts (1)

385-392: ⚠️ Potential issue | 🟠 Major

Duplicate cost tracking — this block executes twice for message_stop.

The cost tracking at lines 385-392 duplicates the logic already executed inside the message_stop case handler at lines 364-367. When event.type === 'message_stop', addToTotalSessionCost is called twice with the same values, inflating the reported session cost.

🐛 Proposed fix: Remove the duplicate block
-      // Track cost and token usage (matching the Anthropic path in claude.ts)
-      if (
-        event.type === 'message_stop' &&
-        usage.input_tokens + usage.output_tokens > 0
-      ) {
-        const costUSD = calculateUSDCost(openaiModel, usage as any)
-        addToTotalSessionCost(costUSD, usage as any, options.model)
-      }
-
       // Also yield as StreamEvent for real-time display (matching Anthropic path)
       yield {
         type: 'stream_event',
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/api/openai/index.ts` around lines 385 - 392, There is a
duplicated cost-tracking block that runs when event.type === 'message_stop'
causing addToTotalSessionCost to be called twice; remove the redundant block
(the one that checks event.type === 'message_stop' and calls calculateUSDCost +
addToTotalSessionCost) so that cost is only recorded by the existing logic
inside the 'message_stop' case handler; ensure calculateUSDCost,
addToTotalSessionCost, usage and options.model are still used exactly once for
that event type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/formatBriefTimestamp.ts`:
- Around line 79-81: The day-boundary calculation in startOfDay() uses UTC
getters (getUTCFullYear/getUTCMonth/getUTCDate) while formatting uses local
timezone (toLocaleTimeString/toLocaleString), causing inconsistent day
classification vs display; update startOfDay() to use local date components
(getFullYear/getMonth/getDate) so boundaries match local formatting, or
alternatively make the display calls use UTC consistently (e.g., pass timeZone:
'UTC' to toLocaleString/toLocaleTimeString) if UTC-based boundaries are
intentional; if keeping UTC behavior, add a clear comment in startOfDay()
explaining the trade-off and test determinism.
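A minimal local-time `startOfDay` along the lines this comment suggests (illustrative only; the real helper lives in `src/utils/formatBriefTimestamp.ts`):

```typescript
// Day boundary computed from local date components, so "today"/"yesterday"
// classification agrees with output from toLocaleTimeString/toLocaleString,
// which also format in the local timezone.
function startOfDay(d: Date): Date {
  return new Date(d.getFullYear(), d.getMonth(), d.getDate())
}
```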

---

Outside diff comments:
In `@src/services/api/openai/index.ts`:
- Around line 385-392: There is a duplicated cost-tracking block that runs when
event.type === 'message_stop' causing addToTotalSessionCost to be called twice;
remove the redundant block (the one that checks event.type === 'message_stop'
and calls calculateUSDCost + addToTotalSessionCost) so that cost is only
recorded by the existing logic inside the 'message_stop' case handler; ensure
calculateUSDCost, addToTotalSessionCost, usage and options.model are still used
exactly once for that event type.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8febc2f7-9823-4844-9baa-a8e3065b9ad5

📥 Commits

Reviewing files that changed from the base of the PR and between e6affc7 and b23bead.

📒 Files selected for processing (6)
  • src/services/api/openai/__tests__/queryModelOpenAI.test.ts
  • src/services/api/openai/__tests__/streamAdapter.test.ts
  • src/services/api/openai/index.ts
  • src/services/api/openai/streamAdapter.ts
  • src/utils/__tests__/formatBriefTimestamp.test.ts
  • src/utils/formatBriefTimestamp.ts

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
src/services/api/openai/__tests__/queryModelOpenAI.test.ts (1)

91-94: Use the events parameter directly to remove hidden global coupling

runQueryModel() accepts events but currently ignores it and depends on _nextEvents. This makes the helper easier to misuse and harder to reason about.

♻️ Suggested minimal fix
 async function runQueryModel(
   events: BetaRawMessageStreamEvent[],
   envOverrides: Record<string, string | undefined> = {},
 ) {
+  _nextEvents = events
   // Save + apply env overrides
   const saved: Record<string, string | undefined> = {}

Also applies to: 157-157

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/api/openai/__tests__/queryModelOpenAI.test.ts` around lines 91 -
94, The helper runQueryModel currently ignores its events parameter and reads
from the hidden global _nextEvents; change runQueryModel to use the passed-in
events argument directly (replace any references to _nextEvents inside
runQueryModel with the events parameter) and remove or stop relying on the
global _nextEvents; update any test calls that expected the global behavior to
pass the desired events into runQueryModel (also make the same change for the
other occurrence noted around the second instance) so the helper is pure and no
longer coupled to the global.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/services/api/openai/__tests__/queryModelOpenAI.test.ts`:
- Around line 91-94: The helper runQueryModel currently ignores its events
parameter and reads from the hidden global _nextEvents; change runQueryModel to
use the passed-in events argument directly (replace any references to
_nextEvents inside runQueryModel with the events parameter) and remove or stop
relying on the global _nextEvents; update any test calls that expected the
global behavior to pass the desired events into runQueryModel (also make the
same change for the other occurrence noted around the second instance) so the
helper is pure and no longer coupled to the global.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f02b070-cf4d-477a-9920-f1ccf93a85d5

📥 Commits

Reviewing files that changed from the base of the PR and between b23bead and e6865ab.

📒 Files selected for processing (4)
  • src/services/api/openai/__tests__/queryModelOpenAI.test.ts
  • src/services/api/openai/__tests__/streamAdapter.test.ts
  • src/services/api/openai/index.ts
  • src/utils/formatBriefTimestamp.ts
✅ Files skipped from review due to trivial changes (2)
  • src/utils/formatBriefTimestamp.ts
  • src/services/api/openai/__tests__/streamAdapter.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/services/api/openai/index.ts

…pter

Signed-off-by: guunergooner <tongchao0923@gmail.com>
@guunergooner guunergooner force-pushed the fix/openai-stop-reason-usage branch from e6865ab to 855ab08 Compare April 9, 2026 16:36
Contributor Author

@coderabbitai full review

Addressed all review comments:

  1. Duplicate cost tracking — removed the redundant addToTotalSessionCost block at the loop level; cost is now only tracked once inside the message_stop case handler
  2. UTC vs local timezone in startOfDay() — changed to use local date components (getFullYear/getMonth/getDate) to match local-timezone formatting
  3. Docstring coverage — added JSDoc to all functions missing docstrings across the PR's changed files, coverage now well above the 80% threshold
  4. Squashed into a single commit

Contributor

coderabbitai bot commented Apr 9, 2026

@guunergooner Thanks for addressing all the previous review comments! I'll kick off a full review of the updated PR now, covering the fixes to duplicate cost tracking, the local-timezone startOfDay() correction, the new JSDoc coverage, and all the stream adapter / queryModelOpenAI changes.


🧠 Learnings used
Learnt from: CR
Repo: claude-code-best/claude-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T06:23:32.787Z
Learning: Applies to src/services/api/openai/**/*.{ts,tsx} : OpenAI compatibility layer (CLAUDE_CODE_USE_OPENAI env var): convert Anthropic request format to OpenAI format and adapt SSE stream responses in `src/services/api/openai/`
✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/api/openai/index.ts (1)

216-252: ⚠️ Potential issue | 🔴 Critical

Forward maxTokens into the OpenAI request body.

The computed maxTokens (line 227) is never included in the request sent to OpenAI. The buildOpenAIRequestBody() function does not accept this parameter and does not serialize max_tokens, so OpenAI-compatible backends fall back to their server defaults. This makes the truncation-prevention fix non-functional, and the subsequent error messages (lines 375, 416) reference a limit that was never enforced.

🔧 Proposed fix
 export function buildOpenAIRequestBody(params: {
   model: string
   messages: any[]
   tools: any[]
   toolChoice: any
   enableThinking: boolean
+  maxTokens: number
   temperatureOverride?: number
 }): Record<string, any> {
-  const { model, messages, tools, toolChoice, enableThinking, temperatureOverride } = params
+  const {
+    model,
+    messages,
+    tools,
+    toolChoice,
+    enableThinking,
+    maxTokens,
+    temperatureOverride,
+  } = params
   return {
     model,
     messages,
+    max_tokens: maxTokens,
     ...(tools.length > 0 && {
       tools,
       ...(toolChoice && { tool_choice: toolChoice }),
@@
     const requestBody = buildOpenAIRequestBody({
       model: openaiModel,
       messages: openaiMessages,
       tools: openaiTools,
       toolChoice: openaiToolChoice,
       enableThinking,
+      maxTokens,
       temperatureOverride: options.temperatureOverride,
     })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/api/openai/index.ts` around lines 216 - 252, The computed
maxTokens from getModelMaxOutputTokens (and options.maxOutputTokensOverride) is
never passed into the OpenAI request; update buildOpenAIRequestBody to accept a
maxTokens (or max_output_tokens) parameter and serialize it as max_tokens in the
returned request body, then pass maxTokens when calling buildOpenAIRequestBody
before client.chat.completions.create; also update any related types/signatures
(and tests) that reference buildOpenAIRequestBody to accept the new parameter so
the request enforces the intended token limit.
🧹 Nitpick comments (1)
src/services/api/openai/index.ts (1)

328-420: Extract the final assistant emission into one helper.

The message_stop branch and the post-loop fallback now duplicate the same message assembly and max_output_tokens error path. This is easy to drift the next time usage or stop-reason handling changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/api/openai/index.ts` around lines 328 - 420, Duplicate logic for
assembling and emitting the final assistant message and the max_output_tokens
error should be extracted into a single helper (e.g., emitFinalAssistantMessage
or assembleFinalAssistantEmission) to avoid drift; implement a helper that
accepts partialMessage, contentBlocks, tools, options.agentId, usage,
stopReason, maxTokens, openaiModel and returns/yields the assembled
AssistantMessage (using normalizeContentFromAPI, randomUUID(), new
Date().toISOString()) and, when stopReason === 'max_tokens', the
createAssistantAPIErrorMessage info. Replace the duplicated blocks in the
'message_stop' case and the post-loop safety fallback with calls to this helper,
preserving the partialMessage = null reset and the cost tracking calls
(calculateUSDCost and addToTotalSessionCost) in the same places or moved into
the helper as appropriate. Ensure the helper maintains stop_sequence:null and
includes usage in the emitted message so tokenCountWithEstimation continues to
work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/services/api/openai/index.ts`:
- Around line 216-252: The computed maxTokens from getModelMaxOutputTokens (and
options.maxOutputTokensOverride) is never passed into the OpenAI request; update
buildOpenAIRequestBody to accept a maxTokens (or max_output_tokens) parameter
and serialize it as max_tokens in the returned request body, then pass maxTokens
when calling buildOpenAIRequestBody before client.chat.completions.create; also
update any related types/signatures (and tests) that reference
buildOpenAIRequestBody to accept the new parameter so the request enforces the
intended token limit.

---

Nitpick comments:
In `@src/services/api/openai/index.ts`:
- Around line 328-420: Duplicate logic for assembling and emitting the final
assistant message and the max_output_tokens error should be extracted into a
single helper (e.g., emitFinalAssistantMessage or
assembleFinalAssistantEmission) to avoid drift; implement a helper that accepts
partialMessage, contentBlocks, tools, options.agentId, usage, stopReason,
maxTokens, openaiModel and returns/yields the assembled AssistantMessage (using
normalizeContentFromAPI, randomUUID(), new Date().toISOString()) and, when
stopReason === 'max_tokens', the createAssistantAPIErrorMessage info. Replace
the duplicated blocks in the 'message_stop' case and the post-loop safety
fallback with calls to this helper, preserving the partialMessage = null reset
and the cost tracking calls (calculateUSDCost and addToTotalSessionCost) in the
same places or moved into the helper as appropriate. Ensure the helper maintains
stop_sequence:null and includes usage in the emitted message so
tokenCountWithEstimation continues to work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a6a9b45-204c-425c-928c-bd37a898574e

📥 Commits

Reviewing files that changed from the base of the PR and between e6865ab and 855ab08.

📒 Files selected for processing (6)
  • src/services/api/openai/__tests__/queryModelOpenAI.test.ts
  • src/services/api/openai/__tests__/streamAdapter.test.ts
  • src/services/api/openai/index.ts
  • src/services/api/openai/streamAdapter.ts
  • src/utils/__tests__/formatBriefTimestamp.test.ts
  • src/utils/formatBriefTimestamp.ts
✅ Files skipped from review due to trivial changes (2)
  • src/utils/formatBriefTimestamp.ts
  • src/services/api/openai/__tests__/queryModelOpenAI.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/utils/__tests__/formatBriefTimestamp.test.ts
  • src/services/api/openai/streamAdapter.ts

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/api/openai/index.ts (1)

216-248: ⚠️ Potential issue | 🟠 Major

Thread maxTokens into the actual OpenAI request.

Line 226 computes the cap, but Lines 241-248 never pass it into buildOpenAIRequestBody(). The request still relies on the backend default, so compatible endpoints can keep truncating early while the later max_output_tokens message reports a limit that was never sent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/api/openai/index.ts` around lines 216 - 248, The computed
maxTokens (from getModelMaxOutputTokens/openaiModel and
options.maxOutputTokensOverride) is not being passed into the OpenAI request
builder; update the call to buildOpenAIRequestBody to include the resolved
maxTokens so the actual request uses that cap (e.g., add a max_tokens or
maxOutputTokens field to the buildOpenAIRequestBody invocation so requestBody
contains the intended limit).
🧹 Nitpick comments (1)
src/services/api/openai/__tests__/queryModelOpenAI.test.ts (1)

159-167: Capture chat.completions.create() arguments in the mock.

Right now this suite can't assert that the newly computed output-token cap is actually forwarded into the OpenAI request, so that omission still passes all tests. Recording the call payload here would give you a direct regression for that path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/api/openai/__tests__/queryModelOpenAI.test.ts` around lines 159
- 167, The mock for getOpenAIClient currently returns chat.completions.create as
a no-op async iterator; update the mock in queryModelOpenAI.test.ts to capture
the arguments passed into chat.completions.create (e.g., replace the anonymous
function with a jest.fn or a closure that records its call payload to a scoped
variable) and still return an async iterator-compatible object; ensure you
reference getOpenAIClient and chat.completions.create so tests can assert the
forwarded token cap and other request fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/services/api/openai/index.ts`:
- Around line 216-248: The computed maxTokens (from
getModelMaxOutputTokens/openaiModel and options.maxOutputTokensOverride) is not
being passed into the OpenAI request builder; update the call to
buildOpenAIRequestBody to include the resolved maxTokens so the actual request
uses that cap (e.g., add a max_tokens or maxOutputTokens field to the
buildOpenAIRequestBody invocation so requestBody contains the intended limit).

---

Nitpick comments:
In `@src/services/api/openai/__tests__/queryModelOpenAI.test.ts`:
- Around line 159-167: The mock for getOpenAIClient currently returns
chat.completions.create as a no-op async iterator; update the mock in
queryModelOpenAI.test.ts to capture the arguments passed into
chat.completions.create (e.g., replace the anonymous function with a jest.fn or
a closure that records its call payload to a scoped variable) and still return
an async iterator-compatible object; ensure you reference getOpenAIClient and
chat.completions.create so tests can assert the forwarded token cap and other
request fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 05632a66-a747-47e9-b20c-bea7ae3d3423

📥 Commits

Reviewing files that changed from the base of the PR and between 637531f and 855ab08.

📒 Files selected for processing (6)
  • src/services/api/openai/__tests__/queryModelOpenAI.test.ts
  • src/services/api/openai/__tests__/streamAdapter.test.ts
  • src/services/api/openai/index.ts
  • src/services/api/openai/streamAdapter.ts
  • src/utils/__tests__/formatBriefTimestamp.test.ts
  • src/utils/formatBriefTimestamp.ts
