fix(agent): strip reasoning parts before forced "done" re-submission #2270
| Name | Type |
|---|---|
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |
No issues found across 4 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant Agent as V3AgentHandler
participant Done as handleDoneToolCall
participant Strip as stripReasoningParts
participant SDK as AI SDK (generateText)
participant Provider as Model Provider
Note over Agent,Provider: Finalization flow when agent finishes all work (state.completed=false)
Agent->>Done: Call handleDoneToolCall with inputMessages (run history)
Note over Done: Forced "done" finalization re-submits history to model
Done->>Strip: Strip reasoning parts from inputMessages
Note over Strip: Filter out assistant content parts with type "reasoning"<br/>Preserve tool-call/tool-result pairing
alt Messages had reasoning parts
Strip-->>Done: Cleaned messages (reduced length, no reasoning parts)
else Messages had no reasoning parts
Strip-->>Done: Messages unchanged (pass-through)
end
Done->>SDK: generateText(model, systemPrompt, cleaned messages + userPrompt, doneTool)
alt SDK validation succeeds
SDK->>Provider: Send validated prompt
Provider-->>SDK: generateText response
SDK-->>Done: Done tool result (taskComplete, reasoning)
else SDK validation fails (e.g., missing text in reasoning part)
Note over SDK: "Invalid prompt: messages must be a ModelMessage[]"<br/>This error path is now blocked by prior stripping
end
Done-->>Agent: { taskComplete, reasoning, output }
alt taskComplete == true
Agent->>Agent: Mark state.completed = true
Agent-->>Agent: Return { messages, output }
else taskComplete == false
Agent->>Agent: Continue agent loop
end
Note over Agent,Provider: Previous behavior (removed in this PR)
Note over Agent: Removed try/catch fallback that warned and synthesized completion
Note over Agent: No longer masks the error - strip prevents it from occurring
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d7565a93ad
if (m.role !== "assistant" || typeof m.content === "string") return m;
return {
  ...m,
  content: m.content.filter((p) => p.type !== "reasoning"),
Preserve Anthropic thinking blocks for tool-use histories
When the main agent runs Claude 4.6+/Fable 5, buildAgentProviderOptions enables Anthropic adaptive/always-on thinking, so tool-use assistant turns can contain signed reasoning blocks. This sanitizer removes those blocks from every assistant message while leaving the tool-call/tool-result history intact; Anthropic requires thinking blocks to be resent unmodified during tool use (see their extended thinking docs), so the forced finalization request can be rejected and, with the surrounding catch removed, the completed agent run is reported as failed. Limit this stripping to the OpenAI/malformed reasoning case or preserve Anthropic signed reasoning blocks.
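One way to read the second option in this suggestion ("preserve Anthropic signed reasoning blocks") is a filter that drops only unsigned reasoning parts. The sketch below is illustrative rather than code from this PR: the `Part` and `Message` types are simplified stand-ins for the AI SDK's message types, `stripUnsignedReasoning` is a hypothetical name, and the location of the signature (`providerOptions.anthropic.signature`) is an assumption that would need checking against the SDK version in use.

```typescript
// Hypothetical, simplified message shapes (not the AI SDK's real types).
type Part = {
  type: string;
  text?: string;
  providerOptions?: Record<string, Record<string, unknown>>;
};
type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string | Part[];
};

// Assumption: a signed thinking block carries its signature under
// providerOptions.anthropic.signature.
function isSignedReasoning(part: Part): boolean {
  return (
    part.type === "reasoning" &&
    typeof part.providerOptions?.anthropic?.signature === "string"
  );
}

// Drop reasoning parts from assistant messages unless they are signed.
// Non-assistant messages and string content pass through unchanged.
function stripUnsignedReasoning(messages: Message[]): Message[] {
  return messages.map((m) => {
    if (m.role !== "assistant" || typeof m.content === "string") return m;
    return {
      ...m,
      content: m.content.filter(
        (p) => p.type !== "reasoning" || isSignedReasoning(p),
      ),
    };
  });
}
```

Under these assumptions, tool-call parts and signed reasoning parts survive together, so the tool-use history is re-sent with its thinking blocks intact.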
…alization
Root-cause fix for STG-2335. The forced "done" finalization re-submits the
accumulated run history to the model. When a custom tool returns an object
with an optional field left `undefined` (e.g. PermitFlow's captureField
returning `{ matchedExpected: undefined }`), that `undefined` lands inside a
tool-result `output.value`. The AI SDK's prompt validation rejects it — its
JSON-value schema disallows `undefined` — throwing "Invalid prompt: messages
must be a ModelMessage[]". That flipped a completed run to { success: false }
with a red error, even though every action had already succeeded.
Deep-strip `undefined` from the history before the finalization call, keeping
all real content. Class instances (URL, typed arrays, Date) are passed through
untouched so binary image data isn't corrupted.
Replaces PR #2269's best-effort try/catch (which masked the error and forced
completed=true); reverts that fallback in ensureDone and rewrites the test to
cover the undefined-tool-result re-submission path.
Reproduced end-to-end with openai/gpt-5.5 + a custom tool: fails on main,
succeeds on this branch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
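To illustrate why an `undefined` field trips a JSON-value check, here is a minimal stand-in validator. It is not the AI SDK's actual schema; `JsonValue` and `isJsonValue` are hypothetical names used only for this sketch.

```typescript
// A JSON value is null, a string, number, or boolean, or an array/object of JSON values.
type JsonValue =
  | null
  | string
  | number
  | boolean
  | JsonValue[]
  | { [key: string]: JsonValue };

// Stand-in check (assumption: approximates what a strict JSON-value schema enforces).
function isJsonValue(value: unknown): value is JsonValue {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "number":
    case "boolean":
      return true;
    case "object":
      if (Array.isArray(value)) return value.every(isJsonValue);
      return Object.values(value as object).every(isJsonValue);
    default:
      // undefined, function, symbol, bigint
      return false;
  }
}

console.log(isJsonValue({ matchedExpected: undefined })); // false
console.log(isJsonValue({ matchedExpected: false })); // true
console.log(JSON.stringify({ matchedExpected: undefined })); // {}
```

The last line shows why the problem is easy to miss: `JSON.stringify` silently drops `undefined` properties, so the value serializes cleanly even though a strict structural check rejects it.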
d7565a9 to d30883b
22bc215 into alyssamaruyama/stg-2335-escalation-permitflow-agent-with-custom-tool-has-error-on
why
Root-cause fix for STG-2335: a non-CUA `agent.execute()` that uses a custom tool reports a successful run as `{ success: false }`, with a red `Invalid prompt: messages must be a ModelMessage[]` error logged after all the work has already completed.

After the main agent loop finishes, `ensureDone()` runs a forced "done" finalization (`handleDoneToolCall`) that re-submits the accumulated run history into a fresh `generateText` call. The main loop never re-validates accumulated tool results, but this re-submission does. When a custom tool returns an optional field left `undefined` (e.g. PermitFlow's `captureField` returning `{ matchedExpected: undefined }` when no `expectedText` is passed), that `undefined` lands inside a tool-result `output.value`, and the AI SDK's `ModelMessage` validation (`standardizePrompt`) rejects it because its JSON-value schema disallows `undefined`. Finalization throws, flipping the result to `{ success: false }` even though every action succeeded.

PR #2269 patched the symptom (wrap finalization in `try/catch` and force `state.completed = true`). This PR fixes the actual defect instead.

Note
The original "reasoning traces" hypothesis was ruled out empirically: reasoning parts come back with a valid `text: ""` and pass validation. The `undefined` tool-result field is the real trigger.

what changed
- Adds `sanitizeMessagesForResubmission()` in `handleDoneToolCall.ts`, which deep-strips `undefined` from the run history before the forced "done" call. Only plain objects/arrays are traversed, so class instances (`URL`, typed arrays for binary image data, `Date`, …) pass through untouched.
- Reverts PR #2269's `try/catch` + `completed=true` fallback in `v3AgentHandler.ts`, so genuine finalization failures fail loudly again instead of being masked.
- Rewrites `agent-finalization-resilience.test.ts` to cover the `undefined` tool-result re-submission path.

test plan
Unit: 4 tests against the real `ai` SDK: reproduces `InvalidPromptError` with an undefined tool-result field → fixed by `sanitizeMessagesForResubmission` → real content (reasoning / tool-call / text) preserved → class instances untouched. All pass.

End-to-end: `openai/gpt-5.5` + custom tool, mirrors PermitFlow script 038:

| Branch | Result |
|---|---|
| `main` | `success=false`, red `Invalid prompt` error |
| this branch | `success=true`, `completed=true`, no error |
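The deep-strip described under "what changed" could be sketched roughly as follows. This is a minimal illustration, not the code in `handleDoneToolCall.ts`: `isPlainObject` and `stripUndefinedDeep` are hypothetical names, and replacing `undefined` array entries with `null` (mirroring `JSON.stringify`) is an assumption about how arrays might be handled.

```typescript
// True only for object literals and Object.create(null) objects.
// Class instances (Date, URL, typed arrays, ...) return false.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== "object") return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Recursively remove `undefined` properties from plain objects.
// Assumption: `undefined` array entries become `null` so indices are preserved.
// Anything that is not a plain object or array is returned by reference.
function stripUndefinedDeep<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((item) =>
      item === undefined ? null : stripUndefinedDeep(item),
    ) as unknown as T;
  }
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      if (child !== undefined) out[key] = stripUndefinedDeep(child);
    }
    return out as unknown as T;
  }
  return value;
}

// Hypothetical tool-result message; only `matchedExpected` comes from the report.
const history = [
  {
    role: "tool",
    content: [
      {
        type: "tool-result",
        output: {
          type: "json",
          value: { matchedExpected: undefined, capturedAt: new Date(0) },
        },
      },
    ],
  },
];

const cleaned = stripUndefinedDeep(history);
console.log(Object.keys(cleaned[0].content[0].output.value)); // [ 'capturedAt' ]
```

Checking the prototype instead of `typeof value === "object"` is what keeps binary payloads safe: a `Uint8Array` is never rebuilt key by key, so its bytes stay untouched.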