fix(core): add CUA addContextNote parity across providers #2038
BABTUNA wants to merge 2 commits into browserbase:main
Implement pending context-note queue + next-turn injection for Google, Anthropic, and Microsoft CUA clients to match OpenAI behavior. Notes are drained after one turn to preserve one-shot semantics. Adds targeted unit coverage for all three clients. Refs browserbase#2037.
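The queue-and-drain behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `NoteQueueSketch`, `UserMessage`, and `buildNextInput` are hypothetical names, and only `addContextNote`, `drainContextNotes`, and `pendingContextNotes` come from the PR itself.

```typescript
// Hypothetical message shape; the real provider input types differ per client.
type UserMessage = { role: "user"; content: string };

// Hypothetical stand-in for a CUA client, reduced to the note queue.
class NoteQueueSketch {
  private pendingContextNotes: string[] = [];

  // Queue a note for delivery on the next turn.
  addContextNote(note: string): void {
    this.pendingContextNotes.push(note);
  }

  // Return every queued note and clear the queue (one-shot semantics).
  drainContextNotes(): string[] {
    const notes = this.pendingContextNotes;
    this.pendingContextNotes = [];
    return notes;
  }

  // Assumed helper: append drained notes to the next turn's input as user messages.
  buildNextInput(base: UserMessage[]): UserMessage[] {
    const notes = this.drainContextNotes().map(
      (content): UserMessage => ({ role: "user", content }),
    );
    return [...base, ...notes];
  }
}
```

Because draining replaces the array, a note appears in exactly one subsequent input and is gone afterwards.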
This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
2 issues found across 4 files
Confidence score: 3/5
- There is a concrete regression risk in context handling: both clients can retain terminal-step notes and apply them in later executions, which may cause one-turn delays and stale context behavior.
- Given the medium severity (6/10) and high confidence (8-9/10), this is more than a cosmetic issue and can be user-facing in multi-step or repeated agent runs.
- Pay close attention to `packages/core/lib/v3/agent/AnthropicCUAClient.ts` and `packages/core/lib/v3/agent/GoogleCUAClient.ts`: note-draining logic currently skips completed steps, allowing cross-execution note carryover.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/agent/AnthropicCUAClient.ts">
<violation number="1" location="packages/core/lib/v3/agent/AnthropicCUAClient.ts:183">
P2: Context notes are drained after model execution and only on non-completed steps, causing one-turn delay and possible stale-note carryover into later executions.</violation>
</file>
<file name="packages/core/lib/v3/agent/GoogleCUAClient.ts">
<violation number="1" location="packages/core/lib/v3/agent/GoogleCUAClient.ts:273">
P2: Context notes may leak across separate executions because queued notes are only drained when `!completed`, leaving terminal-step notes to persist into later runs.</violation>
</file>
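One way to address both findings is to drain on every step and inject only when another turn follows. The sketch below assumes a simplified loop: `runLoop`, `runStep`, and the `Message` shape are hypothetical and do not mirror the actual client code.

```typescript
type StepResult = { completed: boolean };
type Message = { role: "user"; content: string };

// Hypothetical step loop; `runStep` stands in for the provider call.
// Returns the input seen by each step so the behavior can be inspected.
function runLoop(
  queue: string[],
  runStep: (input: Message[], step: number) => StepResult,
  maxSteps = 10,
): Message[][] {
  const seenInputs: Message[][] = [];
  let input: Message[] = [];
  for (let step = 0; step < maxSteps; step++) {
    seenInputs.push(input);
    const result = runStep(input, step);
    // Drain unconditionally so a terminal step cannot leave notes behind.
    const notes = queue.splice(0, queue.length);
    if (result.completed) break;
    // Inject only when another turn will actually happen.
    input = notes.map((content): Message => ({ role: "user", content }));
  }
  return seenInputs;
}
```

With this shape, a note queued before a terminal step is discarded rather than surfacing in a later execution.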
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/tests/unit/cua-context-note-parity.test.ts">
<violation number="1" location="packages/core/tests/unit/cua-context-note-parity.test.ts:117">
P2: The new cross-run parity tests assert on the second step of the second execution instead of the first, so they can miss stale context-note leakage into the new run.</violation>
</file>
@@ -0,0 +1,374 @@
import { describe, expect, it, vi } from "vitest";
P2: The new cross-run parity tests assert on the second step of the second execution instead of the first, so they can miss stale context-note leakage into the new run.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/tests/unit/cua-context-note-parity.test.ts, line 117:
<comment>The new cross-run parity tests assert on the second step of the second execution instead of the first, so they can miss stale context-note leakage into the new run.</comment>
<file context>
@@ -69,6 +69,62 @@ describe("CUA context note parity", () => {
+ logger: noopLogger,
+ });
+
+ const secondRunStep2Input = executeStepSpy.mock.calls[2]?.[0] as Array<{
+ role?: string;
+ content?: string;
</file context>
Hmm. These clients inject notes after a step, so a leak would appear on run 2, step 2, which is what the test checks.
@cubic-dev-ai re-run review
@BABTUNA I have started the AI code review. It will take a few minutes to complete.
2 issues found across 4 files
Confidence score: 3/5
- There is a concrete regression risk in `packages/core/lib/v3/agent/AnthropicCUAClient.ts`: pending context notes are not cleared on error paths, which can leak notes across executions and affect subsequent runs.
- The new public `addContextNote` interface in `packages/core/lib/v3/agent/AnthropicCUAClient.ts` appears to be missing required `flowLogger` instrumentation, which can reduce traceability and make debugging/monitoring harder.
- Given two medium-severity, high-confidence findings in core agent behavior, this carries some merge risk and is worth fixing before relying on it in production flows.
- Pay close attention to `packages/core/lib/v3/agent/AnthropicCUAClient.ts`: error-path state cleanup and flowLogger instrumentation for the new public method.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/agent/AnthropicCUAClient.ts">
<violation number="1" location="packages/core/lib/v3/agent/AnthropicCUAClient.ts:117">
P2: Custom agent: **Ensure all public methods added to the stagehand class, agent, or understudy (page, locator, etc.) interfaces are properly instrumented with the flowLogger**
New public `addContextNote` interface mutates agent state and affects subsequent LLM input, but it is untracked by flowLogger.</violation>
<violation number="2" location="packages/core/lib/v3/agent/AnthropicCUAClient.ts:180">
P2: Context notes can leak across executions because pending notes are not cleared on error paths.</violation>
</file>
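For the error-path finding, a `try`/`finally` around the execution body is one possible shape of the cleanup. The sketch below is an assumption about how that could look: `ErrorPathSketch`, `execute`, and `pendingCount` are hypothetical, and the body is synchronous for brevity (an async client would `await` the work inside the `try`).

```typescript
// Hypothetical client reduced to the queue and an execution wrapper.
class ErrorPathSketch {
  private pendingContextNotes: string[] = [];

  addContextNote(note: string): void {
    this.pendingContextNotes.push(note);
  }

  // Exposed only so the sketch can be inspected.
  pendingCount(): number {
    return this.pendingContextNotes.length;
  }

  // `run` stands in for the step loop; it may throw.
  execute<T>(run: () => T): T {
    try {
      return run();
    } finally {
      // Runs on both success and failure, so notes never outlive the execution.
      this.pendingContextNotes = [];
    }
  }
}
```

The error still propagates to the caller; only the queued notes are discarded.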
Architecture diagram
sequenceDiagram
    participant H as V3CuaAgentHandler
    participant C as CUA Client (Anthropic/Google/MS)
    participant Q as pendingContextNotes (Queue)
    participant Hist as Provider History/Input
    participant API as Model API (Claude/Gemini/Fara)
    Note over H, API: Runtime Captcha/Context Injection Flow
    H->>C: NEW: addContextNote(note)
    C->>Q: Store note in queue
    H->>C: execute(task)
    loop Until Task Completed
        C->>API: getAction(currentInput)
        API-->>C: stepResult (completed: false)
        C->>C: internal: drainContextNotes()
        C->>Q: NEW: Get and clear all notes
        Q-->>C: notes[]
        alt NEW: notes exist AND not completed
            C->>Hist: CHANGED: Map notes to "user" messages
            Note right of Hist: Anthropic: nextInputItems<br/>Google: history<br/>Microsoft: conversationHistory
        end
        C->>C: Update currentInput for next turn
    end
    Note over C, Q: Terminal step or execution end
    C->>Q: NEW: drainContextNotes() (Prevent leaks to next run)
    C-->>H: finalResult
P2: Context notes can leak across executions because pending notes are not cleared on error paths.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/AnthropicCUAClient.ts, line 180:
<comment>Context notes can leak across executions because pending notes are not cleared on error paths.</comment>
<file context>
@@ -172,9 +177,20 @@ export class AnthropicCUAClient extends AgentClient {
// Update completion status
completed = result.completed;
+ const contextNotes = this.drainContextNotes();
+
// Update the input items for the next step if we're continuing
</file context>
P2: Custom agent: Ensure all public methods added to the stagehand class, agent, or understudy (page, locator, etc.) interfaces are properly instrumented with the flowLogger
New public addContextNote interface mutates agent state and affects subsequent LLM input, but it is untracked by flowLogger.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/AnthropicCUAClient.ts, line 117:
<comment>New public `addContextNote` interface mutates agent state and affects subsequent LLM input, but it is untracked by flowLogger.</comment>
<file context>
@@ -113,6 +114,10 @@ export class AnthropicCUAClient extends AgentClient {
this.tools = tools;
}
+ addContextNote(note: string): void {
+ this.pendingContextNotes.push(note);
+ }
</file context>
why
`V3CuaAgentHandler` injects runtime captcha guidance via `addContextNote(...)`, but only `OpenAICUAClient` consumed those notes.
That meant Google/Anthropic/Microsoft CUA clients silently dropped this guidance, causing behavior drift in captcha-related flows.
Closes #2037.
what changed
- Clients updated: `GoogleCUAClient`, `AnthropicCUAClient`, `MicrosoftCUAClient`.
- `addContextNote(note)` override in each client.
- Note draining (`drainContextNotes`) in each client.
- Next-turn injection targets: Google → `history`, Anthropic → `nextInputItems`, Microsoft → `conversationHistory`.
tests
Added `packages/core/tests/unit/cua-context-note-parity.test.ts` covering the Google, Anthropic, and Microsoft clients.
Regression check run: `tests/unit/openai-cua-client.test.ts`
validation run
- `npm.cmd exec prettier -- --write packages/core/lib/v3/agent/GoogleCUAClient.ts packages/core/lib/v3/agent/AnthropicCUAClient.ts packages/core/lib/v3/agent/MicrosoftCUAClient.ts packages/core/tests/unit/cua-context-note-parity.test.ts`
- `node node_modules/vitest/vitest.mjs run --config .tmp-vitest-unit-config.mjs` from `packages/core` (temporary local config targeting `tests/unit/cua-context-note-parity.test.ts` and `tests/unit/openai-cua-client.test.ts`)
Summary by cubic
Adds parity for `addContextNote` across CUA providers so runtime captcha guidance is applied consistently; notes are injected once on the next turn and drained on terminal steps to avoid leaks. Fixes #2037.
- Implement `addContextNote(...)` in `GoogleCUAClient`, `AnthropicCUAClient`, and `MicrosoftCUAClient`.
- Inject notes on the next turn (Google → `history`, Anthropic → `nextInputItems`, Microsoft → `conversationHistory`) and drain on terminal steps to prevent carry-over.
Written for commit 49e832d. Summary will update on new commits.