fix(core): infer Google CUA screenshot MIME type from data URLs #2048
BABTUNA wants to merge 1 commit into browserbase:main from
This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
1 issue found across 2 files
Confidence score: 3/5
- There is a concrete medium-severity risk in packages/core/lib/v3/agent/GoogleCUAClient.ts: overly strict data URL parsing can corrupt valid parameterized base64 URLs and lead to malformed screenshot payloads.
- Given the 6/10 severity with high confidence (8/10), this looks user-impacting enough to warrant caution before merging, even though the scope appears limited to one client path.
- Pay close attention to packages/core/lib/v3/agent/GoogleCUAClient.ts: ensure data URL handling preserves valid parameterized base64 inputs without altering payload integrity.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/agent/GoogleCUAClient.ts">
<violation number="1" location="packages/core/lib/v3/agent/GoogleCUAClient.ts:982">
P2: Data URL parsing is too strict and can corrupt valid parameterized base64 data URLs, producing malformed screenshot payloads.</violation>
</file>
Architecture diagram
sequenceDiagram
participant App as Agent/Orchestrator
participant CUA as GoogleCUAClient
participant SP as ScreenshotProvider
participant Gemini as Gemini API
Note over App, Gemini: Google Computer Use (CUA) Flow
App->>CUA: handleToolCall()
rect rgb(240, 240, 240)
Note right of CUA: Screenshot Capture Phase
CUA->>CUA: captureScreenshot()
CUA->>SP: call provider()
SP-->>CUA: return image data (base64 or data URL)
CUA->>CUA: CHANGED: normalizeScreenshotDataUrl()
Note right of CUA: If raw base64, wrap in PNG data URL.<br/>If existing data URL, preserve it.
end
rect rgb(240, 240, 240)
Note right of CUA: Data Preparation Phase
CUA->>CUA: NEW: parseScreenshotDataUrl()
alt Data URL contains image MIME
CUA->>CUA: Extract mimeType and base64 payload
opt mimeType is image/jpg
CUA->>CUA: CHANGED: Normalize to image/jpeg
end
else Raw or non-image data URL
CUA->>CUA: Fallback to image/png
end
end
CUA->>Gemini: POST generateContent (FunctionResponse)
Note right of CUA: NEW: inlineData.mimeType now reflects<br/>the actual source image type
Gemini-->>CUA: Model Response
CUA-->>App: Result
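The flow in the diagram can be sketched roughly as follows. This is a minimal standalone sketch, not the PR's actual implementation: the function names mirror the diagram, but the bodies, the `ParsedScreenshot` type, and the handling of non-image data URLs are assumptions made for illustration.

```typescript
// Hypothetical result type; not taken from the PR.
interface ParsedScreenshot {
  mimeType: string;
  data: string; // base64 payload without the data URL prefix
}

// Assumption: raw base64 is wrapped as a PNG data URL, existing data URLs pass through.
function normalizeScreenshotDataUrl(imageData: string): string {
  const trimmed = imageData.trim();
  return /^data:/i.test(trimmed) ? trimmed : `data:image/png;base64,${trimmed}`;
}

// Assumption: anything that is not an image data URL falls back to image/png.
function parseScreenshotDataUrl(dataUrl: string): ParsedScreenshot {
  const trimmed = dataUrl.trim();
  const match = /^data:([^;,]+);base64,/i.exec(trimmed);
  if (!match) {
    // Raw base64 (no data URL prefix): keep the payload, assume PNG.
    return { mimeType: "image/png", data: trimmed };
  }
  const [prefix = "", rawMime = ""] = match;
  const payload = trimmed.slice(prefix.length);
  const mime = rawMime.toLowerCase();
  if (!mime.startsWith("image/")) {
    return { mimeType: "image/png", data: payload };
  }
  // image/jpg is not a registered MIME type; normalize it to image/jpeg.
  return { mimeType: mime === "image/jpg" ? "image/jpeg" : mime, data: payload };
}
```

The parsed `mimeType` and `data` would then feed the `inlineData` part of the function response, so the declared type matches the bytes actually sent.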
private normalizeScreenshotDataUrl(imageData: string): string {
  const trimmedImageData = imageData.trim();
  if (/^data:[^;]+;base64,/i.test(trimmedImageData)) {
P2: Data URL parsing is too strict and can corrupt valid parameterized base64 data URLs, producing malformed screenshot payloads.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/GoogleCUAClient.ts, line 982:
<comment>Data URL parsing is too strict and can corrupt valid parameterized base64 data URLs, producing malformed screenshot payloads.</comment>
<file context>
@@ -979,6 +977,46 @@ export class GoogleCUAClient extends AgentClient {
+ private normalizeScreenshotDataUrl(imageData: string): string {
+ const trimmedImageData = imageData.trim();
+ if (/^data:[^;]+;base64,/i.test(trimmedImageData)) {
+ return trimmedImageData;
+ }
</file context>
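To make the review comment concrete: the pattern in the file context requires `;base64,` to follow the MIME type immediately, so a data URL carrying an extra parameter such as `charset` does not match. The sketch below contrasts that pattern with a more tolerant one. `tolerantPattern` and `splitDataUrl` are hypothetical names introduced here, not part of the PR, and this is only one possible way to address the comment.

```typescript
// Pattern copied from the file context above.
const strictPattern = /^data:[^;]+;base64,/i;

// Hypothetical alternative: allow zero or more ";name=value" parameters
// between the MIME type and the ";base64," marker.
const tolerantPattern = /^data:([^;,]+)(?:;[^;,=]+=[^;,]*)*;base64,/i;

// Hypothetical helper: returns null when the input is not a base64 data URL.
function splitDataUrl(
  input: string,
): { mimeType: string; payload: string } | null {
  const trimmed = input.trim();
  const match = tolerantPattern.exec(trimmed);
  if (!match) return null;
  const [prefix = "", mimeType = ""] = match;
  return {
    mimeType: mimeType.toLowerCase(),
    payload: trimmed.slice(prefix.length),
  };
}

const parameterized = "data:image/png;charset=utf-8;base64,iVBORw0KGgo=";
const strictMatches = strictPattern.test(parameterized); // false: ";charset=..." blocks the match
const parsed = splitDataUrl(parameterized); // MIME type and payload recovered intact
```

If the strict test fails and the caller then treats the whole string as raw base64, the original `data:` prefix would end up inside the payload, which is presumably the corruption the comment warns about.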
why
Google CUA function responses currently hardcode image/png and only strip a PNG data URL prefix. If the screenshot source is JPEG (or any non-PNG image data URL), metadata and payload can drift.
Closes #2046.
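The drift described here can be illustrated with a small sketch. `legacyInlineData` is a hypothetical stand-in for the old behavior (hardcoded MIME type, PNG-only prefix stripping); the real code before this PR may differ in detail.

```typescript
// Hypothetical reconstruction of the old behavior, for illustration only.
function legacyInlineData(screenshot: string): { mimeType: string; data: string } {
  return {
    mimeType: "image/png", // hardcoded regardless of the source image
    data: screenshot.replace(/^data:image\/png;base64,/, ""), // strips a PNG prefix only
  };
}

const jpegDataUrl = "data:image/jpeg;base64,/9j/4AAQSkZJRg==";
const legacyPart = legacyInlineData(jpegDataUrl);
// legacyPart.mimeType claims PNG, while legacyPart.data still carries
// the untouched "data:image/jpeg;base64," prefix in front of the payload.
```

With a JPEG input, the declared type is wrong and the payload is no longer plain base64, which is the mismatch this PR sets out to remove.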
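what changed
- Parse screenshot data URLs in GoogleCUAClient to extract:
  - MIME type (image/jpeg, image/png, etc.)
  - base64 payload
- Replaced hardcoded inlineData.mimeType: "image/png" in function responses with parsed MIME
- Fall back to image/png for raw base64 or non-image data URLs
tests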
Added packages/core/tests/unit/google-cua-client.test.ts covering:
- data URL handling in captureScreenshot
- MIME type parsing (image/jpg normalized to image/jpeg)
Validation run:
- npm.cmd exec prettier -- --write packages/core/lib/v3/agent/GoogleCUAClient.ts packages/core/tests/unit/google-cua-client.test.ts
- node node_modules/vitest/vitest.mjs run --config .tmp-vitest-unit-config.mjs (targeting google-cua-client.test.ts and safety-confirmation.test.ts)
Summary by cubic
Google CUA now parses screenshot data URLs to preserve the real image MIME (jpeg, png, etc.) and send matching inline data. This prevents payload/metadata drift for non-PNG screenshots (closes #2046).
- Normalizes image/jpg to image/jpeg.
Written for commit bea0c24. Summary will update on new commits.