
fix(core): infer Google CUA screenshot MIME type from data URLs #2048

Open
BABTUNA wants to merge 1 commit into browserbase:main from BABTUNA:fix-google-cua-mime-inference



BABTUNA commented Apr 24, 2026

why

Google CUA function responses currently hardcode image/png and strip only a PNG data URL prefix. If the screenshot is a JPEG (or any other non-PNG image data URL), the declared MIME type no longer matches the image bytes, and the data URL prefix is left in the base64 payload.
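A minimal sketch of the pre-fix behavior, reconstructed only from the description above (hardcoded image/png, PNG-only prefix stripping), shows how metadata and payload drift apart for a JPEG screenshot. The function and type names are hypothetical; this is not the actual GoogleCUAClient code.

```typescript
// Hypothetical shape of the inline image data in a function response.
interface InlineImageData {
  mimeType: string;
  data: string;
}

// HYPOTHETICAL reconstruction of the pre-fix behavior described in the PR:
// the MIME type is always "image/png" and only a PNG prefix is stripped.
function legacyScreenshotInlineData(screenshot: string): InlineImageData {
  return {
    mimeType: "image/png",
    data: screenshot.replace(/^data:image\/png;base64,/, ""),
  };
}

const jpegDataUrl = "data:image/jpeg;base64,/9j/4AAQ";
const legacy = legacyScreenshotInlineData(jpegDataUrl);
console.log(legacy.mimeType); // → image/png (but the bytes are JPEG)
console.log(legacy.data); // → data:image/jpeg;base64,/9j/4AAQ (prefix not stripped)
```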

Closes #2046.

what changed

  • Added screenshot data URL parsing in GoogleCUAClient to extract:
    • actual MIME type (image/jpeg, image/png, etc.)
    • base64 payload without data URL prefix
  • Replaced the hardcoded inlineData.mimeType: "image/png" in function responses with the parsed MIME type
  • Kept compatibility fallback behavior:
    • raw/non-image inputs fall back to image/png
    • raw base64 screenshot inputs are still normalized to PNG data URLs
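The parsing step described above could look roughly like the following standalone sketch. It assumes a plain function rather than the private GoogleCUAClient method, and it splits the data URL header on ";" so that extra parameters (for example name=...) before ;base64 do not break MIME extraction. The PR does not say what payload the real code returns for a non-image data URL; this sketch keeps the text after the comma and labels it image/png, following the fallback bullet.

```typescript
interface ParsedScreenshot {
  mimeType: string;
  base64: string;
}

// Used whenever the input does not carry a usable image MIME type.
const FALLBACK_MIME = "image/png";

function parseScreenshotDataUrl(input: string): ParsedScreenshot {
  const trimmed = input.trim();
  // Split "data:<header>,<payload>" at the first comma.
  const match = /^data:([^,]*),([\s\S]*)$/i.exec(trimmed);
  if (!match) {
    // Raw base64 with no data URL prefix: keep it as-is and assume PNG.
    return { mimeType: FALLBACK_MIME, base64: trimmed };
  }
  const header = match[1] ?? "";
  const payload = match[2] ?? "";
  // header is e.g. "image/jpeg;base64" or "image/png;name=shot.png;base64".
  const parts = header.split(";").map((p) => p.trim().toLowerCase());
  const mime = parts[0] ?? "";
  const isBase64 = parts.slice(1).includes("base64");
  if (!mime.startsWith("image/") || !isBase64) {
    // Non-image (or non-base64) data URL: fall back to PNG.
    return { mimeType: FALLBACK_MIME, base64: payload };
  }
  // Normalize the common but nonstandard "image/jpg" alias.
  return { mimeType: mime === "image/jpg" ? "image/jpeg" : mime, base64: payload };
}

console.log(parseScreenshotDataUrl("data:image/jpg;base64,/9j/4AAQ"));
// → { mimeType: 'image/jpeg', base64: '/9j/4AAQ' }
```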

tests

Added packages/core/tests/unit/google-cua-client.test.ts covering:

  • data URL passthrough in captureScreenshot
  • PNG fallback for raw base64 input
  • MIME extraction (image/jpg normalized to image/jpeg)
  • fallback parsing for non-image data URLs

Validation run:

  • npm.cmd exec prettier -- --write packages/core/lib/v3/agent/GoogleCUAClient.ts packages/core/tests/unit/google-cua-client.test.ts
  • node node_modules/vitest/vitest.mjs run --config .tmp-vitest-unit-config.mjs (targeting google-cua-client.test.ts and safety-confirmation.test.ts)

Summary by cubic

Google CUA now parses screenshot data URLs to preserve the real image MIME (jpeg, png, etc.) and send matching inline data. This prevents payload/metadata drift for non-PNG screenshots (closes #2046).

  • Bug Fixes
    • Parse screenshot data URLs to extract MIME and base64; normalize image/jpg to image/jpeg.
    • Use the parsed MIME in function responses; keep PNG as fallback for raw/non-image inputs; preserve/normalize data URLs in screenshot capture.

Written for commit bea0c24. Summary will update on new commits.

changeset-bot (bot) commented Apr 24, 2026

⚠️ No Changeset found

Latest commit: bea0c24

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

github-actions (bot) added the labels external-contributor (tracks PRs mirrored from external contributor forks) and external-contributor:awaiting-approval (waiting for a stagehand team member to approve the latest external commit) on Apr 24, 2026.

cubic-dev-ai (bot) left a comment


1 issue found across 2 files

Confidence score: 3/5

  • There is a concrete medium-severity risk in packages/core/lib/v3/agent/GoogleCUAClient.ts: overly strict data URL parsing can corrupt valid parameterized base64 URLs and lead to malformed screenshot payloads.
  • Given the 6/10 severity with high confidence (8/10), this looks user-impacting enough to warrant caution before merging, even though the scope appears limited to one client path.
  • Pay close attention to packages/core/lib/v3/agent/GoogleCUAClient.ts - ensure data URL handling preserves valid parameterized base64 inputs without altering payload integrity.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/agent/GoogleCUAClient.ts">

<violation number="1" location="packages/core/lib/v3/agent/GoogleCUAClient.ts:982">
P2: Data URL parsing is too strict and can corrupt valid parameterized base64 data URLs, producing malformed screenshot payloads.</violation>
</file>
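To see why the check is called too strict, here is a standalone sketch of normalizeScreenshotDataUrl as shown in the review context. The regex is copied from the diff; the branch for non-matching input is an assumption based on the PR description (raw base64 is wrapped as a PNG data URL). RFC 2397 allows ;attribute=value parameters, and even an empty media type, before ;base64. Such inputs fail the [^;]+ pattern, so an already valid data URL receives a second prefix.

```typescript
// Regex copied from the PR diff under review.
const STRICT_DATA_URL = /^data:[^;]+;base64,/i;

// Sketch of the reviewed normalizeScreenshotDataUrl. ASSUMPTION: input that
// fails the check is treated as raw base64 and wrapped as a PNG data URL.
function strictNormalizeScreenshotDataUrl(imageData: string): string {
  const trimmed = imageData.trim();
  if (STRICT_DATA_URL.test(trimmed)) {
    return trimmed;
  }
  return `data:image/png;base64,${trimmed}`;
}

// Valid per RFC 2397: a parameter sits between the media type and ";base64".
const parameterized = "data:image/png;name=shot.png;base64,iVBORw0KGgo";
console.log(strictNormalizeScreenshotDataUrl(parameterized));
// → data:image/png;base64,data:image/png;name=shot.png;base64,iVBORw0KGgo
```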
Architecture diagram
sequenceDiagram
    participant App as Agent/Orchestrator
    participant CUA as GoogleCUAClient
    participant SP as ScreenshotProvider
    participant Gemini as Gemini API

    Note over App, Gemini: Google Computer Use (CUA) Flow

    App->>CUA: handleToolCall()
    
    rect rgb(240, 240, 240)
        Note right of CUA: Screenshot Capture Phase
        CUA->>CUA: captureScreenshot()
        CUA->>SP: call provider()
        SP-->>CUA: return image data (base64 or data URL)
        
        CUA->>CUA: CHANGED: normalizeScreenshotDataUrl()
        Note right of CUA: If raw base64, wrap in PNG data URL.<br/>If existing data URL, preserve it.
    end

    rect rgb(240, 240, 240)
        Note right of CUA: Data Preparation Phase
        CUA->>CUA: NEW: parseScreenshotDataUrl()
        
        alt Data URL contains image MIME
            CUA->>CUA: Extract mimeType and base64 payload
            opt mimeType is image/jpg
                CUA->>CUA: CHANGED: Normalize to image/jpeg
            end
        else Raw or non-image data URL
            CUA->>CUA: Fallback to image/png
        end
    end

    CUA->>Gemini: POST generateContent (FunctionResponse)
    Note right of CUA: NEW: inlineData.mimeType now reflects<br/>the actual source image type
    Gemini-->>CUA: Model Response
    CUA-->>App: Result

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.


private normalizeScreenshotDataUrl(imageData: string): string {
  const trimmedImageData = imageData.trim();
  if (/^data:[^;]+;base64,/i.test(trimmedImageData)) {

cubic-dev-ai (bot) commented Apr 24, 2026


P2: Data URL parsing is too strict and can corrupt valid parameterized base64 data URLs, producing malformed screenshot payloads.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/GoogleCUAClient.ts, line 982:

<comment>Data URL parsing is too strict and can corrupt valid parameterized base64 data URLs, producing malformed screenshot payloads.</comment>

<file context>
@@ -979,6 +977,46 @@ export class GoogleCUAClient extends AgentClient {
 
+  private normalizeScreenshotDataUrl(imageData: string): string {
+    const trimmedImageData = imageData.trim();
+    if (/^data:[^;]+;base64,/i.test(trimmedImageData)) {
+      return trimmedImageData;
+    }
</file context>
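One possible fix, sketched under the same assumption about the fallback branch: accept any header up to the first comma as long as it ends in ;base64. The constant and function names are hypothetical, and this is neither the author's patch nor a fix proposed by cubic.

```typescript
// Any header (media type plus optional ";attribute=value" parameters)
// up to the first comma, as long as it ends in ";base64".
const BASE64_DATA_URL = /^data:[^,]*;base64,/i;

// HYPOTHETICAL lenient variant of normalizeScreenshotDataUrl.
function lenientNormalizeScreenshotDataUrl(imageData: string): string {
  const trimmed = imageData.trim();
  if (BASE64_DATA_URL.test(trimmed)) {
    // Already a base64 data URL (with or without parameters): keep it.
    return trimmed;
  }
  // ASSUMPTION (as in the PR description): raw base64 is treated as PNG.
  return `data:image/png;base64,${trimmed}`;
}

console.log(lenientNormalizeScreenshotDataUrl("data:image/png;name=shot.png;base64,iVBORw0KGgo"));
// → data:image/png;name=shot.png;base64,iVBORw0KGgo
console.log(lenientNormalizeScreenshotDataUrl("iVBORw0KGgo"));
// → data:image/png;base64,iVBORw0KGgo
```

Non-base64 data URLs (for example an SVG sent as plain text) would still be wrapped by this sketch; handling those goes beyond what the review raises.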



Development

Successfully merging this pull request may close these issues.

core(cua): Google function-response image handling hardcodes PNG mimeType
