Skip to content

fix: Route visual grounding fallback through a dedicated feature and prompt#87

Merged
arnoldlaishram merged 1 commit intomainfrom
worktree-wiggly-percolating-willow
Apr 15, 2026
Merged

fix: Route visual grounding fallback through a dedicated feature and prompt#87
arnoldlaishram merged 1 commit intomainfrom
worktree-wiggly-percolating-willow

Conversation

@droid-ash
Copy link
Copy Markdown
Contributor

@droid-ash droid-ash commented Apr 14, 2026

Summary

When the grounder returns needsVisualGrounding: true, the fallback was re-calling FEATURE_GROUNDER with the hierarchy stripped. That prompt is built around matching ui_elements by index, so the call reliably fails on canvas/WebView targets — the tap never happens, the screen doesn't change, and the planner correctly re-plans the same act. Result: a 110-iteration loop on Visual grounding failed.

This PR introduces a dedicated FEATURE_VISUAL_GROUNDER with a simpler coordinate-only prompt. VisualGrounder now calls the new feature instead of reusing the primary grounder, so the fallback gets a prompt designed to return {x, y, reason} from a screenshot + description.

Changes

  • packages/common/src/constants.ts — add FEATURE_VISUAL_GROUNDER = 'visual-grounder'.
  • packages/goal-executor/src/prompts/visual-grounder.md — new minimal prompt: act + screenshot in, {x, y, reason} or {isError, reason} out.
  • packages/goal-executor/src/ai/AIAgent.ts — import the new constant; route it to the new prompt in _getPromptKeyForFeature.
  • packages/goal-executor/src/ai/VisualGrounder.ts — switch feature from FEATURE_GROUNDER to FEATURE_VISUAL_GROUNDER; drop the stale "no hierarchy" trick comment.
  • packages/goal-executor/src/ActionExecutor.test.ts — existing visual-fallback test now asserts the feature name on both calls (FEATURE_GROUNDER for primary, FEATURE_VISUAL_GROUNDER for fallback).

Test plan

  • npm test --workspace=packages/goal-executor — the three visual-grounding tests pass (records visual fallback in trace, preserves ground failure when no fallback, surfaces terminal provider failures from fallback). Three unrelated Gemini-thinking-level tests fail on main too — not touched here.
  • End-to-end: pick a test targeting a canvas/WebView element missing from the hierarchy; confirm one iteration succeeds instead of looping 110 times. Logs should show action.ground (feature=grounder) followed by action.visual_fallback (feature=visual-grounder).
  • Error path: mock the visual grounder to return {isError: true}; confirm the step fails once with a clear reason instead of spinning.

Follow-up

Backend model-config should get an entry for the visual-grounder feature so it's independently tunable. Until then, unknown features fall back to the default model config on the backend, which still works but isn't independently configurable.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added visual grounder capability to identify and locate UI elements within screenshots with precise pixel coordinates for improved automation accuracy.

The fallback was re-calling FEATURE_GROUNDER with the hierarchy stripped,
which re-uses a prompt built around matching ui_elements. When an element
is visible in the screenshot but missing from the hierarchy, that call
reliably fails, the tap never happens, and the planner correctly re-plans
the same act -- producing a 110-iteration loop on Visual grounding failed.

Introduce FEATURE_VISUAL_GROUNDER with a simpler coordinate-only prompt
(act + screenshot -> {x, y, reason}). VisualGrounder now calls the new
feature instead of reusing the primary grounder.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

Introduces a new visual grounder feature constant and wires it through the goal executor system. The constant FEATURE_VISUAL_GROUNDER is added, imported across test and AI agent components, mapped to appropriate prompt keys, and utilized in the visual fallback grounding workflow. A new visual coordinate locator prompt is added.

Changes

Cohort / File(s) Summary
Feature Constant Definition
packages/common/src/constants.ts
Added new exported constant FEATURE_VISUAL_GROUNDER with value 'visual-grounder'.
AIAgent Integration
packages/goal-executor/src/ai/AIAgent.ts
Imported FEATURE_VISUAL_GROUNDER and extended _getPromptKeyForFeature() and ground() methods to handle the new feature constant with appropriate prompt key mapping.
VisualGrounder Implementation
packages/goal-executor/src/ai/VisualGrounder.ts
Updated fallback grounding call to pass FEATURE_VISUAL_GROUNDER instead of FEATURE_GROUNDER to AIAgent.ground(). Removed inline comments about hierarchy behavior.
Test Assertions
packages/goal-executor/src/ActionExecutor.test.ts
Added imports and explicit assertions verifying that visual fallback paths use FEATURE_VISUAL_GROUNDER while initial grounding uses FEATURE_GROUNDER.
Visual Grounder Prompt
packages/goal-executor/src/prompts/visual-grounder.md
New prompt file defining instructions for visual coordinate locator that identifies UI elements in screenshots and returns pixel-center coordinates or error responses.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

bug

Suggested reviewers

  • arnoldlaishram

Poem

🐰 A grounder for visuals hops into the light,
With coordinates precise and screenshots bright,
Through prompt files and tests, the feature takes flight,
Constants and mappings aligned just right,
The rabbit declares: "Vision's working tonight!" ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: introducing a dedicated feature and prompt to handle visual grounding fallbacks, which is the core problem-solution pair in this PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-wiggly-percolating-willow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@droid-ash droid-ash self-assigned this Apr 14, 2026
@droid-ash
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 14, 2026

@droid-ash: Sure, I'll review the changes in this PR right away!

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/goal-executor/src/prompts/visual-grounder.md (1)

20-22: Unify the output contract wording to a single JSON shape.

Line 20–22 describes top-level {x, y, reason} / {isError: true, reason}, while Line 29–37 requires {"output": {...}}. Keeping one canonical shape will reduce model drift.

Proposed prompt wording adjustment
-- If the element is clearly visible: return `{x, y, reason}` with the center coordinates.
-- If the element is not visible at all: return `{isError: true, reason}`. Do not guess coordinates.
+- If the element is clearly visible: return `{"output":{"x":<int>,"y":<int>,"reason":"..."}}`.
+- If the element is not visible at all: return `{"output":{"isError":true,"reason":"..."}}`. Do not guess coordinates.

Also applies to: 29-37

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/goal-executor/src/prompts/visual-grounder.md` around lines 20 - 22,
Unify the response contract in visual-grounder.md so the prompt always expects a
single JSON top-level shape: return {"output": {...}}; inside "output" include
either {x, y, reason} for a visible element or {isError: true, reason} when not
visible or uncertain. Update all guidance sentences (the earlier lines that
mention `{x, y, reason}` / `{isError: true, reason}` and the later lines that
reference `{"output": {...}}`) to use this canonical shape and wording, and
ensure examples and instructions consistently reference the "output" wrapper and
the two allowed inner objects.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/goal-executor/src/prompts/visual-grounder.md`:
- Around line 20-22: Unify the response contract in visual-grounder.md so the
prompt always expects a single JSON top-level shape: return {"output": {...}};
inside "output" include either {x, y, reason} for a visible element or {isError:
true, reason} when not visible or uncertain. Update all guidance sentences (the
earlier lines that mention `{x, y, reason}` / `{isError: true, reason}` and the
later lines that reference `{"output": {...}}`) to use this canonical shape and
wording, and ensure examples and instructions consistently reference the
"output" wrapper and the two allowed inner objects.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 77479e21-176c-460d-ac38-f7e38af41b59

📥 Commits

Reviewing files that changed from the base of the PR and between c45216c and 2f96873.

📒 Files selected for processing (5)
  • packages/common/src/constants.ts
  • packages/goal-executor/src/ActionExecutor.test.ts
  • packages/goal-executor/src/ai/AIAgent.ts
  • packages/goal-executor/src/ai/VisualGrounder.ts
  • packages/goal-executor/src/prompts/visual-grounder.md

@droid-ash droid-ash marked this pull request as ready for review April 14, 2026 20:42
@droid-ash droid-ash added the bug Something isn't working label Apr 14, 2026
@arnoldlaishram arnoldlaishram merged commit 606b19f into main Apr 15, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants