fix: Route visual grounding fallback through a dedicated feature and prompt by droid-ash · Pull Request #87 · final-run/finalrun-agent

droid-ash · 2026-04-14T00:33:26Z

Summary

When the grounder returns needsVisualGrounding: true, the fallback was re-calling FEATURE_GROUNDER with the hierarchy stripped. That prompt is built around matching ui_elements by index, so the call reliably fails on canvas/WebView targets — the tap never happens, the screen doesn't change, and the planner correctly re-plans the same act. Result: a 110-iteration loop on Visual grounding failed.

This PR introduces a dedicated FEATURE_VISUAL_GROUNDER with a simpler coordinate-only prompt. VisualGrounder now calls the new feature instead of reusing the primary grounder, so the fallback gets a prompt designed to return {x, y, reason} from a screenshot + description.

Changes

packages/common/src/constants.ts — add FEATURE_VISUAL_GROUNDER = 'visual-grounder'.
packages/goal-executor/src/prompts/visual-grounder.md — new minimal prompt: act + screenshot in, {x, y, reason} or {isError, reason} out.
packages/goal-executor/src/ai/AIAgent.ts — import the new constant; route it to the new prompt in _getPromptKeyForFeature.
packages/goal-executor/src/ai/VisualGrounder.ts — switch feature from FEATURE_GROUNDER to FEATURE_VISUAL_GROUNDER; drop the stale "no hierarchy" trick comment.
packages/goal-executor/src/ActionExecutor.test.ts — existing visual-fallback test now asserts the feature name on both calls (FEATURE_GROUNDER for primary, FEATURE_VISUAL_GROUNDER for fallback).

Test plan

npm test --workspace=packages/goal-executor — the three visual-grounding tests pass (records visual fallback in trace, preserves ground failure when no fallback, surfaces terminal provider failures from fallback). Three unrelated Gemini-thinking-level tests fail on main too — not touched here.
End-to-end: pick a test targeting a canvas/WebView element missing from the hierarchy; confirm one iteration succeeds instead of looping 110 times. Logs should show action.ground (feature=grounder) followed by action.visual_fallback (feature=visual-grounder).
Error path: mock the visual grounder to return {isError: true}; confirm the step fails once with a clear reason instead of spinning.

Follow-up

Backend model-config should get an entry for the visual-grounder feature so it's independently tunable. Until then, unknown features fall back to the default model config on the backend, which still works but isn't independently configurable.

Summary by CodeRabbit

Release Notes

New Features
- Added visual grounder capability to identify and locate UI elements within screenshots with precise pixel coordinates for improved automation accuracy.

The fallback was re-calling FEATURE_GROUNDER with the hierarchy stripped, which re-uses a prompt built around matching ui_elements. When an element is visible in the screenshot but missing from the hierarchy, that call reliably fails, the tap never happens, and the planner correctly re-plans the same act -- producing a 110-iteration loop on Visual grounding failed. Introduce FEATURE_VISUAL_GROUNDER with a simpler coordinate-only prompt (act + screenshot -> {x, y, reason}). VisualGrounder now calls the new feature instead of reusing the primary grounder.

coderabbitai · 2026-04-14T00:33:33Z

📝 Walkthrough

Walkthrough

Introduces a new visual grounder feature constant and wires it through the goal executor system. The constant FEATURE_VISUAL_GROUNDER is added, imported across test and AI agent components, mapped to appropriate prompt keys, and utilized in the visual fallback grounding workflow. A new visual coordinate locator prompt is added.

Changes

Cohort / File(s)	Summary
Feature Constant Definition `packages/common/src/constants.ts`	Added new exported constant `FEATURE_VISUAL_GROUNDER` with value `'visual-grounder'`.
AIAgent Integration `packages/goal-executor/src/ai/AIAgent.ts`	Imported `FEATURE_VISUAL_GROUNDER` and extended `_getPromptKeyForFeature()` and `ground()` methods to handle the new feature constant with appropriate prompt key mapping.
VisualGrounder Implementation `packages/goal-executor/src/ai/VisualGrounder.ts`	Updated fallback grounding call to pass `FEATURE_VISUAL_GROUNDER` instead of `FEATURE_GROUNDER` to `AIAgent.ground()`. Removed inline comments about hierarchy behavior.
Test Assertions `packages/goal-executor/src/ActionExecutor.test.ts`	Added imports and explicit assertions verifying that visual fallback paths use `FEATURE_VISUAL_GROUNDER` while initial grounding uses `FEATURE_GROUNDER`.
Visual Grounder Prompt `packages/goal-executor/src/prompts/visual-grounder.md`	New prompt file defining instructions for visual coordinate locator that identifies UI elements in screenshots and returns pixel-center coordinates or error responses.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

bug

Suggested reviewers

arnoldlaishram

Poem

🐰 A grounder for visuals hops into the light,
With coordinates precise and screenshots bright,
Through prompt files and tests, the feature takes flight,
Constants and mappings aligned just right,
The rabbit declares: "Vision's working tonight!" ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: introducing a dedicated feature and prompt to handle visual grounding fallbacks, which is the core problem-solution pair in this PR.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch worktree-wiggly-percolating-willow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

droid-ash · 2026-04-14T02:44:50Z

@coderabbitai review

coderabbitai · 2026-04-14T02:45:01Z

@droid-ash: Sure, I'll review the changes in this PR right away!

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

🧹 Nitpick comments (1)

packages/goal-executor/src/prompts/visual-grounder.md (1)

20-22: Unify the output contract wording to a single JSON shape.

Line 20–22 describes top-level {x, y, reason} / {isError: true, reason}, while Line 29–37 requires {"output": {...}}. Keeping one canonical shape will reduce model drift.

Proposed prompt wording adjustment

-- If the element is clearly visible: return `{x, y, reason}` with the center coordinates.
-- If the element is not visible at all: return `{isError: true, reason}`. Do not guess coordinates.
+- If the element is clearly visible: return `{"output":{"x":<int>,"y":<int>,"reason":"..."}}`.
+- If the element is not visible at all: return `{"output":{"isError":true,"reason":"..."}}`. Do not guess coordinates.

Also applies to: 29-37

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/goal-executor/src/prompts/visual-grounder.md` around lines 20 - 22,
Unify the response contract in visual-grounder.md so the prompt always expects a
single JSON top-level shape: return {"output": {...}}; inside "output" include
either {x, y, reason} for a visible element or {isError: true, reason} when not
visible or uncertain. Update all guidance sentences (the earlier lines that
mention `{x, y, reason}` / `{isError: true, reason}` and the later lines that
reference `{"output": {...}}`) to use this canonical shape and wording, and
ensure examples and instructions consistently reference the "output" wrapper and
the two allowed inner objects.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/goal-executor/src/prompts/visual-grounder.md`:
- Around line 20-22: Unify the response contract in visual-grounder.md so the
prompt always expects a single JSON top-level shape: return {"output": {...}};
inside "output" include either {x, y, reason} for a visible element or {isError:
true, reason} when not visible or uncertain. Update all guidance sentences (the
earlier lines that mention `{x, y, reason}` / `{isError: true, reason}` and the
later lines that reference `{"output": {...}}`) to use this canonical shape and
wording, and ensure examples and instructions consistently reference the
"output" wrapper and the two allowed inner objects.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 77479e21-176c-460d-ac38-f7e38af41b59

📥 Commits

Reviewing files that changed from the base of the PR and between c45216c and 2f96873.

📒 Files selected for processing (5)

packages/common/src/constants.ts
packages/goal-executor/src/ActionExecutor.test.ts
packages/goal-executor/src/ai/AIAgent.ts
packages/goal-executor/src/ai/VisualGrounder.ts
packages/goal-executor/src/prompts/visual-grounder.md

droid-ash requested a review from arnoldlaishram April 14, 2026 02:44

droid-ash self-assigned this Apr 14, 2026

coderabbitai bot reviewed Apr 14, 2026

View reviewed changes

droid-ash marked this pull request as ready for review April 14, 2026 20:42

droid-ash added the bug Something isn't working label Apr 14, 2026

arnoldlaishram approved these changes Apr 15, 2026

View reviewed changes

arnoldlaishram merged commit 606b19f into main Apr 15, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Route visual grounding fallback through a dedicated feature and prompt#87

fix: Route visual grounding fallback through a dedicated feature and prompt#87
arnoldlaishram merged 1 commit intomainfrom
worktree-wiggly-percolating-willow

droid-ash commented Apr 14, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 14, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

droid-ash commented Apr 14, 2026

Uh oh!

coderabbitai bot commented Apr 14, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

droid-ash commented Apr 14, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Follow-up

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

droid-ash commented Apr 14, 2026

Uh oh!

coderabbitai bot commented Apr 14, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

droid-ash commented Apr 14, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 14, 2026 •

edited

Loading