ci(ui-preview-smoke): wire MCP via .mcp.json and post via workflow steps #2242
teeohhem wants to merge 3 commits into
The first PR run hit two real bugs:

1. claude-code-action@v1 has no `mcp_servers` input. The action warned and ignored the block, so the Playwright MCP server was never registered. Most of the 26 permission_denials in that run were the agent attempting to call mcp__playwright__* tools that didn't exist.
2. The action does NOT auto-post structured output as a PR comment. claude-code-review.yml does it via three follow-up steps (peter-evans/find-comment → jq extract → create-or-update-comment). Skipping that wiring is why no comment appeared on the PR even though the agent returned a summary.

Fixes:
- Drop the broken `mcp_servers` input. Add a step that writes `.mcp.json` at the working-directory root before the agent runs. The action auto-loads it via `enableAllProjectMcpServers: true` (visible in last run's log).
- Add `id: agent` to the action step so subsequent steps can read its `structured_output`.
- Add the same find-comment / jq-extract / create-or-update-comment trio used by claude-code-review.yml, including the defensive double-unwrap for cases where the model nests its own JSON.
- Update prompt: explicitly forbid the agent from posting comments itself, require the `<!-- ui-preview-smoke -->` marker on the first line of `summary` so the comment is sticky across runs, and put the skip text in `summary` instead of telling the agent to post it directly.
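The `.mcp.json` step described in the first fix could look roughly like the sketch below. It is an illustration, not the PR's actual step: the `npx` launch command and the helper name `write_mcp_config` are assumptions, while the `playwright` server key is inferred from the `mcp__playwright__*` tool prefix and `@playwright/mcp@latest` from the later note that the package was pinned from `@latest`.

```bash
#!/usr/bin/env bash
# Sketch only: write a project-level .mcp.json before the agent step runs.
# Assumptions (not confirmed by this PR): launching the server through `npx`
# and the helper name write_mcp_config. The "playwright" key is chosen to
# match the mcp__playwright__* tool prefix mentioned above.
write_mcp_config() {
  local dir="${1:-.}"
  cat > "${dir}/.mcp.json" <<'JSON'
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
JSON
}
```

A workflow step would call `write_mcp_config .` from the working-directory root, before the agent step, so the file is in place when the action loads project MCP servers.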
|
🟢 Tier 1: Trivial
Docs, images, lock files, or a dependency bump. No functional code changes detected.
Why this tier:
Review process: Auto-merge once CI passes. No human review required.
Stats
|
PR Review
✅ No critical issues found. The fix correctly addresses both bugs identified in the PR description:
Defensive measures look solid for a
Minor (non-blocking):
|
Deep Review
🔴 P0/P1 -- must fix
🟡 P2 -- recommended
🔵 P3 nitpicks (3)
Reviewers (10): correctness, testing, maintainability, project-standards, agent-native, learnings-researcher, security, reliability, adversarial, previous-comments.
Testing gaps:
|
E2E Test Results
✅ All tests passed • 163 passed • 3 skipped • 1165s
Tests ran across 4 shards in parallel.
|
Today the agent itself decides whether to skip — but only after the
workflow waits up to 10 min for Vercel, installs Node + Playwright +
Chromium, and spins up the agent (~2 min, ~$0.85). For PRs where the
author didn't fill in the "How to test on Vercel preview" template,
this is ~7 min and ~$1 to land a one-line skip comment.
Add a github-script pre-flight that parses /tmp/pr-body.md (already
fetched by the previous step) and gates the expensive setup behind
`has_plan == 'true'`. Required for `has_plan == 'true'`:
- "### How to test on Vercel preview" heading present
- Section non-empty after HTML comments are stripped
- Section not marked "N/A" or "non-UI change"
- "**Preview routes:**" line has non-empty content after the colon
- "**Steps:**" block has at least one numbered item with content
(catches the empty `1.\n2.\n3.` template placeholder)
When the gate fails, post the skip comment immediately via
peter-evans/create-or-update-comment using the same `<!-- ui-preview-smoke -->`
sticky marker. Move the find-comment step above both branches so it's
shared. Total runtime for no-plan PRs drops to ~30s and ~$0.
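
The five gate conditions above can be sketched in bash for illustration. The commit implements the check in github-script, so this is a translation under stated assumptions rather than the workflow's code: the function names are hypothetical, and the reading of "marked N/A" (a line starting with N/A, or the phrase "non-UI change" anywhere in the section) is a guess at the intended rule.

```bash
#!/usr/bin/env bash
# Sketch of the has_plan gate. The real step uses github-script; this bash
# version and all function names are illustrative assumptions.

# Remove HTML comments, including multi-line ones, from a string.
strip_html_comments() {
  local s="$1" open='<!--' close='-->' pre tail
  while [[ "$s" == *"$open"*"$close"* ]]; do
    pre=${s%%"$open"*}         # text before the first comment opener
    tail=${s#*"$open"}         # text after that opener
    s=${pre}${tail#*"$close"}  # drop everything through its closer
  done
  printf '%s' "$s"
}

# Print "true" if the PR body holds a filled-in test plan, otherwise "false".
has_plan() {
  local cr=$'\r' line section='' in_section=0 in_steps=0
  local body=${1//"$cr"/}  # PR bodies may arrive with CRLF line endings
  local heading_re='^###[[:space:]]+How to test on Vercel preview[[:space:]]*$'
  local next_heading_re='^#+[[:space:]]'
  local item_re='^[[:space:]]*[0-9]+\.[[:space:]]+[^[:space:]]'

  # Collect the lines between the plan heading and the next heading.
  while IFS= read -r line; do
    if (( in_section )); then
      [[ "$line" =~ $next_heading_re ]] && break
      section+="${line}"$'\n'
    elif [[ "$line" =~ $heading_re ]]; then
      in_section=1
    fi
  done <<< "$body"
  (( in_section )) || { echo false; return; }

  section="$(strip_html_comments "$section")"
  # Must contain something other than whitespace.
  grep -q '[^[:space:]]' <<< "$section" || { echo false; return; }
  # Assumed reading of "marked N/A" (case-insensitive).
  if grep -Eiq '^[[:space:]]*n/a([^[:alnum:]]|$)|non-ui change' <<< "$section"; then
    echo false; return
  fi
  # "**Preview routes:**" needs content on the same line.
  grep -Eq '\*\*Preview routes:\*\*[[:space:]]*[^[:space:]]' <<< "$section" \
    || { echo false; return; }

  # "**Steps:**" must be followed by a numbered item that has text.
  while IFS= read -r line; do
    if (( in_steps )); then
      [[ "$line" =~ $item_re ]] && { echo true; return; }
    elif [[ "$line" == *'**Steps:**'* ]]; then
      in_steps=1
    fi
  done <<< "$section"
  echo false
}
```

Calling `has_plan "$(cat /tmp/pr-body.md)"` prints `true` or `false`, which a step could forward as its `has_plan` output. The routes check uses line-oriented `grep` on purpose, so an empty `**Preview routes:**` line cannot borrow content from the line below it.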
|
Pushed
1. Origin extraction silently passing through (P0/P1)
Now: extract via
While I was here, I checked the
2. Empty / non-JSON structured_output silently skipped post (P0/P1)
Added
3. Plan-check throw leaves no signal (P0/P1)
If
Added a final fallback step gated on
4. Vercel timeout silently skips (P0/P1)
Added
Coverage check
Walked each failure mode through the conditions; exactly one comment poster fires per scenario:
All fallback
|
|
Pushed
Findings closed
P1/P2:
P3 (correctness):
Final failure-mode matrix
Each of these scenarios fires exactly one poster, all sharing the same
Acknowledged residuals (not closed in this PR)
Reviewer reports themselves indicated no exploitable P0 paths against the latest pre-fix state and confirmed mutual exclusivity of the comment posters. The remaining residuals are documented as out-of-scope.
|
Resolves all six findings on the ui-preview-smoke workflow with a disciplined +79-line diff (file lands at 397 lines).

F1: heredoc injection. Replaced the static UI_SMOKE_EOF delimiter with a per-run random `EOF_$(openssl rand -hex 16)`.

F2: broad MCP / gh access. Pinned @playwright/mcp@0.0.75 (was @latest). Added `--allowed-origins=${ORIGIN}` from the validated Vercel URL. Dropped `Bash(gh pr view:*)` from the agent's allowlist entirely; the prompt routes the agent to /tmp/pr-body.md, so gh was never used. Documented inline that the package's own README says --allowed-origins "is not a security boundary"; that's a residual we accept.

F3: silent sed pass-through. Replaced sed with a bash regex match plus explicit abort:
    [[ ! "$VERCEL_URL" =~ ^(https?://[^/]+) ]] || ORIGIN=...
which produces a `::error::` workflow annotation on bad input rather than silently degrading to an empty allowlist.

F4: empty summary silently skipped. Added `set -euo pipefail` and `|| SUMMARY=''` to the extract step so any malformed structured_output deterministically yields an empty summary. That case is then handled by F5/F6's fallback poster.

F5: plan-check throw, no signal. Added one consolidated "Post infrastructure-failure comment" step that fires `if: always() &&` when plan threw OR Vercel didn't produce a usable preview OR the agent extract produced no summary. Single sticky comment (`<!-- ui-preview-smoke -->`), links to the workflow run for the actual reason. Avoids the five-different-fallback-posters trap from the prior attempt.

F6: Vercel timeout, no fallback. `continue-on-error: true` on the Vercel step. Downstream steps (Setup Node, Install Playwright, Write MCP, Run agent) gated on `steps.vercel.outcome == 'success'`. The consolidated poster covers the failure surface.

What I deliberately did NOT do:
- Preprocess the PR body into a strict route/step JSON before the agent reads it. The agent's output is already constrained by the JSON schema; preprocessing would require a contract change between workflow and prompt and was the path that ballooned the previous attempt to +543 lines.
- Add per-failure-mode posters. One generic poster + a workflow-run link is sufficient; the run page is the source of truth for what specifically broke.
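
Two of the shell-level fixes above, F3's origin check and F1's per-run delimiter, can be sketched as small functions. This is a minimal sketch and not the workflow's code: the function names, the error message text, the output-file argument, and the `od` fallback for machines without `openssl` are all assumptions added for illustration. The `name<<DELIM` layout assumes the delimiter is used for a multi-line GitHub Actions output, which the commit message does not state explicitly.

```bash
#!/usr/bin/env bash
# Sketch of F1 and F3. Function names, message text, the output-file
# argument, and the od fallback are illustrative assumptions.

# F3: print the scheme+host origin of a URL, or fail loudly on bad input.
extract_origin() {
  local url="$1" origin_re='^(https?://[^/]+)'
  if [[ ! "$url" =~ $origin_re ]]; then
    # stderr keeps stdout clean for the caller's command substitution.
    echo "::error::VERCEL_URL is not an http(s) URL: '${url}'" >&2
    return 1
  fi
  printf '%s\n' "${BASH_REMATCH[1]}"
}

# F1: append a multi-line value to a file in `name<<DELIM` form, using a
# delimiter that is random per run so the value cannot close it early.
write_multiline_output() {
  local name="$1" value="$2" out_file="$3" suffix delim
  suffix="$(openssl rand -hex 16 2>/dev/null)" \
    || suffix="$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')"
  delim="EOF_${suffix}"
  {
    printf '%s<<%s\n' "$name" "$delim"
    printf '%s\n' "$value"
    printf '%s\n' "$delim"
  } >> "$out_file"
}
```

A caller would write `ORIGIN="$(extract_origin "$VERCEL_URL")" || exit 1`, so a malformed URL stops the step instead of producing an empty allowlist. With the random delimiter, a value that happens to contain the old static `UI_SMOKE_EOF` string is written through unchanged.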
f40f44b to 3e03b8f
Restart with disciplined resolution
I rebased the branch back to
The maximalist version is preserved at the local git tag
Per-finding resolution:
F1: heredoc injection. Per-run
F2: broad MCP / gh access.
I deliberately did NOT preprocess the PR body into strict route/step JSON before the agent sees it. The agent's output is already schema-constrained, and preprocessing was the path that ballooned the prior attempt (see retrospective below).
F3: silent sed pass-through. Replaced sed with bash regex + explicit abort:
`[[ ! "$VERCEL_URL" =~ ^(https?://[^/]+) ]] && exit 1`
Pinned MCP version (per F2) handles the second half.
F4: empty summary silently skipped.
F5: plan-check throw, no signal. ONE consolidated
F6: Vercel timeout, no fallback.
Retrospective on why this is the second attempt
The original resolution went through 5 review rounds with cumulative +543 lines because each round's "fix" was defensive code, not a triage decision. By round 4 the validator chain was a moving target; round 4's adversarial reviewer recommended a contract restructure (agent emits booleans, workflow renders template) which closed several classes of vulnerabilities but added even more code. Net result: ~600-line file solving a smoke-test problem.
Going forward: review feedback gets handed to
|
|
Closing unmerged.
Stepping back: the agent-driven UI smoke job overlaps significantly with the existing Playwright e2e suite (
Maintenance cost of the workflow has been real: 5 review rounds, ~543 added lines in the maximalist version (then trimmed to +79 with
Following up with a revert PR for
|
Summary
The first run of `ui-preview-smoke` (on PR #2241) showed two bugs that combined to make the workflow appear successful while doing nothing useful:

1. `mcp_servers` is not a valid input on `claude-code-action@v1`. GitHub Actions warned about this in the log and silently ignored the block. The Playwright MCP server was never registered, so `mcp__playwright__*` tools didn't exist. Most of the run's 26 `permission_denials_count` were the agent flailing trying to call Playwright tools that weren't there.
2. The action does not auto-post structured output as a PR comment. `claude-code-review.yml` posts its review via three follow-up steps after the action: `peter-evans/find-comment` → extract via `jq` → `peter-evans/create-or-update-comment`. I skipped that wiring assuming the action would post on its own. It doesn't, even when the agent returns a perfectly-formed `summary` field.

Net effect: the agent ran for 2 minutes, returned a structured output, hit 26 permission denials, and posted nothing on PR #2241, including the "skipped, please add a test plan" comment that should have fired.
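
The `jq` extraction in the middle of that three-step chain might look like the sketch below. It is a guess at the shape, not a copy of `claude-code-review.yml`: it assumes the structured output is a JSON object with a string `summary` field, and that a nested case means `summary` itself holds a JSON object with its own `summary`. The function name `extract_summary` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: pull the comment body out of the agent's structured output.
# Assumptions: the output is a JSON object with a `summary` string, and a
# "nested" output is one whose summary is itself JSON with a `summary` key.
extract_summary() {
  local structured_output="$1" summary
  summary="$(jq -r '
    (.summary // "") as $s
    | try ($s | fromjson
           | if type == "object" and has("summary") then .summary else $s end)
      catch $s
  ' <<< "$structured_output" 2>/dev/null)" || summary=''
  printf '%s' "$summary"
}
```

An empty result means there is nothing to post, so a following comment step would be gated on the value being non-empty. Falling back to an empty string on any `jq` failure keeps malformed output from aborting the step.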
Fix
- Drop the broken `mcp_servers:` input. Add a step that writes `.mcp.json` at the working-directory root before the agent runs. The action auto-loads it via `enableAllProjectMcpServers: true` (visible in the last run's log: `Updated settings with enableAllProjectMcpServers: true`).
- Add `id: agent` to the agent step so subsequent steps can read its `structured_output`.
- Add the same find-comment / jq-extract / create-or-update-comment trio that `claude-code-review.yml` uses, including the defensive double-unwrap for the case where the model nests its own JSON.
- Update the prompt: explicitly forbid the agent from posting comments itself, require the `<!-- ui-preview-smoke -->` marker on the first line of `summary` so the comment is sticky across runs, and put the skip text inside the `summary` field instead of asking the agent to post it directly.
How to test on Vercel preview
N/A — CI workflow change.
Verifying the fix
After merge, the next push to a PR matching `packages/app/**` should:
`<!-- ui-preview-smoke --> ## UI Preview Smoke … Skipped: …` comment if the PR body has no `How to test on Vercel preview` plan, or
If you want to dry-run before that, use `workflow_dispatch`:
PR #2241 is a good target: it has no test plan, so the expected output is the skip comment.
References
`claude-code-review.yml` posting pattern (ci(deep-review): restructure review output for scannability #2230)