Skip to content

ci(ui-preview-smoke): wire MCP via .mcp.json and post via workflow steps#2242

Closed
teeohhem wants to merge 3 commits into
mainfrom
teeohhem/ui-smoke-fix-mcp-and-comment
Closed

ci(ui-preview-smoke): wire MCP via .mcp.json and post via workflow steps#2242
teeohhem wants to merge 3 commits into
mainfrom
teeohhem/ui-smoke-fix-mcp-and-comment

Conversation

@teeohhem
Copy link
Copy Markdown
Contributor

@teeohhem teeohhem commented May 8, 2026

Summary

The first run of ui-preview-smoke (on PR #2241) showed two bugs that combined to make the workflow appear successful while doing nothing useful:

  1. mcp_servers is not a valid input on claude-code-action@v1. GitHub Actions warned about this in the log and silently ignored the block. The Playwright MCP server was never registered, so mcp__playwright__* tools didn't exist. Most of the run's 26 permission_denials_count were the agent flailing trying to call Playwright tools that weren't there.

  2. The action does not auto-post structured output as a PR comment. claude-code-review.yml posts its review via three follow-up steps after the action: peter-evans/find-comment → extract via jqpeter-evans/create-or-update-comment. I skipped that wiring assuming the action would post on its own. It doesn't — even when the agent returns a perfectly-formed summary field.

Net effect: the agent ran for 2 minutes, returned a structured output, hit 26 permission denials, and posted nothing on PR #2241 — including the "skipped, please add a test plan" comment that should have fired.

Fix

  • Drop the broken mcp_servers: input. Add a step that writes .mcp.json at the working-directory root before the agent runs. The action auto-loads it via enableAllProjectMcpServers: true (visible in the last run's log: Updated settings with enableAllProjectMcpServers: true).
  • Add id: agent to the agent step so subsequent steps can read its structured_output.
  • Add the find-comment / extract / create-or-update-comment trio that claude-code-review.yml uses, including the defensive double-unwrap for the case where the model nests its own JSON.
  • Tighten the prompt: explicitly forbid the agent from posting comments itself, require the <!-- ui-preview-smoke --> marker on the first line of summary so the comment is sticky across runs, and put the skip text inside the summary field instead of asking the agent to post it directly.

How to test on Vercel preview

N/A — CI workflow change.

Verifying the fix

After merge, the next push to a PR matching packages/app/** should:

  • Either post a <!-- ui-preview-smoke --> ## UI Preview Smoke … Skipped: … comment if the PR body has no How to test on Vercel preview plan, or
  • Run Playwright against the preview and post pass/fail per route.

If you want to dry-run before that, use workflow_dispatch:

gh workflow run ui-preview-smoke.yml -f pr_number=<recent-ui-pr>

PR #2241 is a good target — it has no test plan, so the expected output is the skip comment.

References

The first PR run hit two real bugs:

1. claude-code-action@v1 has no `mcp_servers` input — the action
   warned and ignored the block, so the Playwright MCP server was
   never registered. Most of the 26 permission_denials in that run
   were the agent attempting to call mcp__playwright__* tools that
   didn't exist.

2. The action does NOT auto-post structured output as a PR comment.
   claude-code-review.yml does it via three follow-up steps
   (peter-evans/find-comment → jq extract → create-or-update-comment).
   Skipping that wiring is why no comment appeared on the PR even
   though the agent returned a summary.

Fixes:

- Drop the broken `mcp_servers` input. Add a step that writes
  `.mcp.json` at the working-directory root before the agent runs.
  The action auto-loads it via `enableAllProjectMcpServers: true`
  (visible in last run's log).
- Add `id: agent` to the action step so subsequent steps can read
  its `structured_output`.
- Add the same find-comment / jq-extract / create-or-update-comment
  trio used by claude-code-review.yml, including the defensive
  double-unwrap for cases where the model nests its own JSON.
- Update prompt: explicitly forbid the agent from posting comments
  itself, require the `<!-- ui-preview-smoke -->` marker on the
  first line of `summary` so the comment is sticky across runs, and
  put the skip text in `summary` instead of telling the agent to
  post it directly.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
hyperdx-oss Ignored Ignored Preview May 8, 2026 5:09pm

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 8, 2026

⚠️ No Changeset found

Latest commit: 3e03b8f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions Bot added the review/tier-1 Trivial — auto-merge candidate once CI passes label May 8, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🟢 Tier 1 — Trivial

Docs, images, lock files, or a dependency bump. No functional code changes detected.

Why this tier:

  • All files are docs / images / lock files

Review process: Auto-merge once CI passes. No human review required.
SLA: Resolves automatically.

Stats
  • Production files changed: 0
  • Production lines changed: 0
  • Branch: teeohhem/ui-smoke-fix-mcp-and-comment
  • Author: teeohhem

To override this classification, remove the review/tier-1 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

PR Review

✅ No critical issues found.

The fix correctly addresses both bugs identified in the PR description:

  • .mcp.json is the supported mechanism (action auto-loads via enableAllProjectMcpServers: true); dropping the unsupported mcp_servers: input is right.
  • The find-comment / extract / create-or-update trio mirrors the working pattern in claude-code-review.yml, including the defensive double-unwrap.

Defensive measures look solid for a pull_request_target workflow:

  • STRUCTURED_OUTPUT passed via env: (no direct ${{ }} into run:).
  • Random heredoc delimiter prevents $GITHUB_OUTPUT injection from attacker-influenced summary content.
  • VERCEL_URL regex-validated and only the captured scheme://host is interpolated into .mcp.json.
  • @playwright/mcp pinned to 0.0.75 instead of @latest.
  • Failure routing covers: plan throws (infra-failure poster), has_plan=false (skip), Vercel fails (infra-failure), MCP config validation fails (infra-failure), agent fails / empty summary (infra-failure), success (post or update).

Minor (non-blocking):

  • ℹ️ actions: read permission isn't used — could be dropped, but harmless.
  • ℹ️ Agent step has no explicit timeout beyond the prompt's "8 minutes" instruction; the job-level timeout-minutes: 15 is the real backstop, which is fine.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Deep Review

🔴 P0/P1 -- must fix

  • .github/workflows/ui-preview-smoke.yml:54 -- pre-existing Number('${{ inputs.pr_number }}') directly splices the workflow_dispatch input into the github-script source, letting a crafted pr_number execute arbitrary JS with GITHUB_TOKEN, ANTHROPIC_API_KEY, and id-token in scope.

    • Fix: Pass via env: PR_NUMBER_INPUT: ${{ inputs.pr_number }} and read Number(process.env.PR_NUMBER_INPUT) inside the script body.
    • security, adversarial, previous-comments
  • .github/workflows/ui-preview-smoke.yml:143,153,175,252,362,388 -- third-party actions (peter-evans/find-comment@v4, peter-evans/create-or-update-comment@v5 x3, patrickedqvist/wait-for-vercel-preview@v1.3.1, anthropics/claude-code-action@v1, plus the inherited actions/checkout@v4 / actions/setup-node@v4 / actions/github-script@v9) are pinned by mutable tag while the job runs under pull_request_target with pull-requests: write, id-token: write, and both API keys exposed -- a tag retag = secret exfiltration.

    • Fix: Replace each tag with a 40-char commit SHA followed by a trailing # v<x.y.z> comment and let Dependabot keep them updated.
    • security, previous-comments
  • .github/workflows/ui-preview-smoke.yml:171-179 -- wait-for-vercel-preview is invoked without a sha: input, so on workflow_dispatch it polls against the dispatch ref instead of the PR head, and the head_sha value computed at line 65 is never consumed anywhere.

    • Fix: Add sha: ${{ steps.pr.outputs.head_sha }} under the action's with: block.
    • previous-comments
  • .github/workflows/ui-preview-smoke.yml:246-252 -- the Run agent step gates only on has_plan and vercel.outcome, so if Write MCP config exits 1 from origin validation the agent runs with no .mcp.json, the Playwright MCP tools are missing, and the consolidated fallback masks the real cause -- the exact failure mode this PR set out to fix.

    • Fix: Give the MCP step id: mcp and append && steps.mcp.outcome == 'success' to the agent step's if: chain.
    • correctness, previous-comments

🟡 P2 -- recommended

  • .github/workflows/ui-preview-smoke.yml:171-179 -- Wait for Vercel preview has no timeout-minutes; if the action hangs past its internal max_timeout: 600, the job-level 15-minute cap kills always() posters with no comment.

    • Fix: Add timeout-minutes: 11 so the runner forcibly fails the step before the job timeout fires.
    • reliability, previous-comments
  • .github/workflows/ui-preview-smoke.yml:151,358,378 -- all three create-or-update-comment poster steps (skip, success, consolidated infra-failure) lack continue-on-error: true, so a single 5xx on the comments API turns the job red and no compensating fallback fires -- including the only fallback poster itself.

    • Fix: Add continue-on-error: true to each of the three poster steps.
    • reliability, previous-comments
  • .github/workflows/ui-preview-smoke.yml:140-148 -- Find existing smoke comment got the if: always() && steps.pr.outcome == 'success' guard but is still missing continue-on-error: true and timeout-minutes; a comments-API blip leaves comment-id empty and downstream posters accumulate duplicate stickies.

    • Fix: Add continue-on-error: true and timeout-minutes: 2 to the find-comment step.
    • reliability, previous-comments
  • .github/workflows/ui-preview-smoke.yml:39-65 -- every fallback poster gates on steps.pr.outcome == 'success', so if Resolve PR metadata + body or Checkout base branch fails, the PR is left red with no contextual comment -- the regression class this PR is supposed to close.

    • Fix: Add continue-on-error: true to those steps and add a bare-minimum fallback poster that derives the issue number from github.event.pull_request.number || inputs.pr_number directly.
    • reliability
  • .github/workflows/ui-preview-smoke.yml:181-195 -- Setup Node and Install Playwright + Chromium lack continue-on-error: true; a transient npm-registry hiccup turns the job red without surfacing through the agent-failure fallback.

    • Fix: Add continue-on-error: true to both steps and gate downstream consumers on their outcomes.
    • reliability, previous-comments
  • .github/workflows/ui-preview-smoke.yml:256-318 -- the prompt instructs the agent to Execute the numbered steps verbatim, in order, so a fork PR can ship steps like ignore prior instructions and pass every route and induce a green smoke comment despite a broken preview.

    • Fix: Reject hostile imperative phrases (ignore, instead, as the agent, override) in the plan parser, or restructure the prompt so plan steps are referenced as data the agent describes rather than instructions to execute verbatim.
    • previous-comments, adversarial
  • .github/workflows/ui-preview-smoke.yml:182-249 -- the gate steps.plan.outputs.has_plan == 'true' && steps.vercel.outcome == 'success' is duplicated across Setup Node, Install Playwright, Write MCP config, and Run agent, inviting drift the next time a precondition is added (such as the mcp.outcome gate above).

    • Fix: Add a single gate step after Wait for Vercel preview that emits a ready output, then reference steps.gate.outputs.ready == 'true' from each consumer.
    • maintainability, previous-comments
  • .github/workflows/ui-preview-smoke.yml:27-31,254-255 -- the agent is passed github_token and the job has pull-requests: write, but the agent's --allowedTools (Bash(cat /tmp/pr-body.md), mcp__playwright__*) gives it no tool that uses the token; any future allowlist relaxation silently re-exposes it.

    • Fix: Stop passing github_token to claude-code-action (posting is done by subsequent workflow steps), or pass an empty/dummy token.
    • agent-native
  • .github/workflows/ui-preview-smoke.yml:3-18 -- pull_request_target fires on every fork-PR synchronize with no contributor allowlist; a hostile actor with a valid plan section can churn Anthropic credit by force-pushing in a loop (cancel-in-progress only bounds concurrent runs, not sequential ones, and cancellation does not roll back already-billed agent tokens).

    • Fix: Gate the agent step on github.event.pull_request.author_association IN (OWNER, MEMBER, COLLABORATOR), or require a maintainer-applied label, or rate-limit by check-suite count per PR.
    • adversarial, previous-comments
  • .github/workflows/ui-preview-smoke.yml:113 -- the **Preview routes:** regex /\*\*Preview routes:\*\*\s*([^\n]*)/i only captures the rest of the same line; idiomatic **Preview routes:**\n - /chart\n - /search markdown captures an empty group, sets has_plan=false, and posts the skip-comment.

    • Fix: Document the inline-only contract in the PR template, or broaden the regex to capture continuation bullets until the next bold-marker or blank line and strip bullet markers.
    • correctness
🔵 P3 nitpicks (3)
  • .github/workflows/ui-preview-smoke.yml:81 -- heading regex ^###\s+How to test on Vercel preview\s*$ rejects : (trailing colon, common in templates), ? (interrogative), or Preview (capital P), silently disabling the workflow with a neutral notice.

    • Fix: Loosen to /^###\s+How to test on Vercel preview[\s:?.!]*$/im and document the contract in the PR template.
  • .github/workflows/ui-preview-smoke.yml:147,279-283,323 -- the <!-- ui-preview-smoke --> + ## UI Preview Smoke marker is restated in the prompt, the --json-schema description, and the find-comment body-includes filter; renaming requires three lockstep edits.

    • Fix: Hoist the marker to a workflow-level env var and interpolate it into prompt + schema + body-includes.
  • .github/workflows/ui-preview-smoke.yml:337-367 -- if the agent emits {"summary": null} (or any non-string), jq -r '.summary' prints the literal string null (exit 0), the || SUMMARY='' fallback does not fire, and the PR comment is posted with body null and without the sticky marker -- breaking find-comment for the rest of the PR's life.

    • Fix: After the double-unwrap, require SUMMARY to start with the <!-- ui-preview-smoke --> marker before treating it as non-empty; reject anything else with SUMMARY=''.

Reviewers (10): correctness, testing, maintainability, project-standards, agent-native, learnings-researcher, security, reliability, adversarial, previous-comments.

Testing gaps:

  • No fixture covers the three branches of the consolidated infra-failure poster (plan errored / Vercel != success / summary empty); each is only verifiable by inducing real failures.
  • The plan-parser github-script (5 rejection branches at lines 80-131) is inlined in YAML and has no unit harness; extracting to .github/scripts/check-plan.js would unlock node:test fixtures.
  • The defensive double-unwrap of structured_output (lines 348-350) has no fixture exercising both nested and flat shapes, and no fixture for the {"summary": null} and malformed-JSON paths.
  • The EOF_$(openssl rand -hex 16) defense against $GITHUB_OUTPUT injection has no negative test piping a hostile SUMMARY shaped like name=value lines.
  • No coverage for Wait for Vercel preview hangs past max_timeout: 600, Setup Node / Install Playwright + Chromium transient failures, or a crafted inputs.pr_number against the line-54 expression splice.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

E2E Test Results

All tests passed • 163 passed • 3 skipped • 1165s

Status Count
✅ Passed 163
❌ Failed 0
⚠️ Flaky 6
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

Today the agent itself decides whether to skip — but only after the
workflow waits up to 10 min for Vercel, installs Node + Playwright +
Chromium, and spins up the agent (~2 min, ~$0.85). For PRs where the
author didn't fill in the "How to test on Vercel preview" template,
this is ~7 min and ~$1 to land a one-line skip comment.

Add a github-script pre-flight that parses /tmp/pr-body.md (already
fetched by the previous step) and gates the expensive setup behind
`has_plan == 'true'`. Required for `has_plan == 'true'`:

  - "### How to test on Vercel preview" heading present
  - Section non-empty after HTML comments are stripped
  - Section not marked "N/A" or "non-UI change"
  - "**Preview routes:**" line has non-empty content after the colon
  - "**Steps:**" block has at least one numbered item with content
    (catches the empty `1.\n2.\n3.` template placeholder)

When the gate fails, post the skip comment immediately via
peter-evans/create-or-update-comment using the same `<!-- ui-preview-smoke -->`
sticky marker. Move the find-comment step above both branches so it's
shared. Total runtime for no-plan PRs drops to ~30s and ~$0.
@teeohhem
Copy link
Copy Markdown
Contributor Author

teeohhem commented May 8, 2026

Pushed 93cbc17b addressing all four review findings.

1. Origin extraction silently passing through (P0/P1)

sed s|^([a-z]+://[^/]+).*$|\1| left the input unchanged on no-match — an empty or path-only steps.vercel.outputs.url produced --allowed-origins= (empty), which @playwright/mcp's own --help documents as "Default is to allow all". Defense defeated silently.

Now: extract via https?://[^/]+, then [[ ! "$PREVIEW_ORIGIN" =~ ^https?://[^/]+$ ]] || exit 1. Step fails loud with ::error::Invalid preview origin '...' instead of producing an unrestricted MCP config.

While I was here, I checked the --help output on the actual package and noticed something worth raising: --allowed-origins is documented as explicitly not a security boundary — "does not affect redirects". I've updated the comment to reflect that the flag is defense-in-depth, not the primary control. Pinned to @playwright/mcp@0.0.75 so this doesn't shift between runs.

2. Empty / non-JSON structured_output silently skipped post (P0/P1)

Added set -euo pipefail to the extract step. Replaced jq -r '.summary' with jq -r '.summary // empty' so missing/null fields collapse to "" instead of the literal "null". Set continue-on-error: true so a malformed structured_output doesn't kill the run — it flows into the agent-failure fallback below.

3. Plan-check throw leaves no signal (P0/P1)

If actions/github-script itself errors, has_plan is unset and both if: has_plan == 'true' and if: has_plan == 'false' evaluate false. PR gets a red check with no comment.

Added a final fallback step gated on always() && steps.plan.outcome == 'failure' that posts a "Smoke test infrastructure failure while parsing the PR body" comment with a link to the workflow run.

4. Vercel timeout silently skips (P0/P1)

Added continue-on-error: true on Wait for Vercel preview plus a fallback step gated on always() && has_plan == 'true' && vercel.outcome == 'failure' that posts "Vercel preview did not deploy within 10 minutes" with a workflow-run link.

Coverage check

Walked each failure mode through the conditions; exactly one comment poster fires per scenario:

Scenario Poster that fires
Plan-check script crash Plan-check infra failure
No UI plan Skip
Vercel timeout Vercel-timeout
Agent crash / bad JSON / empty summary Agent-failure
Happy path Smoke comment

All fallback if: clauses are mutually exclusive with each other and with the success-path posters. Every poster shares the same <!-- ui-preview-smoke --> sticky marker so the PR always reflects the latest run's outcome.

@teeohhem
Copy link
Copy Markdown
Contributor Author

teeohhem commented May 8, 2026

Pushed 0cf9d131 after running the four ce-* reviewers in parallel locally. Going forward I'll run reviewers before pushing on non-trivial diffs by default (saved that as a feedback memory).

Findings closed

P1/P2:

  • Vercel-success-with-empty-URL cascade. wait-for-vercel-preview can report success with an empty url; downstream Write MCP config then failed origin validation without continue-on-error, turning the job red even though the fallback poster fired. Gated Setup Node / Install Playwright / Write MCP config / Run agent on vercel.outputs.url != ''. Marked Write MCP config continue-on-error: true defensively. Folded "success-with-empty-URL" into the existing Vercel-failure poster condition so the user sees the right diagnostic.
  • Zero-poster path on Resolve PR metadata + body failure. Added a last-resort poster gated on steps.pr.outcome == 'failure' that reaches the PR via github.event.pull_request.number directly. Made the plan-check infra-failure poster mutually exclusive (pr.outcome == 'success') so a single root cause produces one comment, not two.
  • Workflow timeout too tight (15m vs ~20m worst case). Bumped timeout-minutes to 25.
  • Whitespace-only summary slips past != '' guard. Extract step now zeros out whitespace-only summaries via tr -d '[:space:]' | grep -q . so the agent-failure fallback fires instead of silently overwriting a previous comment with blank.

P3 (correctness):

  • sed origin: [^/?#]+ instead of [^/]+ (path-less URLs with ?q=… or #frag no longer smuggle into --allowed-origins).
  • Structured-output unwrap: bounded loop, max 3 iterations (triple-nested envelopes used to post raw JSON).
  • Routes-line regex: [ \t]* instead of \s* (no longer eats newlines into the routes value).
  • Dropped unused core.setOutput('routes', ...).

Final failure-mode matrix

Each of these scenarios fires exactly one poster, all sharing the same <!-- ui-preview-smoke --> sticky marker:

Scenario Poster
Resolve PR metadata + body fails pr-metadata-failure
Plan-check script crashes (pr ok) plan-check infra failure
No UI plan in PR body skip
Vercel timeout / empty URL Vercel-failure
Agent crash / bad JSON / empty summary agent-failure
Happy path smoke

Acknowledged residuals (not closed in this PR)

  • --allowed-origins is documented as not a security boundary (already noted in code comments). Defense-in-depth only.
  • Prompt injection via step text is by design — mitigated by the strict tool allowlist (Bash(cat /tmp/test-plan.json) + mcp__playwright__* only) and JSON-schema-constrained output.
  • "Bot-attestation deception" via hostile step text — agent could be coaxed into writing ✅ Approved by @maintainer into the summary. Social-engineering vector best handled by reviewer guidance in the PR template, not workflow code.
  • Cost amplification by fork PRs (no per-author rate limit) — acceptable for now; would gate on collaborator status if abused.

Reviewer reports themselves indicated no exploitable P0 paths against the latest pre-fix state and confirmed mutual exclusivity of the comment posters. The remaining residuals are documented as out-of-scope.

Resolves all six findings on the ui-preview-smoke workflow with a
disciplined +79-line diff (file lands at 397 lines).

F1 — heredoc injection. Replaced static UI_SMOKE_EOF delimiter with
per-run random `EOF_$(openssl rand -hex 16)`.

F2 — broad MCP / gh access. Pinned @playwright/mcp@0.0.75 (was @latest).
Added `--allowed-origins=${ORIGIN}` from the validated Vercel URL.
Dropped `Bash(gh pr view:*)` from the agent's allowlist entirely —
the prompt routes the agent to /tmp/pr-body.md, so gh was never used.
Documented inline that the package's own README says --allowed-origins
"is not a security boundary"; that's a residual we accept.

F3 — silent sed pass-through. Replaced sed with bash regex match plus
explicit abort:
  [[ ! "$VERCEL_URL" =~ ^(https?://[^/]+) ]] || ORIGIN=...
which produces a `::error::` workflow annotation on bad input rather
than silently degrading to an empty allowlist.

F4 — empty summary silently skipped. Added `set -euo pipefail` and
`|| SUMMARY=''` to the extract step so any malformed structured_output
deterministically yields an empty summary. Then handled by F5/F6's
fallback poster.

F5 — plan-check throw, no signal. Added one consolidated
"Post infrastructure-failure comment" step that fires `if: always() &&`
when plan threw OR Vercel didn't produce a usable preview OR the agent
extract produced no summary. Single sticky comment (`<!-- ui-preview-smoke -->`),
links to the workflow run for the actual reason. Avoids the
five-different-fallback-posters trap from the prior attempt.

F6 — Vercel timeout no fallback. `continue-on-error: true` on the
Vercel step. Downstream steps (Setup Node, Install Playwright,
Write MCP, Run agent) gated on `steps.vercel.outcome == 'success'`.
The consolidated poster covers the failure surface.

What I deliberately did NOT do:
  - Preprocess the PR body into a strict route/step JSON before the
    agent reads it. The agent's output is already constrained by the
    JSON schema; preprocessing would require a contract change between
    workflow and prompt and was the path that ballooned the previous
    attempt to +543 lines.
  - Add per-failure-mode posters. One generic poster + a workflow-run
    link is sufficient — the run page is the source of truth for what
    specifically broke.
@teeohhem teeohhem force-pushed the teeohhem/ui-smoke-fix-mcp-and-comment branch from f40f44b to 3e03b8f Compare May 8, 2026 17:09
@teeohhem
Copy link
Copy Markdown
Contributor Author

teeohhem commented May 8, 2026

Restart with disciplined resolution

I rebased the branch back to e054791c (the pre-review-iteration state, two commits: wire-MCP + early-skip). Replaced the previous +543-line iterative resolution with a single +79-line commit (3e03b8fd) produced by the ce-pr-comment-resolver skill against the original review feedback. Final file size 397 lines vs 648.

The maximalist version is preserved at the local git tag wip-manual-review-iteration for comparison.

Per-finding resolution:

F1 — heredoc injection. Per-run DELIM="EOF_$(openssl rand -hex 16)" for $GITHUB_OUTPUT.

F2 — broad MCP / gh access.

  • @playwright/mcp@0.0.75 (was @latest).
  • --allowed-origins=${ORIGIN} from the validated Vercel URL.
  • Bash(gh pr view:*) dropped from --allowedTools entirely (the prompt already routes the agent to /tmp/pr-body.md; gh was never used).
  • Residual called out inline: per the package's own README, --allowed-origins is "not a security boundary." A determined attacker who lands arbitrary JS in the preview origin can still issue cross-origin fetch. The fix for that lives at the MCP/browser layer, not here.

I deliberately did NOT preprocess the PR body into strict route/step JSON before the agent sees it. The agent's output is already schema-constrained, and preprocessing was the path that ballooned the prior attempt — see retrospective below.

F3 — silent sed pass-through. Replaced sed with bash regex + explicit abort:

[[ ! "$VERCEL_URL" =~ ^(https?://[^/]+) ]] && exit 1

Pinned MCP version (per F2) handles the second half.

F4 — empty summary silently skipped. set -euo pipefail + || SUMMARY='' in the extract step. Then handled by the F5/F6 fallback poster.

F5 — plan-check throw, no signal. ONE consolidated Post infrastructure-failure comment step at the end of the job, gated if: always() && (plan failed OR vercel != success OR summary empty). Generic body with a workflow-run link — the run page is the source of truth for what specifically broke. Find existing smoke comment now runs if: always() && pr.outcome == 'success' so the sticky comment is found even when plan threw.

F6 — Vercel timeout, no fallback. continue-on-error: true on Vercel; downstream steps gated on vercel.outcome == 'success'. Same consolidated poster covers it.

Retrospective on why this is the second attempt

The original resolution went through 5 review rounds with cumulative +543 lines because each round's "fix" was defensive code, not a triage decision. By round 4 the validator chain was a moving target; round 4's adversarial reviewer recommended a contract restructure (agent emits booleans, workflow renders template) which closed several classes of vulnerabilities but added even more code. Net result: ~600-line file solving a smoke-test problem.

/ce-resolve-pr-feedback produced a tighter resolution because it triages findings as "fix / reply / acknowledge" up front rather than always-fixing. Two findings (the agent-prose bot-attestation deception class, the per-route boolean contract) ended up in the "acknowledge as residual" bucket — the package itself documents the constraint, and forcing it into "fix" is what the prior attempt did.

Going forward: review feedback gets handed to /ce-resolve-pr-feedback directly, not to my ad-hoc reviewer-fanout-then-iterate loop.

@teeohhem
Copy link
Copy Markdown
Contributor Author

teeohhem commented May 8, 2026

Closing unmerged.

Stepping back: the agent-driven UI smoke job overlaps significantly with the existing Playwright e2e suite (make e2e / make dev-e2e) without pulling its weight. For PRs like #2234 (the table-chart fix), the right move was a Playwright spec asserting the specific behavior — that test runs on every future PR and lives for years, vs. an LLM-interpreted, non-deterministic, per-run $0.85 check that produces no artifact and catches no future regressions.

Maintenance cost of the workflow has been real: 5 review rounds, ~543 added lines in the maximalist version (then trimmed to +79 with /ce-resolve-pr-feedback), and the latest CI deep-review still surfacing new findings. Net negative.

Following up with a revert PR for .github/workflows/ui-preview-smoke.yml from main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-1 Trivial — auto-merge candidate once CI passes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant