feat(wizard-ci): real-TUI e2e + snapshot review by gewenyu99 · Pull Request #2012 · PostHog/wizard-workbench

gewenyu99 · 2026-06-21T15:07:28Z

Usage

All four drive the real wizard TUI and surface the current run's key-moment frames (no baseline comparison).

1. Local — run + HTML report

pnpm wizard-ci-snapshots

Runs the real agent flow against express-todo through the real wizard TUI, captures each key moment, and writes report.html with the current frames (WIZARD_PATH auto-resolves to a sibling wizard-e2e; credentials come from .env).

2. Local — review bundle (no PR)

pnpm wizard-ci-snapshot-review express-todo --dry-run   # drop --dry-run to open the PR

3. Manual dispatch (CI) — the snapshots switch picks which CI runs; the normal evaluator is the default and is unchanged:

# this version — real-TUI snapshots:
gh workflow run wizard-ci.yml --ref wizard-e2e-control-plane \
  -f snapshots=true -f app=basic-integration/next-js \
  -f wizard_ref=main -f evaluate=false
# normal eval CI: omit -f snapshots (or pass -f snapshots=false)

Opens one review PR per app.
Example output — the next.js matrix from this PR:

4. PR comment — distinct commands pick the mode (members, via the bot):

/wizard-ci <app>                      # normal evaluator (unchanged)
/wizard-snapshots <app> [wizard_ref]  # this version — real-TUI snapshots

Both go through the same wizard-ci-trigger repository_dispatch and the single SNAPSHOTS branch in wizard-ci.yml; /wizard-snapshots just adds snapshots: true to the payload (handled in PostHog/wizard#728). The handler only parses + dispatches — no untrusted PR code is checked out. Requires both this PR and PostHog/wizard#728 merged (repository_dispatch only reaches workflows on the default branch).
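The comment routing described above could be sketched roughly as follows. This is a minimal illustration, not the handler from PostHog/wizard#728: the function name `parseWizardComment`, the `DispatchPayload` shape, and the choice to carry `app`, `wizard_ref`, and `comment_pr` in one object are assumptions. Only the two commands and the `snapshots: true` flag come from the description.

```typescript
// Hypothetical payload shape. Only `snapshots` and `comment_pr` are named in
// the PR text; the remaining fields are assumptions for this sketch.
interface DispatchPayload {
  app: string;
  wizard_ref?: string;
  snapshots: boolean;
  comment_pr: number;
}

// Parse a PR comment into a dispatch payload, or return null when the comment
// is not a recognised command. The function only inspects text; it never
// checks out or executes anything from the PR.
function parseWizardComment(body: string, prNumber: number): DispatchPayload | null {
  const [command, app, wizardRef] = body.trim().split(/\s+/);
  if (command !== "/wizard-ci" && command !== "/wizard-snapshots") return null;
  if (!app) return null;

  const snapshots = command === "/wizard-snapshots";
  const payload: DispatchPayload = { app, snapshots, comment_pr: prNumber };

  // Per the usage text, only /wizard-snapshots takes an optional wizard_ref.
  if (snapshots && wizardRef) payload.wizard_ref = wizardRef;
  return payload;
}
```

With this shape, `/wizard-snapshots basic-integration/next-js main` and `/wizard-ci express-todo` differ only in the `snapshots` flag (and the optional ref), which is what lets one workflow branch serve both.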

What this is

wizard-ci --e2e and wizard-ci-snapshots drive the real wizard TUI (via the wizard repo's tui-snapshots): the real startTUI, driven by state manipulation, captured per key moment as text.

--e2e asserts on the result JSON the run emits (run completed, posthog dep / .env, reached keep-skills).
wizard-ci-snapshots surfaces the current run's real-TUI key-moment frames as report.html (no baseline comparison for now).
snapshot-review rasterizes the frames to an image PR (one frame per row, titled [CI] (snapshots) <app>), and posts the report back as a comment when triggered with a comment_pr.
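The `--e2e` checks listed above can be pictured as a small validation pass over the result JSON. The sketch below is illustrative only: `runPhase` and `skillsComplete` are names that appear in the commit messages, while `hasPosthogDependency`, `hasEnvFile`, `reachedKeepSkills`, and the `checkE2eResult` function itself are hypothetical.

```typescript
// Hypothetical result shape. `runPhase` and `skillsComplete` appear in the
// commit messages; the other field names are invented for this sketch.
interface E2eResult {
  runPhase: string;
  hasPosthogDependency: boolean;
  hasEnvFile: boolean;
  reachedKeepSkills: boolean;
  skillsComplete: boolean;
}

// Return human-readable failures; an empty list means every check passed.
function checkE2eResult(result: E2eResult): string[] {
  const failures: string[] = [];
  if (result.runPhase !== "completed") {
    failures.push(`runPhase is "${result.runPhase}", expected "completed"`);
  }
  if (!result.hasPosthogDependency) failures.push("posthog dependency not found");
  if (!result.hasEnvFile) failures.push(".env was not written");
  if (!result.reachedKeepSkills) failures.push("run never reached keep-skills");
  if (!result.skillsComplete) failures.push("skills step did not complete");
  return failures;
}
```

Collecting all failures instead of throwing on the first one is a design choice of this sketch; it lets a single CI run report everything that went wrong.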

The snapshot path reuses the existing Wizard CI workflow: one snapshots input switches the Execute step from the evaluator to the real-TUI review.

Companion PRs: PostHog/wizard#702 (the harness, merged) · PostHog/wizard#728 (the /wizard-snapshots comment trigger).

Adds two modes to the existing wizard-ci, as an alternative to classic --ci (LoggingUI: agent-only, stdout-grep). --e2e drives the WHOLE interactive flow headlessly through the wizard-ci-tools control plane and asserts on structured state; --replay plays a recorded run back in the terminal.

Core files:
- services/wizard-ci/e2e.ts
  - runE2e(): /tmp app-copy isolation, env hygiene (strips host CLAUDE*/ANTHROPIC* so the spawned agent auths with the phx key instead of deferring to the host), scoped --project-id, the happy-path policy (skip mcp+slack, delete skills, continue past health issues), spawns the wizard repo's headless harness, then asserts the structured result (runPhase=completed, posthog dep/.env, reached keep-skills, skillsComplete).
  - replayRecording(): shells to the wizard repo's terminal replayer.
- services/wizard-ci/index.ts: wires --e2e (positional app, --project-id, --keep-skills) and --replay (--step/--delay) into the CLI + --help.

Engine lives in the wizard repo (store + driver must run in-process); point WIZARD_PATH at it. See PostHog/wizard PR for src/lib/ci-driver + harness.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…nitions

Run each CI-e2e test definition (for now: integration on express-todo) as a real --e2e agent run, render every key-moment frame of the recording to a real-Ink ANSI snapshot, and diff against a committed baseline. Surfaces run-to-run differences (e.g. the agent enqueuing tasks differently) side-by-side for a human to review: same screens every run, deltas flagged. No mocks: real agent, real recording, real render.

- services/wizard-ci/snapshots.ts: the flow (run → render → diff → report)
- services/wizard-ci/ansi-html.ts: dependency-free ANSI→HTML for the side-by-side
- services/wizard-ci/snapshots/express-todo/: committed baseline (47 frames)
- pnpm wizard-ci-snapshots (+ mprocs entry); --update to accept a new baseline

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The snapshots.ts header now lists what the flow needs in .env (POSTHOG_PERSONAL_API_KEY, POSTHOG_WIZARD_PROJECT_ID, POSTHOG_REGION) and that WIZARD_PATH must point at a checkout containing e2e-harness/. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
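A preflight check for those requirements might look like the sketch below. The three variable names and the `e2e-harness/` directory come from the commit message; the helper names `missingEnv` and `hasHarness` are hypothetical, and this is not the code in snapshots.ts.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Variables the snapshots.ts header lists as required in .env.
const REQUIRED_ENV = [
  "POSTHOG_PERSONAL_API_KEY",
  "POSTHOG_WIZARD_PROJECT_ID",
  "POSTHOG_REGION",
] as const;

// Return the names of required variables that are unset or empty.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// True when the given checkout contains an e2e-harness/ directory.
function hasHarness(wizardPath: string): boolean {
  return existsSync(join(wizardPath, "e2e-harness"));
}
```

A caller could run `missingEnv(process.env)` and `hasHarness(process.env.WIZARD_PATH ?? "")` before spawning anything, and fail early with the list of missing names.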

A real agent emits frames a little differently run to run (different number of status updates → shifted indices), so drift is expected. Print the per-frame diffs + report.html and exit 0; only a genuine failure (run died, no recording) exits non-zero. Accept a new baseline with --update. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

After the diff, prompt "Replay <name> snapshots in the terminal? [y/N]" and, on yes, launch the replay stepper directly on the run's recording — no copy/paste. TTY-only (auto-declines in CI so nothing hangs); the replayer inherits stdio for its own Enter-to-step loop. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Document handing the Wizard to an agent to run/drive/explore it headlessly, pointing at the runbook (wizard repo e2e-harness/EXPLORING-AS-AN-AGENT.md) with a copy-paste example prompt that targets wasp-lang/open-saas — the agent works out how to build + run the target itself. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… comments The agentic-exploration section belongs in the wizard repo's README, not here. Also trim snapshots.ts / index.ts comments to concise current-behavior. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…wright) services/wizard-ci/screenshot.ts rasterizes the side-by-side report — one PNG per key-moment frame (baseline │ current) plus a full-flow strip — for attaching to a review PR. Reuses the report HTML (ansi-html), so no new ANSI logic. Adds playwright as a dev dep. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…shots snapshot-review.ts runs the e2e, renders the report to side-by-side PNGs, and opens a review PR whose body embeds them (raw URLs), changed frames first — instead of running the agent evaluator. --dry-run writes the bundle locally. wizard-snapshots.yml dispatches it (bot token, setup-wizard-deps, Playwright). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

`wizard-ci --e2e` and `wizard-ci-snapshots` run the wizard repo's tui-snapshots: the real wizard TUI, driven by store state manipulation, captured per screen as text. --e2e asserts on the result JSON it emits; snapshots diff the captured screens against a committed baseline; snapshot-review rasterizes them to a side-by-side image PR. Drops the recording/replay plumbing (the --replay flag, the render step, ansi-html) — the captured screens are already clean text. WIZARD_PATH defaults to a sibling wizard checkout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…as a comment Comment `/wizard-ci [app] [wizard_ref]` on a PR to run the real-TUI e2e. The workflow acks with 👀, checks out the PR, runs snapshot-review, and posts a comment on the PR with the flow strip and a link to the full side-by-side review (--comment-pr). Restricted to repo members/owner/collaborators. Manual workflow_dispatch still works; no auto-run. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rue) A standalone snapshots job, independent of the evaluator (the eval still runs as normal). Dispatch "Wizard CI" with snapshots=true to also open a real-TUI review PR for the app — same app token + setup-wizard-deps + PostHog key as the evaluator, project hard-coded to 2 (the bot key's project). Because wizard-ci.yml is on main, this is dispatchable from a PR branch (pre-merge). The /wizard-ci comment trigger stays in wizard-snapshots.yml. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…snapshot) The wizard-ci job's Execute step runs the headless eval by default, or — with the snapshots=true input — the real-TUI snapshot review for the app. One switch, the same job; no separate parallel job. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…sterize) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…R step) git('rev-parse HEAD'), not the array form — the helper runs git ${cmd}. Pass cwd to getRepoRoot too. tsc clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…able text) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Match the eval path's title and body conventions (buildPRTitle/buildPRBody): [CI] (snapshots) <app> title, plain metadata lines, no marketing prose. Drop the unreadable full-flow composite (_flow.png) from the PR body, the comment, and screenshot.ts; the per-frame side-by-side images remain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Align the snapshot path's project id with the evaluator's convention (secrets.GH_APP_POSTHOG_WIZARD_CI_BOT_TARGET_PROJECT_ID) instead of the '2' placeholder, and drop the now-stale _flow.png mentions from the header. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

main (#2042) already sets POSTHOG_WIZARD_PROJECT_ID from the bot's target-project secret on the Execute step. Keep that one; this branch adds only the snapshots switch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace the explicit strip list with the same /^(CLAUDE|ANTHROPIC|AI_AGENT)/ predicate the wizard's MCP host uses, so the two can't drift. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
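As a rough illustration of that predicate in use, the sketch below filters an environment map before it is handed to a spawned process. The regex is the one quoted in the commit message; the function name `stripHostAuthEnv` is hypothetical and the real helper may differ.

```typescript
// The predicate quoted in the commit message: matches names that START with
// CLAUDE, ANTHROPIC, or AI_AGENT.
const HOST_AUTH_ENV = /^(CLAUDE|ANTHROPIC|AI_AGENT)/;

// Return a copy of `env` without host-auth variables, so a spawned agent
// cannot fall back to the host's credentials.
function stripHostAuthEnv(
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const clean: Record<string, string | undefined> = {};
  for (const [name, value] of Object.entries(env)) {
    if (!HOST_AUTH_ENV.test(name)) clean[name] = value;
  }
  return clean;
}
```

Because the pattern is anchored at the start of the name, a variable such as `MY_CLAUDE_FLAG` is kept while `CLAUDE_CODE` is removed.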

…OCTOU' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

The issue_comment trigger checked out and ran the PR's code with the bot secret and write perms — CodeQL untrusted-checkout-toctou (7 critical). The Copilot autofix only pinned the ref (still runs untrusted code, and issue_comment payloads have no pull_request.head.sha so it fell back to the merge ref). Use the existing comment mechanism instead: the trusted handler parses the comment and fires a repository_dispatch; this workflow runs only on dispatch and checks out its own trusted ref. comment_pr in the payload still drives the post-back. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

semgrep run-shell-injection: event-controlled values were interpolated straight into the run script. Bind them to env vars and reference shell variables instead, matching wizard-ci.yml's resolve-inputs step. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
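The fix itself lives in the workflow YAML, but the same idea can be shown from TypeScript: keep the script text fixed and pass the untrusted value through the environment, so the shell only ever sees it as the contents of a variable. The sketch assumes a POSIX `sh` on PATH; `echoApp` and the `APP` variable name are hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Run a fixed script. The untrusted value is never spliced into the script
// text; it is bound to $APP and read back through a quoted expansion.
function echoApp(untrustedApp: string): string {
  const result = spawnSync("sh", ["-c", 'printf "%s" "$APP"'], {
    env: { ...process.env, APP: untrustedApp },
    encoding: "utf8",
  });
  return result.stdout;
}
```

Had the value been interpolated into the script string instead, input such as `x"; echo pwned; "` would have been parsed as shell syntax. Bound to a variable, it is printed back unchanged.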

Keep the PR small: remove the committed baselines and the baseline-vs-current comparison. The snapshot run now captures the real-TUI key-moment frames and surfaces them (report.html + a review PR with one image per frame); regression comparison is out of scope for now. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… set snapshot-review forwards its app to snapshots.ts, which ignored it and always ran express-todo. Honor a positional app arg so per-app CI (matrix / dispatch) snapshots the requested app. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
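Honoring a positional app argument with a fallback could be as small as the sketch below. The `express-todo` default comes from the commit message; `resolveApp` is a hypothetical name, and the sketch deliberately ignores flags that take a separate value (for example `--project-id 2`), which a real parser would need to handle.

```typescript
const DEFAULT_APP = "express-todo";

// Pick the first argument that is not a flag; fall back to the default app.
// Assumption: no flag in argv consumes the following argument as its value.
function resolveApp(argv: string[]): string {
  const positional = argv.find((arg) => !arg.startsWith("-"));
  return positional ?? DEFAULT_APP;
}
```

In a script this would typically be called as `resolveApp(process.argv.slice(2))`.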

…workflow The snapshots switch only fired on workflow_dispatch (inputs.snapshots is empty for repository_dispatch), so the comment path needed its own workflow. Have SNAPSHOTS also read client_payload.snapshots, so the /wizard-snapshots comment routes through the same wizard-ci-trigger dispatch and the one 'if SNAPSHOTS' branch. Removes wizard-snapshots.yml. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
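The switch logic amounts to "true if either trigger says so". The workflow expresses this in YAML; the TypeScript below is only a model of that decision, with a hypothetical `TriggerEvent` type and `snapshotsEnabled` function. The two source fields (`inputs.snapshots` and `client_payload.snapshots`) are the ones named in the commit message.

```typescript
// Minimal model of the two trigger shapes that can carry the flag.
interface TriggerEvent {
  inputs?: { snapshots?: string | boolean }; // workflow_dispatch
  client_payload?: { snapshots?: string | boolean }; // repository_dispatch
}

// Inputs may arrive as booleans or as the strings "true"/"false".
function isTrue(value: string | boolean | undefined): boolean {
  return value === true || value === "true";
}

// Snapshots mode is on when either source sets the flag. An empty
// inputs.snapshots (the repository_dispatch case) falls through to the payload.
function snapshotsEnabled(event: TriggerEvent): boolean {
  return isTrue(event.inputs?.snapshots) || isTrue(event.client_payload?.snapshots);
}
```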

…vices Resolve wizardRepo() once per run; hoist OUT_ROOT and the Shot type into e2e.ts and import them in snapshots/snapshot-review/screenshot instead of re-declaring. Fix a stale doc reference (/wizard-ci -> /wizard-snapshots). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

snapshots.ts (writer) and snapshot-review.ts (reader) each rebuilt join(OUT_ROOT, name, ...) independently — a path contract that could drift. Hoist reportDirFor(app) into e2e.ts and use it on both sides. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
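The point of the hoist is that writer and reader call one function instead of each assembling the path. A sketch under stated assumptions: the `OUT_ROOT` value and the directory naming below are invented for illustration, since the PR text does not show the real path layout.

```typescript
import { join } from "node:path";

// Hypothetical root; the real OUT_ROOT value is not shown in the PR.
const OUT_ROOT = join("/tmp", "wizard-ci-out");

// Single source of truth for where an app's report lives. Both the writer
// and the reader call this, so the two cannot drift apart.
function reportDirFor(app: string): string {
  // Hypothetical naming: flatten "basic-integration/next-js" into one segment.
  const name = app.replace(/[\\/]/g, "-");
  return join(OUT_ROOT, name);
}
```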

Derive each frame's seconds-since-the-first-frame from its write time (frameTimings) and append it to the report headings and the PR-body headings (e.g. '23-outro.txt (+1m23s)'), so a reviewer can see how long each key moment took to reach. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
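The heading format can be sketched as below. The `+1m23s` style and the `23-outro.txt` example come from the commit message; the function names, the `mtimeMs` input field, and the `+23s` form for offsets under a minute are assumptions, not the frameTimings implementation.

```typescript
// Format an offset in seconds as "+1m23s", or "+23s" when under a minute.
// The sub-minute form is an assumption; the source only shows "+1m23s".
function formatOffset(totalSeconds: number): string {
  const seconds = Math.max(0, Math.round(totalSeconds));
  const minutes = Math.floor(seconds / 60);
  const rest = seconds % 60;
  return minutes > 0 ? `+${minutes}m${rest}s` : `+${rest}s`;
}

// Label each frame with its offset from the first frame's write time.
function frameHeadings(frames: { name: string; mtimeMs: number }[]): string[] {
  const first = frames[0];
  if (!first) return [];
  return frames.map(
    (frame) => `${frame.name} (${formatOffset((frame.mtimeMs - first.mtimeMs) / 1000)})`,
  );
}
```

The write times could come from `fs.statSync(path).mtimeMs` for each frame file, which keeps the timing derivation independent of how the frames were captured.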

gewenyu99 · 2026-06-25T15:29:22Z

e2e.ts: runs the snapshotting path, both for local runs through mprocs and for CI.

snapshots.ts: local viewing; zero external deps.

screenshot.ts: standalone; isolates the Chromium dep. It exists so the frames are easy to read as images in CI comments.

snapshot-review.ts: isolates the GitHub side; CI-only.

gewenyu99 changed the title ~~feat(wizard-ci): full e2e via control plane (--e2e) + replay (--replay)~~ feat(wizard-ci): real-TUI e2e + snapshot review Jun 23, 2026

github-advanced-security AI found potential problems Jun 23, 2026


Comment threads on .github/workflows/wizard-snapshots.yml (4): all Fixed

gewenyu99 mentioned this pull request Jun 23, 2026

feat(e2e-harness): drive and snapshot the real wizard TUI PostHog/wizard#702

Merged

gewenyu99 marked this pull request as ready for review June 23, 2026 20:36

gewenyu99 and others added 21 commits June 23, 2026 21:30

fix(wizard-ci): print an openable command for the snapshot report

682ea07

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

refactor(wizard-ci): the e2e run always runs the agent (drop RUN_AGENT)

ddfaf77

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ci(wizard-ci): install Chromium for the snapshots path (screenshot ra…

9aeded2


fix(wizard-ci): git() takes a string, not an array (snapshot-review P…

7eefbd7


feat(wizard-ci): commit raw .txt snapshots into the review PR (debugg…

d629f62


gewenyu99 force-pushed the wizard-e2e-control-plane branch from 86e2813 to 1bdfbed Compare June 24, 2026 01:31

gewenyu99 and others added 4 commits June 24, 2026 11:31

refactor(wizard-ci): unify host-auth env strip on one regex

f13d97d


Potential fix for pull request finding 'CodeQL / Untrusted Checkout T…

d70e67f


Merge branch 'main' into wizard-e2e-control-plane

bbadf94

gewenyu99 and others added 3 commits June 24, 2026 19:02

gewenyu99 mentioned this pull request Jun 25, 2026

ci: add /wizard-snapshots comment trigger PostHog/wizard#728

Open

gewenyu99 and others added 4 commits June 25, 2026 09:57

gewenyu99 commented Jun 25, 2026


gewenyu99 requested a review from a team June 25, 2026 15:29

gewenyu99 wants to merge 32 commits into main from wizard-e2e-control-plane
