refactor(review): add per-agent reviewers and orchestrator (2/3) by peyton-alt · Pull Request #1106 · entireio/cli

peyton-alt · 2026-05-04T16:44:49Z

https://entire.io/gh/entireio/cli/trails/287

Summary

Second of three stacked PRs rebuilding `entire review`'s internals. Stacked on PR-A1 (#1105). Implements the abstractions PR-A1 declared: per-agent `AgentReviewer` for the three launchable agents, the `Sink`/`Run` orchestrator that consumes them, and the prompt + scope composition layer. Still no user-visible behaviour change — `runReview` is rewired in PR-A3.

Stack:

A1 (refactor(review): replace pending-marker file; add reviewer interfaces (1/3) #1105 — merge first): env-var handshake + types
A2 (this PR): per-agent reviewers, orchestrator, prompt composition, scope detection
A3 → a2: wire `entire review` on the new architecture; delete legacy `review.go`

What's in this PR

CU3 — Per-agent reviewers (`74133788e`)

`agent/claudecode/reviewer.go`, `agent/codex/reviewer.go` + `output_filter.go` (chrome stripping), `agent/geminicli/reviewer.go`
Each implements `AgentReviewer` with per-agent quirks (claude argv, codex stdin + chrome filter, gemini stdin) owned inside that agent's package
Anchored regexes in codex's filter prevent eating narrative containing `exec` / `version N.M` (regression-tested)
All three parsers correctly surface scanner errors as `RunError` + `Finished{Success: false}` instead of false-success

CU4 — Sink + DumpSink + single-agent orchestrator (`a7b916143`)

`review/types/sink.go`: `Sink` interface (with serial-dispatch concurrency contract), `RunSummary`, `AgentRun`, `AgentStatus` enum
`review/dump.go`: `DumpSink` post-run terminal renderer
`review/run.go`: `Run(ctx, reviewer, cfg, sinks) (RunSummary, error)` orchestrator
`classifyStatus` checks `ctx.Err()` before `waitErr` so signal-killed processes are correctly classified as `Cancelled` (regression from PR Feat/entire review multi agent #1018)

CU5 — Prompt composition + scope detection (`2bc20b166`)

`review/prompt.go`: `ComposeReviewPrompt(cfg)` joins skills + always-prompt + per-run-prompt + scope clause
`review/scope.go`: `detectScopeBaseRef` finds closest non-self ancestor branch by tip timestamp (fallback chain origin/HEAD → origin/main → origin/master → main → master); `ComputeScopeStats`/`formatScopeBanner` produce "Reviewing feat/X vs main: 3 commits, 7 files changed, 2 uncommitted"
Consolidates the three placeholder `composePrompt` helpers from CU3 into a single shared composer

CU5b — Extract ReviewerTemplate refactor (`013d5f4a2`)

`review/types/template.go`: shared scaffolding (`Spawn → pipe stdout → run parser → forward events → close`) for per-agent reviewers
Each agent file shrinks from ~140 lines to ~70-80 — agent files are now genuinely just "this agent's quirks" (argv + parser)
Pure refactor; all CU3 tests pass unchanged

Test plan

`mise run check` (fmt + lint + test:ci) green
Per-agent contract tests: `go test ./cmd/entire/cli/agent/{claudecode,codex,geminicli}/` (each agent's argv, stdin, env vars, parser, scanner.Err handling)
Orchestrator tests: `go test ./cmd/entire/cli/review/` (15 new tests covering Run() success/cancel/start-error paths, token overwrite semantics, sink fan-out)
Output filter tests: `go test ./cmd/entire/cli/agent/codex/` covering banner/exec-block/CSI/hook/timestamp drops + narrative passthrough including "version N.M" and "exec succeeded"

Smoke test (package-level contract)

The per-agent reviewer tests exercise each reviewer end-to-end with golden-file fixtures (real argv construction, real env, real parser pipeline). Plus the orchestrator tests drive `Run()` against stub reviewers under success/cancel/error paths.

```bash
git checkout feat/entire-review-v2-a2
go build -o /tmp/entire-smoke-a2 ./cmd/entire # confirms compilation
go test ./cmd/entire/cli/agent/{claudecode,codex,geminicli}/ \
./cmd/entire/cli/review/ \
./cmd/entire/cli/review/types/ -count=1
```

Expected: all 5 packages green.

Actual output (verified locally on `feat/entire-review-v2-a2` @ 013d5f4):
```
ok cmd/entire/cli/agent/claudecode 3.999s
ok cmd/entire/cli/agent/codex 0.543s
ok cmd/entire/cli/agent/geminicli 0.833s
ok cmd/entire/cli/review 1.202s
ok cmd/entire/cli/review/types 1.101s
```
✓ All package-level smoke green.

A real-agent end-to-end smoke (orchestrator + DumpSink against live claude-code) is more meaningful at PR-A3 where `entire review` actually invokes this layer; deferred to that PR's smoke procedure.

Anti-features explicitly NOT recreated

`Launcher` + `HeadlessLauncher` as separate interfaces (single `AgentReviewer` instead)
`filterCodexOutput` in shared multi-agent code (lives in `agent/codex/output_filter.go`)
`sync.Once`-guarded onCancel + parallel `signal.Notify` goroutine (single cancel function from start; multi-agent fan-in preserves it)
Default scope divergence between agents (`origin/main...HEAD` vs working-tree-only) — `ComposeReviewPrompt` adds an explicit scope clause for all agents

Note

Medium Risk
Introduces new review execution plumbing (process spawning, event parsing, git scope detection, and status classification), which could affect correctness of review runs and cancellation/error reporting despite being well-tested.

Overview
Adds the next layer of the entire review rewrite: per-agent AgentReviewer implementations for claude-code, codex, and gemini-cli, including codex stdout chrome stripping and consistent parser error reporting as RunError + Finished{Success:false}.

Introduces a single-agent Run() orchestrator with Sink/RunSummary/AgentRun/AgentStatus and a DumpSink renderer, plus shared ComposeReviewPrompt and git-based scope detection (detectScopeBaseRef/ComputeScopeStats) to standardize what commits the agents should review.

Also tightens the env-var handshake with AppendReviewEnv (deduping stale ENTIRE_REVIEW_* entries), updates agent import-architecture allowlists, and expands unit/golden tests across agents, orchestrator, prompt, env, and scope.

^{Reviewed by Cursor Bugbot for commit ce17205. Configure here.}

Copilot

Pull request overview

This PR (A2/3) implements the concrete building blocks for the upcoming entire review internal refactor: per-agent AgentReviewer adapters, a shared ReviewerTemplate, a single-agent Run orchestrator with sink fan-out, and shared prompt/scope composition utilities (still not wired into the user-facing command until PR-A3).

Changes:

Added ReviewerTemplate scaffolding in review/types and implemented per-agent reviewers for claude-code, codex (with chrome output filtering), and gemini-cli, each with contract-style tests.
Introduced review run consumption primitives: Sink, RunSummary/AgentRun, DumpSink, and the single-agent Run() orchestrator with tests.
Added shared prompt composition (ComposeReviewPrompt) and git scope detection/statistics (detectScopeBaseRef, ComputeScopeStats) with thorough test coverage.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
cmd/entire/cli/review/types/template.go	Adds shared `ReviewerTemplate` implementation for spawning/parsing agent processes.
cmd/entire/cli/review/types/template_test.go	Unit tests for event forwarding and wait behavior of `ReviewerTemplate`.
cmd/entire/cli/review/types/sink.go	Defines `Sink`, `RunSummary`, `AgentRun`, and `AgentStatus`.
cmd/entire/cli/review/types/sink_test.go	Tests `AgentStatus.String` and sink interface compilation/zero-value behavior.
cmd/entire/cli/review/run.go	Implements single-agent orchestrator `Run()` plus status classification.
cmd/entire/cli/review/run_test.go	Tests orchestrator success/cancel/failure/start-error paths and sink fan-out.
cmd/entire/cli/review/dump.go	Adds `DumpSink` post-run renderer reading `AgentRun.Buffer`.
cmd/entire/cli/review/dump_test.go	Tests dump output for success/failure/cancel/mixed/empty summaries.
cmd/entire/cli/review/prompt.go	Adds shared prompt composition including optional scope clause.
cmd/entire/cli/review/prompt_test.go	Tests prompt section joining and whitespace handling.
cmd/entire/cli/review/scope.go	Implements base-ref detection + scope stats via go-git + git CLI helpers.
cmd/entire/cli/review/scope_test.go	Integration-heavy tests for scope detection and counts in temp repos.
cmd/entire/cli/review/env.go	Adds `AppendReviewEnv` helper to set ENTIRE_REVIEW_* env vars on spawned agents.
cmd/entire/cli/agent/claudecode/reviewer.go	Claude per-agent reviewer (argv prompt) using shared template/prompt/env.
cmd/entire/cli/agent/claudecode/reviewer_test.go	Contract tests for Claude reviewer argv/env parsing and scanner error handling.
cmd/entire/cli/agent/claudecode/testdata/canned_session.txt	Golden fixture for Claude parser tests.
cmd/entire/cli/agent/codex/reviewer.go	Codex per-agent reviewer (stdin prompt) with filtered stdout parsing.
cmd/entire/cli/agent/codex/reviewer_test.go	Contract tests for Codex reviewer argv/env and filtered narrative output.
cmd/entire/cli/agent/codex/output_filter.go	Codex chrome stripping (banners/hooks/exec/CSI/timestamp logs).
cmd/entire/cli/agent/codex/output_filter_test.go	Unit tests for codex output filtering rules and regression cases.
cmd/entire/cli/agent/codex/testdata/canned_exec.txt	Golden fixture exercising codex chrome filtering + narrative retention.
cmd/entire/cli/agent/geminicli/reviewer.go	Gemini per-agent reviewer (stdin prompt) using shared template/prompt/env.
cmd/entire/cli/agent/geminicli/reviewer_test.go	Contract tests for Gemini reviewer argv/env parsing and scanner error handling.
cmd/entire/cli/agent/geminicli/testdata/canned_session.txt	Golden fixture for Gemini parser tests.
cmd/entire/cli/agent/architecture_test.go	Updates forbidden-import allowlist to permit importing the new review package.
.golangci.yaml	Allows returning `review/types.Process` for `ireturn` linter configuration.

Each agent implements types.AgentReviewer in its own package, owning its quirks: argv/stdin shape, env propagation, output adapter. - claude-code: argv `claude -p <prompt>`. Plain-text assistant output emitted as AssistantText events. - codex: argv `codex exec --skip-git-repo-check -`, prompt via stdin. output_filter.go strips banners, exec-block tool calls, hook-firing notices, CSI sequences, and timestamped error logs before AssistantText emission. Anchored regexes (^exec$ / ^exec <cmd> in /<path>) avoid eating narrative containing 'exec'. - gemini-cli: argv `gemini -p ' '`, prompt via stdin. Clean stdout, no chrome filtering. All three set ENTIRE_REVIEW_{SESSION,AGENT,SKILLS,PROMPT,STARTING_SHA} on the spawned process so the lifecycle hook (CU1) adopts the session as a review. Per-agent contract tests cover argv shape, stdin wiring, env vars set, deferred binary lookup (TestNoBinaryRequiredAtConstruction), and a golden-file event-stream test feeding canned stdout fixtures through the parser. Codex's chrome filter is exercised by output_filter_test.go separately from reviewer_test.go: filter logic stays inside agent/codex/, never in shared code. Compile-time interface check (var _ AgentReviewer = ...) ensures all three implementations satisfy the interface defined in CU2. Also: add review/* to architecture_test.go allowedPrefixes (per-agent reviewer.go files import review env consts and review/types), and add review/types.Process to ireturn allow list in .golangci.yaml. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The orchestration layer consuming AgentReviewer from CU3. - types/sink.go: Sink interface (AgentEvent during run, RunFinished once at end), RunSummary, AgentRun (per-agent post-run data), AgentStatus enum. Sinks MUST NOT block. - review/dump.go: DumpSink — post-run terminal narrative renderer. AgentEvent is a no-op; rendering reads from AgentRun.Buffer in RunFinished. Handles succeeded/failed/cancelled per-agent distinctly; closes with an N-agents-done counts line. - review/run.go: Run() — single-agent orchestrator. Forwards events to all sinks as they arrive, classifies terminal status, calls RunFinished on each sink with the populated summary. classifyStatus checks ctx.Err() before waitErr so signal-killed processes (ExitError exit code -1) are correctly classified as Cancelled rather than Failed (regression class from PR #1018 commit 8e82e40). AgentRun.Buffer holds the full event log per agent for post-run rendering. A TODO(memory) flags this for revisiting if very long reviews put pressure on memory. CU8 generalizes Run to RunMulti for N-agent fan-out, re-using the same Sink and RunSummary types.

peyton-alt · 2026-05-04T18:07:35Z

@BugBot review

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit ce17205. Configure here.}

review/prompt.go's ComposeReviewPrompt assembles the agent prompt from Skills + AlwaysPrompt + PerRunPrompt + scope clause. Empty sections are skipped (no triple-newline gaps — regression class from PR #1018 commit 13415f5). The scope clause pins agents to 'commits unique to this branch vs <closest-ancestor>' to prevent the divergent-default problem where codex used origin/main...HEAD and claude used working-tree-only on the same invocation (fixed in #1018 commit b9ed9c0, structurally enforced here). review/scope.go's detectScopeBaseRef finds the closest non-self ancestor branch by tip timestamp, with fallback chain origin/HEAD → origin/main → origin/master → main → master. ComputeScopeStats and formatScopeBanner produce the 'Reviewing feat/X vs main: 3 commits, 7 files changed, 2 uncommitted' output that CU6 will print before agent launch. Consolidates CU3's placeholder composePrompt and duplicated appendReviewEnv into shared helpers in review/. Each agent reviewer (claudecode, codex, geminicli) now calls review.ComposeReviewPrompt and review.AppendReviewEnv directly. Per-agent local copies removed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

CU3's three reviewers (claudecode, codex, geminicli) shared ~50 lines of identical Start/Process scaffolding. Only buildCmd (argv/env) and the stdout parser genuinely differ per agent. types.ReviewerTemplate now owns the shared lifecycle (spawn → pipe stdout → run parser → forward events → close on exit). Each agent's reviewer.go is reduced to: - New() returning a *ReviewerTemplate populated with the agent's name, BuildCmd, and Parser. - buildReviewCmd: argv + ENTIRE_REVIEW_* env propagation. - parseXxxOutput: agent-specific stdout-to-Event stream (with the scanner.Err handling from the CU3 fix loop preserved unchanged). Per-agent file: ~140 → ~80 lines. Total: ~150 LoC saved. Behavior identical — all CU3 tests pass without modification. Adding a 4th launchable agent now requires only its buildCmd and parser; the lifecycle scaffolding is no longer per-agent code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 4, 2026 16:44

peyton-alt requested a review from a team as a code owner May 4, 2026 16:44

Copilot started reviewing on behalf of peyton-alt May 4, 2026 16:45 View session

peyton-alt mentioned this pull request May 4, 2026

refactor(review): wire entire review on new architecture; delete legacy (3/3) #1107

Open

5 tasks

peyton-alt changed the title ~~refactor(review): per-agent reviewers + orchestrator + scope detection (PR-A2 of 3)~~ refactor(review): add per-agent reviewers and orchestrator (2/3) May 4, 2026

cursor Bot reviewed May 4, 2026

View reviewed changes

Comment thread cmd/entire/cli/review/scope.go

Copilot AI reviewed May 4, 2026

View reviewed changes

peyton-alt force-pushed the feat/entire-review-v2-a2 branch from 013d5f4 to c6c5aef Compare May 4, 2026 17:10

peyton-alt force-pushed the feat/entire-review-v2-a1 branch from c4b7bce to 37f5f0f Compare May 4, 2026 17:10

peyton-alt and others added 2 commits May 4, 2026 10:40

peyton-alt force-pushed the feat/entire-review-v2-a2 branch from c6c5aef to ce17205 Compare May 4, 2026 17:45

cursor Bot reviewed May 4, 2026

View reviewed changes

Comment thread cmd/entire/cli/review/scope.go

Comment thread cmd/entire/cli/agent/codex/testdata/canned_exec.txt

This was referenced May 4, 2026

refactor(review): multi-agent picker, orchestrator, and live TUI (1/2) #1111

Open

refactor(review): cross-agent synthesis, plugin-cache dedupe, docs (2/2) #1112

Open

peyton-alt and others added 2 commits May 4, 2026 17:28

peyton-alt force-pushed the feat/entire-review-v2-a2 branch from ce17205 to aa8be02 Compare May 5, 2026 00:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(review): add per-agent reviewers and orchestrator (2/3)#1106

refactor(review): add per-agent reviewers and orchestrator (2/3)#1106
peyton-alt wants to merge 4 commits intofeat/entire-review-v2-a1from
feat/entire-review-v2-a2

peyton-alt commented May 4, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

peyton-alt commented May 4, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

peyton-alt commented May 4, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in this PR

Test plan

Smoke test (package-level contract)

Anti-features explicitly NOT recreated

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

peyton-alt commented May 4, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

peyton-alt commented May 4, 2026 •

edited by cursor Bot

Loading