Skip to content

refactor(review): add per-agent reviewers and orchestrator (2/3)#1106

Open
peyton-alt wants to merge 4 commits intofeat/entire-review-v2-a1from
feat/entire-review-v2-a2
Open

refactor(review): add per-agent reviewers and orchestrator (2/3)#1106
peyton-alt wants to merge 4 commits intofeat/entire-review-v2-a1from
feat/entire-review-v2-a2

Conversation

@peyton-alt
Copy link
Copy Markdown
Contributor

@peyton-alt peyton-alt commented May 4, 2026

https://entire.io/gh/entireio/cli/trails/287

Summary

Second of three stacked PRs rebuilding `entire review`'s internals. Stacked on PR-A1 (#1105). Implements the abstractions PR-A1 declared: per-agent `AgentReviewer` for the three launchable agents, the `Sink`/`Run` orchestrator that consumes them, and the prompt + scope composition layer. Still no user-visible behaviour change — `runReview` is rewired in PR-A3.

Stack:

What's in this PR

CU3 — Per-agent reviewers (`74133788e`)

  • `agent/claudecode/reviewer.go`, `agent/codex/reviewer.go` + `output_filter.go` (chrome stripping), `agent/geminicli/reviewer.go`
  • Each implements `AgentReviewer` with per-agent quirks (claude argv, codex stdin + chrome filter, gemini stdin) owned inside that agent's package
  • Anchored regexes in codex's filter prevent eating narrative containing `exec` / `version N.M` (regression-tested)
  • All three parsers correctly surface scanner errors as `RunError` + `Finished{Success: false}` instead of false-success

CU4 — Sink + DumpSink + single-agent orchestrator (`a7b916143`)

  • `review/types/sink.go`: `Sink` interface (with serial-dispatch concurrency contract), `RunSummary`, `AgentRun`, `AgentStatus` enum
  • `review/dump.go`: `DumpSink` post-run terminal renderer
  • `review/run.go`: `Run(ctx, reviewer, cfg, sinks) (RunSummary, error)` orchestrator
  • `classifyStatus` checks `ctx.Err()` before `waitErr` so signal-killed processes are correctly classified as `Cancelled` (regression from PR Feat/entire review multi agent #1018)

CU5 — Prompt composition + scope detection (`2bc20b166`)

  • `review/prompt.go`: `ComposeReviewPrompt(cfg)` joins skills + always-prompt + per-run-prompt + scope clause
  • `review/scope.go`: `detectScopeBaseRef` finds closest non-self ancestor branch by tip timestamp (fallback chain origin/HEAD → origin/main → origin/master → main → master); `ComputeScopeStats`/`formatScopeBanner` produce "Reviewing feat/X vs main: 3 commits, 7 files changed, 2 uncommitted"
  • Consolidates the three placeholder `composePrompt` helpers from CU3 into a single shared composer

CU5b — Extract ReviewerTemplate refactor (`013d5f4a2`)

  • `review/types/template.go`: shared scaffolding (`Spawn → pipe stdout → run parser → forward events → close`) for per-agent reviewers
  • Each agent file shrinks from ~140 lines to ~70-80 — agent files are now genuinely just "this agent's quirks" (argv + parser)
  • Pure refactor; all CU3 tests pass unchanged

Test plan

  • `mise run check` (fmt + lint + test:ci) green
  • Per-agent contract tests: `go test ./cmd/entire/cli/agent/{claudecode,codex,geminicli}/` (each agent's argv, stdin, env vars, parser, scanner.Err handling)
  • Orchestrator tests: `go test ./cmd/entire/cli/review/` (15 new tests covering Run() success/cancel/start-error paths, token overwrite semantics, sink fan-out)
  • Output filter tests: `go test ./cmd/entire/cli/agent/codex/` covering banner/exec-block/CSI/hook/timestamp drops + narrative passthrough including "version N.M" and "exec succeeded"

Smoke test (package-level contract)

The per-agent reviewer tests exercise each reviewer end-to-end with golden-file fixtures (real argv construction, real env, real parser pipeline). Plus the orchestrator tests drive `Run()` against stub reviewers under success/cancel/error paths.

```bash
git checkout feat/entire-review-v2-a2
go build -o /tmp/entire-smoke-a2 ./cmd/entire # confirms compilation
go test ./cmd/entire/cli/agent/{claudecode,codex,geminicli}/ \
./cmd/entire/cli/review/ \
./cmd/entire/cli/review/types/ -count=1
```

Expected: all 5 packages green.

Actual output (verified locally on `feat/entire-review-v2-a2` @ 013d5f4):
```
ok cmd/entire/cli/agent/claudecode 3.999s
ok cmd/entire/cli/agent/codex 0.543s
ok cmd/entire/cli/agent/geminicli 0.833s
ok cmd/entire/cli/review 1.202s
ok cmd/entire/cli/review/types 1.101s
```
✓ All package-level smoke green.

A real-agent end-to-end smoke (orchestrator + DumpSink against live claude-code) is more meaningful at PR-A3 where `entire review` actually invokes this layer; deferred to that PR's smoke procedure.

Anti-features explicitly NOT recreated

  • `Launcher` + `HeadlessLauncher` as separate interfaces (single `AgentReviewer` instead)
  • `filterCodexOutput` in shared multi-agent code (lives in `agent/codex/output_filter.go`)
  • `sync.Once`-guarded onCancel + parallel `signal.Notify` goroutine (single cancel function from start; multi-agent fan-in preserves it)
  • Default scope divergence between agents (`origin/main...HEAD` vs working-tree-only) — `ComposeReviewPrompt` adds an explicit scope clause for all agents

Note

Medium Risk
Introduces new review execution plumbing (process spawning, event parsing, git scope detection, and status classification), which could affect correctness of review runs and cancellation/error reporting despite being well-tested.

Overview
Adds the next layer of the entire review rewrite: per-agent AgentReviewer implementations for claude-code, codex, and gemini-cli, including codex stdout chrome stripping and consistent parser error reporting as RunError + Finished{Success:false}.

Introduces a single-agent Run() orchestrator with Sink/RunSummary/AgentRun/AgentStatus and a DumpSink renderer, plus shared ComposeReviewPrompt and git-based scope detection (detectScopeBaseRef/ComputeScopeStats) to standardize what commits the agents should review.

Also tightens the env-var handshake with AppendReviewEnv (deduping stale ENTIRE_REVIEW_* entries), updates agent import-architecture allowlists, and expands unit/golden tests across agents, orchestrator, prompt, env, and scope.

Reviewed by Cursor Bugbot for commit ce17205. Configure here.

Copilot AI review requested due to automatic review settings May 4, 2026 16:44
@peyton-alt peyton-alt requested a review from a team as a code owner May 4, 2026 16:44
@peyton-alt peyton-alt changed the title refactor(review): per-agent reviewers + orchestrator + scope detection (PR-A2 of 3) refactor(review): add per-agent reviewers and orchestrator (2/3) May 4, 2026
Comment thread cmd/entire/cli/review/scope.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR (A2/3) implements the concrete building blocks for the upcoming entire review internal refactor: per-agent AgentReviewer adapters, a shared ReviewerTemplate, a single-agent Run orchestrator with sink fan-out, and shared prompt/scope composition utilities (still not wired into the user-facing command until PR-A3).

Changes:

  • Added ReviewerTemplate scaffolding in review/types and implemented per-agent reviewers for claude-code, codex (with chrome output filtering), and gemini-cli, each with contract-style tests.
  • Introduced review run consumption primitives: Sink, RunSummary/AgentRun, DumpSink, and the single-agent Run() orchestrator with tests.
  • Added shared prompt composition (ComposeReviewPrompt) and git scope detection/statistics (detectScopeBaseRef, ComputeScopeStats) with thorough test coverage.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
cmd/entire/cli/review/types/template.go Adds shared ReviewerTemplate implementation for spawning/parsing agent processes.
cmd/entire/cli/review/types/template_test.go Unit tests for event forwarding and wait behavior of ReviewerTemplate.
cmd/entire/cli/review/types/sink.go Defines Sink, RunSummary, AgentRun, and AgentStatus.
cmd/entire/cli/review/types/sink_test.go Tests AgentStatus.String and sink interface compilation/zero-value behavior.
cmd/entire/cli/review/run.go Implements single-agent orchestrator Run() plus status classification.
cmd/entire/cli/review/run_test.go Tests orchestrator success/cancel/failure/start-error paths and sink fan-out.
cmd/entire/cli/review/dump.go Adds DumpSink post-run renderer reading AgentRun.Buffer.
cmd/entire/cli/review/dump_test.go Tests dump output for success/failure/cancel/mixed/empty summaries.
cmd/entire/cli/review/prompt.go Adds shared prompt composition including optional scope clause.
cmd/entire/cli/review/prompt_test.go Tests prompt section joining and whitespace handling.
cmd/entire/cli/review/scope.go Implements base-ref detection + scope stats via go-git + git CLI helpers.
cmd/entire/cli/review/scope_test.go Integration-heavy tests for scope detection and counts in temp repos.
cmd/entire/cli/review/env.go Adds AppendReviewEnv helper to set ENTIRE_REVIEW_* env vars on spawned agents.
cmd/entire/cli/agent/claudecode/reviewer.go Claude per-agent reviewer (argv prompt) using shared template/prompt/env.
cmd/entire/cli/agent/claudecode/reviewer_test.go Contract tests for Claude reviewer argv/env parsing and scanner error handling.
cmd/entire/cli/agent/claudecode/testdata/canned_session.txt Golden fixture for Claude parser tests.
cmd/entire/cli/agent/codex/reviewer.go Codex per-agent reviewer (stdin prompt) with filtered stdout parsing.
cmd/entire/cli/agent/codex/reviewer_test.go Contract tests for Codex reviewer argv/env and filtered narrative output.
cmd/entire/cli/agent/codex/output_filter.go Codex chrome stripping (banners/hooks/exec/CSI/timestamp logs).
cmd/entire/cli/agent/codex/output_filter_test.go Unit tests for codex output filtering rules and regression cases.
cmd/entire/cli/agent/codex/testdata/canned_exec.txt Golden fixture exercising codex chrome filtering + narrative retention.
cmd/entire/cli/agent/geminicli/reviewer.go Gemini per-agent reviewer (stdin prompt) using shared template/prompt/env.
cmd/entire/cli/agent/geminicli/reviewer_test.go Contract tests for Gemini reviewer argv/env parsing and scanner error handling.
cmd/entire/cli/agent/geminicli/testdata/canned_session.txt Golden fixture for Gemini parser tests.
cmd/entire/cli/agent/architecture_test.go Updates forbidden-import allowlist to permit importing the new review package.
.golangci.yaml Allows returning review/types.Process for ireturn linter configuration.

Comment thread cmd/entire/cli/review/run.go Outdated
Comment thread cmd/entire/cli/review/dump.go
Comment thread cmd/entire/cli/review/types/template.go
Comment thread cmd/entire/cli/review/scope.go Outdated
Comment thread cmd/entire/cli/agent/codex/output_filter.go Outdated
Comment thread cmd/entire/cli/review/env.go
@peyton-alt peyton-alt force-pushed the feat/entire-review-v2-a2 branch from 013d5f4 to c6c5aef Compare May 4, 2026 17:10
@peyton-alt peyton-alt force-pushed the feat/entire-review-v2-a1 branch from c4b7bce to 37f5f0f Compare May 4, 2026 17:10
peyton-alt and others added 2 commits May 4, 2026 10:40
Each agent implements types.AgentReviewer in its own package, owning
its quirks: argv/stdin shape, env propagation, output adapter.

  - claude-code: argv `claude -p <prompt>`. Plain-text assistant
    output emitted as AssistantText events.
  - codex: argv `codex exec --skip-git-repo-check -`, prompt via
    stdin. output_filter.go strips banners, exec-block tool calls,
    hook-firing notices, CSI sequences, and timestamped error logs
    before AssistantText emission. Anchored regexes (^exec$ /
    ^exec <cmd> in /<path>) avoid eating narrative containing 'exec'.
  - gemini-cli: argv `gemini -p ' '`, prompt via stdin. Clean stdout,
    no chrome filtering.

All three set ENTIRE_REVIEW_{SESSION,AGENT,SKILLS,PROMPT,STARTING_SHA}
on the spawned process so the lifecycle hook (CU1) adopts the session
as a review.

Per-agent contract tests cover argv shape, stdin wiring, env vars set,
deferred binary lookup (TestNoBinaryRequiredAtConstruction), and a
golden-file event-stream test feeding canned stdout fixtures through
the parser.

Codex's chrome filter is exercised by output_filter_test.go separately
from reviewer_test.go: filter logic stays inside agent/codex/, never
in shared code.

Compile-time interface check (var _ AgentReviewer = ...) ensures all
three implementations satisfy the interface defined in CU2.

Also: add review/* to architecture_test.go allowedPrefixes (per-agent
reviewer.go files import review env consts and review/types), and add
review/types.Process to ireturn allow list in .golangci.yaml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The orchestration layer consuming AgentReviewer from CU3.

  - types/sink.go: Sink interface (AgentEvent during run, RunFinished
    once at end), RunSummary, AgentRun (per-agent post-run data),
    AgentStatus enum. Sinks MUST NOT block.
  - review/dump.go: DumpSink — post-run terminal narrative renderer.
    AgentEvent is a no-op; rendering reads from AgentRun.Buffer in
    RunFinished. Handles succeeded/failed/cancelled per-agent
    distinctly; closes with an N-agents-done counts line.
  - review/run.go: Run() — single-agent orchestrator. Forwards events
    to all sinks as they arrive, classifies terminal status, calls
    RunFinished on each sink with the populated summary.

classifyStatus checks ctx.Err() before waitErr so signal-killed
processes (ExitError exit code -1) are correctly classified as
Cancelled rather than Failed (regression class from PR #1018
commit 8e82e40).

AgentRun.Buffer holds the full event log per agent for post-run
rendering. A TODO(memory) flags this for revisiting if very long
reviews put pressure on memory.

CU8 generalizes Run to RunMulti for N-agent fan-out, re-using the
same Sink and RunSummary types.
@peyton-alt peyton-alt force-pushed the feat/entire-review-v2-a2 branch from c6c5aef to ce17205 Compare May 4, 2026 17:45
@peyton-alt
Copy link
Copy Markdown
Contributor Author

@BugBot review

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit ce17205. Configure here.

Comment thread cmd/entire/cli/review/scope.go
Comment thread cmd/entire/cli/agent/codex/testdata/canned_exec.txt
peyton-alt and others added 2 commits May 4, 2026 17:28
review/prompt.go's ComposeReviewPrompt assembles the agent prompt from
Skills + AlwaysPrompt + PerRunPrompt + scope clause. Empty sections
are skipped (no triple-newline gaps — regression class from PR #1018
commit 13415f5). The scope clause pins agents to 'commits unique
to this branch vs <closest-ancestor>' to prevent the divergent-default
problem where codex used origin/main...HEAD and claude used
working-tree-only on the same invocation (fixed in #1018 commit
b9ed9c0, structurally enforced here).

review/scope.go's detectScopeBaseRef finds the closest non-self
ancestor branch by tip timestamp, with fallback chain origin/HEAD →
origin/main → origin/master → main → master. ComputeScopeStats and
formatScopeBanner produce the 'Reviewing feat/X vs main: 3 commits,
7 files changed, 2 uncommitted' output that CU6 will print before
agent launch.

Consolidates CU3's placeholder composePrompt and duplicated
appendReviewEnv into shared helpers in review/. Each agent reviewer
(claudecode, codex, geminicli) now calls review.ComposeReviewPrompt
and review.AppendReviewEnv directly. Per-agent local copies removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CU3's three reviewers (claudecode, codex, geminicli) shared ~50 lines
of identical Start/Process scaffolding. Only buildCmd (argv/env) and
the stdout parser genuinely differ per agent.

types.ReviewerTemplate now owns the shared lifecycle (spawn → pipe
stdout → run parser → forward events → close on exit). Each agent's
reviewer.go is reduced to:

  - New() returning a *ReviewerTemplate populated with the agent's
    name, BuildCmd, and Parser.
  - buildReviewCmd: argv + ENTIRE_REVIEW_* env propagation.
  - parseXxxOutput: agent-specific stdout-to-Event stream (with the
    scanner.Err handling from the CU3 fix loop preserved unchanged).

Per-agent file: ~140 → ~80 lines. Total: ~150 LoC saved.

Behavior identical — all CU3 tests pass without modification.

Adding a 4th launchable agent now requires only its buildCmd and
parser; the lifecycle scaffolding is no longer per-agent code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@peyton-alt peyton-alt force-pushed the feat/entire-review-v2-a2 branch from ce17205 to aa8be02 Compare May 5, 2026 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants