Skip to content

feat(review): custom review profiles (backend)#913

Open
backnotprop wants to merge 3 commits into
mainfrom
feat/custom-reviews
Open

feat(review): custom review profiles (backend)#913
backnotprop wants to merge 3 commits into
mainfrom
feat/custom-reviews

Conversation

@backnotprop

Copy link
Copy Markdown
Owner

What

Named, reusable review profiles — "Security", "Performance", "API Contract" — that run through the existing Claude/Codex background review pipeline. A developer drops a JSON file, picks it at launch, and findings come back as the same external diff annotations as today.

{ "instructions": "Focus only on security-impacting issues." }

That one-line file is a complete, valid profile.

This PR is backend only — no UI yet. It's fully drivable over HTTP, and the existing "Run Review" path is unchanged: an absent reviewProfileId resolves to builtin:default and produces a byte-identical prompt. Nothing regresses while this lands.

Authoritative spec: docs/custom-reviews.md. Current-pipeline reference map it builds on: docs/agent-jobs-code-review-handoff.md.

How it works

pick profile → resolve (builtin/user/repo) → sectioned prompt → Claude/Codex
  → parse → severity-normalize → in-diff filter → external annotations
  • packages/shared/review-profiles.ts (new, runtime-agnostic) — types, shape validation, filename → id inference, label inference, trivial name-clash resolution, and the sectioned prompt composer.
  • packages/server/review-profile-loader.ts (new) — discovery across built-in + user dir (${PLANNOTATOR_DATA_DIR}/reviews/) + repo dir (.plannotator/reviews/), skip-and-log on malformed files.
  • GET /api/agents/review-profiles — discovery endpoint (Bun + Pi).
  • reviewProfileId on POST /api/agents/jobs — parsed and forwarded; rejects unknown fields; no inline prompt text accepted. reviewProfileId/reviewProfileLabel added to AgentJobInfo.
  • Launch-time resolution at the existing launch-state snapshot boundary; repo profiles scoped to one unambiguous local repo (absent in remote-only PR / ambiguous workspace mode).
  • Severity normalization — Codex priority 0-3 → the shared important | nit | pre_existing scale, in one place, so badges read the same regardless of engine.
  • In-diff filter + completion semantics — drop findings not in the reviewed patch; fail the job if all findings are out-of-diff or output is missing/unparseable (the Code Tour precedent). Calm one-liner: 8 findings · 2 skipped (not in the reviewed diff).
  • Profile label as annotation metadataauthor stays the engine name (Claude Code / Codex); the profile rides alongside.
  • Bun/Pi parity maintained via vendor.sh.

Scope discipline

The spec has an explicit "Deliberately not building" section, honored here: no precedence engine, no content hashing, no diagnostics channel, no ReviewEngineAdapter interface, no anchoring module, no inline reviewPrompt, no temp-profile machinery. Two new modules, the rest small edits.

Deferred (not in this PR)

Frontend: the engine/review picker in AgentsTab.tsx, sending reviewProfileId from useAgentJobs.ts, the repo-source tag, and the profile tag on annotations.

Verification

  • Typecheck: clean (packages/shared + packages/server tsconfigs)
  • New suites: review-profiles, loader, launch, prompt composition, ingestion — all green
  • Regression: full server + AI suites pass, 0 fail
  • Vendor: vendor.sh idempotent (no Pi drift on re-run)

Built via a phased pipeline with per-phase perspective-diverse adversarial review (correctness / spec-fidelity / Bun-Pi parity) and a final whole-feature audit. Verification re-run independently after handoff.

Review notes

The two-engine prompt/severity seams (claude-review.ts, codex-review.ts, and onJobComplete in review.ts) and the Bun↔Pi mirror in serverReview.ts / apps/pi-extension/server/agent-jobs.ts are the highest-value places to scrutinize. The byte-identical-default-prompt invariant has a dedicated test but is worth a human eye.

🤖 Implemented with Claude Code via a multi-agent workflow; verified before PR.

Named, reusable review profiles ("Security", "Performance", "API Contract")
that run through the existing Claude/Codex background review pipeline. Drop a
JSON file in ~/.plannotator/reviews/ or .plannotator/reviews/, pick it at
launch, and findings come back as the same external annotations as today.

Backend only — no UI yet. The existing "Run Review" path is untouched: an
absent profile resolves to builtin:default and produces a byte-identical prompt.

- packages/shared/review-profiles.ts: types, shape validation, filename->id
  inference, label inference, trivial name-clash resolution (builtin reserved,
  user beats repo), sectioned prompt composer
- packages/server/review-profile-loader.ts: builtin + user-dir + repo-dir
  discovery (repo scoped to one unambiguous local repo)
- GET /api/agents/review-profiles discovery endpoint (Bun + Pi)
- reviewProfileId on POST /api/agents/jobs (rejects unknown fields; no inline
  prompts); reviewProfileId/Label added to AgentJobInfo
- launch-time profile resolution at the existing launch-state snapshot boundary
- Codex priority -> shared important|nit|pre_existing severity scale
- in-diff finding filter + fail-on-all-out-of-diff completion semantics
- profile label carried as annotation metadata (author stays the engine name)
- Bun/Pi parity via vendor.sh

Spec: docs/custom-reviews.md (with a "Deliberately not building" boundary —
no precedence engine, hashing, diagnostics channel, adapter interface, or
anchoring module).

Tests: review-profiles, loader, launch, prompt composition, ingestion. Full
server + ai suites green; typecheck clean; vendor idempotent.
Cuts agreed during review — the feature comes out smaller and sharper
(net -392 lines). A profile is now just a name + instructions in your
global folder, run with any engine/model, and every finding it produces
appears in the panel.

Cuts:
- Remove the in-diff finding filter and all reviewChangedFiles plumbing
  (Bun + Pi). Findings flow straight to addAnnotations like main — nothing
  is filtered, dropped, or hidden.
- Remove per-repo profiles. Global ~/.plannotator/reviews/ only; drops the
  repo source, repoCwd gating, and the PR/fork trust surface.
- Remove the engines restriction. A profile is id/label/instructions; engine
  and model stay independent runtime choices. Also drops the dead
  reviewProfile params from the Claude/Codex command builders.

Fixes & cleanups from review:
- Fail Claude jobs on empty/unparseable output (was silently "done";
  pre-existing gap, also helps the default review). Codex already did this.
- void the dangling debugLog call in the Codex transform.
- Remove the unreachable builtin:default reserved-id guard (namespacing
  already reserves it).
- Move markJobReviewFailed + REVIEW_OUTPUT_FAILED into packages/shared so
  Bun and Pi stop duplicating them.
- Document GET /api/agents/review-profiles in the endpoint table.
- Update docs/custom-reviews.md to match what ships (no per-repo, no
  engines, no finding-dropping).

Verified: typecheck clean; 765 server+ai+shared tests pass; vendor
idempotent; whole-repo remnant sweep clean.
- Guard the completion-semantics fix with tests: empty/whitespace/garbage
  Claude output -> parseClaudeStreamOutput null, and markJobReviewFailed
  flips the job to failed with a leak-free reason. (The fix had no test
  after the filter tests were removed.)
- Make Codex completion symmetric with Claude: handle the codex job
  unconditionally and treat a missing output path as REVIEW_OUTPUT_FAILED
  instead of silently leaving the job "done".
- Stop stamping reviewProfileLabel on default reviews — only custom
  profiles get a tag, so the future UI won't paint "Default" on every
  ordinary finding.
- Sort review-profile filenames so a bare-name clash resolves
  deterministically (first-seen wins) rather than by filesystem order.
- Extract the shared annotation map+ingest block in onJobComplete (was
  duplicated across the codex/claude branches) into one local helper.

Bun + Pi mirrored. Typecheck clean; 768 server+ai+shared tests pass;
vendor idempotent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant