feat(review): custom review profiles (backend)#913
Open
backnotprop wants to merge 3 commits into
Open
Conversation
Named, reusable review profiles ("Security", "Performance", "API Contract")
that run through the existing Claude/Codex background review pipeline. Drop a
JSON file in ~/.plannotator/reviews/ or .plannotator/reviews/, pick it at
launch, and findings come back as the same external annotations as today.
Backend only — no UI yet. The existing "Run Review" path is untouched: an
absent profile resolves to builtin:default and produces a byte-identical prompt.
- packages/shared/review-profiles.ts: types, shape validation, filename->id
inference, label inference, trivial name-clash resolution (builtin reserved,
user beats repo), sectioned prompt composer
- packages/server/review-profile-loader.ts: builtin + user-dir + repo-dir
discovery (repo scoped to one unambiguous local repo)
- GET /api/agents/review-profiles discovery endpoint (Bun + Pi)
- reviewProfileId on POST /api/agents/jobs (rejects unknown fields; no inline
prompts); reviewProfileId/Label added to AgentJobInfo
- launch-time profile resolution at the existing launch-state snapshot boundary
- Codex priority -> shared important|nit|pre_existing severity scale
- in-diff finding filter + fail-on-all-out-of-diff completion semantics
- profile label carried as annotation metadata (author stays the engine name)
- Bun/Pi parity via vendor.sh
Spec: docs/custom-reviews.md (with a "Deliberately not building" boundary —
no precedence engine, hashing, diagnostics channel, adapter interface, or
anchoring module).
Tests: review-profiles, loader, launch, prompt composition, ingestion. Full
server + ai suites green; typecheck clean; vendor idempotent.
Cuts agreed during review — the feature comes out smaller and sharper (net -392 lines). A profile is now just a name + instructions in your global folder, run with any engine/model, and every finding it produces appears in the panel. Cuts: - Remove the in-diff finding filter and all reviewChangedFiles plumbing (Bun + Pi). Findings flow straight to addAnnotations like main — nothing is filtered, dropped, or hidden. - Remove per-repo profiles. Global ~/.plannotator/reviews/ only; drops the repo source, repoCwd gating, and the PR/fork trust surface. - Remove the engines restriction. A profile is id/label/instructions; engine and model stay independent runtime choices. Also drops the dead reviewProfile params from the Claude/Codex command builders. Fixes & cleanups from review: - Fail Claude jobs on empty/unparseable output (was silently "done"; pre-existing gap, also helps the default review). Codex already did this. - void the dangling debugLog call in the Codex transform. - Remove the unreachable builtin:default reserved-id guard (namespacing already reserves it). - Move markJobReviewFailed + REVIEW_OUTPUT_FAILED into packages/shared so Bun and Pi stop duplicating them. - Document GET /api/agents/review-profiles in the endpoint table. - Update docs/custom-reviews.md to match what ships (no per-repo, no engines, no finding-dropping). Verified: typecheck clean; 765 server+ai+shared tests pass; vendor idempotent; whole-repo remnant sweep clean.
- Guard the completion-semantics fix with tests: empty/whitespace/garbage Claude output -> parseClaudeStreamOutput null, and markJobReviewFailed flips the job to failed with a leak-free reason. (The fix had no test after the filter tests were removed.) - Make Codex completion symmetric with Claude: handle the codex job unconditionally and treat a missing output path as REVIEW_OUTPUT_FAILED instead of silently leaving the job "done". - Stop stamping reviewProfileLabel on default reviews — only custom profiles get a tag, so the future UI won't paint "Default" on every ordinary finding. - Sort review-profile filenames so a bare-name clash resolves deterministically (first-seen wins) rather than by filesystem order. - Extract the shared annotation map+ingest block in onJobComplete (was duplicated across the codex/claude branches) into one local helper. Bun + Pi mirrored. Typecheck clean; 768 server+ai+shared tests pass; vendor idempotent.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Named, reusable review profiles — "Security", "Performance", "API Contract" — that run through the existing Claude/Codex background review pipeline. A developer drops a JSON file, picks it at launch, and findings come back as the same external diff annotations as today.
{ "instructions": "Focus only on security-impacting issues." }That one-line file is a complete, valid profile.
This PR is backend only — no UI yet. It's fully drivable over HTTP, and the existing "Run Review" path is unchanged: an absent
reviewProfileIdresolves tobuiltin:defaultand produces a byte-identical prompt. Nothing regresses while this lands.How it works
packages/shared/review-profiles.ts(new, runtime-agnostic) — types, shape validation,filename → idinference, label inference, trivial name-clash resolution, and the sectioned prompt composer.packages/server/review-profile-loader.ts(new) — discovery across built-in + user dir (${PLANNOTATOR_DATA_DIR}/reviews/) + repo dir (.plannotator/reviews/), skip-and-log on malformed files.GET /api/agents/review-profiles— discovery endpoint (Bun + Pi).reviewProfileIdonPOST /api/agents/jobs— parsed and forwarded; rejects unknown fields; no inline prompt text accepted.reviewProfileId/reviewProfileLabeladded toAgentJobInfo.priority 0-3→ the sharedimportant | nit | pre_existingscale, in one place, so badges read the same regardless of engine.8 findings · 2 skipped (not in the reviewed diff).authorstays the engine name (Claude Code/Codex); the profile rides alongside.vendor.sh.Scope discipline
The spec has an explicit "Deliberately not building" section, honored here: no precedence engine, no content hashing, no diagnostics channel, no
ReviewEngineAdapterinterface, no anchoring module, no inlinereviewPrompt, no temp-profile machinery. Two new modules, the rest small edits.Deferred (not in this PR)
Frontend: the engine/review picker in
AgentsTab.tsx, sendingreviewProfileIdfromuseAgentJobs.ts, the repo-source tag, and the profile tag on annotations.Verification
packages/shared+packages/servertsconfigs)vendor.shidempotent (no Pi drift on re-run)Built via a phased pipeline with per-phase perspective-diverse adversarial review (correctness / spec-fidelity / Bun-Pi parity) and a final whole-feature audit. Verification re-run independently after handoff.
Review notes
The two-engine prompt/severity seams (
claude-review.ts,codex-review.ts, andonJobCompleteinreview.ts) and the Bun↔Pi mirror inserverReview.ts/apps/pi-extension/server/agent-jobs.tsare the highest-value places to scrutinize. The byte-identical-default-prompt invariant has a dedicated test but is worth a human eye.🤖 Implemented with Claude Code via a multi-agent workflow; verified before PR.