docs(problems): add graduated approval policy problem doc #2012
Benkapner wants to merge 2 commits into
Proposes risk-scored approval routing to replace binary approve/reject verdicts, with scoring signals, routing rules, and multiple implementation approaches. References issues fullsend-ai#1143, fullsend-ai#1453, fullsend-ai#1462.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Benjamin Kapner <bkapner@redhat.com>
Correct autonomy-spectrum.md vs intent-representation.md distinction. Replace broken tool-call-risk-assessment.md link with PR fullsend-ai#2009 reference. Add README.md entry. Clarify Challenger as intra-agent verification, not inter-agent disagreement. Add mixed-path PR semantics for CODEOWNERS interaction. Qualify change-type scoring for additions of new attack surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Benjamin Kapner <bkapner@redhat.com>
Addressed all findings in b099173.
[internal-consistency] (medium, line 97) Corrected the autonomy-spectrum.md reference. The autonomy spectrum defines binary per-repo autonomy (autonomous or not), while Tier 0-3 classification comes from intent-representation.md. Updated the text to reference both documents accurately and clarify that graduated approval operates orthogonally to both.
[broken-reference] (medium, line 106) Replaced the broken markdown link to
[missing-readme-link] (medium) Added README.md entry for graduated-approval-policy.md, positioned after the Code Review entry.
[internal-consistency] (low, line 88) Corrected the Challenger characterization. The Challenger is an intra-agent verification step within the orchestrator's own process (step 6e), not disagreement between independent sub-agents. Updated the text to reflect this distinction.
[edge-case] (low, line 61) Added a note clarifying mixed-path PR semantics: when a PR touches both CODEOWNERS-guarded and non-guarded paths, CODEOWNERS takes precedence for the entire PR, but the risk score still provides context to the human reviewer.
[algorithm-logic] (low, line 38) Qualified the change-type scoring statement. Additions of new attack surface (API endpoints, dependencies, permission grants) can have equal or greater blast radius than modifications to existing code, and should be scored independently.
Review agents currently make a single binary decision: approve or request changes. There is no middle ground. This forces a trade-off between two failure modes.
When the agent leans toward approving, risky changes slip through. Submodule bumps get approved without the agent inspecting the actual changes (#1462). Medium-severity correctness findings get suppressed to avoid false-positive noise (#1453). Author uncertainty signals like "i think this is right but i'm not sure" get ignored in the approval decision (#1143).
When the agent leans toward blocking, legitimate work gets stuck. Humans spend time reviewing changes that are almost certainly fine, defeating the purpose of automation.
Human reviewers don't work this way. A human would say "LGTM, but get a second pair of eyes on the crypto changes" or "looks fine to me but the test coverage concerns me." That graduated response doesn't exist for agents today. It's yes or no.
This doc proposes replacing the binary verdict with risk-scored approval routing. The review agent (or a separate scoring layer) assigns a risk score based on multiple signals:
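From the diff: which files changed (auth code is riskier than docs), change type (deletions generally score higher than additions, but additions of new attack surface such as API endpoints, dependencies, or permission grants are scored independently), whether tests changed too (no tests = higher risk), and whether binary/opaque files are present.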
From the review findings: number and severity of findings, confidence level, and whether sub-agents disagreed with each other (disagreement = uncertainty = higher risk).
From context: author history (first-time contributors score higher), branch target (default branch = higher risk), and whether related code was recently modified.
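As a rough illustration of how these signals might combine, here is a minimal additive sketch. Every field name, weight, and path prefix in it is an assumption made for the example and is not taken from the doc. It covers only a subset of the signals listed above, leaving out confidence level and recently modified related code.

```python
from dataclasses import dataclass

# Illustrative assumptions: path prefixes, suffixes, and all weights are made up.
SENSITIVE_PREFIXES = ("auth/", "crypto/")
DOC_SUFFIXES = (".md", ".rst", ".txt")
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 6}


@dataclass
class ReviewSignals:
    changed_files: list[str]
    deleted_lines: int
    added_lines: int
    tests_changed: bool
    has_binary_files: bool
    finding_severities: list[str]  # e.g. ["medium", "low"]
    subagents_disagreed: bool
    first_time_contributor: bool
    targets_default_branch: bool


def risk_score(s: ReviewSignals) -> int:
    """Sum hypothetical weights for each signal; higher means riskier."""
    score = 0
    # Signals from the diff
    if any(f.startswith(SENSITIVE_PREFIXES) for f in s.changed_files):
        score += 5
    if s.changed_files and all(f.endswith(DOC_SUFFIXES) for f in s.changed_files):
        score -= 2
    if s.deleted_lines > s.added_lines:
        score += 1
    if not s.tests_changed:
        score += 2
    if s.has_binary_files:
        score += 3
    # Signals from the review findings
    score += sum(SEVERITY_WEIGHT.get(sev, 0) for sev in s.finding_severities)
    if s.subagents_disagreed:
        score += 3
    # Signals from context
    if s.first_time_contributor:
        score += 2
    if s.targets_default_branch:
        score += 1
    return max(score, 0)
```

A weighted sum is only one possible combination rule. Its appeal is that each signal's contribution is easy to audit.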
The score maps to a routing action:
Thresholds are configurable per repo and per org.
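One way to picture configurable thresholds is a sorted list of cut points with an org-wide default that a repo can override. The action names and numeric thresholds below are hypothetical placeholders chosen for the sketch, not the routing table this doc defines.

```python
from bisect import bisect_right

# Hypothetical routing actions, ordered from least to most restrictive.
ACTIONS = ["auto_approve", "approve_with_notes", "request_human_review", "block"]

# Assumed org-wide cut points: a score at or above a cut point moves to the next action.
ORG_DEFAULT_THRESHOLDS = [3, 8, 15]


def route(score, repo_thresholds=None):
    """Map a risk score to an action, preferring repo thresholds over org defaults."""
    thresholds = repo_thresholds if repo_thresholds is not None else ORG_DEFAULT_THRESHOLDS
    if len(thresholds) != len(ACTIONS) - 1 or list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted, with one cut point between each pair of actions")
    return ACTIONS[bisect_right(thresholds, score)]
```

For example, under these assumed values `route(5)` falls in the second band, while a stricter repo passing `repo_thresholds=[1, 4, 10]` would send the same score to human review.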
The doc explores three implementation approaches: scoring inside the review agent (simplest but model-dependent), a separate deterministic scoring layer in the harness (more robust against manipulation), and multi-reviewer consensus where disagreement triggers escalation (expensive but naturally handles uncertainty). The parallel sub-agent architecture from #1550 partially enables the third approach: the Challenger step already contests findings, though as an intra-agent verification step within the orchestrator's own process rather than as disagreement between independent sub-agents, and its outcome does not currently influence the approval decision.
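The third approach, consensus with escalation on disagreement, could look roughly like the sketch below. The verdict strings, the `min_agreement` parameter, and the `"escalate"` outcome are assumptions for illustration, not an existing interface in the review harness.

```python
from collections import Counter


def consensus_verdict(verdicts, min_agreement=1.0):
    """Return the majority verdict if enough reviewers agree, else escalate.

    verdicts: list of strings such as "approve" or "request_changes",
    one per independent reviewer (hypothetical labels).
    min_agreement: fraction of reviewers that must share the top verdict.
    """
    if not verdicts:
        return "escalate"  # no reviewer output is treated as maximal uncertainty
    top, count = Counter(verdicts).most_common(1)[0]
    if count / len(verdicts) >= min_agreement:
        return top
    return "escalate"
```

With the default `min_agreement=1.0`, any split vote escalates to a human. Lowering it trades some of that caution for fewer escalations.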
Addresses #1143, #1453, and #1462.