Skip to content

docs(problems): add graduated approval policy problem doc#2012

Open
Benkapner wants to merge 2 commits into
fullsend-ai:mainfrom
Benkapner:docs/graduated-approval-policy
Open

docs(problems): add graduated approval policy problem doc#2012
Benkapner wants to merge 2 commits into
fullsend-ai:mainfrom
Benkapner:docs/graduated-approval-policy

Conversation

@Benkapner

@Benkapner Benkapner commented Jun 8, 2026

Copy link
Copy Markdown

Review agents currently make a single binary decision: approve or request changes. There is no middle ground. This forces a trade-off between two failure modes.

When the agent leans toward approving, risky changes slip through. Submodule bumps get approved without the agent inspecting the actual changes (#1462). Medium-severity correctness findings get suppressed to avoid false-positive noise (#1453). Author uncertainty signals like "i think this is right but i'm not sure" get ignored in the approval decision (#1143).

When the agent leans toward blocking, legitimate work gets stuck. Humans spend time reviewing changes that are almost certainly fine, defeating the purpose of automation.

Human reviewers don't work this way. A human would say "LGTM, but get a second pair of eyes on the crypto changes" or "looks fine to me but the test coverage concerns me." That graduated response doesn't exist for agents today. It's yes or no.

This doc proposes replacing the binary verdict with risk-scored approval routing. The review agent (or a separate scoring layer) assigns a risk score based on multiple signals:

From the diff: which files changed (auth code is riskier than docs), change type (deletions score higher than additions), whether tests changed too (no tests = higher risk), and whether binary/opaque files are present.

From the review findings: number and severity of findings, confidence level, and whether sub-agents disagreed with each other (disagreement = uncertainty = higher risk).

From context: author history (first-time contributors score higher), branch target (default branch = higher risk), and whether related code was recently modified.

The score maps to a routing action:

Score Action
Low (0-30) Auto-approve
Medium-low (30-50) Approve with advisory comment
Medium (50-70) Request secondary human review
High (70-90) Request changes with explanation
Critical (90-100) Block and alert security team

Thresholds are configurable per repo and per org.

The doc explores three implementation approaches: scoring inside the review agent (simplest but model-dependent), a separate deterministic scoring layer in the harness (more robust against manipulation), and multi-reviewer consensus where disagreement triggers escalation (expensive but naturally handles uncertainty). The parallel sub-agent architecture from #1550 partially enables the third approach, since the Challenger role already contests findings, but disagreement between sub-agents does not currently influence the approval decision.

Addresses #1143, #1453, and #1462.

Proposes risk-scored approval routing to replace binary
approve/reject verdicts, with scoring signals, routing rules,
and multiple implementation approaches.

References issues fullsend-ai#1143, fullsend-ai#1453, fullsend-ai#1462.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Benjamin Kapner <bkapner@redhat.com>
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review

Findings

Medium

  • [logic-error] docs/problems/graduated-approval-policy.md:89 — Approach 3 states "Challenger outcomes do not currently influence the approval decision; they only affect whether a finding survives." This is factually incorrect. In the current implementation (pr-review SKILL.md step 6e–6f), the Challenger pass removes or retains findings, and the verdict in step 6f is determined by the severity of surviving findings. By removing a high-severity finding, the Challenger directly changes the outcome from request-changes to approve. Challenger outcomes influence the approval decision precisely because finding survival determines the verdict.
    Remediation: Reword to: "Challenger outcomes indirectly influence the approval decision by determining which findings survive to the verdict step, but the Challenger is an intra-agent verification step within the orchestrator's own process, not disagreement between independent sub-agents."

  • [architectural-coherence] docs/problems/graduated-approval-policy.md — This document overlaps significantly with existing discussions in code-review.md without explaining the relationship. Specifically: (1) code-review.md lines 158–163 describe an "Escalation-based" composition model where sub-agents produce findings and a separate decision layer evaluates them — essentially the pattern this doc proposes with risk scoring. (2) code-review.md lines 166–170 ("The confidence problem") asks the same core questions: when should agents escalate, how do they express uncertainty? (3) autonomy-spectrum.md lines 67–94 describe "per-decision escalation dimensions" (reversibility, blast radius, visibility) that also modulate per-change risk assessment at runtime. The new doc should position itself relative to these existing mechanisms — is it an elaboration of the escalation-based model, a replacement, or a complementary layer?
    Remediation: Add a paragraph near the beginning explaining how this doc relates to code-review.md's escalation-based composition model and confidence problem, and to autonomy-spectrum.md's per-decision escalation dimensions.

Low

Previous run

Review

Findings

Medium

  • [internal-consistency] docs/problems/graduated-approval-policy.md:97 — The document states "The autonomy-spectrum.md defines per-repo autonomy tiers (Tier 0-3) based on change type." This is factually incorrect on two counts: (1) autonomy-spectrum.md defines a binary per-repo autonomy model (autonomous or not), not tiered levels; (2) Tier 0-3 classification is defined in intent-representation.md as intent tiers based on change type (standing rules, tactical, strategic, organizational). The document conflates two distinct concepts.
    Remediation: Correct to reference both documents accurately, e.g.: "The autonomy-spectrum.md defines a binary per-repo autonomy model. Separately, intent-representation.md defines per-change intent tiers (Tier 0-3). Graduated approval operates orthogonally to both: even among changes at the same intent tier within an autonomous repo, some are higher risk than others."

  • [broken-reference] docs/problems/graduated-approval-policy.md:106 — The document links to [tool-call-risk-assessment.md](tool-call-risk-assessment.md) but this file does not exist in docs/problems/. While the text hedges with "If a tool call risk assessment layer is implemented," the markdown link syntax implies the document exists and creates a broken link.
    Remediation: Either remove the hyperlink and keep it as plain text (e.g., "If a tool call risk assessment layer is implemented"), or create a stub document, or note explicitly that this references a planned document.

  • [missing-readme-link] docs/problems/graduated-approval-policy.md — CLAUDE.md states: "When adding new problem areas, create a new file in docs/problems/ and link it from README.md." The README.md maintains an explicit index of all problem documents, and this new document is not listed. This violates an explicit repository convention.
    Remediation: Add an entry for graduated-approval-policy.md to the problem document list in README.md, following the existing format with a one-line description.

Low

  • [internal-consistency] docs/problems/graduated-approval-policy.md:88 — The document claims the Challenger role is part of "the parallel sub-agent architecture (PR feat(review): parallel specialized sub-agents for PR review #1550)" and frames it as inter-agent disagreement ("disagreement between sub-agents does not currently influence the approval decision"). In the current codebase, the Challenger is step 6e of the orchestrator's own process — a self-verification pass within a single agent run, not disagreement between independent sub-agents.
    Remediation: Verify how PR feat(review): parallel specialized sub-agents for PR review #1550 implements the Challenger concept and correct the characterization if it is an intra-agent verification step rather than inter-agent disagreement.

  • [edge-case] docs/problems/graduated-approval-policy.md:61 — The routing rules and the CODEOWNERS relationship section create an ambiguity for mixed-path PRs: when a PR touches both CODEOWNERS-guarded and non-guarded paths, it is unclear whether the graduated policy applies to the non-guarded portion only or is bypassed entirely.
    Remediation: Add a note clarifying interaction semantics for mixed-path PRs.

  • [algorithm-logic] docs/problems/graduated-approval-policy.md:38 — The input signals state "deletions and modifications score higher than additions" as a general rule. In security contexts, additions can be equally or more risky — adding a new API endpoint, a new dependency, or a new permission grant can have greater blast radius than modifying existing code.
    Remediation: Qualify the statement to acknowledge that additions of new attack surface should be scored independently.

@fullsend-ai-review fullsend-ai-review Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

Comment thread docs/problems/graduated-approval-policy.md
Comment thread docs/problems/graduated-approval-policy.md
Comment thread docs/problems/graduated-approval-policy.md
Comment thread docs/problems/graduated-approval-policy.md
Comment thread docs/problems/graduated-approval-policy.md
Correct autonomy-spectrum.md vs intent-representation.md distinction.
Replace broken tool-call-risk-assessment.md link with PR fullsend-ai#2009 reference.
Add README.md entry. Clarify Challenger as intra-agent verification,
not inter-agent disagreement. Add mixed-path PR semantics for CODEOWNERS
interaction. Qualify change-type scoring for additions of new attack
surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Benjamin Kapner <bkapner@redhat.com>
@Benkapner

Copy link
Copy Markdown
Author

Addressed all findings in b099173.

[internal-consistency] (medium, line 97) Corrected the autonomy-spectrum.md reference. The autonomy spectrum defines binary per-repo autonomy (autonomous or not), while Tier 0-3 classification comes from intent-representation.md. Updated the text to reference both documents accurately and clarify that graduated approval operates orthogonally to both.

[broken-reference] (medium, line 106) Replaced the broken markdown link to tool-call-risk-assessment.md with a reference to PR #2009 instead. The document exists on a separate branch and is not yet merged, so a relative link would 404.

[missing-readme-link] (medium) Added README.md entry for graduated-approval-policy.md, positioned after the Code Review entry.

[internal-consistency] (low, line 88) Corrected the Challenger characterization. The Challenger is an intra-agent verification step within the orchestrator's own process (step 6e), not disagreement between independent sub-agents. Updated the text to reflect this distinction.

[edge-case] (low, line 61) Added a note clarifying mixed-path PR semantics: when a PR touches both CODEOWNERS-guarded and non-guarded paths, CODEOWNERS takes precedence for the entire PR, but the risk score still provides context to the human reviewer.

[algorithm-logic] (low, line 38) Qualified the change-type scoring statement. Additions of new attack surface (API endpoints, dependencies, permission grants) can have equal or greater blast radius than modifications to existing code, and should be scored independently.

@fullsend-ai-review fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

requires-manual-review Review requires human judgment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant