From 647e27cefa12bcccdc88b1e5d8d9c70c0117e6b6 Mon Sep 17 00:00:00 2001 From: zazabap Date: Tue, 31 Mar 2026 16:13:42 +0000 Subject: [PATCH 01/23] docs: add design spec for proposed reductions Typst note Covers 9 reductions: 2 NP-hardness chain extensions (#973, #198), 4 Tier 1a blocked issues (#379, #380, #888, #822), and 3 Tier 1b blocked issues (#892, #894, #890). Co-Authored-By: Claude Opus 4.6 (1M context) --- ...6-03-31-proposed-reductions-note-design.md | 125 ++++++++++++++++++ 1 file changed, 125 insertions(+) create mode 100644 docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md diff --git a/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md b/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md new file mode 100644 index 000000000..9ce7a1a90 --- /dev/null +++ b/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md @@ -0,0 +1,125 @@ +# Design: Proposed Reduction Rules — Typst Verification Note + +**Date:** 2026-03-31 +**Goal:** Create a standalone Typst document with compiled PDF that formalizes 9 reduction rules from issue #770, resolving blockers for 7 incomplete issues and adding 2 high-leverage NP-hardness chain extensions. 
+ +## Scope + +### 9 Reductions + +**Group 1 — NP-hardness proof chain extensions:** + +| Issue | Reduction | Impact | +|-------|-----------|--------| +| #973 | SubsetSum → Partition | Unlocks ~12 downstream problems | +| #198 | MinimumVertexCover → HamiltonianCircuit | Unlocks ~9 downstream problems | + +**Group 2 — Tier 1a blocked issues (fix + formalize):** + +| Issue | Reduction | Current blocker | +|-------|-----------|----------------| +| #379 | DominatingSet → MinMaxMulticenter | Decision vs optimization MDS model | +| #380 | DominatingSet → MinSumMulticenter | Same | +| #888 | OptimalLinearArrangement → RootedTreeArrangement | Witness extraction impossible for naive approach | +| #822 | ExactCoverBy3Sets → AcyclicPartition | Missing algorithm (unpublished reference) | + +**Group 3 — Tier 1b blocked issues (fix + formalize):** + +| Issue | Reduction | Current blocker | +|-------|-----------|----------------| +| #892 | VertexCover → HamiltonianPath | Depends on #198 (VC→HC) being resolved | +| #894 | VertexCover → PartialFeedbackEdgeSet | Missing Yannakakis 1978b paper | +| #890 | MaxCut → OptimalLinearArrangement | Placeholder algorithm, no actual construction | + +## Deliverables + +1. **`docs/paper/proposed-reductions.typ`** — standalone Typst document +2. **`docs/paper/proposed-reductions.pdf`** — compiled PDF checked into repo +3. **Updated GitHub issues** — #379, #380, #888, #822, #892, #894, #890 corrected with verified algorithms +4. **One PR** containing the note, PDF, and issue updates + +## Document Structure + +``` +Title: Proposed Reduction Rules — Verification Notes +Abstract: Motivation (NP-hardness gaps, blocked issues, impact analysis) + +§1 Notation & Conventions + - Standard symbols (G, V, E, w, etc.) 
+ - Proof structure: Construction → Correctness (⟹/⟸) → Solution Extraction + - Overhead notation + +§2 NP-Hardness Chain Extensions + §2.1 SubsetSum → Partition (#973) + §2.2 MinimumVertexCover → HamiltonianCircuit (#198) + §2.3 VertexCover → HamiltonianPath (#892) + +§3 Graph Reductions + §3.1 MaxCut → OptimalLinearArrangement (#890) + §3.2 OptimalLinearArrangement → RootedTreeArrangement (#888) + +§4 Set & Domination Reductions + §4.1 DominatingSet → MinMaxMulticenter (#379) + §4.2 DominatingSet → MinSumMulticenter (#380) + §4.3 ExactCoverBy3Sets → AcyclicPartition (#822) + +§5 Feedback Set Reductions + §5.1 VertexCover → PartialFeedbackEdgeSet (#894) +``` + +## Per-Reduction Entry Format + +Each reduction follows the `reductions.typ` convention: + +1. **Theorem statement** — 1-3 sentence intuition, citation (e.g., `[GJ79, ND50]`) +2. **Proof** with three mandatory subsections: + - _Construction._ Numbered algorithm steps, all symbols defined before use + - _Correctness._ Bidirectional: (⟹) forward direction, (⟸) backward direction + - _Solution extraction._ How to map target solution back to source +3. **Overhead table** — target size fields as functions of source size fields +4. **Worked example** — concrete small instance, full construction steps, solution verification + +Mathematical notation uses Typst math mode: `$V$`, `$E$`, `$arrow.r.double$`, `$overline(x)$`, etc. + +## Research Plan for Blocked Issues + +For each blocked reduction: + +1. **Search** for the original reference via the citation in Garey & Johnson +2. **Reconstruct** the correct algorithm from the paper or from first principles +3. **Verify** correctness with a hand-worked example in the note +4. 
**Resolve** the blocker: + - #379/#380: Clarify that the reduction operates on the decision variant; note model alignment needed + - #888: Research Gavril 1977a gadget construction for forcing path-tree solutions + - #822: Research the acyclic partition reduction from G&J or construct from first principles + - #892: Chain through #198 (VC→HC→HP); detail the HC→HP modification + - #894: Search for Yannakakis 1978b or reconstruct the gadget + - #890: Research the Garey-Johnson-Stockmeyer 1976 construction + +If a reference is unavailable, construct a novel reduction and clearly mark it as such. + +## Typst Setup + +- Standalone document (not importing from `reductions.typ`) +- Uses: `ctheorems` for theorem/proof environments, `cetz` if diagrams needed +- Page: A4, New Computer Modern 10pt +- Theorem numbering: `Theorem 2.1`, `Theorem 2.2`, etc. +- No dependency on `examples.json` or `reduction_graph.json` (standalone) +- Compile command: `typst compile docs/paper/proposed-reductions.typ docs/paper/proposed-reductions.pdf` + +## Quality Criteria + +Each reduction must satisfy: +1. **Math equations correct** — all formulas verified against source paper or hand-derivation +2. **Provable correctness** — both directions of the proof are rigorous, no hand-waving +3. **Algorithm clear** — detailed enough that a developer can implement `reduce_to()` and `extract_solution()` directly from the proof +4. 
**From math to code verifiable** — overhead expressions match the construction, worked example can be used as a test case + +## PR Structure + +- Branch: `feat/proposed-reductions-note` +- Files: + - `docs/paper/proposed-reductions.typ` (new) + - `docs/paper/proposed-reductions.pdf` (new, compiled) +- No code changes — this is a documentation-only PR +- Issue updates done via `gh issue edit` (not in the PR diff) From aa5735372b0318515b98998c3a7919643cc7da1e Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 05:26:21 +0000 Subject: [PATCH 02/23] feat: add verify-reduction skill for mathematical verification of reductions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New skill: /verify-reduction End-to-end pipeline that takes a reduction rule issue and produces: 1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples) 2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5) 3. Lean 4 lemmas (non-trivial structural proofs required) Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR. Strict quality gates (zero tolerance): - No "trivial" category — every reduction ≥5000 checks - 7 mandatory Python sections including NO (infeasible) example - Non-trivial Lean required (rfl/omega tautologies rejected) - Zero hand-waving in Typst ("clearly", "obviously" → rejected) - Mandatory gap analysis: every proof claim must have a test - Self-review checklist with 20+ items across 4 categories Developed and validated through PR #975 (800K+ checks, 3 bugs caught) and tested on issues #868 (caught wrong example) and #841 (35K checks). 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/CLAUDE.md | 1 + .claude/skills/verify-reduction/SKILL.md | 460 +++++++++++++++++++++++ 2 files changed, 461 insertions(+) create mode 100644 .claude/skills/verify-reduction/SKILL.md diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 27ce2026c..659825237 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -26,6 +26,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue. - [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold. - [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers. +- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), and Lean lemmas (non-trivial required). Iterates until all checks pass. Creates worktree + PR. - [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals. ## Codex Compatibility diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md new file mode 100644 index 000000000..9adff5f62 --- /dev/null +++ b/.claude/skills/verify-reduction/SKILL.md @@ -0,0 +1,460 @@ +# Verify Reduction + +End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational and formal verification, iterating until all checks pass. 
Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. + +Outputs: Typst proof entry, Python verification script, Lean lemmas — all at PR #975 quality level. + +**This skill is STRICT. Cutting corners produces reductions that get rejected during review.** + +## Invocation + +``` +/verify-reduction 868 # from a GitHub issue number +/verify-reduction SubsetSum Partition # from source/target names +``` + +## Prerequisites + +- `sympy` and `networkx` installed (`pip install sympy networkx`) +- Both source and target models must exist in the codebase (`pred show `) +- For Lean: `elan` installed with Lean 4 toolchain + +## Process + +```dot +digraph verify { + rankdir=TB; + "Parse input" [shape=box]; + "Create worktree" [shape=box]; + "Read issue" [shape=box]; + "Study models" [shape=box]; + "Write Typst proof" [shape=box]; + "Compile PDF" [shape=box]; + "Write Python script" [shape=box]; + "Run script" [shape=box]; + "All pass?" [shape=diamond]; + "Fix proof + script" [shape=box]; + "Enough checks?" [shape=diamond]; + "Enhance script" [shape=box]; + "Add Lean lemmas" [shape=box]; + "Lean builds?" [shape=diamond]; + "Fix Lean" [shape=box]; + "Self-review (HARSH)" [shape=box]; + "Create PR" [shape=box]; + "Report" [shape=doublecircle]; + + "Parse input" -> "Create worktree"; + "Create worktree" -> "Read issue"; + "Read issue" -> "Study models"; + "Study models" -> "Write Typst proof"; + "Write Typst proof" -> "Compile PDF"; + "Compile PDF" -> "Write Python script"; + "Write Python script" -> "Run script"; + "Run script" -> "All pass?"; + "All pass?" -> "Enough checks?" [label="yes"]; + "All pass?" -> "Fix proof + script" [label="no"]; + "Fix proof + script" -> "Run script"; + "Enough checks?" -> "Add Lean lemmas" [label="yes"]; + "Enough checks?" -> "Enhance script" [label="no"]; + "Enhance script" -> "Run script"; + "Add Lean lemmas" -> "Lean builds?"; + "Lean builds?" -> "Self-review (HARSH)" [label="yes"]; + "Lean builds?" 
-> "Fix Lean" [label="no"];
+  "Fix Lean" -> "Lean builds?";
+  "Self-review (HARSH)" -> "Create PR";
+  "Create PR" -> "Report";
+}
+```
+
+---
+
+## Step 0: Parse Input and Create Worktree
+
+### 0a. Parse input
+
+```bash
+REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner)
+ISSUE=<issue-number>
+ISSUE_JSON=$(gh issue view "$ISSUE" --json title,body,number)
+```
+
+### 0b. Create worktree
+
+```bash
+REPO_ROOT=$(pwd)
+BRANCH_JSON=$(python3 scripts/pipeline_worktree.py prepare-issue-branch \
+  --issue "$ISSUE" --slug "verify-<source>-<target>" --base main --format json)
+BRANCH=$(printf '%s\n' "$BRANCH_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['branch'])")
+WORKTREE_JSON=$(python3 scripts/pipeline_worktree.py enter --name "verify-$ISSUE" --format json)
+WORKTREE_DIR=$(printf '%s\n' "$WORKTREE_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['worktree_dir'])")
+cd "$WORKTREE_DIR" && git checkout "$BRANCH"
+```
+
+If already inside a worktree, skip creation and use the current branch.
+
+## Step 1: Read Issue and Study Models
+
+```bash
+gh issue view "$ISSUE" --json title,body
+pred show <source-model> --json
+pred show <target-model> --json
+```
+
+Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference.
+
+If the issue is incomplete, use WebSearch to find the original reference.
+
+## Step 2: Write Typst Proof
+
+Append to `docs/paper/proposed-reductions.typ` (or create a standalone `proposed-reductions-<issue>.typ` if the main file is on another branch).
+
+### MANDATORY structure
+
+```typst
+== Source $arrow.r$ Target
+
+#theorem[...]
+
+#proof[
+  _Construction._ ...
+  _Correctness._
+  ($arrow.r.double$) ...
+  ($arrow.l.double$) ...
+  _Solution extraction._ ...
+]
+
+*Overhead.* (table)
+
+*Feasible example.* (YES instance, fully worked)
+
+*Infeasible example.* (NO instance, fully worked — show WHY no solution exists)
+```
+
+### HARD requirements
+
+- **Construction**: numbered steps, every symbol defined before first use
+- **Correctness**: genuinely independent ⟹ and ⟸ — NOT "the converse is similar"
+- **No hand-waving**: ZERO instances of "clearly", "obviously", "it is easy to see", "straightforward"
+- **No scratch work**: ZERO instances of "Wait", "Hmm", "Actually", "Let me try"
+- **TWO examples minimum**: one YES instance (satisfiable/feasible) and one NO instance (unsatisfiable/infeasible). Both fully worked with numerical verification.
+- **Example must be non-trivial**: the example must have ≥ 3 variables/vertices. A 1-variable or 2-vertex example is too degenerate to catch bugs.
+
+Compile:
+```bash
+python3 -c "import typst; typst.compile('<file>.typ', output='<file>.pdf', root='.')"
+```
+
+## Step 3: Write Python Verification Script
+
+Create `docs/paper/verify-reductions/verify_<source>_<target>.py`.
+
+### ALL 7 sections are MANDATORY
+
+```python
+#!/usr/bin/env python3
+"""§X.Y Source → Target (#NNN): exhaustive + structural verification."""
+import itertools, sys
+from sympy import symbols, simplify  # Section 1 is NOT optional
+
+passed = failed = 0
+
+def check(condition, msg=""):
+    global passed, failed
+    if condition: passed += 1
+    else: failed += 1; print(f"  FAIL: {msg}")
+
+def main():
+    # === Section 1: Symbolic checks (sympy) — MANDATORY ===
+    # At minimum: verify overhead formula symbolically for general n.
+    # For algebraic reductions: verify key identities.
+    # "The overhead is trivial" is NOT an excuse to skip this section.
+
+    # === Section 2: Exhaustive forward + backward — MANDATORY ===
+    # n ≤ 5 MINIMUM for all reduction types.
+    # n ≤ 6 for identity/algebraic reductions.
+    # Test ALL instances (or sample ≥ 300 per (n,m) if exhaustive is infeasible).
+ + # === Section 3: Solution extraction — MANDATORY === + # For EVERY feasible instance: extract source solution from target, + # verify it satisfies the source problem. + # This is the most commonly skipped section. DO NOT SKIP IT. + + # === Section 4: Overhead formula — MANDATORY === + # Build the actual target instance, measure its size fields, + # compare against the overhead formula. + + # === Section 5: Structural properties — MANDATORY === + # Even for "trivial" reductions, verify at least: + # - Target instance is well-formed (valid graph, valid formula, etc.) + # - No degenerate cases (empty subsets, isolated vertices, etc.) + # For gadget reductions: girth, connectivity, widget structure. + + # === Section 6: YES example from Typst — MANDATORY === + # Reproduce the exact numbers from the Typst proof's feasible example. + + # === Section 7: NO example from Typst — MANDATORY === + # Reproduce the exact numbers from the Typst proof's infeasible example. + # Verify that both source and target are infeasible. + + print(f"Source → Target: {passed} passed, {failed} failed") + return 1 if failed else 0 + +if __name__ == "__main__": + sys.exit(main()) +``` + +### Minimum check counts — STRICTLY ENFORCED + +| Type | Minimum checks | Minimum n | Strategy | +|------|---------------|-----------|----------| +| Identity (same graph, different objective) | 10,000 | n ≤ 6 | Exhaustive ALL graphs | +| Algebraic (padding, complement, De Morgan) | 10,000 | n ≤ 5 | Symbolic + exhaustive | +| Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | Construction + formula + structural | +| Composition (A→B→C) | 10,000 | n ≤ 5 | Exhaustive per step | + +**There is no "trivial" category.** Every reduction gets at least 5,000 checks and n ≤ 5 exhaustive testing. + +## Step 4: Run and Iterate (THE CRITICAL LOOP) + +```bash +python3 docs/paper/verify-reductions/verify__.py +``` + +### Iteration 1: First run + +Run the script. Fix any failures. Re-run. 
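As a calibration point for the exhaustive strategy, the Section 2 loop can be sketched for a toy reduction — IndependentSet → Clique via graph complement, chosen purely for illustration (the helper names below are placeholders, not APIs from this repo). Note how exhaustive enumeration over all graphs on n ≤ 5 vertices already clears the 5,000-check floor:

```python
# Sketch: a Section 2 exhaustive loop for a toy reduction
# (IndependentSet -> Clique via graph complement).
# Invariant under test: S is independent in G iff S is a clique in complement(G).
import itertools

passed = failed = 0

def check(condition, msg=""):
    global passed, failed
    if condition:
        passed += 1
    else:
        failed += 1
        print(f"  FAIL: {msg}")

def complement(n, edges):
    """Target instance: same vertex set, complemented edge set."""
    return set(itertools.combinations(range(n), 2)) - edges

def is_independent(edges, subset):
    # No pair inside the subset is an edge.
    return not any(e in edges for e in itertools.combinations(subset, 2))

def is_clique(edges, subset):
    # Every pair inside the subset is an edge.
    return all(e in edges for e in itertools.combinations(subset, 2))

# Exhaustive: ALL graphs on n <= 5 vertices, ALL candidate subsets.
for n in range(1, 6):
    pairs = list(itertools.combinations(range(n), 2))
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        edges = {p for p, b in zip(pairs, bits) if b}
        target = complement(n, edges)
        for k in range(n + 1):
            for subset in itertools.combinations(range(n), k):
                check(is_independent(edges, subset) == is_clique(target, subset),
                      f"n={n} edges={sorted(edges)} S={subset}")

print(f"toy reduction: {passed} passed, {failed} failed")  # 33866 passed, 0 failed
```

Even this deliberately simple identity-style reduction yields 33,866 checks at n ≤ 5, which is why the minimum-count table has no exemptions.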
+ +### Iteration 2: Check count audit — STRICT + +Print this table and fill it in honestly: + +``` +CHECK COUNT AUDIT: + Total checks: ___ (minimum: 5,000) + Forward direction: ___ instances tested (minimum: all n ≤ 5) + Backward direction: ___ instances tested (minimum: all n ≤ 5) + Solution extraction: ___ feasible instances tested + Overhead formula: ___ instances compared + Symbolic (sympy): ___ identities verified + YES example: verified? [yes/no] + NO example: verified? [yes/no] + Structural properties: ___ checks +``` + +If ANY line is below minimum, enhance the script and re-run. Do NOT proceed. + +### Iteration 3: Gap analysis — MANDATORY + +List EVERY claim in the Typst proof. For each, state whether it's tested: + +``` +CLAIM TESTED BY +"Universe has 2n elements" Section 4: overhead formula ✓ +"Complementarity forces consistency" Section 3: extraction ✓ +"Clause subset is non-monochromatic" Section 2: forward direction ✓ +"No clause is all-true or all-false" Section 2: backward direction ✓ +... +``` + +If any claim has no test, add one. If it's untestable, document WHY. + +## Step 5: Add Lean Lemmas — STRICT REQUIREMENTS + +### HARD requirement: at least one NON-TRIVIAL lemma + +`n + m = n + m := rfl` does NOT count. A "non-trivial" lemma must satisfy at least one of: + +1. **Uses a Mathlib tactic beyond `rfl`/`omega`**: e.g., `simp`, `Finset.sum_union`, lattice operations +2. **States a structural property**: e.g., "complementarity subsets partition the universe into pairs" +3. **Formalizes the key invariant**: e.g., "NAE-satisfying ↔ 2-colorable hypergraph" + +If Mathlib genuinely lacks the infrastructure (no Hamiltonian paths, no DAG quotients), write the strongest lemma you CAN prove and document what WOULD be proved with better Mathlib support. 
+ +### Examples of acceptable vs unacceptable Lean lemmas + +**Unacceptable (trivial arithmetic):** +```lean +theorem overhead (n m : ℕ) : n + m = n + m := rfl -- proves nothing +theorem universe (n : ℕ) : 2 * n = 2 * n := rfl -- proves nothing +``` + +**Acceptable (structural):** +```lean +-- Uses Mathlib's lattice theory to prove a graph-structural fact +theorem complement_partition (G : SimpleGraph V) : G ⊔ Gᶜ = ⊤ := sup_compl_eq_top + +-- Formalizes the key definition used in the proof +def IsNAESatisfying (assignment : Fin n → Bool) (clause : Finset (Fin n × Bool)) : Prop := + ¬(∀ l ∈ clause, ...) ∧ ¬(∀ l ∈ clause, ...) + +-- Proves the overhead formula requires actual computation +theorem overhead_nontrivial (n m : ℕ) (h : m > 0) : + 2 * n + (n + m) > 2 * n := by omega -- at least uses a hypothesis +``` + +### Build and verify + +```bash +cd docs/paper/verify-reductions/lean +export PATH="$HOME/.elan/bin:$PATH" +lake build +``` + +## Step 6: Self-Review — THE HARSHEST STEP + +Before declaring verified, run through this checklist. **Every item must be YES.** If any is NO, go back and fix it. 
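Several checklist items — the bans on "clearly", "obviously", and scratch-work phrases — can be pre-screened mechanically before the manual pass. A minimal sketch; the banned list and the sample string are illustrative stand-ins, and in practice `text` would be loaded from the real `.typ` file:

```python
# Sketch: mechanical pre-screen for hand-waving / scratch-work phrases.
# Load `text` from the Typst proof file instead of the inline sample.
import re

BANNED = ["clearly", "obviously", "it is easy to see", "straightforward",
          "wait", "hmm", "actually", "let me try"]

def hand_waving_hits(text):
    """Return (phrase, offset) for every banned-phrase occurrence, in text order."""
    hits = []
    for phrase in BANNED:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((phrase, m.start()))
    return sorted(hits, key=lambda h: h[1])

sample = "Clearly the bound holds; the rest is straightforward."
print(hand_waving_hits(sample))  # -> [('clearly', 0), ('straightforward', 37)]
```

A clean run (empty list) does not replace the manual checklist pass — it only rules out the phrases that can be caught lexically.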
+ +### Typst proof + +- [ ] Compiles without errors +- [ ] Has Construction with numbered steps +- [ ] Has Correctness with independent ⟹ and ⟸ paragraphs +- [ ] Has Solution extraction +- [ ] Has Overhead table with formula +- [ ] Has YES example (feasible, ≥ 3 variables/vertices, fully worked) +- [ ] Has NO example (infeasible, fully worked with explanation of WHY infeasible) +- [ ] Zero instances of "clearly", "obviously", "it is easy to see" +- [ ] Zero instances of "Wait", "Hmm", "Actually", scratch work +- [ ] Every symbol defined before first use + +### Python script + +- [ ] 0 failures +- [ ] ≥ 5,000 total checks +- [ ] Section 1 (symbolic) present and non-empty +- [ ] Section 2 (exhaustive) covers n ≤ 5 minimum +- [ ] Section 3 (extraction) tests EVERY feasible instance +- [ ] Section 4 (overhead) compares formula vs actual for all tested instances +- [ ] Section 5 (structural) has at least one non-trivial check +- [ ] Section 6 (YES example) reproduces Typst example numbers exactly +- [ ] Section 7 (NO example) reproduces Typst infeasible example exactly +- [ ] Gap analysis performed — every Typst claim has a corresponding test + +### Lean + +- [ ] Builds without errors (warnings OK) +- [ ] At least one non-trivial lemma (not just `rfl` or `omega` on a tautology) +- [ ] Every `sorry` has a comment explaining WHY + +### Cross-consistency + +- [ ] The Python script's `reduce()` function implements EXACTLY the Typst construction +- [ ] The Python script's `extract_solution()` implements EXACTLY the Typst extraction +- [ ] The overhead formula in Python matches the Typst overhead table +- [ ] The examples in Python match the Typst examples (same numbers, same instances) + +## Step 7: Report + +``` +=== Verification Report: Source → Target (#NNN) === + +Typst proof: §X.Y + - Construction: ✓ (N steps) + - Correctness: ✓ (⟹ + ⟸) + - Extraction: ✓ + - Overhead: ✓ + - YES example: ✓ (N vars/vertices) + - NO example: ✓ (N vars/vertices, reason: ...) 
+
+Python: verify_<source>_<target>.py
+  - Checks: N passed, 0 failed
+  - Sections: 1(sympy) 2(exhaustive) 3(extraction) 4(overhead) 5(structural) 6(YES) 7(NO)
+  - Forward: exhaustive n ≤ K
+  - Backward: exhaustive n ≤ K
+  - Gap analysis: all claims covered
+
+Lean: ReductionProofs/Basic.lean (or <Name>.lean)
+  - Non-trivial lemmas: N
+  - Trivial lemmas: M
+  - Sorry: J (with justification)
+
+Bugs found:
+Iterations: N rounds
+
+Verdict: VERIFIED / OPEN (with reason)
+```
+
+## Step 8: Commit, Create PR, Clean Up
+
+### 8a. Commit
+
+```bash
+git add docs/paper/<file>.typ docs/paper/verify-reductions/verify_*.py \
+    docs/paper/verify-reductions/lean/ReductionProofs/*.lean
+git add -f docs/paper/<file>.pdf
+git commit -m "docs: /verify-reduction #<issue> VERIFIED
+
+Typst: Construction + Correctness + Extraction + Overhead + YES/NO examples
+Python: N checks, 0 failures (exhaustive n ≤ K, 7 sections)
+Lean: M non-trivial lemmas
+
+Co-Authored-By: Claude Opus 4.6 (1M context) "
+```
+
+### 8b. Push and create PR
+
+```bash
+git push -u origin "$BRANCH"
+gh pr create --title "docs: verify reduction #<issue>" --body "..."
+```
+
+### 8c. Clean up worktree
+
+```bash
+cd "$REPO_ROOT"
+python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR"
+```
+
+### 8d. Comment on issue
+
+```bash
+gh issue comment "$ISSUE" --body "verify-reduction report: VERIFIED (PR #<pr>)..."
+``` + +## Quality Gates — NON-NEGOTIABLE + +A reduction is **VERIFIED** when ALL of these hold: + +- [ ] Typst compiles, has all mandatory sections including YES and NO examples +- [ ] Zero hand-waving language +- [ ] Python has 0 failures AND ≥ 5,000 checks +- [ ] All 7 Python sections present and non-empty +- [ ] Exhaustive n ≤ 5 minimum +- [ ] Solution extraction verified for all feasible instances +- [ ] Overhead formula matches actual construction +- [ ] Both Typst examples reproduced by script +- [ ] Gap analysis shows all Typst claims tested +- [ ] At least 1 non-trivial Lean lemma +- [ ] Cross-consistency between Typst and Python verified + +**If even ONE gate fails, the reduction is NOT verified. Go back and fix it.** + +## Common Mistakes — ZERO TOLERANCE + +| Mistake | Consequence | +|---------|-------------| +| Lean lemma is just `rfl` or `omega` on a tautology | Rejected — add structural lemma | +| No symbolic checks (Section 1 empty) | Rejected — add sympy verification | +| Only YES example, no NO example | Rejected — add infeasible instance | +| n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | +| "Passed on first run" without gap analysis | Rejected — perform gap analysis | +| Example has < 3 variables | Rejected — too degenerate | +| Script has < 5,000 checks | Rejected — enhance exhaustive testing | +| Extraction not tested | Rejected — add Section 3 | +| Typst proof says "clearly" | Rejected — rewrite without hand-waving | + +## Integration + +- **After `add-rule`**: invoke `/verify-reduction` before creating PR +- **After `write-rule-in-paper`**: invoke to verify paper entry +- **During `review-structural`**: check verification script exists and passes +- **Before `issue-to-pr --execute`**: pre-validate the algorithm + +## Reference: PR #975 Quality Level + +Target quality defined by PR #975: +- 800,000+ total checks, 0 unexpected failures +- 3 bugs caught through iteration loop +- Every script has forward + backward + 
extraction + overhead + example +- Non-trivial Lean: `G ⊔ Gᶜ = ⊤` via `sup_compl_eq_top` +- Failures marked OPEN honestly with diagnosis From 83e39318d06456163edfc3434c4c579ed553e753 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 05:34:17 +0000 Subject: [PATCH 03/23] style: add YAML frontmatter + professional tone to verify-reduction skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Added frontmatter (name, description) matching other skills' convention - Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP", "NON-NEGOTIABLE") to professional but firm language - All quality gates unchanged — same strictness, better presentation Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 9adff5f62..beecc5d46 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -1,11 +1,14 @@ +--- +name: verify-reduction +description: End-to-end verification of a reduction rule — generates Typst proof, Python verification script (≥5000 checks), and Lean lemmas, iterating until all checks pass. Creates worktree + PR. +--- + # Verify Reduction End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational and formal verification, iterating until all checks pass. Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. Outputs: Typst proof entry, Python verification script, Lean lemmas — all at PR #975 quality level. -**This skill is STRICT. 
Cutting corners produces reductions that get rejected during review.** - ## Invocation ``` @@ -212,9 +215,9 @@ if __name__ == "__main__": | Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | Construction + formula + structural | | Composition (A→B→C) | 10,000 | n ≤ 5 | Exhaustive per step | -**There is no "trivial" category.** Every reduction gets at least 5,000 checks and n ≤ 5 exhaustive testing. +Every reduction gets at least 5,000 checks and n ≤ 5 exhaustive testing regardless of perceived simplicity. -## Step 4: Run and Iterate (THE CRITICAL LOOP) +## Step 4: Run and Iterate ```bash python3 docs/paper/verify-reductions/verify__.py @@ -258,7 +261,7 @@ CLAIM TESTED BY If any claim has no test, add one. If it's untestable, document WHY. -## Step 5: Add Lean Lemmas — STRICT REQUIREMENTS +## Step 5: Add Lean Lemmas ### HARD requirement: at least one NON-TRIVIAL lemma @@ -300,7 +303,7 @@ export PATH="$HOME/.elan/bin:$PATH" lake build ``` -## Step 6: Self-Review — THE HARSHEST STEP +## Step 6: Self-Review Before declaring verified, run through this checklist. **Every item must be YES.** If any is NO, go back and fix it. @@ -411,7 +414,7 @@ python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" gh issue comment "$ISSUE" --body "verify-reduction report: VERIFIED (PR #)..." ``` -## Quality Gates — NON-NEGOTIABLE +## Quality Gates A reduction is **VERIFIED** when ALL of these hold: @@ -427,9 +430,9 @@ A reduction is **VERIFIED** when ALL of these hold: - [ ] At least 1 non-trivial Lean lemma - [ ] Cross-consistency between Typst and Python verified -**If even ONE gate fails, the reduction is NOT verified. Go back and fix it.** +If any gate fails, go back and fix it before declaring the reduction verified. 
-## Common Mistakes — ZERO TOLERANCE +## Common Mistakes | Mistake | Consequence | |---------|-------------| From d09823d860b117f4248b5b1bdc6f13ecc74ac033 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 06:04:36 +0000 Subject: [PATCH 04/23] feat: adversarial multi-agent verification in verify-reduction skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces Lean-required gates with adversarial second agent: - Step 5: Adversary agent independently implements reduce() and extract_solution() from theorem statement only (not constructor's script) - Step 5c: Cross-comparison of both implementations on 1000 instances - Lean downgraded from required to optional - hypothesis property-based testing for n up to 50 - Quality gates: 2 independent scripts ≥5000 checks each + cross-comparison Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md): - Same agent writing proof + test is the #1 risk for AI verification - Two independent implementations agreeing > one + trivial Lean - Lean caught 0 bugs in PR #975; Python caught all 4 Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 302 +++++++++++++----- ...6-04-01-adversarial-verification-design.md | 145 +++++++++ 2 files changed, 372 insertions(+), 75 deletions(-) create mode 100644 docs/superpowers/specs/2026-04-01-adversarial-verification-design.md diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index beecc5d46..8981aa8d4 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -1,13 +1,13 @@ --- name: verify-reduction -description: End-to-end verification of a reduction rule — generates Typst proof, Python verification script (≥5000 checks), and Lean lemmas, iterating until all checks pass. Creates worktree + PR. 
+description: End-to-end verification of a reduction rule — generates Typst proof, constructor Python script (>=5000 checks), and adversary Python script (>=5000 independent checks), iterating until all checks pass. Creates worktree + PR. --- # Verify Reduction -End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational and formal verification, iterating until all checks pass. Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. +End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational verification from two independent implementations, iterating until all checks pass. Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. -Outputs: Typst proof entry, Python verification script, Lean lemmas — all at PR #975 quality level. +Outputs: Typst proof entry, constructor Python verification script, adversary Python verification script — all at PR #975 quality level. ## Invocation @@ -18,9 +18,9 @@ Outputs: Typst proof entry, Python verification script, Lean lemmas — all at P ## Prerequisites -- `sympy` and `networkx` installed (`pip install sympy networkx`) +- `sympy`, `networkx`, and `hypothesis` installed (`pip install sympy networkx hypothesis`) - Both source and target models must exist in the codebase (`pred show `) -- For Lean: `elan` installed with Lean 4 toolchain +- Optional: `elan` with Lean 4 toolchain (for formal lemmas — not required) ## Process @@ -33,16 +33,19 @@ digraph verify { "Study models" [shape=box]; "Write Typst proof" [shape=box]; "Compile PDF" [shape=box]; - "Write Python script" [shape=box]; - "Run script" [shape=box]; + "Write constructor script" [shape=box]; + "Run constructor" [shape=box]; "All pass?" [shape=diamond]; "Fix proof + script" [shape=box]; "Enough checks?" 
[shape=diamond]; "Enhance script" [shape=box]; - "Add Lean lemmas" [shape=box]; - "Lean builds?" [shape=diamond]; - "Fix Lean" [shape=box]; - "Self-review (HARSH)" [shape=box]; + "Dispatch adversary" [shape=box]; + "Run adversary" [shape=box]; + "Adversary pass?" [shape=diamond]; + "Cross-compare" [shape=box]; + "Verdict table" [shape=diamond]; + "Investigate discrepancy" [shape=box]; + "Self-review" [shape=box]; "Create PR" [shape=box]; "Report" [shape=doublecircle]; @@ -51,20 +54,23 @@ digraph verify { "Read issue" -> "Study models"; "Study models" -> "Write Typst proof"; "Write Typst proof" -> "Compile PDF"; - "Compile PDF" -> "Write Python script"; - "Write Python script" -> "Run script"; - "Run script" -> "All pass?"; + "Compile PDF" -> "Write constructor script"; + "Write constructor script" -> "Run constructor"; + "Run constructor" -> "All pass?"; "All pass?" -> "Enough checks?" [label="yes"]; "All pass?" -> "Fix proof + script" [label="no"]; - "Fix proof + script" -> "Run script"; - "Enough checks?" -> "Add Lean lemmas" [label="yes"]; + "Fix proof + script" -> "Run constructor"; + "Enough checks?" -> "Dispatch adversary" [label="yes"]; "Enough checks?" -> "Enhance script" [label="no"]; - "Enhance script" -> "Run script"; - "Add Lean lemmas" -> "Lean builds?"; - "Lean builds?" -> "Self-review (HARSH)" [label="yes"]; - "Lean builds?" -> "Fix Lean" [label="no"]; - "Fix Lean" -> "Lean builds?"; - "Self-review (HARSH)" -> "Create PR"; + "Enhance script" -> "Run constructor"; + "Dispatch adversary" -> "Run adversary"; + "Run adversary" -> "Adversary pass?"; + "Adversary pass?" 
-> "Cross-compare" [label="yes or no"]; + "Cross-compare" -> "Verdict table"; + "Verdict table" -> "Self-review" [label="verified"]; + "Verdict table" -> "Investigate discrepancy" [label="discrepancy"]; + "Investigate discrepancy" -> "Write constructor script"; + "Self-review" -> "Create PR"; "Create PR" -> "Report"; } ``` @@ -147,7 +153,7 @@ Compile: python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')" ``` -## Step 3: Write Python Verification Script +## Step 3: Write Constructor Python Verification Script Create `docs/paper/verify-reductions/verify__.py`. @@ -261,51 +267,164 @@ CLAIM TESTED BY If any claim has no test, add one. If it's untestable, document WHY. -## Step 5: Add Lean Lemmas +## Step 5: Adversary Verification -### HARD requirement: at least one NON-TRIVIAL lemma +The adversary step provides independent verification by a second agent that implements the reduction from scratch, using only the Typst proof as specification. This catches bugs that the constructor's own tests cannot find (confirmation bias, shared implementation errors). -`n + m = n + m := rfl` does NOT count. A "non-trivial" lemma must satisfy at least one of: +### 5a. Dispatch adversary agent -1. **Uses a Mathlib tactic beyond `rfl`/`omega`**: e.g., `simp`, `Finset.sum_union`, lattice operations -2. **States a structural property**: e.g., "complementarity subsets partition the universe into pairs" -3. **Formalizes the key invariant**: e.g., "NAE-satisfying ↔ 2-colorable hypergraph" +Launch a subagent with the following prompt template. The adversary must not see the constructor's Python script — it reads only the Typst proof. -If Mathlib genuinely lacks the infrastructure (no Hamiltonian paths, no DAG quotients), write the strongest lemma you CAN prove and document what WOULD be proved with better Mathlib support. +**Adversary prompt:** -### Examples of acceptable vs unacceptable Lean lemmas +```` +You are an adversary verifier for a mathematical reduction proof. 
-**Unacceptable (trivial arithmetic):**
-```lean
-theorem overhead (n m : ℕ) : n + m = n + m := rfl -- proves nothing
-theorem universe (n : ℕ) : 2 * n = 2 * n := rfl -- proves nothing
-```
+Your goal: independently implement and test the reduction described in the Typst
+proof below, trying to find bugs. You have no access to the constructor's
+implementation. Write your own from scratch based solely on the mathematical
+specification.
+
+## Input
+
+Typst proof file: docs/paper/proposed-reductions.typ (section on <reduction>)
+Issue: #<issue-number>
+
+## Your task
+
+1. Read the Typst proof carefully. Extract:
+   - The construction algorithm (how to build the target instance)
+   - The correctness argument (forward and backward directions)
+   - The solution extraction procedure
+   - The overhead formula
+   - The YES and NO examples

-**Acceptable (structural):**
-```lean
--- Uses Mathlib's lattice theory to prove a graph-structural fact
-theorem complement_partition (G : SimpleGraph V) : G ⊔ Gᶜ = ⊤ := sup_compl_eq_top
+2. Create `docs/paper/verify-reductions/adversary_<source>_<target>.py` with:
+   - Your own `reduce()` function implementing the construction from scratch
+   - Your own `extract_solution()` function implementing solution extraction
+   - Your own `is_feasible_source()` and `is_feasible_target()` validators
+   - Exhaustive testing for n ≤ 5 (forward + backward + extraction)
+   - Overhead formula verification
+   - Reproduction of both Typst examples (YES and NO)
+   - Property-based testing using `hypothesis` (at least 2 strategies)
+   - Minimum 5,000 total checks

--- Formalizes the key definition used in the proof
-def IsNAESatisfying (assignment : Fin n → Bool) (clause : Finset (Fin n × Bool)) : Prop :=
-  ¬(∀ l ∈ clause, ...) ∧ ¬(∀ l ∈ clause, ...)
+3. Use `hypothesis` for property-based testing.
Example strategies:
+   ```python
+   from hypothesis import given, settings
+   from hypothesis import strategies as st

--- Proves the overhead formula requires actual computation
-theorem overhead_nontrivial (n m : ℕ) (h : m > 0) :
-    2 * n + (n + m) > 2 * n := by omega -- at least uses a hypothesis
+   @given(st.lists(st.integers(0, 1), min_size=3, max_size=8))
+   @settings(max_examples=500)
+   def test_roundtrip(assignment):
+       # Build source from assignment, reduce, check equivalence
+       ...
+   ```
+
+4. Run the script. Report pass/fail counts and any bugs found.
+
+5. Do NOT read or import from `verify_<source>_<target>.py`. Your implementation
+   must be independent.
+
+## Output format
+
+Print at the end:
+```
+ADVERSARY: <reduction>: N passed, M failed
+BUGS FOUND: <list or none>
+```
+````

-### Build and verify
+### 5b. Run adversary script

```bash
-cd docs/paper/verify-reductions/lean
-export PATH="$HOME/.elan/bin:$PATH"
-lake build
+python3 docs/paper/verify-reductions/adversary_<source>_<target>.py
```

+### 5c. Cross-comparison
+
+After both scripts have run, compare their `reduce()` outputs on a shared set of instances. Create and run this comparison inline:
+
+```python
+#!/usr/bin/env python3
+"""Cross-compare constructor and adversary implementations."""
+import sys
+sys.path.insert(0, "docs/paper/verify-reductions")
+
+from verify_<source>_<target> import reduce as constructor_reduce
+from adversary_<source>_<target> import reduce as adversary_reduce
+
+# Also import feasibility checkers from both sides
+from verify_<source>_<target> import is_feasible_source, is_feasible_target
+from adversary_<source>_<target> import (
+    is_feasible_source as adv_is_feasible_source,
+    is_feasible_target as adv_is_feasible_target,
+)
+
+def normalize(target_instance):
+    """Convert target instance to a canonical hashable form for comparison.
+ Adapt this to the specific target problem type.""" + # For graph problems: sorted edge list + vertex count + # For formula problems: sorted clause list + # For set problems: frozenset of frozensets + return tuple(sorted(str(x) for x in target_instance)) + +agree = disagree = 0 +feasibility_mismatch = 0 + +# Generate shared test instances (adapt to source problem type) +import itertools +for n in range(3, 6): + for instance in generate_all_instances(n): # problem-specific generator + c_target = constructor_reduce(instance) + a_target = adversary_reduce(instance) + + # Compare structural equivalence + if normalize(c_target) == normalize(a_target): + agree += 1 + else: + disagree += 1 + print(f" DISAGREE on instance {instance}") + print(f" Constructor: {c_target}") + print(f" Adversary: {a_target}") + + # Compare feasibility verdicts + c_feas = is_feasible_target(c_target) + a_feas = adv_is_feasible_target(a_target) + if c_feas != a_feas: + feasibility_mismatch += 1 + print(f" FEASIBILITY MISMATCH on {instance}: " + f"constructor={c_feas}, adversary={a_feas}") + +print(f"\nCross-comparison: {agree} agree, {disagree} disagree, " + f"{feasibility_mismatch} feasibility mismatches") +if disagree > 0 or feasibility_mismatch > 0: + print("ACTION REQUIRED: investigate discrepancies before proceeding") + sys.exit(1) +``` + +Adapt the `normalize()` function and instance generator to the specific source/target problem types. + +### 5d. Verdict criteria + +Use this table to determine the outcome. All three signals must be considered together: + +| Constructor | Adversary | Cross-comparison | Verdict | Action | +|-------------|-----------|-----------------|---------|--------| +| Pass | Pass | Agree | Verified | Proceed to Step 6 | +| Pass | Pass | Disagree | Suspect | Both produce valid but structurally different targets. Investigate whether both are correct reductions (e.g., isomorphic gadgets) or one has a latent bug. Resolve before proceeding. 
| +| Pass | Fail | Agree | Adversary bug | Review adversary script for implementation errors. If adversary misread the Typst spec, fix spec clarity and re-run adversary. | +| Pass | Fail | Disagree | Suspect | Constructor may have a bug masked by its own tests. Investigate the disagreeing instances manually. | +| Fail | Pass | Agree | Constructor bug | Fix constructor script, re-run from Step 4. | +| Fail | Pass | Disagree | Suspect | Investigate both — the agreeing instances may be coincidental. | +| Fail | Fail | Agree | Proof bug | The reduction itself is likely wrong. Re-examine the Typst proof. Return to Step 2. | +| Fail | Fail | Disagree | Proof bug | Same as above — shared failures with structural disagreement indicate a fundamental problem. | + +When the verdict is "Suspect," the resolution is to manually inspect the disagreeing instances until the root cause is identified, then loop back to the appropriate step. + ## Step 6: Self-Review -Before declaring verified, run through this checklist. **Every item must be YES.** If any is NO, go back and fix it. +Before declaring verified, run through this checklist. Every item must be YES. If any is NO, go back and fix it. ### Typst proof @@ -320,7 +439,7 @@ Before declaring verified, run through this checklist. **Every item must be YES. - [ ] Zero instances of "Wait", "Hmm", "Actually", scratch work - [ ] Every symbol defined before first use -### Python script +### Constructor Python script - [ ] 0 failures - [ ] ≥ 5,000 total checks @@ -333,18 +452,31 @@ Before declaring verified, run through this checklist. **Every item must be YES. 
- [ ] Section 7 (NO example) reproduces Typst infeasible example exactly - [ ] Gap analysis performed — every Typst claim has a corresponding test -### Lean +### Adversary Python script -- [ ] Builds without errors (warnings OK) -- [ ] At least one non-trivial lemma (not just `rfl` or `omega` on a tautology) -- [ ] Every `sorry` has a comment explaining WHY +- [ ] 0 failures +- [ ] ≥ 5,000 total checks +- [ ] Implemented independently (no imports from constructor script) +- [ ] Uses `hypothesis` property-based testing (at least 2 strategies) +- [ ] Exhaustive testing for n ≤ 5 +- [ ] Reproduces both Typst examples ### Cross-consistency -- [ ] The Python script's `reduce()` function implements EXACTLY the Typst construction -- [ ] The Python script's `extract_solution()` implements EXACTLY the Typst extraction -- [ ] The overhead formula in Python matches the Typst overhead table -- [ ] The examples in Python match the Typst examples (same numbers, same instances) +- [ ] Cross-comparison script ran with 0 disagreements +- [ ] Cross-comparison script ran with 0 feasibility mismatches +- [ ] The constructor script's `reduce()` implements the Typst construction +- [ ] The adversary script's `reduce()` implements the Typst construction independently +- [ ] The overhead formula in both scripts matches the Typst overhead table +- [ ] The examples in both scripts match the Typst examples (same numbers, same instances) + +### Lean (optional) + +If Lean lemmas were added: + +- [ ] Builds without errors (warnings OK) +- [ ] At least one non-trivial lemma (not just `rfl` or `omega` on a tautology) +- [ ] Every `sorry` has a comment explaining WHY ## Step 7: Report @@ -359,14 +491,26 @@ Typst proof: §X.Y - YES example: ✓ (N vars/vertices) - NO example: ✓ (N vars/vertices, reason: ...) 
-Python: verify__.py +Constructor: verify__.py - Checks: N passed, 0 failed - Sections: 1(sympy) 2(exhaustive) 3(extraction) 4(overhead) 5(structural) 6(YES) 7(NO) - Forward: exhaustive n ≤ K - Backward: exhaustive n ≤ K - Gap analysis: all claims covered -Lean: ReductionProofs/Basic.lean (or .lean) +Adversary: adversary__.py + - Checks: N passed, 0 failed + - Property-based: M hypothesis examples + - Forward: exhaustive n ≤ K + - Backward: exhaustive n ≤ K + - Bugs found: + +Cross-comparison: + - Instances compared: N + - Structural agreement: N/N + - Feasibility agreement: N/N + +Lean: .lean (optional) - Non-trivial lemmas: N - Trivial lemmas: M - Sorry: J (with justification) @@ -382,14 +526,16 @@ Verdict: VERIFIED / OPEN (with reason) ### 8a. Commit ```bash -git add docs/paper/.typ docs/paper/verify-reductions/verify_*.py \ - docs/paper/verify-reductions/lean/ReductionProofs/*.lean +git add docs/paper/.typ \ + docs/paper/verify-reductions/verify_*.py \ + docs/paper/verify-reductions/adversary_*.py git add -f docs/paper/.pdf git commit -m "docs: /verify-reduction # VERIFIED Typst: Construction + Correctness + Extraction + Overhead + YES/NO examples -Python: N checks, 0 failures (exhaustive n ≤ K, 7 sections) -Lean: M non-trivial lemmas +Constructor: N checks, 0 failures (exhaustive n ≤ K, 7 sections) +Adversary: M checks, 0 failures (independent impl + hypothesis) +Cross-comparison: all instances agree Co-Authored-By: Claude Opus 4.6 (1M context) " ``` @@ -420,15 +566,18 @@ A reduction is **VERIFIED** when ALL of these hold: - [ ] Typst compiles, has all mandatory sections including YES and NO examples - [ ] Zero hand-waving language -- [ ] Python has 0 failures AND ≥ 5,000 checks -- [ ] All 7 Python sections present and non-empty -- [ ] Exhaustive n ≤ 5 minimum +- [ ] Constructor Python has 0 failures and ≥ 5,000 checks +- [ ] All 7 constructor Python sections present and non-empty +- [ ] Adversary Python has 0 failures and ≥ 5,000 checks +- [ ] Adversary 
Python uses `hypothesis` property-based testing +- [ ] Adversary implementation is independent (no shared code with constructor) +- [ ] Exhaustive n ≤ 5 minimum (both scripts) - [ ] Solution extraction verified for all feasible instances - [ ] Overhead formula matches actual construction -- [ ] Both Typst examples reproduced by script +- [ ] Both Typst examples reproduced by both scripts - [ ] Gap analysis shows all Typst claims tested -- [ ] At least 1 non-trivial Lean lemma -- [ ] Cross-consistency between Typst and Python verified +- [ ] Cross-comparison shows 0 disagreements and 0 feasibility mismatches +- [ ] Cross-consistency between Typst, constructor, and adversary verified If any gate fails, go back and fix it before declaring the reduction verified. @@ -436,15 +585,18 @@ If any gate fails, go back and fix it before declaring the reduction verified. | Mistake | Consequence | |---------|-------------| -| Lean lemma is just `rfl` or `omega` on a tautology | Rejected — add structural lemma | +| Adversary imports from constructor script | Rejected — must be independent | +| No property-based testing in adversary | Rejected — add hypothesis strategies | | No symbolic checks (Section 1 empty) | Rejected — add sympy verification | | Only YES example, no NO example | Rejected — add infeasible instance | | n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | | "Passed on first run" without gap analysis | Rejected — perform gap analysis | | Example has < 3 variables | Rejected — too degenerate | -| Script has < 5,000 checks | Rejected — enhance exhaustive testing | +| Either script has < 5,000 checks | Rejected — enhance exhaustive testing | | Extraction not tested | Rejected — add Section 3 | | Typst proof says "clearly" | Rejected — rewrite without hand-waving | +| Cross-comparison skipped | Rejected — run comparison script | +| Disagreements dismissed without investigation | Rejected — resolve every discrepancy | ## Integration @@ -459,5 +611,5 @@ 
Target quality defined by PR #975: - 800,000+ total checks, 0 unexpected failures - 3 bugs caught through iteration loop - Every script has forward + backward + extraction + overhead + example -- Non-trivial Lean: `G ⊔ Gᶜ = ⊤` via `sup_compl_eq_top` +- Two independent implementations agreeing on all test instances - Failures marked OPEN honestly with diagnosis diff --git a/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md b/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md new file mode 100644 index 000000000..c42684f08 --- /dev/null +++ b/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md @@ -0,0 +1,145 @@ +# Design: Adversarial Multi-Agent Reduction Verification + +**Date:** 2026-04-01 +**Goal:** Strengthen `/verify-reduction` with an adversarial second agent that independently verifies the reduction, catching systematic errors that the single-agent approach misses. + +## Problem Statement + +The current verification flow has a fundamental weakness: the same AI agent writes both the mathematical proof AND the verification script. If the agent systematically misunderstands the reduction (e.g., wrong variable mapping, wrong extraction direction), it will write a wrong proof and a wrong script that agree with each other. + +Evidence from PR #975: +- Python caught 4 bugs; Typst proofs caught 0; Lean caught 0 +- All bugs were construction errors or formula mistakes +- No bug was caused by "proof is right but script is wrong" — they were always "both are wrong in the same way" + +The adversarial approach addresses this by having two independent agents verify the same reduction. 
+ +## Architecture + +``` +/verify-reduction + │ + ├── Steps 0-2: Parse input, create worktree, read issue, write Typst proof + │ + ├── Step 3: Constructor Agent + │ └── Writes verify__.py (7 sections, ≥5000 checks) + │ + ├── Step 4: Run constructor script, iterate until 0 failures + │ + ├── Step 4b: Adversary Agent (dispatched as subagent with worktree isolation) + │ ├── Input: theorem statement + construction steps ONLY + │ │ (stripped of constructor's script, internal reasoning, examples) + │ ├── Writes: adversary__.py + │ │ ├── Independent reduce() implementation + │ │ ├── Independent extract_solution() + │ │ ├── Property-based testing (hypothesis, n up to 50) + │ │ ├── Targeted counterexample search + │ │ └── ≥5000 independent checks + │ └── Output: pass/fail + counterexamples + │ + ├── Step 4c: Cross-Comparison + │ ├── constructor.reduce(x) vs adversary.reduce(x) for 1000 instances + │ ├── constructor.check(adversary.reduce(x)) vs adversary.check(constructor.reduce(x)) + │ └── Disagreement → diagnose (real bug vs script bug) + │ + ├── Steps 5-8: Self-review, report, commit, PR + │ + └── Final PR includes BOTH scripts + cross-comparison results +``` + +## Adversary Agent Design + +### What the adversary receives + +A stripped prompt containing ONLY: +- The theorem statement (1-3 sentences) +- The construction steps (numbered) +- The extraction procedure +- The overhead formula + +The adversary does NOT receive: +- The constructor's Python script +- The Typst proof's correctness argument +- The worked examples +- Any internal reasoning from Step 3 + +### What the adversary produces + +`adversary__.py` with: + +1. **Independent `reduce()` function** — implemented from the construction steps, without seeing the constructor's implementation +2. **Independent `extract_solution()` function** — same +3. **Exhaustive testing** — forward + backward for n ≤ 5 (matching constructor's minimum) +4. 
**Property-based testing** — `hypothesis` with `@given(st.integers(min_value=2, max_value=50))` for random instances +5. **Targeted counterexample search**: + - All-True / all-False assignments + - Single-variable instances + - Maximum and minimum clause/subset sizes + - Instances where source is barely feasible (one satisfying assignment) + - Instances where source is barely infeasible (off by one element) +6. **Diagnostic output** — any counterexample prints the full instance, expected result, actual result + +### Cross-comparison + +After both scripts pass independently, the orchestrator runs: + +```python +# Import both implementations +from verify_source_target import reduce as constructor_reduce, check as constructor_check +from adversary_source_target import reduce as adversary_reduce, check as adversary_check + +# 1. Do they produce the same target instance? +for instance in random_instances(1000): + t1 = constructor_reduce(instance) + t2 = adversary_reduce(instance) + assert t1 == t2, f"Reductions disagree on {instance}" + +# 2. 
Cross-check feasibility +for instance in random_instances(1000): + assert constructor_check(adversary_reduce(instance)) == adversary_check(constructor_reduce(instance)) +``` + +## Verdict Criteria + +| Constructor | Adversary | Cross-check | Verdict | +|-------------|-----------|-------------|---------| +| 0 fail | 0 fail | Agreement | **VERIFIED** | +| 0 fail | Counterexample found | — | Real bug (constructor has blind spot) | +| Fail | — | — | Bug found before adversary needed | +| 0 fail | 0 fail | Disagreement | One implementation is wrong — diagnose | + +## Why Lean is Dropped + +Lean's contribution to this project: +- **Caught bugs:** 0 out of 4 +- **Useful proofs:** `G ⊔ Gᶜ = ⊤` (1 genuinely meaningful theorem) +- **Trivial proofs:** `n + m = n + m`, `14m + (2m-n) + 2nK = 16m-n+2nK` (add no confidence) +- **Infrastructure gap:** No `ReduceTo`, `NAESatisfiability`, `SetSplitting`, or reduction framework in Mathlib +- **ROI:** Negative. Every Lean lemma requires duplicating problem definitions that already exist in Rust. + +The adversarial Python approach provides stronger evidence: +- Two independent implementations agreeing > one implementation + arithmetic Lean proofs +- Property-based testing catches scale-dependent bugs that Lean can't +- No Mathlib infrastructure needed + +Lean remains **optional** (nice for credibility) but is removed from the required quality gates. 
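To ground the claim that property-based testing carries the verification weight, here is a minimal self-contained sketch pairing a toy padding reduction (SubsetSum to Partition, in the style of #973) with a `hypothesis` equivalence property. Everything in it is illustrative: the function names, the brute-force oracles, and the strategy bounds are assumptions made for this example, not the repo's actual scripts.

```python
from itertools import combinations

from hypothesis import given, settings
from hypothesis import strategies as st


def is_feasible_subset_sum(sizes, target):
    """Brute-force oracle: does any subset of `sizes` sum to `target`?"""
    return any(
        sum(c) == target
        for r in range(len(sizes) + 1)
        for c in combinations(sizes, r)
    )


def is_feasible_partition(sizes):
    """Brute-force oracle: can `sizes` split into two equal-sum halves?"""
    total = sum(sizes)
    return total % 2 == 0 and is_feasible_subset_sum(sizes, total // 2)


def reduce_subset_sum_to_partition(sizes, target):
    """Pad with one element so Partition-feasibility mirrors SubsetSum."""
    total = sum(sizes)
    return sizes + [abs(total - 2 * target)]


@given(
    sizes=st.lists(st.integers(1, 12), min_size=3, max_size=7),
    target=st.integers(0, 40),
)
@settings(max_examples=500)
def test_equivalence(sizes, target):
    # Forward + backward in one property: source feasible <=> target feasible.
    reduced = reduce_subset_sum_to_partition(sizes, target)
    assert is_feasible_subset_sum(sizes, target) == is_feasible_partition(reduced)


test_equivalence()  # the decorated function runs all generated examples
```

The same shape, an independent `reduce()` checked against brute-force oracles by a generated-input property, is what the adversary script scales up to its 5,000-check minimum.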
+ +## Changes to verify-reduction Skill + +| Step | Before | After | +|------|--------|-------| +| Step 3 | Constructor writes script | Unchanged | +| Step 4 | Iterate until 0 failures | Unchanged | +| **Step 4b** | — | Adversary agent writes independent script | +| **Step 4c** | — | Cross-comparison | +| Step 5 | Lean required | Lean optional | +| Quality gates | 1 script ≥5000 checks | 2 independent scripts ≥5000 each + cross-comparison | +| Deliverables | 3 files (Typst + Python + Lean) | 3 files (Typst + constructor Python + adversary Python) | + +## Cost + +- Compute: ~2x (two agents instead of one) +- Time: ~1.5x (adversary runs in parallel via subagent) +- Complexity: moderate (orchestrator manages two outputs) + +Worth it: a wrong reduction that passes review is far more expensive than 2x compute. From 518824a273370a910169a5cf8466581d31adef6d Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 07:55:09 +0000 Subject: [PATCH 05/23] docs: design spec for verify-reduction skill enhancements MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Typst↔Python auto-matching, test vectors JSON for downstream consumption by add-rule and review-pipeline, adversary tailoring by reduction type, compositional verification via pred CLI. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- ...01-verify-reduction-enhancements-design.md | 174 ++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md diff --git a/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md b/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md new file mode 100644 index 000000000..1ea071e75 --- /dev/null +++ b/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md @@ -0,0 +1,174 @@ +# Verify-Reduction Skill Enhancements + +**Date:** 2026-04-01 +**Status:** Draft +**Context:** Lessons from running `/verify-reduction` on 3 reductions (#841 NAE→SS, #973 SS→Partition, #198 VC→HC) and comparison with PR #975's verification approach. + +## Problem + +The verify-reduction skill (PR #979) produces high-quality mathematical verification (Typst proofs + dual Python scripts), but: + +1. **Typst ↔ Python matching is manual.** The human writes examples in both places and hopes they agree. No automated cross-check. +2. **Gap analysis is manual.** The skill asks the agent to list every Typst claim and map it to a test — this is done as a text block, not verified programmatically. +3. **Verification is disconnected from Rust implementation.** The Python scripts verify the math, but when `/add-rule` implements the Rust code, it re-reads the issue from scratch. Verified artifacts aren't consumed downstream. +4. **No compositional testing.** The strongest verification — testing via alternative `pred` reduction paths — only becomes possible after the Rust implementation exists. The current skill doesn't set this up. 
+
+## Design
+
+### Pipeline Integration
+
+The verify-reduction skill is a **pre-verification gate** in a larger pipeline:
+
+```
+Issue → /verify-reduction → /add-rule → /review-pipeline
+        (Typst + Python)    (Rust impl)  (agentic test)
+```
+
+The Python `reduce()` function is the verified spec. `/add-rule` translates it to Rust. `/review-pipeline`'s agentic test (`pred reduce`, `pred solve`) confirms the Rust matches.
+
+### Enhancement 1: Typst ↔ Python Auto-Matching
+
+#### Example-level (mandatory)
+
+The constructor script exports test vectors to a JSON sidecar file:
+
+```
+docs/paper/verify-reductions/test_vectors_<source>_<target>.json
+```
+
+Contents:
+```json
+{
+  "source": "SubsetSum",
+  "target": "Partition",
+  "yes_instance": {
+    "input": {"sizes": [3, 5, 7, 1, 4], "target": 8},
+    "output": {"sizes": [3, 5, 7, 1, 4, 4]},
+    "source_feasible": true,
+    "target_feasible": true,
+    "source_solution": [1, 1, 0, 0, 0],
+    "extracted_solution": [1, 1, 0, 0, 0]
+  },
+  "no_instance": {
+    "input": {"sizes": [3, 7, 11], "target": 5},
+    "output": {"sizes": [3, 7, 11, 11]},
+    "source_feasible": false,
+    "target_feasible": false
+  },
+  "overhead": {
+    "num_elements": "num_elements + 1"
+  }
+}
+```
+
+A new validation step (Step 4.5) loads both the test vectors JSON and the Typst file, then checks that key numerical values from the JSON (input sizes, targets, output sizes, feasibility verdicts) appear in the Typst YES/NO example sections. The check is substring-based on the rendered numbers — not a full Typst parser. This catches divergence between the proof and the scripts (e.g., Typst says `S = {3, 5, 7}` but Python tests `S = {3, 5, 8}`).
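The Step 4.5 cross-check can be sketched as a small helper, assuming the test-vectors JSON layout shown above; the key names and the deliberately coarse matching on rendered numbers are illustrative assumptions, not a finished validator.

```python
import re


def numbers_in(obj):
    """Collect every integer appearing in a nested JSON-like structure."""
    if isinstance(obj, bool):  # bool is a subclass of int; skip verdict flags
        return set()
    if isinstance(obj, int):
        return {obj}
    if isinstance(obj, list):
        out = set()
        for item in obj:
            out |= numbers_in(item)
        return out
    if isinstance(obj, dict):
        out = set()
        for value in obj.values():
            out |= numbers_in(value)
        return out
    return set()  # strings such as overhead formulas are ignored


def check_typst_matches(vectors, typst_source):
    """Return example numbers from the YES/NO vectors missing from the proof text."""
    wanted = numbers_in(vectors.get("yes_instance", {})) | numbers_in(
        vectors.get("no_instance", {})
    )
    present = {int(m) for m in re.findall(r"\d+", typst_source)}
    return sorted(wanted - present)
```

An empty return value means every number in the JSON examples also appears somewhere in the Typst source: a cheap divergence alarm, not a proof of agreement.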
+ +The same JSON is consumed downstream: +- `/add-rule` reads it to generate Rust `#[test]` closed-loop cases +- `/review-pipeline`'s agentic test uses it for `pred reduce`/`pred solve` verification + +#### Claim-level (best-effort) + +The constructor script emits structured claim tags: + +```python +claim("overhead_universe_size", "2*n", verified=True) +claim("forward_case2", "T + d = Sigma - T", verified=True) +claim("extraction_opposite_side", "sigma < 2T => opposite side from padding", verified=True) +``` + +Implementation: a `claims` list accumulated during `main()`, exported alongside the test vectors. The self-review step checks: every claim keyword should plausibly map to a Typst proof section. Claims without `verified=True` get flagged. This replaces the manual gap analysis table. + +### Enhancement 2: Adversary Tailoring by Reduction Type + +The adversary prompt is currently generic. Tailor it by reduction type: + +| Type | Adversary Focus | Rationale | +|------|----------------|-----------| +| Identity (same graph, different objective) | Exhaustive enumeration, edge-case configs | Bugs hide in subtle objective differences | +| Algebraic (padding, complement, case split) | Symbolic case boundaries, off-by-one in padding | SS→Partition had 3 cases; boundary conditions are error-prone | +| Gadget (widget construction) | Widget structure, traversal patterns, connectivity | VC→HC: the 3 traversal patterns and cross-edge positions are the core invariant | + +The skill detects type from the issue description (gadget keywords: "widget", "component", "gadget"; algebraic keywords: "padding", "complement", "case"; identity: everything else) and adjusts the adversary prompt template. + +### Enhancement 3: Downstream Integration with add-rule + +When `/add-rule` is invoked for a reduction that has verified artifacts: + +1. **Check for existing verification:** Look for `docs/paper/verify-reductions/verify__.py` and `test_vectors__.json`. +2. 
**Read Python `reduce()` as pseudocode:** The verified Python function is the spec for the Rust `reduce_to()` implementation. The agent translates it directly. +3. **Read overhead from test vectors JSON:** The `overhead` field maps directly to `#[reduction(overhead = {...})]` expressions. +4. **Generate Rust tests from test vectors:** The YES/NO instances become closed-loop test cases: construct source → reduce → solve target → extract → verify. +5. **Read Typst proof for documentation:** The proof's Construction section becomes the doc comment on the Rust impl. + +This eliminates re-derivation from the issue. The verified artifacts are the single source of truth. + +### Enhancement 4: Compositional Verification in review-pipeline + +After the Rust implementation exists, review-pipeline's agentic test gains a compositional check: + +1. Load test vectors from `test_vectors__.json` +2. For each test vector, run `pred solve ` via the new path +3. If an alternative path exists (e.g., both `NAE→SS→ILP` and `NAE→ILP`), run via both and compare +4. Disagreement = bug in either the new reduction or the test vector + +This is the strongest verification because it tests the actual Rust code through independent code paths. It happens naturally in review-pipeline's existing agentic test step — we just need the test vectors file to drive it. + +## Changes to SKILL.md + +### New Step 4.5: Auto-Matching Validation + +After the constructor passes (Step 4), before the adversary (Step 5): + +1. Constructor exports `test_vectors__.json` with YES/NO instances, overhead, and claims +2. Parse the Typst proof for numerical values in example sections +3. Verify the Typst examples match the JSON test vectors +4. Verify all `claim()` tags have `verified=True` +5. 
Flag any untested claims + +### Updated Step 5: Typed Adversary Prompt + +The adversary prompt template includes a reduction-type section: + +- **Identity reductions:** "Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration." +- **Algebraic reductions:** "Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently." +- **Gadget reductions:** "Focus on widget structure invariants. Verify each traversal pattern independently. Test that interior vertices have no external edges. Check connectivity after removing widgets." + +### Updated Step 8: Downstream Artifacts + +The commit includes the test vectors JSON alongside the Typst and Python files: + +```bash +git add docs/paper/verify-reductions/test_vectors__.json +``` + +The PR description notes: "Test vectors available for `/add-rule` to consume when implementing the Rust reduction." + +### Updated Integration Section + +``` +- **Before `/add-rule`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON +- **During `/add-rule`**: reads Python `reduce()` as pseudocode, overhead from JSON, generates Rust tests from test vectors +- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances, compositional check via alternative paths if available +``` + +## Non-Goals + +- **No Rust code generation.** verify-reduction stays in Typst + Python. Rust translation is add-rule's job. +- **No manifest format.** The Python script + test vectors JSON are the artifacts. No intermediate spec language. +- **No Lean changes.** Lean remains optional and orthogonal to this enhancement. + +## Files Changed + +1. `.claude/skills/verify-reduction/SKILL.md` — new Step 4.5, updated Step 5 and Step 8 +2. `.claude/skills/add-rule/SKILL.md` — add "check for existing verification artifacts" to Step 1 +3. 
`.claude/skills/review-pipeline/SKILL.md` — add compositional test vector check to agentic test step (or document as optional enhancement) + +## Success Criteria + +After implementing these enhancements, running `/verify-reduction` → `/add-rule` → `/review-pipeline` on a new reduction should: + +1. Produce a test vectors JSON consumed by both add-rule and review-pipeline +2. Auto-detect Typst ↔ Python example divergence (if introduced deliberately as a test) +3. Generate Rust tests from the same test vectors that the Python scripts verified +4. Catch Rust implementation bugs via `pred reduce`/`pred solve` on those test vectors From 05d06ac86199acdfd1c715dcbf89ab6dbcaedd7e Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 08:15:23 +0000 Subject: [PATCH 06/23] docs: implementation plan for verify-reduction enhancements 5 tasks: update verify-reduction (Step 4.5 auto-matching, Step 5 typed adversary, Step 8 downstream artifacts), create add-reduction skill, register in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) --- ...026-04-01-verify-reduction-enhancements.md | 466 ++++++++++++++++++ 1 file changed, 466 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md diff --git a/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md b/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md new file mode 100644 index 000000000..debd6c62f --- /dev/null +++ b/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md @@ -0,0 +1,466 @@ +# Verify-Reduction Enhancements Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Enhance `/verify-reduction` with auto-matching, test vector export, and typed adversary prompts. 
Create new `/add-reduction` skill that consumes verified artifacts. Keep original `/add-rule` and `/review-pipeline` unchanged.
+
+**Architecture:** Three file changes — update verify-reduction SKILL.md (new Step 4.5, updated Step 5/8), create add-reduction SKILL.md (fork of add-rule that reads Python reduce() + test vectors JSON), register add-reduction in CLAUDE.md.
+
+**Tech Stack:** Markdown skill definitions only — no code changes.
+
+---
+
+### Task 1: Update verify-reduction SKILL.md — Add Step 4.5 (Auto-Matching + Test Vectors Export)
+
+**Files:**
+- Modify: `.claude/skills/verify-reduction/SKILL.md`
+
+The new Step 4.5 goes between the current Step 4 (Run and Iterate) and Step 5 (Adversary Verification). It adds three things: (a) constructor exports test vectors JSON, (b) Typst ↔ JSON cross-check, (c) structured claim tags replace manual gap analysis.
+
+- [ ] **Step 1: Insert new Step 4.5 after the current "Iteration 3: Gap analysis" block**
+
+In `.claude/skills/verify-reduction/SKILL.md`, find the line `If any claim has no test, add one. If it's untestable, document WHY.` (end of Step 4, Iteration 3). After it, insert:
+
+```markdown
+### Iteration 4: Export test vectors and validate Typst matching — MANDATORY
+
+After all checks pass and gap analysis is complete, the constructor script must export a test vectors JSON file for downstream consumption:
+
+**File:** `docs/paper/verify-reductions/test_vectors_<source>_<target>.json`
+
+```json
+{
+  "source": "<SourceProblem>",
+  "target": "<TargetProblem>",
+  "issue": <issue-number>,
+  "yes_instance": {
+    "input": { ... },
+    "output": { ... },
+    "source_feasible": true,
+    "target_feasible": true,
+    "source_solution": [ ... ],
+    "extracted_solution": [ ... ]
+  },
+  "no_instance": {
+    "input": { ... },
+    "output": { ... 
},
+    "source_feasible": false,
+    "target_feasible": false
+  },
+  "overhead": {
+    "field_name": "expression using source getters"
+  },
+  "claims": [
+    {"tag": "overhead_universe_size", "formula": "2*n", "verified": true},
+    {"tag": "forward_correctness", "description": "NAE-sat implies valid splitting", "verified": true}
+  ]
+}
+```
+
+Add this export at the end of `main()` in the constructor script:
+
+```python
+import json
+
+test_vectors = {
+    "source": "<SourceProblem>",
+    "target": "<TargetProblem>",
+    "issue": <issue-number>,
+    "yes_instance": { ... },  # from Section 6
+    "no_instance": { ... },  # from Section 7
+    "overhead": { ... },  # from Section 1/4
+    "claims": claims_list,  # accumulated claim() calls
+}
+
+with open("docs/paper/verify-reductions/test_vectors_<source>_<target>.json", "w") as f:
+    json.dump(test_vectors, f, indent=2)
+print(f"Test vectors exported to test_vectors_<source>_<target>.json")
+```
+
+**Typst ↔ JSON cross-check:**
+
+After exporting, load both the test vectors JSON and the Typst file. For each key numerical value in the JSON (input sizes, target values, output sizes, feasibility verdicts), check that it appears as a substring in the Typst YES/NO example sections. This is a substring search on the raw Typst text, not a full parser:
+
+```python
+typst_text = open("<typst-file>").read()
+for val in [str(v) for v in yes_instance["input"].values() if isinstance(v, (int, list))]:
+    assert str(val) in typst_text, f"Typst missing YES value: {val}"
+for val in [str(v) for v in no_instance["input"].values() if isinstance(v, (int, list))]:
+    assert str(val) in typst_text, f"Typst missing NO value: {val}"
+```
+
+If any value is missing, the Typst proof and Python script are out of sync — fix before proceeding. 
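Beyond the substring match, a small schema gate over the exported dict catches structural omissions before the adversary step. This is a minimal sketch, assuming the field names proposed above; the helper name `check_test_vectors` is illustrative and not part of the current skill:

```python
REQUIRED_KEYS = {"source", "target", "issue", "yes_instance",
                 "no_instance", "overhead", "claims"}

def check_test_vectors(tv: dict) -> list:
    """Return a list of contract violations found in an exported test-vectors dict."""
    problems = []
    missing = REQUIRED_KEYS - tv.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    # YES/NO feasibility flags must point in opposite directions.
    if tv.get("yes_instance", {}).get("source_feasible") is not True:
        problems.append("yes_instance must have source_feasible=true")
    if tv.get("no_instance", {}).get("source_feasible") is not False:
        problems.append("no_instance must have source_feasible=false")
    # Every recorded claim must be verified before declaring VERIFIED.
    problems += [f"unverified claim: {c['tag']}"
                 for c in tv.get("claims", []) if not c.get("verified")]
    return problems
```

Running it on the in-memory `test_vectors` dict just before `json.dump` turns a silently incomplete file into a fail-fast error.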
+ +**Structured claims (best-effort replacement for manual gap analysis):** + +Instead of the manual CLAIM/TESTED BY table, accumulate claims programmatically: + +```python +claims_list = [] + +def claim(tag, formula_or_desc, verified=True): + claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified}) +``` + +Call `claim()` throughout the constructor script wherever a Typst proof claim is verified. The self-review step (Step 6) checks that all claims have `verified: true` and that the claim count is reasonable (at least 5 claims for any non-trivial reduction). +``` + +- [ ] **Step 2: Verify the edit is well-placed** + +Read the file and confirm Step 4.5 appears between Step 4 and Step 5 with correct markdown heading level. + +- [ ] **Step 3: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: add Step 4.5 to verify-reduction — test vectors export + Typst auto-matching" +``` + +--- + +### Task 2: Update verify-reduction SKILL.md — Typed Adversary Prompt (Step 5) + +**Files:** +- Modify: `.claude/skills/verify-reduction/SKILL.md` + +- [ ] **Step 1: Add reduction-type detection and tailored instructions to the adversary prompt** + +In `.claude/skills/verify-reduction/SKILL.md`, find the adversary prompt template in Step 5a (the block starting `You are an adversary verifier`). Before the `## Your task` section of the adversary prompt, insert: + +````markdown +## Reduction type + +Detect the reduction type from the Typst proof and tailor your testing focus: + +- **Identity reduction** (same graph/structure, different objective — keywords: "complement", "same graph", "negation"): Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration. Edge-case configs (all-zero, all-one, alternating) are highest priority. + +- **Algebraic reduction** (padding, case split, formula transformation — keywords: "padding", "case", "if Σ", "d ="): Focus on case boundary conditions. 
Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently. Include at least one hypothesis strategy targeting boundary values. + +- **Gadget reduction** (widget/component construction — keywords: "widget", "component", "gadget", "cover-testing"): Focus on widget structure invariants. Verify each traversal/usage pattern independently. Test that interior vertices/elements have no external connections. Check structural properties (connectivity, edge counts, degree sequences) across all small instances. +```` + +- [ ] **Step 2: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: add typed adversary prompts to verify-reduction (identity/algebraic/gadget)" +``` + +--- + +### Task 3: Update verify-reduction SKILL.md — Downstream Artifacts (Step 8) and Integration + +**Files:** +- Modify: `.claude/skills/verify-reduction/SKILL.md` + +- [ ] **Step 1: Update Step 8a commit to include test vectors JSON** + +Find the `git add` block in Step 8a. Change it to: + +```bash +git add docs/paper/.typ \ + docs/paper/verify-reductions/verify_*.py \ + docs/paper/verify-reductions/adversary_*.py \ + docs/paper/verify-reductions/test_vectors_*.json +git add -f docs/paper/.pdf +``` + +- [ ] **Step 2: Update the Integration section at the bottom of the file** + +Replace the current Integration section with: + +```markdown +## Integration + +### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline + +`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` function is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. 
+ +- **Before `/add-reduction`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON +- **During `/add-reduction`**: reads Python `reduce()` as pseudocode, overhead from test vectors JSON, generates Rust tests from test vectors +- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances; compositional check via alternative paths if available + +### Standalone usage + +- **After `write-rule-in-paper`**: invoke to verify paper entry +- **During `review-structural`**: check verification script exists and passes +- **Before `issue-to-pr --execute`**: pre-validate the algorithm +``` + +- [ ] **Step 3: Add test vectors JSON to the Quality Gates checklist** + +In the Quality Gates section, add after the cross-consistency items: + +```markdown +- [ ] Test vectors JSON exported with YES/NO instances, overhead, and claims +- [ ] Typst ↔ JSON auto-matching passed (key values present in Typst text) +``` + +- [ ] **Step 4: Add test vectors JSON to the Self-Review checklist (Step 6)** + +In Step 6, add a new subsection after "Cross-consistency": + +```markdown +### Test vectors and auto-matching + +- [ ] `test_vectors__.json` exported successfully +- [ ] YES instance in JSON matches Typst feasible example (values present) +- [ ] NO instance in JSON matches Typst infeasible example (values present) +- [ ] All claims have `verified: true` +- [ ] At least 5 claims for non-trivial reductions +``` + +- [ ] **Step 5: Commit** + +```bash +git add .claude/skills/verify-reduction/SKILL.md +git commit -m "feat: update verify-reduction Step 8 + Integration for downstream pipeline" +``` + +--- + +### Task 4: Create add-reduction skill + +**Files:** +- Create: `.claude/skills/add-reduction/SKILL.md` + +This is a fork of add-rule that reads verified artifacts instead of deriving from the issue. 
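To make the hand-off this fork enables concrete, the JSON `overhead` map can be rendered mechanically into the `#[reduction]` attribute that the generated skill's Step 1 copies verbatim. A sketch only; the helper name and exact attribute layout are assumptions, not part of the skill:

```python
def overhead_attribute(test_vectors: dict) -> str:
    """Render the JSON `overhead` map as a #[reduction(...)] attribute string."""
    fields = "".join(
        f'    {name} = "{expr}",\n'
        for name, expr in test_vectors["overhead"].items()
    )
    return "#[reduction(overhead = {\n" + fields + "})]"

# Example with a made-up overhead entry:
tv = {"overhead": {"universe_size": "2 * num_variables()"}}
print(overhead_attribute(tv))
```

If the real macro syntax differs, only the format string changes; the point is that the expressions reach Rust from the JSON unmodified.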
+
+- [ ] **Step 1: Create the skill directory**
+
+```bash
+mkdir -p .claude/skills/add-reduction
+```
+
+- [ ] **Step 2: Write the SKILL.md**
+
+Create `.claude/skills/add-reduction/SKILL.md` with the following content:
+
+```markdown
+---
+name: add-reduction
+description: Add a new reduction rule using verified artifacts from /verify-reduction — reads Python reduce() as pseudocode, test vectors JSON for Rust tests, overhead from JSON
+---
+
+# Add Reduction (from Verified Artifacts)
+
+Step-by-step guide for adding a new reduction rule (A → B) when `/verify-reduction` has already produced verified artifacts (Typst proof, Python scripts, test vectors JSON). This skill consumes those artifacts directly instead of re-deriving from the issue.
+
+**When to use:** After `/verify-reduction` has produced a PR with verified artifacts for a reduction rule issue. Use `/add-rule` instead when no verification artifacts exist.
+
+## Step 0: Locate Verified Artifacts
+
+Check for existing verification artifacts:
+
+```bash
+ls docs/paper/verify-reductions/verify_<source>_<target>.py
+ls docs/paper/verify-reductions/test_vectors_<source>_<target>.json
+ls docs/paper/verify-reductions/<source>_<target>.typ
+```
+
+If any are missing, run `/verify-reduction` first.
+
+### Read the artifacts
+
+1. **Python `reduce()` function** — this is the verified spec for the Rust `reduce_to()` implementation. Read it carefully; translate the algorithm, not the syntax.
+2. **Test vectors JSON** — contains YES/NO instances with exact input/output values, overhead expressions, and verified claims.
+3. **Typst proof** — the Construction section describes the algorithm in mathematical notation. Use for doc comments.
+
+```bash
+# Load test vectors
+TEST_VECTORS=$(cat docs/paper/verify-reductions/test_vectors_<source>_<target>.json)
+```
+
+Extract from test vectors JSON:
+- `overhead` → use directly in `#[reduction(overhead = { ... 
})]`
+- `yes_instance.input` / `yes_instance.output` → first closed-loop test case
+- `no_instance.input` / `no_instance.output` → infeasible test case
+- `claims` → verify each is preserved in the Rust implementation
+
+## Reference Implementations
+
+Same as `/add-rule`:
+- **Reduction rule:** `src/rules/minimumvertexcover_maximumindependentset.rs`
+- **Reduction tests:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs`
+- **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet`
+- **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`)
+
+## Step 1: Implement the reduction
+
+Create `src/rules/<source>_<target>.rs`.
+
+**Translation guide:** Map the Python `reduce()` function to Rust:
+
+| Python | Rust |
+|--------|------|
+| `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` |
+| `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec<usize>` |
+| `literal_to_element(lit, n)` | Private helper method |
+| Python list of ints | `Vec<usize>`, `Vec<i64>`, etc. (match problem type) |
+
+**Overhead from test vectors JSON:** The `overhead` field maps directly to the `#[reduction]` macro:
+
+```rust
+#[reduction(overhead = {
+    // Copy expressions verbatim from test_vectors JSON "overhead" field
+    field_name = "expression",
+})]
+```
+
+The rest of the implementation structure follows `/add-rule` Step 1 exactly: ReductionResult struct, trait impl, ReduceTo impl.
+
+## Step 2: Register in mod.rs
+
+Same as `/add-rule` Step 2. Add `mod <source>_<target>;` to `src/rules/mod.rs`.
+
+## Step 3: Write unit tests from test vectors
+
+Create `src/unit_tests/rules/<source>_<target>.rs`. 
+
+**Generate tests directly from test vectors JSON:**
+
+The YES instance becomes the primary closed-loop test:
+
+```rust
+#[test]
+fn test_<source>_to_<target>_closed_loop() {
+    // Construct source from test_vectors.yes_instance.input
+    let source = <SourceProblem>::try_new(/* fields from JSON */).unwrap();
+
+    // Reduce
+    let reduction = ReduceTo::<TargetProblem>::reduce_to(&source);
+
+    // Verify target matches test_vectors.yes_instance.output
+    let target = reduction.target_problem();
+    assert_eq!(target.<getter>(), /* value from JSON output */);
+
+    // Solve and extract
+    let solver = BruteForce;
+    for witness in solver.find_all_witnesses(target).unwrap() {
+        let extracted = reduction.extract_solution(&witness);
+        // Verify extracted solution is valid for source
+        let val = source.evaluate(&extracted);
+        assert!(val.0); // Or check objective value
+    }
+}
+```
+
+The NO instance becomes the infeasible test:
+
+```rust
+#[test]
+fn test_<source>_to_<target>_infeasible() {
+    // Construct source from test_vectors.no_instance.input
+    let source = <SourceProblem>::try_new(/* fields from JSON */).unwrap();
+
+    // Reduce
+    let reduction = ReduceTo::<TargetProblem>::reduce_to(&source);
+
+    // Verify target is also infeasible
+    let solver = BruteForce;
+    let witnesses = solver.find_all_witnesses(reduction.target_problem());
+    assert!(witnesses.is_none() || witnesses.unwrap().is_empty());
+}
+```
+
+Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON.
+
+## Step 4: Add canonical example to example_db
+
+Same as `/add-rule` Step 4. The YES instance from the test vectors JSON is a good candidate for the canonical example.
+
+## Step 5: Document in paper
+
+The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template. The proof text, worked examples, and overhead table are already written — adapt them to the paper's macros (`reduction-rule`, `problem-def`, etc.).
+
+Follow `/add-rule` Step 5 for the exact format. 
The heavy writing is already done; this step is reformatting. + +## Step 6: Regenerate exports and verify + +Same as `/add-rule` Step 6: + +```bash +cargo run --example export_graph +cargo run --example export_schemas +make regenerate-fixtures +make test clippy +``` + +## Solver Rules + +Same as `/add-rule`. If the target problem needs ILP, implement alongside. + +## CLI Impact + +Same as `/add-rule`. No CLI changes needed for witness-preserving reductions. + +## File Naming + +Same as `/add-rule`: +- Rule file: `src/rules/_.rs` +- Test file: `src/unit_tests/rules/_.rs` +- Canonical example: builder function in `src/example_db/rule_builders.rs` + +## Common Mistakes + +All mistakes from `/add-rule` apply, plus: + +| Mistake | Fix | +|---------|-----| +| Re-deriving algorithm from issue instead of reading Python `reduce()` | The Python function is the verified spec — translate it, don't reinvent | +| Ignoring test vectors JSON | Use the YES/NO instances for Rust tests directly | +| Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field | +| Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it | +| Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite | +``` + +- [ ] **Step 3: Commit** + +```bash +git add .claude/skills/add-reduction/SKILL.md +git commit -m "feat: create add-reduction skill — consumes verified artifacts from verify-reduction" +``` + +--- + +### Task 5: Register add-reduction in CLAUDE.md + +**Files:** +- Modify: `.claude/CLAUDE.md` + +- [ ] **Step 1: Add add-reduction entry to the Skills list** + +In `.claude/CLAUDE.md`, find the line: +``` +- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. 
+``` + +Add immediately after it: +``` +- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. +``` + +- [ ] **Step 2: Update the verify-reduction description** + +Find the line starting `- [verify-reduction]` and replace it with: + +``` +- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. +``` + +- [ ] **Step 3: Commit** + +```bash +git add .claude/CLAUDE.md +git commit -m "feat: register add-reduction skill in CLAUDE.md, update verify-reduction description" +``` + +--- + +Plan complete and saved to `docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md`. Two execution options: + +**1. Subagent-Driven (recommended)** - I dispatch a fresh subagent per task, review between tasks, fast iteration + +**2. Inline Execution** - Execute tasks in this session using executing-plans, batch execution with checkpoints + +Which approach? 
\ No newline at end of file From d5b782189e250ca500dc859bce345ecebd8531d4 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 08:27:11 +0000 Subject: [PATCH 07/23] feat: enhance verify-reduction with test vectors export, typed adversary, pipeline integration Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 115 ++++++++++++++++++++++- 1 file changed, 113 insertions(+), 2 deletions(-) diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 8981aa8d4..d2db24139 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -267,6 +267,87 @@ CLAIM TESTED BY If any claim has no test, add one. If it's untestable, document WHY. +### Iteration 4: Export test vectors and validate Typst matching — MANDATORY + +After all checks pass and gap analysis is complete, the constructor script must export a test vectors JSON file for downstream consumption by `/add-reduction`: + +**File:** `docs/paper/verify-reductions/test_vectors__.json` + +```json +{ + "source": "", + "target": "", + "issue": "", + "yes_instance": { + "input": { "...problem-specific fields..." }, + "output": { "...reduced instance fields..." }, + "source_feasible": true, + "target_feasible": true, + "source_solution": ["...config..."], + "extracted_solution": ["...config..."] + }, + "no_instance": { + "input": { "...problem-specific fields..." }, + "output": { "...reduced instance fields..." }, + "source_feasible": false, + "target_feasible": false + }, + "overhead": { + "field_name": "expression using source getters" + }, + "claims": [ + {"tag": "claim_tag", "formula": "formula or description", "verified": true} + ] +} +``` + +Add this export at the end of `main()` in the constructor script: + +```python +import json + +test_vectors = { + "source": "", + "target": "", + "issue": , + "yes_instance": { ... }, # from Section 6 + "no_instance": { ... 
}, # from Section 7 + "overhead": { ... }, # from Section 1/4 + "claims": claims_list, # accumulated claim() calls +} + +with open("docs/paper/verify-reductions/test_vectors__.json", "w") as f: + json.dump(test_vectors, f, indent=2) +print(f"Test vectors exported to test_vectors__.json") +``` + +**Typst ↔ JSON cross-check:** + +After exporting, load both the test vectors JSON and the Typst file. For each key numerical value in the JSON (input sizes, target values, output sizes), check that it appears as a substring in the Typst YES/NO example sections. This is a substring search on the raw Typst text — not a full parser: + +```python +typst_text = open("").read() +for val in [str(v) for v in yes_instance["input"].values() if isinstance(v, (int, list))]: + assert str(val) in typst_text, f"Typst missing YES value: {val}" +for val in [str(v) for v in no_instance["input"].values() if isinstance(v, (int, list))]: + assert str(val) in typst_text, f"Typst missing NO value: {val}" +``` + +If any value is missing, the Typst proof and Python script are out of sync — fix before proceeding. + +**Structured claims (best-effort replacement for manual gap analysis):** + +Instead of the manual CLAIM/TESTED BY table, accumulate claims programmatically: + +```python +claims_list = [] + +def claim(tag, formula_or_desc, verified=True): + claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified}) +``` + +Call `claim()` throughout the constructor script wherever a Typst proof claim is verified. The self-review step (Step 6) checks that all claims have `verified: true` and that the claim count is reasonable (at least 5 claims for any non-trivial reduction). + ## Step 5: Adversary Verification The adversary step provides independent verification by a second agent that implements the reduction from scratch, using only the Typst proof as specification. This catches bugs that the constructor's own tests cannot find (confirmation bias, shared implementation errors). 
@@ -290,6 +371,16 @@ specification. Typst proof file: docs/paper/proposed-reductions.typ (section on ) Issue: # +## Reduction type + +Detect the reduction type from the Typst proof and tailor your testing focus: + +- **Identity reduction** (same graph/structure, different objective — keywords: "complement", "same graph", "negation"): Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration. Edge-case configs (all-zero, all-one, alternating) are highest priority. + +- **Algebraic reduction** (padding, case split, formula transformation — keywords: "padding", "case", "if Σ", "d ="): Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently. Include at least one hypothesis strategy targeting boundary values. + +- **Gadget reduction** (widget/component construction — keywords: "widget", "component", "gadget", "cover-testing"): Focus on widget structure invariants. Verify each traversal/usage pattern independently. Test that interior vertices/elements have no external connections. Check structural properties (connectivity, edge counts, degree sequences) across all small instances. + ## Your task 1. Read the Typst proof carefully. Extract: @@ -470,6 +561,14 @@ Before declaring verified, run through this checklist. Every item must be YES. 
I - [ ] The overhead formula in both scripts matches the Typst overhead table - [ ] The examples in both scripts match the Typst examples (same numbers, same instances) +### Test vectors and auto-matching + +- [ ] `test_vectors__.json` exported successfully +- [ ] YES instance in JSON matches Typst feasible example (values present) +- [ ] NO instance in JSON matches Typst infeasible example (values present) +- [ ] All claims have `verified: true` +- [ ] At least 5 claims for non-trivial reductions + ### Lean (optional) If Lean lemmas were added: @@ -528,7 +627,8 @@ Verdict: VERIFIED / OPEN (with reason) ```bash git add docs/paper/.typ \ docs/paper/verify-reductions/verify_*.py \ - docs/paper/verify-reductions/adversary_*.py + docs/paper/verify-reductions/adversary_*.py \ + docs/paper/verify-reductions/test_vectors_*.json git add -f docs/paper/.pdf git commit -m "docs: /verify-reduction # VERIFIED @@ -578,6 +678,8 @@ A reduction is **VERIFIED** when ALL of these hold: - [ ] Gap analysis shows all Typst claims tested - [ ] Cross-comparison shows 0 disagreements and 0 feasibility mismatches - [ ] Cross-consistency between Typst, constructor, and adversary verified +- [ ] Test vectors JSON exported with YES/NO instances, overhead, and claims +- [ ] Typst ↔ JSON auto-matching passed (key values present in Typst text) If any gate fails, go back and fix it before declaring the reduction verified. @@ -600,7 +702,16 @@ If any gate fails, go back and fix it before declaring the reduction verified. ## Integration -- **After `add-rule`**: invoke `/verify-reduction` before creating PR +### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline + +`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` function is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. 
+ +- **Before `/add-reduction`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON +- **During `/add-reduction`**: reads Python `reduce()` as pseudocode, overhead from test vectors JSON, generates Rust tests from test vectors +- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances; compositional check via alternative paths if available + +### Standalone usage + - **After `write-rule-in-paper`**: invoke to verify paper entry - **During `review-structural`**: check verification script exists and passes - **Before `issue-to-pr --execute`**: pre-validate the algorithm From 7d8c417fa24bf3893931254700570be84374cbca Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 08:27:22 +0000 Subject: [PATCH 08/23] =?UTF-8?q?feat:=20create=20add-reduction=20skill=20?= =?UTF-8?q?=E2=80=94=20consumes=20verified=20artifacts=20from=20verify-red?= =?UTF-8?q?uction?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .claude/skills/add-reduction/SKILL.md | 175 ++++++++++++++++++++++++++ 1 file changed, 175 insertions(+) create mode 100644 .claude/skills/add-reduction/SKILL.md diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md new file mode 100644 index 000000000..325180acf --- /dev/null +++ b/.claude/skills/add-reduction/SKILL.md @@ -0,0 +1,175 @@ +--- +name: add-reduction +description: Add a new reduction rule using verified artifacts from /verify-reduction — reads Python reduce() as pseudocode, test vectors JSON for Rust tests, overhead from JSON +--- + +# Add Reduction (from Verified Artifacts) + +Step-by-step guide for adding a new reduction rule (A → B) when `/verify-reduction` has already produced verified artifacts (Typst proof, Python scripts, test vectors JSON). This skill consumes those artifacts directly instead of re-deriving from the issue. 
+ +**When to use:** After `/verify-reduction` has produced a PR with verified artifacts for a reduction rule issue. Use `/add-rule` instead when no verification artifacts exist. + +## Step 0: Locate Verified Artifacts + +Check for existing verification artifacts: + +```bash +ls docs/paper/verify-reductions/verify__.py +ls docs/paper/verify-reductions/test_vectors__.json +ls docs/paper/verify-reductions/_.typ +``` + +If any are missing, run `/verify-reduction` first. + +### Read the artifacts + +1. **Python `reduce()` function** — this is the verified spec for the Rust `reduce_to()` implementation. Read it carefully; translate the algorithm, not the syntax. +2. **Test vectors JSON** — contains YES/NO instances with exact input/output values, overhead expressions, and verified claims. +3. **Typst proof** — the Construction section describes the algorithm in mathematical notation. Use for doc comments. + +```bash +# Load test vectors +TEST_VECTORS=$(cat docs/paper/verify-reductions/test_vectors__.json) +``` + +Extract from test vectors JSON: +- `overhead` → use directly in `#[reduction(overhead = { ... })]` +- `yes_instance.input` / `yes_instance.output` → first closed-loop test case +- `no_instance.input` / `no_instance.output` → infeasible test case +- `claims` → verify each is preserved in the Rust implementation + +## Reference Implementations + +Same as `/add-rule`: +- **Reduction rule:** `src/rules/minimumvertexcover_maximumindependentset.rs` +- **Reduction tests:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` +- **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` +- **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`) + +## Step 1: Implement the reduction + +Create `src/rules/_.rs`. 
+
+**Translation guide:** Map the Python `reduce()` function to Rust:
+
+| Python | Rust |
+|--------|------|
+| `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` |
+| `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec<usize>` |
+| `literal_to_element(lit, n)` | Private helper method |
+| Python list of ints | `Vec<usize>`, `Vec<u64>`, etc. (match problem type) |
+
+**Overhead from test vectors JSON:** The `overhead` field maps directly to the `#[reduction]` macro:
+
+```rust
+#[reduction(overhead = {
+    // Copy expressions verbatim from test_vectors JSON "overhead" field
+    field_name = "expression",
+})]
+```
+
+The rest of the implementation structure follows `/add-rule` Step 1 exactly: ReductionResult struct, trait impl, ReduceTo impl.
+
+## Step 2: Register in mod.rs
+
+Same as `/add-rule` Step 2. Add `mod <source>_<target>;` to `src/rules/mod.rs`.
+
+## Step 3: Write unit tests from test vectors
+
+Create `src/unit_tests/rules/<source>_<target>.rs`.
+
+**Generate tests directly from test vectors JSON:**
+
+The YES instance becomes the primary closed-loop test:
+
+```rust
+#[test]
+fn test_<source>_to_<target>_closed_loop() {
+    // Construct source from test_vectors.yes_instance.input
+    let source = <Source>::try_new(/* fields from JSON */).unwrap();
+
+    // Reduce
+    let reduction = ReduceTo::<Target>::reduce_to(&source);
+
+    // Verify target matches test_vectors.yes_instance.output
+    let target = reduction.target_problem();
+    assert_eq!(target.<accessor>(), /* value from JSON output */);
+
+    // Solve and extract
+    let solver = BruteForce;
+    for witness in solver.find_all_witnesses(target).unwrap() {
+        let extracted = reduction.extract_solution(&witness);
+        // Verify extracted solution is valid for source
+        let val = source.evaluate(&extracted);
+        assert!(val.0); // Or check objective value
+    }
+}
+```
+
+The NO instance becomes the infeasible test:
+
+```rust
+#[test]
+fn test_<source>_to_<target>_infeasible() {
+    // Construct source from test_vectors.no_instance.input
+    let source = <Source>::try_new(/* fields from JSON */).unwrap();
+
+    // Reduce
+    let reduction = ReduceTo::<Target>::reduce_to(&source);
+
+    // Verify target is also infeasible
+    let solver = BruteForce;
+    let witnesses = solver.find_all_witnesses(reduction.target_problem());
+    assert!(witnesses.is_none() || witnesses.unwrap().is_empty());
+}
+```
+
+Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON.
+
+## Step 4: Add canonical example to example_db
+
+Same as `/add-rule` Step 4. The YES instance from the test vectors JSON is a good candidate for the canonical example.
+
+## Step 5: Document in paper
+
+The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template. The proof text, worked examples, and overhead table are already written — adapt them to the paper's macros (`reduction-rule`, `problem-def`, etc.).
+
+Follow `/add-rule` Step 5 for the exact format. The heavy writing is already done; this step is reformatting.
+
+## Step 6: Regenerate exports and verify
+
+Same as `/add-rule` Step 6:
+
+```bash
+cargo run --example export_graph
+cargo run --example export_schemas
+make regenerate-fixtures
+make test clippy
+```
+
+## Solver Rules
+
+Same as `/add-rule`. If the target problem needs ILP, implement alongside.
+
+## CLI Impact
+
+Same as `/add-rule`. No CLI changes needed for witness-preserving reductions.
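To make the Python-to-Rust translation in Steps 1 and 3 concrete, here is a minimal, self-contained sketch of the shape of a verified `reduce()`/extract pair and its closed-loop check. It is not taken from the repo's artifacts; the function names and the choice of the VertexCover to IndependentSet complement reduction are illustrative stand-ins:

```python
# Illustrative stand-in for the verified artifacts: the classic
# VertexCover <-> IndependentSet complement reduction.

def reduce(n, edges):
    """Target instance is the same graph (identity construction)."""
    return n, edges

def extract_solution(n, independent_set):
    """The complement of an independent set of the target is a vertex cover."""
    return sorted(set(range(n)) - set(independent_set))

# Closed-loop check mirroring the Rust test in Step 3, on the path P4:
n, edges = 4, [(0, 1), (1, 2), (2, 3)]
tn, tedges = reduce(n, edges)
witness = [0, 2]                          # an independent set of the target
cover = extract_solution(tn, witness)     # -> [1, 3]
assert all(u in cover or v in cover for (u, v) in tedges)
```

The Rust `reduce_to()`/`extract_solution()` pair has exactly this shape; the Rust closed-loop test replays the same check against the YES instance from the test vectors.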
+
+## File Naming
+
+Same as `/add-rule`:
+- Rule file: `src/rules/<source>_<target>.rs`
+- Test file: `src/unit_tests/rules/<source>_<target>.rs`
+- Canonical example: builder function in `src/example_db/rule_builders.rs`
+
+## Common Mistakes
+
+All mistakes from `/add-rule` apply, plus:
+
+| Mistake | Fix |
+|---------|-----|
+| Re-deriving algorithm from issue instead of reading Python `reduce()` | The Python function is the verified spec — translate it, don't reinvent |
+| Ignoring test vectors JSON | Use the YES/NO instances for Rust tests directly |
+| Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field |
+| Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it |
+| Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite |

From 022c1c0a9520c63bcf889a4257983c0106223994 Mon Sep 17 00:00:00 2001
From: zazabap
Date: Wed, 1 Apr 2026 08:55:51 +0000
Subject: [PATCH 09/23] feat: register add-reduction skill in CLAUDE.md, update verify-reduction description

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .claude/CLAUDE.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 659825237..a3b771d5a 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -10,6 +10,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`.
 - [issue-to-pr](skills/issue-to-pr/SKILL.md) -- Convert a GitHub issue into a PR with an implementation plan. Default rule: one item per PR. Exception: a `[Model]` issue that explicitly claims direct ILP solvability should implement the model and its direct `<Model> -> ILP` rule together; `[Rule]` issues still require both models to exist on `main`.
 - [add-model](skills/add-model/SKILL.md) -- Add a new problem model. Can be used standalone (brainstorms with user) or called from `issue-to-pr`.
 - [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule.
Can be used standalone (brainstorms with user) or called from `issue-to-pr`. +- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. - [review-structural](skills/review-structural/SKILL.md) -- Project-specific structural completeness check: model/rule checklists, build, semantic correctness, issue compliance. Read-only, no code changes. Called by `review-pipeline`. - [review-quality](skills/review-quality/SKILL.md) -- Generic code quality review: DRY, KISS, cohesion/coupling, test quality, HCI. Read-only, no code changes. Called by `review-pipeline`. - [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments, fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`). @@ -26,7 +27,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue. - [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold. - [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers. -- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), and Lean lemmas (non-trivial required). Iterates until all checks pass. Creates worktree + PR. 
+- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. - [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals. ## Codex Compatibility From 1895610ea091c8fcbdb0144639b23ea2879aa0da Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 10:17:57 +0000 Subject: [PATCH 10/23] fix: three improvements to verify-reduction and add-reduction skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 1. verify-reduction Step 1: type compatibility gate — checks source/target Value types before proceeding. Stops and comments on issue if types are incompatible (e.g., optimization → decision needs K parameter). 2. add-reduction Step 7: mandatory cleanup of verification artifacts from docs/paper/verify-reductions/ — Python scripts, JSON, Typst, PDF must not get into the library. 3. add-reduction Steps 4/4b/5: mandatory requirements from #974 — canonical example in rule_builders.rs (Check 9), example-db lookup test (Check 10), paper reduction-rule entry (Check 11). 
Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .claude/skills/add-reduction/SKILL.md    | 72 ++++++++++++++++++++++--
 .claude/skills/verify-reduction/SKILL.md | 35 ++++++++++++
 2 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md
index 325180acf..2848e9df6 100644
--- a/.claude/skills/add-reduction/SKILL.md
+++ b/.claude/skills/add-reduction/SKILL.md
@@ -126,15 +126,58 @@ fn test_<source>_to_<target>_infeasible() {
 
 Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON.
 
-## Step 4: Add canonical example to example_db
+## Step 4: Add canonical example to example_db (Check 9 from #974)
 
-Same as `/add-rule` Step 4. The YES instance from the test vectors JSON is a good candidate for the canonical example.
+**This step is MANDATORY — many past PRs skipped it.** Add a builder function in `src/example_db/rule_builders.rs` that constructs a small canonical instance for this reduction.
 
-## Step 5: Document in paper
+Use the YES instance from the test vectors JSON as the canonical example. Follow existing patterns in `rule_builders.rs`:
 
-The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template. The proof text, worked examples, and overhead table are already written — adapt them to the paper's macros (`reduction-rule`, `problem-def`, etc.).
+```rust
+pub fn build_<source>_to_<target>_example() -> RuleExample {
+    // Construct the source instance from the test vectors YES instance
+    let source = <Source>::new(/* fields from test_vectors.yes_instance.input */);
+    let reduction = ReduceTo::<Target>::reduce_to(&source);
+    RuleExample {
+        source: Box::new(source),
+        target: Box::new(reduction.target_problem().clone()),
+        reduction_result: Box::new(reduction),
+    }
+}
+```
+
+Register the builder in `build_rule_examples()` in the same file.
+ +## Step 4b: Add example-db lookup test (Check 10 from #974) + +**Also MANDATORY.** Add a test in `src/unit_tests/example_db.rs` that verifies the canonical example can be loaded: + +```rust +#[test] +fn test_example__to_() { + let examples = build_rule_examples(); + let found = examples.iter().any(|e| /* matches source/target names */); + assert!(found, "Missing canonical example for "); +} +``` + +Or add the rule to the existing exhaustive lookup test if one exists. + +## Step 5: Document in paper (Check 11 from #974) + +**Also MANDATORY.** The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template: + +```typst +#reduction-rule("Source", "Target", + example: true, + example-caption: [Description], +)[ + Rule statement from Typst proof... +][ + Proof from Typst proof (Construction + Correctness + Extraction)... +] +``` -Follow `/add-rule` Step 5 for the exact format. The heavy writing is already done; this step is reformatting. +The proof text, worked examples, and overhead table are already written in the verification Typst file — adapt them to the paper's macros. Follow `/add-rule` Step 5 for the exact format. ## Step 6: Regenerate exports and verify @@ -147,6 +190,21 @@ make regenerate-fixtures make test clippy ``` +## Step 7: Clean up verification artifacts + +**MANDATORY.** After the Rust implementation is complete and all tests pass, remove the verification artifacts from `docs/paper/verify-reductions/`. These artifacts served their purpose as a pre-verification gate and must NOT be committed into the library. 
+
+```bash
+git rm docs/paper/verify-reductions/verify_<source>_<target>.py
+git rm docs/paper/verify-reductions/adversary_<source>_<target>.py
+git rm docs/paper/verify-reductions/cross_compare_<source>_<target>.py
+git rm docs/paper/verify-reductions/test_vectors_<source>_<target>.json
+git rm docs/paper/verify-reductions/<source>_<target>.typ
+git rm docs/paper/verify-reductions/<source>_<target>.pdf
+```
+
+The Typst proof content lives on in the paper entry (`docs/paper/reductions.typ`). The Python scripts and test vectors were scaffolding — the Rust tests are the permanent verification.
+
 ## Solver Rules
 
 Same as `/add-rule`. If the target problem needs ILP, implement alongside.
@@ -173,3 +231,7 @@ All mistakes from `/add-rule` apply, plus:
 | Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field |
 | Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it |
 | Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite |
+| Missing canonical example in `rule_builders.rs` (Check 9 from #974) | MANDATORY — add builder function using the YES test vector |
+| Missing example-db lookup test (Check 10 from #974) | MANDATORY — add test in `example_db.rs` |
+| Missing paper `reduction-rule` entry (Check 11 from #974) | MANDATORY — integrate Typst proof into `reductions.typ` |
+| Leaving verification artifacts in `docs/paper/verify-reductions/` | MANDATORY cleanup — `git rm` all Python scripts, JSON, Typst, PDF from verify-reductions |
diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md
index d2db24139..21310f016 100644
--- a/.claude/skills/verify-reduction/SKILL.md
+++ b/.claude/skills/verify-reduction/SKILL.md
@@ -109,6 +109,41 @@ pred show <source> --json
 pred show <target> --json
 ```
 
+### Type compatibility gate — MANDATORY
+
+Before proceeding, check that the source and target Value types are compatible for a witness-preserving reduction (`ReduceTo` / `ReductionResult`):
+
+```bash
+# Check Value 
types in the model source files
+grep "type Value = " src/models/*/<source>.rs
+grep "type Value = " src/models/*/<target>.rs
+```
+
+**Compatible pairs:**
+- `Or` → `Or` (feasibility → feasibility) — always OK
+- `Min` → `Min` or `Max` → `Max` (same optimization sense) — OK if witness extraction preserves optimality
+- `Or` → `Max`/`Min` — OK for decision-to-optimization embeddings
+
+**Incompatible pairs that require a K parameter:**
+- `Min` or `Max` → `Or` (optimization → decision) — the G&J-style reductions need a threshold K that doesn't exist on the source type. These cannot be implemented as `ReduceTo`.
+
+If the types are incompatible, **STOP immediately** and comment on the issue:
+
+```bash
+gh issue comment "$ISSUE" --body "verify-reduction: **BLOCKED** — type mismatch.
+
+Source \`<Source>\` has \`Value = <SourceValue>\` (optimization) but target \`<Target>\` has \`Value = <TargetValue>\` (decision). The G&J reduction requires a threshold parameter K that doesn't exist on the source type. This reduction cannot be implemented as \`ReduceTo\` in the current type system.
+
+Options:
+1. Add a decision-variant model with a K parameter
+2. Use \`ReduceToAggregate\` for value-only reduction (no witness extraction)
+3. Choose a different source/target pair with compatible types"
+```
+
+Do NOT create a worktree, write proofs, or run verification for incompatible reductions.
+
+### Continue if compatible
+
 Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. If the issue is incomplete, use WebSearch to find the original reference.
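The compatibility rules above amount to a small decision table. As a sketch (the function name and the string encoding of the `Value` types are illustrative, not a repo API), the gate can be written as:

```python
# Hypothetical encoding of the type-compatibility gate.
# Value types are modeled as the strings "Or", "Min", "Max".

def is_compatible(source_value, target_value):
    """True iff a witness-preserving ReduceTo can exist without a threshold K."""
    if target_value == "Or":
        # optimization -> decision would need a threshold K the source lacks
        return source_value == "Or"
    if source_value == "Or":
        # decision -> optimization embedding is fine
        return True
    # otherwise only the same optimization sense is listed as compatible
    return source_value == target_value

assert is_compatible("Or", "Or")
assert is_compatible("Min", "Min") and is_compatible("Max", "Max")
assert is_compatible("Or", "Max") and is_compatible("Or", "Min")
assert not is_compatible("Min", "Or") and not is_compatible("Max", "Or")
```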
From deb05981047cf88b03eb69aae73818b6d93e872b Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 10:31:59 +0000 Subject: [PATCH 11/23] =?UTF-8?q?refactor:=20concise=20verify-reduction=20?= =?UTF-8?q?(761=E2=86=92124=20lines)=20+=20self-contained=20add-reduction?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit verify-reduction: removed verbose templates, condensed checklists into prose, kept all requirements but removed boilerplate code blocks that the agent can derive from context. add-reduction: integrated add-rule Steps 1-6, write-rule-in-paper Steps 1-6, and #974 requirements (Checks 9/10/11) into a single self-contained skill. No need to read 3 other skills. Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 263 ++++---- .claude/skills/verify-reduction/SKILL.md | 743 ++--------------------- 2 files changed, 163 insertions(+), 843 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index 2848e9df6..f47c0fb8e 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -5,13 +5,11 @@ description: Add a new reduction rule using verified artifacts from /verify-redu # Add Reduction (from Verified Artifacts) -Step-by-step guide for adding a new reduction rule (A → B) when `/verify-reduction` has already produced verified artifacts (Typst proof, Python scripts, test vectors JSON). This skill consumes those artifacts directly instead of re-deriving from the issue. +Complete pipeline for adding a reduction rule when `/verify-reduction` has produced verified artifacts. Translates Python `reduce()` to Rust, writes tests from test vectors, adds example-db entry, writes paper entry, then cleans up verification artifacts. -**When to use:** After `/verify-reduction` has produced a PR with verified artifacts for a reduction rule issue. Use `/add-rule` instead when no verification artifacts exist. 
+**When to use:** After `/verify-reduction` PR exists. Use `/add-rule` when no verification artifacts exist. -## Step 0: Locate Verified Artifacts - -Check for existing verification artifacts: +## Step 0: Locate and Read Verified Artifacts ```bash ls docs/paper/verify-reductions/verify__.py @@ -21,217 +19,176 @@ ls docs/paper/verify-reductions/_.typ If any are missing, run `/verify-reduction` first. -### Read the artifacts - -1. **Python `reduce()` function** — this is the verified spec for the Rust `reduce_to()` implementation. Read it carefully; translate the algorithm, not the syntax. -2. **Test vectors JSON** — contains YES/NO instances with exact input/output values, overhead expressions, and verified claims. -3. **Typst proof** — the Construction section describes the algorithm in mathematical notation. Use for doc comments. - -```bash -# Load test vectors -TEST_VECTORS=$(cat docs/paper/verify-reductions/test_vectors__.json) -``` - -Extract from test vectors JSON: -- `overhead` → use directly in `#[reduction(overhead = { ... })]` -- `yes_instance.input` / `yes_instance.output` → first closed-loop test case -- `no_instance.input` / `no_instance.output` → infeasible test case -- `claims` → verify each is preserved in the Rust implementation +**Read these three artifacts:** +1. **Python `reduce()` function** — the verified spec. Translate the algorithm to Rust, not the syntax. +2. **Test vectors JSON** — YES/NO instances with exact values, overhead expressions, verified claims. +3. **Typst proof** — Construction section for doc comments, proof for paper entry. 
-## Reference Implementations
+## Step 1: Implement the Reduction
 
-Same as `/add-rule`:
-- **Reduction rule:** `src/rules/minimumvertexcover_maximumindependentset.rs`
-- **Reduction tests:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs`
-- **Paper entry:** search `docs/paper/reductions.typ` for the `MinimumVertexCover` → `MaximumIndependentSet` entry
-- **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`)
+Create `src/rules/<source>_<target>.rs`. Follow the pattern in `src/rules/satisfiability_naesatisfiability.rs`.
 
-## Step 1: Implement the reduction
-
-Create `src/rules/<source>_<target>.rs`.
-
-**Translation guide:** Map the Python `reduce()` function to Rust:
+**Translation guide from Python to Rust:**
 
 | Python | Rust |
 |--------|------|
 | `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` |
 | `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec<usize>` |
-| `literal_to_element(lit, n)` | Private helper method |
-| Python list of ints | `Vec<usize>`, `Vec<u64>`, etc. (match problem type) |
+| Helper functions (e.g., `literal_to_element`) | Private functions in the same file |
+| Python list of ints | `Vec<usize>`, `Vec<u64>`, etc. (match the target problem's API) |
 
-**Overhead from test vectors JSON:** The `overhead` field maps directly to the `#[reduction]` macro:
+**Required structure:**
 
 ```rust
+/// Result struct — holds target + any state needed for extraction.
+#[derive(Debug, Clone)]
+pub struct ReductionXToY { target: TargetType, /* mapping state */ }
+
+impl ReductionResult for ReductionXToY {
+    type Source = SourceType;
+    type Target = TargetType;
+    fn target_problem(&self) -> &Self::Target { &self.target }
+    fn extract_solution(&self, target_solution: &[usize]) -> Vec<usize> {
+        // Translate the Python extract_assignment() logic
+    }
+}
+
 #[reduction(overhead = {
-    // Copy expressions verbatim from test_vectors JSON "overhead" field
+    // Copy verbatim from test_vectors JSON "overhead" field
     field_name = "expression",
 })]
+impl ReduceTo<TargetType> for SourceType {
+    type Result = ReductionXToY;
+    fn reduce_to(&self) -> Self::Result {
+        // Translate the Python reduce() logic
+    }
+}
+
+#[cfg(test)]
+#[path = "../unit_tests/rules/<source>_<target>.rs"]
+mod tests;
+```
 
 ## Step 2: Register in mod.rs
 
-Same as `/add-rule` Step 2. Add `mod <source>_<target>;` to `src/rules/mod.rs`.
+Add to `src/rules/mod.rs`:
+```rust
+pub(crate) mod <source>_<target>;
+```
 
-## Step 3: Write unit tests from test vectors
+## Step 3: Write Unit Tests from Test Vectors
 
 Create `src/unit_tests/rules/<source>_<target>.rs`.
-**Generate tests directly from test vectors JSON:** - -The YES instance becomes the primary closed-loop test: +**From test vectors JSON, generate at minimum:** -```rust -#[test] -fn test__to__closed_loop() { - // Construct source from test_vectors.yes_instance.input - let source = ::try_new(/* fields from JSON */).unwrap(); - - // Reduce - let reduction = ReduceTo::::reduce_to(&source); - - // Verify target matches test_vectors.yes_instance.output - let target = reduction.target_problem(); - assert_eq!(target.(), /* value from JSON output */); - - // Solve and extract - let solver = BruteForce; - for witness in solver.find_all_witnesses(target).unwrap() { - let extracted = reduction.extract_solution(&witness); - // Verify extracted solution is valid for source - let val = source.evaluate(&extracted); - assert!(val.0); // Or check objective value - } -} -``` +1. **Closed-loop test** (`test__to__closed_loop`) — construct source from `yes_instance.input`, reduce, verify target matches `yes_instance.output`, solve target with `BruteForce`, extract solutions, verify each is valid for source. -The NO instance becomes the infeasible test: +2. **Infeasible test** (`test__to__infeasible`) — construct source from `no_instance.input`, reduce, verify target is also infeasible (no witnesses). -```rust -#[test] -fn test__to__infeasible() { - // Construct source from test_vectors.no_instance.input - let source = ::try_new(/* fields from JSON */).unwrap(); - - // Reduce - let reduction = ReduceTo::::reduce_to(&source); - - // Verify target is also infeasible - let solver = BruteForce; - let witnesses = solver.find_all_witnesses(reduction.target_problem()); - assert!(witnesses.is_none() || witnesses.unwrap().is_empty()); -} -``` +3. **Structural test** — verify target dimensions match overhead formula, check well-formedness. -Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON. 
+Reference: `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` -## Step 4: Add canonical example to example_db (Check 9 from #974) +## Step 4: Add Canonical Example to example_db -**This step is MANDATORY — many past PRs skipped it.** Add a builder function in `src/example_db/rule_builders.rs` that constructs a small canonical instance for this reduction. +**MANDATORY (Check 9 from #974).** Add a builder in `src/example_db/rule_builders.rs` using the YES test vector instance. Register in `build_rule_examples()`. Follow existing patterns in the file. -Use the YES instance from the test vectors JSON as the canonical example. Follow existing patterns in `rule_builders.rs`: +## Step 4b: Add Example-DB Lookup Test -```rust -pub fn build__to__example() -> RuleExample { - // Construct the source instance from test vectors YES instance - let source = ::new(/* fields from test_vectors.yes_instance.input */); - let reduction = ReduceTo::::reduce_to(&source); - RuleExample { - source: Box::new(source), - target: Box::new(reduction.target_problem().clone()), - reduction_result: Box::new(reduction), - } -} -``` +**MANDATORY (Check 10 from #974).** Verify the new example is discoverable in `src/unit_tests/example_db.rs`. Add the rule to the existing exhaustive lookup test, or add a standalone test. -Register the builder in `build_rule_examples()` in the same file. +## Step 5: Write Paper Entry -## Step 4b: Add example-db lookup test (Check 10 from #974) +**MANDATORY (Check 11 from #974).** The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the paper's macros. -**Also MANDATORY.** Add a test in `src/unit_tests/example_db.rs` that verifies the canonical example can be loaded: +### 5a. 
Load example data -```rust -#[test] -fn test_example__to_() { - let examples = build_rule_examples(); - let found = examples.iter().any(|e| /* matches source/target names */); - assert!(found, "Missing canonical example for "); -} +```typst +#let src_tgt = load-example("Source", "Target") +#let src_tgt_sol = src_tgt.solutions.at(0) ``` -Or add the rule to the existing exhaustive lookup test if one exists. - -## Step 5: Document in paper (Check 11 from #974) - -**Also MANDATORY.** The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template: +### 5b. Write reduction-rule entry ```typst #reduction-rule("Source", "Target", example: true, - example-caption: [Description], + example-caption: [Description ($n = ...$, $|E| = ...$)], )[ - Rule statement from Typst proof... + // Theorem body: complexity + construction summary + overhead hint ][ - Proof from Typst proof (Construction + Correctness + Extraction)... + // Proof body: _Construction._ ... _Correctness._ ... _Solution extraction._ ... + // Adapt from the verified Typst proof — reformat, don't rewrite ] ``` -The proof text, worked examples, and overhead table are already written in the verification Typst file — adapt them to the paper's macros. Follow `/add-rule` Step 5 for the exact format. +### 5c. Write worked example (extra block) + +Step-by-step walkthrough with concrete numbers from JSON data. Must include: +- `pred-commands()` block at top (create/reduce/solve/evaluate) +- Source instance display (graph visualization if applicable) +- Construction walkthrough with intermediate values +- Solution verified end-to-end +- Witness multiplicity note + +**Gold-standard reference:** Search for `reduction-rule("KColoring", "QUBO"` in `reductions.typ`. -## Step 6: Regenerate exports and verify +### 5d. 
Register display name (if new problem in paper) -Same as `/add-rule` Step 6: +Add to `display-name` dictionary if the problem doesn't have an entry yet. + +### 5e. Build + +```bash +make paper # Must compile without errors or new completeness warnings +``` + +## Step 6: Regenerate Exports and Verify ```bash cargo run --example export_graph cargo run --example export_schemas -make regenerate-fixtures +make regenerate-fixtures # Slow — needs fixtures for paper example data make test clippy ``` -## Step 7: Clean up verification artifacts +## Step 7: Clean Up Verification Artifacts -**MANDATORY.** After the Rust implementation is complete and all tests pass, remove the verification artifacts from `docs/paper/verify-reductions/`. These artifacts served their purpose as a pre-verification gate and must NOT be committed into the library. +**MANDATORY.** Verification artifacts must NOT be committed into the library. Remove them after the Rust implementation is complete: ```bash -git rm docs/paper/verify-reductions/verify__.py -git rm docs/paper/verify-reductions/adversary__.py -git rm docs/paper/verify-reductions/cross_compare__.py -git rm docs/paper/verify-reductions/test_vectors__.json -git rm docs/paper/verify-reductions/_.typ -git rm docs/paper/verify-reductions/_.pdf +git rm -f docs/paper/verify-reductions/*** ``` -The Typst proof content lives on in the paper entry (`docs/paper/reductions.typ`). The Python scripts and test vectors were scaffolding — the Rust tests are the permanent verification. - -## Solver Rules +The Typst proof content lives on in the paper entry. The Python scripts were scaffolding — the Rust tests are the permanent verification. -Same as `/add-rule`. If the target problem needs ILP, implement alongside. +## Step 8: Commit and Create PR -## CLI Impact - -Same as `/add-rule`. No CLI changes needed for witness-preserving reductions. 
- -## File Naming - -Same as `/add-rule`: -- Rule file: `src/rules/_.rs` -- Test file: `src/unit_tests/rules/_.rs` -- Canonical example: builder function in `src/example_db/rule_builders.rs` +```bash +git add src/rules/_.rs \ + src/unit_tests/rules/_.rs \ + src/rules/mod.rs \ + src/example_db/rule_builders.rs \ + src/unit_tests/example_db.rs \ + docs/paper/reductions.typ +git commit -m "feat: add reduction (#) + +Implemented via /verify-reduction → /add-reduction pipeline. +Verification: N checks (constructor) + M checks (adversary), 0 failures." +git push -u origin "" +gh pr create --title "feat: add reduction (#)" --body "..." +``` ## Common Mistakes -All mistakes from `/add-rule` apply, plus: - | Mistake | Fix | |---------|-----| -| Re-deriving algorithm from issue instead of reading Python `reduce()` | The Python function is the verified spec — translate it, don't reinvent | -| Ignoring test vectors JSON | Use the YES/NO instances for Rust tests directly | -| Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field | -| Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it | -| Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite | -| Missing canonical example in `rule_builders.rs` (Check 9 from #974) | MANDATORY — add builder function using the YES test vector | -| Missing example-db lookup test (Check 10 from #974) | MANDATORY — add test in `example_db.rs` | -| Missing paper `reduction-rule` entry (Check 11 from #974) | MANDATORY — integrate Typst proof into `reductions.typ` | -| Leaving verification artifacts in `docs/paper/verify-reductions/` | MANDATORY cleanup — `git rm` all Python scripts, JSON, Typst, PDF from verify-reductions | +| Re-deriving algorithm from issue instead of Python `reduce()` | Python function is the verified spec — translate it | +| Overhead expressions don't match test vectors JSON | Copy 
verbatim from `overhead` field | +| Skipping infeasible (NO) test case | NO instance is in test vectors — always include | +| Missing canonical example in `rule_builders.rs` | MANDATORY — Check 9 from #974 | +| Missing example-db lookup test | MANDATORY — Check 10 from #974 | +| Missing paper `reduction-rule` entry | MANDATORY — Check 11 from #974 | +| Leaving verification artifacts in repo | MANDATORY cleanup — Step 7 | +| Not regenerating fixtures after example-db | `make regenerate-fixtures` required for paper | diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 21310f016..89b11ee2a 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -5,103 +5,32 @@ description: End-to-end verification of a reduction rule — generates Typst pro # Verify Reduction -End-to-end skill that takes a reduction rule issue, produces a verified mathematical proof with computational verification from two independent implementations, iterating until all checks pass. Creates a worktree, works in isolation, and submits a PR — following `issue-to-pr` conventions. +Pre-verification gate for reduction rules. Produces Typst proof + dual Python verification scripts + test vectors JSON, iterating until all checks pass. Creates worktree + PR. Downstream: `/add-reduction` consumes the artifacts to implement Rust code. -Outputs: Typst proof entry, constructor Python verification script, adversary Python verification script — all at PR #975 quality level. 
+``` +Issue → /verify-reduction (Typst + Python) → /add-reduction (Rust) → /review-pipeline +``` ## Invocation ``` -/verify-reduction 868 # from a GitHub issue number -/verify-reduction SubsetSum Partition # from source/target names +/verify-reduction 868 +/verify-reduction SubsetSum Partition ``` -## Prerequisites - -- `sympy`, `networkx`, and `hypothesis` installed (`pip install sympy networkx hypothesis`) -- Both source and target models must exist in the codebase (`pred show `) -- Optional: `elan` with Lean 4 toolchain (for formal lemmas — not required) - -## Process - -```dot -digraph verify { - rankdir=TB; - "Parse input" [shape=box]; - "Create worktree" [shape=box]; - "Read issue" [shape=box]; - "Study models" [shape=box]; - "Write Typst proof" [shape=box]; - "Compile PDF" [shape=box]; - "Write constructor script" [shape=box]; - "Run constructor" [shape=box]; - "All pass?" [shape=diamond]; - "Fix proof + script" [shape=box]; - "Enough checks?" [shape=diamond]; - "Enhance script" [shape=box]; - "Dispatch adversary" [shape=box]; - "Run adversary" [shape=box]; - "Adversary pass?" [shape=diamond]; - "Cross-compare" [shape=box]; - "Verdict table" [shape=diamond]; - "Investigate discrepancy" [shape=box]; - "Self-review" [shape=box]; - "Create PR" [shape=box]; - "Report" [shape=doublecircle]; - - "Parse input" -> "Create worktree"; - "Create worktree" -> "Read issue"; - "Read issue" -> "Study models"; - "Study models" -> "Write Typst proof"; - "Write Typst proof" -> "Compile PDF"; - "Compile PDF" -> "Write constructor script"; - "Write constructor script" -> "Run constructor"; - "Run constructor" -> "All pass?"; - "All pass?" -> "Enough checks?" [label="yes"]; - "All pass?" -> "Fix proof + script" [label="no"]; - "Fix proof + script" -> "Run constructor"; - "Enough checks?" -> "Dispatch adversary" [label="yes"]; - "Enough checks?" 
-> "Enhance script" [label="no"]; - "Enhance script" -> "Run constructor"; - "Dispatch adversary" -> "Run adversary"; - "Run adversary" -> "Adversary pass?"; - "Adversary pass?" -> "Cross-compare" [label="yes or no"]; - "Cross-compare" -> "Verdict table"; - "Verdict table" -> "Self-review" [label="verified"]; - "Verdict table" -> "Investigate discrepancy" [label="discrepancy"]; - "Investigate discrepancy" -> "Write constructor script"; - "Self-review" -> "Create PR"; - "Create PR" -> "Report"; -} -``` - ---- - ## Step 0: Parse Input and Create Worktree -### 0a. Parse input - ```bash REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner) ISSUE= ISSUE_JSON=$(gh issue view "$ISSUE" --json title,body,number) -``` - -### 0b. Create worktree - -```bash REPO_ROOT=$(pwd) -BRANCH_JSON=$(python3 scripts/pipeline_worktree.py prepare-issue-branch \ - --issue "$ISSUE" --slug "verify--" --base main --format json) -BRANCH=$(printf '%s\n' "$BRANCH_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['branch'])") WORKTREE_JSON=$(python3 scripts/pipeline_worktree.py enter --name "verify-$ISSUE" --format json) WORKTREE_DIR=$(printf '%s\n' "$WORKTREE_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['worktree_dir'])") -cd "$WORKTREE_DIR" && git checkout "$BRANCH" +cd "$WORKTREE_DIR" && git checkout -b "issue/$ISSUE-verify--" ``` -If already inside a worktree, skip creation and use the current branch. 
-
-## Step 1: Read Issue and Study Models
+## Step 1: Read Issue, Study Models, Type Check

```bash
gh issue view "$ISSUE" --json title,body
@@ -111,651 +40,85 @@
pred show <model> --json
```

### Type compatibility gate — MANDATORY

-Before proceeding, check that the source and target Value types are compatible for a witness-preserving reduction (`ReduceTo` / `ReductionResult`):
+Check source/target `Value` types before any work:

```bash
-# Check Value types in the model source files
-grep "type Value = " src/models/*/<source>.rs
-grep "type Value = " src/models/*/<target>.rs
+grep "type Value = " src/models/*/<source>.rs src/models/*/<target>.rs
```

-**Compatible pairs:**
-- `Or` → `Or` (feasibility → feasibility) — always OK
-- `Min` → `Min` or `Max` → `Max` (same optimization sense) — OK if witness extraction preserves optimality
-- `Or` → `Max`/`Min` — OK for decision-to-optimization embeddings
-
-**Incompatible pairs that require a K parameter:**
-- `Min` or `Max` → `Or` (optimization → decision) — the G&J-style reductions need a threshold K that doesn't exist on the source type. These cannot be implemented as `ReduceTo`.
-
-If the types are incompatible, **STOP immediately** and comment on the issue:
+**Compatible:** `Or`→`Or`, `Min`→`Min`, `Max`→`Max`, `Or`→`Min`/`Max`.
+**Incompatible:** `Min`/`Max`→`Or` (needs threshold K that the source type lacks).

-```bash
-gh issue comment "$ISSUE" --body "verify-reduction: **BLOCKED** — type mismatch.
-
-Source \`<source>\` has \`Value = <V1>\` (optimization) but target \`<target>\` has \`Value = <V2>\` (decision). The G&J reduction requires a threshold parameter K that doesn't exist on the source type. This reduction cannot be implemented as \`ReduceTo\` in the current type system.
-
-Options:
-1. Add a decision-variant model with a K parameter
-2. Use \`ReduceToAggregate\` for value-only reduction (no witness extraction)
-3.
Choose a different source/target pair with compatible types" -``` +If incompatible, STOP and comment on the issue explaining the type mismatch and options (add decision variant, use `ReduceToAggregate`, or choose different pair). Do NOT proceed. -Do NOT create a worktree, write proofs, or run verification for incompatible reductions. +### If compatible -### Continue if compatible - -Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. - -If the issue is incomplete, use WebSearch to find the original reference. +Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. Use WebSearch if the issue is incomplete. ## Step 2: Write Typst Proof -Append to `docs/paper/proposed-reductions.typ` (or create a standalone `proposed-reductions-.typ` if the main file is on another branch). - -### MANDATORY structure - -```typst -== Source $arrow.r$ Target - -#theorem[...] - -#proof[ - _Construction._ ... - _Correctness._ - ($arrow.r.double$) ... - ($arrow.l.double$) ... - _Solution extraction._ ... -] - -*Overhead.* (table) - -*Feasible example.* (YES instance, fully worked) - -*Infeasible example.* (NO instance, fully worked — show WHY no solution exists) -``` - -### HARD requirements - -- **Construction**: numbered steps, every symbol defined before first use -- **Correctness**: genuinely independent ⟹ and ⟸ — NOT "the converse is similar" -- **No hand-waving**: ZERO instances of "clearly", "obviously", "it is easy to see", "straightforward" -- **No scratch work**: ZERO instances of "Wait", "Hmm", "Actually", "Let me try" -- **TWO examples minimum**: one YES instance (satisfiable/feasible) and one NO instance (unsatisfiable/infeasible). Both fully worked with numerical verification. -- **Example must be non-trivial**: the example must have ≥ 3 variables/vertices. A 1-variable or 2-vertex example is too degenerate to catch bugs. 
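The hand-waving and scratch-work bans above are mechanically checkable before compiling. A minimal lint sketch, where the `BANNED` list and the case-sensitivity heuristic are illustrative assumptions rather than part of the skill:

```python
import re

# Phrases banned by the HARD requirements: hand-waving adverbs and
# scratch-work interjections must never appear in a proof entry.
BANNED = [
    "clearly", "obviously", "it is easy to see", "straightforward",
    "Wait", "Hmm", "Actually", "Let me try",
]

def lint_proof(typst_text: str) -> list[str]:
    """Return one violation string per banned phrase found in the Typst source."""
    violations = []
    for phrase in BANNED:
        # Case-sensitive for interjections ("Wait", "Hmm"), which are only
        # suspicious when capitalized; case-insensitive for hand-waving adverbs.
        flags = 0 if phrase[0].isupper() else re.IGNORECASE
        for m in re.finditer(re.escape(phrase), typst_text, flags):
            line_no = typst_text.count("\n", 0, m.start()) + 1
            violations.append(f"line {line_no}: banned phrase {phrase!r}")
    return violations
```

Running this over the `.typ` file before each compile makes the "zero instances" gate a one-line check rather than a manual read.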
-
-Compile:
-```bash
-python3 -c "import typst; typst.compile('<file>.typ', output='<file>.pdf', root='.')"
-```
-
-## Step 3: Write Constructor Python Verification Script
-
-Create `docs/paper/verify-reductions/verify_<source>_<target>.py`.
+Create standalone `docs/paper/verify-reductions/<source>_<target>.typ`.

-### ALL 7 sections are MANDATORY
+**Mandatory structure:** Construction (numbered steps, symbols defined before use) → Correctness (independent ⟹ and ⟸) → Solution extraction → Overhead table → YES example (≥3 variables, fully worked) → NO example (fully worked, explain WHY infeasible).

-```python
-#!/usr/bin/env python3
-"""§X.Y Source → Target (#NNN): exhaustive + structural verification."""
-import itertools, sys
-from sympy import symbols, simplify  # Section 1 is NOT optional
+**Hard rules:** Zero "clearly"/"obviously"/"straightforward". Zero scratch work. Two examples minimum with ≥3 variables.

-passed = failed = 0
+Compile: `python3 -c "import typst; typst.compile('<file>.typ', output='<file>.pdf', root='.')"`

-def check(condition, msg=""):
-    global passed, failed
-    if condition: passed += 1
-    else: failed += 1; print(f"  FAIL: {msg}")
+## Step 3: Write Constructor Python Script

-def main():
-    # === Section 1: Symbolic checks (sympy) — MANDATORY ===
-    # At minimum: verify overhead formula symbolically for general n.
-    # For algebraic reductions: verify key identities.
-    # "The overhead is trivial" is NOT an excuse to skip this section.
+Create `docs/paper/verify-reductions/verify_<source>_<target>.py` with ALL 7 mandatory sections:

-    # === Section 2: Exhaustive forward + backward — MANDATORY ===
-    # n ≤ 5 MINIMUM for all reduction types.
-    # n ≤ 6 for identity/algebraic reductions.
-    # Test ALL instances (or sample ≥ 300 per (n,m) if exhaustive is infeasible).
+1. **Symbolic checks** (sympy) — overhead formulas, key algebraic identities
+2. **Exhaustive forward + backward** — n ≤ 5 minimum, all instances or ≥300 sampled
+3. **Solution extraction** — extract source solution from every feasible target witness
+4.
**Overhead formula** — compare actual target size against formula +5. **Structural properties** — well-formedness, no degenerate cases, gadget structure +6. **YES example** — reproduce exact Typst numbers +7. **NO example** — reproduce exact Typst numbers, verify both sides infeasible - # === Section 3: Solution extraction — MANDATORY === - # For EVERY feasible instance: extract source solution from target, - # verify it satisfies the source problem. - # This is the most commonly skipped section. DO NOT SKIP IT. - - # === Section 4: Overhead formula — MANDATORY === - # Build the actual target instance, measure its size fields, - # compare against the overhead formula. - - # === Section 5: Structural properties — MANDATORY === - # Even for "trivial" reductions, verify at least: - # - Target instance is well-formed (valid graph, valid formula, etc.) - # - No degenerate cases (empty subsets, isolated vertices, etc.) - # For gadget reductions: girth, connectivity, widget structure. - - # === Section 6: YES example from Typst — MANDATORY === - # Reproduce the exact numbers from the Typst proof's feasible example. - - # === Section 7: NO example from Typst — MANDATORY === - # Reproduce the exact numbers from the Typst proof's infeasible example. - # Verify that both source and target are infeasible. 
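As a concrete illustration of Section 2, a forward + backward equivalence check for the SubsetSum → Partition rule (#973) could be sketched as follows. The two-element padding construction shown is the standard textbook one and is assumed here; it is not taken from the issue:

```python
from itertools import combinations

def reduce_subset_sum_to_partition(weights, target):
    """Textbook padding: add b1 = S + T and b2 = 2S - T, making the total 4S."""
    s = sum(weights)
    return list(weights) + [s + target, 2 * s - target]

def has_subset_sum(weights, target):
    # Brute-force oracle: fine for the tiny instances Section 2 enumerates.
    return any(sum(c) == target
               for r in range(len(weights) + 1)
               for c in combinations(weights, r))

def has_partition(items):
    total = sum(items)
    return total % 2 == 0 and has_subset_sum(items, total // 2)

# Section 2 style: forward + backward equivalence over small instances.
for weights in [(1, 2, 3), (2, 4, 5), (1, 1, 1)]:
    for target in range(1, sum(weights) + 1):
        src = has_subset_sum(weights, target)
        tgt = has_partition(reduce_subset_sum_to_partition(weights, target))
        assert src == tgt, (weights, target)
```

A real Section 2 would enumerate all weight vectors up to n ≤ 5 rather than the three hard-coded tuples above.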
- - print(f"Source → Target: {passed} passed, {failed} failed") - return 1 if failed else 0 - -if __name__ == "__main__": - sys.exit(main()) -``` - -### Minimum check counts — STRICTLY ENFORCED - -| Type | Minimum checks | Minimum n | Strategy | -|------|---------------|-----------|----------| -| Identity (same graph, different objective) | 10,000 | n ≤ 6 | Exhaustive ALL graphs | -| Algebraic (padding, complement, De Morgan) | 10,000 | n ≤ 5 | Symbolic + exhaustive | -| Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | Construction + formula + structural | -| Composition (A→B→C) | 10,000 | n ≤ 5 | Exhaustive per step | - -Every reduction gets at least 5,000 checks and n ≤ 5 exhaustive testing regardless of perceived simplicity. +**Minimum:** 5,000 checks regardless of reduction type. 10,000 for identity/algebraic. ## Step 4: Run and Iterate -```bash -python3 docs/paper/verify-reductions/verify__.py -``` - -### Iteration 1: First run - -Run the script. Fix any failures. Re-run. - -### Iteration 2: Check count audit — STRICT - -Print this table and fill it in honestly: - -``` -CHECK COUNT AUDIT: - Total checks: ___ (minimum: 5,000) - Forward direction: ___ instances tested (minimum: all n ≤ 5) - Backward direction: ___ instances tested (minimum: all n ≤ 5) - Solution extraction: ___ feasible instances tested - Overhead formula: ___ instances compared - Symbolic (sympy): ___ identities verified - YES example: verified? [yes/no] - NO example: verified? [yes/no] - Structural properties: ___ checks -``` - -If ANY line is below minimum, enhance the script and re-run. Do NOT proceed. - -### Iteration 3: Gap analysis — MANDATORY - -List EVERY claim in the Typst proof. For each, state whether it's tested: +Run the script. Fix failures. Re-run until 0 failures. 
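One way to keep the check-count audit honest is to tag every `check()` call with a category and derive the audit table from the counters. A sketch, with illustrative category names and floors:

```python
from collections import Counter

# Illustrative audit floors; real minimums come from the tables above.
MINIMUMS = {"total": 5000, "forward": 300, "backward": 300, "extraction": 100}

counts = Counter()

def check(condition, category, msg=""):
    """Count one check under a named audit category; report failures, don't abort."""
    counts[category] += 1
    counts["total"] += 1
    if not condition:
        counts["failed"] += 1
        print(f"  FAIL [{category}]: {msg}")

def audit():
    """Print the check-count audit and return True only if every floor is met."""
    print("CHECK COUNT AUDIT:")
    for cat, floor in MINIMUMS.items():
        mark = "OK" if counts[cat] >= floor else f"BELOW MINIMUM (need {floor})"
        print(f"  {cat}: {counts[cat]} {mark}")
    return counts["failed"] == 0 and all(counts[c] >= f for c, f in MINIMUMS.items())
```

With this in place, "fill the table honestly" becomes `assert audit()` at the end of `main()` instead of a manual transcription step.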
-``` -CLAIM TESTED BY -"Universe has 2n elements" Section 4: overhead formula ✓ -"Complementarity forces consistency" Section 3: extraction ✓ -"Clause subset is non-monochromatic" Section 2: forward direction ✓ -"No clause is all-true or all-false" Section 2: backward direction ✓ -... -``` - -If any claim has no test, add one. If it's untestable, document WHY. - -### Iteration 4: Export test vectors and validate Typst matching — MANDATORY - -After all checks pass and gap analysis is complete, the constructor script must export a test vectors JSON file for downstream consumption by `/add-reduction`: - -**File:** `docs/paper/verify-reductions/test_vectors__.json` - -```json -{ - "source": "", - "target": "", - "issue": "", - "yes_instance": { - "input": { "...problem-specific fields..." }, - "output": { "...reduced instance fields..." }, - "source_feasible": true, - "target_feasible": true, - "source_solution": ["...config..."], - "extracted_solution": ["...config..."] - }, - "no_instance": { - "input": { "...problem-specific fields..." }, - "output": { "...reduced instance fields..." }, - "source_feasible": false, - "target_feasible": false - }, - "overhead": { - "field_name": "expression using source getters" - }, - "claims": [ - {"tag": "claim_tag", "formula": "formula or description", "verified": true} - ] -} -``` +**Check count audit:** Print totals for each category. If any is below minimum, enhance and re-run. -Add this export at the end of `main()` in the constructor script: - -```python -import json - -test_vectors = { - "source": "", - "target": "", - "issue": , - "yes_instance": { ... }, # from Section 6 - "no_instance": { ... }, # from Section 7 - "overhead": { ... 
}, # from Section 1/4 - "claims": claims_list, # accumulated claim() calls -} - -with open("docs/paper/verify-reductions/test_vectors__.json", "w") as f: - json.dump(test_vectors, f, indent=2) -print(f"Test vectors exported to test_vectors__.json") -``` - -**Typst ↔ JSON cross-check:** - -After exporting, load both the test vectors JSON and the Typst file. For each key numerical value in the JSON (input sizes, target values, output sizes), check that it appears as a substring in the Typst YES/NO example sections. This is a substring search on the raw Typst text — not a full parser: - -```python -typst_text = open("").read() -for val in [str(v) for v in yes_instance["input"].values() if isinstance(v, (int, list))]: - assert str(val) in typst_text, f"Typst missing YES value: {val}" -for val in [str(v) for v in no_instance["input"].values() if isinstance(v, (int, list))]: - assert str(val) in typst_text, f"Typst missing NO value: {val}" -``` +**Gap analysis:** List every Typst claim and its corresponding test. Add tests for untested claims. -If any value is missing, the Typst proof and Python script are out of sync — fix before proceeding. - -**Structured claims (best-effort replacement for manual gap analysis):** - -Instead of the manual CLAIM/TESTED BY table, accumulate claims programmatically: - -```python -claims_list = [] - -def claim(tag, formula_or_desc, verified=True): - claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified}) -``` - -Call `claim()` throughout the constructor script wherever a Typst proof claim is verified. The self-review step (Step 6) checks that all claims have `verified: true` and that the claim count is reasonable (at least 5 claims for any non-trivial reduction). +**Export test vectors** (`test_vectors__.json`) with YES/NO instances, overhead expressions, and structured `claim()` tags. Cross-check that key values from the JSON appear in the Typst file text. 
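The Typst ↔ JSON cross-check above is a plain substring search, not a parser. A self-contained sketch that flattens list values before searching, so each element is matched individually (the function name is illustrative):

```python
import json

def missing_values(vectors_json: str, typst_text: str) -> list[str]:
    """List key input values from the test vectors JSON absent from the Typst text."""
    vectors = json.loads(vectors_json)
    missing = []
    for key in ("yes_instance", "no_instance"):
        for field, val in vectors[key]["input"].items():
            flat = val if isinstance(val, list) else [val]
            for v in flat:
                # Substring search on the raw Typst source, not a full parser.
                if str(v) not in typst_text:
                    missing.append(f"{key}.{field}: {v}")
    return missing
```

An empty return means the proof and the script agree on every key value; any entry means they are out of sync and must be fixed before proceeding.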
## Step 5: Adversary Verification -The adversary step provides independent verification by a second agent that implements the reduction from scratch, using only the Typst proof as specification. This catches bugs that the constructor's own tests cannot find (confirmation bias, shared implementation errors). - -### 5a. Dispatch adversary agent - -Launch a subagent with the following prompt template. The adversary must not see the constructor's Python script — it reads only the Typst proof. - -**Adversary prompt:** - -```` -You are an adversary verifier for a mathematical reduction proof. - -Your goal: independently implement and test the reduction described in the Typst -proof below, trying to find bugs. You have no access to the constructor's -implementation. Write your own from scratch based solely on the mathematical -specification. - -## Input - -Typst proof file: docs/paper/proposed-reductions.typ (section on ) -Issue: # - -## Reduction type - -Detect the reduction type from the Typst proof and tailor your testing focus: - -- **Identity reduction** (same graph/structure, different objective — keywords: "complement", "same graph", "negation"): Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration. Edge-case configs (all-zero, all-one, alternating) are highest priority. - -- **Algebraic reduction** (padding, case split, formula transformation — keywords: "padding", "case", "if Σ", "d ="): Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently. Include at least one hypothesis strategy targeting boundary values. - -- **Gadget reduction** (widget/component construction — keywords: "widget", "component", "gadget", "cover-testing"): Focus on widget structure invariants. Verify each traversal/usage pattern independently. Test that interior vertices/elements have no external connections. 
Check structural properties (connectivity, edge counts, degree sequences) across all small instances. - -## Your task - -1. Read the Typst proof carefully. Extract: - - The construction algorithm (how to build the target instance) - - The correctness argument (forward and backward directions) - - The solution extraction procedure - - The overhead formula - - The YES and NO examples - -2. Create `docs/paper/verify-reductions/adversary__.py` with: - - Your own `reduce()` function implementing the construction from scratch - - Your own `extract_solution()` function implementing solution extraction - - Your own `is_feasible_source()` and `is_feasible_target()` validators - - Exhaustive testing for n ≤ 5 (forward + backward + extraction) - - Overhead formula verification - - Reproduction of both Typst examples (YES and NO) - - Property-based testing using `hypothesis` (at least 2 strategies) - - Minimum 5,000 total checks - -3. Use `hypothesis` for property-based testing. Example strategies: - ```python - from hypothesis import given, settings - from hypothesis import strategies as st - - @given(st.lists(st.integers(0, 1), min_size=3, max_size=8)) - @settings(max_examples=500) - def test_roundtrip(assignment): - # Build source from assignment, reduce, check equivalence - ... - ``` - -4. Run the script. Report pass/fail counts and any bugs found. - -5. Do NOT read or import from `verify__.py`. Your implementation - must be independent. - -## Output format - -Print at the end: -``` -ADVERSARY: : N passed, M failed -BUGS FOUND: -``` -```` - -### 5b. Run adversary script - -```bash -python3 docs/paper/verify-reductions/adversary__.py -``` - -### 5c. Cross-comparison - -After both scripts have run, compare their `reduce()` outputs on a shared set of instances. 
Create and run this comparison inline: - -```python -#!/usr/bin/env python3 -"""Cross-compare constructor and adversary implementations.""" -import sys -sys.path.insert(0, "docs/paper/verify-reductions") - -from verify__ import reduce as constructor_reduce -from adversary__ import reduce as adversary_reduce - -# Also import feasibility checkers from both sides -from verify__ import is_feasible_source, is_feasible_target -from adversary__ import ( - is_feasible_source as adv_is_feasible_source, - is_feasible_target as adv_is_feasible_target, -) - -def normalize(target_instance): - """Convert target instance to a canonical hashable form for comparison. - Adapt this to the specific target problem type.""" - # For graph problems: sorted edge list + vertex count - # For formula problems: sorted clause list - # For set problems: frozenset of frozensets - return tuple(sorted(str(x) for x in target_instance)) - -agree = disagree = 0 -feasibility_mismatch = 0 - -# Generate shared test instances (adapt to source problem type) -import itertools -for n in range(3, 6): - for instance in generate_all_instances(n): # problem-specific generator - c_target = constructor_reduce(instance) - a_target = adversary_reduce(instance) - - # Compare structural equivalence - if normalize(c_target) == normalize(a_target): - agree += 1 - else: - disagree += 1 - print(f" DISAGREE on instance {instance}") - print(f" Constructor: {c_target}") - print(f" Adversary: {a_target}") - - # Compare feasibility verdicts - c_feas = is_feasible_target(c_target) - a_feas = adv_is_feasible_target(a_target) - if c_feas != a_feas: - feasibility_mismatch += 1 - print(f" FEASIBILITY MISMATCH on {instance}: " - f"constructor={c_feas}, adversary={a_feas}") - -print(f"\nCross-comparison: {agree} agree, {disagree} disagree, " - f"{feasibility_mismatch} feasibility mismatches") -if disagree > 0 or feasibility_mismatch > 0: - print("ACTION REQUIRED: investigate discrepancies before proceeding") - sys.exit(1) -``` - 
-Adapt the `normalize()` function and instance generator to the specific source/target problem types. - -### 5d. Verdict criteria - -Use this table to determine the outcome. All three signals must be considered together: - -| Constructor | Adversary | Cross-comparison | Verdict | Action | -|-------------|-----------|-----------------|---------|--------| -| Pass | Pass | Agree | Verified | Proceed to Step 6 | -| Pass | Pass | Disagree | Suspect | Both produce valid but structurally different targets. Investigate whether both are correct reductions (e.g., isomorphic gadgets) or one has a latent bug. Resolve before proceeding. | -| Pass | Fail | Agree | Adversary bug | Review adversary script for implementation errors. If adversary misread the Typst spec, fix spec clarity and re-run adversary. | -| Pass | Fail | Disagree | Suspect | Constructor may have a bug masked by its own tests. Investigate the disagreeing instances manually. | -| Fail | Pass | Agree | Constructor bug | Fix constructor script, re-run from Step 4. | -| Fail | Pass | Disagree | Suspect | Investigate both — the agreeing instances may be coincidental. | -| Fail | Fail | Agree | Proof bug | The reduction itself is likely wrong. Re-examine the Typst proof. Return to Step 2. | -| Fail | Fail | Disagree | Proof bug | Same as above — shared failures with structural disagreement indicate a fundamental problem. | - -When the verdict is "Suspect," the resolution is to manually inspect the disagreeing instances until the root cause is identified, then loop back to the appropriate step. - -## Step 6: Self-Review - -Before declaring verified, run through this checklist. Every item must be YES. If any is NO, go back and fix it. 
- -### Typst proof - -- [ ] Compiles without errors -- [ ] Has Construction with numbered steps -- [ ] Has Correctness with independent ⟹ and ⟸ paragraphs -- [ ] Has Solution extraction -- [ ] Has Overhead table with formula -- [ ] Has YES example (feasible, ≥ 3 variables/vertices, fully worked) -- [ ] Has NO example (infeasible, fully worked with explanation of WHY infeasible) -- [ ] Zero instances of "clearly", "obviously", "it is easy to see" -- [ ] Zero instances of "Wait", "Hmm", "Actually", scratch work -- [ ] Every symbol defined before first use - -### Constructor Python script - -- [ ] 0 failures -- [ ] ≥ 5,000 total checks -- [ ] Section 1 (symbolic) present and non-empty -- [ ] Section 2 (exhaustive) covers n ≤ 5 minimum -- [ ] Section 3 (extraction) tests EVERY feasible instance -- [ ] Section 4 (overhead) compares formula vs actual for all tested instances -- [ ] Section 5 (structural) has at least one non-trivial check -- [ ] Section 6 (YES example) reproduces Typst example numbers exactly -- [ ] Section 7 (NO example) reproduces Typst infeasible example exactly -- [ ] Gap analysis performed — every Typst claim has a corresponding test - -### Adversary Python script - -- [ ] 0 failures -- [ ] ≥ 5,000 total checks -- [ ] Implemented independently (no imports from constructor script) -- [ ] Uses `hypothesis` property-based testing (at least 2 strategies) -- [ ] Exhaustive testing for n ≤ 5 -- [ ] Reproduces both Typst examples - -### Cross-consistency - -- [ ] Cross-comparison script ran with 0 disagreements -- [ ] Cross-comparison script ran with 0 feasibility mismatches -- [ ] The constructor script's `reduce()` implements the Typst construction -- [ ] The adversary script's `reduce()` implements the Typst construction independently -- [ ] The overhead formula in both scripts matches the Typst overhead table -- [ ] The examples in both scripts match the Typst examples (same numbers, same instances) - -### Test vectors and auto-matching +Dispatch a 
subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. Requirements: own `reduce()`, own `extract_solution()`, `hypothesis` PBT (≥2 strategies), ≥5,000 checks, reproduce both Typst examples.
+
+**Typed adversary prompt:** Include reduction-type focus instructions:
+- **Identity:** exhaustive enumeration n ≤ 6, edge-case configs
+- **Algebraic:** case boundary conditions (e.g., Σ = 2T ± 1), per-case extraction
+- **Gadget:** widget structure invariants, traversal patterns, interior isolation
+
+After adversary runs, **cross-compare** constructor and adversary `reduce()` outputs on shared instances. Use the verdict table:
+
+| Constructor | Adversary | Cross-compare | Verdict |
+|-------------|-----------|---------------|---------|
+| Pass | Pass | Agree | **Verified** → proceed |
+| Pass | Pass | Disagree | **Suspect** → investigate structural differences |
+| One fails | — | — | Fix the failing script and re-run from that step |
+| Both fail | — | — | **Proof bug** → return to Step 2 |
+
+## Step 6: Self-Review Checklist
+
+Every item must be YES: Typst compiles with all sections, zero hand-waving, constructor ≥5K checks with 0 failures across all 7 sections, adversary ≥5K checks with 0 failures + hypothesis, cross-comparison 0 disagreements, test vectors JSON exported with Typst auto-matching verified.
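The verdict table reduces to a small total function over the three boolean signals, which keeps the dispatch unambiguous. A sketch, with illustrative verdict names:

```python
def verdict(constructor_pass: bool, adversary_pass: bool, agree: bool) -> str:
    """Map the three verification signals onto a verdict label."""
    if constructor_pass and adversary_pass:
        return "verified" if agree else "suspect"
    if not constructor_pass and not adversary_pass:
        # Two independent implementations both reject the reduction.
        return "proof-bug"
    # Exactly one side fails: agreement points at the failing implementation.
    if agree:
        return "adversary-bug" if constructor_pass else "constructor-bug"
    return "suspect"
```

Any "suspect" outcome still requires manual inspection of the disagreeing instances before looping back.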
-``` -=== Verification Report: Source → Target (#NNN) === - -Typst proof: §X.Y - - Construction: ✓ (N steps) - - Correctness: ✓ (⟹ + ⟸) - - Extraction: ✓ - - Overhead: ✓ - - YES example: ✓ (N vars/vertices) - - NO example: ✓ (N vars/vertices, reason: ...) - -Constructor: verify__.py - - Checks: N passed, 0 failed - - Sections: 1(sympy) 2(exhaustive) 3(extraction) 4(overhead) 5(structural) 6(YES) 7(NO) - - Forward: exhaustive n ≤ K - - Backward: exhaustive n ≤ K - - Gap analysis: all claims covered - -Adversary: adversary__.py - - Checks: N passed, 0 failed - - Property-based: M hypothesis examples - - Forward: exhaustive n ≤ K - - Backward: exhaustive n ≤ K - - Bugs found: - -Cross-comparison: - - Instances compared: N - - Structural agreement: N/N - - Feasibility agreement: N/N - -Lean: .lean (optional) - - Non-trivial lemmas: N - - Trivial lemmas: M - - Sorry: J (with justification) - -Bugs found: -Iterations: N rounds - -Verdict: VERIFIED / OPEN (with reason) -``` - -## Step 8: Commit, Create PR, Clean Up - -### 8a. Commit +## Step 7: Commit, Create PR, Clean Up ```bash -git add docs/paper/.typ \ - docs/paper/verify-reductions/verify_*.py \ - docs/paper/verify-reductions/adversary_*.py \ - docs/paper/verify-reductions/test_vectors_*.json -git add -f docs/paper/.pdf -git commit -m "docs: /verify-reduction # VERIFIED - -Typst: Construction + Correctness + Extraction + Overhead + YES/NO examples -Constructor: N checks, 0 failures (exhaustive n ≤ K, 7 sections) -Adversary: M checks, 0 failures (independent impl + hypothesis) -Cross-comparison: all instances agree - -Co-Authored-By: Claude Opus 4.6 (1M context) " -``` - -### 8b. Push and create PR - -```bash -git push -u origin "$BRANCH" +git add docs/paper/verify-reductions/*** +git add -f docs/paper/verify-reductions/_.pdf +git commit -m "docs: /verify-reduction # VERIFIED" +git push -u origin "" gh pr create --title "docs: verify reduction #" --body "..." -``` - -### 8c. 
Clean up worktree - -```bash -cd "$REPO_ROOT" -python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" -``` - -### 8d. Comment on issue - -```bash gh issue comment "$ISSUE" --body "verify-reduction report: VERIFIED (PR #)..." +cd "$REPO_ROOT" && python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" ``` - -## Quality Gates - -A reduction is **VERIFIED** when ALL of these hold: - -- [ ] Typst compiles, has all mandatory sections including YES and NO examples -- [ ] Zero hand-waving language -- [ ] Constructor Python has 0 failures and ≥ 5,000 checks -- [ ] All 7 constructor Python sections present and non-empty -- [ ] Adversary Python has 0 failures and ≥ 5,000 checks -- [ ] Adversary Python uses `hypothesis` property-based testing -- [ ] Adversary implementation is independent (no shared code with constructor) -- [ ] Exhaustive n ≤ 5 minimum (both scripts) -- [ ] Solution extraction verified for all feasible instances -- [ ] Overhead formula matches actual construction -- [ ] Both Typst examples reproduced by both scripts -- [ ] Gap analysis shows all Typst claims tested -- [ ] Cross-comparison shows 0 disagreements and 0 feasibility mismatches -- [ ] Cross-consistency between Typst, constructor, and adversary verified -- [ ] Test vectors JSON exported with YES/NO instances, overhead, and claims -- [ ] Typst ↔ JSON auto-matching passed (key values present in Typst text) - -If any gate fails, go back and fix it before declaring the reduction verified. 
- -## Common Mistakes - -| Mistake | Consequence | -|---------|-------------| -| Adversary imports from constructor script | Rejected — must be independent | -| No property-based testing in adversary | Rejected — add hypothesis strategies | -| No symbolic checks (Section 1 empty) | Rejected — add sympy verification | -| Only YES example, no NO example | Rejected — add infeasible instance | -| n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | -| "Passed on first run" without gap analysis | Rejected — perform gap analysis | -| Example has < 3 variables | Rejected — too degenerate | -| Either script has < 5,000 checks | Rejected — enhance exhaustive testing | -| Extraction not tested | Rejected — add Section 3 | -| Typst proof says "clearly" | Rejected — rewrite without hand-waving | -| Cross-comparison skipped | Rejected — run comparison script | -| Disagreements dismissed without investigation | Rejected — resolve every discrepancy | - -## Integration - -### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline - -`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` function is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. 
- -- **Before `/add-reduction`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON -- **During `/add-reduction`**: reads Python `reduce()` as pseudocode, overhead from test vectors JSON, generates Rust tests from test vectors -- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances; compositional check via alternative paths if available - -### Standalone usage - -- **After `write-rule-in-paper`**: invoke to verify paper entry -- **During `review-structural`**: check verification script exists and passes -- **Before `issue-to-pr --execute`**: pre-validate the algorithm - -## Reference: PR #975 Quality Level - -Target quality defined by PR #975: -- 800,000+ total checks, 0 unexpected failures -- 3 bugs caught through iteration loop -- Every script has forward + backward + extraction + overhead + example -- Two independent implementations agreeing on all test instances -- Failures marked OPEN honestly with diagnosis From 26d1ca2ea5b958eb74f55201b805821593f73f89 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 10:55:13 +0000 Subject: [PATCH 12/23] =?UTF-8?q?fix:=20restore=20structural=20requirement?= =?UTF-8?q?s=20in=20verify-reduction=20(124=E2=86=92274=20lines)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous rewrite over-condensed the skill, removing gates that agents need to follow: 7-section descriptions with table, minimum check count table, check count audit template, gap analysis format, common mistakes table, and self-review checklist with checkboxes. Restored: all structural gates and requirements. Kept concise: no verbose Python/Typst code templates (agent derives these). 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 204 ++++++++++++++++++++--- 1 file changed, 177 insertions(+), 27 deletions(-) diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 89b11ee2a..9bb30a03f 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -59,9 +59,28 @@ Extract: construction algorithm, correctness argument, overhead formulas, worked Create standalone `docs/paper/verify-reductions/_.typ`. -**Mandatory structure:** Construction (numbered steps, symbols defined before use) → Correctness (independent ⟹ and ⟸) → Solution extraction → Overhead table → YES example (≥3 variables, fully worked) → NO example (fully worked, explain WHY infeasible). +**Mandatory structure:** + +```typst +== Source $arrow.r$ Target +#theorem[...] +#proof[ + _Construction._ (numbered steps, every symbol defined before first use) + _Correctness._ + ($arrow.r.double$) ... (genuinely independent, NOT "the converse is similar") + ($arrow.l.double$) ... + _Solution extraction._ ... +] +*Overhead.* (table with target metric → formula) +*Feasible example.* (YES instance, ≥3 variables, fully worked with numbers) +*Infeasible example.* (NO instance, fully worked — show WHY no solution exists) +``` -**Hard rules:** Zero "clearly"/"obviously"/"straightforward". Zero scratch work. Two examples minimum with ≥3 variables. +**Hard rules:** +- Zero instances of "clearly", "obviously", "it is easy to see", "straightforward" +- Zero scratch work ("Wait", "Hmm", "Actually", "Let me try") +- Two examples minimum, both with ≥3 variables/vertices +- Every symbol defined before first use Compile: `python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')"` @@ -69,47 +88,166 @@ Compile: `python3 -c "import typst; typst.compile('.typ', output='.p Create `docs/paper/verify-reductions/verify__.py` with ALL 7 mandatory sections: -1. 
**Symbolic checks** (sympy) — overhead formulas, key algebraic identities -2. **Exhaustive forward + backward** — n ≤ 5 minimum, all instances or ≥300 sampled -3. **Solution extraction** — extract source solution from every feasible target witness -4. **Overhead formula** — compare actual target size against formula -5. **Structural properties** — well-formedness, no degenerate cases, gadget structure -6. **YES example** — reproduce exact Typst numbers -7. **NO example** — reproduce exact Typst numbers, verify both sides infeasible +| Section | What to verify | Notes | +|---------|---------------|-------| +| 1. Symbolic (sympy) | Overhead formulas symbolically for general n | "The overhead is trivial" is NOT an excuse to skip | +| 2. Exhaustive forward+backward | Source feasible ⟺ target feasible | n ≤ 5 minimum. ALL instances or ≥300 sampled per (n,m) | +| 3. Solution extraction | Extract source solution from every feasible target witness | Most commonly skipped section. DO NOT SKIP | +| 4. Overhead formula | Build target, measure actual size, compare against formula | Catches off-by-one in construction | +| 5. Structural properties | Target well-formed, no degenerate cases | Gadget reductions: girth, connectivity, widget structure | +| 6. YES example | Reproduce exact Typst feasible example numbers | Every value must match | +| 7. NO example | Reproduce exact Typst infeasible example, verify both sides infeasible | Must verify WHY infeasible | + +### Minimum check counts — STRICTLY ENFORCED -**Minimum:** 5,000 checks regardless of reduction type. 10,000 for identity/algebraic. +| Type | Minimum checks | Minimum n | +|------|---------------|-----------| +| Identity (same graph, different objective) | 10,000 | n ≤ 6 | +| Algebraic (padding, complement, case split) | 10,000 | n ≤ 5 | +| Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | + +Every reduction gets at least 5,000 checks regardless of perceived simplicity. ## Step 4: Run and Iterate -Run the script. 
Fix failures. Re-run until 0 failures. +```bash +python3 docs/paper/verify-reductions/verify__.py +``` + +### Iteration 1: Fix failures -**Check count audit:** Print totals for each category. If any is below minimum, enhance and re-run. +Run the script. Fix any failures. Re-run until 0 failures. -**Gap analysis:** List every Typst claim and its corresponding test. Add tests for untested claims. +### Iteration 2: Check count audit -**Export test vectors** (`test_vectors__.json`) with YES/NO instances, overhead expressions, and structured `claim()` tags. Cross-check that key values from the JSON appear in the Typst file text. +Print and fill this table honestly: + +``` +CHECK COUNT AUDIT: + Total checks: ___ (minimum: 5,000) + Forward direction: ___ instances (minimum: all n ≤ 5) + Backward direction: ___ instances (minimum: all n ≤ 5) + Solution extraction: ___ feasible instances tested + Overhead formula: ___ instances compared + Symbolic (sympy): ___ identities verified + YES example: verified? [yes/no] + NO example: verified? [yes/no] + Structural properties: ___ checks +``` + +If ANY line is below minimum, enhance the script and re-run. Do NOT proceed. + +### Iteration 3: Gap analysis + +List EVERY claim in the Typst proof and whether it's tested: + +``` +CLAIM TESTED BY +"Universe has 2n elements" Section 4: overhead ✓ +"Complementarity forces consistency" Section 3: extraction ✓ +"Forward: NAE-sat → valid splitting" Section 2: exhaustive ✓ +... +``` + +If any claim has no test, add one. If untestable, document WHY. + +### Iteration 4: Export test vectors and validate Typst matching + +Export `docs/paper/verify-reductions/test_vectors__.json` with: + +```json +{ + "source": "", "target": "", "issue": , + "yes_instance": { "input": {...}, "output": {...}, "source_feasible": true, "target_feasible": true, "source_solution": [...], "extracted_solution": [...] 
}, + "no_instance": { "input": {...}, "output": {...}, "source_feasible": false, "target_feasible": false }, + "overhead": { "field": "expression" }, + "claims": [ {"tag": "...", "formula": "...", "verified": true} ] +} +``` + +**Typst ↔ JSON cross-check:** Verify key numerical values from the JSON appear in the Typst example sections (substring search). If any value is missing, the proof and script are out of sync — fix before proceeding. ## Step 5: Adversary Verification -Dispatch a subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. Requirements: own `reduce()`, own `extract_solution()`, `hypothesis` PBT (≥2 strategies), ≥5,000 checks, reproduce both Typst examples. +Dispatch a subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. + +**Adversary requirements:** +- Own `reduce()` function from scratch +- Own `extract_solution()` function +- Own `is_feasible_source()` and `is_feasible_target()` validators +- Exhaustive forward + backward for n ≤ 5 +- `hypothesis` property-based testing (≥2 strategies) +- Reproduce both Typst examples (YES and NO) +- ≥5,000 total checks +- Must NOT import from the constructor script -**Typed adversary prompt:** Include reduction-type focus instructions: -- **Identity:** exhaustive enumeration n ≤ 6, edge-case configs -- **Algebraic:** case boundary conditions (e.g., Σ = 2T ± 1), per-case extraction -- **Gadget:** widget structure invariants, traversal patterns, interior isolation +**Typed adversary focus** (include in prompt): +- **Identity reductions:** exhaustive enumeration n ≤ 6, edge-case configs (all-zero, all-one, alternating) +- **Algebraic reductions:** case boundary conditions (e.g., Σ = 2T exactly, Σ = 2T ± 1), per-case extraction +- **Gadget reductions:** widget structure invariants, traversal patterns, interior vertex isolation -After adversary runs, **cross-compare** constructor and 
adversary `reduce()` outputs on shared instances. Use the verdict table: +### Cross-comparison -| Constructor | Adversary | Cross-compare | Verdict | -|-------------|-----------|---------------|---------| -| Pass | Pass | Agree | **Verified** → proceed | -| Pass | Pass | Disagree | **Suspect** → investigate structural differences | -| Any fail | — | — | Fix and re-run from the failing step | -| Both fail | — | — | **Proof bug** → return to Step 2 | +After both scripts pass, compare `reduce()` outputs on shared instances. Both must produce structurally identical targets and agree on feasibility for all tested instances. + +### Verdict table + +| Constructor | Adversary | Cross-compare | Verdict | Action | +|-------------|-----------|---------------|---------|--------| +| Pass | Pass | Agree | **Verified** | Proceed to Step 6 | +| Pass | Pass | Disagree | **Suspect** | Investigate — may be isomorphic or latent bug | +| Pass | Fail | — | **Adversary bug** | Fix adversary or clarify Typst spec | +| Fail | Pass | — | **Constructor bug** | Fix constructor, re-run from Step 4 | +| Fail | Fail | — | **Proof bug** | Re-examine Typst proof, return to Step 2 | ## Step 6: Self-Review Checklist -Every item must be YES: Typst compiles with all sections, zero hand-waving, constructor ≥5K checks with 0 failures across all 7 sections, adversary ≥5K checks with 0 failures + hypothesis, cross-comparison 0 disagreements, test vectors JSON exported with Typst auto-matching verified. +Every item must be YES. If any is NO, go back and fix. 
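The "zero hand-waving" and "zero scratch work" items lend themselves to a mechanical pre-screen before the manual pass. A minimal sketch in Python (the pipeline's verification-script language) — the helper itself is hypothetical and not part of the skill, and the exact banned-phrase list is an assumption extended from Step 2's hard rules:

```python
import re

# Banned phrases from Step 2's hard rules: hand-waving words and scratch work.
# The exact regex list is an assumption — extend it to match the skill's rules.
BANNED_PATTERNS = [
    r"\bclearly\b", r"\bobviously\b", r"it is easy to see", r"\bstraightforward\b",
    r"\bwait\b", r"\bhmm\b", r"\blet me try\b",
]

def lint_proof(text: str) -> list[str]:
    """Return every banned phrase found in the proof source, in pattern order."""
    lowered = text.lower()
    hits: list[str] = []
    for pattern in BANNED_PATTERNS:
        hits.extend(re.findall(pattern, lowered))
    return hits

# A fragment that should fail the gate:
fragment = "The converse is clearly symmetric; it is easy to see that the claim holds."
print(lint_proof(fragment))  # → ['clearly', 'it is easy to see']
```

This only catches the language items; the structural items (sections present, examples fully worked) still require the manual checklist.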
+ +### Typst proof +- [ ] Compiles without errors +- [ ] Construction with numbered steps, symbols defined before use +- [ ] Correctness with independent ⟹ and ⟸ paragraphs +- [ ] Solution extraction section present +- [ ] Overhead table with formulas +- [ ] YES example (≥3 variables, fully worked) +- [ ] NO example (fully worked, explains WHY infeasible) +- [ ] Zero hand-waving language +- [ ] Zero scratch work + +### Constructor Python +- [ ] 0 failures, ≥5,000 total checks +- [ ] All 7 sections present and non-empty +- [ ] Exhaustive n ≤ 5 +- [ ] Extraction tested for every feasible instance +- [ ] Gap analysis: every Typst claim has a test + +### Adversary Python +- [ ] 0 failures, ≥5,000 total checks +- [ ] Independent implementation (no imports from constructor) +- [ ] `hypothesis` PBT with ≥2 strategies +- [ ] Reproduces both Typst examples + +### Cross-consistency +- [ ] Cross-comparison: 0 disagreements, 0 feasibility mismatches +- [ ] Test vectors JSON exported with Typst auto-matching verified +- [ ] All claims have `verified: true` + +## Common Mistakes + +| Mistake | Consequence | +|---------|-------------| +| Adversary imports from constructor script | Rejected — must be independent | +| No `hypothesis` PBT in adversary | Rejected | +| Section 1 (symbolic) empty | Rejected — "overhead is trivial" is not an excuse | +| Only YES example, no NO example | Rejected | +| n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | +| No gap analysis | Rejected — perform before proceeding | +| Example has < 3 variables | Rejected — too degenerate | +| Either script has < 5,000 checks | Rejected — enhance testing | +| Extraction (Section 3) not tested | Rejected — most commonly skipped | +| Cross-comparison skipped | Rejected | +| Disagreements dismissed without investigation | Rejected | ## Step 7: Commit, Create PR, Clean Up @@ -122,3 +260,15 @@ gh pr create --title "docs: verify reduction #" gh issue comment "$ISSUE" --body "verify-reduction report: 
VERIFIED (PR #)..." cd "$REPO_ROOT" && python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" ``` + +## Integration + +### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline + +`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. + +### Standalone usage + +- After `write-rule-in-paper`: invoke to verify paper entry +- During `review-structural`: check verification script exists and passes +- Before `issue-to-pr --execute`: pre-validate the algorithm From 1e2b375201468cfbd42d59fb664f01071b155fe4 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 12:40:07 +0000 Subject: [PATCH 13/23] fix: harden add-reduction with file-level verification gates Steps 4, 4b, 5 now have HARD GATE labels with verification commands that check the SPECIFIC required files appear in `git diff --name-only`. Step 8 has a pre-commit gate that lists all 6 required files and blocks commit if any is missing. Root cause: subagents skipped Steps 4 (put example in rule file instead of rule_builders.rs) and 5 (skipped paper entry entirely) because the skill said "MANDATORY" but had no mechanical enforcement. Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 63 +++++++++++++++++++++++---- 1 file changed, 54 insertions(+), 9 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index f47c0fb8e..6986157b3 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -92,15 +92,29 @@ Reference: `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` ## Step 4: Add Canonical Example to example_db -**MANDATORY (Check 9 from #974).** Add a builder in `src/example_db/rule_builders.rs` using the YES test vector instance. Register in `build_rule_examples()`. 
Follow existing patterns in the file. +**HARD GATE (Check 9 from #974).** You MUST modify `src/example_db/rule_builders.rs` — not the rule file, not anywhere else. Read the file first, follow existing patterns, add a builder function using the YES test vector instance, register in `build_rule_examples()`. + +**Verification — run this and confirm the file was modified:** +```bash +git diff --name-only | grep "rule_builders.rs" +# MUST show: src/example_db/rule_builders.rs +# If it does NOT appear, you skipped this step. Go back and do it. +``` ## Step 4b: Add Example-DB Lookup Test -**MANDATORY (Check 10 from #974).** Verify the new example is discoverable in `src/unit_tests/example_db.rs`. Add the rule to the existing exhaustive lookup test, or add a standalone test. +**HARD GATE (Check 10 from #974).** You MUST modify `src/unit_tests/example_db.rs`. Read the file first, add the new rule to the existing exhaustive lookup test or add a standalone test. + +**Verification:** +```bash +git diff --name-only | grep "example_db.rs" +# MUST show: src/unit_tests/example_db.rs +# If it does NOT appear, you skipped this step. Go back and do it. +``` ## Step 5: Write Paper Entry -**MANDATORY (Check 11 from #974).** The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the paper's macros. +**HARD GATE (Check 11 from #974).** You MUST modify `docs/paper/reductions.typ`. The Typst proof already exists from `/verify-reduction` — reformat it into the paper's macros. Do NOT skip this step. ### 5a. Load example data @@ -138,12 +152,19 @@ Step-by-step walkthrough with concrete numbers from JSON data. Must include: Add to `display-name` dictionary if the problem doesn't have an entry yet. -### 5e. Build +### 5e. 
Build and verify ```bash make paper # Must compile without errors or new completeness warnings ``` +**Verification — confirm the paper file was modified:** +```bash +git diff --name-only | grep "reductions.typ" +# MUST show: docs/paper/reductions.typ +# If it does NOT appear, you skipped Step 5. Go back and do it. +``` + ## Step 6: Regenerate Exports and Verify ```bash @@ -163,7 +184,34 @@ git rm -f docs/paper/verify-reductions/*** The Typst proof content lives on in the paper entry. The Python scripts were scaffolding — the Rust tests are the permanent verification. -## Step 8: Commit and Create PR +## Step 8: Pre-Commit Gate and Create PR + +**Before committing, run this checklist. ALL must pass:** + +```bash +# Gate 1: Required files modified +echo "=== Pre-commit file gate ===" +for f in \ + "src/rules/_.rs" \ + "src/unit_tests/rules/_.rs" \ + "src/rules/mod.rs" \ + "src/example_db/rule_builders.rs" \ + "src/unit_tests/example_db.rs" \ + "docs/paper/reductions.typ"; do + git diff --name-only HEAD | grep -q "$(basename $f)" && echo " ✓ $f" || echo " ✗ MISSING: $f" +done + +# Gate 2: No verification artifacts remaining +ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" + +# Gate 3: Tests pass +cargo test 2>&1 | tail -3 + +# Gate 4: Paper compiles +make paper 2>&1 | tail -3 +``` + +**If ANY file shows ✗ MISSING, STOP and go back to the skipped step.** Do NOT commit with missing files. ```bash git add src/rules/_.rs \ @@ -172,10 +220,7 @@ git add src/rules/_.rs \ src/example_db/rule_builders.rs \ src/unit_tests/example_db.rs \ docs/paper/reductions.typ -git commit -m "feat: add reduction (#) - -Implemented via /verify-reduction → /add-reduction pipeline. -Verification: N checks (constructor) + M checks (adversary), 0 failures." +git commit -m "feat: add reduction (#)" git push -u origin "" gh pr create --title "feat: add reduction (#)" --body "..." 
``` From f5fa88771e4b4e5d3656178d58187b03531fc2b7 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 14:02:25 +0000 Subject: [PATCH 14/23] fix: add CI-equivalent checks to add-reduction pre-commit gate Root cause: PRs #985 and #991 failed CI because: 1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop) 2. New reductions can create paths that dominate existing direct reductions (test_find_dominated_rules_returns_known_set has hardcoded known set) Added to Step 6: - Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`) - Mandatory `cargo test` (full suite, not filtered) - Explicit dominated-rules gate with fix instructions Added to Common Mistakes: - clippy without -D warnings - dominated rules test - skipping full cargo test Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 38 ++++++++++++++++++++++++--- 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index 6986157b3..a644a0650 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -165,15 +165,36 @@ git diff --name-only | grep "reductions.typ" # If it does NOT appear, you skipped Step 5. Go back and do it. ``` -## Step 6: Regenerate Exports and Verify +## Step 6: Regenerate Exports, CI Checks, and Dominated-Rules Gate ```bash cargo run --example export_graph cargo run --example export_schemas make regenerate-fixtures # Slow — needs fixtures for paper example data -make test clippy ``` +### CI-equivalent checks — MANDATORY + +These must pass before committing. CI will fail if you skip them. 
+ +```bash +# Clippy with -D warnings (CI uses this exact flag — local clippy without -D is insufficient) +cargo clippy -- -D warnings 2>&1 | tail -5 + +# Full test suite (catches dominated-rules test failures) +cargo test 2>&1 | tail -5 +``` + +### Dominated-rules gate — CHECK THIS + +Adding a new reduction can create paths that **dominate** existing direct reductions. The test `test_find_dominated_rules_returns_known_set` in `src/unit_tests/rules/analysis.rs` has a hardcoded set of known dominated pairs. If your new reduction creates a shorter path to a target that already has a direct reduction, this test will fail. + +```bash +cargo test test_find_dominated_rules_returns_known_set 2>&1 | tail -10 +``` + +If it fails with "Unexpected dominated rule: X -> Y (dominated by X -> ... -> Y)", add the new pair to the known set in `src/unit_tests/rules/analysis.rs`. + ## Step 7: Clean Up Verification Artifacts **MANDATORY.** Verification artifacts must NOT be committed into the library. Remove them after the Rust implementation is complete: @@ -204,10 +225,16 @@ done # Gate 2: No verification artifacts remaining ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" -# Gate 3: Tests pass +# Gate 3: Clippy with -D warnings (CI uses this — local clippy alone is insufficient) +cargo clippy -- -D warnings 2>&1 | tail -3 + +# Gate 4: Full test suite (catches dominated-rules failures) cargo test 2>&1 | tail -3 -# Gate 4: Paper compiles +# Gate 5: Dominated-rules test specifically +cargo test test_find_dominated_rules_returns_known_set 2>&1 | tail -3 + +# Gate 6: Paper compiles make paper 2>&1 | tail -3 ``` @@ -237,3 +264,6 @@ gh pr create --title "feat: add reduction (#)" --bo | Missing paper `reduction-rule` entry | MANDATORY — Check 11 from #974 | | Leaving verification artifacts in repo | MANDATORY cleanup — Step 7 | | Not regenerating fixtures after example-db | `make regenerate-fixtures` required for paper | +| 
Running `cargo clippy` without `-D warnings` | CI uses `-D warnings` — local clippy without it misses lint failures | +| New reduction dominates existing direct reduction | Check `test_find_dominated_rules_returns_known_set` — add new pair to known set in `analysis.rs` | +| Skipping full `cargo test` before commit | Dominated-rules test only runs in full suite, not filtered tests | From 257845fc730e8937678bd748d9fdd5f2cd7a2ff9 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 14:08:25 +0000 Subject: [PATCH 15/23] fix: correct add-reduction HARD GATE for canonical examples MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit rule_builders.rs is a 4-line pass-through — canonical examples are registered via canonical_rule_example_specs() in each rule file, wired through mod.rs. Updated Step 4 to match actual architecture. Also added analysis.rs to git add list (for dominated-rules updates). Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/add-reduction/SKILL.md | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index a644a0650..b84a2e9fc 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -92,13 +92,20 @@ Reference: `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` ## Step 4: Add Canonical Example to example_db -**HARD GATE (Check 9 from #974).** You MUST modify `src/example_db/rule_builders.rs` — not the rule file, not anywhere else. Read the file first, follow existing patterns, add a builder function using the YES test vector instance, register in `build_rule_examples()`. +**HARD GATE (Check 9 from #974).** Add a `canonical_rule_example_specs()` function in the reduction rule file, and wire it into `src/rules/mod.rs` where other specs are collected. 
Read `src/rules/mod.rs` first — search for `canonical_rule_example_specs` to see how existing rules register their examples. -**Verification — run this and confirm the file was modified:** +The builder function must: +1. Construct the source instance from the YES test vector +2. Reduce it to the target +3. Return a `RuleExampleSpec` with build closure + +**Verification:** ```bash -git diff --name-only | grep "rule_builders.rs" -# MUST show: src/example_db/rule_builders.rs -# If it does NOT appear, you skipped this step. Go back and do it. +# The rule file must define canonical_rule_example_specs +grep "canonical_rule_example_specs" src/rules/_.rs +# mod.rs must wire it in +grep "_::canonical_rule_example_specs" src/rules/mod.rs +# Both must show results. If either is empty, go back and do it. ``` ## Step 4b: Add Example-DB Lookup Test @@ -216,7 +223,6 @@ for f in \ "src/rules/_.rs" \ "src/unit_tests/rules/_.rs" \ "src/rules/mod.rs" \ - "src/example_db/rule_builders.rs" \ "src/unit_tests/example_db.rs" \ "docs/paper/reductions.typ"; do git diff --name-only HEAD | grep -q "$(basename $f)" && echo " ✓ $f" || echo " ✗ MISSING: $f" @@ -244,8 +250,8 @@ make paper 2>&1 | tail -3 git add src/rules/_.rs \ src/unit_tests/rules/_.rs \ src/rules/mod.rs \ - src/example_db/rule_builders.rs \ src/unit_tests/example_db.rs \ + src/unit_tests/rules/analysis.rs \ docs/paper/reductions.typ git commit -m "feat: add reduction (#)" git push -u origin "" @@ -259,8 +265,8 @@ gh pr create --title "feat: add reduction (#)" --bo | Re-deriving algorithm from issue instead of Python `reduce()` | Python function is the verified spec — translate it | | Overhead expressions don't match test vectors JSON | Copy verbatim from `overhead` field | | Skipping infeasible (NO) test case | NO instance is in test vectors — always include | -| Missing canonical example in `rule_builders.rs` | MANDATORY — Check 9 from #974 | -| Missing example-db lookup test | MANDATORY — Check 10 from #974 | +| Missing 
`canonical_rule_example_specs()` in rule file + `mod.rs` wiring | MANDATORY — Check 9 from #974 | +| Missing example-db lookup test in `example_db.rs` | MANDATORY — Check 10 from #974 | | Missing paper `reduction-rule` entry | MANDATORY — Check 11 from #974 | | Leaving verification artifacts in repo | MANDATORY cleanup — Step 7 | | Not regenerating fixtures after example-db | `make regenerate-fixtures` required for paper | From 62a26fa44d4952bbcfb08bced41afefb961e5694 Mon Sep 17 00:00:00 2001 From: zazabap Date: Wed, 1 Apr 2026 14:29:13 +0000 Subject: [PATCH 16/23] feat: parent-side verification + pre-commit hook for add-reduction MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two enforcement mechanisms that don't rely on subagent compliance: 1. Parent-side verification (Step 8a): After subagent reports DONE, the parent runs file gate checks independently. If any required file is missing, sends the subagent back — doesn't trust self-report. 2. Pre-commit hook (.claude/hooks/add-reduction-precommit.sh): Mechanically blocks commits of new rule files unless example_db.rs, reductions.typ, and mod.rs are also staged. Subagents cannot bypass. Root cause: subagents skip HARD GATE steps despite skill text saying "MANDATORY". Text-based enforcement doesn't work — need mechanical checks that run after the subagent, not instructions the subagent reads. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/hooks/add-reduction-precommit.sh | 49 ++++++++++ .claude/skills/add-reduction/SKILL.md | 113 +++++++++++++++++------ 2 files changed, 135 insertions(+), 27 deletions(-) create mode 100755 .claude/hooks/add-reduction-precommit.sh diff --git a/.claude/hooks/add-reduction-precommit.sh b/.claude/hooks/add-reduction-precommit.sh new file mode 100755 index 000000000..f59aaefa6 --- /dev/null +++ b/.claude/hooks/add-reduction-precommit.sh @@ -0,0 +1,49 @@ +#!/bin/bash +# Pre-commit hook for add-reduction: blocks commits missing required files. +# +# This hook fires on any commit that stages a new rule file under src/rules/. +# It checks that the companion files (example_db test, paper entry, mod.rs) +# are also staged. If any are missing, the commit is blocked. +# +# Install: symlink or copy to .git/hooks/pre-commit +# ln -sf ../../.claude/hooks/add-reduction-precommit.sh .git/hooks/pre-commit + +STAGED=$(git diff --cached --name-only) + +# Only run for commits that add/modify rule files (not mod.rs or traits.rs) +RULE_FILES=$(echo "$STAGED" | grep "^src/rules/" | grep -v mod.rs | grep -v traits.rs | grep '\.rs$') +if [ -z "$RULE_FILES" ]; then + exit 0 +fi + +ERRORS=0 + +# Check example_db.rs is staged +if ! echo "$STAGED" | grep -q "example_db.rs"; then + echo "BLOCKED: src/unit_tests/example_db.rs not staged (Check 10 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check reductions.typ is staged +if ! echo "$STAGED" | grep -q "reductions.typ"; then + echo "BLOCKED: docs/paper/reductions.typ not staged (Check 11 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check mod.rs is staged +if ! 
echo "$STAGED" | grep -q "mod.rs"; then + echo "BLOCKED: src/rules/mod.rs not staged" + ERRORS=$((ERRORS + 1)) +fi + +# Check no verification artifacts are staged +if echo "$STAGED" | grep -q "docs/paper/verify-reductions/"; then + echo "BLOCKED: verification artifacts still staged — run git rm first" + ERRORS=$((ERRORS + 1)) +fi + +if [ $ERRORS -gt 0 ]; then + echo "" + echo "Fix the above and re-commit. See /add-reduction skill for details." + exit 1 +fi diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md index b84a2e9fc..42498cdfe 100644 --- a/.claude/skills/add-reduction/SKILL.md +++ b/.claude/skills/add-reduction/SKILL.md @@ -212,52 +212,111 @@ git rm -f docs/paper/verify-reductions/*** The Typst proof content lives on in the paper entry. The Python scripts were scaffolding — the Rust tests are the permanent verification. -## Step 8: Pre-Commit Gate and Create PR +## Step 8: Parent-Side Verification and Commit -**Before committing, run this checklist. ALL must pass:** +**Do NOT trust the subagent's self-report.** After the implementation subagent reports DONE, the parent (you) MUST run these checks independently: + +### 8a. 
File gate (parent runs this, not the subagent) ```bash -# Gate 1: Required files modified -echo "=== Pre-commit file gate ===" -for f in \ - "src/rules/_.rs" \ - "src/unit_tests/rules/_.rs" \ - "src/rules/mod.rs" \ - "src/unit_tests/example_db.rs" \ - "docs/paper/reductions.typ"; do - git diff --name-only HEAD | grep -q "$(basename $f)" && echo " ✓ $f" || echo " ✗ MISSING: $f" +echo "=== Parent-side file gate ===" +REQUIRED_PATTERNS=( + "_.rs" # rule file + "mod.rs" # module registration + "example_db.rs" # lookup test + "reductions.typ" # paper entry +) +CHANGED=$(git diff --name-only HEAD) +for pat in "${REQUIRED_PATTERNS[@]}"; do + echo "$CHANGED" | grep -q "$pat" && echo " ✓ $pat" || echo " ✗ MISSING: $pat — send subagent back" done -# Gate 2: No verification artifacts remaining -ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" +# Canonical example wired? +grep -q "canonical_rule_example_specs" src/rules/_.rs && echo " ✓ canonical_rule_example_specs in rule file" || echo " ✗ MISSING canonical_rule_example_specs" +grep -q "_::canonical_rule_example_specs" src/rules/mod.rs && echo " ✓ wired in mod.rs" || echo " ✗ MISSING mod.rs wiring" -# Gate 3: Clippy with -D warnings (CI uses this — local clippy alone is insufficient) -cargo clippy -- -D warnings 2>&1 | tail -3 +# Artifacts cleaned? +ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" +``` -# Gate 4: Full test suite (catches dominated-rules failures) -cargo test 2>&1 | tail -3 +**If ANY line shows ✗, send the subagent back to fix it.** Do NOT proceed. -# Gate 5: Dominated-rules test specifically -cargo test test_find_dominated_rules_returns_known_set 2>&1 | tail -3 +### 8b. 
CI-equivalent checks (parent runs) -# Gate 6: Paper compiles +```bash +cargo clippy -- -D warnings 2>&1 | tail -3 +cargo test 2>&1 | tail -3 make paper 2>&1 | tail -3 ``` -**If ANY file shows ✗ MISSING, STOP and go back to the skipped step.** Do NOT commit with missing files. +### 8c. Commit and push + +Only after ALL gates pass: ```bash -git add src/rules/_.rs \ - src/unit_tests/rules/_.rs \ - src/rules/mod.rs \ - src/unit_tests/example_db.rs \ - src/unit_tests/rules/analysis.rs \ - docs/paper/reductions.typ +git add -A git commit -m "feat: add reduction (#)" git push -u origin "" gh pr create --title "feat: add reduction (#)" --body "..." ``` +## Pre-Commit Hook + +Install this hook to mechanically block commits missing required files. The hook runs automatically — subagents cannot bypass it. + +**File:** `.claude/hooks/add-reduction-precommit.sh` + +```bash +#!/bin/bash +# Pre-commit hook for add-reduction: blocks commits missing required files. +# Install: copy to .git/hooks/pre-commit or add to settings.json hooks. + +STAGED=$(git diff --cached --name-only) + +# Only run for reduction rule commits +if ! echo "$STAGED" | grep -q "src/rules/.*\.rs$"; then + exit 0 +fi + +# Extract the reduction name from staged rule files +RULE_FILE=$(echo "$STAGED" | grep "^src/rules/" | grep -v mod.rs | grep -v traits.rs | head -1) +if [ -z "$RULE_FILE" ]; then + exit 0 +fi + +ERRORS=0 + +# Check example_db.rs is staged +if ! echo "$STAGED" | grep -q "example_db.rs"; then + echo "BLOCKED: src/unit_tests/example_db.rs not staged (Check 10 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check reductions.typ is staged +if ! echo "$STAGED" | grep -q "reductions.typ"; then + echo "BLOCKED: docs/paper/reductions.typ not staged (Check 11 from #974)" + ERRORS=$((ERRORS + 1)) +fi + +# Check mod.rs is staged +if ! 
echo "$STAGED" | grep -q "mod.rs"; then + echo "BLOCKED: src/rules/mod.rs not staged" + ERRORS=$((ERRORS + 1)) +fi + +# Check no verification artifacts are staged +if echo "$STAGED" | grep -q "verify-reductions/"; then + echo "BLOCKED: verification artifacts still staged — run git rm first" + ERRORS=$((ERRORS + 1)) +fi + +if [ $ERRORS -gt 0 ]; then + echo "" + echo "Fix the above and re-commit. See /add-reduction skill for details." + exit 1 +fi +``` + ## Common Mistakes | Mistake | Fix | From efb5603058403ee2932dae5f9b3ac5a351ef9eee Mon Sep 17 00:00:00 2001 From: zazabap Date: Thu, 2 Apr 2026 12:25:43 +0000 Subject: [PATCH 17/23] fix: strengthen type compatibility gate in verify-reduction skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Expanded the type compatibility table to explicitly list all incompatible pairs (Min→Or, Max→Or, Max→Min, Min→Max, Or→Sum, etc.) with concrete examples from batch verification (#198 MVC→HamCircuit, #890 MaxCut→OLA). Added common mistake entry for proceeding past the type gate. Learned from batch run: 5 out of 50 reductions were mathematically verified but turned out to be unimplementable as ReduceTo due to type mismatches that the original gate didn't catch. Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/skills/verify-reduction/SKILL.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index 9bb30a03f..eea41e5f4 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -46,10 +46,17 @@ Check source/target `Value` types before any work: grep "type Value = " src/models/*/.rs src/models/*/.rs ``` -**Compatible:** `Or`→`Or`, `Min`→`Min`, `Max`→`Max`, `Or`→`Min`/`Max`. -**Incompatible:** `Min`/`Max`→`Or` (needs threshold K that the source type lacks). 
+**Compatible pairs for `ReduceTo` (witness-capable):**
+- `Or`→`Or`, `Min`→`Min`, `Max`→`Max` (same type)
+- `Or`→`Min`, `Or`→`Max` (feasibility embeds into optimization)

-If incompatible, STOP and comment on the issue explaining the type mismatch and options (add decision variant, use `ReduceToAggregate`, or choose different pair). Do NOT proceed.
+**Incompatible — STOP if any of these:**
+- `Min`→`Or` or `Max`→`Or` — optimization source has no threshold K; needs a decision-variant source model (e.g., `VertexCover(Or)` instead of `MinimumVertexCover(Min)`)
+- `Max`→`Min` or `Min`→`Max` — opposite optimization directions; needs `ReduceToAggregate` or a decision-variant wrapper
+- `Or`→`Sum` or `Min`→`Sum` — Sum is aggregate-only; needs `ReduceToAggregate`
+- Any pair involving `And` or `Sum` on the target side
+
+If incompatible, STOP and comment on the issue explaining the type mismatch and options (add decision variant, use `ReduceToAggregate`, or choose different pair). Do NOT proceed with the mathematical verification — the reduction may be correct classically but cannot be implemented as `ReduceTo` in this codebase without a type-compatible source/target pair.

 ### If compatible

@@ -237,6 +244,7 @@ Every item must be YES. If any is NO, go back and fix.

 | Mistake | Consequence |
 |---------|-------------|
+| Proceeding past type gate with incompatible types | Wasted work — math may be correct but `ReduceTo` impl is impossible. Common: `Min→Or` (MVC→HamCircuit), `Max→Min` (MaxCut→OLA) |
 | Adversary imports from constructor script | Rejected — must be independent |
 | No `hypothesis` PBT in adversary | Rejected |
 | Section 1 (symbolic) empty | Rejected — "overhead is trivial" is not an excuse |

From 7cf13d8d31c4625d70cc9a7db75b620769942bc6 Mon Sep 17 00:00:00 2001
From: Shiwen An <97461865+zazabap@users.noreply.github.com>
Date: Mon, 6 Apr 2026 14:34:58 +0900
Subject: [PATCH 18/23] Delete
 docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md

---
 ...026-04-01-verify-reduction-enhancements.md | 466 ------------------
 1 file changed, 466 deletions(-)
 delete mode 100644 docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md

diff --git a/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md b/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md
deleted file mode 100644
index debd6c62f..000000000
--- a/docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md
+++ /dev/null
@@ -1,466 +0,0 @@
-# Verify-Reduction Enhancements Implementation Plan
-
-> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Enhance `/verify-reduction` with auto-matching, test vector export, and typed adversary prompts. Create new `/add-reduction` skill that consumes verified artifacts. Keep original `/add-rule` and `/review-pipeline` unchanged.
-
-**Architecture:** Three file changes — update verify-reduction SKILL.md (new Step 4.5, updated Step 5/8), create add-reduction SKILL.md (fork of add-rule that reads Python reduce() + test vectors JSON), register add-reduction in CLAUDE.md.
-
-**Tech Stack:** Markdown skill definitions only — no code changes.
- ---- - -### Task 1: Update verify-reduction SKILL.md — Add Step 4.5 (Auto-Matching + Test Vectors Export) - -**Files:** -- Modify: `.claude/skills/verify-reduction/SKILL.md` - -The new Step 4.5 goes between the current Step 4 (Run and Iterate) and Step 5 (Adversary Verification). It adds three things: (a) constructor exports test vectors JSON, (b) Typst ↔ JSON cross-check, (c) structured claim tags replace manual gap analysis. - -- [ ] **Step 1: Insert new Step 4.5 after the current "Iteration 3: Gap analysis" block** - -In `.claude/skills/verify-reduction/SKILL.md`, find the line `If any claim has no test, add one. If it's untestable, document WHY.` (end of Step 4, Iteration 3). After it, insert: - -```markdown -### Iteration 4: Export test vectors and validate Typst matching — MANDATORY - -After all checks pass and gap analysis is complete, the constructor script must export a test vectors JSON file for downstream consumption: - -**File:** `docs/paper/verify-reductions/test_vectors__.json` - -```json -{ - "source": "", - "target": "", - "issue": , - "yes_instance": { - "input": { ... }, - "output": { ... }, - "source_feasible": true, - "target_feasible": true, - "source_solution": [ ... ], - "extracted_solution": [ ... ] - }, - "no_instance": { - "input": { ... }, - "output": { ... }, - "source_feasible": false, - "target_feasible": false - }, - "overhead": { - "field_name": "expression using source getters" - }, - "claims": [ - {"tag": "overhead_universe_size", "formula": "2*n", "verified": true}, - {"tag": "forward_correctness", "description": "NAE-sat implies valid splitting", "verified": true} - ] -} -``` - -Add this export at the end of `main()` in the constructor script: - -```python -import json - -test_vectors = { - "source": "", - "target": "", - "issue": , - "yes_instance": { ... }, # from Section 6 - "no_instance": { ... }, # from Section 7 - "overhead": { ... 
}, # from Section 1/4 - "claims": claims_list, # accumulated claim() calls -} - -with open("docs/paper/verify-reductions/test_vectors__.json", "w") as f: - json.dump(test_vectors, f, indent=2) -print(f"Test vectors exported to test_vectors__.json") -``` - -**Typst ↔ JSON cross-check:** - -After exporting, load both the test vectors JSON and the Typst file. For each key numerical value in the JSON (input sizes, target values, output sizes, feasibility verdicts), check that it appears as a substring in the Typst YES/NO example sections. This is a substring search on the raw Typst text, not a full parser: - -```python -typst_text = open("").read() -for val in [str(v) for v in yes_instance["input"].values() if isinstance(v, (int, list))]: - assert str(val) in typst_text, f"Typst missing YES value: {val}" -for val in [str(v) for v in no_instance["input"].values() if isinstance(v, (int, list))]: - assert str(val) in typst_text, f"Typst missing NO value: {val}" -``` - -If any value is missing, the Typst proof and Python script are out of sync — fix before proceeding. - -**Structured claims (best-effort replacement for manual gap analysis):** - -Instead of the manual CLAIM/TESTED BY table, accumulate claims programmatically: - -```python -claims_list = [] - -def claim(tag, formula_or_desc, verified=True): - claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified}) -``` - -Call `claim()` throughout the constructor script wherever a Typst proof claim is verified. The self-review step (Step 6) checks that all claims have `verified: true` and that the claim count is reasonable (at least 5 claims for any non-trivial reduction). -``` - -- [ ] **Step 2: Verify the edit is well-placed** - -Read the file and confirm Step 4.5 appears between Step 4 and Step 5 with correct markdown heading level. 
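The claim-accumulation and JSON-export mechanism that Step 4.5 describes can be exercised end to end in isolation. A minimal sketch (problem names and claim tags are illustrative, and the blob stays in memory rather than being written under `docs/paper/verify-reductions/`):

```python
import json

claims_list = []

def claim(tag, formula_or_desc, verified=True):
    # Accumulate one structured claim, mirroring the Step 4.5 helper.
    claims_list.append({"tag": tag, "formula": formula_or_desc, "verified": verified})

# Illustrative claims for a hypothetical SubsetSum -> Partition run
claim("overhead_set_size", "n + 2")
claim("forward_correctness", "subset sums to T iff the padded set partitions evenly")

test_vectors = {"source": "SubsetSum", "target": "Partition", "claims": claims_list}
blob = json.dumps(test_vectors, indent=2)

# Self-review conditions from Step 6: every claim verified, blob round-trips cleanly
loaded = json.loads(blob)
assert all(c["verified"] for c in loaded["claims"])
assert loaded["source"] == "SubsetSum"
```

The same round-trip check is what the downstream consumer relies on: if the JSON does not load and validate here, `add-reduction` cannot use it either.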
- -- [ ] **Step 3: Commit** - -```bash -git add .claude/skills/verify-reduction/SKILL.md -git commit -m "feat: add Step 4.5 to verify-reduction — test vectors export + Typst auto-matching" -``` - ---- - -### Task 2: Update verify-reduction SKILL.md — Typed Adversary Prompt (Step 5) - -**Files:** -- Modify: `.claude/skills/verify-reduction/SKILL.md` - -- [ ] **Step 1: Add reduction-type detection and tailored instructions to the adversary prompt** - -In `.claude/skills/verify-reduction/SKILL.md`, find the adversary prompt template in Step 5a (the block starting `You are an adversary verifier`). Before the `## Your task` section of the adversary prompt, insert: - -````markdown -## Reduction type - -Detect the reduction type from the Typst proof and tailor your testing focus: - -- **Identity reduction** (same graph/structure, different objective — keywords: "complement", "same graph", "negation"): Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration. Edge-case configs (all-zero, all-one, alternating) are highest priority. - -- **Algebraic reduction** (padding, case split, formula transformation — keywords: "padding", "case", "if Σ", "d ="): Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently. Include at least one hypothesis strategy targeting boundary values. - -- **Gadget reduction** (widget/component construction — keywords: "widget", "component", "gadget", "cover-testing"): Focus on widget structure invariants. Verify each traversal/usage pattern independently. Test that interior vertices/elements have no external connections. Check structural properties (connectivity, edge counts, degree sequences) across all small instances. 
-```` - -- [ ] **Step 2: Commit** - -```bash -git add .claude/skills/verify-reduction/SKILL.md -git commit -m "feat: add typed adversary prompts to verify-reduction (identity/algebraic/gadget)" -``` - ---- - -### Task 3: Update verify-reduction SKILL.md — Downstream Artifacts (Step 8) and Integration - -**Files:** -- Modify: `.claude/skills/verify-reduction/SKILL.md` - -- [ ] **Step 1: Update Step 8a commit to include test vectors JSON** - -Find the `git add` block in Step 8a. Change it to: - -```bash -git add docs/paper/.typ \ - docs/paper/verify-reductions/verify_*.py \ - docs/paper/verify-reductions/adversary_*.py \ - docs/paper/verify-reductions/test_vectors_*.json -git add -f docs/paper/.pdf -``` - -- [ ] **Step 2: Update the Integration section at the bottom of the file** - -Replace the current Integration section with: - -```markdown -## Integration - -### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline - -`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` function is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. 
- -- **Before `/add-reduction`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON -- **During `/add-reduction`**: reads Python `reduce()` as pseudocode, overhead from test vectors JSON, generates Rust tests from test vectors -- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances; compositional check via alternative paths if available - -### Standalone usage - -- **After `write-rule-in-paper`**: invoke to verify paper entry -- **During `review-structural`**: check verification script exists and passes -- **Before `issue-to-pr --execute`**: pre-validate the algorithm -``` - -- [ ] **Step 3: Add test vectors JSON to the Quality Gates checklist** - -In the Quality Gates section, add after the cross-consistency items: - -```markdown -- [ ] Test vectors JSON exported with YES/NO instances, overhead, and claims -- [ ] Typst ↔ JSON auto-matching passed (key values present in Typst text) -``` - -- [ ] **Step 4: Add test vectors JSON to the Self-Review checklist (Step 6)** - -In Step 6, add a new subsection after "Cross-consistency": - -```markdown -### Test vectors and auto-matching - -- [ ] `test_vectors__.json` exported successfully -- [ ] YES instance in JSON matches Typst feasible example (values present) -- [ ] NO instance in JSON matches Typst infeasible example (values present) -- [ ] All claims have `verified: true` -- [ ] At least 5 claims for non-trivial reductions -``` - -- [ ] **Step 5: Commit** - -```bash -git add .claude/skills/verify-reduction/SKILL.md -git commit -m "feat: update verify-reduction Step 8 + Integration for downstream pipeline" -``` - ---- - -### Task 4: Create add-reduction skill - -**Files:** -- Create: `.claude/skills/add-reduction/SKILL.md` - -This is a fork of add-rule that reads verified artifacts instead of deriving from the issue. 
- -- [ ] **Step 1: Create the skill directory** - -```bash -mkdir -p .claude/skills/add-reduction -``` - -- [ ] **Step 2: Write the SKILL.md** - -Create `.claude/skills/add-reduction/SKILL.md` with the following content: - -```markdown ---- -name: add-reduction -description: Add a new reduction rule using verified artifacts from /verify-reduction — reads Python reduce() as pseudocode, test vectors JSON for Rust tests, overhead from JSON ---- - -# Add Reduction (from Verified Artifacts) - -Step-by-step guide for adding a new reduction rule (A → B) when `/verify-reduction` has already produced verified artifacts (Typst proof, Python scripts, test vectors JSON). This skill consumes those artifacts directly instead of re-deriving from the issue. - -**When to use:** After `/verify-reduction` has produced a PR with verified artifacts for a reduction rule issue. Use `/add-rule` instead when no verification artifacts exist. - -## Step 0: Locate Verified Artifacts - -Check for existing verification artifacts: - -```bash -ls docs/paper/verify-reductions/verify__.py -ls docs/paper/verify-reductions/test_vectors__.json -ls docs/paper/verify-reductions/_.typ -``` - -If any are missing, run `/verify-reduction` first. - -### Read the artifacts - -1. **Python `reduce()` function** — this is the verified spec for the Rust `reduce_to()` implementation. Read it carefully; translate the algorithm, not the syntax. -2. **Test vectors JSON** — contains YES/NO instances with exact input/output values, overhead expressions, and verified claims. -3. **Typst proof** — the Construction section describes the algorithm in mathematical notation. Use for doc comments. - -```bash -# Load test vectors -TEST_VECTORS=$(cat docs/paper/verify-reductions/test_vectors__.json) -``` - -Extract from test vectors JSON: -- `overhead` → use directly in `#[reduction(overhead = { ... 
})]` -- `yes_instance.input` / `yes_instance.output` → first closed-loop test case -- `no_instance.input` / `no_instance.output` → infeasible test case -- `claims` → verify each is preserved in the Rust implementation - -## Reference Implementations - -Same as `/add-rule`: -- **Reduction rule:** `src/rules/minimumvertexcover_maximumindependentset.rs` -- **Reduction tests:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` -- **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` -- **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`) - -## Step 1: Implement the reduction - -Create `src/rules/_.rs`. - -**Translation guide:** Map the Python `reduce()` function to Rust: - -| Python | Rust | -|--------|------| -| `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` | -| `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec` | -| `literal_to_element(lit, n)` | Private helper method | -| Python list of ints | `Vec`, `Vec`, etc. (match problem type) | - -**Overhead from test vectors JSON:** The `overhead` field maps directly to the `#[reduction]` macro: - -```rust -#[reduction(overhead = { - // Copy expressions verbatim from test_vectors JSON "overhead" field - field_name = "expression", -})] -``` - -The rest of the implementation structure follows `/add-rule` Step 1 exactly: ReductionResult struct, trait impl, ReduceTo impl. - -## Step 2: Register in mod.rs - -Same as `/add-rule` Step 2. Add `mod _;` to `src/rules/mod.rs`. - -## Step 3: Write unit tests from test vectors - -Create `src/unit_tests/rules/_.rs`. 
- -**Generate tests directly from test vectors JSON:** - -The YES instance becomes the primary closed-loop test: - -```rust -#[test] -fn test__to__closed_loop() { - // Construct source from test_vectors.yes_instance.input - let source = ::try_new(/* fields from JSON */).unwrap(); - - // Reduce - let reduction = ReduceTo::::reduce_to(&source); - - // Verify target matches test_vectors.yes_instance.output - let target = reduction.target_problem(); - assert_eq!(target.(), /* value from JSON output */); - - // Solve and extract - let solver = BruteForce; - for witness in solver.find_all_witnesses(target).unwrap() { - let extracted = reduction.extract_solution(&witness); - // Verify extracted solution is valid for source - let val = source.evaluate(&extracted); - assert!(val.0); // Or check objective value - } -} -``` - -The NO instance becomes the infeasible test: - -```rust -#[test] -fn test__to__infeasible() { - // Construct source from test_vectors.no_instance.input - let source = ::try_new(/* fields from JSON */).unwrap(); - - // Reduce - let reduction = ReduceTo::::reduce_to(&source); - - // Verify target is also infeasible - let solver = BruteForce; - let witnesses = solver.find_all_witnesses(reduction.target_problem()); - assert!(witnesses.is_none() || witnesses.unwrap().is_empty()); -} -``` - -Add additional structural tests as needed (target size, edge count, etc.) guided by the `claims` field in the test vectors JSON. - -## Step 4: Add canonical example to example_db - -Same as `/add-rule` Step 4. The YES instance from the test vectors JSON is a good candidate for the canonical example. - -## Step 5: Document in paper - -The Typst proof already exists from `/verify-reduction`. Integrate it into `docs/paper/reductions.typ` using the `reduction-rule` template. The proof text, worked examples, and overhead table are already written — adapt them to the paper's macros (`reduction-rule`, `problem-def`, etc.). - -Follow `/add-rule` Step 5 for the exact format. 
The heavy writing is already done; this step is reformatting. - -## Step 6: Regenerate exports and verify - -Same as `/add-rule` Step 6: - -```bash -cargo run --example export_graph -cargo run --example export_schemas -make regenerate-fixtures -make test clippy -``` - -## Solver Rules - -Same as `/add-rule`. If the target problem needs ILP, implement alongside. - -## CLI Impact - -Same as `/add-rule`. No CLI changes needed for witness-preserving reductions. - -## File Naming - -Same as `/add-rule`: -- Rule file: `src/rules/_.rs` -- Test file: `src/unit_tests/rules/_.rs` -- Canonical example: builder function in `src/example_db/rule_builders.rs` - -## Common Mistakes - -All mistakes from `/add-rule` apply, plus: - -| Mistake | Fix | -|---------|-----| -| Re-deriving algorithm from issue instead of reading Python `reduce()` | The Python function is the verified spec — translate it, don't reinvent | -| Ignoring test vectors JSON | Use the YES/NO instances for Rust tests directly | -| Overhead expressions don't match test vectors JSON | Copy verbatim from the `overhead` field | -| Skipping the infeasible (NO) test case | The NO instance is in the test vectors — always include it | -| Not integrating the existing Typst proof into the paper | The proof is already written; reformat, don't rewrite | -``` - -- [ ] **Step 3: Commit** - -```bash -git add .claude/skills/add-reduction/SKILL.md -git commit -m "feat: create add-reduction skill — consumes verified artifacts from verify-reduction" -``` - ---- - -### Task 5: Register add-reduction in CLAUDE.md - -**Files:** -- Modify: `.claude/CLAUDE.md` - -- [ ] **Step 1: Add add-reduction entry to the Skills list** - -In `.claude/CLAUDE.md`, find the line: -``` -- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. 
-``` - -Add immediately after it: -``` -- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. -``` - -- [ ] **Step 2: Update the verify-reduction description** - -Find the line starting `- [verify-reduction]` and replace it with: - -``` -- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. -``` - -- [ ] **Step 3: Commit** - -```bash -git add .claude/CLAUDE.md -git commit -m "feat: register add-reduction skill in CLAUDE.md, update verify-reduction description" -``` - ---- - -Plan complete and saved to `docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md`. Two execution options: - -**1. Subagent-Driven (recommended)** - I dispatch a fresh subagent per task, review between tasks, fast iteration - -**2. Inline Execution** - Execute tasks in this session using executing-plans, batch execution with checkpoints - -Which approach? 
\ No newline at end of file

From a46925d2c79d05c3879621f18f80a62ba89a928f Mon Sep 17 00:00:00 2001
From: Shiwen An <97461865+zazabap@users.noreply.github.com>
Date: Mon, 6 Apr 2026 14:35:31 +0900
Subject: [PATCH 19/23] Delete
 docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md

---
 ...6-03-31-proposed-reductions-note-design.md | 125 ------------------
 1 file changed, 125 deletions(-)
 delete mode 100644 docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md

diff --git a/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md b/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md
deleted file mode 100644
index 9ce7a1a90..000000000
--- a/docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md
+++ /dev/null
@@ -1,125 +0,0 @@
-# Design: Proposed Reduction Rules — Typst Verification Note
-
-**Date:** 2026-03-31
-**Goal:** Create a standalone Typst document with compiled PDF that formalizes 9 reduction rules from issue #770, resolving blockers for 7 incomplete issues and adding 2 high-leverage NP-hardness chain extensions.
- -## Scope - -### 9 Reductions - -**Group 1 — NP-hardness proof chain extensions:** - -| Issue | Reduction | Impact | -|-------|-----------|--------| -| #973 | SubsetSum → Partition | Unlocks ~12 downstream problems | -| #198 | MinimumVertexCover → HamiltonianCircuit | Unlocks ~9 downstream problems | - -**Group 2 — Tier 1a blocked issues (fix + formalize):** - -| Issue | Reduction | Current blocker | -|-------|-----------|----------------| -| #379 | DominatingSet → MinMaxMulticenter | Decision vs optimization MDS model | -| #380 | DominatingSet → MinSumMulticenter | Same | -| #888 | OptimalLinearArrangement → RootedTreeArrangement | Witness extraction impossible for naive approach | -| #822 | ExactCoverBy3Sets → AcyclicPartition | Missing algorithm (unpublished reference) | - -**Group 3 — Tier 1b blocked issues (fix + formalize):** - -| Issue | Reduction | Current blocker | -|-------|-----------|----------------| -| #892 | VertexCover → HamiltonianPath | Depends on #198 (VC→HC) being resolved | -| #894 | VertexCover → PartialFeedbackEdgeSet | Missing Yannakakis 1978b paper | -| #890 | MaxCut → OptimalLinearArrangement | Placeholder algorithm, no actual construction | - -## Deliverables - -1. **`docs/paper/proposed-reductions.typ`** — standalone Typst document -2. **`docs/paper/proposed-reductions.pdf`** — compiled PDF checked into repo -3. **Updated GitHub issues** — #379, #380, #888, #822, #892, #894, #890 corrected with verified algorithms -4. **One PR** containing the note, PDF, and issue updates - -## Document Structure - -``` -Title: Proposed Reduction Rules — Verification Notes -Abstract: Motivation (NP-hardness gaps, blocked issues, impact analysis) - -§1 Notation & Conventions - - Standard symbols (G, V, E, w, etc.) 
- - Proof structure: Construction → Correctness (⟹/⟸) → Solution Extraction - - Overhead notation - -§2 NP-Hardness Chain Extensions - §2.1 SubsetSum → Partition (#973) - §2.2 MinimumVertexCover → HamiltonianCircuit (#198) - §2.3 VertexCover → HamiltonianPath (#892) - -§3 Graph Reductions - §3.1 MaxCut → OptimalLinearArrangement (#890) - §3.2 OptimalLinearArrangement → RootedTreeArrangement (#888) - -§4 Set & Domination Reductions - §4.1 DominatingSet → MinMaxMulticenter (#379) - §4.2 DominatingSet → MinSumMulticenter (#380) - §4.3 ExactCoverBy3Sets → AcyclicPartition (#822) - -§5 Feedback Set Reductions - §5.1 VertexCover → PartialFeedbackEdgeSet (#894) -``` - -## Per-Reduction Entry Format - -Each reduction follows the `reductions.typ` convention: - -1. **Theorem statement** — 1-3 sentence intuition, citation (e.g., `[GJ79, ND50]`) -2. **Proof** with three mandatory subsections: - - _Construction._ Numbered algorithm steps, all symbols defined before use - - _Correctness._ Bidirectional: (⟹) forward direction, (⟸) backward direction - - _Solution extraction._ How to map target solution back to source -3. **Overhead table** — target size fields as functions of source size fields -4. **Worked example** — concrete small instance, full construction steps, solution verification - -Mathematical notation uses Typst math mode: `$V$`, `$E$`, `$arrow.r.double$`, `$overline(x)$`, etc. - -## Research Plan for Blocked Issues - -For each blocked reduction: - -1. **Search** for the original reference via the citation in Garey & Johnson -2. **Reconstruct** the correct algorithm from the paper or from first principles -3. **Verify** correctness with a hand-worked example in the note -4. 
**Resolve** the blocker: - - #379/#380: Clarify that the reduction operates on the decision variant; note model alignment needed - - #888: Research Gavril 1977a gadget construction for forcing path-tree solutions - - #822: Research the acyclic partition reduction from G&J or construct from first principles - - #892: Chain through #198 (VC→HC→HP); detail the HC→HP modification - - #894: Search for Yannakakis 1978b or reconstruct the gadget - - #890: Research the Garey-Johnson-Stockmeyer 1976 construction - -If a reference is unavailable, construct a novel reduction and clearly mark it as such. - -## Typst Setup - -- Standalone document (not importing from `reductions.typ`) -- Uses: `ctheorems` for theorem/proof environments, `cetz` if diagrams needed -- Page: A4, New Computer Modern 10pt -- Theorem numbering: `Theorem 2.1`, `Theorem 2.2`, etc. -- No dependency on `examples.json` or `reduction_graph.json` (standalone) -- Compile command: `typst compile docs/paper/proposed-reductions.typ docs/paper/proposed-reductions.pdf` - -## Quality Criteria - -Each reduction must satisfy: -1. **Math equations correct** — all formulas verified against source paper or hand-derivation -2. **Provable correctness** — both directions of the proof are rigorous, no hand-waving -3. **Algorithm clear** — detailed enough that a developer can implement `reduce_to()` and `extract_solution()` directly from the proof -4. 
**From math to code verifiable** — overhead expressions match the construction, worked example can be used as a test case - -## PR Structure - -- Branch: `feat/proposed-reductions-note` -- Files: - - `docs/paper/proposed-reductions.typ` (new) - - `docs/paper/proposed-reductions.pdf` (new, compiled) -- No code changes — this is a documentation-only PR -- Issue updates done via `gh issue edit` (not in the PR diff) From 71c1444d45f22aac9a42591eaa67b5c51d099eb4 Mon Sep 17 00:00:00 2001 From: Shiwen An <97461865+zazabap@users.noreply.github.com> Date: Mon, 6 Apr 2026 14:36:17 +0900 Subject: [PATCH 20/23] Delete .claude/CLAUDE.md --- .claude/CLAUDE.md | 285 ---------------------------------------------- 1 file changed, 285 deletions(-) delete mode 100644 .claude/CLAUDE.md diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md deleted file mode 100644 index a3b771d5a..000000000 --- a/.claude/CLAUDE.md +++ /dev/null @@ -1,285 +0,0 @@ -# CLAUDE.md - -## Project Overview -Rust library for NP-hard problem reductions. Implements computational problems with reduction rules for transforming between equivalent formulations. - -## Skills -These repo-local skills live under `.claude/skills/*/SKILL.md`. - -- [run-pipeline](skills/run-pipeline/SKILL.md) -- Pick a Ready issue from the GitHub Project board, move it through In Progress -> issue-to-pr -> Review pool. One issue at a time; forever-loop handles iteration. -- [issue-to-pr](skills/issue-to-pr/SKILL.md) -- Convert a GitHub issue into a PR with an implementation plan. Default rule: one item per PR. Exception: a `[Model]` issue that explicitly claims direct ILP solvability should implement the model and its direct ` -> ILP` rule together; `[Rule]` issues still require both models to exist on `main`. -- [add-model](skills/add-model/SKILL.md) -- Add a new problem model. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. -- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. 
Can be used standalone (brainstorms with user) or called from `issue-to-pr`. -- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. -- [review-structural](skills/review-structural/SKILL.md) -- Project-specific structural completeness check: model/rule checklists, build, semantic correctness, issue compliance. Read-only, no code changes. Called by `review-pipeline`. -- [review-quality](skills/review-quality/SKILL.md) -- Generic code quality review: DRY, KISS, cohesion/coupling, test quality, HCI. Read-only, no code changes. Called by `review-pipeline`. -- [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments, fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`). -- [write-model-in-paper](skills/write-model-in-paper/SKILL.md) -- Write or improve a problem-def entry in the Typst paper (standalone, for improving existing entries). Core instructions are inlined in `add-model` Step 6. -- [write-rule-in-paper](skills/write-rule-in-paper/SKILL.md) -- Write or improve a reduction-rule entry in the Typst paper (standalone, for improving existing entries). Core instructions are inlined in `add-rule` Step 5. -- [release](skills/release/SKILL.md) -- Create a new crate release. Determines version bump from diff, verifies tests/clippy, then runs `make release`. -- [check-issue](skills/check-issue/SKILL.md) -- Quality gate for `[Rule]` and `[Model]` issues. Checks usefulness, non-triviality, correctness of literature, and writing quality. Posts structured report and adds failure labels. -- [fix-issue](skills/fix-issue/SKILL.md) -- Fix quality issues found by check-issue — auto-fixes mechanical problems, brainstorms substantive issues with human, then re-checks and moves to Ready. 
-- [topology-sanity-check](skills/topology-sanity-check/SKILL.md) -- Run sanity checks on the reduction graph: detect orphan (isolated) problems and redundant reduction rules. - - `topology-sanity-check orphans` -- Detect isolated problem types (runs `examples/detect_isolated_problems.rs`) - - `topology-sanity-check np-hardness` -- Verify NP-hardness proof chains from 3-SAT (runs `examples/detect_unreachable_from_3sat.rs`) - - `topology-sanity-check redundancy [source target]` -- Check for dominated reduction rules -- [review-pipeline](skills/review-pipeline/SKILL.md) -- Agentic review for PRs in Review pool: runs structural check, quality check, and agentic feature tests (no code changes), posts combined verdict, always moves to Final review. -- [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue. -- [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold. -- [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers. -- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. -- [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals. 
-
-## Codex Compatibility
-- Claude slash commands such as `/issue-to-pr 42 --execute` are aliases for the matching repo-local skill files under `.claude/skills/`.
-- In Codex, read the relevant `SKILL.md` directly and follow it; do not assume slash-command support exists.
-- The Makefile targets `run-plan`, `run-issue`, `run-pipeline`, and `run-review` already translate these workflows into explicit `SKILL.md` prompts for Codex.
-- The default Codex model in the Makefile is `gpt-5.4`. Override it with `CODEX_MODEL=` if needed.
-- The Step 0/Step 1 packet builders under `scripts/pipeline_skill_context.py` and `scripts/pipeline_checks.py` are expensive GitHub-backed calls. Per top-level skill invocation, generate each packet at most once and reuse the resulting text/JSON for all later steps unless the skill explicitly requires a fresh rerun.
-
-## Commands
-```bash
-make help # Show all available targets
-make build # Build the project
-make test # Run all tests
-make fmt # Format code with rustfmt
-make fmt-check # Check code formatting
-make clippy # Run clippy lints
-make doc # Build mdBook documentation (includes reduction graph export)
-make mdbook # Build and serve mdBook with live reload
-make paper # Build Typst paper from checked-in example fixtures
-make coverage # Generate coverage report (>95% required)
-make check # Quick pre-commit check (fmt + clippy + test)
-make rust-export # Generate Julia parity test data (mapping stages)
-make export-schemas # Regenerate problem schemas JSON
-make qubo-testdata # Regenerate QUBO ground truth JSON
-make clean # Clean build artifacts
-make diagrams # Generate SVG diagrams from Typst (light + dark)
-make compare # Generate and compare Rust mapping exports
-make jl-testdata # Regenerate Julia parity test data (requires julia)
-make cli # Build the pred CLI tool (without MCP, fast)
-make mcp # Build the pred CLI tool with MCP server support
-make cli-demo # Run closed-loop CLI demo (exercises all commands)
-make mcp-test # Run MCP server tests (unit + integration)
-make run-plan # Execute a plan with Codex or Claude
-make run-issue N=42 # Run issue-to-pr --execute for a GitHub issue
-make run-pipeline # Pick next Ready issue from project board, implement, move to Review pool
-make run-pipeline N=97 # Process a specific issue from the project board
-make run-pipeline-forever # Poll Ready column, run-pipeline when new issues appear
-make run-review # Pick next PR from Review pool column, run agentic review, move to Final review
-make run-review N=570 # Process a specific PR from the Review pool column
-make run-review-forever # Poll Review pool for eligible PRs, dispatch run-review
-make copilot-review # (Optional) Request Copilot code review on current PR
-make release V=x.y.z # Tag and push a new release (CI publishes to crates.io)
-# Set RUNNER=claude to use Claude instead of Codex (default: codex)
-# Default Codex model: CODEX_MODEL=gpt-5.4
-```
-
-## Git Safety
-- **NEVER force push** (`git push --force`, `git push -f`, `git push --force-with-lease`). This is an absolute rule with no exceptions. Force push can silently destroy other people's work and stashed changes.
-
-## Architecture
-
-### Core Modules
-- `src/models/` - Problem implementations organized by input structure:
-  - `graph/` - Graph-input problems
-  - `formula/` - Boolean formulas and circuits
-  - `set/` - Set systems (universe + subsets)
-  - `algebraic/` - Matrices, linear systems, lattices
-  - `misc/` - Unique input structures
-  - Run `pred list` for the full catalog of problems, variants, and reductions; `pred show ` for details on a specific problem
-- `src/rules/` - Reduction rules + inventory registration
-- `src/solvers/` - BruteForce solver for aggregate values plus witness recovery when supported, ILP solver (feature-gated, witness-only). To check if a problem supports ILP solving via a witness-capable reduction path, run `pred path ILP`
-- `src/traits.rs` - `Problem` trait
-- `src/rules/traits.rs` - `ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult` traits
-- `src/registry/` - Compile-time reduction metadata collection
-- `problemreductions-cli/` - `pred` CLI tool (separate crate in workspace)
-- `src/unit_tests/` - Unit test files (mirroring `src/` structure, referenced via `#[path]`)
-- `tests/main.rs` - Integration tests (modules in `tests/suites/`); example tests use `include!` for direct invocation (no subprocess)
-- `tests/data/` - Ground truth JSON for integration tests
-- `scripts/` - Python test data generation scripts (managed with `uv`)
-- `docs/plans/` - Implementation plans
-
-### Trait Hierarchy
-
-```
-Problem (core trait — all problems must implement)
-│
-├── const NAME: &'static str // e.g., "MaximumIndependentSet"
-├── type Value: Clone // aggregate value: Max/Min/Sum/Or/And/Extremum/...
-├── fn dims(&self) -> Vec // config space: [2, 2, 2] for 3 binary variables
-├── fn evaluate(&self, config) -> Value
-├── fn variant() -> Vec<(&str, &str)> // e.g., [("graph","SimpleGraph"), ("weight","i32")]
-├── fn num_variables(&self) -> usize // default: dims().len()
-└── fn problem_type() -> ProblemType // catalog bridge: registry lookup by NAME
-```
-
-**Witness-capable objective problems** (e.g., `MaximumIndependentSet`) typically use `Value = Max`, `Min`, or `Extremum`.
-
-**Witness-capable feasibility problems** (e.g., `Satisfiability`) typically use `Value = Or`.
-
-**Aggregate-only problems** use fold values such as `Sum` or `And`; these solve to a value but have no representative witness configuration.
-
-Common aggregate wrappers live in `src/types.rs`:
-```rust
-Max, Min, Sum, Or, And, Extremum, ExtremumSense
-```
-
-### Key Patterns
-- `variant_params!` macro implements `Problem::variant()` — e.g., `crate::variant_params![G, W]` for two type params, `crate::variant_params![]` for none (see `src/variant.rs`)
-- `declare_variants!` proc macro registers concrete type instantiations with best-known complexity and registry-backed load/serialize/value-solve/witness-solve metadata. One entry per problem may be marked `default`, and variable names in complexity strings are validated at compile time against actual getter methods.
-- Problems parameterized by graph type `G` and optionally weight type `W` (problem-dependent)
-- `Solver::solve()` computes the aggregate value for any `Problem` whose `Value` implements `Aggregate`
-- `BruteForce::find_witness()` / `find_all_witnesses()` recover witnesses only when `P::Value::supports_witnesses()`
-- `ReductionResult` provides `target_problem()` and `extract_solution()` for witness/config workflows; `AggregateReductionResult` provides `extract_value()` for aggregate/value workflows
-- CLI-facing dynamic formatting uses aggregate wrapper names directly (for example `Max(2)`, `Min(None)`, `Or(true)`, or `Sum(56)`)
-- Graph types: SimpleGraph, PlanarGraph, BipartiteGraph, UnitDiskGraph, KingsSubgraph, TriangularSubgraph
-- Weight types: `One` (unit weight marker), `i32`, `f64` — all implement `WeightElement` trait
-- `WeightElement` trait: `type Sum: NumericSize` + `fn to_sum(&self)` — converts weight to a summable numeric type
-- Weight management via inherent methods (`weights()`, `set_weights()`, `is_weighted()`), not traits
-- `NumericSize` supertrait bundles common numeric bounds (`Clone + Default + PartialOrd + Num + Zero + Bounded + AddAssign + 'static`)
-
-### Overhead System
-Reduction overhead is expressed using `Expr` AST (in `src/expr.rs`) with the `#[reduction]` macro. The `overhead` attribute is **required** — omitting it is a compile error:
-```rust
-#[reduction(overhead = {
-    num_vertices = "num_vertices + num_clauses",
-    num_edges = "3 * num_clauses",
-})]
-impl ReduceTo for Source { ... }
-```
-- Expression strings are parsed at compile time by a Pratt parser in the proc macro crate
-- Variable names are validated against actual getter methods on the source type — typos cause compile errors
-- Each problem type provides inherent getter methods (e.g., `num_vertices()`, `num_edges()`) that the overhead expressions reference
-- `ReductionOverhead` stores `Vec<(&'static str, Expr)>` — field name to symbolic expression mappings
-- `ReductionEntry` has both symbolic (`overhead_fn`) and compiled (`overhead_eval_fn`) evaluation — the compiled version calls getters directly
-- `VariantEntry` has both a complexity string and compiled `complexity_eval_fn` — same pattern
-- Expressions support: constants, variables, `+`, `-`, `*`, `/`, `^`, `exp()`, `log()`, `sqrt()`, `factorial()`
-- Complexity strings must use **concrete numeric values only** (e.g., `"2^(2.372 * num_vertices / 3)"`, not `"2^(omega * num_vertices / 3)"`)
-- `Expr::parse()` provides runtime parsing for cross-check tests that compare compiled vs symbolic evaluation
-
-### Problem Names
-Problem types use explicit optimization prefixes (`Maximum...`, `Minimum...`) or no prefix. Run `pred list` for the full catalog. Common aliases (e.g., `MIS` → `MaximumIndependentSet`, `MVC` → `MinimumVertexCover`) are shown in the `Aliases` column.
-
-### Problem Variants
-Reduction graph nodes use variant key-value pairs from `Problem::variant()`:
-- Base: `MaximumIndependentSet` (empty variant = defaults)
-- Graph variant: `MaximumIndependentSet {graph: "KingsSubgraph", weight: "One"}`
-- Weight variant: `MaximumIndependentSet {graph: "SimpleGraph", weight: "f64"}`
-- Default variant ranking: `SimpleGraph`, `One`, `KN` are considered default values; variants with the most default values sort first
-- Nodes come exclusively from `#[reduction]` registrations; natural edges between same-name variants are inferred from the graph/weight subtype partial order
-- Each primitive reduction is determined by the exact `(source_variant, target_variant)` endpoint pair
-- Reduction edges carry `EdgeCapabilities { witness, aggregate }`; graph search defaults to witness mode, and aggregate mode is available through `ReductionMode::Aggregate`
-- `#[reduction]` accepts only `overhead = { ... }` and currently registers witness/config reductions; aggregate-only edges require manual `ReductionEntry` registration with `reduce_aggregate_fn`
-
-### Extension Points
-- New models register dynamic load/serialize/brute-force dispatch through `declare_variants!` in the model file, not by adding manual match arms in the CLI
-- Aggregate-only models are first-class in `declare_variants!`; aggregate-only reduction edges still need manual `ReductionEntry` wiring because `#[reduction]` only registers witness/config reductions today
-- Exact registry dispatch lives in `src/registry/`; alias resolution and partial/default variant resolution live in `problemreductions-cli/src/problem_name.rs`
-- `pred create` UX lives in `problemreductions-cli/src/commands/create.rs`
-- Canonical paper and CLI examples live in `src/example_db/model_builders.rs` and `src/example_db/rule_builders.rs`
-
-## Conventions
-
-### File Naming
-- Reduction files: `src/rules/_.rs` (e.g., `maximumindependentset_qubo.rs`)
-- Model files: `src/models//.rs` — category is by input structure: `graph/` (graph input), `formula/` (boolean formula/circuit), `set/` (universe + subsets), `algebraic/` (matrix/linear system/lattice), `misc/` (other)
-- Canonical examples: builder functions in `src/example_db/rule_builders.rs` and `src/example_db/model_builders.rs`
-- Example binaries in `examples/`: utility/export tools and pedagogical demos only (not per-reduction files)
-- Test naming: `test__to__closed_loop`
-
-### Paper (docs/paper/reductions.typ)
-- `problem-def(name)[def][body]` — defines a problem with auto-generated schema, reductions list, and label ``. Title comes from `display-name` dict.
-- `reduction-rule(source, target, example: bool, ...)[rule][proof]` — generates a theorem with label `` and registers in `covered-rules` state. Overhead auto-derived from JSON edge data.
-- Every directed reduction needs its own `reduction-rule` entry
-- Completeness warnings auto-check that all JSON graph nodes/edges are covered in the paper
-- `display-name` dict maps `ProblemName` to display text
-
-## Testing Requirements
-
-**Reference implementations — read these first:**
-- **Reduction test:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` — closed-loop pattern
-- **Model test:** `src/unit_tests/models/graph/maximum_independent_set.rs` — evaluation, serialization
-- **Solver test:** `src/unit_tests/solvers/brute_force.rs` — aggregate `solve()` plus witness recovery helpers
-- **Trait definitions:** `src/traits.rs` (`Problem`), `src/solvers/mod.rs` (`Solver`)
-
-### Coverage
-
-New code must have >95% test coverage. Run `make coverage` to check.
-
-### Naming
-
-- Reduction tests: `test__to__closed_loop`
-- Model tests: descriptive names — e.g., `test__creation`, `test__evaluate_*`, `test__direction`, `test__solver`, `test__serialization`. Use whichever are relevant; there is no fixed per-model naming set.
-- Solver tests: `test__`
-
-### Key Testing Patterns
-
-See Key Patterns above for solver API signatures. Follow the reference files for exact usage.
-
-### File Organization
-
-Unit tests in `src/unit_tests/` linked via `#[path]` (see Core Modules above). Integration tests in `tests/suites/`, consolidated through `tests/main.rs`. Canonical example-db coverage lives in `src/unit_tests/example_db.rs`.
-
-Model review automation checks for a dedicated test file under `src/unit_tests/models/...` with at least 3 test functions. The exact split of coverage is judged per model during review.
-
-## Documentation Locations
-- `README.md` — Project overview and quickstart
-- `.claude/` — Claude Code instructions and skills
-- `docs/book/` — mdBook user documentation (built with `make doc`)
-- `docs/paper/reductions.typ` — Typst paper with problem definitions and reduction theorems
-- `src/example_db/` — Canonical model/rule examples: `model_builders.rs`, `rule_builders.rs` (in-memory builders), `specs.rs` (per-module invariant specs), consumed by `pred create --example` and paper exports
-- `examples/` — Export utilities, graph-analysis helpers, and pedagogical demos
-
-## Documentation Requirements
-
-**Reference:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` to see a complete problem-def + reduction-rule example.
-
-### Adding a Problem Definition
-
-```typst
-#problem-def("ProblemName")[
-  Mathematical definition...
-][
-  Background, examples, algorithms...
-]
-```
-
-Also add to the `display-name` dictionary:
-```typst
-"ProblemName": [Problem Name],
-```
-
-### Adding a Reduction Theorem
-
-```typst
-#reduction-rule("Source", "Target",
-  example: true,
-  example-caption: [caption text],
-)[
-  Rule statement...
-][
-  Proof sketch...
-]
-```
-
-Every directed reduction in the graph needs its own `reduction-rule` entry. The paper auto-checks completeness against the generated `reduction_graph.json` export.
-
-## Complexity Verification Requirements
-
-### Variant Worst-Case Complexity (`declare_variants!`)
-The complexity string represents the **worst-case time complexity of the best known algorithm** for that problem variant. To verify correctness:
-1. Identify the best known exact algorithm for the problem (name, author, year, citation)
-2. Confirm the worst-case time bound from the original paper or a survey
-3. Check that polynomial-time problems (e.g., MaximumMatching, 2-SAT, 2-Coloring) are NOT declared with exponential complexity
-4. For NP-hard problems, verify the base of the exponential matches the literature (e.g., 1.1996^n for MIS, not 2^n)
-5. Use only concrete numeric values — no symbolic constants (epsilon, omega); inline the actual numbers with citations
-6. Variable names must match getter methods on the problem type (enforced at compile time)
-
-### Reduction Overhead (`#[reduction(overhead = {...})]`)
-Overhead expressions describe how target problem size relates to source problem size. To verify correctness:
-1. Read the `reduce_to()` implementation and count the actual output sizes
-2. Check that each field (e.g., `num_vertices`, `num_edges`, `num_sets`) matches the constructed target problem
-3. Watch for common errors: universe elements mismatch (edge indices vs vertex indices), worst-case edge counts in intersection graphs (quadratic, not linear), constant factors in circuit constructions
-4. Test with concrete small instances: construct a source problem, run the reduction, and compare target sizes against the formula
-5. Ensure there is only one primitive reduction registration for each exact source/target variant pair; wrap shared helpers instead of registering duplicate endpoints

From b0989200196a7fb72cec7390a96c7a15a46b448b Mon Sep 17 00:00:00 2001
From: zazabap
Date: Mon, 6 Apr 2026 05:45:02 +0000
Subject: [PATCH 21/23] Revert "Delete .claude/CLAUDE.md"

This reverts commit 71c1444d45f22aac9a42591eaa67b5c51d099eb4.
---
 .claude/CLAUDE.md | 285 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 285 insertions(+)
 create mode 100644 .claude/CLAUDE.md

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
new file mode 100644
index 000000000..a3b771d5a
--- /dev/null
+++ b/.claude/CLAUDE.md
@@ -0,0 +1,285 @@
+# CLAUDE.md
+
+## Project Overview
+Rust library for NP-hard problem reductions. Implements computational problems with reduction rules for transforming between equivalent formulations.
+
+## Skills
+These repo-local skills live under `.claude/skills/*/SKILL.md`.
+
+- [run-pipeline](skills/run-pipeline/SKILL.md) -- Pick a Ready issue from the GitHub Project board, move it through In Progress -> issue-to-pr -> Review pool. One issue at a time; forever-loop handles iteration.
+- [issue-to-pr](skills/issue-to-pr/SKILL.md) -- Convert a GitHub issue into a PR with an implementation plan. Default rule: one item per PR. Exception: a `[Model]` issue that explicitly claims direct ILP solvability should implement the model and its direct ` -> ILP` rule together; `[Rule]` issues still require both models to exist on `main`.
+- [add-model](skills/add-model/SKILL.md) -- Add a new problem model. Can be used standalone (brainstorms with user) or called from `issue-to-pr`.
+- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`.
+- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist.
+- [review-structural](skills/review-structural/SKILL.md) -- Project-specific structural completeness check: model/rule checklists, build, semantic correctness, issue compliance. Read-only, no code changes. Called by `review-pipeline`.
+- [review-quality](skills/review-quality/SKILL.md) -- Generic code quality review: DRY, KISS, cohesion/coupling, test quality, HCI. Read-only, no code changes. Called by `review-pipeline`.
+- [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments, fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`).
+- [write-model-in-paper](skills/write-model-in-paper/SKILL.md) -- Write or improve a problem-def entry in the Typst paper (standalone, for improving existing entries). Core instructions are inlined in `add-model` Step 6.
+- [write-rule-in-paper](skills/write-rule-in-paper/SKILL.md) -- Write or improve a reduction-rule entry in the Typst paper (standalone, for improving existing entries). Core instructions are inlined in `add-rule` Step 5.
+- [release](skills/release/SKILL.md) -- Create a new crate release. Determines version bump from diff, verifies tests/clippy, then runs `make release`.
+- [check-issue](skills/check-issue/SKILL.md) -- Quality gate for `[Rule]` and `[Model]` issues. Checks usefulness, non-triviality, correctness of literature, and writing quality. Posts structured report and adds failure labels.
+- [fix-issue](skills/fix-issue/SKILL.md) -- Fix quality issues found by check-issue — auto-fixes mechanical problems, brainstorms substantive issues with human, then re-checks and moves to Ready.
+- [topology-sanity-check](skills/topology-sanity-check/SKILL.md) -- Run sanity checks on the reduction graph: detect orphan (isolated) problems and redundant reduction rules.
+  - `topology-sanity-check orphans` -- Detect isolated problem types (runs `examples/detect_isolated_problems.rs`)
+  - `topology-sanity-check np-hardness` -- Verify NP-hardness proof chains from 3-SAT (runs `examples/detect_unreachable_from_3sat.rs`)
+  - `topology-sanity-check redundancy [source target]` -- Check for dominated reduction rules
+- [review-pipeline](skills/review-pipeline/SKILL.md) -- Agentic review for PRs in Review pool: runs structural check, quality check, and agentic feature tests (no code changes), posts combined verdict, always moves to Final review.
+- [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue.
+- [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold.
+- [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers.
+- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR.
+- [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals.
+
+## Codex Compatibility
+- Claude slash commands such as `/issue-to-pr 42 --execute` are aliases for the matching repo-local skill files under `.claude/skills/`.
+- In Codex, read the relevant `SKILL.md` directly and follow it; do not assume slash-command support exists.
+- The Makefile targets `run-plan`, `run-issue`, `run-pipeline`, and `run-review` already translate these workflows into explicit `SKILL.md` prompts for Codex.
+- The default Codex model in the Makefile is `gpt-5.4`. Override it with `CODEX_MODEL=` if needed.
+- The Step 0/Step 1 packet builders under `scripts/pipeline_skill_context.py` and `scripts/pipeline_checks.py` are expensive GitHub-backed calls. Per top-level skill invocation, generate each packet at most once and reuse the resulting text/JSON for all later steps unless the skill explicitly requires a fresh rerun.
+
+## Commands
+```bash
+make help # Show all available targets
+make build # Build the project
+make test # Run all tests
+make fmt # Format code with rustfmt
+make fmt-check # Check code formatting
+make clippy # Run clippy lints
+make doc # Build mdBook documentation (includes reduction graph export)
+make mdbook # Build and serve mdBook with live reload
+make paper # Build Typst paper from checked-in example fixtures
+make coverage # Generate coverage report (>95% required)
+make check # Quick pre-commit check (fmt + clippy + test)
+make rust-export # Generate Julia parity test data (mapping stages)
+make export-schemas # Regenerate problem schemas JSON
+make qubo-testdata # Regenerate QUBO ground truth JSON
+make clean # Clean build artifacts
+make diagrams # Generate SVG diagrams from Typst (light + dark)
+make compare # Generate and compare Rust mapping exports
+make jl-testdata # Regenerate Julia parity test data (requires julia)
+make cli # Build the pred CLI tool (without MCP, fast)
+make mcp # Build the pred CLI tool with MCP server support
+make cli-demo # Run closed-loop CLI demo (exercises all commands)
+make mcp-test # Run MCP server tests (unit + integration)
+make run-plan # Execute a plan with Codex or Claude
+make run-issue N=42 # Run issue-to-pr --execute for a GitHub issue
+make run-pipeline # Pick next Ready issue from project board, implement, move to Review pool
+make run-pipeline N=97 # Process a specific issue from the project board
+make run-pipeline-forever # Poll Ready column, run-pipeline when new issues appear
+make run-review # Pick next PR from Review pool column, run agentic review, move to Final review
+make run-review N=570 # Process a specific PR from the Review pool column
+make run-review-forever # Poll Review pool for eligible PRs, dispatch run-review
+make copilot-review # (Optional) Request Copilot code review on current PR
+make release V=x.y.z # Tag and push a new release (CI publishes to crates.io)
+# Set RUNNER=claude to use Claude instead of Codex (default: codex)
+# Default Codex model: CODEX_MODEL=gpt-5.4
+```
+
+## Git Safety
+- **NEVER force push** (`git push --force`, `git push -f`, `git push --force-with-lease`). This is an absolute rule with no exceptions. Force push can silently destroy other people's work and stashed changes.
+
+## Architecture
+
+### Core Modules
+- `src/models/` - Problem implementations organized by input structure:
+  - `graph/` - Graph-input problems
+  - `formula/` - Boolean formulas and circuits
+  - `set/` - Set systems (universe + subsets)
+  - `algebraic/` - Matrices, linear systems, lattices
+  - `misc/` - Unique input structures
+  - Run `pred list` for the full catalog of problems, variants, and reductions; `pred show ` for details on a specific problem
+- `src/rules/` - Reduction rules + inventory registration
+- `src/solvers/` - BruteForce solver for aggregate values plus witness recovery when supported, ILP solver (feature-gated, witness-only). To check if a problem supports ILP solving via a witness-capable reduction path, run `pred path ILP`
+- `src/traits.rs` - `Problem` trait
+- `src/rules/traits.rs` - `ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult` traits
+- `src/registry/` - Compile-time reduction metadata collection
+- `problemreductions-cli/` - `pred` CLI tool (separate crate in workspace)
+- `src/unit_tests/` - Unit test files (mirroring `src/` structure, referenced via `#[path]`)
+- `tests/main.rs` - Integration tests (modules in `tests/suites/`); example tests use `include!` for direct invocation (no subprocess)
+- `tests/data/` - Ground truth JSON for integration tests
+- `scripts/` - Python test data generation scripts (managed with `uv`)
+- `docs/plans/` - Implementation plans
+
+### Trait Hierarchy
+
+```
+Problem (core trait — all problems must implement)
+│
+├── const NAME: &'static str // e.g., "MaximumIndependentSet"
+├── type Value: Clone // aggregate value: Max/Min/Sum/Or/And/Extremum/...
+├── fn dims(&self) -> Vec // config space: [2, 2, 2] for 3 binary variables
+├── fn evaluate(&self, config) -> Value
+├── fn variant() -> Vec<(&str, &str)> // e.g., [("graph","SimpleGraph"), ("weight","i32")]
+├── fn num_variables(&self) -> usize // default: dims().len()
+└── fn problem_type() -> ProblemType // catalog bridge: registry lookup by NAME
+```
+
+**Witness-capable objective problems** (e.g., `MaximumIndependentSet`) typically use `Value = Max`, `Min`, or `Extremum`.
+
+**Witness-capable feasibility problems** (e.g., `Satisfiability`) typically use `Value = Or`.
+
+**Aggregate-only problems** use fold values such as `Sum` or `And`; these solve to a value but have no representative witness configuration.
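The trait shape sketched in the hierarchy above can be illustrated with a tiny, self-contained stand-in. Everything below (the trimmed trait, the toy problem, the brute-force loop) is a hypothetical sketch, not the crate's actual API: the real `Problem` trait has more methods and wraps results in aggregate value types such as `Max`.

```rust
// Minimal stand-in for the Problem trait described above (hypothetical).
trait Problem {
    const NAME: &'static str;
    // Config space: one entry per variable, giving its domain size.
    fn dims(&self) -> Vec<usize>;
    // Plain i64 stands in for an aggregate value like Max<i64>; None = infeasible.
    fn evaluate(&self, config: &[usize]) -> Option<i64>;
}

// Toy problem: independent set with one binary variable per vertex.
struct ToyIndependentSet {
    edges: Vec<(usize, usize)>,
    n: usize,
}

impl Problem for ToyIndependentSet {
    const NAME: &'static str = "ToyIndependentSet";
    fn dims(&self) -> Vec<usize> {
        vec![2; self.n] // binary domain {0, 1} for each vertex
    }
    fn evaluate(&self, config: &[usize]) -> Option<i64> {
        // Infeasible if both endpoints of any edge are selected.
        for &(u, v) in &self.edges {
            if config[u] == 1 && config[v] == 1 {
                return None;
            }
        }
        Some(config.iter().map(|&b| b as i64).sum())
    }
}

// Brute force: enumerate the full config space given by dims(), keep the max.
fn brute_force_max<P: Problem>(p: &P) -> Option<i64> {
    let dims = p.dims();
    let total: usize = dims.iter().product();
    let mut best = None;
    for mut idx in 0..total {
        // Decode idx into a mixed-radix configuration.
        let config: Vec<usize> = dims
            .iter()
            .map(|&d| {
                let digit = idx % d;
                idx /= d;
                digit
            })
            .collect();
        if let Some(v) = p.evaluate(&config) {
            best = Some(best.map_or(v, |b: i64| b.max(v)));
        }
    }
    best
}

fn main() {
    // Path graph 0-1-2: the maximum independent set is {0, 2}, size 2.
    let p = ToyIndependentSet { edges: vec![(0, 1), (1, 2)], n: 3 };
    println!("{:?}", brute_force_max(&p)); // prints Some(2)
}
```

The same enumeration skeleton extends to non-binary domains, since `dims()` rather than a hard-coded bit width drives the loop.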
+
+Common aggregate wrappers live in `src/types.rs`:
+```rust
+Max, Min, Sum, Or, And, Extremum, ExtremumSense
+```
+
+### Key Patterns
+- `variant_params!` macro implements `Problem::variant()` — e.g., `crate::variant_params![G, W]` for two type params, `crate::variant_params![]` for none (see `src/variant.rs`)
+- `declare_variants!` proc macro registers concrete type instantiations with best-known complexity and registry-backed load/serialize/value-solve/witness-solve metadata. One entry per problem may be marked `default`, and variable names in complexity strings are validated at compile time against actual getter methods.
+- Problems parameterized by graph type `G` and optionally weight type `W` (problem-dependent)
+- `Solver::solve()` computes the aggregate value for any `Problem` whose `Value` implements `Aggregate`
+- `BruteForce::find_witness()` / `find_all_witnesses()` recover witnesses only when `P::Value::supports_witnesses()`
+- `ReductionResult` provides `target_problem()` and `extract_solution()` for witness/config workflows; `AggregateReductionResult` provides `extract_value()` for aggregate/value workflows
+- CLI-facing dynamic formatting uses aggregate wrapper names directly (for example `Max(2)`, `Min(None)`, `Or(true)`, or `Sum(56)`)
+- Graph types: SimpleGraph, PlanarGraph, BipartiteGraph, UnitDiskGraph, KingsSubgraph, TriangularSubgraph
+- Weight types: `One` (unit weight marker), `i32`, `f64` — all implement `WeightElement` trait
+- `WeightElement` trait: `type Sum: NumericSize` + `fn to_sum(&self)` — converts weight to a summable numeric type
+- Weight management via inherent methods (`weights()`, `set_weights()`, `is_weighted()`), not traits
+- `NumericSize` supertrait bundles common numeric bounds (`Clone + Default + PartialOrd + Num + Zero + Bounded + AddAssign + 'static`)
+
+### Overhead System
+Reduction overhead is expressed using `Expr` AST (in `src/expr.rs`) with the `#[reduction]` macro. The `overhead` attribute is **required** — omitting it is a compile error:
+```rust
+#[reduction(overhead = {
+    num_vertices = "num_vertices + num_clauses",
+    num_edges = "3 * num_clauses",
+})]
+impl ReduceTo for Source { ... }
+```
+- Expression strings are parsed at compile time by a Pratt parser in the proc macro crate
+- Variable names are validated against actual getter methods on the source type — typos cause compile errors
+- Each problem type provides inherent getter methods (e.g., `num_vertices()`, `num_edges()`) that the overhead expressions reference
+- `ReductionOverhead` stores `Vec<(&'static str, Expr)>` — field name to symbolic expression mappings
+- `ReductionEntry` has both symbolic (`overhead_fn`) and compiled (`overhead_eval_fn`) evaluation — the compiled version calls getters directly
+- `VariantEntry` has both a complexity string and compiled `complexity_eval_fn` — same pattern
+- Expressions support: constants, variables, `+`, `-`, `*`, `/`, `^`, `exp()`, `log()`, `sqrt()`, `factorial()`
+- Complexity strings must use **concrete numeric values only** (e.g., `"2^(2.372 * num_vertices / 3)"`, not `"2^(omega * num_vertices / 3)"`)
+- `Expr::parse()` provides runtime parsing for cross-check tests that compare compiled vs symbolic evaluation
+
+### Problem Names
+Problem types use explicit optimization prefixes (`Maximum...`, `Minimum...`) or no prefix. Run `pred list` for the full catalog. Common aliases (e.g., `MIS` → `MaximumIndependentSet`, `MVC` → `MinimumVertexCover`) are shown in the `Aliases` column.
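The verification habit this overhead system implies (declare the target sizes symbolically, then confirm them on concrete instances) can be sketched as follows. `Source`, `Target`, and the gadget construction below are invented placeholders that merely echo the two formulas in the example above; they are not the crate's actual reduction.

```rust
// Hypothetical source/target size records; the real crate derives these
// from inherent getter methods validated at compile time.
struct Source { num_vertices: usize, num_clauses: usize }
struct Target { num_vertices: usize, num_edges: usize }

// Placeholder reduction: keep the source vertices, add one gadget vertex
// per clause, and attach each gadget vertex with three edges.
fn reduce(src: &Source) -> Target {
    let mut edges: Vec<(usize, usize)> = Vec::new();
    for c in 0..src.num_clauses {
        let gadget = src.num_vertices + c;
        for k in 0..3 {
            edges.push((gadget, k % src.num_vertices));
        }
    }
    Target {
        num_vertices: src.num_vertices + src.num_clauses,
        num_edges: edges.len(), // count what was actually built
    }
}

fn main() {
    let src = Source { num_vertices: 4, num_clauses: 2 };
    let tgt = reduce(&src);
    // Cross-check constructed sizes against the declared formulas:
    //   num_vertices = "num_vertices + num_clauses"
    //   num_edges    = "3 * num_clauses"
    assert_eq!(tgt.num_vertices, src.num_vertices + src.num_clauses);
    assert_eq!(tgt.num_edges, 3 * src.num_clauses);
    println!("overhead formulas match");
}
```

The point of counting `edges.len()` rather than recomputing the formula is that the check fails if the construction and the declaration ever drift apart.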
+ +### Problem Variants +Reduction graph nodes use variant key-value pairs from `Problem::variant()`: +- Base: `MaximumIndependentSet` (empty variant = defaults) +- Graph variant: `MaximumIndependentSet {graph: "KingsSubgraph", weight: "One"}` +- Weight variant: `MaximumIndependentSet {graph: "SimpleGraph", weight: "f64"}` +- Default variant ranking: `SimpleGraph`, `One`, `KN` are considered default values; variants with the most default values sort first +- Nodes come exclusively from `#[reduction]` registrations; natural edges between same-name variants are inferred from the graph/weight subtype partial order +- Each primitive reduction is determined by the exact `(source_variant, target_variant)` endpoint pair +- Reduction edges carry `EdgeCapabilities { witness, aggregate }`; graph search defaults to witness mode, and aggregate mode is available through `ReductionMode::Aggregate` +- `#[reduction]` accepts only `overhead = { ... }` and currently registers witness/config reductions; aggregate-only edges require manual `ReductionEntry` registration with `reduce_aggregate_fn` + +### Extension Points +- New models register dynamic load/serialize/brute-force dispatch through `declare_variants!` in the model file, not by adding manual match arms in the CLI +- Aggregate-only models are first-class in `declare_variants!`; aggregate-only reduction edges still need manual `ReductionEntry` wiring because `#[reduction]` only registers witness/config reductions today +- Exact registry dispatch lives in `src/registry/`; alias resolution and partial/default variant resolution live in `problemreductions-cli/src/problem_name.rs` +- `pred create` UX lives in `problemreductions-cli/src/commands/create.rs` +- Canonical paper and CLI examples live in `src/example_db/model_builders.rs` and `src/example_db/rule_builders.rs` + +## Conventions + +### File Naming +- Reduction files: `src/rules/_.rs` (e.g., `maximumindependentset_qubo.rs`) +- Model files: `src/models//.rs` — category is 
by input structure: `graph/` (graph input), `formula/` (boolean formula/circuit), `set/` (universe + subsets), `algebraic/` (matrix/linear system/lattice), `misc/` (other)
+- Canonical examples: builder functions in `src/example_db/rule_builders.rs` and `src/example_db/model_builders.rs`
+- Example binaries in `examples/`: utility/export tools and pedagogical demos only (not per-reduction files)
+- Test naming: `test_<source>_to_<target>_closed_loop`
+
+### Paper (docs/paper/reductions.typ)
+- `problem-def(name)[def][body]` — defines a problem with an auto-generated schema, reductions list, and label. Title comes from the `display-name` dict.
+- `reduction-rule(source, target, example: bool, ...)[rule][proof]` — generates a theorem with an auto-generated label and registers it in the `covered-rules` state. Overhead is auto-derived from JSON edge data.
+- Every directed reduction needs its own `reduction-rule` entry
+- Completeness warnings auto-check that all JSON graph nodes/edges are covered in the paper
+- The `display-name` dict maps `ProblemName` to display text
+
+## Testing Requirements
+
+**Reference implementations — read these first:**
+- **Reduction test:** `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` — closed-loop pattern
+- **Model test:** `src/unit_tests/models/graph/maximum_independent_set.rs` — evaluation, serialization
+- **Solver test:** `src/unit_tests/solvers/brute_force.rs` — aggregate `solve()` plus witness recovery helpers
+- **Trait definitions:** `src/traits.rs` (`Problem`), `src/solvers/mod.rs` (`Solver`)
+
+### Coverage
+
+New code must have >95% test coverage. Run `make coverage` to check.
+
+### Naming
+
+- Reduction tests: `test_<source>_to_<target>_closed_loop`
+- Model tests: descriptive names — e.g., `test_<model>_creation`, `test_<model>_evaluate_*`, `test_<model>_direction`, `test_<model>_solver`, `test_<model>_serialization`. Use whichever are relevant; there is no fixed per-model naming set.
+- Solver tests: `test_<solver>_<case>`
+
+### Key Testing Patterns
+
+See Key Patterns above for solver API signatures. 
Follow the reference files for exact usage. + +### File Organization + +Unit tests in `src/unit_tests/` linked via `#[path]` (see Core Modules above). Integration tests in `tests/suites/`, consolidated through `tests/main.rs`. Canonical example-db coverage lives in `src/unit_tests/example_db.rs`. + +Model review automation checks for a dedicated test file under `src/unit_tests/models/...` with at least 3 test functions. The exact split of coverage is judged per model during review. + +## Documentation Locations +- `README.md` — Project overview and quickstart +- `.claude/` — Claude Code instructions and skills +- `docs/book/` — mdBook user documentation (built with `make doc`) +- `docs/paper/reductions.typ` — Typst paper with problem definitions and reduction theorems +- `src/example_db/` — Canonical model/rule examples: `model_builders.rs`, `rule_builders.rs` (in-memory builders), `specs.rs` (per-module invariant specs), consumed by `pred create --example` and paper exports +- `examples/` — Export utilities, graph-analysis helpers, and pedagogical demos + +## Documentation Requirements + +**Reference:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` to see a complete problem-def + reduction-rule example. + +### Adding a Problem Definition + +```typst +#problem-def("ProblemName")[ + Mathematical definition... +][ + Background, examples, algorithms... +] +``` + +Also add to the `display-name` dictionary: +```typst +"ProblemName": [Problem Name], +``` + +### Adding a Reduction Theorem + +```typst +#reduction-rule("Source", "Target", + example: true, + example-caption: [caption text], +)[ + Rule statement... +][ + Proof sketch... +] +``` + +Every directed reduction in the graph needs its own `reduction-rule` entry. The paper auto-checks completeness against the generated `reduction_graph.json` export. 
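For intuition, the closed-loop pattern referenced in the Testing Requirements can be sketched outside Rust. This is an illustrative Python miniature of MinimumVertexCover → MaximumIndependentSet on one small graph — the real tests go through the `ReduceTo` and `Solver` traits and the brute-force solver instead:

```python
from itertools import combinations

# One small instance: a 4-cycle 0-1-2-3-0 plus the chord 0-2.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

def is_cover(s):
    return all(u in s or v in s for u, v in edges)

def is_independent(s):
    return all(not (u in s and v in s) for u, v in edges)

def subsets():
    for k in range(n + 1):
        yield from (set(c) for c in combinations(range(n), k))

# 1. Solve the *target* problem (maximum independent set) by brute force.
best_is = max((s for s in subsets() if is_independent(s)), key=len)

# 2. Extract a source witness: the complement of an independent set is a cover.
extracted = set(range(n)) - best_is
assert is_cover(extracted)

# 3. Close the loop: cross-check optimality by brute force on the *source*.
min_cover = min((s for s in subsets() if is_cover(s)), key=len)
assert len(extracted) == len(min_cover)
```

The three steps — solve target, extract witness, cross-check against brute force on the source — are the same loop the Rust closed-loop tests perform.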
+ +## Complexity Verification Requirements + +### Variant Worst-Case Complexity (`declare_variants!`) +The complexity string represents the **worst-case time complexity of the best known algorithm** for that problem variant. To verify correctness: +1. Identify the best known exact algorithm for the problem (name, author, year, citation) +2. Confirm the worst-case time bound from the original paper or a survey +3. Check that polynomial-time problems (e.g., MaximumMatching, 2-SAT, 2-Coloring) are NOT declared with exponential complexity +4. For NP-hard problems, verify the base of the exponential matches the literature (e.g., 1.1996^n for MIS, not 2^n) +5. Use only concrete numeric values — no symbolic constants (epsilon, omega); inline the actual numbers with citations +6. Variable names must match getter methods on the problem type (enforced at compile time) + +### Reduction Overhead (`#[reduction(overhead = {...})]`) +Overhead expressions describe how target problem size relates to source problem size. To verify correctness: +1. Read the `reduce_to()` implementation and count the actual output sizes +2. Check that each field (e.g., `num_vertices`, `num_edges`, `num_sets`) matches the constructed target problem +3. Watch for common errors: universe elements mismatch (edge indices vs vertex indices), worst-case edge counts in intersection graphs (quadratic, not linear), constant factors in circuit constructions +4. Test with concrete small instances: construct a source problem, run the reduction, and compare target sizes against the formula +5. 
Ensure there is only one primitive reduction registration for each exact source/target variant pair; wrap shared helpers instead of registering duplicate endpoints From a2b7c0bc2d029162102a862f78ad784588343f39 Mon Sep 17 00:00:00 2001 From: zazabap Date: Mon, 6 Apr 2026 05:45:30 +0000 Subject: [PATCH 22/23] chore: remove docs/superpowers/specs/ directory Co-Authored-By: Claude Opus 4.6 (1M context) --- .../2026-03-24-tier3-ilp-reductions-design.md | 220 ------------------ ...6-04-01-adversarial-verification-design.md | 145 ------------ ...01-verify-reduction-enhancements-design.md | 174 -------------- 3 files changed, 539 deletions(-) delete mode 100644 docs/superpowers/specs/2026-03-24-tier3-ilp-reductions-design.md delete mode 100644 docs/superpowers/specs/2026-04-01-adversarial-verification-design.md delete mode 100644 docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md diff --git a/docs/superpowers/specs/2026-03-24-tier3-ilp-reductions-design.md b/docs/superpowers/specs/2026-03-24-tier3-ilp-reductions-design.md deleted file mode 100644 index cbdfc6bc2..000000000 --- a/docs/superpowers/specs/2026-03-24-tier3-ilp-reductions-design.md +++ /dev/null @@ -1,220 +0,0 @@ -# Tier 3 ILP Reductions — Design Spec - -**Date:** 2026-03-24 -**Scope:** One PR adding 39 `→ ILP` reductions for Tier 3 orphan problems, plus a shared helper module. -**Deferred:** PartialFeedbackEdgeSet (no polynomial-size correct ILP for L < n), RootedTreeArrangement (compound `vec![n; 2*n]` config too complex for batch). -**Tracking issue:** #762 (DefaultSolver classification) - ---- - -## Goal - -Connect 39 of 41 isolated Tier 3 problem types to the reduction graph via direct ILP reductions. Two problems (PartialFeedbackEdgeSet, RootedTreeArrangement) are deferred to separate issues due to formulation complexity. - -## Deliverables - -1. `src/rules/ilp_helpers.rs` — shared linearization helpers (with unit tests) -2. 
39 new reduction files `src/rules/<problem>_ilp.rs` (feature-gated under `#[cfg(feature = "ilp-solver")]`)
-3. 39 entries in `src/rules/mod.rs`: module declarations + `canonical_rule_example_specs()` aggregation
-4. 39 closed-loop tests in corresponding unit test files
-5. 39 `reduction-rule` entries in `docs/paper/reductions.typ`
-6. Updated #762 body (move Tier 3 → Tier 1)
-
----
-
-## Problem Classification
-
-### Value types (optimization vs satisfaction)
-
-**Optimization** (`Min`/`Max` — use `assert_optimization_round_trip_from_optimization_target`):
-- BottleneckTravelingSalesman (`Min`), MinimumTardinessSequencing (`Min`),
-  QuadraticAssignment (`Min`), BMF (`Min`), PaintShop (`Min`),
-  SteinerTreeInGraphs (`Min`)
-
-**Satisfaction** (`Or` — use `assert_satisfaction_round_trip` or satisfaction variant):
-- All other 33 problems
-
-### Config-space encodings requiring non-trivial `extract_solution`
-
-| Encoding | Problems | Extraction strategy |
-|----------|----------|-------------------|
-| **Lehmer code** `[n, n-1, ..., 1]` | FlowShopScheduling, MinimumTardinessSequencing, SequencingToMinimizeWeightedTardiness | Sort jobs by ILP completion times → derive permutation → convert to Lehmer code |
-| **Vertex permutation** `vec![n; n]` | HamiltonianPath, OptimalLinearArrangement, ConsecutiveBlockMinimization, AcyclicPartition | One-hot decode: for each position/vertex, find the 1 in the assignment row |
-| **Arc permutation** `vec![m; m]` | StackerCrane | Position-assignment decode: for each position, find the selected arc |
-| **Injection** `vec![m; k]` | SubgraphIsomorphism, QuadraticAssignment | One-hot decode per source element |
-| **Parent array** `vec![n; n]` | RootedTreeStorageAssignment | Decode parent-selection one-hot matrix → parent index per node |
-| **Bijection** `vec![n; n]` | IsomorphicSpanningTree | One-hot decode tree-vertex → graph-vertex |
-| **Compound** `vec![n; 2*n]` | *(RootedTreeArrangement — deferred)* | — |
-| **Binary** `vec![2; ...]` | 
All others | Direct identity or first-k prefix extraction |
-| **Ternary** `vec![3; num_edges]` | RuralPostman | Integer flow variable → clamp to {0,1,2} per edge |
-
----
-
-## Phase 0: Helper Module
-
-**File:** `src/rules/ilp_helpers.rs`
-
-Seven helper functions returning `Vec<LinearConstraint>` (or a single `LinearConstraint`):
-
-```rust
-/// McCormick linearization: y = x_a * x_b (both binary).
-/// Returns 3 constraints: y ≤ x_a, y ≤ x_b, y ≥ x_a + x_b - 1.
-pub fn mccormick_product(y_idx: usize, x_a: usize, x_b: usize) -> Vec<LinearConstraint>
-
-/// MTZ topological ordering for directed arcs.
-/// For each arc (u→v): o_v - o_u ≥ 1 - M*(x_u + x_v).
-/// When both x_u=0, x_v=0 (both kept): enforces o_v > o_u.
-/// When either x_u=1 or x_v=1 (removed): constraint is slack.
-/// Also emits bound constraints: x_i ≤ 1, 0 ≤ o_i ≤ n-1.
-/// Matches the pattern in minimumfeedbackvertexset_ilp.rs.
-pub fn mtz_ordering(
-    arcs: &[(usize, usize)],
-    n: usize,
-    x_offset: usize,
-    o_offset: usize,
-) -> Vec<LinearConstraint>
-
-/// Flow conservation at each node.
-/// For each node u: Σ_{(u,v)} f_{uv} - Σ_{(v,u)} f_{vu} = demand[u].
-pub fn flow_conservation(
-    arcs: &[(usize, usize)],
-    num_nodes: usize,
-    flow_idx: &dyn Fn(usize) -> usize,
-    demand: &[f64],
-) -> Vec<LinearConstraint>
-
-/// Big-M activation: f ≤ M * y. Single constraint.
-pub fn big_m_activation(f_idx: usize, y_idx: usize, big_m: f64) -> LinearConstraint
-
-/// Absolute value linearization: |a - b| ≤ z.
-/// Returns 2 constraints: a - b ≤ z, b - a ≤ z.
-pub fn abs_diff_le(a_idx: usize, b_idx: usize, z_idx: usize) -> Vec<LinearConstraint>
-
-/// Minimax: z ≥ expr_i for each expression.
-/// Each expr is a list of (var_idx, coeff) terms.
-pub fn minimax_constraints(z_idx: usize, expr_terms: &[Vec<(usize, f64)>]) -> Vec<LinearConstraint>
-
-/// One-hot to index extraction: given n*k binary assignment vars,
-/// decode position p → value v where x_{v,p} = 1.
-/// Shared by all permutation/assignment-based reductions. 
-pub fn one_hot_decode(solution: &[usize], num_items: usize, num_slots: usize, var_offset: usize) -> Vec<usize>
-```
-
-The helper module gets its own unit tests verifying constraint correctness.
-
-No new types introduced. Existing Tier 1/2 reductions are **not** refactored — helpers are used only by new Tier 3 code.
-
----
-
-## Phase 1: Flow-based (9 reductions)
-
-| Problem | Value | ILP type | Variables | Key constraints | Helpers | Extract |
-|---------|-------|----------|-----------|-----------------|---------|---------|
-| IntegralFlowHomologousArcs | `Or` | `i32` | Integer f_a per arc | Capacity, conservation, homologous equality, requirement | `flow_conservation` | Direct (f_a values) |
-| IntegralFlowWithMultipliers | `Or` | `i32` | Integer f_a per arc | Capacity, modified conservation (multiplier factors), requirement | `flow_conservation` | Direct |
-| PathConstrainedNetworkFlow | `Or` | `i32` | Integer f_p per allowed path | Capacity aggregation per arc, flow requirement | — | Direct |
-| DisjointConnectingPaths | `Or` | `bool` | Binary f^k_{uv} per commodity per arc | Conservation per commodity, vertex-disjointness (Σ_k ≤ 1 at non-terminals) | `flow_conservation` | Reconstruct edge selection from flow variables |
-| LengthBoundedDisjointPaths | `Or` | `i32` | Binary f^k_{uv} + integer hop h^k_v per commodity | Conservation, disjointness, hop count h^k_v ≤ L per commodity | `flow_conservation` | Reconstruct edge selection from flow variables |
-| MixedChinesePostman | `Or` | `i32` | Integer traversal t_a + binary orientation d_e | Euler balance (in = out), required edge/arc coverage ≥ 1 | `flow_conservation` | Direct (traversal counts) |
-| RuralPostman | `Or` | `i32` | Integer t_e ∈ {0,1,2} per edge (traversal multiplicity) | Required edge coverage (t_e ≥ 1), Euler balance (even degree at each vertex), connectivity via flow, total cost ≤ bound | `flow_conservation`, `big_m_activation` | Direct (t_e values map to `dims() = vec![3; num_edges]`) |
-| 
StackerCrane | `Or` | `i32` | Binary x_{a,k} (arc a at position k) + shortest-path cost auxiliaries | Position-assignment (each position gets one required arc, each arc used once), inter-arc connection cost via precomputed shortest paths, total ≤ bound | `big_m_activation` | One-hot decode → arc permutation (`dims() = vec![m; m]`) | -| SteinerTreeInGraphs | `Min` | `bool` | Binary y_e + multi-commodity flow f^t_{uv} | Conservation, capacity linking (same pattern as SteinerTree→ILP); minimize Σ w_e·y_e | `flow_conservation`, `big_m_activation` | Direct (edge selection) | - ---- - -## Phase 2: Scheduling (7 reductions) - -All scheduling problems with Lehmer-code configs share a common extraction pattern: ILP ordering variables → sort to get permutation → convert permutation to Lehmer code. - -| Problem | Value | ILP type | Variables | Key constraints | Helpers | Extract | -|---------|-------|----------|-----------|-----------------|---------|---------| -| FlowShopScheduling | `Or` | `i32` | Binary y_{ij} (job i before j) + integer C_{jm} (completion on machine m) | Machine precedence: C_{j,m+1} ≥ C_{j,m} + p_{j,m+1}; ordering via big-M; makespan ≤ deadline | `big_m_activation` | Completion times → sort → Lehmer code | -| MinimumTardinessSequencing | `Min` | `i32` | Binary y_{ij} + integer C_j | Ordering via big-M, precedence constraints; objective: minimize Σ tardy_j (binary indicators for C_j > d_j) | `big_m_activation` | Completion times → sort → Lehmer code | -| ResourceConstrainedScheduling | `Or` | `bool` | Binary x_{jt} (job j starts at time t) | One start per job, precedence, resource capacity per period, deadline | — | Time-indexed decode → Lehmer code | -| SequencingToMinimizeMaximumCumulativeCost | `Or` | `i32` | Binary y_{ij} + integer C_j | Ordering via big-M; cumulative cost ≤ bound (feasibility, not minimax) | `big_m_activation` | Completion times → sort → Lehmer code | -| SequencingToMinimizeWeightedTardiness | `Or` | `i32` | Binary y_{ij} + integer 
C_j | Ordering via big-M; Σ w_j * max(0, C_j - d_j) ≤ bound (feasibility) | `big_m_activation` | Completion times → sort → Lehmer code | -| SequencingWithReleaseTimesAndDeadlines | `Or` | `bool` | Binary x_{jt} (job j at time t) | Release: no start before r_j, deadline: finish by d_j, non-overlap | — | Time-indexed decode → Lehmer code | -| TimetableDesign | `Or` | `bool` | Binary x_{c,t,h} (craftsman c, task t, period h) | Craftsman exclusivity, task exclusivity, requirement satisfaction | — | Direct (binary) | - ---- - -## Phase 3: Position/Assignment + McCormick (6 reductions) - -| Problem | Value | ILP type | Variables | Key constraints | Helpers | Extract | -|---------|-------|----------|-----------|-----------------|---------|---------| -| HamiltonianPath | `Or` | `bool` | Binary x_{v,k} (vertex v at position k) | Row/column assignment, adjacency: McCormick for consecutive pairs | `mccormick_product` | One-hot decode → vertex permutation (`dims() = vec![n; n]`) | -| BottleneckTravelingSalesman | `Min` | `i32` | Binary x_{v,k} + integer z (bottleneck) | TSP assignment + z ≥ w(u,v) for each used edge (McCormick); minimize z | `mccormick_product`, `minimax_constraints` | Edge selection from assignment matrix (`dims() = vec![2; num_edges]`) | -| LongestCircuit | `Or` | `bool` | Binary y_e (edge selection) + binary s_v (vertex on circuit) + flow vars | Degree: Σ_{e∋v} y_e = 2·s_v; size: Σ y_e ≥ 3; connectivity via root-flow on selected edges; length: Σ w_e·y_e ≥ B | `flow_conservation` | Direct (y_e binary edge vector, `dims() = vec![2; num_edges]`) | -| QuadraticAssignment | `Min` | `bool` | Binary x_{i,p} (facility i at location p) | Assignment + McCormick for x_{i,p}·x_{j,q}; minimize Σ C_{ij}·D_{f(i),f(j)} | `mccormick_product` | One-hot decode → facility-to-location injection (`dims() = vec![num_locations; num_facilities]`) | -| OptimalLinearArrangement | `Or` | `i32` | Binary x_{v,p} + integer z_{uv} per edge | Assignment + z_{uv} ≥ |π(u)-π(v)| via abs_diff; 
Σ z_{uv} ≤ bound | `abs_diff_le` | One-hot decode → vertex-to-position (`dims() = vec![n; n]`) | -| SubgraphIsomorphism | `Or` | `bool` | Binary x_{v,u} (pattern v → host u) | Injection (each pattern vertex maps to exactly 1 host vertex, each host vertex used ≤ 1 time) + edge preservation: for each pattern edge (v,w) and host non-edge (u,u'), x_{v,u} + x_{w,u'} ≤ 1 (no McCormick needed) | — | One-hot decode → injection (`dims() = vec![n_host; n_pattern]`) | - ---- - -## Phase 4: Graph structure (7 reductions) - -| Problem | Value | ILP type | Variables | Key constraints | Helpers | Extract | -|---------|-------|----------|-----------|-----------------|---------|---------| -| AcyclicPartition | `Or` | `i32` | Binary x_{v,c} (vertex v in class c) + integer o_c (class ordering) + binary s_{uv,c} (same-class indicators per arc per class) | Assignment (Σ_c x_{v,c} = 1); weight bound per class; cost bound on inter-class arcs; same-class: s_{uv,c} via McCormick on x_{u,c}·x_{v,c}; DAG: for each arc (u→v), o_v_class - o_u_class ≥ 1 - M·Σ_c s_{uv,c} | `mccormick_product` | One-hot decode x_{v,c} → partition label (`dims() = vec![n; n]`) | -| BalancedCompleteBipartiteSubgraph | `Or` | `bool` | Binary x_v (side A), y_v (side B) | Balance: Σx = Σy = k; completeness: McCormick for x_u·y_v on non-edges | `mccormick_product` | Direct (binary) | -| BicliqueCover | `Or` | `bool` | Binary z_{v,j} (vertex v in biclique j) | Biclique validity via McCormick, edge coverage | `mccormick_product` | Direct (binary) | -| BiconnectivityAugmentation | `Or` | `i32` | Binary y_e (add edge e) + flow vars for 2-vertex-connectivity | For each vertex v: removing v must leave graph connected. 
Formulated via flow: for each vertex v and each pair (s,t) of v's neighbors, unit flow from s to t avoiding v, through original + selected edges | `flow_conservation`, `big_m_activation` | Direct (binary edge selection, `dims() = vec![2; num_potential_edges]`) | -| BoundedComponentSpanningForest | `Or` | `i32` | Binary y_e (edge in forest) + integer label l_v (component root ID) + flow vars | Forest structure (no cycles via MTZ on directed version); component assignment via labels; per-component total vertex **weight** ≤ B (not size) | `flow_conservation`, `mtz_ordering` | Edge selection → component label decode (`dims() = vec![2; num_edges]` or label-based) | -| MinimumCutIntoBoundedSets | `Or` | `bool` | Binary x_v (partition side) + binary y_e (cut edge) | Balance: L ≤ Σx_v ≤ U; cut linking: y_e ≥ x_u - x_v and y_e ≥ x_v - x_u; Σ w_e·y_e ≤ bound | — | Direct (binary partition) | -| StrongConnectivityAugmentation | `Or` | `i32` | Binary y_a (add arc) + multi-commodity flow | For each ordered pair (s,t): unit flow from s to t through original + selected arcs | `flow_conservation`, `big_m_activation` | Direct (binary arc selection) | - ---- - -## Phase 5: Matrix/encoding (5 reductions) - -| Problem | Value | ILP type | Variables | Key constraints | Helpers | Extract | -|---------|-------|----------|-----------|-----------------|---------|---------| -| BMF | `Min` | `bool` | Binary a_{ik}, b_{kj} + auxiliary p_{ijk} (McCormick for a_{ik}·b_{kj}) + binary w_{ij} (reconstructed entry) | p_{ijk} via McCormick; w_{ij} ≥ p_{ijk} for all k (OR-of-ANDs); w_{ij} ≤ Σ_k p_{ijk}; minimize Σ |A_{ij} - w_{ij}| | `mccormick_product` | Direct (binary factor matrices) | -| ConsecutiveBlockMinimization | `Or` | `bool` | Binary x_{c,p} (column c at position p) + binary b_{r,p} (block start at row r, position p) | Column permutation (one-hot assignment); block detection: b_{r,p} activated when row r transitions 0→1 at position p; Σ blocks ≤ bound | — | One-hot decode → column 
permutation (`dims() = vec![num_cols; num_cols]`) | -| ConsecutiveOnesMatrixAugmentation | `Or` | `bool` | Binary x_{c,p} (column permutation) + binary f_{r,j} (flip entry r,j) | Permutation + consecutive-ones property after flips; minimize/bound total flips | — | One-hot decode → column permutation (`dims() = vec![num_cols; num_cols]`) | -| ConsecutiveOnesSubmatrix | `Or` | `bool` | Binary s_j (select column j) + auxiliary binary x_{c,p} (column permutation of selected columns) | Exactly K columns selected (Σ s_j = K); permutation of selected columns; C1P enforced on every row within selected+permuted columns. s_j at indices 0..num_cols (extracted directly). x_{c,p} are auxiliary. | — | Direct (s_j binary selection, `dims() = vec![2; num_cols]`) | -| SparseMatrixCompression | `Or` | `bool` | Binary x_{i,g} (row i in group g) | Row-to-group assignment (one group per row); compatibility: conflicting rows not in same group; num groups ≤ K | — | One-hot decode → group assignment | - ---- - -## Phase 6: Sequence/misc (5 reductions) - -| Problem | Value | ILP type | Variables | Key constraints | Helpers | Extract | -|---------|-------|----------|-----------|-----------------|---------|---------| -| ShortestCommonSupersequence | `Or` | `bool` | Binary x_{p,a} (position p has symbol a) + match vars m_{s,j,p} | Symbol assignment + monotone matching for each input string; total length ≤ bound | — | Symbol sequence extraction | -| StringToStringCorrection | `Or` | `bool` | Binary d_{i,j,op} (edit operation at alignment point) | Alignment grid + operation exclusivity + cost ≤ bound | — | Direct (binary operation selection) | -| PaintShop | `Min` | `bool` | Binary x_i (color for car i's first occurrence) + binary c_p (color-change indicator at position p) | Pairing: second occurrence gets 1-x_i; c_p ≥ color_p - color_{p-1} and c_p ≥ color_{p-1} - color_p; minimize Σ c_p | — | Direct (x_i binary, `dims() = vec![2; num_cars]`) | -| IsomorphicSpanningTree | `Or` | `bool` | Binary 
x_{u,v} (tree vertex u maps to graph vertex v) | Bijection: one-hot per tree vertex and per graph vertex; edge preservation: for each tree edge {u,w} and graph non-edge {v,z}, x_{u,v} + x_{w,z} ≤ 1 (no McCormick or flow needed — bijection preserving tree edges automatically produces a spanning tree) | — | One-hot decode → bijection (`dims() = vec![n; n]`) |
-| RootedTreeStorageAssignment | `Or` | `i32` | Binary p_{v,u} (node v's parent is u) + integer depth d_v | Tree structure: each non-root node has exactly one parent, acyclicity via depth ordering (d_v > d_u if u is parent of v), connectivity; per-subset path cost ≤ bound | — | One-hot parent decode → parent array (`dims() = vec![n; n]`) |
-
----
-
-## Testing Strategy
-
-- Each reduction gets one `test_<problem>_to_ilp_closed_loop` test
-- **Optimization problems** (BottleneckTSP, MinTardiness, QAP, BMF, PaintShop): use `assert_optimization_round_trip_from_optimization_target`
-- **Satisfaction problems** (all others): use the satisfaction round-trip variant
-- Test instances should be small enough for brute-force cross-check (n ≤ 6-8)
-- All tests in `src/unit_tests/rules/<problem>_ilp.rs`
-- Helper module gets standalone unit tests in `src/unit_tests/rules/ilp_helpers.rs`
-
-## Integration Checklist (per reduction)
-
-Each new reduction file requires:
-1. `#[cfg(feature = "ilp-solver")] pub(crate) mod <problem>_ilp;` in `src/rules/mod.rs`
-2. `specs.extend(<problem>_ilp::canonical_rule_example_specs());` in the `#[cfg(feature = "ilp-solver")]` block of `canonical_rule_example_specs()` in `src/rules/mod.rs`
-3. `#[reduction(overhead = { ... })]` with verified overhead expressions referencing source-type getter methods
-4. 
Closed-loop test + paper entry - -## Paper - -Each reduction gets a `reduction-rule` entry in `docs/paper/reductions.typ` with: -- Rule statement describing the formulation -- Proof sketch (variable layout, constraint count, correctness argument) -- Example flag set to `true` where pedagogically useful - -## Post-merge - -- Update #762 body: move 39 problems from Tier 3 to Tier 1 -- Close #728 (TimetableDesign→ILP) and #733 (IntegralFlowHomologousArcs→ILP) -- File separate issues for deferred problems: PartialFeedbackEdgeSet→ILP, RootedTreeArrangement→ILP diff --git a/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md b/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md deleted file mode 100644 index c42684f08..000000000 --- a/docs/superpowers/specs/2026-04-01-adversarial-verification-design.md +++ /dev/null @@ -1,145 +0,0 @@ -# Design: Adversarial Multi-Agent Reduction Verification - -**Date:** 2026-04-01 -**Goal:** Strengthen `/verify-reduction` with an adversarial second agent that independently verifies the reduction, catching systematic errors that the single-agent approach misses. - -## Problem Statement - -The current verification flow has a fundamental weakness: the same AI agent writes both the mathematical proof AND the verification script. If the agent systematically misunderstands the reduction (e.g., wrong variable mapping, wrong extraction direction), it will write a wrong proof and a wrong script that agree with each other. - -Evidence from PR #975: -- Python caught 4 bugs; Typst proofs caught 0; Lean caught 0 -- All bugs were construction errors or formula mistakes -- No bug was caused by "proof is right but script is wrong" — they were always "both are wrong in the same way" - -The adversarial approach addresses this by having two independent agents verify the same reduction. 
-
-## Architecture
-
-```
-/verify-reduction
-  │
-  ├── Steps 0-2: Parse input, create worktree, read issue, write Typst proof
-  │
-  ├── Step 3: Constructor Agent
-  │    └── Writes verify_<source>_<target>.py (7 sections, ≥5000 checks)
-  │
-  ├── Step 4: Run constructor script, iterate until 0 failures
-  │
-  ├── Step 4b: Adversary Agent (dispatched as subagent with worktree isolation)
-  │    ├── Input: theorem statement + construction steps ONLY
-  │    │   (stripped of constructor's script, internal reasoning, examples)
-  │    ├── Writes: adversary_<source>_<target>.py
-  │    │    ├── Independent reduce() implementation
-  │    │    ├── Independent extract_solution()
-  │    │    ├── Property-based testing (hypothesis, n up to 50)
-  │    │    ├── Targeted counterexample search
-  │    │    └── ≥5000 independent checks
-  │    └── Output: pass/fail + counterexamples
-  │
-  ├── Step 4c: Cross-Comparison
-  │    ├── constructor.reduce(x) vs adversary.reduce(x) for 1000 instances
-  │    ├── constructor.check(adversary.reduce(x)) vs adversary.check(constructor.reduce(x))
-  │    └── Disagreement → diagnose (real bug vs script bug)
-  │
-  ├── Steps 5-8: Self-review, report, commit, PR
-  │
-  └── Final PR includes BOTH scripts + cross-comparison results
-```
-
-## Adversary Agent Design
-
-### What the adversary receives
-
-A stripped prompt containing ONLY:
-- The theorem statement (1-3 sentences)
-- The construction steps (numbered)
-- The extraction procedure
-- The overhead formula
-
-The adversary does NOT receive:
-- The constructor's Python script
-- The Typst proof's correctness argument
-- The worked examples
-- Any internal reasoning from Step 3
-
-### What the adversary produces
-
-`adversary_<source>_<target>.py` with:
-
-1. **Independent `reduce()` function** — implemented from the construction steps, without seeing the constructor's implementation
-2. **Independent `extract_solution()` function** — same
-3. **Exhaustive testing** — forward + backward for n ≤ 5 (matching constructor's minimum)
-4. 
**Property-based testing** — `hypothesis` with `@given(st.integers(min_value=2, max_value=50))` for random instances -5. **Targeted counterexample search**: - - All-True / all-False assignments - - Single-variable instances - - Maximum and minimum clause/subset sizes - - Instances where source is barely feasible (one satisfying assignment) - - Instances where source is barely infeasible (off by one element) -6. **Diagnostic output** — any counterexample prints the full instance, expected result, actual result - -### Cross-comparison - -After both scripts pass independently, the orchestrator runs: - -```python -# Import both implementations -from verify_source_target import reduce as constructor_reduce, check as constructor_check -from adversary_source_target import reduce as adversary_reduce, check as adversary_check - -# 1. Do they produce the same target instance? -for instance in random_instances(1000): - t1 = constructor_reduce(instance) - t2 = adversary_reduce(instance) - assert t1 == t2, f"Reductions disagree on {instance}" - -# 2. 
Cross-check feasibility -for instance in random_instances(1000): - assert constructor_check(adversary_reduce(instance)) == adversary_check(constructor_reduce(instance)) -``` - -## Verdict Criteria - -| Constructor | Adversary | Cross-check | Verdict | -|-------------|-----------|-------------|---------| -| 0 fail | 0 fail | Agreement | **VERIFIED** | -| 0 fail | Counterexample found | — | Real bug (constructor has blind spot) | -| Fail | — | — | Bug found before adversary needed | -| 0 fail | 0 fail | Disagreement | One implementation is wrong — diagnose | - -## Why Lean is Dropped - -Lean's contribution to this project: -- **Caught bugs:** 0 out of 4 -- **Useful proofs:** `G ⊔ Gᶜ = ⊤` (1 genuinely meaningful theorem) -- **Trivial proofs:** `n + m = n + m`, `14m + (2m-n) + 2nK = 16m-n+2nK` (add no confidence) -- **Infrastructure gap:** No `ReduceTo`, `NAESatisfiability`, `SetSplitting`, or reduction framework in Mathlib -- **ROI:** Negative. Every Lean lemma requires duplicating problem definitions that already exist in Rust. - -The adversarial Python approach provides stronger evidence: -- Two independent implementations agreeing > one implementation + arithmetic Lean proofs -- Property-based testing catches scale-dependent bugs that Lean can't -- No Mathlib infrastructure needed - -Lean remains **optional** (nice for credibility) but is removed from the required quality gates. 
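Concretely, the cross-comparison mechanics can be sketched with two independently written `reduce()` functions. Everything below is illustrative — the construction shown is the standard SubsetSum → Partition padding, standing in for whichever reduction is under verification, and the function names are placeholders for the generated scripts:

```python
import random

# Two independently derived implementations of the same construction:
# pad the SubsetSum instance with one element of size |S - 2T|.
def constructor_reduce(sizes, target):
    total = sum(sizes)
    return sizes + [abs(total - 2 * target)]

def adversary_reduce(sizes, target):
    # Written a different way on purpose, mimicking an independent agent.
    s = sum(sizes)
    pad = 2 * target - s if 2 * target >= s else s - 2 * target
    return sizes + [pad]

def partition_feasible(sizes):
    # DP over reachable subset sums; feasible iff total/2 is reachable.
    total = sum(sizes)
    if total % 2:
        return False
    reachable = {0}
    for x in sizes:
        reachable |= {r + x for r in reachable}
    return total // 2 in reachable

rng = random.Random(0)
for _ in range(1000):
    sizes = [rng.randint(1, 20) for _ in range(rng.randint(1, 8))]
    target = rng.randint(0, sum(sizes))
    t1 = constructor_reduce(sizes, target)
    t2 = adversary_reduce(sizes, target)
    # 1. Same target instance from both implementations.
    assert t1 == t2, (sizes, target)
    # 2. Same feasibility verdict on each other's output.
    assert partition_feasible(t1) == partition_feasible(t2), (sizes, target)
```

A disagreement on any random instance prints the offending source instance, which is exactly the diagnostic signal the cross-comparison step feeds back to the orchestrator.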
- -## Changes to verify-reduction Skill - -| Step | Before | After | -|------|--------|-------| -| Step 3 | Constructor writes script | Unchanged | -| Step 4 | Iterate until 0 failures | Unchanged | -| **Step 4b** | — | Adversary agent writes independent script | -| **Step 4c** | — | Cross-comparison | -| Step 5 | Lean required | Lean optional | -| Quality gates | 1 script ≥5000 checks | 2 independent scripts ≥5000 each + cross-comparison | -| Deliverables | 3 files (Typst + Python + Lean) | 3 files (Typst + constructor Python + adversary Python) | - -## Cost - -- Compute: ~2x (two agents instead of one) -- Time: ~1.5x (adversary runs in parallel via subagent) -- Complexity: moderate (orchestrator manages two outputs) - -Worth it: a wrong reduction that passes review is far more expensive than 2x compute. diff --git a/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md b/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md deleted file mode 100644 index 1ea071e75..000000000 --- a/docs/superpowers/specs/2026-04-01-verify-reduction-enhancements-design.md +++ /dev/null @@ -1,174 +0,0 @@ -# Verify-Reduction Skill Enhancements - -**Date:** 2026-04-01 -**Status:** Draft -**Context:** Lessons from running `/verify-reduction` on 3 reductions (#841 NAE→SS, #973 SS→Partition, #198 VC→HC) and comparison with PR #975's verification approach. - -## Problem - -The verify-reduction skill (PR #979) produces high-quality mathematical verification (Typst proofs + dual Python scripts), but: - -1. **Typst ↔ Python matching is manual.** The human writes examples in both places and hopes they agree. No automated cross-check. -2. **Gap analysis is manual.** The skill asks the agent to list every Typst claim and map it to a test — this is done as a text block, not verified programmatically. -3. 
**Verification is disconnected from Rust implementation.** The Python scripts verify the math, but when `/add-rule` implements the Rust code, it re-reads the issue from scratch. Verified artifacts aren't consumed downstream. -4. **No compositional testing.** The strongest verification — testing via alternative `pred` reduction paths — only becomes possible after the Rust implementation exists. The current skill doesn't set this up. - -## Design - -### Pipeline Integration - -The verify-reduction skill is a **pre-verification gate** in a larger pipeline: - -``` -Issue → /verify-reduction → /add-rule → /review-pipeline - (Typst + Python) (Rust impl) (agentic test) -``` - -The Python `reduce()` function is the verified spec. `/add-rule` translates it to Rust. `/review-pipeline`'s agentic test (`pred reduce`, `pred solve`) confirms the Rust matches. - -### Enhancement 1: Typst ↔ Python Auto-Matching - -#### Example-level (mandatory) - -The constructor script exports test vectors to a JSON sidecar file: - -``` -docs/paper/verify-reductions/test_vectors__.json -``` - -Contents: -```json -{ - "source": "SubsetSum", - "target": "Partition", - "yes_instance": { - "input": {"sizes": [3, 5, 7, 1, 4], "target": 8}, - "output": {"sizes": [3, 5, 7, 1, 4, 4]}, - "source_feasible": true, - "target_feasible": true, - "source_solution": [0, 1, 1, 0, 0], - "extracted_solution": [0, 1, 1, 0, 0] - }, - "no_instance": { - "input": {"sizes": [3, 7, 11], "target": 5}, - "output": {"sizes": [3, 7, 11, 11]}, - "source_feasible": false, - "target_feasible": false - }, - "overhead": { - "num_elements": "num_elements + 1" - } -} -``` - -A new validation step (Step 4.5) loads both the test vectors JSON and the Typst file, then checks that key numerical values from the JSON (input sizes, targets, output sizes, feasibility verdicts) appear in the Typst YES/NO example sections. The check is substring-based on the rendered numbers — not a full Typst parser. 
This catches divergence between the proof and the scripts (e.g., Typst says `S = {3, 5, 7}` but Python tests `S = {3, 5, 8}`). - -The same JSON is consumed downstream: -- `/add-rule` reads it to generate Rust `#[test]` closed-loop cases -- `/review-pipeline`'s agentic test uses it for `pred reduce`/`pred solve` verification - -#### Claim-level (best-effort) - -The constructor script emits structured claim tags: - -```python -claim("overhead_universe_size", "2*n", verified=True) -claim("forward_case2", "T + d = Sigma - T", verified=True) -claim("extraction_opposite_side", "sigma < 2T => opposite side from padding", verified=True) -``` - -Implementation: a `claims` list accumulated during `main()`, exported alongside the test vectors. The self-review step checks: every claim keyword should plausibly map to a Typst proof section. Claims without `verified=True` get flagged. This replaces the manual gap analysis table. - -### Enhancement 2: Adversary Tailoring by Reduction Type - -The adversary prompt is currently generic. Tailor it by reduction type: - -| Type | Adversary Focus | Rationale | -|------|----------------|-----------| -| Identity (same graph, different objective) | Exhaustive enumeration, edge-case configs | Bugs hide in subtle objective differences | -| Algebraic (padding, complement, case split) | Symbolic case boundaries, off-by-one in padding | SS→Partition had 3 cases; boundary conditions are error-prone | -| Gadget (widget construction) | Widget structure, traversal patterns, connectivity | VC→HC: the 3 traversal patterns and cross-edge positions are the core invariant | - -The skill detects type from the issue description (gadget keywords: "widget", "component", "gadget"; algebraic keywords: "padding", "complement", "case"; identity: everything else) and adjusts the adversary prompt template. - -### Enhancement 3: Downstream Integration with add-rule - -When `/add-rule` is invoked for a reduction that has verified artifacts: - -1. 
**Check for existing verification:** Look for `docs/paper/verify-reductions/verify__.py` and `test_vectors__.json`. -2. **Read Python `reduce()` as pseudocode:** The verified Python function is the spec for the Rust `reduce_to()` implementation. The agent translates it directly. -3. **Read overhead from test vectors JSON:** The `overhead` field maps directly to `#[reduction(overhead = {...})]` expressions. -4. **Generate Rust tests from test vectors:** The YES/NO instances become closed-loop test cases: construct source → reduce → solve target → extract → verify. -5. **Read Typst proof for documentation:** The proof's Construction section becomes the doc comment on the Rust impl. - -This eliminates re-derivation from the issue. The verified artifacts are the single source of truth. - -### Enhancement 4: Compositional Verification in review-pipeline - -After the Rust implementation exists, review-pipeline's agentic test gains a compositional check: - -1. Load test vectors from `test_vectors__.json` -2. For each test vector, run `pred solve ` via the new path -3. If an alternative path exists (e.g., both `NAE→SS→ILP` and `NAE→ILP`), run via both and compare -4. Disagreement = bug in either the new reduction or the test vector - -This is the strongest verification because it tests the actual Rust code through independent code paths. It happens naturally in review-pipeline's existing agentic test step — we just need the test vectors file to drive it. - -## Changes to SKILL.md - -### New Step 4.5: Auto-Matching Validation - -After the constructor passes (Step 4), before the adversary (Step 5): - -1. Constructor exports `test_vectors__.json` with YES/NO instances, overhead, and claims -2. Parse the Typst proof for numerical values in example sections -3. Verify the Typst examples match the JSON test vectors -4. Verify all `claim()` tags have `verified=True` -5. 
Flag any untested claims - -### Updated Step 5: Typed Adversary Prompt - -The adversary prompt template includes a reduction-type section: - -- **Identity reductions:** "Focus on exhaustive enumeration of all source instances for n ≤ 6. Test every possible configuration." -- **Algebraic reductions:** "Focus on case boundary conditions. Test instances where the case selection changes (e.g., Σ = 2T exactly, Σ = 2T ± 1). Verify extraction logic for each case independently." -- **Gadget reductions:** "Focus on widget structure invariants. Verify each traversal pattern independently. Test that interior vertices have no external edges. Check connectivity after removing widgets." - -### Updated Step 8: Downstream Artifacts - -The commit includes the test vectors JSON alongside the Typst and Python files: - -```bash -git add docs/paper/verify-reductions/test_vectors__.json -``` - -The PR description notes: "Test vectors available for `/add-rule` to consume when implementing the Rust reduction." - -### Updated Integration Section - -``` -- **Before `/add-rule`**: `/verify-reduction` produces Typst proof + Python scripts + test vectors JSON -- **During `/add-rule`**: reads Python `reduce()` as pseudocode, overhead from JSON, generates Rust tests from test vectors -- **During `/review-pipeline`**: agentic test runs `pred reduce`/`pred solve` on test vector instances, compositional check via alternative paths if available -``` - -## Non-Goals - -- **No Rust code generation.** verify-reduction stays in Typst + Python. Rust translation is add-rule's job. -- **No manifest format.** The Python script + test vectors JSON are the artifacts. No intermediate spec language. -- **No Lean changes.** Lean remains optional and orthogonal to this enhancement. - -## Files Changed - -1. `.claude/skills/verify-reduction/SKILL.md` — new Step 4.5, updated Step 5 and Step 8 -2. `.claude/skills/add-rule/SKILL.md` — add "check for existing verification artifacts" to Step 1 -3. 
`.claude/skills/review-pipeline/SKILL.md` — add compositional test vector check to agentic test step (or document as optional enhancement) - -## Success Criteria - -After implementing these enhancements, running `/verify-reduction` → `/add-rule` → `/review-pipeline` on a new reduction should: - -1. Produce a test vectors JSON consumed by both add-rule and review-pipeline -2. Auto-detect Typst ↔ Python example divergence (if introduced deliberately as a test) -3. Generate Rust tests from the same test vectors that the Python scripts verified -4. Catch Rust implementation bugs via `pred reduce`/`pred solve` on those test vectors From 624f6d7cfba6046d39017486369ca1d89843e04e Mon Sep 17 00:00:00 2001 From: zazabap Date: Mon, 6 Apr 2026 15:41:49 +0000 Subject: [PATCH 23/23] refactor: integrate verify-reduction into add-rule pipeline - Delete /add-reduction skill and pre-commit hook (absorbed into /add-rule) - /add-rule now runs /verify-reduction by default; --no-verify to skip - /verify-reduction simplified: no worktree, no PR, no saved artifacts - /issue-to-pr passes --no-verify flag through to /add-rule - Update CLAUDE.md skill descriptions Co-Authored-By: Claude Opus 4.6 (1M context) --- .claude/CLAUDE.md | 5 +- .claude/hooks/add-reduction-precommit.sh | 49 ---- .claude/skills/add-reduction/SKILL.md | 334 ----------------------- .claude/skills/add-rule/SKILL.md | 75 ++++- .claude/skills/issue-to-pr/SKILL.md | 6 +- .claude/skills/verify-reduction/SKILL.md | 156 +++++------ 6 files changed, 131 insertions(+), 494 deletions(-) delete mode 100755 .claude/hooks/add-reduction-precommit.sh delete mode 100644 .claude/skills/add-reduction/SKILL.md diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index a3b771d5a..7ee38534c 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -9,8 +9,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. 
- [run-pipeline](skills/run-pipeline/SKILL.md) -- Pick a Ready issue from the GitHub Project board, move it through In Progress -> issue-to-pr -> Review pool. One issue at a time; forever-loop handles iteration. - [issue-to-pr](skills/issue-to-pr/SKILL.md) -- Convert a GitHub issue into a PR with an implementation plan. Default rule: one item per PR. Exception: a `[Model]` issue that explicitly claims direct ILP solvability should implement the model and its direct ` -> ILP` rule together; `[Rule]` issues still require both models to exist on `main`. - [add-model](skills/add-model/SKILL.md) -- Add a new problem model. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. -- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. -- [add-reduction](skills/add-reduction/SKILL.md) -- Add a new reduction rule from verified artifacts (Python reduce() + test vectors JSON from `/verify-reduction`). Use instead of `add-rule` when verification artifacts exist. +- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Runs mathematical verification by default (via `/verify-reduction`); pass `--no-verify` to skip for trivial reductions. Can be used standalone or called from `issue-to-pr`. - [review-structural](skills/review-structural/SKILL.md) -- Project-specific structural completeness check: model/rule checklists, build, semantic correctness, issue compliance. Read-only, no code changes. Called by `review-pipeline`. - [review-quality](skills/review-quality/SKILL.md) -- Generic code quality review: DRY, KISS, cohesion/coupling, test quality, HCI. Read-only, no code changes. Called by `review-pipeline`. - [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments, fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`). @@ -27,7 +26,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. 
- [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue. - [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold. - [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers. -- [verify-reduction](skills/verify-reduction/SKILL.md) -- End-to-end verification of a reduction rule: generates Typst proof (with YES+NO examples), Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5), adversary Python script (≥5000 independent checks), and test vectors JSON for downstream consumption by `add-reduction`. Iterates until all checks pass. Creates worktree + PR. +- [verify-reduction](skills/verify-reduction/SKILL.md) -- Standalone mathematical verification of a reduction rule: Typst proof, constructor Python (≥5000 checks), adversary Python (≥5000 independent checks). Reports verdict, no artifacts saved. Also called as a subroutine by `/add-rule` (default behavior). - [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals. ## Codex Compatibility diff --git a/.claude/hooks/add-reduction-precommit.sh b/.claude/hooks/add-reduction-precommit.sh deleted file mode 100755 index f59aaefa6..000000000 --- a/.claude/hooks/add-reduction-precommit.sh +++ /dev/null @@ -1,49 +0,0 @@ -#!/bin/bash -# Pre-commit hook for add-reduction: blocks commits missing required files. -# -# This hook fires on any commit that stages a new rule file under src/rules/. -# It checks that the companion files (example_db test, paper entry, mod.rs) -# are also staged. If any are missing, the commit is blocked. 
-# -# Install: symlink or copy to .git/hooks/pre-commit -# ln -sf ../../.claude/hooks/add-reduction-precommit.sh .git/hooks/pre-commit - -STAGED=$(git diff --cached --name-only) - -# Only run for commits that add/modify rule files (not mod.rs or traits.rs) -RULE_FILES=$(echo "$STAGED" | grep "^src/rules/" | grep -v mod.rs | grep -v traits.rs | grep '\.rs$') -if [ -z "$RULE_FILES" ]; then - exit 0 -fi - -ERRORS=0 - -# Check example_db.rs is staged -if ! echo "$STAGED" | grep -q "example_db.rs"; then - echo "BLOCKED: src/unit_tests/example_db.rs not staged (Check 10 from #974)" - ERRORS=$((ERRORS + 1)) -fi - -# Check reductions.typ is staged -if ! echo "$STAGED" | grep -q "reductions.typ"; then - echo "BLOCKED: docs/paper/reductions.typ not staged (Check 11 from #974)" - ERRORS=$((ERRORS + 1)) -fi - -# Check mod.rs is staged -if ! echo "$STAGED" | grep -q "mod.rs"; then - echo "BLOCKED: src/rules/mod.rs not staged" - ERRORS=$((ERRORS + 1)) -fi - -# Check no verification artifacts are staged -if echo "$STAGED" | grep -q "docs/paper/verify-reductions/"; then - echo "BLOCKED: verification artifacts still staged — run git rm first" - ERRORS=$((ERRORS + 1)) -fi - -if [ $ERRORS -gt 0 ]; then - echo "" - echo "Fix the above and re-commit. See /add-reduction skill for details." - exit 1 -fi diff --git a/.claude/skills/add-reduction/SKILL.md b/.claude/skills/add-reduction/SKILL.md deleted file mode 100644 index 42498cdfe..000000000 --- a/.claude/skills/add-reduction/SKILL.md +++ /dev/null @@ -1,334 +0,0 @@ ---- -name: add-reduction -description: Add a new reduction rule using verified artifacts from /verify-reduction — reads Python reduce() as pseudocode, test vectors JSON for Rust tests, overhead from JSON ---- - -# Add Reduction (from Verified Artifacts) - -Complete pipeline for adding a reduction rule when `/verify-reduction` has produced verified artifacts. 
Translates Python `reduce()` to Rust, writes tests from test vectors, adds example-db entry, writes paper entry, then cleans up verification artifacts. - -**When to use:** After `/verify-reduction` PR exists. Use `/add-rule` when no verification artifacts exist. - -## Step 0: Locate and Read Verified Artifacts - -```bash -ls docs/paper/verify-reductions/verify__.py -ls docs/paper/verify-reductions/test_vectors__.json -ls docs/paper/verify-reductions/_.typ -``` - -If any are missing, run `/verify-reduction` first. - -**Read these three artifacts:** -1. **Python `reduce()` function** — the verified spec. Translate the algorithm to Rust, not the syntax. -2. **Test vectors JSON** — YES/NO instances with exact values, overhead expressions, verified claims. -3. **Typst proof** — Construction section for doc comments, proof for paper entry. - -## Step 1: Implement the Reduction - -Create `src/rules/_.rs`. Follow the pattern in `src/rules/satisfiability_naesatisfiability.rs`. - -**Translation guide from Python to Rust:** - -| Python | Rust | -|--------|------| -| `reduce(n, clauses)` → `(universe_size, subsets)` | `fn reduce_to(&self) -> Self::Result` | -| `extract_assignment(n, config)` | `fn extract_solution(&self, target_sol: &[usize]) -> Vec` | -| Helper functions (e.g., `literal_to_element`) | Private functions in the same file | -| Python list of ints | `Vec`, `Vec`, etc. (match the target problem's API) | - -**Required structure:** - -```rust -/// Result struct — holds target + any state needed for extraction. 
-#[derive(Debug, Clone)] -pub struct ReductionXToY { target: TargetType, /* mapping state */ } - -impl ReductionResult for ReductionXToY { - type Source = SourceType; - type Target = TargetType; - fn target_problem(&self) -> &Self::Target { &self.target } - fn extract_solution(&self, target_solution: &[usize]) -> Vec { - // Translate Python extract_assignment() logic - } -} - -#[reduction(overhead = { - // Copy verbatim from test_vectors JSON "overhead" field - field_name = "expression", -})] -impl ReduceTo for SourceType { - type Result = ReductionXToY; - fn reduce_to(&self) -> Self::Result { - // Translate Python reduce() logic - } -} - -#[cfg(test)] -#[path = "../unit_tests/rules/_.rs"] -mod tests; -``` - -## Step 2: Register in mod.rs - -Add to `src/rules/mod.rs`: -```rust -pub(crate) mod _; -``` - -## Step 3: Write Unit Tests from Test Vectors - -Create `src/unit_tests/rules/_.rs`. - -**From test vectors JSON, generate at minimum:** - -1. **Closed-loop test** (`test__to__closed_loop`) — construct source from `yes_instance.input`, reduce, verify target matches `yes_instance.output`, solve target with `BruteForce`, extract solutions, verify each is valid for source. - -2. **Infeasible test** (`test__to__infeasible`) — construct source from `no_instance.input`, reduce, verify target is also infeasible (no witnesses). - -3. **Structural test** — verify target dimensions match overhead formula, check well-formedness. - -Reference: `src/unit_tests/rules/minimumvertexcover_maximumindependentset.rs` - -## Step 4: Add Canonical Example to example_db - -**HARD GATE (Check 9 from #974).** Add a `canonical_rule_example_specs()` function in the reduction rule file, and wire it into `src/rules/mod.rs` where other specs are collected. Read `src/rules/mod.rs` first — search for `canonical_rule_example_specs` to see how existing rules register their examples. - -The builder function must: -1. Construct the source instance from the YES test vector -2. 
Reduce it to the target -3. Return a `RuleExampleSpec` with build closure - -**Verification:** -```bash -# The rule file must define canonical_rule_example_specs -grep "canonical_rule_example_specs" src/rules/_.rs -# mod.rs must wire it in -grep "_::canonical_rule_example_specs" src/rules/mod.rs -# Both must show results. If either is empty, go back and do it. -``` - -## Step 4b: Add Example-DB Lookup Test - -**HARD GATE (Check 10 from #974).** You MUST modify `src/unit_tests/example_db.rs`. Read the file first, add the new rule to the existing exhaustive lookup test or add a standalone test. - -**Verification:** -```bash -git diff --name-only | grep "example_db.rs" -# MUST show: src/unit_tests/example_db.rs -# If it does NOT appear, you skipped this step. Go back and do it. -``` - -## Step 5: Write Paper Entry - -**HARD GATE (Check 11 from #974).** You MUST modify `docs/paper/reductions.typ`. The Typst proof already exists from `/verify-reduction` — reformat it into the paper's macros. Do NOT skip this step. - -### 5a. Load example data - -```typst -#let src_tgt = load-example("Source", "Target") -#let src_tgt_sol = src_tgt.solutions.at(0) -``` - -### 5b. Write reduction-rule entry - -```typst -#reduction-rule("Source", "Target", - example: true, - example-caption: [Description ($n = ...$, $|E| = ...$)], -)[ - // Theorem body: complexity + construction summary + overhead hint -][ - // Proof body: _Construction._ ... _Correctness._ ... _Solution extraction._ ... - // Adapt from the verified Typst proof — reformat, don't rewrite -] -``` - -### 5c. Write worked example (extra block) - -Step-by-step walkthrough with concrete numbers from JSON data. 
Must include: -- `pred-commands()` block at top (create/reduce/solve/evaluate) -- Source instance display (graph visualization if applicable) -- Construction walkthrough with intermediate values -- Solution verified end-to-end -- Witness multiplicity note - -**Gold-standard reference:** Search for `reduction-rule("KColoring", "QUBO"` in `reductions.typ`. - -### 5d. Register display name (if new problem in paper) - -Add to `display-name` dictionary if the problem doesn't have an entry yet. - -### 5e. Build and verify - -```bash -make paper # Must compile without errors or new completeness warnings -``` - -**Verification — confirm the paper file was modified:** -```bash -git diff --name-only | grep "reductions.typ" -# MUST show: docs/paper/reductions.typ -# If it does NOT appear, you skipped Step 5. Go back and do it. -``` - -## Step 6: Regenerate Exports, CI Checks, and Dominated-Rules Gate - -```bash -cargo run --example export_graph -cargo run --example export_schemas -make regenerate-fixtures # Slow — needs fixtures for paper example data -``` - -### CI-equivalent checks — MANDATORY - -These must pass before committing. CI will fail if you skip them. - -```bash -# Clippy with -D warnings (CI uses this exact flag — local clippy without -D is insufficient) -cargo clippy -- -D warnings 2>&1 | tail -5 - -# Full test suite (catches dominated-rules test failures) -cargo test 2>&1 | tail -5 -``` - -### Dominated-rules gate — CHECK THIS - -Adding a new reduction can create paths that **dominate** existing direct reductions. The test `test_find_dominated_rules_returns_known_set` in `src/unit_tests/rules/analysis.rs` has a hardcoded set of known dominated pairs. If your new reduction creates a shorter path to a target that already has a direct reduction, this test will fail. - -```bash -cargo test test_find_dominated_rules_returns_known_set 2>&1 | tail -10 -``` - -If it fails with "Unexpected dominated rule: X -> Y (dominated by X -> ... 
-> Y)", add the new pair to the known set in `src/unit_tests/rules/analysis.rs`. - -## Step 7: Clean Up Verification Artifacts - -**MANDATORY.** Verification artifacts must NOT be committed into the library. Remove them after the Rust implementation is complete: - -```bash -git rm -f docs/paper/verify-reductions/*** -``` - -The Typst proof content lives on in the paper entry. The Python scripts were scaffolding — the Rust tests are the permanent verification. - -## Step 8: Parent-Side Verification and Commit - -**Do NOT trust the subagent's self-report.** After the implementation subagent reports DONE, the parent (you) MUST run these checks independently: - -### 8a. File gate (parent runs this, not the subagent) - -```bash -echo "=== Parent-side file gate ===" -REQUIRED_PATTERNS=( - "_.rs" # rule file - "mod.rs" # module registration - "example_db.rs" # lookup test - "reductions.typ" # paper entry -) -CHANGED=$(git diff --name-only HEAD) -for pat in "${REQUIRED_PATTERNS[@]}"; do - echo "$CHANGED" | grep -q "$pat" && echo " ✓ $pat" || echo " ✗ MISSING: $pat — send subagent back" -done - -# Canonical example wired? -grep -q "canonical_rule_example_specs" src/rules/_.rs && echo " ✓ canonical_rule_example_specs in rule file" || echo " ✗ MISSING canonical_rule_example_specs" -grep -q "_::canonical_rule_example_specs" src/rules/mod.rs && echo " ✓ wired in mod.rs" || echo " ✗ MISSING mod.rs wiring" - -# Artifacts cleaned? -ls docs/paper/verify-reductions/*** 2>/dev/null && echo " ✗ ARTIFACTS NOT CLEANED" || echo " ✓ Artifacts cleaned" -``` - -**If ANY line shows ✗, send the subagent back to fix it.** Do NOT proceed. - -### 8b. CI-equivalent checks (parent runs) - -```bash -cargo clippy -- -D warnings 2>&1 | tail -3 -cargo test 2>&1 | tail -3 -make paper 2>&1 | tail -3 -``` - -### 8c. 
Commit and push - -Only after ALL gates pass: - -```bash -git add -A -git commit -m "feat: add reduction (#)" -git push -u origin "" -gh pr create --title "feat: add reduction (#)" --body "..." -``` - -## Pre-Commit Hook - -Install this hook to mechanically block commits missing required files. The hook runs automatically — subagents cannot bypass it. - -**File:** `.claude/hooks/add-reduction-precommit.sh` - -```bash -#!/bin/bash -# Pre-commit hook for add-reduction: blocks commits missing required files. -# Install: copy to .git/hooks/pre-commit or add to settings.json hooks. - -STAGED=$(git diff --cached --name-only) - -# Only run for reduction rule commits -if ! echo "$STAGED" | grep -q "src/rules/.*\.rs$"; then - exit 0 -fi - -# Extract the reduction name from staged rule files -RULE_FILE=$(echo "$STAGED" | grep "^src/rules/" | grep -v mod.rs | grep -v traits.rs | head -1) -if [ -z "$RULE_FILE" ]; then - exit 0 -fi - -ERRORS=0 - -# Check example_db.rs is staged -if ! echo "$STAGED" | grep -q "example_db.rs"; then - echo "BLOCKED: src/unit_tests/example_db.rs not staged (Check 10 from #974)" - ERRORS=$((ERRORS + 1)) -fi - -# Check reductions.typ is staged -if ! echo "$STAGED" | grep -q "reductions.typ"; then - echo "BLOCKED: docs/paper/reductions.typ not staged (Check 11 from #974)" - ERRORS=$((ERRORS + 1)) -fi - -# Check mod.rs is staged -if ! echo "$STAGED" | grep -q "mod.rs"; then - echo "BLOCKED: src/rules/mod.rs not staged" - ERRORS=$((ERRORS + 1)) -fi - -# Check no verification artifacts are staged -if echo "$STAGED" | grep -q "verify-reductions/"; then - echo "BLOCKED: verification artifacts still staged — run git rm first" - ERRORS=$((ERRORS + 1)) -fi - -if [ $ERRORS -gt 0 ]; then - echo "" - echo "Fix the above and re-commit. See /add-reduction skill for details." 
- exit 1 -fi -``` - -## Common Mistakes - -| Mistake | Fix | -|---------|-----| -| Re-deriving algorithm from issue instead of Python `reduce()` | Python function is the verified spec — translate it | -| Overhead expressions don't match test vectors JSON | Copy verbatim from `overhead` field | -| Skipping infeasible (NO) test case | NO instance is in test vectors — always include | -| Missing `canonical_rule_example_specs()` in rule file + `mod.rs` wiring | MANDATORY — Check 9 from #974 | -| Missing example-db lookup test in `example_db.rs` | MANDATORY — Check 10 from #974 | -| Missing paper `reduction-rule` entry | MANDATORY — Check 11 from #974 | -| Leaving verification artifacts in repo | MANDATORY cleanup — Step 7 | -| Not regenerating fixtures after example-db | `make regenerate-fixtures` required for paper | -| Running `cargo clippy` without `-D warnings` | CI uses `-D warnings` — local clippy without it misses lint failures | -| New reduction dominates existing direct reduction | Check `test_find_dominated_rules_returns_known_set` — add new pair to known set in `analysis.rs` | -| Skipping full `cargo test` before commit | Dominated-rules test only runs in full suite, not filtered tests | diff --git a/.claude/skills/add-rule/SKILL.md b/.claude/skills/add-rule/SKILL.md index 185e9184d..8a1b8c325 100644 --- a/.claude/skills/add-rule/SKILL.md +++ b/.claude/skills/add-rule/SKILL.md @@ -5,7 +5,16 @@ description: Use when adding a new reduction rule to the codebase, either from a # Add Rule -Step-by-step guide for adding a new reduction rule (A -> B) to the codebase. +Step-by-step guide for adding a new reduction rule (A -> B) to the codebase. By default, every rule goes through mathematical verification (via `/verify-reduction`) before implementation. Pass `--no-verify` to skip verification for trivial reductions. 
+ +## Invocation + +``` +/add-rule # interactive, with verification (default) +/add-rule --no-verify # interactive, skip verification +``` + +When called from `/issue-to-pr`, the `--no-verify` flag is passed through if present. ## Step 0: Gather Required Information @@ -27,6 +36,26 @@ Before any implementation, collect all required information. If called from `iss If any item is missing, ask the user to provide it. Put a high standard on item 7 (concrete example): it must be in tutorial style with clear intuition and easy to understand. Do NOT proceed until the checklist is complete. +## Step 0.5: Type Compatibility Gate + +Check source/target `Value` types before any work: + +```bash +grep "type Value = " src/models/*/.rs src/models/*/.rs +``` + +**Compatible pairs for `ReduceTo` (witness-capable):** +- `Or`->`Or`, `Min`->`Min`, `Max`->`Max` (same type) +- `Or`->`Min`, `Or`->`Max` (feasibility embeds into optimization) + +**Incompatible — STOP if any of these:** +- `Min`->`Or` or `Max`->`Or` — optimization source has no threshold K; needs a decision-variant source model +- `Max`->`Min` or `Min`->`Max` — opposite optimization directions; needs `ReduceToAggregate` or a decision-variant wrapper +- `Or`->`Sum` or `Min`->`Sum` — Sum is aggregate-only; needs `ReduceToAggregate` +- Any pair involving `And` or `Sum` on the target side + +If incompatible, STOP and comment on the issue explaining the type mismatch and options. Do NOT proceed. 
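For quick reference, the gate above amounts to a small lookup over `Value`-type pairs. A hedged Python sketch follows; the set literal mirrors the compatibility table, and `witness_capable` is an illustrative name, not code that exists in the repo:

```python
# Hypothetical helper mirroring the Step 0.5 table; everything outside
# COMPATIBLE is incompatible for a plain witness-capable ReduceTo.
COMPATIBLE = {
    ("Or", "Or"), ("Min", "Min"), ("Max", "Max"),  # same Value type
    ("Or", "Min"), ("Or", "Max"),  # feasibility embeds into optimization
}

def witness_capable(source_value: str, target_value: str) -> bool:
    """True iff a plain ReduceTo (witness-capable) rule can exist for this pair."""
    return (source_value, target_value) in COMPATIBLE

assert witness_capable("Or", "Min")       # feasibility -> optimization: fine
assert not witness_capable("Min", "Or")   # no threshold K on the source side
assert not witness_capable("Max", "Min")  # opposite optimization directions
```

Pairs that fail this predicate need either a decision-variant source model or the `ReduceToAggregate` machinery, as the STOP conditions above describe.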
+ ## Reference Implementations Read these first to understand the patterns: @@ -35,7 +64,19 @@ Read these first to understand the patterns: - **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` - **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`) -## Step 1: Implement the reduction +## Step 1: Mathematical Verification (default, skip with `--no-verify`) + +**If `--no-verify` was passed, skip to Step 2.** + +Invoke the `/verify-reduction` skill to mathematically verify the reduction before writing Rust code. This runs the full verification pipeline: Typst proof, constructor Python script (>=5000 checks), adversary subagent (>=5000 independent checks), and cross-comparison. + +All verification artifacts are ephemeral — they exist only in conversation context and temp files. Nothing is committed to the repository. + +**If verification FAILS: STOP. Report to user. Do NOT proceed to implementation.** + +If verification passes, the verified Python `reduce()` and `extract_solution()` functions, along with the YES/NO instances, carry forward in conversation context to inform Steps 2-5. Use them as the canonical spec for the Rust implementation. + +## Step 2: Implement the reduction Create `src/rules/_.rs` (all lowercase, no underscores between words within a problem name): @@ -67,6 +108,7 @@ impl ReductionResult for ReductionXToY { fn target_problem(&self) -> &Self::Target { &self.target } fn extract_solution(&self, target_solution: &[usize]) -> Vec { // Map target solution back to source solution + // If Step 1 ran: translate the verified Python extract_solution() logic } } ``` @@ -78,7 +120,9 @@ impl ReductionResult for ReductionXToY { })] impl ReduceTo for SourceType { type Result = ReductionXToY; - fn reduce_to(&self) -> Self::Result { ... 
} + fn reduce_to(&self) -> Self::Result { + // If Step 1 ran: translate the verified Python reduce() logic + } } ``` @@ -86,13 +130,13 @@ Each primitive reduction is determined by the exact source/target variant pair. **Aggregate-only reductions:** when the rule preserves aggregate values but cannot recover a source witness from a target witness, implement `AggregateReductionResult` + `ReduceToAggregate` instead of `ReductionResult` + `ReduceTo`. Those edges are not auto-registered by `#[reduction]` yet; register them manually with `ReductionEntry { reduce_aggregate_fn: ..., capabilities: EdgeCapabilities::aggregate_only(), ... }`. See `src/unit_tests/rules/traits.rs` and `src/unit_tests/rules/graph.rs` for the reference pattern. -## Step 2: Register in mod.rs +## Step 3: Register in mod.rs Add to `src/rules/mod.rs`: - `mod _;` - If feature-gated (e.g., ILP): wrap with `#[cfg(feature = "ilp-solver")]` -## Step 3: Write unit tests +## Step 4: Write unit tests Create `src/unit_tests/rules/_.rs`: @@ -105,6 +149,8 @@ Create `src/unit_tests/rules/_.rs`: // 5. Verify: extracted solution is valid and optimal for source ``` +If Step 1 ran, use the verified YES/NO instances from conversation context to construct test cases. Include both a feasible (closed-loop) and infeasible (no witnesses) test. + Additional recommended tests: - Verify target problem structure (correct size, edges, constraints) - Edge cases (empty graph, single vertex, etc.) @@ -117,15 +163,17 @@ For aggregate-only reductions, replace the closed-loop witness test with value-c Link via `#[cfg(test)] #[path = "..."] mod tests;` at the bottom of the rule file. -## Step 4: Add canonical example to example_db +## Step 5: Add canonical example to example_db Add a builder function in `src/example_db/rule_builders.rs` that constructs a small, canonical instance for this reduction. Follow the existing patterns in that file. Register the builder in `build_rule_examples()`. 
-## Step 5: Document in paper +## Step 6: Document in paper Write a `reduction-rule` entry in `docs/paper/reductions.typ`. **Reference example:** search for `reduction-rule("KColoring", "QUBO"` to see the gold-standard entry — use it as a template. For a minimal example, see MinimumVertexCover -> MaximumIndependentSet. -### 5a. Write theorem body (rule statement) +If Step 1 ran, adapt the verified Typst proof into the paper's macros. Do not rewrite the proof from scratch — reformat it. + +### 6a. Write theorem body (rule statement) ```typst #reduction-rule("Source", "Target", @@ -138,7 +186,7 @@ Write a `reduction-rule` entry in `docs/paper/reductions.typ`. **Reference examp Three parts: complexity with citation, construction summary, overhead hint. -### 5b. Write proof body +### 6b. Write proof body Use these subsections with italic labels: @@ -156,7 +204,7 @@ Use these subsections with italic labels: Must be self-contained (all notation defined) and reproducible. -### 5c. Write worked example (extra block) +### 6c. Write worked example (extra block) Step-by-step walkthrough with concrete numbers from JSON data. Required steps: 1. Show source instance (dimensions, structure, graph visualization if applicable) @@ -168,7 +216,7 @@ Use `graph-colors`, `g-node()`, `g-edge()` for graph visualization — see refer **Reproducibility:** The `extra:` block must start with a `pred-commands()` call showing the create/reduce/solve/evaluate pipeline. The source-side `pred create --example ...` spec must be derived from the loaded canonical example data via the helper pattern in `write-rule-in-paper`; do not hand-write a bare alias and assume the default variant matches. -### 5d. Build and verify +### 6d. 
Build and verify ```bash make paper # Must compile without errors @@ -176,7 +224,7 @@ make paper # Must compile without errors Checklist: notation self-contained, complexity cited, overhead consistent, example uses JSON data (not hardcoded), solution verified end-to-end, witness semantics respected, paper compiles. -## Step 6: Regenerate exports and verify +## Step 7: Regenerate exports and verify ```bash cargo run --example export_graph # Generate reduction_graph.json for docs/paper builds @@ -185,7 +233,7 @@ make regenerate-fixtures # Regenerate example_db/fixtures/examples.js make test clippy # Must pass ``` -`make regenerate-fixtures` is required so the paper can load the new rule's example data from `src/example_db/fixtures/examples.json`. Without it, the `reduction-rule` entry in Step 5 will reference missing fixture data. +`make regenerate-fixtures` is required so the paper can load the new rule's example data from `src/example_db/fixtures/examples.json`. Without it, the `reduction-rule` entry in Step 6 will reference missing fixture data. Structural and quality review is handled by the `review-pipeline` stage, not here. The run stage just needs to produce working code. 
@@ -226,3 +274,4 @@ Aggregate-only reductions currently have a narrower CLI surface: | Not regenerating reduction graph | Run `cargo run --example export_graph` after adding a rule | | Source/target model not fully registered | Both problems must already have `declare_variants!`, aliases as needed, and CLI create support -- use `add-model` skill first | | Treating a direct-to-ILP rule as a toy stub | Direct ILP reductions need exact overhead metadata and strong semantic regression tests, just like other production ILP rules | +| Skipping verification for complex reductions | Verification is default for a reason — `--no-verify` is for trivial identity/complement reductions only | diff --git a/.claude/skills/issue-to-pr/SKILL.md b/.claude/skills/issue-to-pr/SKILL.md index 9962f4b8a..2573bedd5 100644 --- a/.claude/skills/issue-to-pr/SKILL.md +++ b/.claude/skills/issue-to-pr/SKILL.md @@ -9,7 +9,8 @@ Convert a GitHub issue into a PR: write a plan, create the PR, then execute the ## Invocation -- `/issue-to-pr 42` — create PR with plan, then execute +- `/issue-to-pr 42` — create PR with plan, then execute (for `[Rule]` issues, verification runs by default) +- `/issue-to-pr 42 --no-verify` — skip mathematical verification for `[Rule]` issues For Codex, open this `SKILL.md` directly and treat the slash-command forms above as aliases. The Makefile `run-issue` target already does this translation. @@ -37,6 +38,7 @@ Normalize to: - `ISSUE=` - `REPO=` (default `CodingThrust/problem-reductions`) - `EXECUTE=true|false` +- `NO_VERIFY=true|false` (default `false`; pass `--no-verify` to skip mathematical verification for `[Rule]` issues) ### 2. 
Fetch Issue + Preflight Guards @@ -91,7 +93,7 @@ The plan MUST reference the appropriate implementation skill and follow its step - **For ordinary `[Model]` issues:** Follow [add-model](../add-model/SKILL.md) Steps 1-7 as the action pipeline - **For `[Model]` issues that explicitly claim direct ILP solving:** Follow [add-model](../add-model/SKILL.md) Steps 1-7 **and** [add-rule](../add-rule/SKILL.md) Steps 1-6 for the direct ` -> ILP` rule in the same plan / PR -- **For `[Rule]` issues:** Follow [add-rule](../add-rule/SKILL.md) Steps 1-6 as the action pipeline +- **For `[Rule]` issues:** Follow [add-rule](../add-rule/SKILL.md) Steps 1-7 as the action pipeline. By default, `/add-rule` runs mathematical verification (Step 1) before implementation. If `--no-verify` was passed, include `--no-verify` when invoking `/add-rule` to skip verification. Include the concrete details from the issue (problem definition, reduction algorithm, example, etc.) mapped onto each step. diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md index eea41e5f4..76f1246f6 100644 --- a/.claude/skills/verify-reduction/SKILL.md +++ b/.claude/skills/verify-reduction/SKILL.md @@ -1,15 +1,13 @@ --- name: verify-reduction -description: End-to-end verification of a reduction rule — generates Typst proof, constructor Python script (>=5000 checks), and adversary Python script (>=5000 independent checks), iterating until all checks pass. Creates worktree + PR. +description: Standalone mathematical verification of a reduction rule — generates Typst proof, constructor Python script (>=5000 checks), and adversary Python script (>=5000 independent checks). Reports verdict. No artifacts saved. --- # Verify Reduction -Pre-verification gate for reduction rules. Produces Typst proof + dual Python verification scripts + test vectors JSON, iterating until all checks pass. Creates worktree + PR. Downstream: `/add-reduction` consumes the artifacts to implement Rust code. 
+Mathematical verification of a reduction rule. Produces a Typst proof + dual Python verification scripts, iterating until all checks pass. Reports a VERIFIED/FAILED verdict. All artifacts are ephemeral — nothing is committed to the repository. -``` -Issue → /verify-reduction (Typst + Python) → /add-reduction (Rust) → /review-pipeline -``` +Use standalone to check correctness before implementation, or as a subroutine of `/add-rule` (which calls this by default). ## Invocation @@ -18,18 +16,16 @@ Issue → /verify-reduction (Typst + Python) → /add-reduction (Rust) → /revi /verify-reduction SubsetSum Partition ``` -## Step 0: Parse Input and Create Worktree +## Step 0: Parse Input ```bash REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner) ISSUE= ISSUE_JSON=$(gh issue view "$ISSUE" --json title,body,number) -REPO_ROOT=$(pwd) -WORKTREE_JSON=$(python3 scripts/pipeline_worktree.py enter --name "verify-$ISSUE" --format json) -WORKTREE_DIR=$(printf '%s\n' "$WORKTREE_JSON" | python3 -c "import sys,json; print(json.load(sys.stdin)['worktree_dir'])") -cd "$WORKTREE_DIR" && git checkout -b "issue/$ISSUE-verify--" ``` +If invoked with problem names instead of an issue number, use the names directly. 
+ ## Step 1: Read Issue, Study Models, Type Check ```bash @@ -47,16 +43,16 @@ grep "type Value = " src/models/*/.rs src/models/*/.rs ``` **Compatible pairs for `ReduceTo` (witness-capable):** -- `Or`→`Or`, `Min`→`Min`, `Max`→`Max` (same type) -- `Or`→`Min`, `Or`→`Max` (feasibility embeds into optimization) +- `Or`->`Or`, `Min`->`Min`, `Max`->`Max` (same type) +- `Or`->`Min`, `Or`->`Max` (feasibility embeds into optimization) **Incompatible — STOP if any of these:** -- `Min`→`Or` or `Max`→`Or` — optimization source has no threshold K; needs a decision-variant source model (e.g., `VertexCover(Or)` instead of `MinimumVertexCover(Min)`) -- `Max`→`Min` or `Min`→`Max` — opposite optimization directions; needs `ReduceToAggregate` or a decision-variant wrapper -- `Or`→`Sum` or `Min`→`Sum` — Sum is aggregate-only; needs `ReduceToAggregate` +- `Min`->`Or` or `Max`->`Or` — optimization source has no threshold K; needs a decision-variant source model +- `Max`->`Min` or `Min`->`Max` — opposite optimization directions; needs `ReduceToAggregate` or a decision-variant wrapper +- `Or`->`Sum` or `Min`->`Sum` — Sum is aggregate-only; needs `ReduceToAggregate` - Any pair involving `And` or `Sum` on the target side -If incompatible, STOP and comment on the issue explaining the type mismatch and options (add decision variant, use `ReduceToAggregate`, or choose different pair). Do NOT proceed with the mathematical verification — the reduction may be correct classically but cannot be implemented as `ReduceTo` in this codebase without a type-compatible source/target pair. +If incompatible, STOP and report the type mismatch and options. Do NOT proceed. ### If compatible @@ -64,7 +60,7 @@ Extract: construction algorithm, correctness argument, overhead formulas, worked ## Step 2: Write Typst Proof -Create standalone `docs/paper/verify-reductions/_.typ`. +Write a standalone Typst proof (in a temp file, not committed). 
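The Step 1 type gate amounts to a small lookup. A hedged sketch — `gate` and its verdict strings are hypothetical; only the compatible/incompatible pairs come from the tables above:

```python
# Hypothetical sketch of the Step 1 type-compatibility gate.
# Value types mirror the doc: Or, And, Min, Max, Sum.
WITNESS_COMPATIBLE = {
    ("Or", "Or"), ("Min", "Min"), ("Max", "Max"),  # same type
    ("Or", "Min"), ("Or", "Max"),  # feasibility embeds into optimization
}

def gate(source_value, target_value):
    """Return 'ok', or a STOP reason explaining the mismatch."""
    if (source_value, target_value) in WITNESS_COMPATIBLE:
        return "ok"
    if source_value in ("Min", "Max") and target_value == "Or":
        return "STOP: optimization source has no threshold K; use a decision-variant source model"
    if {source_value, target_value} == {"Min", "Max"}:
        return "STOP: opposite optimization directions; needs ReduceToAggregate or a decision wrapper"
    if target_value in ("And", "Sum"):
        return "STOP: aggregate-only target side; needs ReduceToAggregate"
    return "STOP: unsupported pair"

assert gate("Or", "Min") == "ok"
assert gate("Max", "Min").startswith("STOP")
```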
**Mandatory structure:** @@ -78,27 +74,25 @@ Create standalone `docs/paper/verify-reductions/_.typ`. ($arrow.l.double$) ... _Solution extraction._ ... ] -*Overhead.* (table with target metric → formula) -*Feasible example.* (YES instance, ≥3 variables, fully worked with numbers) +*Overhead.* (table with target metric -> formula) +*Feasible example.* (YES instance, >=3 variables, fully worked with numbers) *Infeasible example.* (NO instance, fully worked — show WHY no solution exists) ``` **Hard rules:** - Zero instances of "clearly", "obviously", "it is easy to see", "straightforward" - Zero scratch work ("Wait", "Hmm", "Actually", "Let me try") -- Two examples minimum, both with ≥3 variables/vertices +- Two examples minimum, both with >=3 variables/vertices - Every symbol defined before first use -Compile: `python3 -c "import typst; typst.compile('.typ', output='.pdf', root='.')"` - ## Step 3: Write Constructor Python Script -Create `docs/paper/verify-reductions/verify__.py` with ALL 7 mandatory sections: +Write a Python verification script (temp file) with ALL 7 mandatory sections: | Section | What to verify | Notes | |---------|---------------|-------| | 1. Symbolic (sympy) | Overhead formulas symbolically for general n | "The overhead is trivial" is NOT an excuse to skip | -| 2. Exhaustive forward+backward | Source feasible ⟺ target feasible | n ≤ 5 minimum. ALL instances or ≥300 sampled per (n,m) | +| 2. Exhaustive forward+backward | Source feasible <=> target feasible | n <= 5 minimum. ALL instances or >=300 sampled per (n,m) | | 3. Solution extraction | Extract source solution from every feasible target witness | Most commonly skipped section. DO NOT SKIP | | 4. Overhead formula | Build target, measure actual size, compare against formula | Catches off-by-one in construction | | 5. 
Structural properties | Target well-formed, no degenerate cases | Gadget reductions: girth, connectivity, widget structure | @@ -109,16 +103,16 @@ Create `docs/paper/verify-reductions/verify__.py` with ALL 7 man | Type | Minimum checks | Minimum n | |------|---------------|-----------| -| Identity (same graph, different objective) | 10,000 | n ≤ 6 | -| Algebraic (padding, complement, case split) | 10,000 | n ≤ 5 | -| Gadget (widget, cycle construction) | 5,000 | n ≤ 5 | +| Identity (same graph, different objective) | 10,000 | n <= 6 | +| Algebraic (padding, complement, case split) | 10,000 | n <= 5 | +| Gadget (widget, cycle construction) | 5,000 | n <= 5 | Every reduction gets at least 5,000 checks regardless of perceived simplicity. ## Step 4: Run and Iterate ```bash -python3 docs/paper/verify-reductions/verify__.py +python3 /tmp/verify__.py ``` ### Iteration 1: Fix failures @@ -132,8 +126,8 @@ Print and fill this table honestly: ``` CHECK COUNT AUDIT: Total checks: ___ (minimum: 5,000) - Forward direction: ___ instances (minimum: all n ≤ 5) - Backward direction: ___ instances (minimum: all n ≤ 5) + Forward direction: ___ instances (minimum: all n <= 5) + Backward direction: ___ instances (minimum: all n <= 5) Solution extraction: ___ feasible instances tested Overhead formula: ___ instances compared Symbolic (sympy): ___ identities verified @@ -150,30 +144,14 @@ List EVERY claim in the Typst proof and whether it's tested: ``` CLAIM TESTED BY -"Universe has 2n elements" Section 4: overhead ✓ -"Complementarity forces consistency" Section 3: extraction ✓ -"Forward: NAE-sat → valid splitting" Section 2: exhaustive ✓ +"Universe has 2n elements" Section 4: overhead +"Complementarity forces consistency" Section 3: extraction +"Forward: NAE-sat -> valid splitting" Section 2: exhaustive ... ``` If any claim has no test, add one. If untestable, document WHY. 
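A minimal sketch of what constructor Sections 2 and 4 look like in practice, using the classic SubsetSum -> Partition padding construction (illustrative only — this is not the repo's verified script, and the check count here is far below the 5,000 minimum):

```python
# Sketch of Section 2 (exhaustive forward + backward) and Section 4
# (overhead formula) for SubsetSum -> Partition via two padding
# elements, sum+T and 2*sum-T, which force the halves to encode T.
from itertools import product

CHECKS = 0

def subset_sums(items):
    sums = {0}
    for a in items:
        sums |= {s + a for s in sums}
    return sums

def subsetsum_feasible(items, t):
    return t in subset_sums(items)

def reduce_to_partition(items, t):
    total = sum(items)
    return items + [total + t, 2 * total - t]

def partition_feasible(items):
    total = sum(items)
    return total % 2 == 0 and total // 2 in subset_sums(items)

# Section 2: exhaustive forward + backward over all small instances.
for n in range(1, 5):
    for items in product(range(1, 5), repeat=n):
        for t in range(1, sum(items)):  # 0 < T < sum keeps padding positive
            CHECKS += 1
            assert subsetsum_feasible(list(items), t) == \
                   partition_feasible(reduce_to_partition(list(items), t))

# Section 4: measured size matches the overhead formula |items| + 2.
assert len(reduce_to_partition([1, 2, 3], 4)) == 3 + 2
print("CHECKS:", CHECKS)
```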
-### Iteration 4: Export test vectors and validate Typst matching - -Export `docs/paper/verify-reductions/test_vectors__.json` with: - -```json -{ - "source": "", "target": "", "issue": , - "yes_instance": { "input": {...}, "output": {...}, "source_feasible": true, "target_feasible": true, "source_solution": [...], "extracted_solution": [...] }, - "no_instance": { "input": {...}, "output": {...}, "source_feasible": false, "target_feasible": false }, - "overhead": { "field": "expression" }, - "claims": [ {"tag": "...", "formula": "...", "verified": true} ] -} -``` - -**Typst ↔ JSON cross-check:** Verify key numerical values from the JSON appear in the Typst example sections (substring search). If any value is missing, the proof and script are out of sync — fix before proceeding. - ## Step 5: Adversary Verification Dispatch a subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. @@ -182,15 +160,15 @@ Dispatch a subagent that reads ONLY the Typst proof (not the constructor script) - Own `reduce()` function from scratch - Own `extract_solution()` function - Own `is_feasible_source()` and `is_feasible_target()` validators -- Exhaustive forward + backward for n ≤ 5 -- `hypothesis` property-based testing (≥2 strategies) +- Exhaustive forward + backward for n <= 5 +- `hypothesis` property-based testing (>=2 strategies) - Reproduce both Typst examples (YES and NO) -- ≥5,000 total checks +- >=5,000 total checks - Must NOT import from the constructor script **Typed adversary focus** (include in prompt): -- **Identity reductions:** exhaustive enumeration n ≤ 6, edge-case configs (all-zero, all-one, alternating) -- **Algebraic reductions:** case boundary conditions (e.g., Σ = 2T exactly, Σ = 2T ± 1), per-case extraction +- **Identity reductions:** exhaustive enumeration n <= 6, edge-case configs (all-zero, all-one, alternating) +- **Algebraic reductions:** case boundary conditions (e.g., S = 2T exactly, S = 2T 
+/- 1), per-case extraction - **Gadget reductions:** widget structure invariants, traversal patterns, interior vertex isolation ### Cross-comparison @@ -201,82 +179,74 @@ After both scripts pass, compare `reduce()` outputs on shared instances. Both mu | Constructor | Adversary | Cross-compare | Verdict | Action | |-------------|-----------|---------------|---------|--------| -| Pass | Pass | Agree | **Verified** | Proceed to Step 6 | +| Pass | Pass | Agree | **VERIFIED** | Done (or proceed to add-rule Step 2) | | Pass | Pass | Disagree | **Suspect** | Investigate — may be isomorphic or latent bug | -| Pass | Fail | — | **Adversary bug** | Fix adversary or clarify Typst spec | -| Fail | Pass | — | **Constructor bug** | Fix constructor, re-run from Step 4 | -| Fail | Fail | — | **Proof bug** | Re-examine Typst proof, return to Step 2 | +| Pass | Fail | -- | **Adversary bug** | Fix adversary or clarify Typst spec | +| Fail | Pass | -- | **Constructor bug** | Fix constructor, re-run from Step 4 | +| Fail | Fail | -- | **Proof bug** | Re-examine Typst proof, return to Step 2 | ## Step 6: Self-Review Checklist Every item must be YES. If any is NO, go back and fix. 
### Typst proof -- [ ] Compiles without errors - [ ] Construction with numbered steps, symbols defined before use -- [ ] Correctness with independent ⟹ and ⟸ paragraphs +- [ ] Correctness with independent => and <= paragraphs - [ ] Solution extraction section present - [ ] Overhead table with formulas -- [ ] YES example (≥3 variables, fully worked) +- [ ] YES example (>=3 variables, fully worked) - [ ] NO example (fully worked, explains WHY infeasible) - [ ] Zero hand-waving language - [ ] Zero scratch work ### Constructor Python -- [ ] 0 failures, ≥5,000 total checks +- [ ] 0 failures, >=5,000 total checks - [ ] All 7 sections present and non-empty -- [ ] Exhaustive n ≤ 5 +- [ ] Exhaustive n <= 5 - [ ] Extraction tested for every feasible instance - [ ] Gap analysis: every Typst claim has a test ### Adversary Python -- [ ] 0 failures, ≥5,000 total checks +- [ ] 0 failures, >=5,000 total checks - [ ] Independent implementation (no imports from constructor) -- [ ] `hypothesis` PBT with ≥2 strategies +- [ ] `hypothesis` PBT with >=2 strategies - [ ] Reproduces both Typst examples ### Cross-consistency - [ ] Cross-comparison: 0 disagreements, 0 feasibility mismatches -- [ ] Test vectors JSON exported with Typst auto-matching verified -- [ ] All claims have `verified: true` + +## Step 7: Report Verdict + +Report the final verdict to the user: + +``` +VERIFICATION RESULT: VERIFIED / FAILED + Source: + Target: + Constructor checks: + Adversary checks: + Cross-comparison: instances, 0 disagreements + Issue: # +``` + +If called as a subroutine of `/add-rule`, the verified Python `reduce()`, `extract_solution()`, and YES/NO instances remain in conversation context for use in the Rust implementation steps. No files are saved. + +If called standalone, the verdict is the final output. The user can inspect the proof and scripts interactively during the session. 
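The cross-comparison step can be sketched as follows; `constructor_reduce` and `adversary_reduce` are hypothetical stand-ins for the two independently written implementations (here deliberately equivalent stubs of the same padding construction), and sorting is used to normalize element order before comparing:

```python
# Sketch of the cross-comparison step: run both reduce() functions
# on shared instances and record any disagreement for investigation.
def constructor_reduce(items, t):
    total = sum(items)
    return sorted(items + [total + t, 2 * total - t])

def adversary_reduce(items, t):
    # Independent implementation -- must not import from the constructor.
    padded = list(items)
    s = sum(items)
    padded.extend([s + t, 2 * s - t])
    return sorted(padded)

def cross_compare(instances):
    disagreements = []
    for items, t in instances:
        a, b = constructor_reduce(items, t), adversary_reduce(items, t)
        if a != b:
            # Outputs may be isomorphic rather than equal -- every
            # disagreement must be investigated, not dismissed.
            disagreements.append((items, t, a, b))
    return disagreements

shared = [([1, 2, 3], 4), ([2, 2], 3), ([5], 5)]
assert cross_compare(shared) == []
```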
## Common Mistakes | Mistake | Consequence | |---------|-------------| -| Proceeding past type gate with incompatible types | Wasted work — math may be correct but `ReduceTo` impl is impossible. Common: `Min→Or` (MVC→HamCircuit), `Max→Min` (MaxCut→OLA) | +| Proceeding past type gate with incompatible types | Wasted work — math may be correct but `ReduceTo` impl is impossible | | Adversary imports from constructor script | Rejected — must be independent | | No `hypothesis` PBT in adversary | Rejected | | Section 1 (symbolic) empty | Rejected — "overhead is trivial" is not an excuse | | Only YES example, no NO example | Rejected | -| n ≤ 3 or n ≤ 4 "because it's simple" | Rejected — minimum n ≤ 5 | +| n <= 3 or n <= 4 "because it's simple" | Rejected — minimum n <= 5 | | No gap analysis | Rejected — perform before proceeding | | Example has < 3 variables | Rejected — too degenerate | | Either script has < 5,000 checks | Rejected — enhance testing | | Extraction (Section 3) not tested | Rejected — most commonly skipped | | Cross-comparison skipped | Rejected | | Disagreements dismissed without investigation | Rejected | - -## Step 7: Commit, Create PR, Clean Up - -```bash -git add docs/paper/verify-reductions/*** -git add -f docs/paper/verify-reductions/_.pdf -git commit -m "docs: /verify-reduction # VERIFIED" -git push -u origin "" -gh pr create --title "docs: verify reduction #" --body "..." -gh issue comment "$ISSUE" --body "verify-reduction report: VERIFIED (PR #)..." -cd "$REPO_ROOT" && python3 scripts/pipeline_worktree.py cleanup --worktree "$WORKTREE_DIR" -``` - -## Integration - -### Pipeline: Issue → verify-reduction → add-reduction → review-pipeline - -`/verify-reduction` is a **pre-verification gate**. The Python `reduce()` is the verified spec. `/add-reduction` translates it to Rust. `/review-pipeline`'s agentic test confirms the Rust matches. 
- -### Standalone usage - -- After `write-rule-in-paper`: invoke to verify paper entry -- During `review-structural`: check verification script exists and passes -- Before `issue-to-pr --execute`: pre-validate the algorithm +| Saving artifacts to the repository | All files are ephemeral — use temp directory, nothing committed |
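To make the adversary-independence and extraction requirements concrete, a dependency-free sketch of the adversary's property pass (the real skill mandates the `hypothesis` library; stdlib `random` is used here only as a stand-in, and all function names are illustrative):

```python
# Stand-in for the adversary's property-based pass: on random
# SubsetSum instances, every feasible Partition witness must extract
# back to a source subset summing exactly to T.
import random
from itertools import combinations

def reduce_ss(items, t):
    s = sum(items)
    return items + [s + t, 2 * s - t]

def find_partition(items):
    half, n = sum(items) // 2, len(items)
    if sum(items) % 2:
        return None
    for r in range(n + 1):
        for idx in combinations(range(n), r):
            if sum(items[i] for i in idx) == half:
                return set(idx)
    return None

def extract_solution(items, t, side):
    # The side NOT containing the s+t padding element carries the
    # original items that sum to t.
    target = reduce_ss(items, t)
    pad1 = len(items)  # index of the s+t element
    chosen = side if pad1 not in side else set(range(len(target))) - side
    return [i for i in chosen if i < len(items)]

random.seed(0)
for _ in range(2000):
    items = [random.randint(1, 6) for _ in range(random.randint(1, 5))]
    t = random.randint(1, sum(items) - 1) if sum(items) > 1 else 1
    side = find_partition(reduce_ss(items, t))
    if side is not None:
        sol = extract_solution(items, t, side)
        assert sum(items[i] for i in sol) == t
```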