
test #945

Open

Trecek wants to merge 355 commits into stable from main

Conversation

Collaborator

@Trecek commented Apr 15, 2026

No description provided.

Trecek and others added 30 commits April 5, 2026 03:54
…ipeline (#611)

## Summary

Every recipe (`implementation`, `remediation`, `implementation-groups`,
`merge-prs`) previously had an interactive `confirm_cleanup` prompt at
its terminal step. When `process-issues` drives batch processing, that
prompt halted the pipeline while waiting for user input. A
`defer_cleanup` flag was designed to bypass it, but it made "interrupt
the pipeline" the default and "don't interrupt" the opt-in.

The fix: remove the interactive cleanup path entirely from all recipes.
Every terminal step unconditionally calls `register_clone_status`
(success or failure), writing to a shared registry file. After all
issues in `process-issues` complete, a single `batch_cleanup_clones`
call deletes all success-status clones and preserves all error-status
clones. No prompts. No flags. No per-issue decisions.
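The register-then-sweep flow can be sketched as follows. The step names `register_clone_status`, `batch_cleanup_clones`, and the registry location come from this PR; the exact JSON shape (`{"entries": [...]}`) and function signatures are assumptions for illustration, not the actual implementation.

```python
import json
import shutil
from pathlib import Path

# Registry path from the PR; relative to the working directory here.
REGISTRY = Path(".autoskillit/temp/clone-cleanup-registry.json")

def register_clone_status(clone_path: str, status: str, step_name: str) -> None:
    """Append one entry per clone; entries are never mutated after being written."""
    entries = (
        json.loads(REGISTRY.read_text())["entries"] if REGISTRY.exists() else []
    )
    entries.append(
        {"clone_path": clone_path, "status": status, "step_name": step_name}
    )
    REGISTRY.parent.mkdir(parents=True, exist_ok=True)
    REGISTRY.write_text(json.dumps({"entries": entries}, indent=2))

def batch_cleanup_clones() -> tuple[list[str], list[str]]:
    """Read the registry once, delete success clones, preserve error clones."""
    entries = json.loads(REGISTRY.read_text())["entries"]
    deleted, preserved = [], []
    for entry in entries:
        if entry["status"] == "success":
            shutil.rmtree(entry["clone_path"], ignore_errors=True)
            deleted.append(entry["clone_path"])
        else:
            preserved.append(entry["clone_path"])
    return deleted, preserved
```

Because registration is unconditional at every terminal step, the single sweep at the end sees every clone exactly once, with no prompt and no flag anywhere in the path.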

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([● process-issues starts batch])

    subgraph PerIssue ["Per-Issue Recipe (× N issues)"]
        direction TB
        RECIPE["● Recipe Pipeline<br/>━━━━━━━━━━<br/>implementation / remediation<br/>implementation-groups / merge-prs<br/>plan → implement → test → push → PR → wait"]
        OUTCOME{"terminal<br/>outcome?"}
        REL_S["● release_issue_success<br/>━━━━━━━━━━<br/>release GitHub issue claim<br/>on_success/on_failure → register"]
        REL_F["● release_issue_failure<br/>━━━━━━━━━━<br/>release on error<br/>on_success/on_failure → register_failure"]
        REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='success'<br/>on_success/on_failure → done"]
        REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='error'<br/>on_success/on_failure → escalate_stop"]
        DONE["● done<br/>━━━━━━━━━━<br/>action: stop (success)"]
        FAIL["● escalate_stop<br/>━━━━━━━━━━<br/>action: stop (failure)"]
    end

    REGISTRY[("● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>.autoskillit/temp/<br/>accumulated entries")]

    subgraph PostBatch ["● After ALL Batches Complete (process-issues Step 3d)"]
        direction LR
        BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads registry<br/>deletes status=success clones<br/>preserves status=error clones<br/>no prompt, one call"]
        PRESERVED["preserved clones<br/>━━━━━━━━━━<br/>status=error kept<br/>for investigation"]
        DELETED["deleted clones<br/>━━━━━━━━━━<br/>status=success removed<br/>disk reclaimed"]
    end

    END_OK([COMPLETE])

    START --> RECIPE
    RECIPE --> OUTCOME
    OUTCOME -->|"success path"| REL_S
    OUTCOME -->|"failure path"| REL_F
    REL_S --> REG_S
    REL_F --> REG_F
    REG_S -->|"writes status=success"| REGISTRY
    REG_F -->|"writes status=error"| REGISTRY
    REG_S --> DONE
    REG_F --> FAIL
    DONE -->|"after all issues done"| BATCH
    FAIL -->|"after all issues done"| BATCH
    BATCH -->|"reads registry"| REGISTRY
    BATCH --> PRESERVED
    BATCH --> DELETED
    DELETED --> END_OK
    PRESERVED --> END_OK

    class START,END_OK terminal;
    class RECIPE handler;
    class OUTCOME stateNode;
    class REL_S,REL_F phase;
    class REG_S,REG_F,BATCH newComponent;
    class DONE phase;
    class FAIL detector;
    class REGISTRY stateNode;
    class PRESERVED,DELETED output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and end states |
| Orange | Handler | Recipe pipeline execution |
| Teal | State | Decision routing and registry storage |
| Purple | Phase | Control flow nodes (release, done) |
| Green | New/Modified | ● Modified steps (register, batch cleanup) |
| Red | Detector | Failure terminal (escalate_stop) |
| Dark Teal | Output | Clone disposition artifacts |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START([Pipeline Terminal Step])

    subgraph WritePath ["● WRITE: Recipe Terminal Registration (once per clone)"]
        direction LR
        REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='success'<br/>clone_path (immutable)"]
        REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='error'<br/>clone_path (immutable)"]
    end

    subgraph Registry ["● Registry File — APPEND_ONLY during run"]
        direction TB
        ENTRY["● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>entries: [{clone_path, status,<br/>step_name, timestamp}]<br/>written N times (once per clone)<br/>never mutated after write"]
    end

    subgraph ReadPath ["● READ: Batch Cleanup (once, post-run)"]
        direction LR
        BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads all entries<br/>partitions by status"]
        GATE{"status?"}
        DEL["delete clone dir<br/>━━━━━━━━━━<br/>status=success<br/>disk reclaimed"]
        KEEP["preserve clone dir<br/>━━━━━━━━━━<br/>status=error<br/>for investigation"]
    end

    subgraph Contracts ["Contract Cards (recipe input contracts)"]
        direction LR
        C1["★ contracts/implementation-groups.yaml<br/>━━━━━━━━━━<br/>NEW — no defer_cleanup<br/>no registry_path"]
        C2["● contracts/implementation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"]
        C3["● contracts/remediation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"]
        C4["● contracts/merge-prs.yaml<br/>━━━━━━━━━━<br/>updated — removed defer_cleanup<br/>registry_path, keep_clone_on_failure"]
    end

    ELIMINATED["ELIMINATED state<br/>━━━━━━━━━━<br/>defer_cleanup ingredient<br/>registry_path ingredient<br/>keep_clone_on_failure ingredient<br/>check_defer_cleanup step<br/>confirm_cleanup step"]

    END_OK([COMPLETE])

    START -->|"success terminal"| REG_S
    START -->|"failure terminal"| REG_F
    REG_S -->|"appends entry"| ENTRY
    REG_F -->|"appends entry"| ENTRY
    ENTRY -->|"read once post-run"| BATCH
    BATCH --> GATE
    GATE -->|"status=success"| DEL
    GATE -->|"status=error"| KEEP
    DEL --> END_OK
    KEEP --> END_OK

    C1 -.->|"contract enforces"| REG_S
    C2 -.->|"contract enforces"| REG_S
    C3 -.->|"contract enforces"| REG_S
    C4 -.->|"contract enforces"| REG_S

    ELIMINATED -.->|"no longer written"| ENTRY

    class START,END_OK terminal;
    class REG_S,REG_F,BATCH newComponent;
    class ENTRY stateNode;
    class GATE stateNode;
    class DEL,KEEP output;
    class C1 phase;
    class C2,C3,C4 phase;
    class ELIMINATED detector;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start and end |
| Green | ● Modified / New | register steps and batch cleanup (this PR) |
| Teal | State | Registry file and status decision |
| Purple | Phase | Contract card files |
| Dark Teal | Output | Clone disposition outcomes |
| Red | Eliminated | State that no longer exists |

Closes #610

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-185031-682892/.autoskillit/temp/make-plan/process_issues_defer_clone_cleanup_plan_2026-04-04_000000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 36 | 16.9k | 1.4M | 1 | 6m 15s |
| **Total** | 10.1k | 383.2k | 42.0M | | 2h 51m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ocal (#612)

## Summary

Move `smoke-test.yaml` and its companion artifacts (contract card, flow
diagram) from the bundled `src/autoskillit/recipes/` directory to the
project-local `.autoskillit/recipes/` directory. This makes smoke-test
invisible to end-user projects while remaining fully functional when
running from the AutoSkillit repository root. The existing project-local
recipe discovery mechanism already supports this — no production code
changes are needed. All changes are file relocations and test updates.
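The discovery mechanism that makes this work is the seen-set dedup shown in the diagram below ("Project names shadow bundled"). A minimal sketch, assuming a `list_recipes` that scans the project-local directory before the bundled one — the signature and return shape are hypothetical:

```python
from pathlib import Path

def list_recipes(project_dir: Path, builtin_dir: Path) -> list[tuple[str, str]]:
    """Scan project-local recipes first; project names shadow bundled ones."""
    recipes: list[tuple[str, str]] = []
    seen: set[str] = set()
    # Priority order: PROJECT before BUILTIN, so a project-local recipe
    # with the same stem hides the bundled copy.
    for source, directory in (("PROJECT", project_dir), ("BUILTIN", builtin_dir)):
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.yaml")):
            name = path.stem
            if name in seen:
                continue  # already provided by a higher-priority source
            seen.add(name)
            recipes.append((name, source))
    return recipes
```

Under this scheme, moving `smoke-test.yaml` into `.autoskillit/recipes/` flips its source to `PROJECT` inside the AutoSkillit repo and removes it entirely for external projects, with no code change.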

## Requirements

### MOVE — Recipe File Relocation

- **REQ-MOVE-001:** The file `src/autoskillit/recipes/smoke-test.yaml`
must be relocated to `.autoskillit/recipes/smoke-test.yaml` at the
project root.
- **REQ-MOVE-002:** Associated contract card(s) in
`src/autoskillit/recipes/contracts/` matching `smoke-test*` must be
relocated to `.autoskillit/recipes/contracts/`.
- **REQ-MOVE-003:** Associated diagram(s) in
`src/autoskillit/recipes/diagrams/` matching `smoke-test*` must be
relocated to `.autoskillit/recipes/diagrams/`.

### LIST — Listing Behavior

- **REQ-LIST-001:** The smoke-test recipe must not appear in
`list_recipes` output when the current working directory is outside the
AutoSkillit repository.
- **REQ-LIST-002:** The smoke-test recipe must appear in `list_recipes`
output with source `PROJECT` when the current working directory is the
AutoSkillit repository root.

### LOAD — Pipeline Compatibility

- **REQ-LOAD-001:** `load_recipe("smoke-test")` must succeed when
invoked from the AutoSkillit repository root.
- **REQ-LOAD-002:** Existing smoke-test pipeline execution must remain
functionally identical after the move.

### TEST — Test Updates

- **REQ-TEST-001:** Tests that assert smoke-test has
`RecipeSource.BUILTIN` must be updated to assert `RecipeSource.PROJECT`.
- **REQ-TEST-002:** Tests that count the number of bundled recipes must
be updated to reflect the removal of smoke-test from the bundled set.

## Architecture Impact

### Operational Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START(["list_recipes / find_recipe_by_name called"])

    subgraph ProjectLocal ["★ PROJECT-LOCAL SCAN (priority 1)"]
        direction TB
        PROJ_DIR["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = PROJECT<br/>★ smoke-test.yaml (moved here)"]
        PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>★ smoke-test.yaml"]
        PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>★ smoke-test.md"]
    end

    subgraph Bundled ["BUNDLED SCAN (priority 2)"]
        direction TB
        BUILTIN_DIR["src/autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = BUILTIN<br/>implementation, remediation,<br/>merge-prs, impl-groups<br/>(smoke-test removed)"]
    end

    DEDUP["Dedup via seen set<br/>━━━━━━━━━━<br/>Project names shadow bundled"]

    subgraph AutoskillitRepo ["AUTOSKILLIT REPO CONTEXT"]
        direction TB
        CLI_LIST["● autoskillit recipes list<br/>━━━━━━━━━━<br/>Shows smoke-test (source: project)"]
        CLI_ORDER["autoskillit order<br/>━━━━━━━━━━<br/>Pipeline execution menu"]
        CLI_RENDER["autoskillit recipes render<br/>━━━━━━━━━━<br/>_recipes_dir_for(PROJECT)<br/>→ .autoskillit/recipes/diagrams/"]
    end

    subgraph ExternalProject ["EXTERNAL PROJECT CONTEXT"]
        direction TB
        EXT_LIST["autoskillit recipes list<br/>━━━━━━━━━━<br/>smoke-test NOT visible<br/>(no project-local copy)"]
    end

    START --> PROJ_DIR
    PROJ_DIR --> DEDUP
    DEDUP --> BUILTIN_DIR
    PROJ_DIR --> PROJ_CONTRACT
    PROJ_DIR --> PROJ_DIAGRAM
    DEDUP --> CLI_LIST
    DEDUP --> CLI_ORDER
    CLI_RENDER --> PROJ_DIAGRAM
    DEDUP --> EXT_LIST

    class START terminal;
    class PROJ_DIR,PROJ_CONTRACT,PROJ_DIAGRAM newComponent;
    class BUILTIN_DIR stateNode;
    class DEDUP handler;
    class CLI_LIST,CLI_ORDER,CLI_RENDER cli;
    class EXT_LIST detector;
```

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph Tests ["TESTS (modified ●)"]
        direction TB
        T_SMOKE["● test_smoke_pipeline.py<br/>━━━━━━━━━━<br/>uses SMOKE_SCRIPT<br/>→ project-local path"]
        T_BUNDLED["● test_bundled_recipes.py<br/>━━━━━━━━━━<br/>smoke_yaml fixture<br/>→ project-local path"]
        T_POLICY["● test_bundled_recipe_hidden_policy.py<br/>━━━━━━━━━━<br/>BUNDLED_RECIPE_NAMES<br/>smoke-test removed"]
        T_TOOLS["● test_tools_recipe.py<br/>━━━━━━━━━━<br/>list_recipes assertion<br/>smoke-test NOT in bundled"]
        T_ENGINE["● test_engine.py<br/>━━━━━━━━━━<br/>contract adapter test<br/>→ project-local path"]
    end

    subgraph L3 ["L3 — SERVER"]
        direction TB
        TOOLS_RECIPE["server.tools_recipe<br/>━━━━━━━━━━<br/>list_recipes, load_recipe<br/>validate_recipe"]
    end

    subgraph L2R ["L2 — RECIPE"]
        direction TB
        RECIPE_IO["recipe.io<br/>━━━━━━━━━━<br/>builtin_recipes_dir()<br/>list_recipes()"]
        RECIPE_VALIDATOR["recipe.validator<br/>━━━━━━━━━━<br/>run_semantic_rules<br/>analyze_dataflow"]
        RECIPE_CONTRACTS["recipe.contracts<br/>━━━━━━━━━━<br/>load_bundled_manifest"]
    end

    subgraph L2M ["L2 — MIGRATION"]
        direction TB
        MIG_ENGINE["migration.engine<br/>━━━━━━━━━━<br/>default_migration_engine<br/>contract adapters"]
    end

    subgraph L0 ["L0 — CORE"]
        direction TB
        CORE_PATHS["core.paths<br/>━━━━━━━━━━<br/>pkg_root() → bundled dir<br/>fan-in: all layers"]
    end

    subgraph Artifacts ["★ PROJECT-LOCAL ARTIFACTS (new)"]
        direction TB
        PROJ_RECIPE["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>smoke-test.yaml"]
        PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>smoke-test.yaml"]
        PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>smoke-test.md"]
    end

    T_SMOKE -->|"imports"| TOOLS_RECIPE
    T_SMOKE -->|"imports"| RECIPE_IO
    T_BUNDLED -->|"imports"| RECIPE_IO
    T_BUNDLED -->|"imports"| RECIPE_CONTRACTS
    T_POLICY -->|"imports"| CORE_PATHS
    T_TOOLS -->|"imports"| TOOLS_RECIPE
    T_ENGINE -->|"imports"| CORE_PATHS
    T_ENGINE -->|"imports"| MIG_ENGINE

    TOOLS_RECIPE -->|"imports"| RECIPE_IO
    RECIPE_IO -->|"builtin_recipes_dir()"| CORE_PATHS
    RECIPE_VALIDATOR -->|"imports"| RECIPE_IO
    RECIPE_CONTRACTS -->|"imports"| RECIPE_IO
    MIG_ENGINE -->|"imports"| CORE_PATHS

    T_SMOKE -.->|"now reads"| PROJ_RECIPE
    T_BUNDLED -.->|"now reads"| PROJ_RECIPE
    T_ENGINE -.->|"now reads"| PROJ_CONTRACT

    class T_SMOKE,T_BUNDLED,T_POLICY,T_TOOLS,T_ENGINE phase;
    class TOOLS_RECIPE cli;
    class RECIPE_IO,RECIPE_VALIDATOR,RECIPE_CONTRACTS handler;
    class MIG_ENGINE handler;
    class CORE_PATHS stateNode;
    class PROJ_RECIPE,PROJ_CONTRACT,PROJ_DIAGRAM newComponent;
```

Closes #600

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190817-394673/.autoskillit/temp/make-plan/move_smoke_test_recipe_plan_2026-04-04_190817.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 74 | 37.1k | 3.0M | 2 | 12m 44s |
| **Total** | 10.1k | 403.4k | 43.6M | | 2h 58m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…Type in review-design (#614)

## Summary

The `review-design` skill has L1 severity calibration that correctly
caps `estimand_clarity` and `hypothesis_falsifiability` by
`experiment_type` — benchmarks can never produce L1 critical findings.
But the red-team dimension has **no analogous calibration**, meaning any
critical red-team finding triggers STOP regardless of experiment type.
This creates an unresolvable loop for benchmarks: the red-team always
finds new critical issues at progressively higher levels of abstraction (the Hydra
pattern), exhausting retries without ever producing GO.

The fix adds a red-team severity calibration rubric to
`review-design/SKILL.md` (mirroring the L1 rubric), updates the verdict
logic to apply the cap before building `stop_triggers`, and adds
diminishing-return awareness to `resolve-design-review/SKILL.md` so it
can detect goalposts-moving across rounds.
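The cap-before-stop-triggers ordering can be sketched as below. `RT_MAX_SEVERITY` is named in the diagram; the specific experiment types, severity ladder, and finding shape are illustrative assumptions, not the actual rubric:

```python
# Hypothetical rubric: highest red-team severity each experiment type can yield.
RT_MAX_SEVERITY = {
    "benchmark": "warning",   # benchmarks can never yield critical RT findings
    "ablation": "critical",
    "field_study": "critical",
}
SEVERITY_ORDER = ["info", "warning", "critical"]

def apply_rt_cap(findings: list[dict], experiment_type: str) -> list[dict]:
    """Downgrade red-team findings above the per-experiment-type ceiling."""
    ceiling = RT_MAX_SEVERITY.get(experiment_type, "critical")
    capped = []
    for finding in findings:
        severity = finding["severity"]
        if SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(ceiling):
            severity = ceiling  # applied BEFORE stop_triggers are built
        capped.append({**finding, "severity": severity})
    return capped

def build_stop_triggers(findings: list[dict]) -> list[dict]:
    """Only still-critical findings can trigger STOP."""
    return [f for f in findings if f["severity"] == "critical"]
```

The key property is ordering: because the cap runs before `build_stop_triggers`, a benchmark's red-team finding can inform the review but can never force the STOP loop that produced the Hydra pattern.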

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([Plan submitted])
    GO([GO → execute])
    REVISE_OUT([REVISE → revise_design])
    REVISED_OUT([revised → revise_design])
    FAILED_OUT([failed → design_rejected])

    subgraph ReviewDesign ["● review-design/SKILL.md"]
        direction TB
        L1["L1 Analysis<br/>━━━━━━━━━━<br/>estimand_clarity +<br/>hypothesis_falsifiability"]
        L1GATE{"L1 Fail-Fast<br/>━━━━━━━━━━<br/>Any L1 critical?"}
        PARALLEL["L2 + L3 + L4 + RT<br/>━━━━━━━━━━<br/>Parallel analysis"]
        RTCAP["● RT Severity Cap<br/>━━━━━━━━━━<br/>RT_MAX_SEVERITY[experiment_type]<br/>Downgrade if above ceiling"]
        MERGE["Merge + Dedup<br/>━━━━━━━━━━<br/>All findings pooled"]
        VERDICT{"● Verdict Logic<br/>━━━━━━━━━━<br/>stop_triggers built<br/>AFTER rt_cap applied"}
    end

    subgraph ResolveDesign ["● resolve-design-review/SKILL.md"]
        direction TB
        PARSE["Step 1: Parse Dashboard<br/>━━━━━━━━━━<br/>Extract stop-trigger findings<br/>Classify ADDRESSABLE/STRUCTURAL/DISCUSS"]
        DIMCHECK{"prior_revision_guidance<br/>━━━━━━━━━━<br/>provided?"}
        DIMRET["● Step 1.5: Diminishing-Return<br/>━━━━━━━━━━<br/>Compare ADDRESSABLE themes<br/>vs prior guidance entries"]
        GOALPOST{"goalposts_moving<br/>━━━━━━━━━━<br/>true for any finding?"}
        RECLASSIFY["● Reclassify<br/>━━━━━━━━━━<br/>ADDRESSABLE → STRUCTURAL<br/>annotate prior_theme_match"]
        RESGATE{"Any ADDRESSABLE<br/>or DISCUSS?"}
    end

    subgraph RecipeRouting ["● research.yaml — resolve_design_review step"]
        direction LR
        RECIPE["skill_command passes<br/>━━━━━━━━━━<br/>$context.revision_guidance<br/>as optional 3rd arg"]
    end

    START --> L1
    L1 --> L1GATE
    L1GATE -->|"yes (L1 critical)"| MERGE
    L1GATE -->|"no"| PARALLEL
    PARALLEL --> RTCAP
    RTCAP --> MERGE
    MERGE --> VERDICT
    VERDICT -->|"stop_triggers present"| RECIPE
    VERDICT -->|"critical or ≥3 warnings"| REVISE_OUT
    VERDICT -->|"otherwise"| GO

    RECIPE --> PARSE
    PARSE --> DIMCHECK
    DIMCHECK -->|"yes"| DIMRET
    DIMCHECK -->|"no (round 1)"| RESGATE
    DIMRET --> GOALPOST
    GOALPOST -->|"true"| RECLASSIFY
    GOALPOST -->|"false"| RESGATE
    RECLASSIFY --> RESGATE
    RESGATE -->|"yes"| REVISED_OUT
    RESGATE -->|"all STRUCTURAL"| FAILED_OUT

    class START,GO,REVISE_OUT,REVISED_OUT,FAILED_OUT terminal;
    class L1,PARALLEL handler;
    class L1GATE,VERDICT,DIMCHECK,GOALPOST,RESGATE stateNode;
    class MERGE,PARSE phase;
    class RTCAP,DIMRET,RECLASSIFY newComponent;
    class RECIPE detector;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and outcome states |
| Orange | Handler | Analysis agents (L1, parallel L2-L4+RT) |
| Teal | State | Decision points and verdict routing |
| Purple | Phase | Merge and parse aggregation steps |
| Green | Modified Component | ● Nodes changed by this PR (RT cap, diminishing-return detection, reclassify, recipe routing) |
| Red | Detector | Recipe routing gate (passes revision_guidance) |

Closes #609

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-185816-184240/.autoskillit/temp/make-plan/add-red-team-severity-calibration-by-experiment-type_plan_2026-04-04_185816.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 135 | 68.4k | 5.4M | 4 | 23m 1s |
| review_pr | 31 | 22.8k | 1.2M | 1 | 5m 50s |
| **Total** | 10.2k | 457.5k | 47.2M | | 3h 14m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
#615)

## Summary

The token summary table (displayed in PRs, terminal, and compact KV
output) collapses 4 distinct Claude API token fields into 3 misleading
columns. The column labeled "input" actually shows only the tiny
uncached delta (`input_tokens`), and "cached" silently sums two
cost-distinct categories (`cache_read_input_tokens` at 0.1x billing +
`cache_creation_input_tokens` at 1.25x billing). This change splits the
display into 4 token columns — `uncached`, `output`, `cache_read`,
`cache_write` — across all 3 independent formatter implementations and
their tests.

No data model, extraction, or storage changes are needed — `TokenEntry`
already preserves all 4 fields. This is purely a formatting-layer fix.
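The field-to-column mapping can be sketched as a compact-KV renderer. The four API field names and the `uc:/out:/cr:/cw:` labels come from this PR; the function body is a hypothetical reduction of `format_compact_kv()`, not the real implementation:

```python
def format_compact_kv(step: dict) -> str:
    """Render one step's token entry as a one-line KV summary.

    Each of the 4 Claude API token fields gets its own label instead of
    collapsing cache_read (0.1x billing) and cache_write (1.25x billing)
    into a single misleading "cached" number.
    """
    return (
        f"{step['name']} x{step['count']} "
        f"[uc:{step['input_tokens']} "
        f"out:{step['output_tokens']} "
        f"cr:{step['cache_read_input_tokens']} "
        f"cw:{step['cache_creation_input_tokens']}]"
    )
```

The markdown and terminal formatters follow the same four-way split; only the column syntax differs.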

## Architecture Impact

### Data Lineage Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart LR
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph API ["Claude API Response"]
        direction TB
        F1["input_tokens<br/>━━━━━━━━━━<br/>Uncached delta"]
        F2["output_tokens<br/>━━━━━━━━━━<br/>Generated tokens"]
        F3["cache_read_input_tokens<br/>━━━━━━━━━━<br/>0.1x billing"]
        F4["cache_creation_input_tokens<br/>━━━━━━━━━━<br/>1.25x billing"]
    end

    subgraph Storage ["TokenEntry Storage"]
        TE[("TokenEntry<br/>━━━━━━━━━━<br/>4 fields intact<br/>Accumulated per step")]
        TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Persisted session data<br/>All 4 fields")]
    end

    subgraph Canonical ["● telemetry_fmt.py (Canonical Formatter)"]
        direction TB
        FMD["● format_token_table()<br/>━━━━━━━━━━<br/>Markdown table<br/>Step|uncached|output|cache_read|cache_write|count|time"]
        FTM["● format_token_table_terminal()<br/>━━━━━━━━━━<br/>Terminal table<br/>UNCACHED|OUTPUT|CACHE_RD|CACHE_WR"]
        FKV["● format_compact_kv()<br/>━━━━━━━━━━<br/>Compact KV<br/>uc:|out:|cr:|cw:"]
    end

    subgraph Hooks ["Stdlib Hooks (no autoskillit imports)"]
        direction TB
        TSA["● token_summary_appender._format_table()<br/>━━━━━━━━━━<br/>Reads token_usage.json<br/>Markdown table → GitHub PR body"]
        POS["● pretty_output._fmt_get_token_summary()<br/>━━━━━━━━━━<br/>Reads get_token_summary JSON<br/>Compact KV → PostToolUse"]
        POR["● pretty_output._fmt_run_skill()<br/>━━━━━━━━━━<br/>Reads run_skill result dict<br/>Inline KV → PostToolUse"]
    end

    subgraph Outputs ["Display Targets"]
        direction TB
        MD["PR Body<br/>━━━━━━━━━━<br/>GitHub markdown table"]
        TERM["Terminal<br/>━━━━━━━━━━<br/>Padded column output"]
        KV["Compact KV<br/>━━━━━━━━━━<br/>One-liner summaries"]
        HOOK["PostToolUse Output<br/>━━━━━━━━━━<br/>Hook-formatted display"]
    end

    F1 --> TE
    F2 --> TE
    F3 --> TE
    F4 --> TE
    TE --> TJ

    TE --> FMD
    TE --> FTM
    TE --> FKV
    TJ --> TSA
    TJ -.-> POS

    FMD -->|"markdown rows"| MD
    FTM -->|"padded columns"| TERM
    FKV -->|"kv lines"| KV
    TSA -->|"gh api PATCH"| MD
    POS -->|"formatted text"| HOOK
    POR -->|"formatted text"| HOOK

    class F1,F2,F3,F4 cli;
    class TE,TJ stateNode;
    class FMD,FTM,FKV handler;
    class TSA,POS,POR integration;
    class MD,TERM,KV,HOOK output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | API Fields | 4 Claude API token categories from usage response |
| Teal | Storage | TokenEntry dataclass + persisted JSON session files |
| Orange | Canonical Formatter | 3 functions in telemetry_fmt.py (all ● modified) |
| Red | Stdlib Hooks | Independent hook implementations (all ● modified) |
| Dark Teal | Outputs | Display targets: PR body, terminal, compact KV, PostToolUse |

### Operational Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Triggers ["OPERATOR TRIGGERS"]
        direction TB
        GTS["get_token_summary<br/>━━━━━━━━━━<br/>MCP tool call<br/>format=json|markdown"]
        RS["run_skill<br/>━━━━━━━━━━<br/>MCP tool call<br/>Headless session"]
        PRPATCH["PR body update<br/>━━━━━━━━━━<br/>After open-pr skill<br/>PostToolUse event"]
    end

    subgraph State ["TOKEN STATE (read/write)"]
        direction TB
        TL[("DefaultTokenLog<br/>━━━━━━━━━━<br/>In-memory accumulator<br/>4 fields per step")]
        TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Per-session disk files<br/>Read by stdlib hooks")]
    end

    subgraph Formatters ["● FORMATTERS (modified)"]
        direction TB
        TF["● telemetry_fmt.py<br/>━━━━━━━━━━<br/>format_token_table()<br/>format_token_table_terminal()<br/>format_compact_kv()"]
        TSA["● token_summary_appender.py<br/>━━━━━━━━━━<br/>_format_table()<br/>Stdlib-only hook"]
        PO["● pretty_output.py<br/>━━━━━━━━━━<br/>_fmt_get_token_summary()<br/>_fmt_run_skill()"]
    end

    subgraph Outputs ["OBSERVABILITY OUTPUTS (write-only)"]
        direction TB
        MDTBL["PR Body Table<br/>━━━━━━━━━━<br/>## Token Usage Summary<br/>Step|uncached|output|cache_read|cache_write|count|time"]
        TERM["Terminal Table<br/>━━━━━━━━━━<br/>STEP UNCACHED OUTPUT CACHE_RD CACHE_WR COUNT TIME<br/>Padded for readability"]
        KV["Compact KV<br/>━━━━━━━━━━<br/>name xN [uc:X out:X cr:X cw:X t:Xs]<br/>total_uncached / total_cache_read / total_cache_write"]
        HOOK["PostToolUse Display<br/>━━━━━━━━━━<br/>tokens_uncached:<br/>tokens_cache_read:<br/>tokens_cache_write:"]
    end

    GTS -->|"reads"| TL
    TL -.->|"flush"| TJ
    TJ -->|"load_sessions"| TSA
    TJ -.->|"via MCP JSON payload"| PO

    GTS --> TF
    TF -->|"markdown"| MDTBL
    TF -->|"terminal"| TERM
    TF -->|"compact"| KV

    RS -->|"PostToolUse event"| PO
    PO -->|"_fmt_run_skill"| HOOK
    PO -->|"_fmt_get_token_summary"| KV

    PRPATCH -->|"PostToolUse event"| TSA
    TSA -->|"gh api PATCH"| MDTBL

    class GTS,RS,PRPATCH cli;
    class TL,TJ stateNode;
    class TF,TSA,PO handler;
    class MDTBL,TERM,KV,HOOK output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Triggers | Operator-initiated MCP tool calls and PostToolUse events |
| Teal | State | Token accumulator (read/write) and persisted JSON files |
| Orange | Formatters | 3 modified formatter implementations (all ● changed) |
| Dark Teal | Outputs | Write-only observability artifacts: PR table, terminal, compact KV |

Closes #604

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190817-266225/.autoskillit/temp/make-plan/token_summary_4_columns_plan_2026-04-04_191000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Add `api-simulator` as a dev dependency and use its `mock_http_server`
pytest fixture to test the quota guard's real HTTP path end-to-end.
Currently all quota tests monkeypatch `_fetch_quota` at the function
level — the actual httpx client construction, header injection
(`Authorization: Bearer`, `anthropic-beta`), response parsing, and error
handling are never exercised. This change introduces a `base_url`
parameter to `_fetch_quota` and `check_and_sleep_if_needed`, then adds
seven tests that point the real httpx client at `mock_http_server` to
exercise the full HTTP path.

**Files changed:** 3 (`pyproject.toml`, `src/autoskillit/execution/quota.py`, new `tests/execution/test_quota_http.py`)

**Existing tests:** unchanged — all monkeypatch-based tests in `test_quota.py` remain as-is.

## Requirements

### DEP — Dependency Integration

- **REQ-DEP-001:** The system must include `api-simulator` as a dev-only
dependency with a pinned git tag source.
- **REQ-DEP-002:** The api-simulator dependency must not appear in
production runtime dependencies.

### CFG — URL Configurability

- **REQ-CFG-001:** `_fetch_quota` must accept a `base_url` parameter
defaulting to `https://api.anthropic.com`.
- **REQ-CFG-002:** `check_and_sleep_if_needed` must thread the
`base_url` parameter through to `_fetch_quota` at both call sites.
- **REQ-CFG-003:** The production behavior must be unchanged when
`base_url` is not explicitly provided.

### HTTP — HTTP Path Verification

- **REQ-HTTP-001:** Tests must exercise the real httpx client
construction path, not monkeypatch `_fetch_quota`.
- **REQ-HTTP-002:** Tests must verify that the `Authorization: Bearer`
header is sent on the request.
- **REQ-HTTP-003:** Tests must verify that the `anthropic-beta:
oauth-2025-04-20` header is sent on the request.
- **REQ-HTTP-004:** Tests must verify correct JSON response parsing for
the `five_hour` utilization shape.

### ERR — Error Handling Verification

- **REQ-ERR-001:** Tests must verify fail-open behavior on HTTP 4xx/5xx
responses.
- **REQ-ERR-002:** Tests must verify fail-open behavior on network
timeout.
- **REQ-ERR-003:** Tests must verify that the above-threshold path
triggers a double-fetch (two HTTP requests).
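The fail-open contract behind REQ-ERR-001/002 can be sketched as follows — an assumed shape, not the actual `check_and_sleep_if_needed` implementation: any HTTP or network error is swallowed and reported via an `error` key with `should_sleep: false`, so a quota-endpoint outage never blocks the pipeline.

```python
# Illustrative fail-open wrapper: any error during fetch yields
# should_sleep=False plus an error key, never a raised exception.
# fetch is a stand-in for the real _fetch_quota call.

def check_quota_fail_open(fetch) -> dict:
    try:
        payload = fetch()
    except Exception as exc:  # timeout, connect error, HTTP 4xx/5xx raised by client
        return {"should_sleep": False, "error": str(exc)}
    utilization = payload.get("five_hour", {}).get("utilization", 0.0)
    return {"should_sleep": False, "utilization": utilization}
```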

### COMPAT — Backward Compatibility

- **REQ-COMPAT-001:** Existing `test_quota.py` tests must continue to
pass unchanged.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([START: check_and_sleep_if_needed])

    subgraph GatePhase ["Gate Phase"]
        direction TB
        ENABLED{"config.enabled?"}
        DISABLED(["RETURN<br/>should_sleep: false"])
    end

    subgraph CachePhase ["Cache Phase"]
        direction TB
        CACHE["_read_cache<br/>━━━━━━━━━━<br/>Read local JSON cache"]
        CACHE_HIT{"Cache fresh?<br/>━━━━━━━━━━<br/>age ≤ max_age?"}
    end

    subgraph FetchPhase ["HTTP Fetch Phase"]
        direction TB
        FETCH["● _fetch_quota<br/>━━━━━━━━━━<br/>★ base_url parameter<br/>httpx.AsyncClient GET"]
        BASEURL["★ base_url<br/>━━━━━━━━━━<br/>default: api.anthropic.com<br/>test: mock_http_server.url"]
        PARSE["Parse Response<br/>━━━━━━━━━━<br/>five_hour.utilization<br/>Z→+00:00 normalization"]
    end

    subgraph DecisionPhase ["Threshold Decision"]
        direction TB
        THRESHOLD{"utilization<br/>≥ threshold?"}
        RESETS_AT1{"resets_at<br/>is None?<br/>(Gate 1)"}
        REFETCH["● _fetch_quota re-fetch<br/>━━━━━━━━━━<br/>★ base_url threaded<br/>Double-fetch for accuracy"]
        RESETS_AT2{"resets_at<br/>still None?<br/>(Gate 2)"}
    end

    subgraph Results ["Results"]
        BELOW(["RETURN<br/>should_sleep: false"])
        FALLBACK1(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"])
        FALLBACK2(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"])
        SLEEP(["RETURN<br/>should_sleep: true<br/>sleep_seconds computed"])
        FAILOPEN(["RETURN<br/>should_sleep: false<br/>error key present"])
    end

    subgraph TestInfra ["★ Test Infrastructure (test_quota_http.py)"]
        direction TB
        MOCK["★ mock_http_server<br/>━━━━━━━━━━<br/>api-simulator fixture<br/>HTTP server"]
        REGISTER["★ register / register_sequence<br/>━━━━━━━━━━<br/>Custom endpoint responses<br/>Status codes, delays"]
        INSPECT["★ get_requests / request_count<br/>━━━━━━━━━━<br/>Header verification<br/>Double-fetch assertion"]
    end

    START --> ENABLED
    ENABLED -->|"false"| DISABLED
    ENABLED -->|"true"| CACHE
    CACHE --> CACHE_HIT
    CACHE_HIT -->|"fresh + below threshold"| BELOW
    CACHE_HIT -->|"miss or expired"| FETCH
    FETCH --> BASEURL
    BASEURL --> PARSE
    PARSE --> THRESHOLD
    THRESHOLD -->|"below"| BELOW
    THRESHOLD -->|"above"| RESETS_AT1
    RESETS_AT1 -->|"None"| FALLBACK1
    RESETS_AT1 -->|"present"| REFETCH
    REFETCH --> RESETS_AT2
    RESETS_AT2 -->|"None"| FALLBACK2
    RESETS_AT2 -->|"present"| SLEEP
    FETCH -.->|"HTTP error / timeout"| FAILOPEN

    MOCK -.->|"serves responses to"| BASEURL
    REGISTER -.->|"configures"| MOCK
    INSPECT -.->|"verifies headers / count"| FETCH

    class START terminal;
    class DISABLED,BELOW,FALLBACK1,FALLBACK2,SLEEP,FAILOPEN phase;
    class ENABLED,CACHE_HIT,THRESHOLD,RESETS_AT1,RESETS_AT2 stateNode;
    class CACHE,PARSE handler;
    class FETCH,REFETCH handler;
    class BASEURL,MOCK,REGISTER,INSPECT newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry point |
| Teal | State | Decision points and routing |
| Orange | Handler | Processing nodes (cache read, HTTP fetch, parse) |
| Green | New Component | ★ New `base_url` parameter and test infrastructure |
| Purple | Phase | Result return paths |

### Development Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph Deps ["● DEPENDENCY MANIFEST (pyproject.toml)"]
        direction TB
        PYPROJECT["● pyproject.toml<br/>━━━━━━━━━━<br/>hatchling build backend<br/>requires-python ≥ 3.11"]
        DEVDEPS["● dev optional-dependencies<br/>━━━━━━━━━━<br/>pytest, pytest-asyncio,<br/>pytest-httpx, pytest-xdist,<br/>pytest-timeout, ruff,<br/>import-linter, packaging"]
        APISIM["★ api-simulator<br/>━━━━━━━━━━<br/>New dev dependency<br/>HTTP mock fixture provider"]
        UVSRC["★ [tool.uv.sources]<br/>━━━━━━━━━━<br/>api-simulator pinned<br/>git: TalonT-Org/api-simulator<br/>branch: main"]
        UVLOCK["● uv.lock<br/>━━━━━━━━━━<br/>Regenerated with<br/>api-simulator entry"]
    end

    subgraph Quality ["CODE QUALITY GATES (pre-commit)"]
        direction TB
        FORMAT["ruff format<br/>━━━━━━━━━━<br/>Auto-fix code style<br/>reads + modifies src"]
        LINT["ruff check<br/>━━━━━━━━━━<br/>Auto-fix lint violations<br/>reads + modifies src"]
        TYPES["mypy<br/>━━━━━━━━━━<br/>Type checking<br/>reads src, reports only"]
        UVCHECK["uv lock check<br/>━━━━━━━━━━<br/>Verifies lockfile sync<br/>reads uv.lock"]
        SECRETS["gitleaks<br/>━━━━━━━━━━<br/>Secret scanning<br/>reads staged files"]
        IMPORTLINT["import-linter<br/>━━━━━━━━━━<br/>Layer contract enforcement<br/>IL-001 through IL-007"]
    end

    subgraph Testing ["TEST FRAMEWORK"]
        direction TB
        PYTEST["pytest + pytest-asyncio<br/>━━━━━━━━━━<br/>asyncio_mode=auto<br/>timeout=60s signal"]
        XDIST["pytest-xdist -n 4<br/>━━━━━━━━━━<br/>Parallel test workers<br/>worksteal distribution"]
        UNITQUOTA["● test_quota.py<br/>━━━━━━━━━━<br/>23 unit tests<br/>monkeypatch _fetch_quota<br/>mock signature updated"]
        HTTPQUOTA["★ test_quota_http.py<br/>━━━━━━━━━━<br/>7 end-to-end HTTP tests<br/>real httpx client path<br/>no monkeypatching"]
        MOCKSERVER["★ mock_http_server fixture<br/>━━━━━━━━━━<br/>api-simulator provides<br/>register / register_sequence<br/>get_requests / request_count"]
    end

    subgraph EntryPoints ["ENTRY POINTS"]
        CLI["autoskillit CLI<br/>━━━━━━━━━━<br/>autoskillit.cli:main"]
    end

    PYPROJECT --> DEVDEPS
    DEVDEPS --> APISIM
    APISIM --> UVSRC
    UVSRC --> UVLOCK

    PYPROJECT --> FORMAT
    FORMAT --> LINT
    LINT --> TYPES
    TYPES --> UVCHECK
    UVCHECK --> SECRETS
    SECRETS --> IMPORTLINT

    IMPORTLINT --> PYTEST
    PYTEST --> XDIST
    XDIST --> UNITQUOTA
    XDIST --> HTTPQUOTA
    APISIM -.->|"provides fixture"| MOCKSERVER
    MOCKSERVER -.->|"injected into"| HTTPQUOTA

    PYPROJECT --> CLI

    class PYPROJECT,DEVDEPS,UVLOCK phase;
    class APISIM,UVSRC,HTTPQUOTA,MOCKSERVER newComponent;
    class UNITQUOTA handler;
    class FORMAT,LINT,TYPES,UVCHECK,SECRETS,IMPORTLINT detector;
    class PYTEST,XDIST handler;
    class CLI output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Purple | Build Config | pyproject.toml, dev deps, lockfile |
| Green | New Component | ★ api-simulator dep, uv.sources, HTTP test file, mock fixture |
| Orange | Test Framework | pytest, xdist, existing test_quota.py |
| Red | Quality Gates | ruff, mypy, uv lock check, gitleaks, import-linter |
| Dark Teal | Entry Points | CLI entry point |

Closes #607

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190816-816130/.autoskillit/temp/make-plan/integrate_api_simulator_quota_guard_plan_2026-04-04_191500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 100 | 51.3k | 3.9M | 3 | 16m 38s |
| **Total** | 10.2k | 417.5k | 44.5M | | 3h 2m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

The `zero_writes` gate in `execution/headless.py` fires unconditionally
when `write_behavior.mode == "always"` and `write_call_count == 0`. The
`resolve-failures` contract declares `write_behavior: always`, but the
skill legitimately exits with zero `Edit`/`Write` calls when the
worktree is already green (0 fix iterations). The gate has no escape
path for this case — `success=True` is demoted to `zero_writes`, killing
an otherwise correct pipeline run.

This PR changes the contract to `conditional` mode with a pattern gated
on the `fixes_applied` structured token, extends the same fix to
`retry-worktree` and `resolve-review`, and adds a semantic rule to
prevent regression.
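The conditional gate reduces to an AND over regex patterns against the session output. A minimal sketch (the regex for the `fixes_applied` token is an assumed shape, not copied from the contract):

```python
import re

# Sketch of the AND-semantics pattern gate: write activity is only *expected*
# when every contract pattern matches the session output. The fixes_applied
# regex below is an assumed shape for the structured token.

FIXES_APPLIED_NONZERO = r"fixes_applied\s*=\s*[1-9]\d*"


def check_expected_patterns(patterns: tuple[str, ...], session_output: str) -> bool:
    """True only if every pattern matches (AND over all patterns)."""
    return all(re.search(p, session_output) for p in patterns)
```

With this gate, an already-green worktree emits `fixes_applied = 0`, no pattern matches, writes are not expected, and `success=True` survives with zero `Edit`/`Write` calls.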

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([run_skill called])
    SUCCESS(["✓ success=True<br/>subtype=success"])
    DEMOTED(["✗ success=False<br/>subtype=zero_writes"])

    subgraph Contract ["● Contract Resolution"]
        direction TB
        YAML["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>resolve-failures:<br/>  write_behavior: conditional<br/>  write_expected_when:<br/>  - fixes_applied ≥ 1 regex"]
        FACTORY["● _factory.py<br/>━━━━━━━━━━<br/>_resolve_write_behavior()<br/>reads contract via lru_cache"]
        SPEC["WriteBehaviorSpec<br/>━━━━━━━━━━<br/>mode=conditional<br/>expected_when=(pattern,)"]
    end

    subgraph Execution ["● Skill Execution"]
        direction TB
        SESSION["headless subprocess<br/>━━━━━━━━━━<br/>run tests, apply fixes<br/>via Bash / Edit / Write"]
        TOKEN["● Structured Token<br/>━━━━━━━━━━<br/>fixes_applied = N<br/>emitted at Step 4"]
        COUNT["write_call_count<br/>━━━━━━━━━━<br/>count Edit + Write<br/>in tool_uses"]
    end

    subgraph Gate ["● Zero-Write Gate"]
        direction TB
        GUARD{"success=True AND<br/>write_count=0 AND<br/>write_behavior≠None?"}
        MODE{"● mode?<br/>━━━━━━━━━━<br/>always vs conditional"}
        PATTERN{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND-match all patterns<br/>against session output"}
        EXPECT{"write_expected<br/>AND write_count=0?"}
    end

    %% FLOW %%
    START --> YAML
    YAML -->|"reads"| FACTORY
    FACTORY -->|"builds"| SPEC
    SPEC -->|"passed to executor"| SESSION
    SESSION --> TOKEN
    SESSION --> COUNT
    TOKEN --> GUARD
    COUNT --> GUARD

    GUARD -->|"No — gate inactive"| SUCCESS
    GUARD -->|"Yes"| MODE

    MODE -->|"always"| EXPECT
    MODE -->|"conditional"| PATTERN

    PATTERN -->|"fixes_applied=0<br/>no match → False"| SUCCESS
    PATTERN -->|"fixes_applied≥1<br/>match → True"| EXPECT

    EXPECT -->|"write_count > 0<br/>artifact written"| SUCCESS
    EXPECT -->|"write_count = 0<br/>no artifact"| DEMOTED

    %% CLASS ASSIGNMENTS %%
    class START,SUCCESS,DEMOTED terminal;
    class YAML,SPEC stateNode;
    class FACTORY,SESSION,COUNT handler;
    class TOKEN output;
    class GUARD,MODE,PATTERN,EXPECT detector;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph ContractFields ["● INIT_ONLY: Contract Fields (YAML → frozen)"]
        direction TB
        WB["● write_behavior<br/>━━━━━━━━━━<br/>always ∣ conditional ∣ null<br/>Set in skill_contracts.yaml<br/>Cached via @lru_cache"]
        WEW["● write_expected_when<br/>━━━━━━━━━━<br/>list of regex patterns<br/>AND-semantics at gate<br/>Empty = no pattern gate"]
    end

    subgraph SpecFields ["INIT_ONLY: WriteBehaviorSpec (frozen dataclass)"]
        direction TB
        MODE["● mode: str ∣ None<br/>━━━━━━━━━━<br/>Mirrors write_behavior<br/>Frozen after construction"]
        EXPECTED["● expected_when: tuple<br/>━━━━━━━━━━<br/>Immutable tuple of patterns<br/>Frozen after construction"]
    end

    subgraph SessionState ["MUTABLE + APPEND: Session State"]
        direction TB
        TOOLS["tool_uses: list<br/>━━━━━━━━━━<br/>APPEND_ONLY during session<br/>Each Edit/Write appended"]
        RESULT["● session output: str<br/>━━━━━━━━━━<br/>Contains structured tokens<br/>fixes_applied = N"]
        WCC["write_call_count: int<br/>━━━━━━━━━━<br/>DERIVED from tool_uses<br/>count(Edit + Write)"]
    end

    subgraph GateState ["● MUTABLE: SkillResult Fields (gate mutations)"]
        direction TB
        SUCCESS["● success: bool<br/>━━━━━━━━━━<br/>Init: True (if session ok)<br/>Gate may demote → False"]
        SUBTYPE["● subtype: str<br/>━━━━━━━━━━<br/>Init: success<br/>Gate may set → zero_writes"]
        RETRY["● needs_retry: bool<br/>━━━━━━━━━━<br/>Init: False<br/>Gate may set → True"]
    end

    subgraph Validation ["● VALIDATION GATES"]
        direction TB
        G1{"● mode check<br/>━━━━━━━━━━<br/>always → write_expected=True<br/>conditional → check patterns"}
        G2{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND over all patterns<br/>re.search each on output"}
        G3{"write_expected AND<br/>write_count == 0?<br/>━━━━━━━━━━<br/>Demote if both True"}
    end

    %% FLOW: Contract → Spec %%
    WB -->|"reads"| MODE
    WEW -->|"reads"| EXPECTED

    %% FLOW: Spec → Gate %%
    MODE -->|"determines gate path"| G1
    EXPECTED -->|"provides patterns"| G2

    %% FLOW: Session → Gate %%
    TOOLS -->|"derives"| WCC
    RESULT -->|"scanned by"| G2
    WCC -->|"checked by"| G3

    %% FLOW: Gate decisions %%
    G1 -->|"conditional"| G2
    G1 -->|"always"| G3
    G2 -->|"match → True"| G3
    G2 -->|"no match → False"| SUCCESS

    %% FLOW: Gate → Mutation %%
    G3 -->|"demote"| SUBTYPE
    G3 -->|"demote"| RETRY
    G3 -->|"preserve"| SUCCESS

    %% CLASS ASSIGNMENTS %%
    class WB,WEW detector;
    class MODE,EXPECTED detector;
    class TOOLS handler;
    class RESULT output;
    class WCC phase;
    class SUCCESS,SUBTYPE,RETRY gap;
    class G1,G2,G3 stateNode;
```

Closes #603

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260404-212507-745574/.autoskillit/temp/rectify/rectify_zero-writes-false-positive_2026-04-04_215019_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| investigate | 31 | 12.6k | 747.1k | 1 | 6m 34s |
| rectify | 11.4k | 57.9k | 2.0M | 1 | 27m 28s |
| review | 3.6k | 7.2k | 216.3k | 1 | 8m 0s |
| dry_walkthrough | 51 | 30.8k | 2.3M | 2 | 11m 22s |
| implement | 2.2k | 28.2k | 3.0M | 2 | 10m 56s |
| assess | 44 | 7.8k | 1.1M | 2 | 8m 43s |
| audit_impl | 30 | 18.6k | 654.7k | 2 | 9m 10s |
| open_pr | 28 | 15.8k | 1.0M | 1 | 7m 3s |
| **Total** | 17.3k | 178.9k | 11.1M | | 1h 29m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…alation Routing, and Pack Fix (#620)

## Summary

This part adds the post-review re-validation loop and escalation
consumption infrastructure to `research.yaml`, adds the `needs_rerun`
structured output token to `resolve-research-review/SKILL.md`, and fixes
the missing `exp-lens` pack registration. Additionally adds the data
provenance lifecycle across 5 research pipeline skills (plan-experiment,
run-experiment, write-report, review-design, review-research-pr) with
contract and guard tests.

## Requirements

### DATA — Data Provenance Lifecycle

- **REQ-DATA-001:** The `plan-experiment` skill must generate a Data
Manifest section in every experiment plan that maps each hypothesis to
its required data source(s), specifying source type (synthetic, fixture,
external, gitignored), acquisition method (generate, download, copy),
and verification criteria.
- **REQ-DATA-002:** When the research task directive or issue specifies
using particular data, the `plan-experiment` skill must include explicit
acquisition steps for that data in the plan — the plan must not assume
data will already be present.
- **REQ-DATA-003:** The `run-experiment` skill pre-flight must perform a
hypothesis-to-data mapping check against the Data Manifest: for each
hypothesis, verify its required data source is present and non-empty
before execution begins.
- **REQ-DATA-004:** When `run-experiment` pre-flight finds that data the
plan said would be acquired is missing, it must emit a structured
`blocked_hypotheses` list and treat this as a FAIL rather than silently
degrading to N/A.
- **REQ-DATA-005:** The `review-design` skill must include data
acquisition completeness as a reviewable dimension at sufficient weight
to influence the verdict (not L-weight), checking that every hypothesis
has a data source, every external source has an acquisition step, and
every gitignored path has a generation/download step.
- **REQ-DATA-006:** The `review-research-pr` skill must include a
`data-scope` review dimension that checks whether the experiment's data
coverage matches the research task directive and flags when all
benchmarks used only synthetic data for a domain-specific project.
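The pre-flight check in REQ-DATA-003/004 can be sketched as below. The manifest entry shape (`hypothesis`, `location`) is illustrative, not the skill's actual schema:

```python
import os

# Hedged sketch of the hypothesis-to-data pre-flight: verify each
# hypothesis's data source exists and is non-empty before execution;
# emit blocked_hypotheses and FAIL otherwise (never degrade to N/A).

def preflight_check(manifest: list[dict]) -> dict:
    blocked = []
    for entry in manifest:
        path = entry["location"]
        if not (os.path.exists(path) and os.path.getsize(path) > 0):
            blocked.append({"hypothesis": entry["hypothesis"], "missing": path})
    return {
        "status": "FAILED" if blocked else "READY",
        "blocked_hypotheses": blocked,
    }
```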

### REPORT — Write-Report Data Scope Guardrails

- **REQ-REPORT-001:** The `write-report` skill must include a mandatory
Data Scope Statement in the Executive Summary that explicitly states
what data types were used for all benchmarks and whether domain target
data was present, absent, or partial.
- **REQ-REPORT-002:** The `write-report` skill must perform a Metrics
Provenance Check before including any `*_metrics.json` files: verify
they were generated during the current experiment. If stale or
unrelated, disclose and omit with explanation rather than silently
dropping.
- **REQ-REPORT-003:** The `write-report` skill must enforce
pre-specified hypothesis gate thresholds: when a gate is not met, the
report must state this as a failure, and GO recommendations must
reference the specific gate that was met rather than silently
substituting a different threshold.
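One plausible reading of the REQ-REPORT-002 provenance check, sketched under the assumption that "generated during the current experiment" is decided by file modification time (the skill may use a different signal):

```python
import os

# Illustrative metrics provenance check: a *_metrics.json file is only
# trusted if it was modified after the experiment started; stale files
# are flagged for disclosure rather than silently included or dropped.

def classify_metrics_file(path: str, experiment_start: float) -> str:
    if not os.path.exists(path):
        return "missing"
    return "current" if os.path.getmtime(path) >= experiment_start else "stale"
```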

### REVAL — Post-Review Re-Validation Loop

- **REQ-REVAL-001:** The `resolve-research-review` skill must emit a
structured output token (`needs_rerun = true/false`) indicating whether
any `rerun_required` escalations exist, so the recipe can capture and
route on it.
- **REQ-REVAL-002:** The `research.yaml` recipe must include a routing
step after `resolve_research_review` that checks for `rerun_required`
escalations and routes to a `re_run_experiment` step when present.
- **REQ-REVAL-003:** The `re_run_experiment` step must perform a
targeted re-run of affected benchmarks/analyses (not a full experiment
replay) using the same data and scripts, then flow to `re_write_report`
→ `re_push_research`.
- **REQ-REVAL-004:** When only `design_flaw` escalations exist (no
`rerun_required`), the recipe must annotate the PR body with the
escalation details and continue to push.

### ESC — Escalation Consumption

- **REQ-ESC-001:** The `research.yaml` recipe must include a
`check_escalations` step between `resolve_research_review` and
`re_push_research` that reads `escalation_records_{pr}.json` and routes
based on escalation strategy types.
- **REQ-ESC-002:** The `check_escalations` step must distinguish between
`rerun_required` escalations (route to re-validation) and
`design_flaw`-only escalations (annotate and continue).
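The routing decision in `check_escalations` can be sketched as a pure function over the escalation records. The record shape and step names are taken from this description; anything beyond them is an assumption:

```python
# Sketch of check_escalations routing: any rerun_required escalation wins
# and routes to re-validation; design_flaw-only records annotate the PR
# and continue; no escalations means a direct push.

def route_escalations(records: list[dict]) -> str:
    strategies = {r.get("strategy") for r in records}
    if "rerun_required" in strategies:
        return "re_run_experiment"
    if "design_flaw" in strategies:
        return "annotate_and_push"
    return "re_push_research"
```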

### PACK — Exp-Lens Pack Registration

- **REQ-PACK-001:** The `research.yaml` recipe must declare
`requires_packs: [research, exp-lens]` so that all 18 exp-lens skills
are available in headless sessions during the research recipe pipeline.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    PUSH_BR([push_branch<br/>━━━━━━━━━━<br/>git push worktree])

    subgraph PRReview ["PR Review Phase"]
        direction TB
        OPEN["open_research_pr<br/>━━━━━━━━━━<br/>run_skill: open-pr"]
        GUARD{"guard_pr_url<br/>━━━━━━━━━━<br/>context.pr_url?"}
        REVIEW["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-research-pr<br/>captures: verdict"]
    end

    subgraph Resolution ["Review Resolution"]
        direction TB
        RESOLVE["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-research-review<br/>captures: needs_rerun<br/>retries: 2"]
    end

    subgraph EscalationRouting ["★ Escalation Routing (New)"]
        direction TB
        CHECK{"★ check_escalations<br/>━━━━━━━━━━<br/>action: route<br/>context.needs_rerun?"}
    end

    subgraph RevalidationLoop ["★ Re-Validation Loop (New)"]
        direction TB
        RERUN["★ re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust<br/>targeted benchmark re-run"]
        REWRITE["★ re_write_report<br/>━━━━━━━━━━<br/>write-report<br/>updated results"]
        RETEST["★ re_test<br/>━━━━━━━━━━<br/>test_check<br/>post-revalidation gate"]
    end

    REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>run_cmd: git push"]
    COMPLETE([research_complete<br/>━━━━━━━━━━<br/>action: stop])

    PUSH_BR --> OPEN
    OPEN --> GUARD
    GUARD -->|"pr_url truthy"| REVIEW
    GUARD -->|"no pr_url"| COMPLETE
    REVIEW -->|"changes_requested"| RESOLVE
    REVIEW -->|"approved / needs_human"| COMPLETE
    RESOLVE -->|"on_success"| CHECK
    RESOLVE -->|"on_failure / exhausted"| COMPLETE
    CHECK -->|"needs_rerun == true"| RERUN
    CHECK -->|"default (false/absent)"| REPUSH
    RERUN -->|"on_success"| REWRITE
    RERUN -->|"on_failure / context_limit"| REPUSH
    REWRITE -->|"on_success"| RETEST
    REWRITE -->|"on_failure / context_limit"| REPUSH
    RETEST -->|"pass or fail"| REPUSH
    REPUSH --> COMPLETE

    class PUSH_BR,COMPLETE terminal;
    class GUARD,CHECK stateNode;
    class OPEN,REVIEW,RESOLVE handler;
    class RERUN,REWRITE,RETEST newComponent;
    class REPUSH phase;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph Manifest ["★ Data Manifest Contract (INIT_ONLY)"]
        direction TB
        DM["★ data_manifest<br/>━━━━━━━━━━<br/>hypothesis[], source_type,<br/>acquisition, location,<br/>verification, depends_on"]
        V9{"★ V9 Gate<br/>━━━━━━━━━━<br/>Every hypothesis has source?<br/>External has acquisition?<br/>Gitignored has generation?"}
    end

    subgraph DesignGate ["★ Design Review Gate"]
        direction TB
        DAQ{"★ data_acquisition L4<br/>━━━━━━━━━━<br/>Hypothesis coverage?<br/>External readiness?<br/>Directive compliance?"}
    end

    subgraph PreFlight ["★ Run-Experiment Pre-Flight"]
        direction TB
        PF{"★ Data Manifest<br/>Verification<br/>━━━━━━━━━━<br/>location exists?<br/>acquisition succeeds?"}
        BH["★ blocked_hypotheses<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>H5: missing at path"]
    end

    subgraph ReportGates ["★ Write-Report Validation Gates"]
        direction TB
        DSS["★ Data Scope Statement<br/>━━━━━━━━━━<br/>Mandatory in Executive Summary<br/>data types + domain coverage"]
        MPC["★ Metrics Provenance<br/>━━━━━━━━━━<br/>timestamp + relevance check<br/>disclose, never silently drop"]
        GE["★ Gate Enforcement<br/>━━━━━━━━━━<br/>pre-specified thresholds only<br/>no silent substitution"]
    end

    subgraph ReviewGate ["★ PR Review Gate"]
        direction TB
        DSCOPE["★ data-scope dimension<br/>━━━━━━━━━━<br/>Scope coverage?<br/>Claims qualified?<br/>Statement present?"]
    end

    subgraph EscalationState ["● Resolve Output Contract"]
        direction TB
        ESC["escalation_records<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>strategy: rerun_required<br/>strategy: design_flaw"]
        NR["● needs_rerun<br/>━━━━━━━━━━<br/>DERIVED from escalations<br/>any rerun_required → true<br/>else → false"]
    end

    DM -->|"writes"| V9
    V9 -->|"PASS: plan saved"| DAQ
    V9 -->|"FAIL: plan rejected"| FAIL_PLAN([Plan Rejected])

    DAQ -->|"GO: proceed"| PF
    DAQ -->|"STOP: hypothesis has no source"| REVISE([Revise Plan])
    DAQ -->|"REVISE: missing verification"| REVISE

    PF -->|"ALL READY"| DSS
    PF -->|"BLOCKED: data missing"| BH
    BH --> FAIL_RUN([Status: FAILED])

    DM -.->|"reads manifest"| PF
    DM -.->|"reads manifest"| DSS
    DM -.->|"reads manifest"| DSCOPE

    DSS --> MPC
    MPC --> GE
    GE -->|"report committed"| DSCOPE

    DSCOPE -->|"findings"| ESC
    ESC -->|"derive"| NR
    NR -->|"true → re-validate"| RERUN([Re-Validation Loop])
    NR -->|"false → push"| PUSH([Direct Push])

    class DM detector;
    class V9,DAQ,PF stateNode;
    class BH,ESC handler;
    class DSS,MPC,GE newComponent;
    class DSCOPE newComponent;
    class NR phase;
    class FAIL_PLAN,FAIL_RUN,REVISE gap;
    class RERUN,PUSH cli;
```

Closes #618

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-074034-301298/.autoskillit/temp/make-plan/research_recipe_data_provenance_plan_2026-04-05_074500_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 587 | 30.7k | 1.2M | 112.6k | 1 | 13m 29s |
| verify | 73 | 35.9k | 3.7M | 137.0k | 2 | 11m 23s |
| implement | 2.1k | 36.2k | 5.9M | 155.2k | 2 | 17m 4s |
| fix | 50 | 13.2k | 2.1M | 64.5k | 1 | 10m 53s |
| audit_impl | 28 | 17.3k | 786.1k | 51.7k | 1 | 5m 55s |
| open_pr | 23 | 17.1k | 736.1k | 58.6k | 1 | 8m 12s |
| **Total** | 2.9k | 150.3k | 14.5M | 579.5k | | 1h 6m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

When a headless session spawns background agents via Claude Code's
`Agent` tool with `run_in_background: true`, Claude Code defers the
`type=result` NDJSON record until all background agents finish. If
autoskillit kills the process tree after Channel B confirms completion,
the deferred `type=result` is never flushed to stdout.
`parse_session_result` classifies the output as `UNPARSEABLE`, which
gates out all recovery paths and Channel B bypass — producing a false
failure for sessions that completed successfully.

The fix adds a **pre-gate Channel B drain-race recovery** in
`_build_skill_result` that runs *before* the `session.session_complete`
gate. When Channel B confirmed completion but the session is
UNPARSEABLE/EMPTY_OUTPUT, it reconstructs the result from
`assistant_messages` (which are written to stdout BEFORE the deferred
`type=result`) and promotes the session to SUCCESS, unlocking all
downstream recovery paths and Channel B bypass naturally.
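The pre-gate recovery described above can be sketched as follows — field names are illustrative, not the real `_build_skill_result` internals:

```python
# Hedged sketch of the pre-gate drain-race recovery: when Channel B
# confirmed completion but the killed process never flushed its deferred
# type=result record, reconstruct the result from assistant_messages
# (already on stdout) and promote the session to SUCCESS.

RECOVERABLE_SUBTYPES = {"UNPARSEABLE", "EMPTY_OUTPUT"}


def pregate_recover(session: dict) -> dict:
    if (
        session.get("channel_b_confirmed")
        and session.get("subtype") in RECOVERABLE_SUBTYPES
        and session.get("assistant_messages")
    ):
        session = dict(session)
        session["result_text"] = session["assistant_messages"][-1]
        session["subtype"] = "SUCCESS"
        session["is_error"] = False
    return session
```

Because promotion happens before the `session.session_complete` gate, the downstream recovery paths and Channel B bypass see an ordinary successful session.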

## Architecture Impact

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START(["● _build_skill_result<br/>━━━━━━━━━━<br/>Entry with SubprocessResult"])

    subgraph PreGate ["● PRE-GATE: Channel B Drain-Race Recovery"]
        direction TB
        CB_CHECK{"● Channel B?<br/>+ subtype in<br/>RECOVERABLE_SUBTYPES?<br/>+ completion_marker?"}
        CB_RECOVER["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Reconstruct result from<br/>assistant_messages"]
        CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
        CB_SKIP["No recovery needed<br/>━━━━━━━━━━<br/>Pass through unchanged"]
    end

    subgraph CompletionGate ["session.session_complete Gate"]
        direction TB
        GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype not in<br/>FAILURE_SUBTYPES"}
        MARKER_RECOVER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Marker-based recovery"]
        PATTERN_RECOVER["_recover_block_from_assistant_messages<br/>━━━━━━━━━━<br/>Pattern-based recovery"]
        SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only"]
        SKIP_RECOVERY["Skip all recovery<br/>━━━━━━━━━━<br/>TIMEOUT / genuine failure"]
    end

    subgraph Outcome ["● _compute_outcome"]
        direction TB
        CB_BYPASS{"● Channel B<br/>bypass in<br/>_compute_success?"}
        CONTENT_CHECK["_check_session_content<br/>━━━━━━━━━━<br/>6-gate validation"]
        DEAD_END{"Dead-end guard<br/>━━━━━━━━━━<br/>ABSENT → DRAIN_RACE<br/>CONTRACT_VIOLATION → FAIL"}
    end

    subgraph PostOutcome ["Post-Outcome Gates"]
        direction TB
        BUDGET["_apply_budget_guard<br/>━━━━━━━━━━<br/>Max consecutive retries"]
        CONTRACT["CONTRACT_RECOVERY gate<br/>━━━━━━━━━━<br/>adjudicated_failure +<br/>write evidence"]
        ZERO_WRITE["Zero-write gate<br/>━━━━━━━━━━<br/>Expected writes missing"]
    end

    subgraph Terminals ["TERMINAL STATES"]
        T_SUCCESS([SUCCEEDED])
        T_RETRY([RETRIABLE<br/>DRAIN_RACE / RESUME /<br/>CONTRACT_RECOVERY])
        T_FAIL([FAILED])
        T_BUDGET([BUDGET_EXHAUSTED])
    end

    START --> CB_CHECK
    CB_CHECK -->|"Yes: CHANNEL_B +<br/>UNPARSEABLE or EMPTY_OUTPUT"| CB_RECOVER
    CB_CHECK -->|"No: other channel<br/>or non-recoverable subtype"| CB_SKIP
    CB_RECOVER -->|"Recovery succeeds:<br/>marker standalone +<br/>substantive content"| CB_PROMOTE
    CB_RECOVER -->|"Recovery fails:<br/>no marker in messages"| CB_SKIP
    CB_PROMOTE --> GATE
    CB_SKIP --> GATE

    GATE -->|"True: session promoted<br/>or originally complete"| MARKER_RECOVER
    GATE -->|"False: TIMEOUT /<br/>unrecoverable subtype"| SKIP_RECOVERY
    MARKER_RECOVER --> PATTERN_RECOVER
    PATTERN_RECOVER --> SYNTH
    SYNTH --> CB_BYPASS
    SKIP_RECOVERY --> CB_BYPASS

    CB_BYPASS -->|"CHANNEL_B + session_complete<br/>+ patterns pass"| T_SUCCESS
    CB_BYPASS -->|"No bypass: falls to<br/>termination dispatch"| CONTENT_CHECK
    CONTENT_CHECK -->|"All 6 gates pass"| T_SUCCESS
    CONTENT_CHECK -->|"Any gate fails"| DEAD_END
    DEAD_END -->|"ABSENT + channel confirmed"| T_RETRY
    DEAD_END -->|"CONTRACT_VIOLATION /<br/>SESSION_ERROR"| T_FAIL

    T_RETRY --> BUDGET
    BUDGET -->|"Under limit"| CONTRACT
    BUDGET -->|"Exceeded"| T_BUDGET
    CONTRACT -->|"adjudicated_failure +<br/>writes ≥ 1"| T_RETRY
    CONTRACT -->|"No match"| ZERO_WRITE
    ZERO_WRITE -->|"Expected writes missing"| T_RETRY
    ZERO_WRITE -->|"No issue"| T_SUCCESS

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class CB_CHECK,GATE,CB_BYPASS,DEAD_END stateNode;
    class CB_RECOVER,CB_PROMOTE newComponent;
    class CB_SKIP,SKIP_RECOVERY gap;
    class MARKER_RECOVER,PATTERN_RECOVER,SYNTH handler;
    class CONTENT_CHECK phase;
    class BUDGET,CONTRACT,ZERO_WRITE detector;
    class T_SUCCESS,T_RETRY,T_FAIL,T_BUDGET terminal;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START(["● _build_skill_result<br/>━━━━━━━━━━<br/>SubprocessResult input"])

    subgraph EarlyExit ["Phase 1: Early Exit Interception"]
        direction TB
        TERM_CHECK{"termination<br/>reason?"}
        STALE_PATH["STALE handler<br/>━━━━━━━━━━<br/>Attempt stdout recovery<br/>then retry or fail"]
        TIMEOUT_PATH["TIMEOUT handler<br/>━━━━━━━━━━<br/>Override subtype=TIMEOUT<br/>is_error=True"]
    end

    PARSE["parse_session_result<br/>━━━━━━━━━━<br/>NDJSON → ClaudeSessionResult<br/>extracts assistant_messages"]

    subgraph DrainRace ["● Phase 2: Channel B Drain-Race Recovery"]
        direction TB
        CB_MATCH{"● match channel<br/>━━━━━━━━━━<br/>CHANNEL_B +<br/>UNPARSEABLE/EMPTY_OUTPUT<br/>+ completion_marker?"}
        CB_RECON["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Check marker standalone<br/>in assistant_messages"]
        CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
        CB_NONE["No drain-race<br/>━━━━━━━━━━<br/>Session unchanged"]
    end

    subgraph GatedRecovery ["Phase 3: Completion-Gated Recovery"]
        direction TB
        GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype ∉ FAILURE_SUBTYPES"}
        REC_MARKER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Join assistant_messages<br/>when marker is standalone"]
        REC_PATTERN["_recover_block_from_assistant<br/>━━━━━━━━━━<br/>Patterns in messages<br/>not in result"]
        REC_SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only:<br/>inject write paths"]
        GATE_SKIP["Skip recovery<br/>━━━━━━━━━━<br/>Incomplete session"]
    end

    subgraph ComputeOutcome ["● Phase 4: Outcome Adjudication"]
        direction TB
        COMPUTE["● _compute_outcome<br/>━━━━━━━━━━<br/>_compute_success +<br/>_compute_retry"]
        SUCCESS_CHECK{"● success?"}
        RETRY_CHECK{"needs_retry?"}
    end

    subgraph PostGates ["Phase 5: Post-Outcome Gates"]
        direction TB
        BUDGET_G["_apply_budget_guard<br/>━━━━━━━━━━<br/>consecutive_failures ><br/>max_retries?"]
        CONTRACT_G{"CONTRACT_RECOVERY?<br/>━━━━━━━━━━<br/>adjudicated_failure<br/>+ write_count ≥ 1"}
        ZERO_G{"zero_write_gate?<br/>━━━━━━━━━━<br/>success but no<br/>Write/Edit calls"}
    end

    T_SUCCESS([SUCCEEDED])
    T_RETRY([RETRIABLE])
    T_FAIL([FAILED])

    %% FLOW %%
    START --> TERM_CHECK
    TERM_CHECK -->|"STALE"| STALE_PATH
    TERM_CHECK -->|"TIMED_OUT"| TIMEOUT_PATH
    TERM_CHECK -->|"COMPLETED /<br/>NATURAL_EXIT"| PARSE
    STALE_PATH --> T_RETRY
    TIMEOUT_PATH --> PARSE

    PARSE --> CB_MATCH
    CB_MATCH -->|"Yes: all 3 guards pass"| CB_RECON
    CB_MATCH -->|"No: wrong channel /<br/>wrong subtype / no marker"| CB_NONE
    CB_RECON -->|"Marker found standalone<br/>+ substantive content"| CB_PROMOTE
    CB_RECON -->|"No marker or<br/>empty content"| CB_NONE
    CB_PROMOTE --> GATE
    CB_NONE --> GATE

    GATE -->|"True: complete session"| REC_MARKER
    GATE -->|"False: incomplete"| GATE_SKIP
    REC_MARKER --> REC_PATTERN
    REC_PATTERN --> REC_SYNTH
    REC_SYNTH --> COMPUTE
    GATE_SKIP --> COMPUTE

    COMPUTE --> SUCCESS_CHECK
    SUCCESS_CHECK -->|"True"| ZERO_G
    SUCCESS_CHECK -->|"False"| RETRY_CHECK
    RETRY_CHECK -->|"True"| BUDGET_G
    RETRY_CHECK -->|"False"| CONTRACT_G

    BUDGET_G -->|"Under limit"| T_RETRY
    BUDGET_G -->|"Exhausted"| T_FAIL
    CONTRACT_G -->|"Yes: promote to retry"| BUDGET_G
    CONTRACT_G -->|"No"| T_FAIL
    ZERO_G -->|"Writes expected<br/>but count = 0"| T_RETRY
    ZERO_G -->|"OK"| T_SUCCESS

    %% CLASS ASSIGNMENTS %%
    class START,T_SUCCESS,T_RETRY,T_FAIL terminal;
    class TERM_CHECK,CB_MATCH,GATE,SUCCESS_CHECK,RETRY_CHECK stateNode;
    class STALE_PATH,TIMEOUT_PATH,PARSE handler;
    class CB_RECON,CB_PROMOTE newComponent;
    class CB_NONE,GATE_SKIP gap;
    class REC_MARKER,REC_PATTERN,REC_SYNTH handler;
    class COMPUTE phase;
    class BUDGET_G,CONTRACT_G,ZERO_G detector;
```

Closes #619

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-619-20260405-085642-620214/.autoskillit/temp/make-plan/channel_b_drain_race_recovery_plan_2026-04-05_090230.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 42 | 18.8k | 1.6M | 80.7k | 1 | 9m 8s |
| verify | 17 | 17.4k | 687.5k | 79.7k | 1 | 6m 55s |
| implement | 77 | 28.2k | 4.4M | 89.7k | 1 | 15m 40s |
| audit_impl | 14 | 8.9k | 348.9k | 43.4k | 1 | 3m 4s |
| open_pr | 3.0k | 17.7k | 865.3k | 63.1k | 1 | 7m 30s |
| **Total** | 3.1k | 91.0k | 8.0M | 356.6k | | 42m 19s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ulator FakeClaudeCLI (#624)

## Summary

Add 10 end-to-end tests in a new file
`tests/execution/test_session_classification_e2e.py` that exercise the
full session failure classification pipeline — from raw NDJSON
subprocess output produced by api-simulator's `fake_claude` fixture
through `parse_session_result()` and `_build_skill_result()` to final
`SkillResult` classification. Today all headless tests use
`MockSubprocessRunner` with pre-constructed `SubprocessResult` objects;
the NDJSON parsing and classification logic is never exercised against
realistic subprocess output. These tests close that gap in four groups:
NDJSON stream robustness (4 tests), context exhaustion edge cases (2
tests), kill boundary scenarios (2 tests), and process behavior
simulation (2 tests).

No production code changes are required. The `api-simulator` dev
dependency was added by #607.

## Requirements

### BRIDGE — Integration Bridge

- **REQ-BRIDGE-001:** Tests must use `fake_claude.run()` to produce real
subprocess output, not hand-constructed strings.
- **REQ-BRIDGE-002:** Tests must feed `proc.stdout` through
`parse_session_result()` from `autoskillit.execution.session`.
- **REQ-BRIDGE-003:** Tests must wrap the parsed result in a
`SubprocessResult` and pass it to `_build_skill_result()` for full
classification.

### PARSE — NDJSON Parse Robustness

- **REQ-PARSE-001:** The parser must correctly skip `type=system` /
`api_retry` records and still extract the final `type=result` record.
- **REQ-PARSE-002:** The parser must handle non-JSON lines (stream
corruption) gracefully without losing valid records.
- **REQ-PARSE-003:** When multiple `type=result` records appear, the
last one must determine classification.

### CTX — Context Exhaustion

- **REQ-CTX-001:** A flat assistant record containing the context
exhaustion marker with no `type=result` record must classify as
`context_exhaustion` with `needs_retry=True`.
- **REQ-CTX-002:** A `type=result` record with `is_error=True` and
`errors` containing the marker must classify as retriable with
`retry_reason=RESUME`.
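The two CTX paths can be sketched as a small classifier. The marker string, field names, and return shape are assumptions for illustration only:

```python
# Hedged sketch of the two context-exhaustion paths in CTX; the marker
# string and record fields are assumptions, not the real CLI schema.
CTX_MARKER = "context window exhausted"  # assumed marker text

def classify_context_exhaustion(records: list[dict]) -> dict:
    result = next((r for r in records if r.get("type") == "result"), None)
    if result is None:
        # REQ-CTX-001: a flat assistant record carries the marker and
        # no type=result record was ever emitted.
        flat = any(
            r.get("type") == "assistant" and CTX_MARKER in r.get("text", "")
            for r in records
        )
        if flat:
            return {"subtype": "context_exhaustion", "needs_retry": True}
        return {"subtype": "unparseable", "needs_retry": True}
    # REQ-CTX-002: an explicit error result carries the marker.
    if result.get("is_error") and any(CTX_MARKER in e for e in result.get("errors", [])):
        return {"subtype": "context_exhaustion", "needs_retry": True,
                "retry_reason": "RESUME"}
    return {"subtype": result.get("subtype", "success"), "needs_retry": False}
```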

### KILL — Kill Boundary

- **REQ-KILL-001:** A truncated stream (via `truncate_after`) must
produce `subtype=unparseable` or partial classification with nonzero
exit code.
- **REQ-KILL-002:** An `interrupted` subtype with nonzero exit code must
result in `needs_retry=False` (gated by returncode).

### PROC — Process Behavior

- **REQ-PROC-001:** The hang-after-result scenario must verify that the
result record was emitted to stdout before the process hung.
- **REQ-PROC-002:** Mid-stream exit via `inject_exit` must produce the
correct exit code and truncated stdout.

### COMPAT — Compatibility

- **REQ-COMPAT-001:** Existing `test_headless.py` and `test_session.py`
tests must remain unchanged and passing.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([FakeClaudeCLI<br/>━━━━━━━━━━<br/>api-simulator fixture])

    subgraph Bridge ["★ E2E Test Bridge (new)"]
        direction TB
        RUN["★ fake_claude.run()<br/>━━━━━━━━━━<br/>CompletedProcess<br/>with real NDJSON stdout"]
        WRAP["★ _classify() / inline<br/>━━━━━━━━━━<br/>Wrap in SubprocessResult<br/>pid=0, caller termination"]
    end

    subgraph Parse ["parse_session_result()"]
        direction TB
        SCAN{"stdout empty?"}
        LOOP["Scan NDJSON lines<br/>━━━━━━━━━━<br/>JSON decode; skip errors<br/>last type=result wins"]
        CTX_FLAG{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?"}
        RESULT_FOUND{"result record<br/>found?"}
    end

    subgraph Classify ["_compute_outcome()"]
        direction TB
        SUCCESS_GATE{"_compute_success<br/>━━━━━━━━━━<br/>returncode=0?<br/>is_error? result?"}
        RETRY_GATE{"_compute_retry<br/>━━━━━━━━━━<br/>session.needs_retry?<br/>kill anomaly?"}
        CONTRA{"contradiction<br/>success+retry?"}
        DEADEND{"dead-end<br/>failed+confirmed<br/>+ABSENT?"}
    end

    subgraph Normalize ["_normalize_subtype()"]
        NORM["Map raw CLI subtype<br/>━━━━━━━━━━<br/>to final string label"]
    end

    subgraph Gates ["Post-Classification Gates"]
        BUDGET{"budget<br/>exhausted?"}
        ZERO{"zero writes<br/>when expected?"}
    end

    subgraph Outcomes ["SkillResult"]
        direction LR
        OK([success])
        CTX([context_exhaustion<br/>needs_retry=True])
        EMPTY([empty_output /<br/>unparseable])
        INTR([interrupted<br/>needs_retry=False])
        FAIL([failure<br/>terminal])
    end

    START --> RUN
    RUN --> WRAP
    WRAP --> SCAN
    SCAN -->|"empty"| EMPTY
    SCAN -->|"non-empty"| LOOP
    LOOP --> CTX_FLAG
    CTX_FLAG -->|"yes → jsonl_context_exhausted=True"| RESULT_FOUND
    CTX_FLAG -->|"no"| RESULT_FOUND
    RESULT_FOUND -->|"yes"| SUCCESS_GATE
    RESULT_FOUND -->|"no → UNPARSEABLE / CTX_EXHAUSTION"| RETRY_GATE
    SUCCESS_GATE --> RETRY_GATE
    RETRY_GATE --> CONTRA
    CONTRA -->|"demote success"| DEADEND
    CONTRA -->|"consistent"| DEADEND
    DEADEND -->|"DRAIN_RACE"| NORM
    DEADEND -->|"terminal"| NORM
    NORM --> BUDGET
    BUDGET -->|"BUDGET_EXHAUSTED"| FAIL
    BUDGET -->|"ok"| ZERO
    ZERO -->|"zero_writes"| CTX
    ZERO -->|"ok"| OK
    SUCCESS_GATE -->|"returncode!=0"| INTR

    class START terminal;
    class RUN,WRAP newComponent;
    class LOOP handler;
    class SCAN,CTX_FLAG,RESULT_FOUND stateNode;
    class SUCCESS_GATE,RETRY_GATE,CONTRA,DEADEND phase;
    class NORM handler;
    class BUDGET,ZERO detector;
    class OK,CTX,EMPTY,INTR,FAIL terminal;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start (FakeClaudeCLI), final SkillResult outcomes |
| Green | New Component | ★ `_classify()` bridge helper and `fake_claude.run()` — new test code |
| Orange | Handler | NDJSON scan/accumulation and subtype normalization |
| Teal | State | Decision points: empty check, context flag, result found |
| Purple | Phase | Outcome computation gates (success, retry, contradiction, dead-end) |
| Red | Detector | Post-classification guards (budget, zero-write) |

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([★ E2E Test Suite<br/>━━━━━━━━━━<br/>10 failure scenarios<br/>via FakeClaudeCLI])

    subgraph ParseGates ["NDJSON Parse Resilience Gates"]
        direction TB
        EMPTY_CHECK{"stdout<br/>empty?"}
        JSON_ERR["Corrupt / non-JSON lines<br/>━━━━━━━━━━<br/>silently skipped<br/>(test 2: corrupt_stream)"]
        API_RETRY["api_retry records<br/>━━━━━━━━━━<br/>skipped — not type=result<br/>(test 1: inject_api_retry)"]
        LAST_WINS["Multiple result records<br/>━━━━━━━━━━<br/>last record wins<br/>(test 3: two results)"]
        EXHAUST["Exhausted retries<br/>━━━━━━━━━━<br/>no result record emitted<br/>(test 4: exhaust=True)"]
    end

    subgraph CtxDetect ["Context Exhaustion Detection"]
        direction TB
        FLAT_DETECT{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?<br/>(test 5)"}
        ERR_DETECT{"is_error=True AND<br/>marker in errors[]?<br/>(test 6)"}
        CTX_FLAG["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>race-resilient flag"]
    end

    subgraph KillGates ["Kill Boundary Gates"]
        direction TB
        RC_CHECK{"returncode != 0?"}
        KILL_ANOM{"_is_kill_anomaly?<br/>━━━━━━━━━━<br/>UNPARSEABLE /\nEMPTY_OUTPUT /\nINTERRUPTED"}
        INTR_GATE{"subtype=interrupted<br/>+ rc != 0?<br/>(test 8)"}
    end

    subgraph PostGates ["Post-Classification Guards"]
        BUDGET{"consecutive failures<br/>> budget max?"}
        ZERO_WRITE{"success AND<br/>write_count=0<br/>AND write expected?"}
    end

    T_SUCCESS([success<br/>━━━━━━━━━━<br/>needs_retry=False])
    T_CTX([context_exhaustion<br/>━━━━━━━━━━<br/>needs_retry=True, RESUME])
    T_EMPTY([empty_output / unparseable<br/>━━━━━━━━━━<br/>needs_retry=True via RESUME])
    T_INTR([interrupted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
    T_BUDGET([budget_exhausted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
    T_ZERO([zero_writes<br/>━━━━━━━━━━<br/>needs_retry=True])

    START --> EMPTY_CHECK
    EMPTY_CHECK -->|"empty stdout"| T_EMPTY
    EMPTY_CHECK -->|"has content"| JSON_ERR
    JSON_ERR -->|"skip bad lines, continue"| API_RETRY
    API_RETRY -->|"skip, continue to result"| LAST_WINS
    LAST_WINS -->|"no result"| EXHAUST
    EXHAUST -->|"empty_output / unparseable"| T_EMPTY
    LAST_WINS -->|"result found"| FLAT_DETECT
    FLAT_DETECT -->|"yes"| CTX_FLAG
    FLAT_DETECT -->|"no"| ERR_DETECT
    ERR_DETECT -->|"yes"| CTX_FLAG
    CTX_FLAG -->|"needs_retry=True"| T_CTX
    ERR_DETECT -->|"no"| RC_CHECK
    RC_CHECK -->|"nonzero (test 7,8,10)"| INTR_GATE
    INTR_GATE -->|"yes → no retry"| T_INTR
    INTR_GATE -->|"no"| T_EMPTY
    RC_CHECK -->|"zero"| KILL_ANOM
    KILL_ANOM -->|"anomaly → RESUME retry"| T_EMPTY
    KILL_ANOM -->|"no anomaly"| BUDGET
    BUDGET -->|"exceeded"| T_BUDGET
    BUDGET -->|"ok"| ZERO_WRITE
    ZERO_WRITE -->|"violation"| T_ZERO
    ZERO_WRITE -->|"ok"| T_SUCCESS

    class START newComponent;
    class EMPTY_CHECK,FLAT_DETECT,ERR_DETECT,RC_CHECK,KILL_ANOM,INTR_GATE stateNode;
    class JSON_ERR,API_RETRY,LAST_WINS,EXHAUST,CTX_FLAG handler;
    class BUDGET,ZERO_WRITE detector;
    class T_SUCCESS,T_CTX,T_EMPTY,T_INTR,T_BUDGET,T_ZERO terminal;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite (new) — exercises all failure paths |
| Teal | Decision Gates | Key detection and routing decisions |
| Orange | Handler | Parse resilience processing and flag setting |
| Red | Guard | Post-classification safety guards (budget, zero-write) |
| Dark Blue | Terminal | Final SkillResult outcome states |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    TEST["★ test_session_classification_e2e.py<br/>━━━━━━━━━━<br/>10 scenarios assert field contracts<br/>across all classification paths"]

    subgraph ParseState ["INIT_ONLY — Set by Parser, Never Overwritten"]
        direction LR
        CTX_EX["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>flat assistant → True<br/>read by _is_context_exhausted()"]
        RC["returncode / termination<br/>━━━━━━━━━━<br/>from SubprocessResult<br/>used in all compute_* gates"]
        SID["session_id<br/>━━━━━━━━━━<br/>from result record<br/>passed through unchanged"]
    end

    subgraph DerivedState ["DERIVED — Computed, Not Stored During Parse"]
        direction TB
        SUCCESS_D["success<br/>━━━━━━━━━━<br/>returncode=0 AND content gates<br/>must be False if needs_retry=True"]
        RETRY_D["needs_retry + retry_reason<br/>━━━━━━━━━━<br/>RESUME / ZERO_WRITES / etc.<br/>only valid pair if needs_retry=True"]
        SUBTYPE_D["subtype (normalized)<br/>━━━━━━━━━━<br/>'success' / 'context_exhaustion'<br/>/ 'interrupted' / etc."]
    end

    subgraph Contracts ["CONTRACT ENFORCEMENT GATES"]
        direction TB
        CONTRA_GATE{"Contradiction Guard<br/>━━━━━━━━━━<br/>success=True AND<br/>needs_retry=True?"}
        INTR_GATE{"Interrupted Gate<br/>━━━━━━━━━━<br/>subtype=interrupted AND<br/>rc != 0?"}
        CTX_GATE{"Context Exhaustion<br/>━━━━━━━━━━<br/>jsonl_context_exhausted OR<br/>marker in errors[]?"}
        BUDGET_GATE{"Budget Guard<br/>━━━━━━━━━━<br/>consecutive failures<br/>> budget max?"}
    end

    subgraph ResumeStates ["RESUME SAFETY — needs_retry contract"]
        direction LR
        RESUME_OK(["needs_retry=True<br/>retry_reason=RESUME<br/>━━━━━━━━━━<br/>context_exhaustion path"])
        NO_RETRY(["needs_retry=False<br/>retry_reason=NONE<br/>━━━━━━━━━━<br/>interrupted + rc!=0 path"])
        BUDGET_STOP(["needs_retry=False<br/>retry_reason=BUDGET_EXHAUSTED<br/>━━━━━━━━━━<br/>terminal, no more retries"])
    end

    TEST -->|"asserts all contracts"| CTX_EX
    TEST --> RC
    TEST --> SID

    CTX_EX -->|"read by"| CTX_GATE
    RC -->|"read by"| INTR_GATE
    RC -->|"read by"| CONTRA_GATE

    CTX_GATE -->|"exhausted → needs_retry=True"| RETRY_D
    CTX_GATE -->|"not exhausted"| INTR_GATE
    INTR_GATE -->|"interrupted+rc!=0 → terminal"| NO_RETRY
    INTR_GATE -->|"other"| CONTRA_GATE
    CONTRA_GATE -->|"contradiction → demote success"| SUCCESS_D
    CONTRA_GATE -->|"consistent"| SUCCESS_D
    RETRY_D --> BUDGET_GATE
    SUCCESS_D --> BUDGET_GATE
    SUBTYPE_D --> BUDGET_GATE
    BUDGET_GATE -->|"exceeded → clamp"| BUDGET_STOP
    BUDGET_GATE -->|"within budget"| RESUME_OK

    class TEST newComponent;
    class CTX_EX,RC,SID detector;
    class SUCCESS_D,RETRY_D,SUBTYPE_D phase;
    class CTX_GATE,INTR_GATE,CONTRA_GATE,BUDGET_GATE stateNode;
    class RESUME_OK,NO_RETRY,BUDGET_STOP cli;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite — asserts all field contracts |
| Red | INIT_ONLY | Fields set by parser, never overwritten |
| Purple | Derived | Fields computed from classification, not stored during parse |
| Teal | Gates | Contract enforcement decision points |
| Dark Blue | Resume States | Terminal resume-safety outcomes |

Closes #608

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-608-20260405-085643-660865/.autoskillit/temp/make-plan/test_session_failure_classification_with_api_simulator_plan_2026-04-05_090300.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 31 | 22.4k | 812.6k | 59.1k | 1 | 12m 6s |
| verify | 21 | 17.2k | 863.3k | 66.7k | 1 | 9m 28s |
| implement | 2.5k | 9.4k | 1.1M | 48.2k | 1 | 5m 43s |
| fix | 21 | 7.3k | 703.0k | 42.4k | 1 | 7m 38s |
| audit_impl | 10 | 7.4k | 139.9k | 39.6k | 1 | 3m 29s |
| open_pr | 47 | 27.2k | 2.2M | 74.8k | 1 | 10m 44s |
| **Total** | 2.7k | 90.9k | 5.8M | 330.8k | | 49m 11s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rect Changes (#623)

## Summary

When `implement-worktree-no-merge` runs and the model ignores
instructions to create a worktree (via `git worktree add`), it edits
files directly in the clone directory. This leaves dirty uncommitted
changes (or direct commits) on the clone's branch. On retry, the next
session inherits a contaminated working tree.

This plan adds a **clone contamination guard** to the headless execution
pipeline. The guard:
1. Snapshots the clone's HEAD SHA before each worktree-based skill
session
2. After a failed session where no worktree was created, detects
contamination (uncommitted changes or direct commits)
3. Reverts the clone to its pre-session state
4. Logs the cleanup for pipeline observability

Key architectural insight: `EnterWorktree` does not exist in this
codebase. Worktree creation uses standard `git worktree add` via Bash,
and success is signaled by emitting `worktree_path = <path>` tokens in
assistant messages. Detection of "no worktree created" is therefore: no
`worktree_path` token in `session.assistant_messages`.

## Requirements

### Snapshot (SNAP)

- **REQ-SNAP-001:** The system must capture the clone HEAD SHA before
each `run_skill` invocation for worktree-based skills
(implement-worktree-no-merge, retry-worktree).
- **REQ-SNAP-002:** The system must capture the clone working tree
cleanliness state (clean/dirty) before each `run_skill` invocation for
worktree-based skills.

### Detection (DET)

- **REQ-DET-001:** The system must detect uncommitted changes in the
clone CWD after a worktree-based skill session that was adjudicated as
failure.
- **REQ-DET-002:** The system must detect direct commits in the clone
(HEAD differs from pre-session SHA) after a worktree-based skill session
that was adjudicated as failure.
- **REQ-DET-003:** The system must verify whether a worktree was created
during the session by checking for a `worktree_path` token in
`session.assistant_messages`.

### Revert (REV)

- **REQ-REV-001:** The system must revert uncommitted changes in the
clone when contamination is detected (git checkout + git clean).
- **REQ-REV-002:** The system must revert direct commits in the clone
when contamination is detected (git reset to pre-session SHA).
- **REQ-REV-003:** The revert must only execute when all three
conditions are met: worktree-based skill, adjudicated failure, and no
`worktree_path` token in the session's assistant messages.

### Observability (OBS)

- **REQ-OBS-001:** The system must log all contamination detection and
revert actions in the audit log with sufficient detail for pipeline
visibility.
- **REQ-OBS-002:** The audit log entry must include the pre-session SHA,
post-session SHA, list of contaminated files, and revert action taken.
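The guard logic behind SNAP/DET/REV can be sketched as pure helpers. The git plumbing (`git rev-parse HEAD`, `git status --porcelain`) is assumed to be run by the caller, so these stand-in helpers stay testable without a repository; `CloneSnapshot` and both function names are hypothetical:

```python
# Pure-logic sketch of the clone contamination guard; CloneSnapshot and
# both helpers are hypothetical. The caller is assumed to supply the
# output of `git rev-parse HEAD` and `git status --porcelain`.
from dataclasses import dataclass

@dataclass(frozen=True)
class CloneSnapshot:
    head_sha: str  # REQ-SNAP-001: captured before run_skill

def detect_contamination(snapshot: CloneSnapshot, post_sha: str,
                         porcelain: str) -> list[str]:
    """Return contaminated paths: dirty files from porcelain output
    (REQ-DET-001), plus a marker entry for direct commits (REQ-DET-002)."""
    # Porcelain lines are "XY <path>"; the path starts at column 3.
    files = [line[3:] for line in porcelain.splitlines() if line.strip()]
    if post_sha != snapshot.head_sha:
        files.append(f"<HEAD moved {snapshot.head_sha[:8]} -> {post_sha[:8]}>")
    return files

def revert_commands(snapshot: CloneSnapshot) -> list[list[str]]:
    """git invocations restoring the pre-session clone state
    (REQ-REV-001 / REQ-REV-002)."""
    return [
        ["git", "reset", "--hard", snapshot.head_sha],  # drop direct commits
        ["git", "clean", "-fd"],                        # drop untracked files
    ]
```

The return values of both helpers carry exactly the fields REQ-OBS-002 requires in the audit entry: pre/post SHA, contaminated files, and the revert action taken.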

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START(["● run_headless_core()"])

    subgraph PreSession ["★ Pre-Session Snapshot"]
        direction TB
        IS_WT{"★ is_worktree_skill?<br/>━━━━━━━━━━<br/>implement-worktree-no-merge<br/>or retry-worktree in cmd"}
        IS_CLONE{"★ not is_git_worktree?<br/>━━━━━━━━━━<br/>cwd is clone root,<br/>not a worktree"}
        SNAP["★ snapshot_clone_state()<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>→ CloneSnapshot(head_sha)"]
    end

    subgraph Session ["Existing Session Lifecycle"]
        direction TB
        RUN["● runner() subprocess<br/>━━━━━━━━━━<br/>Headless Claude CLI"]
        BUILD["● _build_skill_result()<br/>━━━━━━━━━━<br/>Adjudication + gates<br/>worktree_path always extracted"]
    end

    subgraph PostGuard ["★ Post-Session Clone Guard"]
        direction TB
        CHK_SNAP{"★ snapshot captured?<br/>━━━━━━━━━━<br/>_clone_snapshot is not None"}
        CHK_SUCC{"★ skill_result.success?"}
        CHK_WT{"★ worktree_path set?<br/>━━━━━━━━━━<br/>skill_result.worktree_path<br/>is not None"}
        DETECT["★ detect_contamination()<br/>━━━━━━━━━━<br/>git rev-parse HEAD → post_sha<br/>git status --porcelain → files"]
        CHK_DIRTY{"★ contamination found?<br/>━━━━━━━━━━<br/>post_sha ≠ pre_sha<br/>OR dirty files"}
        REVERT["★ revert_contamination()<br/>━━━━━━━━━━<br/>git reset --hard pre_sha<br/>git clean -fd"]
        AUDIT["★ audit.record_failure()<br/>━━━━━━━━━━<br/>subtype=clone_contamination<br/>RetryReason.CLONE_CONTAMINATION"]
    end

    FLUSH["● flush_session_log()<br/>━━━━━━━━━━<br/>★ clone_contamination_reverted<br/>→ summary.json"]
    RETURN(["● return skill_result"])
    SKIP_SNAP(["skip → _clone_snapshot=None"])

    START --> IS_WT
    IS_WT -->|"no: not a worktree skill"| SKIP_SNAP
    IS_WT -->|"yes"| IS_CLONE
    IS_CLONE -->|"already a worktree CWD"| SKIP_SNAP
    IS_CLONE -->|"clone root CWD"| SNAP
    SNAP --> RUN
    SKIP_SNAP --> RUN
    RUN --> BUILD
    BUILD --> CHK_SNAP
    CHK_SNAP -->|"no snapshot"| FLUSH
    CHK_SNAP -->|"snapshot exists"| CHK_SUCC
    CHK_SUCC -->|"success=True"| FLUSH
    CHK_SUCC -->|"success=False"| CHK_WT
    CHK_WT -->|"worktree created"| FLUSH
    CHK_WT -->|"no worktree"| DETECT
    DETECT --> CHK_DIRTY
    CHK_DIRTY -->|"clean"| FLUSH
    CHK_DIRTY -->|"contaminated"| REVERT
    REVERT --> AUDIT
    AUDIT --> FLUSH
    FLUSH --> RETURN

    class START,RETURN,SKIP_SNAP terminal;
    class IS_WT,IS_CLONE,CHK_SNAP,CHK_SUCC,CHK_WT,CHK_DIRTY stateNode;
    class RUN,BUILD,FLUSH handler;
    class SNAP,DETECT,REVERT,AUDIT newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry/exit points of `run_headless_core` |
| Teal | State/Decision | Routing decisions that control guard activation |
| Orange | Handler | Existing subprocess, adjudication, and telemetry nodes |
| Green | New Component | New clone contamination guard components (★) |

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — SERVER (existing, unchanged)"]
        direction LR
        SERVER["server/tools_execution.py<br/>━━━━━━━━━━<br/>run_skill, run_cmd handlers"]
    end

    subgraph L1 ["L1 — EXECUTION"]
        direction TB
        HEADLESS["● execution/headless.py<br/>━━━━━━━━━━<br/>run_headless_core()<br/>_build_skill_result()"]
        CLONE_GUARD["★ execution/clone_guard.py<br/>━━━━━━━━━━<br/>is_worktree_skill()<br/>snapshot_clone_state()<br/>check_and_revert_clone_contamination()"]
        SESSION_LOG["● execution/session_log.py<br/>━━━━━━━━━━<br/>flush_session_log()<br/>★ clone_contamination_reverted"]
        COMMANDS["execution/commands.py<br/>━━━━━━━━━━<br/>build_full_headless_cmd()"]
        SESSION["execution/session.py<br/>━━━━━━━━━━<br/>ClaudeSessionResult"]
    end

    subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
        direction TB
        ENUMS["● core/_type_enums.py<br/>━━━━━━━━━━<br/>RetryReason enum<br/>★ CLONE_CONTAMINATION added"]
        TYPES["core/types.py<br/>━━━━━━━━━━<br/>SkillResult, FailureRecord<br/>AuditStore, SubprocessRunner"]
        PATHS["core/paths.py<br/>━━━━━━━━━━<br/>is_git_worktree()"]
        LOGGING["core/logging.py<br/>━━━━━━━━━━<br/>get_logger()"]
        CORE_INIT["core/__init__.py<br/>━━━━━━━━━━<br/>Re-exports all L0 surface"]
    end

    subgraph Ext ["EXTERNAL (stdlib)"]
        STDLIB["dataclasses, pathlib<br/>datetime, typing"]
    end

    SERVER -->|"imports run_headless"| HEADLESS
    HEADLESS -->|"★ imports 3 functions"| CLONE_GUARD
    HEADLESS -->|"imports"| COMMANDS
    HEADLESS -->|"imports"| SESSION
    HEADLESS -->|"imports"| SESSION_LOG
    HEADLESS -->|"imports core surface"| CORE_INIT
    CLONE_GUARD -->|"★ imports FailureRecord<br/>RetryReason, SkillResult<br/>get_logger, is_git_worktree"| CORE_INIT
    SESSION_LOG -->|"imports"| LOGGING
    CORE_INIT -->|"re-exports"| ENUMS
    CORE_INIT -->|"re-exports"| TYPES
    CORE_INIT -->|"re-exports"| PATHS
    CORE_INIT -->|"re-exports"| LOGGING
    TYPES -->|"imports RetryReason"| ENUMS
    CLONE_GUARD -->|"stdlib only"| STDLIB
    ENUMS -->|"stdlib only"| STDLIB

    class SERVER cli;
    class HEADLESS,SESSION_LOG,COMMANDS,SESSION handler;
    class CLONE_GUARD newComponent;
    class ENUMS,TYPES,PATHS,LOGGING,CORE_INIT stateNode;
    class STDLIB integration;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Server (L3) | MCP tool handlers — top application layer |
| Orange | Execution (L1) | Service/orchestration layer modules |
| Green | New Module | `clone_guard.py` — new L1 execution module (★) |
| Teal | Core (L0) | Stable vocabulary/type layer — high fan-in |
| Red | External | Standard library dependencies |

Closes #617

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-617-20260405-085643-202786/.autoskillit/temp/make-plan/clone_contamination_guard_plan_2026-04-05_090600.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 6.9k | 23.4k | 1.7M | 82.7k | 1 | 10m 39s |
| verify | 33 | 20.7k | 1.4M | 55.6k | 1 | 8m 39s |
| implement | 81 | 24.3k | 4.4M | 89.7k | 1 | 10m 6s |
| fix | 40 | 14.4k | 1.7M | 62.9k | 1 | 9m 17s |
| audit_impl | 13 | 11.0k | 288.2k | 45.3k | 1 | 4m 14s |
| open_pr | 28 | 20.1k | 1.0M | 55.4k | 1 | 7m 18s |
| **Total** | 7.1k | 113.9k | 10.5M | 391.6k | | 50m 15s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rtifact Merge Phase (#625)

## Summary

Add a six-step archival phase to the end of the research recipe
(`research.yaml`) that separates research artifacts from experimental
code before completion. After all review cycles, re-runs, and CI checks
finish, the new phase: (1) captures the experiment branch name, (2)
creates a clean artifact-only branch containing only `research/` from a
temporary worktree, (3) opens an artifact PR targeting the base branch,
(4) tags the full experiment branch under `archive/research/` for
permanent reference, (5) closes the original experiment PR with
cross-reference links, then (6) proceeds to `research_complete`. Every
archival step degrades gracefully — `on_failure` routes to
`research_complete` so the pipeline never blocks on archival failures.

## Requirements

### SPLIT — Artifact Extraction

- **REQ-SPLIT-001:** The recipe must create a new branch from the base
branch (e.g., main) containing only the `research/` directory contents
from the experiment branch, with no production source file changes.
- **REQ-SPLIT-002:** The artifact extraction must use `git checkout
<experiment-branch> -- research/` (or equivalent) to copy only the
research directory's file state, not replay commit history.
- **REQ-SPLIT-003:** The artifact-only branch must produce a single
clean commit with a descriptive message referencing the experiment name.
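
The SPLIT requirements can be sketched in Python as a small helper that drives git from a temporary worktree. This is an illustration only, not the recipe's actual implementation — the function name, signature, and commit message are hypothetical; the mechanism (branch off base, `git checkout <experiment-branch> -- research/`, single commit) follows REQ-SPLIT-001 through REQ-SPLIT-003:

```python
import os
import subprocess
import tempfile

def extract_research_artifacts(repo: str, base: str,
                               experiment_branch: str,
                               artifact_branch: str) -> None:
    """Create artifact_branch off base containing only the research/
    file state from the experiment branch (no replayed history)."""
    def git(*args: str, cwd: str = repo) -> None:
        subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

    with tempfile.TemporaryDirectory() as tmp:
        wt = os.path.join(tmp, "wt")
        # Temporary worktree on a fresh branch from base; the main
        # checkout is never disturbed (REQ-SPLIT-001).
        git("worktree", "add", "-b", artifact_branch, wt, base)
        try:
            # Copy only the research/ directory's file state, not its
            # commit history (REQ-SPLIT-002).
            git("checkout", experiment_branch, "--", "research/", cwd=wt)
            git("add", "research/", cwd=wt)
            # One clean commit referencing the experiment (REQ-SPLIT-003).
            git("commit", "-m",
                f"research: archive {experiment_branch} artifacts", cwd=wt)
        finally:
            git("worktree", "remove", "--force", wt)
```

The worktree keeps the extraction isolated from the clone's checked-out branch, which matters when the recipe is mid-pipeline.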

### PR — Artifact PR

- **REQ-PR-001:** The recipe must open a PR targeting the base branch
with the artifact-only branch, referencing the original experiment PR
number and summarizing key findings in the body.
- **REQ-PR-002:** The artifact PR must contain zero changes to
production source files — only files under `research/`.

### TAG — Branch Archival

- **REQ-TAG-001:** The recipe must create an annotated git tag with the
prefix `archive/research/` capturing the final state of the experiment
branch (after all reviews, re-runs, and CI pass).
- **REQ-TAG-002:** The annotated tag message must include the experiment
name and a note that the report was merged via the artifact PR.
- **REQ-TAG-003:** The tag must be pushed to the remote before the
experiment branch is cleaned up.

### CLOSE — Experiment PR Closure

- **REQ-CLOSE-001:** The recipe must close the original experiment PR
with a comment linking to the artifact PR, the archive tag, and any
follow-up implementation issues.
- **REQ-CLOSE-002:** The closure comment must explain why the PR was not
merged (experimental code in production source files) and where the
research record is preserved.

### ORDER — Execution Ordering

- **REQ-ORDER-001:** The archival phase must execute only after all
review cycles, review resolutions, experiment re-runs (per #618), and CI
checks have completed successfully.
- **REQ-ORDER-002:** The archival phase must be the final phase before
`research_complete`, not interleaved with review or re-validation steps.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph PostReview ["● Post-Review Phase (modified routing)"]
        direction TB
        GPR{"guard_pr_url<br/>━━━━━━━━━━<br/>pr_url set?"}
        RRP["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-pr<br/>skip_when_false: review_pr"]
        RRR["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-review<br/>retries: 2"]
        CE{"check_escalations<br/>━━━━━━━━━━<br/>needs_rerun?"}
        RERUN["re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust"]
        REWRITE["re_write_report<br/>━━━━━━━━━━<br/>write-report"]
        RETEST["re_test<br/>━━━━━━━━━━<br/>test_check"]
        REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>git push"]
    end

    subgraph Archival ["★ Archival Phase (new)"]
        direction TB
        BA{"★ begin_archival<br/>━━━━━━━━━━<br/>pr_url truthy?"}
        CEB["★ capture_experiment_branch<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>captures: experiment_branch"]
        CAB["★ create_artifact_branch<br/>━━━━━━━━━━<br/>worktree + checkout research/<br/>captures: artifact_branch"]
        OAP["★ open_artifact_pr<br/>━━━━━━━━━━<br/>gh pr create (research/ only)<br/>captures: artifact_pr_url"]
        TEB["★ tag_experiment_branch<br/>━━━━━━━━━━<br/>git tag -a archive/research/*<br/>captures: archive_tag"]
        CEP["★ close_experiment_pr<br/>━━━━━━━━━━<br/>gh pr close + comment"]
    end

    RC([research_complete<br/>━━━━━━━━━━<br/>action: stop])

    GPR -->|"pr_url empty"| RC
    GPR -->|"pr_url truthy"| RRP
    RRP -->|"changes_requested"| RRR
    RRP -->|"needs_human / default / fail"| BA
    RRR -->|"success"| CE
    RRR -->|"exhausted / fail"| BA
    CE -->|"needs_rerun=true"| RERUN
    CE -->|"default"| REPUSH
    RERUN --> REWRITE --> RETEST --> REPUSH
    REPUSH -->|"success / fail"| BA

    BA -->|"pr_url truthy"| CEB
    BA -->|"default"| RC
    CEB -->|"success"| CAB
    CEB -->|"fail"| RC
    CAB -->|"success"| OAP
    CAB -->|"fail"| RC
    OAP -->|"success"| TEB
    OAP -->|"fail"| RC
    TEB -->|"success"| CEP
    TEB -->|"fail"| RC
    CEP -->|"success / fail"| RC

    class GPR,CE,BA stateNode;
    class RRP,RRR,RERUN,REWRITE,RETEST,REPUSH handler;
    class CEB,CAB,OAP,TEB,CEP newComponent;
    class RC terminal;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | `research_complete` stop state |
| Teal | State/Route | Decision and routing steps (guard_pr_url, check_escalations, begin_archival) |
| Orange | Handler | Existing processing steps — `●` marks modified routing targets |
| Green | New Component | Six new archival steps (`★`) — linear chain with graceful degradation |

Closes #621

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-101015-593986/.autoskillit/temp/make-plan/research_recipe_post_completion_archival_plan_2026-04-05_101500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.2k | 36.6k | 1.4M | 90.3k | 1 | 16m 17s |
| verify | 32 | 25.8k | 1.2M | 55.5k | 1 | 14m 5s |
| implement | 48 | 14.0k | 1.9M | 50.5k | 1 | 5m 52s |
| audit_impl | 16 | 9.7k | 178.9k | 55.3k | 2 | 4m 31s |
| open_pr | 22 | 11.7k | 690.1k | 46.2k | 1 | 4m 26s |
| **Total** | 2.3k | 97.7k | 5.4M | 297.8k | | 45m 13s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

- Add `Configure git auth for private deps` step to
`patch-bump-integration.yml` and `version-bump.yml` before `uv lock`
runs
- Fixes authentication failure when resolving the private
`api-simulator` git dependency added in PR #613
- Mirrors the existing auth pattern already present in `tests.yml` (line
76)

## Root Cause

PR #613 added `api-simulator` as a private git dependency in
`pyproject.toml`. The `tests.yml` workflow was updated with git auth,
but both version-bump workflows were missed. Every PR merged to
`integration` since then fails at the `uv lock` step with:

```
fatal: could not read Username for 'https://github.com': terminal prompts disabled
```

## Test plan

- [ ] This PR's own CI passes (tests.yml)
- [ ] After merge, the patch-bump workflow should succeed — verify by
checking the `bump-patch` check on this PR's merge commit
- [ ] Re-run a recent failed bump-patch workflow to confirm the fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

Fixes a 3-iteration ejection loop in the merge queue pipeline by
introducing ejection-cause enrichment (`ejected_ci_failure` state and
`ejection_cause` field in `wait_for_merge_queue`), a CI gate after every
force-push (`ci_watch_post_queue_fix` step), and two post-rebase
manifest validation gates (language-aware validity check and duplicate
key scan) in `resolve-merge-conflicts`. Closes all six gaps identified
in #627: blind CI ejection routing, missing CI gate after re-push,
absent manifest/semantic validation, and missing `head_sha` in CI
results.

<details>
<summary>Individual Group Plans</summary>

### Group 1: Implementation Plan: Queue Ejection Loop Fix — PART A ONLY

This part addresses the Python code layer for the queue ejection loop
fix (Gaps 2 and 5 from issue #627).

**Gap 2** — `execution/merge_queue.py` currently returns
`pr_state="ejected"` for every ejection regardless of cause. When
GitHub's CI fails on a merge-group commit, the recipe cannot distinguish
a CI failure ejection from a conflict ejection, so it retries conflict
resolution indefinitely (no-op rebase loop). The fix: when the ejection
is confirmed and `checks_state == "FAILURE"`, return
`pr_state="ejected_ci_failure"` plus an `ejection_cause="ci_failure"`
field, allowing recipe `on_result` routing to send CI failures directly
to `diagnose_ci` instead of `queue_ejected_fix`.

**Gap 5** — `server/tools_ci.py` infers `head_sha` from `git rev-parse
HEAD` but never includes it in the JSON response. Recipe orchestrators
cannot verify that CI results correspond to the current HEAD after a
force-push. The fix: include `head_sha` in the `wait_for_ci` return dict
when it was resolved.

### Group 2: Implementation Plan: Queue Ejection Loop Fix — PART B ONLY

This part addresses the recipe and skill layer of the queue ejection
loop fix (Gaps 1, 3, 4, 6 from issue #627). Part A (code layer) must be
implemented first — this part routes on `pr_state="ejected_ci_failure"`
which Part A introduces.

**Gap 1** — `re_push_queue_fix` routes directly to `reenter_merge_queue`
after force-push, bypassing CI. Fix: insert a new
`ci_watch_post_queue_fix` step between `re_push_queue_fix` and
`reenter_merge_queue`, mirroring the existing `ci_watch` step.

**Gap 6** — `wait_for_queue` routes all `ejected` states to
`queue_ejected_fix` (conflict resolution), even when the ejection was
caused by a CI failure that conflict resolution cannot fix. Fix: add an
`ejected_ci_failure` route before `ejected` in
`wait_for_queue.on_result`, routing to `diagnose_ci` instead.

**Gap 3** — `resolve-merge-conflicts` SKILL.md runs only `pre-commit run
--all-files` post-rebase. Fix: add Step 5a — language-detected manifest
validation using fast non-compiling checks.

**Gap 4** — Even a clean rebase can produce duplicate keys when both
branches independently added the same dependency. Fix: add Step 5b —
targeted duplicate key scan in TOML/JSON manifest files.
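
For the JSON half of Step 5b, the scan hinges on `json.loads`'s `object_pairs_hook`, which sees every key before deduplication. A minimal sketch (function name is illustrative; the SKILL.md step also covers TOML dependency sections, omitted here):

```python
import json

def find_duplicate_json_keys(text: str) -> list[str]:
    """Report keys that appear more than once in any object of a JSON
    manifest. Plain json.loads would silently keep the last value, so a
    clean rebase can hide a duplicated dependency entry."""
    duplicates: list[str] = []

    def hook(pairs):
        # Called once per JSON object with the raw (key, value) pairs,
        # duplicates included.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates
```

Any non-empty result triggers the `git rebase --abort` escalation path rather than a blind re-push.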

Applied to: `recipes/implementation.yaml`, `recipes/remediation.yaml`,
`recipes/implementation-groups.yaml`,
`skills_extended/resolve-merge-conflicts/SKILL.md`.

</details>

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([wait_for_queue\nrecipe step])
    END_OK([release_issue_success])
    END_FAIL([release_issue_failure])
    END_TIMEOUT([release_issue_timeout])
    END_DIAG([diagnose_ci])

    subgraph MQPoll ["● Merge Queue Watcher (merge_queue.py)"]
        direction TB
        POLL["poll GitHub GraphQL\n━━━━━━━━━━\nPR state + queue state\n+ checks_state"]
        MERGED{"merged?"}
        CI_FAIL{"● checks_state\n== 'FAILURE'?"}
        CONFIRM["confirmation window\n━━━━━━━━━━\nnot_in_queue_cycles++"]
        CONFIRMED{"cycles ≥ threshold?"}
        STALL{"stall retries\nexhausted?"}
        TIMEOUT{"deadline\nexceeded?"}
    end

    subgraph EjectRoute ["● Recipe Ejection Routing (implementation.yaml)"]
        direction TB
        ROUTE{"● pr_state?"}
        REENROLL["reenroll_stalled_pr\n━━━━━━━━━━\ntoggle_auto_merge tool"]
    end

    subgraph ConflictFix ["● Conflict Fix Sub-Flow (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC{"escalation_required?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\npush_to_remote force=true"]
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci tool\ntimeout=300s"]
        CI_PASS{"CI pass?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ndiagnose-ci skill"]
        REENTER["reenter_merge_queue\n━━━━━━━━━━\ngh pr merge --squash --auto"]
    end

    subgraph WFCITool ["● wait_for_ci tool handler (tools_ci.py)"]
        direction LR
        INFER["infer head_sha\n━━━━━━━━━━\ngit rev-parse HEAD"]
        CIWAIT["ci_watcher.wait(scope)"]
        ENRICH["● result includes head_sha\n━━━━━━━━━━\nverifies SHA matches HEAD\nafter force-push"]
    end

    %% MAIN FLOW %%
    START --> POLL
    POLL --> MERGED
    MERGED -->|"yes"| END_OK
    MERGED -->|"no"| CONFIRM
    CONFIRM --> CONFIRMED
    CONFIRMED -->|"no"| STALL
    CONFIRMED -->|"yes (not in queue)"| CI_FAIL
    STALL -->|"yes"| END_TIMEOUT
    STALL -->|"no"| TIMEOUT
    TIMEOUT -->|"yes"| END_TIMEOUT
    TIMEOUT -->|"no"| POLL

    CI_FAIL -->|"yes"| ROUTE
    CI_FAIL -->|"no"| ROUTE

    ROUTE -->|"ejected_ci_failure\n(● new route)"| END_DIAG
    ROUTE -->|"ejected"| QFIX
    ROUTE -->|"stalled"| REENROLL
    ROUTE -->|"timeout"| END_TIMEOUT
    REENROLL -->|"success"| START
    REENROLL -->|"failure"| END_FAIL

    QFIX --> ESC
    ESC -->|"true"| END_FAIL
    ESC -->|"false"| REPUSH
    REPUSH -->|"failure"| END_FAIL
    REPUSH -->|"success"| CI_WATCH

    CI_WATCH --> INFER --> CIWAIT --> ENRICH
    ENRICH --> CI_PASS
    CI_PASS -->|"failure"| DETECT
    CI_PASS -->|"success"| REENTER
    DETECT --> END_FAIL
    REENTER -->|"success"| START
    REENTER -->|"failure"| END_FAIL

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class END_OK,END_FAIL,END_TIMEOUT,END_DIAG terminal;
    class POLL,CONFIRM handler;
    class MERGED,CONFIRMED,STALL,TIMEOUT stateNode;
    class CI_FAIL,ROUTE,ESC,CI_PASS detector;
    class QFIX,REPUSH,REENTER handler;
    class REENROLL,DETECT handler;
    class CI_WATCH,INFER,CIWAIT,ENRICH newComponent;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph MQResult ["wait_for_merge_queue Return Dict (merge_queue.py)"]
        direction TB
        PS["● pr_state : str\n━━━━━━━━━━\nmerged | ejected\nejected_ci_failure | stalled\ntimeout | error\n(bare literals, no StrEnum)"]
        SUC["success : bool\n━━━━━━━━━━\ntrue only for 'merged'"]
        REASON["reason : str\n━━━━━━━━━━\nhuman-readable\nalways present"]
        STALL["stall_retries_attempted : int\n━━━━━━━━━━\nalways present\nexcept 'error' path"]
        EC["● ejection_cause : str\n━━━━━━━━━━\n'ci_failure' only\nwhen pr_state==ejected_ci_failure\nCONDITIONAL FIELD"]
    end

    subgraph InternalPoll ["PRFetchState — Internal Polling State (not returned)"]
        direction LR
        CHECKS["checks_state : str|None\n━━━━━━━━━━\nGitHub StatusCheckRollup\nNone = no checks configured"]
        INQUEUE["in_queue : bool\n━━━━━━━━━━\nPR in mergeQueue.entries"]
        QSTATE["queue_state : str|None\n━━━━━━━━━━\nUNMERGEABLE | AWAITING_CHECKS\n| LOCKED | null"]
    end

    subgraph Gate1 ["● Ejection Decision Gate (merge_queue.py)"]
        direction TB
        CFAIL{"checks_state\n== 'FAILURE'?"}
        SET_ECI["● set pr_state='ejected_ci_failure'\n━━━━━━━━━━\nejection_cause='ci_failure'\nINJECTED into result"]
        SET_EJ["set pr_state='ejected'\n━━━━━━━━━━\nno ejection_cause field\n(absent, not null)"]
    end

    subgraph CIScope ["CIRunScope — Frozen Input Scope (core/types)"]
        direction LR
        WF["workflow : str|None\n━━━━━━━━━━\ne.g. 'tests.yml'"]
        HS["● head_sha : str|None\n━━━━━━━━━━\ngit rev-parse HEAD\nor caller-supplied"]
    end

    subgraph CIResult ["● wait_for_ci Return Dict (tools_ci.py)"]
        direction TB
        RUNID["run_id : int|None\n━━━━━━━━━━\nGitHub Actions run ID"]
        CONC["conclusion : str\n━━━━━━━━━━\nsuccess|failure|cancelled\naction_required|timed_out\nno_runs|error|unknown"]
        FJOBS["failed_jobs : list\n━━━━━━━━━━\nalways present\nempty on billing errors"]
        HSHA["● head_sha : str\n━━━━━━━━━━\nCONDITIONAL: present only\nwhen scope.head_sha truthy\ninjected by tool layer"]
    end

    subgraph ConsumerGate ["Recipe Routing Gate (on_result)"]
        direction TB
        ROUTE{"pr_state value?"}
        R1["ejected_ci_failure\n→ diagnose_ci"]
        R2["ejected\n→ queue_ejected_fix"]
        R3["merged|stalled|timeout\n→ other routes"]
    end

    %% FLOW %%
    CHECKS --> CFAIL
    INQUEUE --> CFAIL
    QSTATE --> CFAIL
    CFAIL -->|"FAILURE"| SET_ECI
    CFAIL -->|"other"| SET_EJ
    SET_ECI --> PS
    SET_ECI --> EC
    SET_EJ --> PS
    PS --> SUC
    PS --> REASON
    PS --> STALL

    HS --> CIResult
    WF --> CIResult
    RUNID --> CONC
    CONC --> FJOBS
    FJOBS --> HSHA

    PS --> ROUTE
    EC --> ROUTE
    ROUTE --> R1
    ROUTE --> R2
    ROUTE --> R3

    HSHA -.->|"verifies HEAD\nafter force-push"| R2

    %% CLASS ASSIGNMENTS %%
    class PS,EC,HSHA,SET_ECI,HS,CFAIL gap;
    class SUC,REASON,STALL,RUNID,CONC,FJOBS output;
    class CHECKS,INQUEUE,QSTATE,WF stateNode;
    class SET_EJ handler;
    class ROUTE,R1,R2,R3 detector;
    class InternalPoll phase;
```

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    END_OK([release_issue_success])
    END_FAIL([release_issue_failure\n━━━━━━━━━━\nhuman escalation\nclone preserved])
    END_DIAG([diagnose_ci])

    subgraph MQLoop ["● Merge Queue Poll Loop (merge_queue.py)"]
        direction TB
        POLL["GraphQL fetch\n━━━━━━━━━━\nPR + queue state"]
        POLL_ERR{"Exception\ncaught?"}
        TIMEOUT_CHK{"deadline\nexceeded?"}
        STALL_CHK{"stall retries\n≥ max (3)?"}
    end

    subgraph EjectGate ["● Ejection Classification Gate (merge_queue.py)"]
        direction TB
        EJECT_DECISION{"● checks_state\n== 'FAILURE'?"}
        CI_EJ["● ejected_ci_failure\n━━━━━━━━━━\nejection_cause=ci_failure\nskips conflict resolution"]
        CONF_EJ["ejected\n━━━━━━━━━━\nno cause field\nconflict resolution"]
    end

    subgraph StallBreaker ["Stall Circuit Breaker (merge_queue.py)"]
        direction LR
        TOGGLE["_toggle_auto_merge\n━━━━━━━━━━\ndisable → 2s → re-enable\nbackoff: 30/60/120s"]
        TOGGLE_ERR{"Exception\ncaught?"}
    end

    subgraph ConflictPath ["Conflict Resolution Path (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC_CHK{"escalation\nrequired?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\nforce-push"]
        REPUSH_FAIL{"push\nfailed?"}
    end

    subgraph CIGate ["● CI Gate After Re-Push (implementation.yaml + tools_ci.py)"]
        direction TB
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci, timeout=300s\nincludes head_sha"]
        CI_CONC{"conclusion\n== success?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ngit merge-base check\n(stale base?)"]
        DETECT_CHK{"stale\nbase?"}
        CI_CF["ci_conflict_fix\n━━━━━━━━━━\nresolve-merge-conflicts"]
    end

    subgraph ManifestGates ["● Post-Rebase Manifest Validation (SKILL.md)"]
        direction TB
        STEP5A["● Step 5a: manifest validity\n━━━━━━━━━━\ncargo metadata / node JSON.parse\nuv lock --check / tomllib"]
        STEP5A_CHK{"manifest\nvalid?"}
        STEP5B["● Step 5b: duplicate key scan\n━━━━━━━━━━\nTOML dep sections\nJSON object_pairs_hook"]
        STEP5B_CHK{"duplicates\nfound?"}
        REBASE_ABORT["git rebase --abort\n━━━━━━━━━━\nescalation_required=true"]
    end

    %% POLL LOOP FLOW %%
    POLL --> POLL_ERR
    POLL_ERR -->|"yes: log + retry"| POLL
    POLL_ERR -->|"no"| TIMEOUT_CHK
    TIMEOUT_CHK -->|"yes"| END_FAIL
    TIMEOUT_CHK -->|"no"| STALL_CHK
    STALL_CHK -->|"yes: stalled"| END_FAIL
    STALL_CHK -->|"no: stall attempt"| TOGGLE
    TOGGLE --> TOGGLE_ERR
    TOGGLE_ERR -->|"yes: log + increment"| STALL_CHK
    TOGGLE_ERR -->|"no: success"| POLL

    %% EJECTION GATE %%
    STALL_CHK -->|"ejection confirmed"| EJECT_DECISION
    EJECT_DECISION -->|"FAILURE"| CI_EJ
    EJECT_DECISION -->|"other"| CONF_EJ
    CI_EJ --> END_DIAG
    CONF_EJ --> QFIX

    %% CONFLICT PATH %%
    QFIX --> STEP5A
    STEP5A --> STEP5A_CHK
    STEP5A_CHK -->|"invalid"| REBASE_ABORT
    STEP5A_CHK -->|"valid"| STEP5B
    STEP5B --> STEP5B_CHK
    STEP5B_CHK -->|"duplicates"| REBASE_ABORT
    STEP5B_CHK -->|"clean"| ESC_CHK
    REBASE_ABORT --> ESC_CHK
    ESC_CHK -->|"true"| END_FAIL
    ESC_CHK -->|"false"| REPUSH
    REPUSH --> REPUSH_FAIL
    REPUSH_FAIL -->|"yes"| END_FAIL
    REPUSH_FAIL -->|"no"| CI_WATCH

    %% CI GATE %%
    CI_WATCH --> CI_CONC
    CI_CONC -->|"yes"| END_OK
    CI_CONC -->|"no"| DETECT
    DETECT --> DETECT_CHK
    DETECT_CHK -->|"yes: stale base"| CI_CF
    DETECT_CHK -->|"no: code failure"| END_DIAG
    CI_CF --> ESC_CHK

    %% CLASS ASSIGNMENTS %%
    class END_OK,END_FAIL,END_DIAG terminal;
    class POLL,TOGGLE handler;
    class POLL_ERR,TOGGLE_ERR,TIMEOUT_CHK,STALL_CHK gap;
    class EJECT_DECISION,CI_CONC,DETECT_CHK,STEP5A_CHK,STEP5B_CHK,ESC_CHK,REPUSH_FAIL detector;
    class CI_EJ,CONF_EJ,REBASE_ABORT output;
    class QFIX,REPUSH,CI_WATCH,DETECT,CI_CF handler;
    class STEP5A,STEP5B phase;
```

Closes #627

## Implementation Plan

Plan files:
-
`/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_a.md`
-
`/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_b.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 37 | 31.7k | 1.9M | 113.2k | 1 | 11m 19s |
| review | 3.4k | 5.6k | 147.3k | 41.5k | 1 | 5m 45s |
| verify | 44 | 35.4k | 1.9M | 144.8k | 2 | 11m 15s |
| implement | 100 | 33.5k | 4.6M | 123.5k | 2 | 12m 17s |
| audit_impl | 15 | 14.0k | 279.5k | 44.2k | 1 | 3m 46s |
| open_pr | 33 | 30.5k | 1.2M | 68.1k | 1 | 10m 58s |
| **Total** | 3.6k | 150.8k | 9.9M | 535.3k | | 55m 23s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…Artifact Preservation (#630)

## Summary

The review-design skill has four compounding defects that make GO
verdicts structurally unreachable. This PR fixes all four:

1. **Threshold unreachable** — Replace the static `>= 3` warning
threshold with a proportional formula based on active dimensions
(`active_dimensions * WARNING_BUDGET_PER_DIM` where budget = 5),
calibrated so that the spectral-init v6 baseline (32 warnings across ~7
dimensions, deemed "substantively sound") would receive a GO verdict.
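
The calibration can be checked arithmetically. A simplified sketch of the new threshold and verdict logic (stop-trigger handling and severity capping elided; function names are illustrative, `WARNING_BUDGET_PER_DIM` is from the plan):

```python
WARNING_BUDGET_PER_DIM = 5  # calibrated against the spectral-init v6 baseline

def warning_threshold(active_dimensions: int) -> int:
    """Proportional threshold replacing the static >= 3 rule."""
    return active_dimensions * WARNING_BUDGET_PER_DIM

def verdict(warnings: int, criticals: int, active_dimensions: int) -> str:
    """Simplified verdict: criticals or a blown warning budget force REVISE."""
    if criticals or warnings >= warning_threshold(active_dimensions):
        return "REVISE"
    return "GO"
```

With the baseline numbers, 7 active dimensions give a budget of 35, so 32 warnings now lands under threshold and yields GO instead of a guaranteed REVISE.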

2. **Prescriptive findings** — Add evaluative-only constraints to
Critical Constraints and a shared subagent evaluation scope block before
Step 2, requiring findings to describe WHAT is lacking, never HOW to fix
it.

3. **Scope drift** — Add a design scope boundary to the shared subagent
block, prohibiting evaluation of implementation code snippets and
constraining review to experimental design elements.

4. **Artifact preservation** — Enhance the `create_worktree` step in
research.yaml to copy all review-cycle artifacts (dashboards, revision
guidance, plan versions, resolve-design-review output) into
`research/.../artifacts/`, and add a `commit_research_artifacts` step
before `push_branch` to capture phase-groups and phase-plans from the
worktree.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([plan_experiment])
    COMPLETE([research_complete])
    STOP_OUT([design_rejected])

    subgraph DesignReview ["● review_design Step (research.yaml)"]
        direction TB
        RD["● review_design<br/>━━━━━━━━━━<br/>run_skill<br/>retries: 2"]
        REVISE_ROUTE["revise_design<br/>━━━━━━━━━━<br/>route → plan_experiment"]
        RESOLVE["resolve_design_review<br/>━━━━━━━━━━<br/>run_skill, retries: 1"]
    end

    subgraph VerdictSynthesis ["● Step 7: Verdict Synthesis (review-design SKILL.md)"]
        direction TB
        SCOPE["● Evaluative Scope Gate<br/>━━━━━━━━━━<br/>Findings: WHAT is lacking<br/>Design boundary only"]
        RTCAP["rt_cap = RT_MAX_SEVERITY<br/>━━━━━━━━━━<br/>Downgrade red_team<br/>severity by type"]
        CLASSIFY["Classify findings<br/>━━━━━━━━━━<br/>critical_findings<br/>warning_findings"]
        ACTIVE["● active_dimensions<br/>━━━━━━━━━━<br/>count spawned non-SILENT<br/>dims (L1+L2+L3+L4+RT)"]
        THRESH["★ warning_threshold<br/>━━━━━━━━━━<br/>active_dims × 5<br/>WARNING_BUDGET_PER_DIM=5"]
        VERDICT{"● Verdict Decision<br/>━━━━━━━━━━<br/>stop_triggers?<br/>critical? warnings≥threshold?"}
    end

    subgraph ArtifactPath ["★ Artifact Commit Path (research.yaml)"]
        direction TB
        TEST["● test<br/>━━━━━━━━━━<br/>test_check"]
        FIX["fix_tests<br/>━━━━━━━━━━<br/>run_skill"]
        RETEST["● retest<br/>━━━━━━━━━━<br/>test_check"]
        COMMIT["★ commit_research_artifacts<br/>━━━━━━━━━━<br/>run_cmd: copy phase-groups<br/>phase-plans → artifacts/<br/>on_failure: push_branch"]
    end

    PUSH["push_branch<br/>━━━━━━━━━━<br/>run_cmd"]

    START -->|"run review_design"| RD
    RD -->|"STOP verdict"| RESOLVE
    RD -->|"REVISE verdict"| REVISE_ROUTE
    RD -->|"GO verdict"| create_worktree
    REVISE_ROUTE -->|"loop back"| START
    RESOLVE -->|"revised"| REVISE_ROUTE
    RESOLVE -->|"failed"| STOP_OUT
    RD -->|"on_failure / on_exhausted"| create_worktree

    create_worktree["create_worktree<br/>━━━━━━━━━━<br/>★ copies review-cycles<br/>plan-versions artifacts"]

    create_worktree --> decompose["decompose_phases<br/>plan_phase<br/>implement_phase"]
    decompose --> experiment["run_experiment<br/>write_report"]
    experiment --> TEST

    TEST -->|"pass"| COMMIT
    TEST -->|"fail"| FIX
    FIX --> RETEST
    RETEST -->|"pass"| COMMIT
    RETEST -->|"fail"| PUSH

    COMMIT -->|"success or failure"| PUSH
    PUSH --> COMPLETE

    SCOPE -.->|"constraint applied to<br/>all dimension subagents"| CLASSIFY
    RTCAP --> CLASSIFY
    CLASSIFY --> ACTIVE
    ACTIVE --> THRESH
    THRESH --> VERDICT
    VERDICT -->|"stop_triggers"| STOP_OUT
    VERDICT -->|"critical_findings or<br/>warnings ≥ threshold"| REVISE_ROUTE
    VERDICT -->|"else"| create_worktree

    class START,COMPLETE,STOP_OUT terminal;
    class RD,RESOLVE,decompose,experiment,FIX handler;
    class REVISE_ROUTE,RTCAP,CLASSIFY phase;
    class VERDICT,ACTIVE stateNode;
    class SCOPE detector;
    class THRESH,COMMIT,create_worktree newComponent;
    class TEST,RETEST,PUSH output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start, complete, and terminal states |
| Orange | Handler | Processing steps (run_skill, run_cmd) |
| Purple | Phase | Control flow, routing, severity capping |
| Teal | State | Decision and counting nodes |
| Red | Detector | Constraint gates (evaluative scope) |
| Green | New | ★ new components, ● modified components |
| Dark Teal | Output | test_check steps and push_branch |
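The verdict math in the Step 7 subgraph can be sketched as follows. This is a minimal illustration of the diagram's `warning_threshold = active_dims × WARNING_BUDGET_PER_DIM` rule and the GO / REVISE / STOP decision; the function name and argument shapes are illustrative, not the skill's actual implementation.

```python
WARNING_BUDGET_PER_DIM = 5  # per-dimension warning budget from the diagram

def verdict(active_dimensions: int, critical_findings: list,
            warning_findings: list, stop_triggers: bool = False) -> str:
    """Map Step 7 inputs to a GO / REVISE / STOP verdict."""
    warning_threshold = active_dimensions * WARNING_BUDGET_PER_DIM
    if stop_triggers:
        return "STOP"
    if critical_findings or len(warning_findings) >= warning_threshold:
        return "REVISE"
    return "GO"
```

With five active dimensions the threshold is 25, so 24 warnings still route to GO while a single critical finding forces REVISE.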

Closes #629

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-160303-009353/.autoskillit/temp/make-plan/fix-review-design-threshold-unreachable-prescriptive-finding_plan_2026-04-05_161500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 22.6k | 1.2M | 85.0k | 1 | 10m 36s |
| verify | 30 | 14.6k | 1.5M | 74.8k | 1 | 8m 28s |
| implement | 62 | 19.9k | 4.1M | 92.5k | 1 | 7m 41s |
| audit_impl | 87 | 10.6k | 473.5k | 47.1k | 1 | 6m 41s |
| open_pr | 25 | 11.7k | 806.3k | 48.9k | 1 | 4m 22s |
| **Total** | 3.0k | 79.4k | 8.1M | 348.3k | | 37m 50s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ound Bash Tasks (#633)

## Summary

Headless sessions running long-lived background Bash tasks (e.g. `cargo
bench` launched via
`run_in_background: true`) are killed as stale because the staleness
signal is JSONL file growth,
not actual session liveness. When the LLM goes idle waiting for a
background child, the JSONL
stops growing and the 20-minute staleness threshold is breached — even
though child processes are
actively running.

Three changes eliminate this class of false kills:

1. **`_has_active_child_processes`** — a second suppression gate in
`_session_log_monitor` that
checks child process CPU activity before issuing a kill. Added alongside
the existing
   `_has_active_api_connection` port-443 gate.

2. **`RecipeStep.stale_threshold`** — an optional per-step threshold
field that recipe authors
can raise for steps known to run long-lived experiments, passed through
`run_skill` →
   `run_headless_core` → `_session_log_monitor`.

3. **Recipe YAML overrides** — `stale_threshold: 2400` (40 min) on
specific long-running steps
in `research.yaml`, `implementation.yaml`, `remediation.yaml`,
`implementation-groups.yaml`,
   and `merge-prs.yaml`.

## Requirements

### STALE — Staleness Suppression via Child Process Detection

- **REQ-STALE-001:** The system must detect active child processes in
the headless session's process tree when the stale threshold is
breached.
- **REQ-STALE-002:** The system must suppress the stale kill when any
child process in the tree reports CPU usage exceeding ~10% via
`cpu_percent(interval=0)`.
- **REQ-STALE-003:** The system must reset the staleness clock
(`last_change`) when child process activity suppresses the stale kill,
identical to the existing `_has_active_api_connection` suppression
behavior.
- **REQ-STALE-004:** The child process detection must follow the
established exception-handling pattern, silently skipping
`NoSuchProcess`, `ZombieProcess`, and `AccessDenied` errors per process.
- **REQ-STALE-005:** The child process detection must only execute when
the stale threshold has already been breached (zero performance impact
during normal operation).
- **REQ-STALE-006:** The child process detection must emit a structured
log warning when suppressing a stale kill, following the pattern
established by `_has_active_api_connection`.
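The REQ-STALE requirements above can be sketched as a single psutil walk. This is a hedged sketch, not the project's `_has_active_child_processes` itself: the function name mirrors the summary, but the constant, logging format, and exact exception set are assumptions drawn from REQ-STALE-002/-004/-006.

```python
import logging

import psutil

CPU_ACTIVE_THRESHOLD = 10.0  # percent, per REQ-STALE-002 (assumed constant name)

def has_active_child_processes(pid: int) -> bool:
    """Return True if any child in `pid`'s process tree is CPU-active."""
    try:
        children = psutil.Process(pid).children(recursive=True)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        return False
    for child in children:
        try:
            # interval=0 is non-blocking: usage since the previous call
            if child.cpu_percent(interval=0) > CPU_ACTIVE_THRESHOLD:
                # REQ-STALE-006: structured warning when suppressing a kill
                logging.warning(
                    "stale kill suppressed: child pid=%s is CPU-active",
                    child.pid,
                )
                return True
        except (psutil.NoSuchProcess, psutil.ZombieProcess,
                psutil.AccessDenied):
            continue  # REQ-STALE-004: skip per-process errors silently
    return False
```

Because the walk only runs after the stale threshold is breached (REQ-STALE-005), the per-child `cpu_percent` calls add no cost during normal operation.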

### SCHEMA — Per-Step Stale Threshold in RecipeStep

- **REQ-SCHEMA-001:** The `RecipeStep` dataclass must accept an optional
`stale_threshold` field of type `int | None`, defaulting to `None`.
- **REQ-SCHEMA-002:** When `stale_threshold` is `None` on a recipe step,
the global `RunSkillConfig.stale_threshold` (1200s) must apply.
- **REQ-SCHEMA-003:** The `run_skill` MCP tool handler must accept an
optional `stale_threshold` parameter and forward it to
`run_headless_core`.
- **REQ-SCHEMA-004:** The recipe validator must reject `stale_threshold`
values that are not positive integers when set.

### RECIPE — Research Recipe Step Overrides

- **REQ-RECIPE-001:** Research-oriented recipes must set
`stale_threshold: 2400` (40 minutes) on specific long-running steps
(e.g., `implement_phase`, `run_experiment`).
- **REQ-RECIPE-002:** Fast-completing steps (e.g., `plan_phase`) must
not have a `stale_threshold` override, relying on the global default.
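In recipe YAML this looks roughly like the fragment below — the step names come from the requirements, but the surrounding keys are illustrative and may not match the actual recipe schema:

```yaml
steps:
  plan_phase:
    tool: run_skill
    # no stale_threshold → global 1200s default applies (REQ-RECIPE-002)
  implement_phase:
    tool: run_skill
    stale_threshold: 2400   # 40 min for long-lived work (REQ-RECIPE-001)
  run_experiment:
    tool: run_skill
    stale_threshold: 2400
```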

### TEST — Test Coverage

- **REQ-TEST-001:** Unit tests must verify `_has_active_child_processes`
returns `True` when a child process exceeds the CPU threshold.
- **REQ-TEST-002:** Unit tests must verify `_has_active_child_processes`
returns `False` when all children are idle, when no children exist, and
when exceptions are raised.
- **REQ-TEST-003:** An integration test must verify stale suppression
when a child process is CPU-active but has no port-443 connection.
- **REQ-TEST-004:** The existing
`TestSessionLogMonitorStaleSuppressionGate` test class must be extended
with the child-process-active scenario.
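The REQ-TEST-001/-002 cases can be covered without real processes by faking the child list with `unittest.mock`. The gate here is a simplified stand-in, and the test names are illustrative rather than the project's actual suite:

```python
from unittest import mock

def any_child_cpu_active(children, threshold=10.0):
    """Simplified stand-in for the suppression gate under test."""
    return any(c.cpu_percent(interval=0) > threshold for c in children)

def test_false_when_no_children():
    assert any_child_cpu_active([]) is False

def test_false_when_children_idle():
    idle = mock.Mock()
    idle.cpu_percent.return_value = 0.0
    assert any_child_cpu_active([idle]) is False

def test_true_when_child_busy():
    busy = mock.Mock()
    busy.cpu_percent.return_value = 57.0
    assert any_child_cpu_active([busy]) is True
```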

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([SESSION LAUNCHED])
    T_COMPLETE([COMPLETION])
    T_STALE([STALE — KILL])

    %% CONFIG CHAIN %%
    subgraph Config ["● RECIPE STEP CONFIG (stale_threshold flow)"]
        direction TB
        RecipeStep["● RecipeStep YAML<br/>━━━━━━━━━━<br/>stale_threshold: 2400<br/>(or unset → None)"]
        RunSkill["● run_skill handler<br/>━━━━━━━━━━<br/>tools_execution.py<br/>stale_threshold: int | None"]
        Runner["DefaultSubprocessRunner<br/>━━━━━━━━━━<br/>process.py<br/>default: 1200s"]
    end

    %% PHASE 1 %%
    subgraph Phase1 ["PHASE 1 — JSONL File Discovery (poll 1s, timeout 30s)"]
        direction TB
        P1_Poll["Poll session_log_dir<br/>━━━━━━━━━━<br/>ctime > spawn_time?<br/>Match session_id?"]
        P1_Found{"File found<br/>within 30s?"}
    end

    %% PHASE 2 %%
    subgraph Phase2 ["● PHASE 2 — Staleness Monitor Loop (poll every 2s)"]
        direction TB
        P2_Stat["stat(session_file)<br/>━━━━━━━━━━<br/>current_size vs last_size"]
        P2_Grew{"JSONL<br/>grew?"}
        P2_Marker["Read new content<br/>━━━━━━━━━━<br/>scan for completion<br/>marker in JSONL"]
        P2_MarkerFound{"Completion<br/>marker found?"}
        P2_ResetGrow["last_size = current_size<br/>last_change = now()"]
        P2_Elapsed{"elapsed >=<br/>stale_threshold?"}
    end

    %% SUPPRESSION GATES %%
    subgraph Gates ["● SUPPRESSION GATES (only fire when stale threshold breached)"]
        direction TB
        Gate1["_has_active_api_connection<br/>━━━━━━━━━━<br/>Walk proc tree<br/>ESTABLISHED port-443?"]
        Gate1_Active{"API conn<br/>active?"}
        Gate2["● _has_active_child_processes<br/>━━━━━━━━━━<br/>Walk child procs<br/>cpu_percent > 10%?"]
        Gate2_Active{"Child CPU<br/>> 10%?"}
        ResetClock["last_change = now()<br/>━━━━━━━━━━<br/>Suppress stale kill<br/>reset staleness clock"]
    end

    %% CONNECTIONS %%
    START --> RecipeStep
    RecipeStep -->|"stale_threshold (int or None)"| RunSkill
    RunSkill -->|"float(x) or None → default 1200s"| Runner
    Runner -->|"stale_threshold, pid"| P1_Poll

    P1_Poll --> P1_Found
    P1_Found -->|"yes"| P2_Stat
    P1_Found -->|"no (30s timeout)"| T_STALE

    P2_Stat --> P2_Grew
    P2_Grew -->|"yes"| P2_ResetGrow
    P2_ResetGrow --> P2_Marker
    P2_Marker --> P2_MarkerFound
    P2_MarkerFound -->|"yes"| T_COMPLETE
    P2_MarkerFound -->|"no"| P2_Elapsed

    P2_Grew -->|"no"| P2_Elapsed
    P2_Elapsed -->|"no (wait)"| P2_Stat
    P2_Elapsed -->|"yes"| Gate1

    Gate1 --> Gate1_Active
    Gate1_Active -->|"yes"| ResetClock
    Gate1_Active -->|"no"| Gate2
    Gate2 --> Gate2_Active
    Gate2_Active -->|"yes"| ResetClock
    Gate2_Active -->|"no"| T_STALE
    ResetClock -->|"continue loop"| P2_Stat

    %% CLASS ASSIGNMENTS %%
    class START,T_COMPLETE,T_STALE terminal;
    class RecipeStep,RunSkill handler;
    class Runner stateNode;
    class P1_Poll,P2_Stat,P2_Marker,P2_ResetGrow,ResetClock phase;
    class P1_Found,P2_Grew,P2_MarkerFound,P2_Elapsed,Gate1_Active,Gate2_Active stateNode;
    class Gate1 handler;
    class Gate2 newComponent;
```

### Concurrency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([SESSION LAUNCHED])
    COMPLETE([TASK GROUP CANCELLED])

    %% MAIN THREAD: Sequential setup %%
    subgraph MainSeq ["MAIN COROUTINE — Sequential Setup"]
        direction TB
        SpawnProc["Spawn Claude Code process<br/>━━━━━━━━━━<br/>asyncio subprocess<br/>get proc.pid"]
        CreateAcc["Create RaceAccumulator + trigger<br/>━━━━━━━━━━<br/>anyio.Event (idempotent set)<br/>channel_b_ready Event"]
        OpenTG["anyio.create_task_group()<br/>━━━━━━━━━━<br/>Fork: start 4–5 coroutines<br/>as tg.start_soon(...)"]
        TrigWait["await trigger.wait()<br/>━━━━━━━━━━<br/>Block until first watcher wins<br/>(or wall-clock timeout)"]
        DrainWait["Optional drain window<br/>━━━━━━━━━━<br/>await channel_b_ready if<br/>process exited but B pending"]
        CancelTG["tg.cancel_scope.cancel()<br/>━━━━━━━━━━<br/>Tear down all remaining tasks"]
        Resolve["resolve_termination(RaceSignals)<br/>━━━━━━━━━━<br/>Priority: exit > stale > completion"]
    end

    %% TASK GROUP: Concurrent watchers %%
    subgraph TaskGroup ["anyio TASK GROUP — Concurrent Watchers (cooperative, single event loop)"]
        direction LR

        subgraph ChA ["Channel A"]
            WatchProc["_watch_process<br/>━━━━━━━━━━<br/>await proc.wait()<br/>acc.process_exited=True"]
            WatchHB["_watch_heartbeat<br/>━━━━━━━━━━<br/>poll stdout NDJSON 0.5s<br/>acc.channel_a_confirmed=True"]
        end

        subgraph ChB ["● Channel B — Session Log"]
            ExtractID["_extract_stdout_session_id<br/>━━━━━━━━━━<br/>poll stdout for type=system<br/>sets stdout_session_id_ready"]
            WatchSL["● _watch_session_log<br/>━━━━━━━━━━<br/>calls _session_log_monitor<br/>acc.channel_b_status=COMPLETION|STALE"]
        end
    end

    %% STALENESS SUPPRESSION %%
    subgraph StaleGates ["● STALENESS SUPPRESSION — Sync psutil walks (inside _session_log_monitor)"]
        direction TB
        Gate1["_has_active_api_connection(pid)<br/>━━━━━━━━━━<br/>[parent + children(recursive=True)]<br/>net_connections port-443 ESTABLISHED?"]
        Gate2["● _has_active_child_processes(pid)<br/>━━━━━━━━━━<br/>[children(recursive=True) only]<br/>cpu_percent(interval=0) > 10%?"]
        ResetClock["last_change = monotonic()<br/>━━━━━━━━━━<br/>suppress stale kill<br/>continue Phase 2 loop"]
        ReturnStale["return STALE<br/>━━━━━━━━━━<br/>acc.channel_b_status = STALE<br/>trigger.set()"]
    end

    %% FLOW %%
    START --> SpawnProc
    SpawnProc --> CreateAcc
    CreateAcc --> OpenTG

    OpenTG -->|"tg.start_soon"| WatchProc
    OpenTG -->|"tg.start_soon"| WatchHB
    OpenTG -->|"tg.start_soon"| ExtractID
    OpenTG -->|"tg.start_soon"| WatchSL

    WatchProc -->|"trigger.set()"| TrigWait
    WatchHB -->|"trigger.set()"| TrigWait
    WatchSL -->|"trigger.set() after drain"| TrigWait

    WatchSL -->|"stale threshold breached"| Gate1
    Gate1 -->|"no API conn"| Gate2
    Gate2 -->|"child CPU active"| ResetClock
    Gate2 -->|"no activity"| ReturnStale
    Gate1 -->|"API conn active"| ResetClock
    ResetClock -->|"continue loop"| WatchSL

    TrigWait --> DrainWait
    DrainWait --> CancelTG
    CancelTG --> Resolve
    Resolve --> COMPLETE

    %% CLASS ASSIGNMENTS %%
    class START,COMPLETE terminal;
    class SpawnProc,CreateAcc,TrigWait,DrainWait,CancelTG,Resolve phase;
    class OpenTG detector;
    class WatchProc,WatchHB handler;
    class ExtractID handler;
    class WatchSL handler;
    class Gate1 handler;
    class Gate2 newComponent;
    class ResetClock output;
    class ReturnStale detector;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([RECIPE YAML LOADED])
    T_PASS([VALID — forwarded to run_skill])
    T_FAIL([INVALID — validation error])

    %% PARSE LAYER %%
    subgraph Parse ["● YAML → RecipeStep (io.py _parse_step)"]
        direction TB
        YAMLRead["● YAML key read<br/>━━━━━━━━━━<br/>data.get('stale_threshold')<br/>absent → None (no coercion)"]
        Construct["● RecipeStep(...)<br/>━━━━━━━━━━<br/>stale_threshold: int | None = None<br/>No __post_init__ mutations"]
        IntegrityGuard["_PARSE_STEP_HANDLED_FIELDS guard<br/>━━━━━━━━━━<br/>compile-time assert: fields == dataclass<br/>RuntimeError if diverged"]
    end

    %% VALIDATION LAYER %%
    subgraph Validation ["● STRUCTURAL VALIDATION (validator.py validate_recipe)"]
        direction TB
        IsNone{"stale_threshold<br/>is None?"}
        TypeCheck{"isinstance(int)<br/>AND > 0?"}
        AppendError["append error<br/>━━━━━━━━━━<br/>'must be positive integer<br/>when set'"]
        PassThrough["field passes<br/>━━━━━━━━━━<br/>no validation error<br/>for None or valid int"]
    end

    %% SEMANTIC LAYER %%
    subgraph Semantic ["● SEMANTIC RULE — _TOOL_PARAMS registry (rules_tools.py)"]
        direction TB
        ToolParamsCheck["_TOOL_PARAMS['run_skill']<br/>━━━━━━━━━━<br/>frozenset includes 'stale_threshold'<br/>dead-with-param rule: NO warning"]
        OtherToolWarn["Other tools<br/>━━━━━━━━━━<br/>stale_threshold not in their params<br/>dead-with-param: WARNING emitted"]
    end

    %% EXECUTION FORWARDING %%
    subgraph Execution ["EXECUTION FORWARDING (tools_execution.py run_skill)"]
        direction TB
        NullPath["stale_threshold = None<br/>━━━━━━━━━━<br/>→ DefaultSubprocessRunner default<br/>= 1200s (global config)"]
        OverridePath["stale_threshold = int<br/>━━━━━━━━━━<br/>float(stale_threshold)<br/>→ overrides global default"]
        Monitor["_session_log_monitor<br/>━━━━━━━━━━<br/>stale_threshold used as<br/>breach-detection window"]
    end

    %% FLOW %%
    START --> YAMLRead
    YAMLRead --> Construct
    Construct --> IntegrityGuard
    IntegrityGuard -->|"fields match — import OK"| IsNone

    IsNone -->|"yes (absent or None)"| PassThrough
    IsNone -->|"no (value present)"| TypeCheck
    TypeCheck -->|"valid"| PassThrough
    TypeCheck -->|"invalid (non-int or ≤ 0)"| AppendError
    AppendError --> T_FAIL
    PassThrough --> ToolParamsCheck

    ToolParamsCheck -->|"tool: run_skill"| T_PASS
    ToolParamsCheck -->|"other tool"| OtherToolWarn

    T_PASS --> NullPath
    T_PASS --> OverridePath
    NullPath --> Monitor
    OverridePath --> Monitor

    %% CLASS ASSIGNMENTS %%
    class START,T_PASS,T_FAIL terminal;
    class YAMLRead,Construct handler;
    class IntegrityGuard detector;
    class IsNone,TypeCheck stateNode;
    class AppendError detector;
    class PassThrough output;
    class ToolParamsCheck newComponent;
    class OtherToolWarn gap;
    class NullPath,OverridePath,Monitor phase;
```

Closes #631

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-170436-566038/.autoskillit/temp/make-plan/fix_false_stale_kills_plan_2026-04-05_000000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 45.6k | 2.0M | 151.7k | 2 | 19m 31s |
| verify | 62 | 36.0k | 3.3M | 155.3k | 2 | 15m 1s |
| implement | 149 | 47.2k | 9.6M | 183.8k | 2 | 16m 24s |
| audit_impl | 102 | 20.0k | 762.1k | 90.1k | 2 | 10m 31s |
| open_pr | 69 | 39.4k | 2.6M | 116.8k | 2 | 15m 32s |
| review_pr | 38 | 57.4k | 1.8M | 103.1k | 1 | 18m 47s |
| resolve_review | 55 | 32.5k | 3.1M | 84.3k | 1 | 14m 9s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.3k | 292.6k | 24.3M | 943.5k | | 1h 59m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

All four bundled recipes (`implementation`, `remediation`, `merge-prs`,
`implementation-groups`)
currently ship with `audit: default: "true"`, meaning `audit-impl` runs
unless explicitly
disabled. This plan changes all four recipes to `default: "false"` so
`audit-impl` is skipped
by default and becomes opt-in. No structural changes to the step graph,
routing, or test
infrastructure are needed — only the ingredient default changes.

**Scope:** 4 YAML ingredient default changes + 1 test assertion added.

## Requirements

### RCFG — Recipe Configuration

- **REQ-RCFG-001:** The `audit` input in `implementation.yaml` must
default to `"false"`.
- **REQ-RCFG-002:** The `audit` input in `implementation-groups.yaml`
must default to `"false"`.
- **REQ-RCFG-003:** The `audit` input in `remediation.yaml` must default
to `"false"`.
- **REQ-RCFG-004:** The `audit` input in `merge-prs.yaml` must default
to `"false"`.
- **REQ-RCFG-005:** The `audit_impl` step definition and its
`skip_when_false: "inputs.audit"` guard must remain unchanged in all
recipes.
- **REQ-RCFG-006:** Callers must still be able to opt in to audit-impl
by passing `audit: "true"` at pipeline invocation time.
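The resolution behavior the RCFG requirements describe boils down to a default-with-override lookup. A minimal sketch, assuming hypothetical helper names (the real ingredient machinery lives in the recipe runtime):

```python
def resolve_input(caller_args: dict, contract_defaults: dict, name: str) -> str:
    # INIT_ONLY: resolved once at invocation time, then frozen for the run
    return caller_args.get(name, contract_defaults[name])

def audit_step_runs(caller_args: dict) -> bool:
    contract_defaults = {"audit": "false"}  # new default (was "true")
    # skip_when_false: "inputs.audit" — guard unchanged per REQ-RCFG-005
    return resolve_input(caller_args, contract_defaults, "audit") == "true"
```

With no caller argument the audit step is bypassed; passing `audit: "true"` opts back in (REQ-RCFG-006).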

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Pipeline Invoked])
    CONTINUE([Continue to push / merge])
    ERROR([escalate_stop / register_clone_failure])

    subgraph Ingredient ["● Ingredient Resolution"]
        direction TB
        AuditIng["● audit ingredient<br/>━━━━━━━━━━<br/>BEFORE: default='true'<br/>AFTER: default='false'"]
    end

    subgraph Gate ["skip_when_false Gate"]
        direction TB
        SkipCheck{"inputs.audit == 'true'?"}
        SkipBypass["BYPASS<br/>━━━━━━━━━━<br/>Skip audit_impl<br/>(now default path)"]
        RunAudit["● run audit-impl skill<br/>━━━━━━━━━━<br/>runs /autoskillit:audit-impl<br/>(now opt-in path)"]
        Verdict{"GO / NO GO?"}
        Remediate["remediate<br/>━━━━━━━━━━<br/>Route to remediation<br/>or re-plan"]
    end

    %% FLOW %%
    START --> AuditIng
    AuditIng -->|"resolves to 'false'<br/>(new default)"| SkipCheck
    SkipCheck -->|"false (default — bypass)"| SkipBypass
    SkipCheck -->|"true (opt-in — explicit)"| RunAudit
    RunAudit --> Verdict
    Verdict -->|"GO"| CONTINUE
    Verdict -->|"NO GO"| Remediate
    Verdict -->|"error"| ERROR
    Remediate -->|"re-plan loop"| START
    SkipBypass --> CONTINUE

    %% CLASS ASSIGNMENTS %%
    class START,CONTINUE,ERROR terminal;
    class AuditIng handler;
    class SkipCheck,Verdict stateNode;
    class SkipBypass phase;
    class RunAudit detector;
    class Remediate phase;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start, continuation, and error states |
| Teal | State | Decision gates (skip_when_false, GO/NO GO) |
| Orange | Handler | ● Audit ingredient (modified: default flipped to "false") |
| Red | Detector | ● audit-impl skill execution (now opt-in path) |
| Purple | Phase | Bypass path (now default) and remediation routing |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Recipe Invoked])
    GATE([skip_when_false Evaluated])

    subgraph Contracts ["● INGREDIENT CONTRACT DEFINITIONS"]
        direction TB
        ImplYaml["● implementation.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        ImplGroupsYaml["● implementation-groups.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        RemediationYaml["● remediation.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        MergePrsYaml["● merge-prs.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
    end

    subgraph Resolution ["INIT_ONLY: Ingredient Resolution"]
        direction TB
        CallerSupplied["Caller-supplied value<br/>━━━━━━━━━━<br/>audit='true' (opt-in)<br/>INIT_ONLY — frozen for run"]
        DefaultApplied["● Contract default applied<br/>━━━━━━━━━━<br/>audit='false'<br/>INIT_ONLY — frozen for run"]
    end

    subgraph TestGate ["● CONTRACT VALIDATION (test_bundled_recipes.py)"]
        direction TB
        TestAssert["● test_audit_ingredient_defaults_to_false<br/>━━━━━━━━━━<br/>@pytest.mark.parametrize<br/>asserts audit.default == 'false'<br/>for all 4 recipes"]
    end

    %% FLOW %%
    START -->|"caller passes audit='true'"| CallerSupplied
    START -->|"no audit arg (default)"| DefaultApplied
    ImplYaml --> DefaultApplied
    ImplGroupsYaml --> DefaultApplied
    RemediationYaml --> DefaultApplied
    MergePrsYaml --> DefaultApplied
    CallerSupplied --> GATE
    DefaultApplied --> GATE

    Contracts -.->|"validated by"| TestAssert

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class GATE stateNode;
    class ImplYaml,ImplGroupsYaml,RemediationYaml,MergePrsYaml handler;
    class CallerSupplied detector;
    class DefaultApplied phase;
    class TestAssert gap;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline invocation point |
| Teal | Gate | skip_when_false evaluation (INIT_ONLY field read) |
| Orange | Contract | ● Recipe YAML ingredient contract definitions (modified) |
| Red | Opt-in | Caller-supplied value override (explicit audit='true') |
| Purple | Default | ● Contract default applied (now 'false') |
| Yellow | Test | ● Contract validation test assertion (new) |

Closes #632

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-180825-135856/.autoskillit/temp/make-plan/feat_default_audit_impl_off_plan_2026-04-05_181000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 60.3k | 4.0M | 213.2k | 3 | 24m 25s |
| verify | 82 | 43.0k | 3.9M | 193.2k | 3 | 22m 22s |
| implement | 176 | 53.6k | 10.3M | 221.3k | 3 | 18m 51s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 101 | 60.0k | 3.7M | 168.5k | 3 | 22m 39s |
| review_pr | 71 | 112.5k | 3.4M | 189.2k | 2 | 33m 19s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.5k | 409.5k | 31.4M | 1.3M | | 2h 41m |

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Increase sensitivity to catch quota exhaustion earlier, giving more
buffer before hard API limits are hit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l for Experiment Failures (#636)

## Summary

This plan adds automated failure diagnosis to the research pipeline
(issue #635). There are two distinct requirements:

**DIAG**: Create a `troubleshoot-experiment` skill that reads session
logs and process traces to classify why a research step failed, then
emit a structured diagnostic artifact and `is_fixable` signal. Wire this
skill into `research.yaml` so that `implement_phase` failures route to
it instead of dying at `escalate_stop`.

**SEP**: Fix the structural misuse of `retry-worktree` in
`implement_phase`. The skill `retry-worktree` is designed to *resume*
context-exhausted `implement-worktree` sessions — it is not a primary
implementation driver. The research recipe already has the correct
purpose-built skill: `implement-experiment`, which explicitly forbids
experiment execution during implementation and routes context exhaustion
directly to `run-experiment`. Switching `implement_phase` to use
`implement-experiment` addresses REQ-SEP-001 and REQ-SEP-002 at the
skill level, where the constraint is enforceable.

## Requirements

### DIAG — Experiment Failure Diagnosis

- **REQ-DIAG-001:** The system must provide a skill that investigates
why a research recipe step failed by reading session logs and process
traces.
- **REQ-DIAG-002:** The skill must classify the failure type (stale
timeout, context exhaustion, build failure, data missing, parameter
issue, unknown).
- **REQ-DIAG-003:** The skill must emit a structured diagnostic artifact
that downstream steps or the human can act on.
- **REQ-DIAG-004:** The research recipe must route experiment failures
to the diagnostic skill instead of `escalate_stop`.
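The priority-ordered decision table shown later in the flow diagram can be sketched as a first-match classifier. The `summary` keys mirror the diagram's `summary.json` fields (`termination_reason`, `write_call_count`, `exit_code`); the exact schema and field names are assumptions:

```python
def classify_failure(summary: dict) -> tuple[str, bool]:
    """Return (failure_type, is_fixable); first matching rule wins."""
    reason = summary.get("termination_reason")
    if reason == "context_limit":
        return "context_exhaustion", True
    if reason == "stale" and summary.get("write_call_count", 0) == 0:
        return "stale_timeout", True
    if summary.get("exit_code", 0) != 0 and summary.get("build_error"):
        return "build_failure", True
    if summary.get("infra_error") or summary.get("oom"):
        return "environment_error", False
    return "unknown", False
```

The `is_fixable` half of the tuple is what `route_implement_failure` consumes: `True` loops back to `plan_phase`, `False` escalates.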

### SEP — Structural Separation of Implementation and Execution

- **REQ-SEP-001:** Implementation worktree steps must not perform
experiment execution (benchmarks, profiling, data collection).
- **REQ-SEP-002:** Experiment execution must route through the
`run_experiment` step (or equivalent) which has appropriate timeout and
retry semantics.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([RESEARCH PIPELINE])
    ESCALATE([escalate_stop])
    COMPLETE([research_complete])

    subgraph PhaseMgmt ["Phase Management"]
        plan_phase["● plan_phase<br/>━━━━━━━━━━<br/>make-plan skill<br/>plans current group"]
        implement_phase["● implement_phase<br/>━━━━━━━━━━<br/>implement-experiment<br/>(was: retry-worktree)<br/>stale_threshold: 2400"]
        next_phase{"next_phase_or_experiment<br/>━━━━━━━━━━<br/>more phases?"}
    end

    subgraph DiagPhase ["★ Failure Diagnosis (NEW)"]
        troubleshoot["★ troubleshoot_implement_failure<br/>━━━━━━━━━━<br/>troubleshoot-experiment skill<br/>worktree_path + implement_phase"]
        route_fix{"★ route_implement_failure<br/>━━━━━━━━━━<br/>is_fixable?"}
    end

    subgraph SkillInternals ["★ troubleshoot-experiment Internals"]
        direction TB
        init_idx["★ initialize code-index<br/>━━━━━━━━━━<br/>set_project_path(worktree_path)"]
        session_lookup["★ locate failed session<br/>━━━━━━━━━━<br/>sessions.jsonl<br/>select success=false + cwd match"]
        read_diags["★ read session diagnostics<br/>━━━━━━━━━━<br/>summary.json: termination_reason<br/>write_call_count, exit_code<br/>anomalies.jsonl: kind, severity"]
        classify{"★ classify failure type<br/>━━━━━━━━━━<br/>priority-ordered<br/>decision table"}
        write_diag["★ diagnosis_{ts}.md<br/>━━━━━━━━━━<br/>failure_type, is_fixable<br/>evidence + recommended action"]
        emit_tokens["★ emit output tokens<br/>━━━━━━━━━━<br/>diagnosis_path=<br/>failure_type=<br/>is_fixable="]
    end

    subgraph ExperimentPhase ["Experiment Phase"]
        run_experiment["run_experiment<br/>━━━━━━━━━━<br/>run-experiment skill<br/>stale_threshold: 2400, retries: 2"]
    end

    START --> plan_phase
    plan_phase --> implement_phase

    implement_phase -->|"on_success"| next_phase
    implement_phase -->|"on_failure"| troubleshoot
    implement_phase -->|"on_exhausted / on_context_limit"| run_experiment

    next_phase -->|"more_phases"| plan_phase
    next_phase -->|"done"| run_experiment

    troubleshoot --> init_idx
    init_idx --> session_lookup
    session_lookup -->|"session found"| read_diags
    session_lookup -->|"no session / missing log"| write_diag
    read_diags --> classify

    classify -->|"context_limit → context_exhaustion, fixable=true"| write_diag
    classify -->|"stale + write=0 → stale_timeout, fixable=true"| write_diag
    classify -->|"exit!=0 + build error → build_failure, fixable=true"| write_diag
    classify -->|"infra error / OOM → environment_error, fixable=false"| write_diag
    classify -->|"unknown"| write_diag

    write_diag --> emit_tokens
    emit_tokens --> route_fix

    route_fix -->|"is_fixable=true"| plan_phase
    route_fix -->|"is_fixable=false"| ESCALATE

    troubleshoot -->|"on_failure (skill crash)"| ESCALATE

    run_experiment --> COMPLETE

    class START,ESCALATE,COMPLETE terminal;
    class plan_phase,implement_phase handler;
    class next_phase,route_fix,classify stateNode;
    class troubleshoot,init_idx,session_lookup,read_diags,write_diag,emit_tokens newComponent;
```
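The priority-ordered decision table in the diagram above can be sketched as a first-match-wins classifier. This is an illustrative sketch only: the field names (`termination_reason`, `write_call_count`, `exit_code`, `stderr_tail`) follow the diagram labels, but the real troubleshoot-experiment skill reads them from `summary.json` and `anomalies.jsonl`, and the exact matching rules are assumptions.

```python
def classify_failure(summary: dict) -> tuple[str, bool]:
    """Return (failure_type, is_fixable); rules are checked in priority order."""
    reason = summary.get("termination_reason", "")
    writes = summary.get("write_call_count", 0)
    exit_code = summary.get("exit_code", 0)
    stderr = summary.get("stderr_tail", "")

    if reason == "context_limit":
        return ("context_exhaustion", True)      # retry with smaller scope
    if reason == "stale" and writes == 0:
        return ("stale_timeout", True)           # hung before writing anything
    if exit_code != 0 and "error:" in stderr.lower():
        return ("build_failure", True)           # compiler/test error is fixable
    if reason == "infra_error" or "oom" in stderr.lower():
        return ("environment_error", False)      # not fixable → escalate
    return ("unknown", False)                    # fixability of "unknown" is an assumption
```

The `is_fixable` boolean is what `route_implement_failure` branches on: `True` loops back to `plan_phase`, `False` escalates.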

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L2_Recipe ["L2 — Recipe System"]
        recipe_io["recipe/io.py<br/>━━━━━━━━━━<br/>load_recipe, builtin_recipes_dir"]
        recipe_validator["recipe/validator.py<br/>━━━━━━━━━━<br/>validate_recipe"]
        recipe_contracts["recipe/contracts.py<br/>━━━━━━━━━━<br/>contract card generation"]
    end

    subgraph L1_Workspace ["L1 — Workspace"]
        workspace_skills["workspace/skills.py<br/>━━━━━━━━━━<br/>SkillResolver<br/>discovers skills_extended/"]
    end

    subgraph L0_Core ["L0 — Core"]
        core_paths["core/paths.py<br/>━━━━━━━━━━<br/>pkg_root()<br/>canonical package root"]
    end

    subgraph DataRecipes ["Data — Recipes (YAML)"]
        research_yaml["● recipes/research.yaml<br/>━━━━━━━━━━<br/>implement-experiment (was: retry-worktree)<br/>on_failure → troubleshoot_implement_failure<br/>on_exhausted → run_experiment"]
    end

    subgraph DataContracts ["Data — Contracts (YAML)"]
        skill_contracts["● recipe/skill_contracts.yaml<br/>━━━━━━━━━━<br/>★ troubleshoot-experiment entry<br/>is_fixable output pattern"]
    end

    subgraph DataSkills ["Data — Skills (SKILL.md)"]
        troubleshoot_skill["★ skills_extended/troubleshoot-experiment/<br/>━━━━━━━━━━<br/>session log reader<br/>failure classifier, is_fixable emitter"]
        implement_exp["skills_extended/implement-experiment/<br/>━━━━━━━━━━<br/>no experiment execution<br/>routes exhaustion → run-experiment"]
    end

    subgraph Tests ["Tests"]
        test_diag["★ tests/recipe/test_research_recipe_diag.py<br/>━━━━━━━━━━<br/>validates research.yaml routing<br/>asserts skill_command swap"]
        test_contracts["★ tests/skills/test_troubleshoot_experiment_contracts.py<br/>━━━━━━━━━━<br/>SkillResolver discovery<br/>SKILL.md existence"]
        test_skills_ws["● tests/workspace/test_skills.py<br/>━━━━━━━━━━<br/>skill count +1"]
    end

    recipe_io -->|"loads at runtime"| research_yaml
    recipe_validator -->|"validates"| research_yaml
    recipe_contracts -->|"loads at runtime"| skill_contracts
    research_yaml -->|"skill_command references"| troubleshoot_skill
    research_yaml -->|"skill_command references"| implement_exp
    skill_contracts -->|"contract entry for"| troubleshoot_skill
    workspace_skills -->|"discovers via pkg_root()"| troubleshoot_skill
    workspace_skills -->|"uses"| core_paths
    test_diag -->|"imports"| recipe_io
    test_diag -->|"imports"| recipe_validator
    test_contracts -->|"imports"| workspace_skills
    test_contracts -->|"imports"| core_paths
    test_skills_ws -->|"imports"| workspace_skills

    class recipe_io,recipe_validator,recipe_contracts phase;
    class workspace_skills handler;
    class core_paths stateNode;
    class research_yaml,skill_contracts output;
    class troubleshoot_skill newComponent;
    class implement_exp handler;
    class test_diag,test_contracts newComponent;
    class test_skills_ws handler;
```

Closes #635

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-193031-162971/.autoskillit/temp/make-plan/research_recipe_troubleshoot_plan_2026-04-05_193500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.9k | 93.0k | 4.6M | 271.4k | 4 | 37m 40s |
| verify | 109 | 64.2k | 5.4M | 277.1k | 4 | 28m 55s |
| implement | 224 | 71.2k | 12.5M | 282.5k | 4 | 32m 50s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 131 | 76.9k | 4.8M | 232.2k | 4 | 27m 43s |
| review_pr | 100 | 134.7k | 4.3M | 237.6k | 3 | 38m 8s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 91 | 32.1k | 3.8M | 120.9k | 2 | 21m 36s |
| diagnose_ci | 13 | 1.4k | 161.4k | 15.6k | 1 | 37s |
| resolve_ci | 18 | 3.7k | 293.8k | 29.1k | 1 | 3m 2s |
| **Total** | 3.8k | 542.7k | 40.5M | 1.7M | | 3h 40m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
#639)

## Summary

The quota guard hook fails open 93% of the time because `open_kitchen`
primes the cache once (via `_prime_quota_cache`) but no mechanism keeps
it fresh. The cache TTL is 300 seconds; pipeline sessions run for hours.
After 5 minutes, every hook invocation sees a stale cache, logs
`cache_miss`, and approves unconditionally via `sys.exit(0)`.

The root architectural weakness is that `open_kitchen`/`close_kitchen`
act as a gate toggle (open/close) but not as a **service lifecycle
boundary**. There is no concept of services that start with the kitchen
and stop when it closes. The fix introduces a reusable service lifecycle
pattern: any background service tied to the kitchen session is started
in `_open_kitchen_handler` and cancelled in `_close_kitchen_handler`,
with `ToolContext` holding the task handle. The quota refresh loop is
the first instance of this pattern.

A secondary structural gap is also closed: `cache_max_age` (the hook's
expiry threshold) and the refresh interval previously had no enforced
relationship. A new `cache_refresh_interval` config field is introduced
with a structural contract `cache_refresh_interval < cache_max_age`,
making it architecturally impossible for the loop to fall behind the
TTL. This contract is enforced by a test that asserts it directly
against `defaults.yaml`.
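The lifecycle pattern described above can be sketched as follows. Names and the error-handling shape are illustrative; the real loop is spawned via `create_background_task` in `_open_kitchen_handler` and cancelled in `_close_kitchen_handler`, and the defaults shown are the ones stated in this summary.

```python
import asyncio

CACHE_MAX_AGE = 300            # hook TTL in seconds (default)
CACHE_REFRESH_INTERVAL = 240   # loop period (default)

# The structural contract: the loop must always write before the hook's TTL
# expires, so the interval is strictly less than the max age.
assert CACHE_REFRESH_INTERVAL < CACHE_MAX_AGE

async def quota_refresh_loop(refresh, interval: float = CACHE_REFRESH_INTERVAL):
    """Fetch and write the cache unconditionally on each tick.

    An exception in one refresh is logged and the loop continues; only
    task cancellation (from close_kitchen) exits the loop.
    """
    while True:
        await asyncio.sleep(interval)
        try:
            await refresh()
        except Exception:
            pass  # real code logs 'quota_refresh_loop_error' here
```

Because cancellation is delivered at the `await asyncio.sleep(...)` point, `close_kitchen` can stop the service deterministically with `task.cancel()`.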

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    SESSION_START([Kitchen Session Start])
    SESSION_END([Kitchen Session End])

    %% CACHE DATA NODE %%
    CACHE[("autoskillit_quota_cache.json<br/>━━━━━━━━━━<br/>fetched_at + five_hour.utilization")]

    subgraph KitchenOpen ["● open_kitchen — Kitchen Open Lifecycle"]
        direction TB
        GATE_ON["● gate.enable()<br/>━━━━━━━━━━<br/>Reveal gated MCP tools"]
        WRITE_CFG["_write_hook_config()<br/>━━━━━━━━━━<br/>Write threshold + cache_max_age<br/>to .autoskillit/temp/"]
        PRIME["_prime_quota_cache()<br/>━━━━━━━━━━<br/>check_and_sleep_if_needed()<br/>read-first → fetch if stale → write T=0"]
        START_TASK["● create_background_task(<br/>★ _quota_refresh_loop(config)<br/>)<br/>━━━━━━━━━━<br/>Stored as ctx.quota_refresh_task"]
    end

    subgraph RefreshLoop ["★ _quota_refresh_loop — Background Service (asyncio.Task)"]
        direction TB
        LOOP_SLEEP["asyncio.sleep(<br/>★ cache_refresh_interval<br/>)<br/>━━━━━━━━━━<br/>Structural contract:<br/>interval < cache_max_age"]
        REFRESH["★ _refresh_quota_cache(config)<br/>━━━━━━━━━━<br/>Always fetches unconditionally<br/>(no read-first optimization)<br/>_fetch_quota() → _write_cache()"]
        LOOP_ERR{"Exception<br/>in refresh?"}
        LOG_WARN["logger.warning(<br/>'quota_refresh_loop_error'<br/>)<br/>━━━━━━━━━━<br/>Loop continues"]
    end

    subgraph RunSkill ["● run_skill — MCP Tool Execution"]
        direction TB
        EXEC["executor.run(skill_command)<br/>━━━━━━━━━━<br/>Headless Claude session"]
        AUDIT{"success?"}
        AUDIT_OK["audit.record_success()"]
        AUDIT_FAIL["notify 'run_skill failed'"]
        POST_REFRESH["● background.submit(<br/>★ _refresh_quota_cache(config)<br/>label='quota_post_run_refresh'<br/>)<br/>━━━━━━━━━━<br/>Defense-in-depth:<br/>ensures cache fresh for next hook"]
    end

    subgraph HookProcess ["quota_check.py — PreToolUse Hook Subprocess"]
        direction TB
        READ_CACHE["_read_quota_cache(<br/>max_age=cache_max_age<br/>)<br/>━━━━━━━━━━<br/>Reads cache file from disk"]
        FRESH{"cache age<br/>≤ cache_max_age?"}
        THRESHOLD{"utilization<br/>≥ threshold?"}
        APPROVE["sys.exit(0)<br/>━━━━━━━━━━<br/>→ approve run_skill"]
        DENY["print deny JSON<br/>sys.exit(0)<br/>━━━━━━━━━━<br/>→ block run_skill"]
        FAIL_OPEN["log cache_miss<br/>sys.exit(0)<br/>━━━━━━━━━━<br/>→ fail open (approve)"]
    end

    subgraph KitchenClose ["● close_kitchen — Teardown"]
        direction TB
        CANCEL["● ctx.quota_refresh_task.cancel()<br/>ctx.quota_refresh_task = None<br/>━━━━━━━━━━<br/>Terminates _quota_refresh_loop<br/>via CancelledError"]
        GATE_OFF["gate.disable()<br/>━━━━━━━━━━<br/>Hide gated MCP tools"]
        DEL_CFG["hook_cfg_path.unlink()<br/>━━━━━━━━━━<br/>Remove hook config file"]
    end

    %% MAIN FLOW %%
    SESSION_START --> GATE_ON
    GATE_ON --> WRITE_CFG
    WRITE_CFG --> PRIME
    PRIME -->|"writes T=0"| CACHE
    PRIME --> START_TASK
    START_TASK -.->|"spawns background task"| LOOP_SLEEP

    %% BACKGROUND LOOP %%
    LOOP_SLEEP --> REFRESH
    REFRESH -->|"always writes fresh"| CACHE
    REFRESH --> LOOP_ERR
    LOOP_ERR -->|"no exception"| LOOP_SLEEP
    LOOP_ERR -->|"exception caught"| LOG_WARN
    LOG_WARN --> LOOP_SLEEP

    %% RUN_SKILL FLOW %%
    START_TASK --> EXEC
    EXEC --> AUDIT
    AUDIT -->|"yes"| AUDIT_OK
    AUDIT -->|"no"| AUDIT_FAIL
    AUDIT_OK --> POST_REFRESH
    AUDIT_FAIL --> POST_REFRESH
    POST_REFRESH -->|"fire-and-forget write"| CACHE
    POST_REFRESH --> EXEC

    %% HOOK FLOW %%
    CACHE -.->|"read by subprocess"| READ_CACHE
    READ_CACHE --> FRESH
    FRESH -->|"yes — fresh"| THRESHOLD
    FRESH -->|"no — stale"| FAIL_OPEN
    THRESHOLD -->|"below threshold"| APPROVE
    THRESHOLD -->|"at/above threshold"| DENY

    %% CLOSE FLOW %%
    EXEC -->|"session ends"| CANCEL
    CANCEL -.->|"CancelledError propagates"| LOOP_SLEEP
    CANCEL --> GATE_OFF
    GATE_OFF --> DEL_CFG
    DEL_CFG --> SESSION_END

    %% CLASS ASSIGNMENTS %%
    class SESSION_START,SESSION_END terminal;
    class CACHE stateNode;
    class GATE_ON,WRITE_CFG,PRIME,START_TASK handler;
    class LOOP_SLEEP,LOG_WARN phase;
    class REFRESH,POST_REFRESH newComponent;
    class EXEC,AUDIT_OK,AUDIT_FAIL handler;
    class CANCEL,GATE_OFF,DEL_CFG handler;
    class READ_CACHE integration;
    class FRESH,THRESHOLD,AUDIT,LOOP_ERR detector;
    class APPROVE,DENY,FAIL_OPEN output;
```

**Legend:** ★ New component · ● Modified component

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 58, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    subgraph ConfigContract ["● QuotaGuardConfig — INIT_ONLY Fields (Set Once, Never Mutated)"]
        direction LR
        CFG_MAX_AGE["cache_max_age<br/>━━━━━━━━━━<br/>300 s (default)<br/>Hook TTL — max age<br/>before cache_miss"]
        CFG_INTERVAL["★ cache_refresh_interval<br/>━━━━━━━━━━<br/>240 s (default)<br/>Loop sleep between<br/>proactive writes"]
        STRUCT_CONTRACT["★ STRUCTURAL CONTRACT<br/>━━━━━━━━━━<br/>cache_refresh_interval<br/>MUST be < cache_max_age<br/>(enforced by test)"]
        CFG_INTERVAL -->|"enforced constraint"| STRUCT_CONTRACT
        CFG_MAX_AGE -->|"enforced constraint"| STRUCT_CONTRACT
    end

    subgraph TaskLifecycle ["★ ToolContext.quota_refresh_task — Kitchen-Scoped MUTABLE Field"]
        direction LR
        TASK_NULL_START["State: None<br/>━━━━━━━━━━<br/>Before open_kitchen<br/>No task running"]
        TASK_RUNNING["★ State: asyncio.Task<br/>━━━━━━━━━━<br/>_quota_refresh_loop coroutine<br/>Background refresh active"]
        TASK_NULL_END["State: None<br/>━━━━━━━━━━<br/>After close_kitchen<br/>Task cancelled + cleared"]

        TASK_NULL_START -->|"● open_kitchen:\ncreate_background_task()"| TASK_RUNNING
        TASK_RUNNING -->|"● close_kitchen:\ntask.cancel() + task = None"| TASK_NULL_END
    end

    subgraph CacheStateMachine ["autoskillit_quota_cache.json — Cache File State Transitions"]
        direction TB
        CACHE_ABSENT["State: ABSENT<br/>━━━━━━━━━━<br/>No cache file exists<br/>→ hook: cache_miss (fail-open)"]
        CACHE_FRESH["State: FRESH<br/>━━━━━━━━━━<br/>age ≤ cache_max_age<br/>→ hook: enforce threshold"]
        CACHE_EXPIRING["State: EXPIRING (age approaches max_age)<br/>━━━━━━━━━━<br/>★ Proactive refresh fires<br/>before expiry (interval < max_age)"]
        CACHE_EXPIRED["State: EXPIRED (age > cache_max_age)<br/>━━━━━━━━━━<br/>→ hook: cache_miss (fail-open)<br/>Only possible if loop crashes"]

        CACHE_ABSENT -->|"_prime_quota_cache() at T=0"| CACHE_FRESH
        CACHE_FRESH -->|"age increases over time"| CACHE_EXPIRING
        CACHE_EXPIRING -->|"★ _quota_refresh_loop writes\nbefore expiry (interval < max_age)"| CACHE_FRESH
        CACHE_EXPIRING -->|"★ post-run background.submit()\nafter each run_skill"| CACHE_FRESH
        CACHE_FRESH -.->|"loop crash (telemetry visible)"| CACHE_EXPIRED
        CACHE_EXPIRED -->|"loop recovery or\nnext run_skill post-refresh"| CACHE_FRESH
    end

    subgraph ValidationGates ["Hook Validation Gates (quota_check.py — PreToolUse)"]
        direction TB
        GATE_AGE["● Gate 1: Age Check<br/>━━━━━━━━━━<br/>_read_cache(max_age=cache_max_age)<br/>age ≤ cache_max_age?"]
        GATE_THRESHOLD["Gate 2: Threshold Check<br/>━━━━━━━━━━<br/>utilization < threshold?"]
        GATE_PASS["→ APPROVE<br/>━━━━━━━━━━<br/>sys.exit(0)"]
        GATE_DENY["→ DENY<br/>━━━━━━━━━━<br/>block run_skill"]
        GATE_MISS["→ FAIL-OPEN (cache_miss)<br/>━━━━━━━━━━<br/>Preserved by design\ntelemetry visible via quota_events.jsonl"]

        GATE_AGE -->|"age ≤ max_age"| GATE_THRESHOLD
        GATE_AGE -->|"age > max_age (stale)"| GATE_MISS
        GATE_THRESHOLD -->|"below threshold"| GATE_PASS
        GATE_THRESHOLD -->|"≥ threshold"| GATE_DENY
    end

    %% Cross-subgraph connections %%
    STRUCT_CONTRACT -->|"interval guarantees\ncache never expires\nduring normal operation"| CACHE_EXPIRING
    TASK_RUNNING -->|"★ loop writes every\ncache_refresh_interval"| CACHE_FRESH
    CACHE_FRESH -->|"read by subprocess"| GATE_AGE

    %% CLASS ASSIGNMENTS %%
    class CFG_MAX_AGE,CFG_INTERVAL detector;
    class STRUCT_CONTRACT newComponent;
    class TASK_NULL_START,TASK_NULL_END phase;
    class TASK_RUNNING newComponent;
    class CACHE_ABSENT detector;
    class CACHE_FRESH output;
    class CACHE_EXPIRING gap;
    class CACHE_EXPIRED detector;
    class GATE_AGE,GATE_THRESHOLD stateNode;
    class GATE_PASS output;
    class GATE_DENY handler;
    class GATE_MISS cli;
```

**Legend:** ★ New component · ● Modified component

### Concurrency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    CACHE_FILE[("autoskillit_quota_cache.json<br/>━━━━━━━━━━<br/>Shared via atomic_write<br/>Cross-process IPC boundary")]

    subgraph EventLoop ["MCP Server — asyncio Event Loop (single-threaded cooperative)"]
        direction TB

        subgraph KitchenLifecycle ["● Kitchen Lifecycle — Coroutines"]
            direction LR
            OPEN["● _open_kitchen_handler()<br/>━━━━━━━━━━<br/>Runs in event loop as coroutine"]
            CLOSE["● _close_kitchen_handler()<br/>━━━━━━━━━━<br/>Runs in event loop as coroutine"]
        end

        subgraph BackgroundTask ["★ Kitchen-Scoped Background Task (asyncio.Task, caller-owned)"]
            direction TB
            CREATE_TASK["★ create_background_task(<br/>_quota_refresh_loop(config)<br/>)<br/>━━━━━━━━━━<br/>asyncio.create_task() — no supervision wrapper<br/>handle stored in ctx.quota_refresh_task"]
            LOOP_CORO["★ _quota_refresh_loop coroutine<br/>━━━━━━━━━━<br/>while True:<br/>  await asyncio.sleep(cache_refresh_interval)<br/>  await _refresh_quota_cache(config)<br/>CancelledError propagates from sleep → exits"]
            CANCEL["● close_kitchen:<br/>ctx.quota_refresh_task.cancel()<br/>━━━━━━━━━━<br/>Delivers CancelledError at\nnext await asyncio.sleep()\nctx.quota_refresh_task = None"]
        end

        subgraph SupervisedTasks ["● DefaultBackgroundSupervisor — Fire-and-Forget (supervised)"]
            direction TB
            SUBMIT["● background.submit(<br/>★ _refresh_quota_cache(config),<br/>label='quota_post_run_refresh'<br/>)<br/>━━━━━━━━━━<br/>Called after each run_skill completes\nReturns immediately (fire-and-forget)"]
            SUPERVISE["_supervise_task wrapper<br/>━━━━━━━━━━<br/>Catches Exception → logs + audits\nCancelledError → write_status('cancelled')"]
        end

        subgraph RunSkillHandler ["● run_skill MCP Tool Handler"]
            direction TB
            EXECUTOR["executor.run(skill_command)<br/>━━━━━━━━━━<br/>await headless session"]
            POST["● Post-completion hook:<br/>background.submit(_refresh_quota_cache)<br/>━━━━━━━━━━<br/>Fire-and-forget refresh triggered<br/>after EVERY run_skill"]
        end
    end

    subgraph HookSubprocess ["quota_check.py — OS Subprocess (separate process, stdlib-only)"]
        direction TB
        HOOK_READ["_read_quota_cache(<br/>max_age=cache_max_age<br/>)<br/>━━━━━━━━━━<br/>Read-only access to cache file\nNo Python object sharing\nNo asyncio — blocking I/O"]
        HOOK_DECIDE{"cache fresh?"}
        HOOK_APPROVE["sys.exit(0) → approve"]
        HOOK_MISS["sys.exit(0) → fail-open"]
    end

    %% FLOW %%
    OPEN --> CREATE_TASK
    CREATE_TASK --> LOOP_CORO
    LOOP_CORO -->|"every cache_refresh_interval\nawait asyncio.sleep() yields to event loop"| LOOP_CORO
    LOOP_CORO -->|"★ unconditional fetch+write"| CACHE_FILE

    EXECUTOR --> POST
    POST --> SUBMIT
    SUBMIT --> SUPERVISE
    SUPERVISE -->|"★ _refresh_quota_cache writes"| CACHE_FILE

    CLOSE --> CANCEL
    CANCEL -.->|"CancelledError at await asyncio.sleep()"| LOOP_CORO

    CACHE_FILE -->|"cross-process read\natomic_write ensures no torn reads"| HOOK_READ
    HOOK_READ --> HOOK_DECIDE
    HOOK_DECIDE -->|"fresh"| HOOK_APPROVE
    HOOK_DECIDE -->|"stale"| HOOK_MISS

    %% CLASS ASSIGNMENTS %%
    class CACHE_FILE stateNode;
    class OPEN,CLOSE handler;
    class CREATE_TASK newComponent;
    class LOOP_CORO newComponent;
    class CANCEL handler;
    class SUBMIT,POST newComponent;
    class SUPERVISE phase;
    class EXECUTOR handler;
    class HOOK_READ integration;
    class HOOK_DECIDE detector;
    class HOOK_APPROVE output;
    class HOOK_MISS phase;
```

**Legend:** ★ New component · ● Modified component

Closes #638

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260405-212016-751478/.autoskillit/temp/rectify/rectify_quota_guard_cache_refresh_2026-04-06_045000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 19 | 9.8k | 265.8k | 42.1k | 1 | 6m 47s |
| rectify | 21 | 29.3k | 494.6k | 55.9k | 1 | 10m 19s |
| dry_walkthrough | 2.1k | 13.9k | 1.1M | 81.2k | 1 | 4m 5s |
| implement | 81 | 29.5k | 5.6M | 89.3k | 1 | 10m 11s |
| assess | 67 | 39.4k | 4.7M | 93.7k | 1 | 16m 8s |
| open_pr | 3.0k | 31.2k | 2.2M | 101.1k | 1 | 9m 56s |
| **Total** | 5.3k | 153.2k | 14.4M | 463.2k | | 57m 30s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions Bot and others added 28 commits April 14, 2026 17:15
…nv (#921)

## Summary

Single-commit fix for a v0.8.31 regression where `run_cmd` stripped the
entire parent process environment when `step_name` was set. The `_env`
dict passed to `_run_subprocess` contained only `SCENARIO_STEP_NAME`,
replacing all inherited environment variables (`HOME`, `PATH`, etc.).
This broke `gh` CLI authentication and any tool depending on standard
environment variables.

The fix merges `SCENARIO_STEP_NAME` into `os.environ` instead of
replacing it. Two test additions guard against regression: a unit test
verifying `PATH`/`HOME` preservation in the child env, and an AST
structural guard ensuring all `_run_subprocess(env=...)` calls in
`server/` start from `os.environ`.

Closes #915

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 48 | 6.6k | 511.0k | 53.7k | 1 | 4m 2s |
| rectify | 5.2k | 9.8k | 591.0k | 59.7k | 1 | 5m 14s |
| dry_walkthrough | 398 | 10.9k | 1.3M | 70.5k | 1 | 4m 41s |
| implement | 66 | 7.3k | 896.0k | 61.3k | 1 | 3m 31s |
| prepare_pr | 25 | 3.7k | 166.6k | 19.8k | 1 | 1m 25s |
| run_arch_lenses | 42 | 4.8k | 472.2k | 35.6k | 1 | 3m 38s |
| compose_pr | 23 | 1.6k | 129.5k | 14.0k | 1 | 41s |
| **Total** | 5.8k | 44.6k | 4.0M | 314.6k | | 23m 14s |

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ypes (#922)

## Summary

`build_interactive_cmd()` is a pure pass-through for environment
variables — it forwards caller-supplied `env_extras` to
`build_claude_env` without injecting any defaults. This is structurally
asymmetric with `build_full_headless_cmd()`, which internally injects
`MAX_MCP_OUTPUT_TOKENS=50000` and `AUTOSKILLIT_HEADLESS=1` before
delegating. The asymmetry means interactive launch paths depend on
caller discipline to inject required vars, and when
`_launch_cook_session` (the `order` command's launch path) was never
updated by PR #910, the gap went undetected.

The fix adds a `_SESSION_BASELINE_ENV` frozen mapping in `commands.py`
that `build_interactive_cmd` and `build_headless_resume_cmd` merge as
defaults. Removes redundant caller-side injection from `_cook.py`. Adds
structural guard tests covering all three session builders.

Closes #916

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260414-084428-773530/.autoskillit/temp/rectify/rectify_max_mcp_output_tokens_interactive_gap_2026-04-14_090500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 2.6k | 8.3k | 628.8k | 48.1k | 1 | 4m 55s |
| rectify | 4.1k | 26.1k | 1.8M | 161.5k | 3 | 12m 42s |
| dry_walkthrough | 3.2k | 10.1k | 1.2M | 78.6k | 1 | 4m 39s |
| implement | 77 | 10.5k | 1.6M | 54.7k | 1 | 4m 19s |
| prepare_pr | 30 | 4.5k | 374.0k | 34.2k | 1 | 1m 34s |
| run_arch_lenses | 49 | 8.2k | 538.5k | 47.7k | 1 | 2m 52s |
| compose_pr | 23 | 1.5k | 128.6k | 15.6k | 1 | 39s |
| **Total** | 10.1k | 69.2k | 6.3M | 440.6k | | 31m 42s |

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Every `autoskillit order` session fails its first `open_kitchen` call
due to two independent architectural weaknesses: (1) FastMCP emits
`outputSchema` and `annotations` fields in `tools/list` responses that
trigger Claude Code bug #25081, silently dropping ALL tools — even after
the MCP handshake completes; (2) the cook session auto-submits an
initial message as a positional CLI arg, racing the LLM's first tool
call against MCP tool discovery. The fix introduces a centralized
wire-format sanitization middleware in the FastMCP pipeline, a
wire-format compliance test that prevents regression from dependency
upgrades, and prompt restructuring to eliminate the unconditional "MUST
be first" tool-call directive.
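The core transform of the sanitization middleware can be sketched as a pure function over the `tools/list` result. This shows only the field-stripping step, not the FastMCP middleware wiring; the function name is illustrative.

```python
# Fields whose presence in tools/list entries triggers the client-side bug.
_STRIPPED_FIELDS = ("outputSchema", "annotations")

def sanitize_tools_list(result: dict) -> dict:
    """Return a copy of a tools/list result with the offending fields removed."""
    tools = [
        {k: v for k, v in tool.items() if k not in _STRIPPED_FIELDS}
        for tool in result.get("tools", [])
    ]
    return {**result, "tools": tools}
```

Centralizing the strip in one middleware (plus a wire-format compliance test) means a future FastMCP upgrade that reintroduces the fields fails CI instead of silently dropping every tool again.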

Closes #913

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260414-080347-642271/.autoskillit/temp/rectify/rectify_mcp_init_race_wire_format_2026-04-14_153000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
… Timeout (#918)

## Summary

`DefaultMergeQueueWatcher.wait()` used a single exception class
(`ClassifierInconclusive`) as a control-flow signal for two semantically
incompatible situations: (1) CI checks are still running — an *expected
transient* state that should poll until the outer `timeout_seconds`
deadline — and (2) no positive classification signal matched at all — a
*genuinely ambiguous* state that warrants a bounded retry ceiling before
giving up. Both raised the same exception and incremented the same
`inconclusive_count` counter, so CI that took longer than
`max_inconclusive_retries × poll_interval` (default: 75 seconds)
received a `timeout` result before the outer deadline was reached.

The fix is a **type-level distinction**: `ClassifierInconclusive` is
split into `CIStillRunning` (transient — no budget consumed) and
`NoPositiveSignal` (ambiguous — bounded budget), caught separately in
`wait()`. Additionally, `inconclusive_count` resets on queue re-entry so
budget from a prior exit episode does not bleed into subsequent ones.
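The type-level split can be sketched as two exception classes caught separately in the polling loop. The exception names come from the summary; the loop body and parameters are illustrative.

```python
import time

class CIStillRunning(Exception):
    """Transient: checks still executing — consumes no retry budget."""

class NoPositiveSignal(Exception):
    """Ambiguous: no classification matched — consumes bounded budget."""

def wait(classify, deadline: float, max_inconclusive: int, poll_interval: float = 0.0):
    inconclusive = 0  # reset per wait() call, so budget never bleeds between episodes
    while time.monotonic() < deadline:
        try:
            return classify()
        except CIStillRunning:
            pass  # keep polling until the outer deadline
        except NoPositiveSignal:
            inconclusive += 1
            if inconclusive >= max_inconclusive:
                return "inconclusive"
        time.sleep(poll_interval)
    return "timeout"
```

Under the old single-exception design, slow-but-healthy CI exhausted the same counter as genuine ambiguity; here only `NoPositiveSignal` touches it.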

Closes #911

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260414-074614-472907/.autoskillit/temp/rectify/rectify_merge_queue_inconclusive_budget_2026-04-14_083000_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 120 | 8.4k | 389.7k | 40.7k | 1 | 5m 44s |
| rectify | 5.1k | 24.2k | 1.0M | 70.4k | 1 | 9m 13s |
| review | 98 | 8.2k | 346.5k | 39.7k | 1 | 2m 38s |
| dry_walkthrough | 308 | 28.1k | 1.7M | 105.3k | 2 | 8m 31s |
| implement | 492 | 21.1k | 2.6M | 94.0k | 2 | 6m 37s |
| assess | 226 | 8.5k | 1.2M | 42.8k | 1 | 10m 40s |
| prepare_pr | 52 | 5.6k | 145.0k | 25.7k | 1 | 1m 23s |
| run_arch_lenses | 261 | 24.6k | 1.1M | 163.1k | 3 | 7m 6s |
| compose_pr | 51 | 3.4k | 134.0k | 16.2k | 1 | 1m 5s |
| **Total** | 6.7k | 132.1k | 8.7M | 598.0k | | 53m 0s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

Adds a new `disable_quota_guard` MCP tool that allows order sessions to
bypass quota guard enforcement when needed. The quota guard hook
(`quota_guard.py`) is updated to respect the new disabled state, and
`_hook_settings.py` is extended to expose the disable flag. Kitchen
tooling, CLI cook path, and type constants are updated to register and
expose the new tool.

Closes #919

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| review_approach | 158 | 12.4k | 1.1M | 98.1k | 1 | 4m 53s |
| make_plan | 195 | 14.2k | 1.4M | 83.3k | 1 | 4m 22s |
| dry_walkthrough | 2.3k | 13.0k | 935.8k | 77.9k | 1 | 3m 52s |
| implement | 570 | 25.7k | 4.6M | 81.3k | 1 | 8m 37s |
| resolve_failures | 226 | 10.6k | 1.4M | 52.7k | 1 | 11m 17s |
| prepare_pr | 101 | 5.3k | 174.6k | 28.2k | 1 | 1m 27s |
| run_arch_lenses | 157 | 13.7k | 597.9k | 43.2k | 1 | 9m 22s |
| compose_pr | 67 | 1.8k | 178.2k | 14.4k | 1 | 43s |
| **Total** | 3.8k | 96.7k | 10.3M | 479.1k | | 44m 35s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

Four discrete bugs share a single architectural root: there is no
canonical owner for `installed_plugins.json`. Every caller that touches
the file re-implements its own structural interpretation of the JSON
format, and two of those implementations got the nesting wrong
(`{"plugins": {...}}` vs flat `{}`). The test fixtures matched the wrong
implementations, so the regression passed CI.

The architectural fix is a **typed repository class** —
`InstalledPluginsFile` — that becomes the single authorized interface
for all reads and writes of `installed_plugins.json`. Once all call
sites are routed through this class, the correct nesting is in exactly
one place. A wrong structure cannot be silently introduced by a new call
site: the class's API enforces the contract.
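A minimal sketch of what such a repository class could look like follows. The method names match the diagram below, but the file layout shown (flat `{ref: entry}`) and the implementation details are assumptions; the point is that whichever nesting is canonical lives in exactly one class.

```python
import json
from pathlib import Path

class InstalledPluginsFile:
    """Single authorized interface for installed_plugins.json."""

    def __init__(self, path: Path):
        self._path = path

    def get_plugins(self) -> dict:
        if not self._path.exists():
            return {}
        # The canonical structure is decoded in exactly one place.
        return json.loads(self._path.read_text())

    def contains(self, ref: str) -> bool:
        return ref in self.get_plugins()

    def remove(self, ref: str) -> None:
        plugins = self.get_plugins()
        plugins.pop(ref, None)
        self._path.write_text(json.dumps(plugins, indent=2))
```

Any new call site must go through this API, so it cannot reintroduce a divergent interpretation of the file format.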

Two companion fixes complete the immunity:
- `install()` must return a typed value distinguishing "complete" from
"deferred (CLAUDECODE)" so callers cannot proceed with misleading
success output.
- `find_broken_hook_scripts` must apply the same `_is_own_hook`
ownership filter that `_extract_script_basenames` already uses,
eliminating false positives on user-defined inline hooks.

Part B (separate task) will cover `ensure_project_temp` call site
completeness across CLI entry points.
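The "typed value" companion fix can be sketched with an enum — names here are hypothetical; the source only says `install()` returns a value distinguishing complete from deferred:

```python
import enum

class InstallResult(enum.Enum):
    COMPLETE = "complete"
    DEFERRED = "deferred"   # CLAUDECODE was set: running inside a session
    FAILED = "failed"

def install(env: dict) -> InstallResult:
    # Callers branch on the result instead of assuming success, so a
    # deferred install can no longer print misleading success output.
    if env.get("CLAUDECODE"):
        return InstallResult.DEFERRED
    # ... real registration steps elided ...
    return InstallResult.COMPLETE
```

The enum forces the caller to handle the deferred branch explicitly; a bare `bool` (or `None`) return would let "deferred" coerce to the wrong truthiness.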

## Architecture Impact

### Repository/Data Access Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph Callers ["CALLERS"]
        direction TB
        APP["● app.py<br/>━━━━━━━━━━<br/>CLI entry: install,<br/>init, doctor, upgrade"]
        MARKET["● _marketplace.py<br/>━━━━━━━━━━<br/>install() / upgrade()<br/>_clear_plugin_cache()"]
        DOCTOR["● _doctor.py<br/>━━━━━━━━━━<br/>_check_installed_plugins_entry()<br/>_check_hook_registry_drift()<br/>_check_hook_health()"]
    end

    subgraph PluginRepo ["★ PLUGIN REGISTRY (new)"]
        direction TB
        IPF["★ InstalledPluginsFile<br/>━━━━━━━━━━<br/>get_plugins() → read<br/>contains(ref) → read<br/>remove(ref) → read+write<br/>cli/_installed_plugins.py"]
    end

    subgraph HookRepo ["● HOOK REGISTRY (modified)"]
        direction TB
        HREG["● HOOK_REGISTRY<br/>━━━━━━━━━━<br/>List[HookDef] — canonical<br/>source of truth<br/>hook_registry.py"]
        GEN["● generate_hooks_json()<br/>━━━━━━━━━━<br/>HookDef → JSON structure<br/>hook_registry.py"]
        DRIFT["● _count_hook_registry_drift()<br/>━━━━━━━━━━<br/>canonical vs deployed<br/>hook_registry.py"]
        FIND["find_broken_hook_scripts()<br/>━━━━━━━━━━<br/>deployed scripts<br/>existence check"]
        EXTRACT["_extract_script_basenames()<br/>━━━━━━━━━━<br/>settings.json hooks<br/>→ basename set"]
    end

    subgraph IOLayer ["I/O PRIMITIVE"]
        AW["atomic_write()<br/>━━━━━━━━━━<br/>fsync + os.replace<br/>core/io.py"]
    end

    subgraph Storage ["STORAGE"]
        direction TB
        INSTJSON[("installed_plugins.json<br/>━━━━━━━━━━<br/>~/.claude/plugins/<br/>installed_plugins.json")]
        HOOKSJSON[("hooks.json<br/>━━━━━━━━━━<br/>hooks/hooks.json<br/>plugin manifest")]
        SETTINGS[("settings.json<br/>━━━━━━━━━━<br/>~/.claude/settings.json<br/>per-scope hook registration")]
        CLAUDEJSON[("~/.claude.json<br/>━━━━━━━━━━<br/>Legacy mcpServers<br/>config")]
        CACHE[("Plugin Cache Dir<br/>━━━━━━━━━━<br/>~/.claude/plugins/<br/>cache/autoskillit-local/")]
    end

    %% Caller → Repository relationships %%
    APP -->|"invokes"| MARKET
    APP -->|"invokes"| DOCTOR

    MARKET -->|"reads/writes"| IPF
    MARKET -->|"reads"| HREG
    MARKET -->|"writes hooks.json"| AW

    DOCTOR -->|"reads"| IPF
    DOCTOR -->|"reads"| DRIFT
    DOCTOR -->|"reads"| FIND
    DOCTOR -->|"reads"| SETTINGS

    %% Hook Registry internal %%
    HREG -->|"drives"| GEN
    HREG -->|"drives"| DRIFT
    GEN -->|"writes via"| AW
    DRIFT -->|"reads"| EXTRACT
    EXTRACT -->|"parses"| SETTINGS
    FIND -->|"parses"| SETTINGS

    %% Repository → Storage %%
    IPF -->|"reads"| INSTJSON
    IPF -->|"writes via atomic_write"| AW
    AW -->|"writes"| INSTJSON
    AW -->|"writes"| HOOKSJSON

    %% Cache eviction (marketplace) %%
    MARKET -->|"deletes"| CACHE

    %% Doctor reads legacy config %%
    DOCTOR -->|"reads"| CLAUDEJSON

    %% CLASS ASSIGNMENTS %%
    class APP,MARKET cli;
    class DOCTOR detector;
    class IPF newComponent;
    class HREG,GEN stateNode;
    class DRIFT,FIND,EXTRACT phase;
    class AW handler;
    class INSTJSON,HOOKSJSON,SETTINGS,CLAUDEJSON,CACHE integration;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 42, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    INSTALL_START([autoskillit install])
    INSTALL_DEFERRED([DEFERRED<br/>━━━━━━━━━━<br/>user must run manually])
    INSTALL_COMPLETE([INSTALL COMPLETE])
    DOCTOR_START([autoskillit doctor])
    DOCTOR_END([DOCTOR COMPLETE])

    %% ─────────────────────────────────────────────
    %%  INSTALL FLOW
    %% ─────────────────────────────────────────────

    subgraph InstallPreflight ["Install — Preflight"]
        direction TB
        ValidateScope{"● scope valid?<br/>━━━━━━━━━━<br/>user / project / local"}
        EnsureMarketplace["_ensure_marketplace()<br/>━━━━━━━━━━<br/>mkdir ~/.autoskillit/marketplace<br/>write marketplace.json<br/>symlink → pkg_root()"]
        GuardCLAUDECODE{"CLAUDECODE env set?<br/>━━━━━━━━━━<br/>running inside session?"}
        GuardClaude{"claude on PATH?<br/>━━━━━━━━━━<br/>shutil.which('claude')"}
        EnsureWorkspace["_ensure_workspace_ready()<br/>━━━━━━━━━━<br/>ensure_project_temp()<br/>upgrade() if scripts/ present"]
    end

    subgraph CacheClear ["Install — Cache Clearing (●)"]
        direction TB
        RmCacheDir["shutil.rmtree(cache_dir)<br/>━━━━━━━━━━<br/>~/.claude/plugins/cache/<br/>autoskillit-local/autoskillit"]
        InstalledPluginsRemove["★ InstalledPluginsFile.remove()<br/>━━━━━━━━━━<br/>reads installed_plugins.json<br/>deletes 'autoskillit@autoskillit-local'<br/>atomic_write() back"]
        HooksJsonRegen["● generate_hooks_json()<br/>━━━━━━━━━━<br/>iterates HOOK_REGISTRY<br/>writes hooks/hooks.json<br/>embeds HOOK_REGISTRY_HASH"]
    end

    subgraph PluginRegistration ["Install — Plugin Registration"]
        direction TB
        MarketplaceAdd["claude plugin marketplace add<br/>━━━━━━━━━━<br/>subprocess, timeout=30s"]
        PluginInstall["claude plugin install<br/>━━━━━━━━━━<br/>plugin_ref @ scope"]
        EvictDirect["evict_direct_mcp_entry()<br/>━━━━━━━━━━<br/>remove stale direct entry<br/>from ~/.claude.json"]
    end

    subgraph HookSync ["Install — Hook Sync (●)"]
        direction TB
        SweepAllScopes["sweep_all_scopes_for_orphans()<br/>━━━━━━━━━━<br/>iterate user + project + local<br/>remove orphaned hook entries"]
        SyncHooks["sync_hooks_to_settings()<br/>━━━━━━━━━━<br/>write canonical hooks<br/>to target scope settings.json"]
    end

    %% ─────────────────────────────────────────────
    %%  SHARED STATE
    %% ─────────────────────────────────────────────

    subgraph SharedState ["★ InstalledPluginsFile (_installed_plugins.py)"]
        direction LR
        IPF_Read["_read()<br/>━━━━━━━━━━<br/>json.loads(path)<br/>→ {} on missing/error"]
        IPF_Contains["contains(plugin_ref)<br/>━━━━━━━━━━<br/>plugin_ref in get_plugins()"]
        IPF_Remove["remove(plugin_ref)<br/>━━━━━━━━━━<br/>del plugins[ref]<br/>atomic_write() back"]
        InstalledPluginsJSON[("installed_plugins.json<br/>━━━━━━━━━━<br/>~/.claude/plugins/<br/>{version,plugins:{}}")]
    end

    %% ─────────────────────────────────────────────
    %%  DOCTOR FLOW
    %% ─────────────────────────────────────────────

    subgraph DoctorMCP ["Doctor — MCP Checks"]
        direction TB
        ChkStaleMCP["check: stale_mcp_servers<br/>━━━━━━━━━━<br/>dead binary paths in ~/.claude.json"]
        ChkMCPReg["check: mcp_server_registered<br/>━━━━━━━━━━<br/>mcpServers entry OR<br/>claude plugin list"]
        ChkDualReg["● check: dual_mcp_registration<br/>━━━━━━━━━━<br/>direct entry AND plugin both present?<br/>→ WARNING: split-brain"]
        ChkPluginCache["check: plugin_cache_exists<br/>━━━━━━━━━━<br/>~/.claude/plugins/cache/... dir?"]
        ChkInstalledPlugins["● check: installed_plugins_entry<br/>━━━━━━━━━━<br/>InstalledPluginsFile.contains()<br/>'autoskillit@autoskillit-local'?"]
    end

    subgraph DoctorHooks ["Doctor — Hook Checks (●)"]
        direction TB
        ChkHookHealth["check: hook_health<br/>━━━━━━━━━━<br/>find_broken_hook_scripts()<br/>script files exist on disk?"]
        ChkHookReg["check: hook_registration<br/>━━━━━━━━━━<br/>canonical_script_basenames()<br/>all scripts in settings.json?"]
        ChkHookDrift["● check: hook_registry_drift<br/>━━━━━━━━━━<br/>_count_hook_registry_drift()<br/>orphaned | missing hooks<br/>iterates all scopes"]
    end

    subgraph DoctorHealth ["Doctor — Health Checks"]
        direction TB
        ChkVersion["check: version_consistency<br/>━━━━━━━━━━<br/>pkg version == plugin.json version?"]
        ChkConfig["check: project_config<br/>━━━━━━━━━━<br/>.autoskillit/config.yaml exists?"]
        ChkGitignore["check: gitignore_completeness<br/>━━━━━━━━━━<br/>.autoskillit/ entries covered?"]
        ChkQuota["check: quota_cache_schema<br/>━━━━━━━━━━<br/>schema_version drift?"]
        ChkInstallClass["check: install_classification<br/>━━━━━━━━━━<br/>detect_install() → InstallType"]
        ChkSourceDrift["check: source_version_drift<br/>━━━━━━━━━━<br/>commit SHA vs reference SHA<br/>(network + disk-cache TTL)"]
        ChkEditableInst["check: editable_install_source_exists<br/>━━━━━━━━━━<br/>direct_url.json → source path exists?"]
        ChkDismissal["check: update_dismissal_state<br/>━━━━━━━━━━<br/>dismissed_at + window expiry"]
    end

    %% ─────────────────────────────────────────────
    %%  INSTALL EDGES
    %% ─────────────────────────────────────────────
    INSTALL_START --> ValidateScope
    ValidateScope -->|"invalid"| INSTALL_COMPLETE
    ValidateScope -->|"valid"| EnsureMarketplace
    EnsureMarketplace --> GuardCLAUDECODE
    GuardCLAUDECODE -->|"yes — inside session"| INSTALL_DEFERRED
    GuardCLAUDECODE -->|"no"| GuardClaude
    GuardClaude -->|"missing"| INSTALL_COMPLETE
    GuardClaude -->|"found"| EnsureWorkspace
    EnsureWorkspace --> RmCacheDir
    RmCacheDir --> InstalledPluginsRemove
    InstalledPluginsRemove -->|"reads/writes"| InstalledPluginsJSON
    InstalledPluginsRemove --> HooksJsonRegen
    HooksJsonRegen --> MarketplaceAdd
    MarketplaceAdd -->|"returncode != 0"| INSTALL_COMPLETE
    MarketplaceAdd -->|"ok"| PluginInstall
    PluginInstall -->|"returncode != 0"| INSTALL_COMPLETE
    PluginInstall -->|"ok"| EvictDirect
    EvictDirect --> SweepAllScopes
    SweepAllScopes --> SyncHooks
    SyncHooks --> INSTALL_COMPLETE

    %% ─────────────────────────────────────────────
    %%  DOCTOR EDGES
    %% ─────────────────────────────────────────────
    DOCTOR_START --> ChkStaleMCP
    ChkStaleMCP --> ChkMCPReg
    ChkMCPReg --> ChkDualReg
    ChkDualReg --> ChkPluginCache
    ChkPluginCache --> ChkInstalledPlugins
    ChkInstalledPlugins -->|"reads"| IPF_Contains
    IPF_Contains -->|"reads"| IPF_Read
    IPF_Read -->|"reads"| InstalledPluginsJSON
    ChkInstalledPlugins --> ChkHookHealth
    ChkHookHealth --> ChkHookReg
    ChkHookReg --> ChkHookDrift
    ChkHookDrift --> ChkVersion
    ChkVersion --> ChkConfig
    ChkConfig --> ChkGitignore
    ChkGitignore --> ChkQuota
    ChkQuota --> ChkInstallClass
    ChkInstallClass --> ChkSourceDrift
    ChkSourceDrift --> ChkEditableInst
    ChkEditableInst --> ChkDismissal
    ChkDismissal --> DOCTOR_END

    %% ─────────────────────────────────────────────
    %%  SHARED STATE INTERNAL EDGES
    %% ─────────────────────────────────────────────
    IPF_Remove -->|"atomic_write()"| InstalledPluginsJSON

    %% ─────────────────────────────────────────────
    %%  CLASS ASSIGNMENTS
    %% ─────────────────────────────────────────────
    class INSTALL_START,INSTALL_DEFERRED,INSTALL_COMPLETE,DOCTOR_START,DOCTOR_END terminal;
    class ValidateScope,GuardCLAUDECODE,GuardClaude stateNode;
    class EnsureMarketplace,EnsureWorkspace,HooksJsonRegen,MarketplaceAdd,PluginInstall,EvictDirect handler;
    class RmCacheDir,SweepAllScopes,SyncHooks phase;
    class InstalledPluginsRemove,IPF_Read,IPF_Contains,IPF_Remove newComponent;
    class InstalledPluginsJSON output;
    class ChkStaleMCP,ChkMCPReg,ChkDualReg,ChkPluginCache,ChkInstalledPlugins detector;
    class ChkHookHealth,ChkHookReg,ChkHookDrift detector;
    class ChkVersion,ChkConfig,ChkGitignore,ChkQuota,ChkInstallClass,ChkSourceDrift,ChkEditableInst,ChkDismissal phase;
```

Closes #912

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260414-075239-201688/.autoskillit/temp/rectify/rectify_doctor-install-disconnect_2026-04-14_120000_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| review_approach | 158 | 12.4k | 1.1M | 98.1k | 1 | 4m 53s |
| make_plan | 195 | 14.2k | 1.4M | 83.3k | 1 | 4m 22s |
| dry_walkthrough | 2.3k | 13.0k | 935.8k | 77.9k | 1 | 3m 52s |
| implement | 570 | 25.7k | 4.6M | 81.3k | 1 | 8m 37s |
| resolve_failures | 226 | 10.6k | 1.4M | 52.7k | 1 | 11m 17s |
| prepare_pr | 101 | 5.3k | 174.6k | 28.2k | 1 | 1m 27s |
| run_arch_lenses | 157 | 13.7k | 597.9k | 43.2k | 1 | 9m 22s |
| compose_pr | 67 | 1.8k | 178.2k | 14.4k | 1 | 43s |
| review_pr | 94 | 21.1k | 374.8k | 48.0k | 1 | 6m 6s |
| **Total** | 3.9k | 117.8k | 10.7M | 527.1k | | 50m 42s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep integration version 0.8.38 (ahead of main's 0.7.0) across
pyproject.toml, plugin.json, and uv.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace direct dict access with entry.get("dismissed_at") and explicit
None check so that a missing key returns False immediately instead of
falling through to the broad except Exception handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
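The pattern above — `dict.get` plus an explicit `None` check instead of letting a `KeyError` fall into a broad handler — can be sketched as follows; the helper name and window logic are invented stand-ins:

```python
import time

def within_dismissal_window(dismissed_at: float, window_s: float = 86400.0) -> bool:
    # Stand-in for the real window-expiry check.
    return (time.time() - dismissed_at) < window_s

def is_dismissed(entry: dict) -> bool:
    # A missing key returns False immediately and deliberately,
    # instead of raising KeyError into a broad `except Exception`
    # that would hide the real cause.
    dismissed_at = entry.get("dismissed_at")
    if dismissed_at is None:
        return False
    return within_dismissal_window(dismissed_at)
```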
The test_check tool handler returns {passed, stdout, stderr} but the
pretty-output formatter was reading the legacy "output" key, silently
receiving an empty string. Update the formatter to read stdout/stderr,
matching the _fmt_run_cmd pattern, and update test payloads accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users with the old quota_guard.threshold key in config.yaml get a
ConfigSchemaError at startup with a difflib hint, but no migration
note existed to guide the transition to the dual-window thresholds
(short_window_threshold / long_window_threshold).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
evict_direct_mcp_entry was called unconditionally in main(), including
on the 'serve' path (MCP server startup). This caused needless reads
of ~/.claude.json on every MCP server spawn. Move it inside the
existing non-serve guard so it only runs on interactive CLI commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_count_hook_registry_drift and _check_hook_health were imported from
_doctor (which itself imports from hook_registry) and re-exported in
__all__ despite no consumers using the autoskillit.cli path. Remove
the unnecessary re-export chain and the private names from __all__.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- _lifespan.py: init bg_tasks=[] before try to prevent NameError in
  finally if imports fail
- tools_ci.py: record step timing before early return when ci_watcher
  or merge_queue_watcher is None
- merge_queue.py: remove dead self._max_inconclusive_retries storage
  and fix _text_has_push_trigger docstring precision claim
- remote_resolver.py: await cancelled io_task after proc.kill() to
  collect CancelledError
- rules_inputs.py: rename rule to hyphenated convention
  (research-output-mode-enum) and use word-boundary-aware regex for
  recommended input check
- _fmt_primitives.py: remove dead _read_hook_config() function
- settings.py: extract _EXIT_GRACE_BUFFER_MS constant with comment
  explaining unit conversion and safety margin
- clone.py: save loop result to avoid redundant _probe_single_remote
  subprocess call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
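The `_lifespan.py` fix follows a general pattern: bind the collection before the `try` so the `finally` clause never sees an unbound name, even if setup raises on its first line. A minimal synchronous sketch (the real code is an async lifespan):

```python
import contextlib

@contextlib.contextmanager
def lifespan():
    # bg_tasks is bound BEFORE the try: if startup inside the try
    # raises, the finally block still sees a defined name instead of
    # a NameError masking the original exception.
    bg_tasks = []
    try:
        bg_tasks.append("watcher")   # stand-in for real task startup
        yield bg_tasks
    finally:
        bg_tasks.clear()             # stand-in for task cancellation
```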
Replace 9 inline if/continue blocks in _detect_dead_outputs with a
consolidated _OBSERVABILITY_CAPTURES frozenset and a single
_is_observability_capture() guard function. The exemption policy is
now inspectable in one place and adding new entries is a one-line
table addition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- kitchen_state.py: add UnicodeDecodeError to sweep_stale_markers
  exception tuple to prevent corrupted binary files from aborting sweep
- _factory.py: defer _is_plugin_installed import to break server→cli
  module-level L3 sibling dependency
- clone_guard.py: skip git clean when git reset fails to preserve
  diagnostic evidence
- experiment_type_registry.py: add isinstance checks for dict-typed
  YAML fields before dict() coercion
- commands.py: document overlap between _HEADLESS_EXCLUSIVE_VARS and
  IDE_ENV_DENYLIST
- _hook_settings.py + _fmt_primitives.py: add cross-reference comments
  linking parallel hook config path definitions
- headless.py: remove redundant comment on _PATH_CAPTURE regex

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Gate rollback: disable gate on open_kitchen setup failure
- Wire compat: strip tool.title alongside output_schema/annotations
- Factory: use keyword arg for DefaultMergeQueueWatcher (consistency)
- tools_ci: standardize exc_info=True logging style
- process.py: make timeout_scope None guard consistent with unconditional access
- recording.py: cleanup TemporaryDirectory on scenario parse failure
- readiness.py: delegate to kitchen_state.get_state_dir() (remove duplication)
- io.py: remove dead _ROOT_GITIGNORE_ENTRIES constant
- contracts.py: promote _RESULT_CAPTURE_RE/_INPUT_REF_RE to public names
- _analysis.py: fix _SIMPLE_WHEN_RE to require paired quotes
- rules_merge.py: BFS multi-hop predecessor search for commit_guard
- rules_graph.py: reachability check for cycle exit targets that loop back
- _update_checks.py: fix stale docstring (network=False → network=True)
- _doctor.py: skip plugin cache check on editable dev installs
- open_kitchen_guard.py: surface marker write failure in hook output
- gate.py: add KillReason.NOT_APPLICABLE for gate/headless errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug fixes:
- anomaly_detection: fire ZOMBIE_PERSISTENT exactly once (== 3, not >= 3)
- tools_issue_lifecycle: add missing 'error' key in remove_label failure
- tools_git: return error when all branch name suffixes 2..100 exhausted
- rules_inputs: use 'ingredient:name' for step_name to avoid misleading output
- clone_registry: add AttributeError to except for non-dict JSON guard
- recording: fix resource leak when make_scenario_player() raises

Slop/docs:
- _lifespan: replace verbatim docstring with meaningful description
- _fmt_status: trim duplicate 3-line routing docstrings to one-liners
- test_server_tool_registration: fix stale "40 tools" count (now 44)

Tests:
- Remove duplicate test_no_anomalies_for_normal_session_still_holds
- Remove redundant atexit assertion from test_recording_runner_recorder_is_public

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
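The `== 3` anomaly fix is an edge-trigger vs level-trigger distinction: fire on the poll where the count reaches the threshold, not on every poll thereafter. A sketch (the surrounding detector is elided; the threshold comes from the bullet above):

```python
ZOMBIE_THRESHOLD = 3

def should_fire_zombie_persistent(consecutive_zombie_polls: int) -> bool:
    # `==` fires exactly once, on the crossing poll. With `>=` the
    # anomaly would re-fire on every subsequent poll while the
    # process stays zombied.
    return consecutive_zombie_polls == ZOMBIE_THRESHOLD
```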
- cli/_update_checks: _api_sha now tries refs/tags for tag revisions
- config/settings: annotate _EXIT_GRACE_BUFFER_MS as ClassVar[int]
- execution/_process_monitor: cache psutil.Process objects across calls
  so cpu_percent(interval=0) returns meaningful deltas
- hooks/_hook_settings: add ENV_DISABLED env-var override for disabled
- workspace/clone_registry: wrap open+flock in try/except in __enter__
  to prevent fd leak if flock() raises
- recipe/_analysis: extract_blocks accepts precomputed predecessors map
  to avoid duplicate computation; add warning logs for fallback
  entry/exit selection
- recipe/rules_fixing: use deque.popleft() instead of list.pop(0)
- recipe/rules_reachability: use ctx.predecessors in _ancestors();
  _find_capture_producers returns all producers
- recipe/rules_contracts: log warning on unreadable SKILL.md
- server/tools_kitchen: add gate.disable() on start_quota_refresh
  failure for consistency
- server/_factory: make recording ImportError degrade gracefully like
  replay path
- server/_wire_compat: use model_copy() instead of in-place mutation
  to avoid modifying shared FastMCP tool registry objects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update JSON write site allowlist line numbers for clone_registry and
  tools_kitchen after code changes shifted lines
- Wire compat middleware tests: use model_copy mock returns instead of
  in-place mutation expectations
- Process monitor tests: account for two-call priming pattern with
  cached psutil.Process objects; clear module cache between tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs)