
test #945

Open

Trecek wants to merge 355 commits into stable from main

Conversation

Collaborator

@Trecek commented Apr 15, 2026

No description provided.

Trecek and others added 30 commits April 5, 2026 03:54
…ipeline (#611)

## Summary

Every recipe (`implementation`, `remediation`, `implementation-groups`,
`merge-prs`) previously had an interactive `confirm_cleanup` prompt at
its terminal step. When `process-issues` drives batch processing, that
prompt halted the pipeline while waiting for user input. A
`defer_cleanup` flag was designed to bypass it, but it made "interrupt
the pipeline" the default and "don't interrupt" the opt-in.

The fix: remove the interactive cleanup path entirely from all recipes.
Every terminal step unconditionally calls `register_clone_status`
(success or failure), writing to a shared registry file. After all
issues in `process-issues` complete, a single `batch_cleanup_clones`
call deletes all success-status clones and preserves all error-status
clones. No prompts. No flags. No per-issue decisions.
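The register-then-sweep flow can be sketched as follows. The step names `register_clone_status`, `batch_cleanup_clones`, and the registry location come from this PR; the exact JSON shape (`{"entries": [...]}`) and function signatures are assumptions for illustration, not the actual implementation.

```python
import json
import shutil
from pathlib import Path

# Registry path from the PR; relative to the working directory here.
REGISTRY = Path(".autoskillit/temp/clone-cleanup-registry.json")

def register_clone_status(clone_path: str, status: str, step_name: str) -> None:
    """Append one entry per clone; entries are never mutated after being written."""
    entries = (
        json.loads(REGISTRY.read_text())["entries"] if REGISTRY.exists() else []
    )
    entries.append(
        {"clone_path": clone_path, "status": status, "step_name": step_name}
    )
    REGISTRY.parent.mkdir(parents=True, exist_ok=True)
    REGISTRY.write_text(json.dumps({"entries": entries}, indent=2))

def batch_cleanup_clones() -> tuple[list[str], list[str]]:
    """Read the registry once, delete success clones, preserve error clones."""
    entries = json.loads(REGISTRY.read_text())["entries"]
    deleted, preserved = [], []
    for entry in entries:
        if entry["status"] == "success":
            shutil.rmtree(entry["clone_path"], ignore_errors=True)
            deleted.append(entry["clone_path"])
        else:
            preserved.append(entry["clone_path"])
    return deleted, preserved
```

Because registration is unconditional at every terminal step, the single sweep at the end sees every clone exactly once, with no prompt and no flag anywhere in the path.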

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([● process-issues starts batch])

    subgraph PerIssue ["Per-Issue Recipe (× N issues)"]
        direction TB
        RECIPE["● Recipe Pipeline<br/>━━━━━━━━━━<br/>implementation / remediation<br/>implementation-groups / merge-prs<br/>plan → implement → test → push → PR → wait"]
        OUTCOME{"terminal<br/>outcome?"}
        REL_S["● release_issue_success<br/>━━━━━━━━━━<br/>release GitHub issue claim<br/>on_success/on_failure → register"]
        REL_F["● release_issue_failure<br/>━━━━━━━━━━<br/>release on error<br/>on_success/on_failure → register_failure"]
        REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='success'<br/>on_success/on_failure → done"]
        REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>register_clone_status<br/>status='error'<br/>on_success/on_failure → escalate_stop"]
        DONE["● done<br/>━━━━━━━━━━<br/>action: stop (success)"]
        FAIL["● escalate_stop<br/>━━━━━━━━━━<br/>action: stop (failure)"]
    end

    REGISTRY[("● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>.autoskillit/temp/<br/>accumulated entries")]

    subgraph PostBatch ["● After ALL Batches Complete (process-issues Step 3d)"]
        direction LR
        BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads registry<br/>deletes status=success clones<br/>preserves status=error clones<br/>no prompt, one call"]
        PRESERVED["preserved clones<br/>━━━━━━━━━━<br/>status=error kept<br/>for investigation"]
        DELETED["deleted clones<br/>━━━━━━━━━━<br/>status=success removed<br/>disk reclaimed"]
    end

    END_OK([COMPLETE])

    START --> RECIPE
    RECIPE --> OUTCOME
    OUTCOME -->|"success path"| REL_S
    OUTCOME -->|"failure path"| REL_F
    REL_S --> REG_S
    REL_F --> REG_F
    REG_S -->|"writes status=success"| REGISTRY
    REG_F -->|"writes status=error"| REGISTRY
    REG_S --> DONE
    REG_F --> FAIL
    DONE -->|"after all issues done"| BATCH
    FAIL -->|"after all issues done"| BATCH
    BATCH -->|"reads registry"| REGISTRY
    BATCH --> PRESERVED
    BATCH --> DELETED
    DELETED --> END_OK
    PRESERVED --> END_OK

    class START,END_OK terminal;
    class RECIPE handler;
    class OUTCOME stateNode;
    class REL_S,REL_F phase;
    class REG_S,REG_F,BATCH newComponent;
    class DONE phase;
    class FAIL detector;
    class REGISTRY stateNode;
    class PRESERVED,DELETED output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and end states |
| Orange | Handler | Recipe pipeline execution |
| Teal | State | Decision routing and registry storage |
| Purple | Phase | Control flow nodes (release, done) |
| Green | New/Modified | ● Modified steps (register, batch cleanup) |
| Red | Detector | Failure terminal (escalate_stop) |
| Dark Teal | Output | Clone disposition artifacts |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START([Pipeline Terminal Step])

    subgraph WritePath ["● WRITE: Recipe Terminal Registration (once per clone)"]
        direction LR
        REG_S["● register_clone_success<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='success'<br/>clone_path (immutable)"]
        REG_F["● register_clone_failure<br/>━━━━━━━━━━<br/>INIT_ONLY write<br/>status='error'<br/>clone_path (immutable)"]
    end

    subgraph Registry ["● Registry File — APPEND_ONLY during run"]
        direction TB
        ENTRY["● clone-cleanup-registry.json<br/>━━━━━━━━━━<br/>entries: [{clone_path, status,<br/>step_name, timestamp}]<br/>written N times (once per clone)<br/>never mutated after write"]
    end

    subgraph ReadPath ["● READ: Batch Cleanup (once, post-run)"]
        direction LR
        BATCH["● batch_cleanup_clones<br/>━━━━━━━━━━<br/>reads all entries<br/>partitions by status"]
        GATE{"status?"}
        DEL["delete clone dir<br/>━━━━━━━━━━<br/>status=success<br/>disk reclaimed"]
        KEEP["preserve clone dir<br/>━━━━━━━━━━<br/>status=error<br/>for investigation"]
    end

    subgraph Contracts ["Contract Cards (recipe input contracts)"]
        direction LR
        C1["★ contracts/implementation-groups.yaml<br/>━━━━━━━━━━<br/>NEW — no defer_cleanup<br/>no registry_path"]
        C2["● contracts/implementation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"]
        C3["● contracts/remediation.yaml<br/>━━━━━━━━━━<br/>updated — removed<br/>defer_cleanup, registry_path"]
        C4["● contracts/merge-prs.yaml<br/>━━━━━━━━━━<br/>updated — removed defer_cleanup<br/>registry_path, keep_clone_on_failure"]
    end

    ELIMINATED["ELIMINATED state<br/>━━━━━━━━━━<br/>defer_cleanup ingredient<br/>registry_path ingredient<br/>keep_clone_on_failure ingredient<br/>check_defer_cleanup step<br/>confirm_cleanup step"]

    END_OK([COMPLETE])

    START -->|"success terminal"| REG_S
    START -->|"failure terminal"| REG_F
    REG_S -->|"appends entry"| ENTRY
    REG_F -->|"appends entry"| ENTRY
    ENTRY -->|"read once post-run"| BATCH
    BATCH --> GATE
    GATE -->|"status=success"| DEL
    GATE -->|"status=error"| KEEP
    DEL --> END_OK
    KEEP --> END_OK

    C1 -.->|"contract enforces"| REG_S
    C2 -.->|"contract enforces"| REG_S
    C3 -.->|"contract enforces"| REG_S
    C4 -.->|"contract enforces"| REG_S

    ELIMINATED -.->|"no longer written"| ENTRY

    class START,END_OK terminal;
    class REG_S,REG_F,BATCH newComponent;
    class ENTRY stateNode;
    class GATE stateNode;
    class DEL,KEEP output;
    class C1 phase;
    class C2,C3,C4 phase;
    class ELIMINATED detector;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start and end |
| Green | ● Modified / New | register steps and batch cleanup (this PR) |
| Teal | State | Registry file and status decision |
| Purple | Phase | Contract card files |
| Dark Teal | Output | Clone disposition outcomes |
| Red | Eliminated | State that no longer exists |

Closes #610

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-185031-682892/.autoskillit/temp/make-plan/process_issues_defer_clone_cleanup_plan_2026-04-04_000000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 36 | 16.9k | 1.4M | 1 | 6m 15s |
| **Total** | 10.1k | 383.2k | 42.0M | | 2h 51m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ocal (#612)

## Summary

Move `smoke-test.yaml` and its companion artifacts (contract card, flow
diagram) from the bundled `src/autoskillit/recipes/` directory to the
project-local `.autoskillit/recipes/` directory. This makes smoke-test
invisible to end-user projects while remaining fully functional when
running from the AutoSkillit repository root. The existing project-local
recipe discovery mechanism already supports this — no production code
changes are needed. All changes are file relocations and test updates.
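The discovery mechanism that makes this work is the seen-set dedup shown in the diagram below ("Project names shadow bundled"). A minimal sketch, assuming a `list_recipes` that scans the project-local directory before the bundled one — the signature and return shape are hypothetical:

```python
from pathlib import Path

def list_recipes(project_dir: Path, builtin_dir: Path) -> list[tuple[str, str]]:
    """Scan project-local recipes first; project names shadow bundled ones."""
    recipes: list[tuple[str, str]] = []
    seen: set[str] = set()
    # Priority order: PROJECT before BUILTIN, so a project-local recipe
    # with the same stem hides the bundled copy.
    for source, directory in (("PROJECT", project_dir), ("BUILTIN", builtin_dir)):
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.yaml")):
            name = path.stem
            if name in seen:
                continue  # already provided by a higher-priority source
            seen.add(name)
            recipes.append((name, source))
    return recipes
```

Under this scheme, moving `smoke-test.yaml` into `.autoskillit/recipes/` flips its source to `PROJECT` inside the AutoSkillit repo and removes it entirely for external projects, with no code change.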

## Requirements

### MOVE — Recipe File Relocation

- **REQ-MOVE-001:** The file `src/autoskillit/recipes/smoke-test.yaml`
must be relocated to `.autoskillit/recipes/smoke-test.yaml` at the
project root.
- **REQ-MOVE-002:** Associated contract card(s) in
`src/autoskillit/recipes/contracts/` matching `smoke-test*` must be
relocated to `.autoskillit/recipes/contracts/`.
- **REQ-MOVE-003:** Associated diagram(s) in
`src/autoskillit/recipes/diagrams/` matching `smoke-test*` must be
relocated to `.autoskillit/recipes/diagrams/`.

### LIST — Listing Behavior

- **REQ-LIST-001:** The smoke-test recipe must not appear in
`list_recipes` output when the current working directory is outside the
AutoSkillit repository.
- **REQ-LIST-002:** The smoke-test recipe must appear in `list_recipes`
output with source `PROJECT` when the current working directory is the
AutoSkillit repository root.

### LOAD — Pipeline Compatibility

- **REQ-LOAD-001:** `load_recipe("smoke-test")` must succeed when
invoked from the AutoSkillit repository root.
- **REQ-LOAD-002:** Existing smoke-test pipeline execution must remain
functionally identical after the move.

### TEST — Test Updates

- **REQ-TEST-001:** Tests that assert smoke-test has
`RecipeSource.BUILTIN` must be updated to assert `RecipeSource.PROJECT`.
- **REQ-TEST-002:** Tests that count the number of bundled recipes must
be updated to reflect the removal of smoke-test from the bundled set.

## Architecture Impact

### Operational Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    START(["list_recipes / find_recipe_by_name called"])

    subgraph ProjectLocal ["★ PROJECT-LOCAL SCAN (priority 1)"]
        direction TB
        PROJ_DIR["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = PROJECT<br/>★ smoke-test.yaml (moved here)"]
        PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>★ smoke-test.yaml"]
        PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>★ smoke-test.md"]
    end

    subgraph Bundled ["BUNDLED SCAN (priority 2)"]
        direction TB
        BUILTIN_DIR["src/autoskillit/recipes/<br/>━━━━━━━━━━<br/>source = BUILTIN<br/>implementation, remediation,<br/>merge-prs, impl-groups<br/>(smoke-test removed)"]
    end

    DEDUP["Dedup via seen set<br/>━━━━━━━━━━<br/>Project names shadow bundled"]

    subgraph AutoskillitRepo ["AUTOSKILLIT REPO CONTEXT"]
        direction TB
        CLI_LIST["● autoskillit recipes list<br/>━━━━━━━━━━<br/>Shows smoke-test (source: project)"]
        CLI_ORDER["autoskillit order<br/>━━━━━━━━━━<br/>Pipeline execution menu"]
        CLI_RENDER["autoskillit recipes render<br/>━━━━━━━━━━<br/>_recipes_dir_for(PROJECT)<br/>→ .autoskillit/recipes/diagrams/"]
    end

    subgraph ExternalProject ["EXTERNAL PROJECT CONTEXT"]
        direction TB
        EXT_LIST["autoskillit recipes list<br/>━━━━━━━━━━<br/>smoke-test NOT visible<br/>(no project-local copy)"]
    end

    START --> PROJ_DIR
    PROJ_DIR --> DEDUP
    DEDUP --> BUILTIN_DIR
    PROJ_DIR --> PROJ_CONTRACT
    PROJ_DIR --> PROJ_DIAGRAM
    DEDUP --> CLI_LIST
    DEDUP --> CLI_ORDER
    CLI_RENDER --> PROJ_DIAGRAM
    DEDUP --> EXT_LIST

    class START terminal;
    class PROJ_DIR,PROJ_CONTRACT,PROJ_DIAGRAM newComponent;
    class BUILTIN_DIR stateNode;
    class DEDUP handler;
    class CLI_LIST,CLI_ORDER,CLI_RENDER cli;
    class EXT_LIST detector;
```

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph Tests ["TESTS (modified ●)"]
        direction TB
        T_SMOKE["● test_smoke_pipeline.py<br/>━━━━━━━━━━<br/>uses SMOKE_SCRIPT<br/>→ project-local path"]
        T_BUNDLED["● test_bundled_recipes.py<br/>━━━━━━━━━━<br/>smoke_yaml fixture<br/>→ project-local path"]
        T_POLICY["● test_bundled_recipe_hidden_policy.py<br/>━━━━━━━━━━<br/>BUNDLED_RECIPE_NAMES<br/>smoke-test removed"]
        T_TOOLS["● test_tools_recipe.py<br/>━━━━━━━━━━<br/>list_recipes assertion<br/>smoke-test NOT in bundled"]
        T_ENGINE["● test_engine.py<br/>━━━━━━━━━━<br/>contract adapter test<br/>→ project-local path"]
    end

    subgraph L3 ["L3 — SERVER"]
        direction TB
        TOOLS_RECIPE["server.tools_recipe<br/>━━━━━━━━━━<br/>list_recipes, load_recipe<br/>validate_recipe"]
    end

    subgraph L2R ["L2 — RECIPE"]
        direction TB
        RECIPE_IO["recipe.io<br/>━━━━━━━━━━<br/>builtin_recipes_dir()<br/>list_recipes()"]
        RECIPE_VALIDATOR["recipe.validator<br/>━━━━━━━━━━<br/>run_semantic_rules<br/>analyze_dataflow"]
        RECIPE_CONTRACTS["recipe.contracts<br/>━━━━━━━━━━<br/>load_bundled_manifest"]
    end

    subgraph L2M ["L2 — MIGRATION"]
        direction TB
        MIG_ENGINE["migration.engine<br/>━━━━━━━━━━<br/>default_migration_engine<br/>contract adapters"]
    end

    subgraph L0 ["L0 — CORE"]
        direction TB
        CORE_PATHS["core.paths<br/>━━━━━━━━━━<br/>pkg_root() → bundled dir<br/>fan-in: all layers"]
    end

    subgraph Artifacts ["★ PROJECT-LOCAL ARTIFACTS (new)"]
        direction TB
        PROJ_RECIPE["★ .autoskillit/recipes/<br/>━━━━━━━━━━<br/>smoke-test.yaml"]
        PROJ_CONTRACT["★ .autoskillit/recipes/contracts/<br/>━━━━━━━━━━<br/>smoke-test.yaml"]
        PROJ_DIAGRAM["★ .autoskillit/recipes/diagrams/<br/>━━━━━━━━━━<br/>smoke-test.md"]
    end

    T_SMOKE -->|"imports"| TOOLS_RECIPE
    T_SMOKE -->|"imports"| RECIPE_IO
    T_BUNDLED -->|"imports"| RECIPE_IO
    T_BUNDLED -->|"imports"| RECIPE_CONTRACTS
    T_POLICY -->|"imports"| CORE_PATHS
    T_TOOLS -->|"imports"| TOOLS_RECIPE
    T_ENGINE -->|"imports"| CORE_PATHS
    T_ENGINE -->|"imports"| MIG_ENGINE

    TOOLS_RECIPE -->|"imports"| RECIPE_IO
    RECIPE_IO -->|"builtin_recipes_dir()"| CORE_PATHS
    RECIPE_VALIDATOR -->|"imports"| RECIPE_IO
    RECIPE_CONTRACTS -->|"imports"| RECIPE_IO
    MIG_ENGINE -->|"imports"| CORE_PATHS

    T_SMOKE -.->|"now reads"| PROJ_RECIPE
    T_BUNDLED -.->|"now reads"| PROJ_RECIPE
    T_ENGINE -.->|"now reads"| PROJ_CONTRACT

    class T_SMOKE,T_BUNDLED,T_POLICY,T_TOOLS,T_ENGINE phase;
    class TOOLS_RECIPE cli;
    class RECIPE_IO,RECIPE_VALIDATOR,RECIPE_CONTRACTS handler;
    class MIG_ENGINE handler;
    class CORE_PATHS stateNode;
    class PROJ_RECIPE,PROJ_CONTRACT,PROJ_DIAGRAM newComponent;
```

Closes #600

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190817-394673/.autoskillit/temp/make-plan/move_smoke_test_recipe_plan_2026-04-04_190817.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 74 | 37.1k | 3.0M | 2 | 12m 44s |
| **Total** | 10.1k | 403.4k | 43.6M | | 2h 58m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…Type in review-design (#614)

## Summary

The `review-design` skill has L1 severity calibration that correctly
caps `estimand_clarity` and `hypothesis_falsifiability` by
`experiment_type` — benchmarks can never produce L1 critical findings.
But the red-team dimension has **no analogous calibration**, meaning any
critical red-team finding triggers STOP regardless of experiment type.
This creates an unresolvable loop for benchmarks: the red-team always
finds new critical issues at progressively higher levels of abstraction (the Hydra
pattern), exhausting retries without ever producing GO.

The fix adds a red-team severity calibration rubric to
`review-design/SKILL.md` (mirroring the L1 rubric), updates the verdict
logic to apply the cap before building `stop_triggers`, and adds
diminishing-return awareness to `resolve-design-review/SKILL.md` so it
can detect goalposts-moving across rounds.
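The cap-before-stop-triggers ordering can be sketched as below. `RT_MAX_SEVERITY` is named in the diagram; the specific experiment types, severity ladder, and finding shape are illustrative assumptions, not the actual rubric:

```python
# Hypothetical rubric: highest red-team severity each experiment type can yield.
RT_MAX_SEVERITY = {
    "benchmark": "warning",   # benchmarks can never yield critical RT findings
    "ablation": "critical",
    "field_study": "critical",
}
SEVERITY_ORDER = ["info", "warning", "critical"]

def apply_rt_cap(findings: list[dict], experiment_type: str) -> list[dict]:
    """Downgrade red-team findings above the per-experiment-type ceiling."""
    ceiling = RT_MAX_SEVERITY.get(experiment_type, "critical")
    capped = []
    for finding in findings:
        severity = finding["severity"]
        if SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(ceiling):
            severity = ceiling  # applied BEFORE stop_triggers are built
        capped.append({**finding, "severity": severity})
    return capped

def build_stop_triggers(findings: list[dict]) -> list[dict]:
    """Only still-critical findings can trigger STOP."""
    return [f for f in findings if f["severity"] == "critical"]
```

The key property is ordering: because the cap runs before `build_stop_triggers`, a benchmark's red-team finding can inform the review but can never force the STOP loop that produced the Hydra pattern.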

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([Plan submitted])
    GO([GO → execute])
    REVISE_OUT([REVISE → revise_design])
    REVISED_OUT([revised → revise_design])
    FAILED_OUT([failed → design_rejected])

    subgraph ReviewDesign ["● review-design/SKILL.md"]
        direction TB
        L1["L1 Analysis<br/>━━━━━━━━━━<br/>estimand_clarity +<br/>hypothesis_falsifiability"]
        L1GATE{"L1 Fail-Fast<br/>━━━━━━━━━━<br/>Any L1 critical?"}
        PARALLEL["L2 + L3 + L4 + RT<br/>━━━━━━━━━━<br/>Parallel analysis"]
        RTCAP["● RT Severity Cap<br/>━━━━━━━━━━<br/>RT_MAX_SEVERITY[experiment_type]<br/>Downgrade if above ceiling"]
        MERGE["Merge + Dedup<br/>━━━━━━━━━━<br/>All findings pooled"]
        VERDICT{"● Verdict Logic<br/>━━━━━━━━━━<br/>stop_triggers built<br/>AFTER rt_cap applied"}
    end

    subgraph ResolveDesign ["● resolve-design-review/SKILL.md"]
        direction TB
        PARSE["Step 1: Parse Dashboard<br/>━━━━━━━━━━<br/>Extract stop-trigger findings<br/>Classify ADDRESSABLE/STRUCTURAL/DISCUSS"]
        DIMCHECK{"prior_revision_guidance<br/>━━━━━━━━━━<br/>provided?"}
        DIMRET["● Step 1.5: Diminishing-Return<br/>━━━━━━━━━━<br/>Compare ADDRESSABLE themes<br/>vs prior guidance entries"]
        GOALPOST{"goalposts_moving<br/>━━━━━━━━━━<br/>true for any finding?"}
        RECLASSIFY["● Reclassify<br/>━━━━━━━━━━<br/>ADDRESSABLE → STRUCTURAL<br/>annotate prior_theme_match"]
        RESGATE{"Any ADDRESSABLE<br/>or DISCUSS?"}
    end

    subgraph RecipeRouting ["● research.yaml — resolve_design_review step"]
        direction LR
        RECIPE["skill_command passes<br/>━━━━━━━━━━<br/>$context.revision_guidance<br/>as optional 3rd arg"]
    end

    START --> L1
    L1 --> L1GATE
    L1GATE -->|"yes (L1 critical)"| MERGE
    L1GATE -->|"no"| PARALLEL
    PARALLEL --> RTCAP
    RTCAP --> MERGE
    MERGE --> VERDICT
    VERDICT -->|"stop_triggers present"| RECIPE
    VERDICT -->|"critical or ≥3 warnings"| REVISE_OUT
    VERDICT -->|"otherwise"| GO

    RECIPE --> PARSE
    PARSE --> DIMCHECK
    DIMCHECK -->|"yes"| DIMRET
    DIMCHECK -->|"no (round 1)"| RESGATE
    DIMRET --> GOALPOST
    GOALPOST -->|"true"| RECLASSIFY
    GOALPOST -->|"false"| RESGATE
    RECLASSIFY --> RESGATE
    RESGATE -->|"yes"| REVISED_OUT
    RESGATE -->|"all STRUCTURAL"| FAILED_OUT

    class START,GO,REVISE_OUT,REVISED_OUT,FAILED_OUT terminal;
    class L1,PARALLEL handler;
    class L1GATE,VERDICT,DIMCHECK,GOALPOST,RESGATE stateNode;
    class MERGE,PARSE phase;
    class RTCAP,DIMRET,RECLASSIFY newComponent;
    class RECIPE detector;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start and outcome states |
| Orange | Handler | Analysis agents (L1, parallel L2-L4+RT) |
| Teal | State | Decision points and verdict routing |
| Purple | Phase | Merge and parse aggregation steps |
| Green | Modified Component | ● Nodes changed by this PR (RT cap, diminishing-return detection, reclassify, recipe routing) |
| Red | Detector | Recipe routing gate (passes revision_guidance) |

Closes #609

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-185816-184240/.autoskillit/temp/make-plan/add-red-team-severity-calibration-by-experiment-type_plan_2026-04-04_185816.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 135 | 68.4k | 5.4M | 4 | 23m 1s |
| review_pr | 31 | 22.8k | 1.2M | 1 | 5m 50s |
| **Total** | 10.2k | 457.5k | 47.2M | | 3h 14m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
#615)

## Summary

The token summary table (displayed in PRs, terminal, and compact KV
output) collapses 4 distinct Claude API token fields into 3 misleading
columns. The column labeled "input" actually shows only the tiny
uncached delta (`input_tokens`), and "cached" silently sums two
cost-distinct categories (`cache_read_input_tokens` at 0.1x billing +
`cache_creation_input_tokens` at 1.25x billing). This change splits the
display into 4 token columns — `uncached`, `output`, `cache_read`,
`cache_write` — across all 3 independent formatter implementations and
their tests.

No data model, extraction, or storage changes are needed — `TokenEntry`
already preserves all 4 fields. This is purely a formatting-layer fix.
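The field-to-column mapping can be sketched as a compact-KV renderer. The four API field names and the `uc:/out:/cr:/cw:` labels come from this PR; the function body is a hypothetical reduction of `format_compact_kv()`, not the real implementation:

```python
def format_compact_kv(step: dict) -> str:
    """Render one step's token entry as a one-line KV summary.

    Each of the 4 Claude API token fields gets its own label instead of
    collapsing cache_read (0.1x billing) and cache_write (1.25x billing)
    into a single misleading "cached" number.
    """
    return (
        f"{step['name']} x{step['count']} "
        f"[uc:{step['input_tokens']} "
        f"out:{step['output_tokens']} "
        f"cr:{step['cache_read_input_tokens']} "
        f"cw:{step['cache_creation_input_tokens']}]"
    )
```

The markdown and terminal formatters follow the same four-way split; only the column syntax differs.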

## Architecture Impact

### Data Lineage Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart LR
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph API ["Claude API Response"]
        direction TB
        F1["input_tokens<br/>━━━━━━━━━━<br/>Uncached delta"]
        F2["output_tokens<br/>━━━━━━━━━━<br/>Generated tokens"]
        F3["cache_read_input_tokens<br/>━━━━━━━━━━<br/>0.1x billing"]
        F4["cache_creation_input_tokens<br/>━━━━━━━━━━<br/>1.25x billing"]
    end

    subgraph Storage ["TokenEntry Storage"]
        TE[("TokenEntry<br/>━━━━━━━━━━<br/>4 fields intact<br/>Accumulated per step")]
        TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Persisted session data<br/>All 4 fields")]
    end

    subgraph Canonical ["● telemetry_fmt.py (Canonical Formatter)"]
        direction TB
        FMD["● format_token_table()<br/>━━━━━━━━━━<br/>Markdown table<br/>Step|uncached|output|cache_read|cache_write|count|time"]
        FTM["● format_token_table_terminal()<br/>━━━━━━━━━━<br/>Terminal table<br/>UNCACHED|OUTPUT|CACHE_RD|CACHE_WR"]
        FKV["● format_compact_kv()<br/>━━━━━━━━━━<br/>Compact KV<br/>uc:|out:|cr:|cw:"]
    end

    subgraph Hooks ["Stdlib Hooks (no autoskillit imports)"]
        direction TB
        TSA["● token_summary_appender._format_table()<br/>━━━━━━━━━━<br/>Reads token_usage.json<br/>Markdown table → GitHub PR body"]
        POS["● pretty_output._fmt_get_token_summary()<br/>━━━━━━━━━━<br/>Reads get_token_summary JSON<br/>Compact KV → PostToolUse"]
        POR["● pretty_output._fmt_run_skill()<br/>━━━━━━━━━━<br/>Reads run_skill result dict<br/>Inline KV → PostToolUse"]
    end

    subgraph Outputs ["Display Targets"]
        direction TB
        MD["PR Body<br/>━━━━━━━━━━<br/>GitHub markdown table"]
        TERM["Terminal<br/>━━━━━━━━━━<br/>Padded column output"]
        KV["Compact KV<br/>━━━━━━━━━━<br/>One-liner summaries"]
        HOOK["PostToolUse Output<br/>━━━━━━━━━━<br/>Hook-formatted display"]
    end

    F1 --> TE
    F2 --> TE
    F3 --> TE
    F4 --> TE
    TE --> TJ

    TE --> FMD
    TE --> FTM
    TE --> FKV
    TJ --> TSA
    TJ -.-> POS

    FMD -->|"markdown rows"| MD
    FTM -->|"padded columns"| TERM
    FKV -->|"kv lines"| KV
    TSA -->|"gh api PATCH"| MD
    POS -->|"formatted text"| HOOK
    POR -->|"formatted text"| HOOK

    class F1,F2,F3,F4 cli;
    class TE,TJ stateNode;
    class FMD,FTM,FKV handler;
    class TSA,POS,POR integration;
    class MD,TERM,KV,HOOK output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | API Fields | 4 Claude API token categories from usage response |
| Teal | Storage | TokenEntry dataclass + persisted JSON session files |
| Orange | Canonical Formatter | 3 functions in telemetry_fmt.py (all ● modified) |
| Red | Stdlib Hooks | Independent hook implementations (all ● modified) |
| Dark Teal | Outputs | Display targets: PR body, terminal, compact KV, PostToolUse |

### Operational Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Triggers ["OPERATOR TRIGGERS"]
        direction TB
        GTS["get_token_summary<br/>━━━━━━━━━━<br/>MCP tool call<br/>format=json|markdown"]
        RS["run_skill<br/>━━━━━━━━━━<br/>MCP tool call<br/>Headless session"]
        PRPATCH["PR body update<br/>━━━━━━━━━━<br/>After open-pr skill<br/>PostToolUse event"]
    end

    subgraph State ["TOKEN STATE (read/write)"]
        direction TB
        TL[("DefaultTokenLog<br/>━━━━━━━━━━<br/>In-memory accumulator<br/>4 fields per step")]
        TJ[("token_usage.json<br/>━━━━━━━━━━<br/>Per-session disk files<br/>Read by stdlib hooks")]
    end

    subgraph Formatters ["● FORMATTERS (modified)"]
        direction TB
        TF["● telemetry_fmt.py<br/>━━━━━━━━━━<br/>format_token_table()<br/>format_token_table_terminal()<br/>format_compact_kv()"]
        TSA["● token_summary_appender.py<br/>━━━━━━━━━━<br/>_format_table()<br/>Stdlib-only hook"]
        PO["● pretty_output.py<br/>━━━━━━━━━━<br/>_fmt_get_token_summary()<br/>_fmt_run_skill()"]
    end

    subgraph Outputs ["OBSERVABILITY OUTPUTS (write-only)"]
        direction TB
        MDTBL["PR Body Table<br/>━━━━━━━━━━<br/>## Token Usage Summary<br/>Step|uncached|output|cache_read|cache_write|count|time"]
        TERM["Terminal Table<br/>━━━━━━━━━━<br/>STEP UNCACHED OUTPUT CACHE_RD CACHE_WR COUNT TIME<br/>Padded for readability"]
        KV["Compact KV<br/>━━━━━━━━━━<br/>name xN [uc:X out:X cr:X cw:X t:Xs]<br/>total_uncached / total_cache_read / total_cache_write"]
        HOOK["PostToolUse Display<br/>━━━━━━━━━━<br/>tokens_uncached:<br/>tokens_cache_read:<br/>tokens_cache_write:"]
    end

    GTS -->|"reads"| TL
    TL -.->|"flush"| TJ
    TJ -->|"load_sessions"| TSA
    TJ -.->|"via MCP JSON payload"| PO

    GTS --> TF
    TF -->|"markdown"| MDTBL
    TF -->|"terminal"| TERM
    TF -->|"compact"| KV

    RS -->|"PostToolUse event"| PO
    PO -->|"_fmt_run_skill"| HOOK
    PO -->|"_fmt_get_token_summary"| KV

    PRPATCH -->|"PostToolUse event"| TSA
    TSA -->|"gh api PATCH"| MDTBL

    class GTS,RS,PRPATCH cli;
    class TL,TJ stateNode;
    class TF,TSA,PO handler;
    class MDTBL,TERM,KV,HOOK output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Triggers | Operator-initiated MCP tool calls and PostToolUse events |
| Teal | State | Token accumulator (read/write) and persisted JSON files |
| Orange | Formatters | 3 modified formatter implementations (all ● changed) |
| Dark Teal | Outputs | Write-only observability artifacts: PR table, terminal, compact KV |

Closes #604

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190817-266225/.autoskillit/temp/make-plan/token_summary_4_columns_plan_2026-04-04_191000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Add `api-simulator` as a dev dependency and use its `mock_http_server`
pytest fixture to test the quota guard's real HTTP path end-to-end.
Currently all quota tests monkeypatch `_fetch_quota` at the function
level — the actual httpx client construction, header injection
(`Authorization: Bearer`, `anthropic-beta`), response parsing, and error
handling are never exercised. This change introduces a `base_url`
parameter to `_fetch_quota` and `check_and_sleep_if_needed`, then adds
seven tests that point the real httpx client at `mock_http_server` to
exercise the full HTTP path.

**Files changed:** 3 (`pyproject.toml`, `src/autoskillit/execution/quota.py`, new `tests/execution/test_quota_http.py`)

**Existing tests:** unchanged — all monkeypatch-based tests in `test_quota.py` remain as-is.

## Requirements

### DEP — Dependency Integration

- **REQ-DEP-001:** The system must include `api-simulator` as a dev-only
dependency with a pinned git tag source.
- **REQ-DEP-002:** The api-simulator dependency must not appear in
production runtime dependencies.

### CFG — URL Configurability

- **REQ-CFG-001:** `_fetch_quota` must accept a `base_url` parameter
defaulting to `https://api.anthropic.com`.
- **REQ-CFG-002:** `check_and_sleep_if_needed` must thread the
`base_url` parameter through to `_fetch_quota` at both call sites.
- **REQ-CFG-003:** The production behavior must be unchanged when
`base_url` is not explicitly provided.

### HTTP — HTTP Path Verification

- **REQ-HTTP-001:** Tests must exercise the real httpx client
construction path, not monkeypatch `_fetch_quota`.
- **REQ-HTTP-002:** Tests must verify that the `Authorization: Bearer`
header is sent on the request.
- **REQ-HTTP-003:** Tests must verify that the `anthropic-beta:
oauth-2025-04-20` header is sent on the request.
- **REQ-HTTP-004:** Tests must verify correct JSON response parsing for
the `five_hour` utilization shape.

### ERR — Error Handling Verification

- **REQ-ERR-001:** Tests must verify fail-open behavior on HTTP 4xx/5xx
responses.
- **REQ-ERR-002:** Tests must verify fail-open behavior on network
timeout.
- **REQ-ERR-003:** Tests must verify that the above-threshold path
triggers a double-fetch (two HTTP requests).
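The fail-open contract behind REQ-ERR-001/002 can be sketched as follows — an assumed shape, not the actual `check_and_sleep_if_needed` implementation: any HTTP or network error is swallowed and reported via an `error` key with `should_sleep: false`, so a quota-endpoint outage never blocks the pipeline.

```python
# Illustrative fail-open wrapper: any error during fetch yields
# should_sleep=False plus an error key, never a raised exception.
# fetch is a stand-in for the real _fetch_quota call.

def check_quota_fail_open(fetch) -> dict:
    try:
        payload = fetch()
    except Exception as exc:  # timeout, connect error, HTTP 4xx/5xx raised by client
        return {"should_sleep": False, "error": str(exc)}
    utilization = payload.get("five_hour", {}).get("utilization", 0.0)
    return {"should_sleep": False, "utilization": utilization}
```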

### COMPAT — Backward Compatibility

- **REQ-COMPAT-001:** Existing `test_quota.py` tests must continue to
pass unchanged.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([START: check_and_sleep_if_needed])

    subgraph GatePhase ["Gate Phase"]
        direction TB
        ENABLED{"config.enabled?"}
        DISABLED(["RETURN<br/>should_sleep: false"])
    end

    subgraph CachePhase ["Cache Phase"]
        direction TB
        CACHE["_read_cache<br/>━━━━━━━━━━<br/>Read local JSON cache"]
        CACHE_HIT{"Cache fresh?<br/>━━━━━━━━━━<br/>age ≤ max_age?"}
    end

    subgraph FetchPhase ["HTTP Fetch Phase"]
        direction TB
        FETCH["● _fetch_quota<br/>━━━━━━━━━━<br/>★ base_url parameter<br/>httpx.AsyncClient GET"]
        BASEURL["★ base_url<br/>━━━━━━━━━━<br/>default: api.anthropic.com<br/>test: mock_http_server.url"]
        PARSE["Parse Response<br/>━━━━━━━━━━<br/>five_hour.utilization<br/>Z→+00:00 normalization"]
    end

    subgraph DecisionPhase ["Threshold Decision"]
        direction TB
        THRESHOLD{"utilization<br/>≥ threshold?"}
        RESETS_AT1{"resets_at<br/>is None?<br/>(Gate 1)"}
        REFETCH["● _fetch_quota re-fetch<br/>━━━━━━━━━━<br/>★ base_url threaded<br/>Double-fetch for accuracy"]
        RESETS_AT2{"resets_at<br/>still None?<br/>(Gate 2)"}
    end

    subgraph Results ["Results"]
        BELOW(["RETURN<br/>should_sleep: false"])
        FALLBACK1(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"])
        FALLBACK2(["RETURN<br/>should_sleep: true<br/>reason: unknown_reset<br/>fallback ≥ 60s"])
        SLEEP(["RETURN<br/>should_sleep: true<br/>sleep_seconds computed"])
        FAILOPEN(["RETURN<br/>should_sleep: false<br/>error key present"])
    end

    subgraph TestInfra ["★ Test Infrastructure (test_quota_http.py)"]
        direction TB
        MOCK["★ mock_http_server<br/>━━━━━━━━━━<br/>api-simulator fixture<br/>HTTP server"]
        REGISTER["★ register / register_sequence<br/>━━━━━━━━━━<br/>Custom endpoint responses<br/>Status codes, delays"]
        INSPECT["★ get_requests / request_count<br/>━━━━━━━━━━<br/>Header verification<br/>Double-fetch assertion"]
    end

    START --> ENABLED
    ENABLED -->|"false"| DISABLED
    ENABLED -->|"true"| CACHE
    CACHE --> CACHE_HIT
    CACHE_HIT -->|"fresh + below threshold"| BELOW
    CACHE_HIT -->|"miss or expired"| FETCH
    FETCH --> BASEURL
    BASEURL --> PARSE
    PARSE --> THRESHOLD
    THRESHOLD -->|"below"| BELOW
    THRESHOLD -->|"above"| RESETS_AT1
    RESETS_AT1 -->|"None"| FALLBACK1
    RESETS_AT1 -->|"present"| REFETCH
    REFETCH --> RESETS_AT2
    RESETS_AT2 -->|"None"| FALLBACK2
    RESETS_AT2 -->|"present"| SLEEP
    FETCH -.->|"HTTP error / timeout"| FAILOPEN

    MOCK -.->|"serves responses to"| BASEURL
    REGISTER -.->|"configures"| MOCK
    INSPECT -.->|"verifies headers / count"| FETCH

    class START terminal;
    class DISABLED,BELOW,FALLBACK1,FALLBACK2,SLEEP,FAILOPEN phase;
    class ENABLED,CACHE_HIT,THRESHOLD,RESETS_AT1,RESETS_AT2 stateNode;
    class CACHE,PARSE handler;
    class FETCH,REFETCH handler;
    class BASEURL,MOCK,REGISTER,INSPECT newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry point |
| Teal | State | Decision points and routing |
| Orange | Handler | Processing nodes (cache read, HTTP fetch, parse) |
| Green | New Component | ★ New `base_url` parameter and test infrastructure |
| Purple | Phase | Result return paths |

### Development Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph Deps ["● DEPENDENCY MANIFEST (pyproject.toml)"]
        direction TB
        PYPROJECT["● pyproject.toml<br/>━━━━━━━━━━<br/>hatchling build backend<br/>requires-python ≥ 3.11"]
        DEVDEPS["● dev optional-dependencies<br/>━━━━━━━━━━<br/>pytest, pytest-asyncio,<br/>pytest-httpx, pytest-xdist,<br/>pytest-timeout, ruff,<br/>import-linter, packaging"]
        APISIM["★ api-simulator<br/>━━━━━━━━━━<br/>New dev dependency<br/>HTTP mock fixture provider"]
        UVSRC["★ [tool.uv.sources]<br/>━━━━━━━━━━<br/>api-simulator pinned<br/>git: TalonT-Org/api-simulator<br/>branch: main"]
        UVLOCK["● uv.lock<br/>━━━━━━━━━━<br/>Regenerated with<br/>api-simulator entry"]
    end

    subgraph Quality ["CODE QUALITY GATES (pre-commit)"]
        direction TB
        FORMAT["ruff format<br/>━━━━━━━━━━<br/>Auto-fix code style<br/>reads + modifies src"]
        LINT["ruff check<br/>━━━━━━━━━━<br/>Auto-fix lint violations<br/>reads + modifies src"]
        TYPES["mypy<br/>━━━━━━━━━━<br/>Type checking<br/>reads src, reports only"]
        UVCHECK["uv lock check<br/>━━━━━━━━━━<br/>Verifies lockfile sync<br/>reads uv.lock"]
        SECRETS["gitleaks<br/>━━━━━━━━━━<br/>Secret scanning<br/>reads staged files"]
        IMPORTLINT["import-linter<br/>━━━━━━━━━━<br/>Layer contract enforcement<br/>IL-001 through IL-007"]
    end

    subgraph Testing ["TEST FRAMEWORK"]
        direction TB
        PYTEST["pytest + pytest-asyncio<br/>━━━━━━━━━━<br/>asyncio_mode=auto<br/>timeout=60s signal"]
        XDIST["pytest-xdist -n 4<br/>━━━━━━━━━━<br/>Parallel test workers<br/>worksteal distribution"]
        UNITQUOTA["● test_quota.py<br/>━━━━━━━━━━<br/>23 unit tests<br/>monkeypatch _fetch_quota<br/>mock signature updated"]
        HTTPQUOTA["★ test_quota_http.py<br/>━━━━━━━━━━<br/>7 end-to-end HTTP tests<br/>real httpx client path<br/>no monkeypatching"]
        MOCKSERVER["★ mock_http_server fixture<br/>━━━━━━━━━━<br/>api-simulator provides<br/>register / register_sequence<br/>get_requests / request_count"]
    end

    subgraph EntryPoints ["ENTRY POINTS"]
        CLI["autoskillit CLI<br/>━━━━━━━━━━<br/>autoskillit.cli:main"]
    end

    PYPROJECT --> DEVDEPS
    DEVDEPS --> APISIM
    APISIM --> UVSRC
    UVSRC --> UVLOCK

    PYPROJECT --> FORMAT
    FORMAT --> LINT
    LINT --> TYPES
    TYPES --> UVCHECK
    UVCHECK --> SECRETS
    SECRETS --> IMPORTLINT

    IMPORTLINT --> PYTEST
    PYTEST --> XDIST
    XDIST --> UNITQUOTA
    XDIST --> HTTPQUOTA
    APISIM -.->|"provides fixture"| MOCKSERVER
    MOCKSERVER -.->|"injected into"| HTTPQUOTA

    PYPROJECT --> CLI

    class PYPROJECT,DEVDEPS,UVLOCK phase;
    class APISIM,UVSRC,HTTPQUOTA,MOCKSERVER newComponent;
    class UNITQUOTA handler;
    class FORMAT,LINT,TYPES,UVCHECK,SECRETS,IMPORTLINT detector;
    class PYTEST,XDIST handler;
    class CLI output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Purple | Build Config | pyproject.toml, dev deps, lockfile |
| Green | New Component | ★ api-simulator dep, uv.sources, HTTP test file, mock fixture |
| Orange | Test Framework | pytest, xdist, existing test_quota.py |
| Red | Quality Gates | ruff, mypy, uv lock check, gitleaks, import-linter |
| Dark Teal | Entry Points | CLI entry point |

Closes #607

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260404-190816-816130/.autoskillit/temp/make-plan/integrate_api_simulator_quota_guard_plan_2026-04-04_191500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 100 | 51.3k | 3.9M | 3 | 16m 38s |
| **Total** | 10.2k | 417.5k | 44.5M | | 3h 2m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

The `zero_writes` gate in `execution/headless.py` fires unconditionally
when `write_behavior.mode == "always"` and `write_call_count == 0`. The
`resolve-failures` contract declares `write_behavior: always`, but the
skill legitimately exits with zero `Edit`/`Write` calls when the
worktree is already green (0 fix iterations). The gate has no escape
path for this case — `success=True` is demoted to `zero_writes`, killing
an otherwise correct pipeline run.

This PR changes the contract to `conditional` mode with a pattern gated
on the `fixes_applied` structured token, extends the same fix to
`retry-worktree` and `resolve-review`, and adds a semantic rule to
prevent regression.
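The conditional gate reduces to an AND over regex patterns against the session output. A minimal sketch (the regex for the `fixes_applied` token is an assumed shape, not copied from the contract):

```python
import re

# Sketch of the AND-semantics pattern gate: write activity is only *expected*
# when every contract pattern matches the session output. The fixes_applied
# regex below is an assumed shape for the structured token.

FIXES_APPLIED_NONZERO = r"fixes_applied\s*=\s*[1-9]\d*"


def check_expected_patterns(patterns: tuple[str, ...], session_output: str) -> bool:
    """True only if every pattern matches (AND over all patterns)."""
    return all(re.search(p, session_output) for p in patterns)
```

With this gate, an already-green worktree emits `fixes_applied = 0`, no pattern matches, writes are not expected, and `success=True` survives with zero `Edit`/`Write` calls.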

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([run_skill called])
    SUCCESS(["✓ success=True<br/>subtype=success"])
    DEMOTED(["✗ success=False<br/>subtype=zero_writes"])

    subgraph Contract ["● Contract Resolution"]
        direction TB
        YAML["● skill_contracts.yaml<br/>━━━━━━━━━━<br/>resolve-failures:<br/>  write_behavior: conditional<br/>  write_expected_when:<br/>  - fixes_applied ≥ 1 regex"]
        FACTORY["● _factory.py<br/>━━━━━━━━━━<br/>_resolve_write_behavior()<br/>reads contract via lru_cache"]
        SPEC["WriteBehaviorSpec<br/>━━━━━━━━━━<br/>mode=conditional<br/>expected_when=(pattern,)"]
    end

    subgraph Execution ["● Skill Execution"]
        direction TB
        SESSION["headless subprocess<br/>━━━━━━━━━━<br/>run tests, apply fixes<br/>via Bash / Edit / Write"]
        TOKEN["● Structured Token<br/>━━━━━━━━━━<br/>fixes_applied = N<br/>emitted at Step 4"]
        COUNT["write_call_count<br/>━━━━━━━━━━<br/>count Edit + Write<br/>in tool_uses"]
    end

    subgraph Gate ["● Zero-Write Gate"]
        direction TB
        GUARD{"success=True AND<br/>write_count=0 AND<br/>write_behavior≠None?"}
        MODE{"● mode?<br/>━━━━━━━━━━<br/>always vs conditional"}
        PATTERN{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND-match all patterns<br/>against session output"}
        EXPECT{"write_expected<br/>AND write_count=0?"}
    end

    %% FLOW %%
    START --> YAML
    YAML -->|"reads"| FACTORY
    FACTORY -->|"builds"| SPEC
    SPEC -->|"passed to executor"| SESSION
    SESSION --> TOKEN
    SESSION --> COUNT
    TOKEN --> GUARD
    COUNT --> GUARD

    GUARD -->|"No — gate inactive"| SUCCESS
    GUARD -->|"Yes"| MODE

    MODE -->|"always"| EXPECT
    MODE -->|"conditional"| PATTERN

    PATTERN -->|"fixes_applied=0<br/>no match → False"| SUCCESS
    PATTERN -->|"fixes_applied≥1<br/>match → True"| EXPECT

    EXPECT -->|"write_count > 0<br/>artifact written"| SUCCESS
    EXPECT -->|"write_count = 0<br/>no artifact"| DEMOTED

    %% CLASS ASSIGNMENTS %%
    class START,SUCCESS,DEMOTED terminal;
    class YAML,SPEC stateNode;
    class FACTORY,SESSION,COUNT handler;
    class TOKEN output;
    class GUARD,MODE,PATTERN,EXPECT detector;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph ContractFields ["● INIT_ONLY: Contract Fields (YAML → frozen)"]
        direction TB
        WB["● write_behavior<br/>━━━━━━━━━━<br/>always ∣ conditional ∣ null<br/>Set in skill_contracts.yaml<br/>Cached via @lru_cache"]
        WEW["● write_expected_when<br/>━━━━━━━━━━<br/>list of regex patterns<br/>AND-semantics at gate<br/>Empty = no pattern gate"]
    end

    subgraph SpecFields ["INIT_ONLY: WriteBehaviorSpec (frozen dataclass)"]
        direction TB
        MODE["● mode: str ∣ None<br/>━━━━━━━━━━<br/>Mirrors write_behavior<br/>Frozen after construction"]
        EXPECTED["● expected_when: tuple<br/>━━━━━━━━━━<br/>Immutable tuple of patterns<br/>Frozen after construction"]
    end

    subgraph SessionState ["MUTABLE + APPEND: Session State"]
        direction TB
        TOOLS["tool_uses: list<br/>━━━━━━━━━━<br/>APPEND_ONLY during session<br/>Each Edit/Write appended"]
        RESULT["● session output: str<br/>━━━━━━━━━━<br/>Contains structured tokens<br/>fixes_applied = N"]
        WCC["write_call_count: int<br/>━━━━━━━━━━<br/>DERIVED from tool_uses<br/>count(Edit + Write)"]
    end

    subgraph GateState ["● MUTABLE: SkillResult Fields (gate mutations)"]
        direction TB
        SUCCESS["● success: bool<br/>━━━━━━━━━━<br/>Init: True (if session ok)<br/>Gate may demote → False"]
        SUBTYPE["● subtype: str<br/>━━━━━━━━━━<br/>Init: success<br/>Gate may set → zero_writes"]
        RETRY["● needs_retry: bool<br/>━━━━━━━━━━<br/>Init: False<br/>Gate may set → True"]
    end

    subgraph Validation ["● VALIDATION GATES"]
        direction TB
        G1{"● mode check<br/>━━━━━━━━━━<br/>always → write_expected=True<br/>conditional → check patterns"}
        G2{"● _check_expected_patterns<br/>━━━━━━━━━━<br/>AND over all patterns<br/>re.search each on output"}
        G3{"write_expected AND<br/>write_count == 0?<br/>━━━━━━━━━━<br/>Demote if both True"}
    end

    %% FLOW: Contract → Spec %%
    WB -->|"reads"| MODE
    WEW -->|"reads"| EXPECTED

    %% FLOW: Spec → Gate %%
    MODE -->|"determines gate path"| G1
    EXPECTED -->|"provides patterns"| G2

    %% FLOW: Session → Gate %%
    TOOLS -->|"derives"| WCC
    RESULT -->|"scanned by"| G2
    WCC -->|"checked by"| G3

    %% FLOW: Gate decisions %%
    G1 -->|"conditional"| G2
    G1 -->|"always"| G3
    G2 -->|"match → True"| G3
    G2 -->|"no match → False"| SUCCESS

    %% FLOW: Gate → Mutation %%
    G3 -->|"demote"| SUBTYPE
    G3 -->|"demote"| RETRY
    G3 -->|"preserve"| SUCCESS

    %% CLASS ASSIGNMENTS %%
    class WB,WEW detector;
    class MODE,EXPECTED detector;
    class TOOLS handler;
    class RESULT output;
    class WCC phase;
    class SUCCESS,SUBTYPE,RETRY gap;
    class G1,G2,G3 stateNode;
```

Closes #603

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260404-212507-745574/.autoskillit/temp/rectify/rectify_zero-writes-false-positive_2026-04-04_215019_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | input | output | cached | count | time |
|------|-------|--------|--------|-------|------|
| investigate | 31 | 12.6k | 747.1k | 1 | 6m 34s |
| rectify | 11.4k | 57.9k | 2.0M | 1 | 27m 28s |
| review | 3.6k | 7.2k | 216.3k | 1 | 8m 0s |
| dry_walkthrough | 51 | 30.8k | 2.3M | 2 | 11m 22s |
| implement | 2.2k | 28.2k | 3.0M | 2 | 10m 56s |
| assess | 44 | 7.8k | 1.1M | 2 | 8m 43s |
| audit_impl | 30 | 18.6k | 654.7k | 2 | 9m 10s |
| open_pr | 28 | 15.8k | 1.0M | 1 | 7m 3s |
| **Total** | 17.3k | 178.9k | 11.1M | | 1h 29m |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…alation Routing, and Pack Fix (#620)

## Summary

This part adds the post-review re-validation loop and escalation
consumption infrastructure to `research.yaml`, adds the `needs_rerun`
structured output token to `resolve-research-review/SKILL.md`, and fixes
the missing `exp-lens` pack registration. Additionally adds the data
provenance lifecycle across 5 research pipeline skills (plan-experiment,
run-experiment, write-report, review-design, review-research-pr) with
contract and guard tests.

## Requirements

### DATA — Data Provenance Lifecycle

- **REQ-DATA-001:** The `plan-experiment` skill must generate a Data
Manifest section in every experiment plan that maps each hypothesis to
its required data source(s), specifying source type (synthetic, fixture,
external, gitignored), acquisition method (generate, download, copy),
and verification criteria.
- **REQ-DATA-002:** When the research task directive or issue specifies
using particular data, the `plan-experiment` skill must include explicit
acquisition steps for that data in the plan — the plan must not assume
data will already be present.
- **REQ-DATA-003:** The `run-experiment` skill pre-flight must perform a
hypothesis-to-data mapping check against the Data Manifest: for each
hypothesis, verify its required data source is present and non-empty
before execution begins.
- **REQ-DATA-004:** When `run-experiment` pre-flight finds that data the
plan said would be acquired is missing, it must emit a structured
`blocked_hypotheses` list and treat this as a FAIL rather than silently
degrading to N/A.
- **REQ-DATA-005:** The `review-design` skill must include data
acquisition completeness as a reviewable dimension at sufficient weight
to influence the verdict (not L-weight), checking that every hypothesis
has a data source, every external source has an acquisition step, and
every gitignored path has a generation/download step.
- **REQ-DATA-006:** The `review-research-pr` skill must include a
`data-scope` review dimension that checks whether the experiment's data
coverage matches the research task directive and flags when all
benchmarks used only synthetic data for a domain-specific project.
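The pre-flight check in REQ-DATA-003/004 can be sketched as below. The manifest entry shape (`hypothesis`, `location`) is illustrative, not the skill's actual schema:

```python
import os

# Hedged sketch of the hypothesis-to-data pre-flight: verify each
# hypothesis's data source exists and is non-empty before execution;
# emit blocked_hypotheses and FAIL otherwise (never degrade to N/A).

def preflight_check(manifest: list[dict]) -> dict:
    blocked = []
    for entry in manifest:
        path = entry["location"]
        if not (os.path.exists(path) and os.path.getsize(path) > 0):
            blocked.append({"hypothesis": entry["hypothesis"], "missing": path})
    return {
        "status": "FAILED" if blocked else "READY",
        "blocked_hypotheses": blocked,
    }
```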

### REPORT — Write-Report Data Scope Guardrails

- **REQ-REPORT-001:** The `write-report` skill must include a mandatory
Data Scope Statement in the Executive Summary that explicitly states
what data types were used for all benchmarks and whether domain target
data was present, absent, or partial.
- **REQ-REPORT-002:** The `write-report` skill must perform a Metrics
Provenance Check before including any `*_metrics.json` files: verify
they were generated during the current experiment. If stale or
unrelated, disclose and omit with explanation rather than silently
dropping.
- **REQ-REPORT-003:** The `write-report` skill must enforce
pre-specified hypothesis gate thresholds: when a gate is not met, the
report must state this as a failure, and GO recommendations must
reference the specific gate that was met rather than silently
substituting a different threshold.
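One plausible reading of the REQ-REPORT-002 provenance check, sketched under the assumption that "generated during the current experiment" is decided by file modification time (the skill may use a different signal):

```python
import os

# Illustrative metrics provenance check: a *_metrics.json file is only
# trusted if it was modified after the experiment started; stale files
# are flagged for disclosure rather than silently included or dropped.

def classify_metrics_file(path: str, experiment_start: float) -> str:
    if not os.path.exists(path):
        return "missing"
    return "current" if os.path.getmtime(path) >= experiment_start else "stale"
```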

### REVAL — Post-Review Re-Validation Loop

- **REQ-REVAL-001:** The `resolve-research-review` skill must emit a
structured output token (`needs_rerun = true/false`) indicating whether
any `rerun_required` escalations exist, so the recipe can capture and
route on it.
- **REQ-REVAL-002:** The `research.yaml` recipe must include a routing
step after `resolve_research_review` that checks for `rerun_required`
escalations and routes to a `re_run_experiment` step when present.
- **REQ-REVAL-003:** The `re_run_experiment` step must perform a
targeted re-run of affected benchmarks/analyses (not a full experiment
replay) using the same data and scripts, then flow to `re_write_report`
→ `re_push_research`.
- **REQ-REVAL-004:** When only `design_flaw` escalations exist (no
`rerun_required`), the recipe must annotate the PR body with the
escalation details and continue to push.

### ESC — Escalation Consumption

- **REQ-ESC-001:** The `research.yaml` recipe must include a
`check_escalations` step between `resolve_research_review` and
`re_push_research` that reads `escalation_records_{pr}.json` and routes
based on escalation strategy types.
- **REQ-ESC-002:** The `check_escalations` step must distinguish between
`rerun_required` escalations (route to re-validation) and
`design_flaw`-only escalations (annotate and continue).
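The routing decision in `check_escalations` can be sketched as a pure function over the escalation records. The record shape and step names are taken from this description; anything beyond them is an assumption:

```python
# Sketch of check_escalations routing: any rerun_required escalation wins
# and routes to re-validation; design_flaw-only records annotate the PR
# and continue; no escalations means a direct push.

def route_escalations(records: list[dict]) -> str:
    strategies = {r.get("strategy") for r in records}
    if "rerun_required" in strategies:
        return "re_run_experiment"
    if "design_flaw" in strategies:
        return "annotate_and_push"
    return "re_push_research"
```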

### PACK — Exp-Lens Pack Registration

- **REQ-PACK-001:** The `research.yaml` recipe must declare
`requires_packs: [research, exp-lens]` so that all 18 exp-lens skills
are available in headless sessions during the research recipe pipeline.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    PUSH_BR([push_branch<br/>━━━━━━━━━━<br/>git push worktree])

    subgraph PRReview ["PR Review Phase"]
        direction TB
        OPEN["open_research_pr<br/>━━━━━━━━━━<br/>run_skill: open-pr"]
        GUARD{"guard_pr_url<br/>━━━━━━━━━━<br/>context.pr_url?"}
        REVIEW["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-research-pr<br/>captures: verdict"]
    end

    subgraph Resolution ["Review Resolution"]
        direction TB
        RESOLVE["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-research-review<br/>captures: needs_rerun<br/>retries: 2"]
    end

    subgraph EscalationRouting ["★ Escalation Routing (New)"]
        direction TB
        CHECK{"★ check_escalations<br/>━━━━━━━━━━<br/>action: route<br/>context.needs_rerun?"}
    end

    subgraph RevalidationLoop ["★ Re-Validation Loop (New)"]
        direction TB
        RERUN["★ re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust<br/>targeted benchmark re-run"]
        REWRITE["★ re_write_report<br/>━━━━━━━━━━<br/>write-report<br/>updated results"]
        RETEST["★ re_test<br/>━━━━━━━━━━<br/>test_check<br/>post-revalidation gate"]
    end

    REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>run_cmd: git push"]
    COMPLETE([research_complete<br/>━━━━━━━━━━<br/>action: stop])

    PUSH_BR --> OPEN
    OPEN --> GUARD
    GUARD -->|"pr_url truthy"| REVIEW
    GUARD -->|"no pr_url"| COMPLETE
    REVIEW -->|"changes_requested"| RESOLVE
    REVIEW -->|"approved / needs_human"| COMPLETE
    RESOLVE -->|"on_success"| CHECK
    RESOLVE -->|"on_failure / exhausted"| COMPLETE
    CHECK -->|"needs_rerun == true"| RERUN
    CHECK -->|"default (false/absent)"| REPUSH
    RERUN -->|"on_success"| REWRITE
    RERUN -->|"on_failure / context_limit"| REPUSH
    REWRITE -->|"on_success"| RETEST
    REWRITE -->|"on_failure / context_limit"| REPUSH
    RETEST -->|"pass or fail"| REPUSH
    REPUSH --> COMPLETE

    class PUSH_BR,COMPLETE terminal;
    class GUARD,CHECK stateNode;
    class OPEN,REVIEW,RESOLVE handler;
    class RERUN,REWRITE,RETEST newComponent;
    class REPUSH phase;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph Manifest ["★ Data Manifest Contract (INIT_ONLY)"]
        direction TB
        DM["★ data_manifest<br/>━━━━━━━━━━<br/>hypothesis[], source_type,<br/>acquisition, location,<br/>verification, depends_on"]
        V9{"★ V9 Gate<br/>━━━━━━━━━━<br/>Every hypothesis has source?<br/>External has acquisition?<br/>Gitignored has generation?"}
    end

    subgraph DesignGate ["★ Design Review Gate"]
        direction TB
        DAQ{"★ data_acquisition L4<br/>━━━━━━━━━━<br/>Hypothesis coverage?<br/>External readiness?<br/>Directive compliance?"}
    end

    subgraph PreFlight ["★ Run-Experiment Pre-Flight"]
        direction TB
        PF{"★ Data Manifest<br/>Verification<br/>━━━━━━━━━━<br/>location exists?<br/>acquisition succeeds?"}
        BH["★ blocked_hypotheses<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>H5: missing at path"]
    end

    subgraph ReportGates ["★ Write-Report Validation Gates"]
        direction TB
        DSS["★ Data Scope Statement<br/>━━━━━━━━━━<br/>Mandatory in Executive Summary<br/>data types + domain coverage"]
        MPC["★ Metrics Provenance<br/>━━━━━━━━━━<br/>timestamp + relevance check<br/>disclose, never silently drop"]
        GE["★ Gate Enforcement<br/>━━━━━━━━━━<br/>pre-specified thresholds only<br/>no silent substitution"]
    end

    subgraph ReviewGate ["★ PR Review Gate"]
        direction TB
        DSCOPE["★ data-scope dimension<br/>━━━━━━━━━━<br/>Scope coverage?<br/>Claims qualified?<br/>Statement present?"]
    end

    subgraph EscalationState ["● Resolve Output Contract"]
        direction TB
        ESC["escalation_records<br/>━━━━━━━━━━<br/>APPEND_ONLY<br/>strategy: rerun_required<br/>strategy: design_flaw"]
        NR["● needs_rerun<br/>━━━━━━━━━━<br/>DERIVED from escalations<br/>any rerun_required → true<br/>else → false"]
    end

    DM -->|"writes"| V9
    V9 -->|"PASS: plan saved"| DAQ
    V9 -->|"FAIL: plan rejected"| FAIL_PLAN([Plan Rejected])

    DAQ -->|"GO: proceed"| PF
    DAQ -->|"STOP: hypothesis has no source"| REVISE([Revise Plan])
    DAQ -->|"REVISE: missing verification"| REVISE

    PF -->|"ALL READY"| DSS
    PF -->|"BLOCKED: data missing"| BH
    BH --> FAIL_RUN([Status: FAILED])

    DM -.->|"reads manifest"| PF
    DM -.->|"reads manifest"| DSS
    DM -.->|"reads manifest"| DSCOPE

    DSS --> MPC
    MPC --> GE
    GE -->|"report committed"| DSCOPE

    DSCOPE -->|"findings"| ESC
    ESC -->|"derive"| NR
    NR -->|"true → re-validate"| RERUN([Re-Validation Loop])
    NR -->|"false → push"| PUSH([Direct Push])

    class DM detector;
    class V9,DAQ,PF stateNode;
    class BH,ESC handler;
    class DSS,MPC,GE newComponent;
    class DSCOPE newComponent;
    class NR phase;
    class FAIL_PLAN,FAIL_RUN,REVISE gap;
    class RERUN,PUSH cli;
```

Closes #618

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-074034-301298/.autoskillit/temp/make-plan/research_recipe_data_provenance_plan_2026-04-05_074500_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 587 | 30.7k | 1.2M | 112.6k | 1 | 13m 29s |
| verify | 73 | 35.9k | 3.7M | 137.0k | 2 | 11m 23s |
| implement | 2.1k | 36.2k | 5.9M | 155.2k | 2 | 17m 4s |
| fix | 50 | 13.2k | 2.1M | 64.5k | 1 | 10m 53s |
| audit_impl | 28 | 17.3k | 786.1k | 51.7k | 1 | 5m 55s |
| open_pr | 23 | 17.1k | 736.1k | 58.6k | 1 | 8m 12s |
| **Total** | 2.9k | 150.3k | 14.5M | 579.5k | | 1h 6m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

When a headless session spawns background agents via Claude Code's
`Agent` tool with `run_in_background: true`, Claude Code defers the
`type=result` NDJSON record until all background agents finish. If
autoskillit kills the process tree after Channel B confirms completion,
the deferred `type=result` is never flushed to stdout.
`parse_session_result` classifies the output as `UNPARSEABLE`, which
gates out all recovery paths and Channel B bypass — producing a false
failure for sessions that completed successfully.

The fix adds a **pre-gate Channel B drain-race recovery** in
`_build_skill_result` that runs *before* the `session.session_complete`
gate. When Channel B confirmed completion but the session is
UNPARSEABLE/EMPTY_OUTPUT, it reconstructs the result from
`assistant_messages` (which are written to stdout BEFORE the deferred
`type=result`) and promotes the session to SUCCESS, unlocking all
downstream recovery paths and Channel B bypass naturally.
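The pre-gate recovery described above can be sketched as follows — field names are illustrative, not the real `_build_skill_result` internals:

```python
# Hedged sketch of the pre-gate drain-race recovery: when Channel B
# confirmed completion but the killed process never flushed its deferred
# type=result record, reconstruct the result from assistant_messages
# (already on stdout) and promote the session to SUCCESS.

RECOVERABLE_SUBTYPES = {"UNPARSEABLE", "EMPTY_OUTPUT"}


def pregate_recover(session: dict) -> dict:
    if (
        session.get("channel_b_confirmed")
        and session.get("subtype") in RECOVERABLE_SUBTYPES
        and session.get("assistant_messages")
    ):
        session = dict(session)
        session["result_text"] = session["assistant_messages"][-1]
        session["subtype"] = "SUCCESS"
        session["is_error"] = False
    return session
```

Because promotion happens before the `session.session_complete` gate, the downstream recovery paths and Channel B bypass see an ordinary successful session.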

## Architecture Impact

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START(["● _build_skill_result<br/>━━━━━━━━━━<br/>Entry with SubprocessResult"])

    subgraph PreGate ["● PRE-GATE: Channel B Drain-Race Recovery"]
        direction TB
        CB_CHECK{"● Channel B?<br/>+ subtype in<br/>RECOVERABLE_SUBTYPES?<br/>+ completion_marker?"}
        CB_RECOVER["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Reconstruct result from<br/>assistant_messages"]
        CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
        CB_SKIP["No recovery needed<br/>━━━━━━━━━━<br/>Pass through unchanged"]
    end

    subgraph CompletionGate ["session.session_complete Gate"]
        direction TB
        GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype not in<br/>FAILURE_SUBTYPES"}
        MARKER_RECOVER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Marker-based recovery"]
        PATTERN_RECOVER["_recover_block_from_assistant_messages<br/>━━━━━━━━━━<br/>Pattern-based recovery"]
        SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only"]
        SKIP_RECOVERY["Skip all recovery<br/>━━━━━━━━━━<br/>TIMEOUT / genuine failure"]
    end

    subgraph Outcome ["● _compute_outcome"]
        direction TB
        CB_BYPASS{"● Channel B<br/>bypass in<br/>_compute_success?"}
        CONTENT_CHECK["_check_session_content<br/>━━━━━━━━━━<br/>6-gate validation"]
        DEAD_END{"Dead-end guard<br/>━━━━━━━━━━<br/>ABSENT → DRAIN_RACE<br/>CONTRACT_VIOLATION → FAIL"}
    end

    subgraph PostOutcome ["Post-Outcome Gates"]
        direction TB
        BUDGET["_apply_budget_guard<br/>━━━━━━━━━━<br/>Max consecutive retries"]
        CONTRACT["CONTRACT_RECOVERY gate<br/>━━━━━━━━━━<br/>adjudicated_failure +<br/>write evidence"]
        ZERO_WRITE["Zero-write gate<br/>━━━━━━━━━━<br/>Expected writes missing"]
    end

    subgraph Terminals ["TERMINAL STATES"]
        T_SUCCESS([SUCCEEDED])
        T_RETRY([RETRIABLE<br/>DRAIN_RACE / RESUME /<br/>CONTRACT_RECOVERY])
        T_FAIL([FAILED])
        T_BUDGET([BUDGET_EXHAUSTED])
    end

    START --> CB_CHECK
    CB_CHECK -->|"Yes: CHANNEL_B +<br/>UNPARSEABLE or EMPTY_OUTPUT"| CB_RECOVER
    CB_CHECK -->|"No: other channel<br/>or non-recoverable subtype"| CB_SKIP
    CB_RECOVER -->|"Recovery succeeds:<br/>marker standalone +<br/>substantive content"| CB_PROMOTE
    CB_RECOVER -->|"Recovery fails:<br/>no marker in messages"| CB_SKIP
    CB_PROMOTE --> GATE
    CB_SKIP --> GATE

    GATE -->|"True: session promoted<br/>or originally complete"| MARKER_RECOVER
    GATE -->|"False: TIMEOUT /<br/>unrecoverable subtype"| SKIP_RECOVERY
    MARKER_RECOVER --> PATTERN_RECOVER
    PATTERN_RECOVER --> SYNTH
    SYNTH --> CB_BYPASS
    SKIP_RECOVERY --> CB_BYPASS

    CB_BYPASS -->|"CHANNEL_B + session_complete<br/>+ patterns pass"| T_SUCCESS
    CB_BYPASS -->|"No bypass: falls to<br/>termination dispatch"| CONTENT_CHECK
    CONTENT_CHECK -->|"All 6 gates pass"| T_SUCCESS
    CONTENT_CHECK -->|"Any gate fails"| DEAD_END
    DEAD_END -->|"ABSENT + channel confirmed"| T_RETRY
    DEAD_END -->|"CONTRACT_VIOLATION /<br/>SESSION_ERROR"| T_FAIL

    T_RETRY --> BUDGET
    BUDGET -->|"Under limit"| CONTRACT
    BUDGET -->|"Exceeded"| T_BUDGET
    CONTRACT -->|"adjudicated_failure +<br/>writes ≥ 1"| T_RETRY
    CONTRACT -->|"No match"| ZERO_WRITE
    ZERO_WRITE -->|"Expected writes missing"| T_RETRY
    ZERO_WRITE -->|"No issue"| T_SUCCESS

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class CB_CHECK,GATE,CB_BYPASS,DEAD_END stateNode;
    class CB_RECOVER,CB_PROMOTE newComponent;
    class CB_SKIP,SKIP_RECOVERY gap;
    class MARKER_RECOVER,PATTERN_RECOVER,SYNTH handler;
    class CONTENT_CHECK phase;
    class BUDGET,CONTRACT,ZERO_WRITE detector;
    class T_SUCCESS,T_RETRY,T_FAIL,T_BUDGET terminal;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    START(["● _build_skill_result<br/>━━━━━━━━━━<br/>SubprocessResult input"])

    subgraph EarlyExit ["Phase 1: Early Exit Interception"]
        direction TB
        TERM_CHECK{"termination<br/>reason?"}
        STALE_PATH["STALE handler<br/>━━━━━━━━━━<br/>Attempt stdout recovery<br/>then retry or fail"]
        TIMEOUT_PATH["TIMEOUT handler<br/>━━━━━━━━━━<br/>Override subtype=TIMEOUT<br/>is_error=True"]
    end

    PARSE["parse_session_result<br/>━━━━━━━━━━<br/>NDJSON → ClaudeSessionResult<br/>extracts assistant_messages"]

    subgraph DrainRace ["● Phase 2: Channel B Drain-Race Recovery"]
        direction TB
        CB_MATCH{"● match channel<br/>━━━━━━━━━━<br/>CHANNEL_B +<br/>UNPARSEABLE/EMPTY_OUTPUT<br/>+ completion_marker?"}
        CB_RECON["● _recover_from_separate_marker<br/>━━━━━━━━━━<br/>Check marker standalone<br/>in assistant_messages"]
        CB_PROMOTE["● Promote session<br/>━━━━━━━━━━<br/>subtype → SUCCESS<br/>is_error → False"]
        CB_NONE["No drain-race<br/>━━━━━━━━━━<br/>Session unchanged"]
    end

    subgraph GatedRecovery ["Phase 3: Completion-Gated Recovery"]
        direction TB
        GATE{"session_complete?<br/>━━━━━━━━━━<br/>not is_error AND<br/>subtype ∉ FAILURE_SUBTYPES"}
        REC_MARKER["_recover_from_separate_marker<br/>━━━━━━━━━━<br/>Join assistant_messages<br/>when marker is standalone"]
        REC_PATTERN["_recover_block_from_assistant<br/>━━━━━━━━━━<br/>Patterns in messages<br/>not in result"]
        REC_SYNTH["_synthesize_from_write_artifacts<br/>━━━━━━━━━━<br/>UNMONITORED only:<br/>inject write paths"]
        GATE_SKIP["Skip recovery<br/>━━━━━━━━━━<br/>Incomplete session"]
    end

    subgraph ComputeOutcome ["● Phase 4: Outcome Adjudication"]
        direction TB
        COMPUTE["● _compute_outcome<br/>━━━━━━━━━━<br/>_compute_success +<br/>_compute_retry"]
        SUCCESS_CHECK{"● success?"}
        RETRY_CHECK{"needs_retry?"}
    end

    subgraph PostGates ["Phase 5: Post-Outcome Gates"]
        direction TB
        BUDGET_G["_apply_budget_guard<br/>━━━━━━━━━━<br/>consecutive_failures ><br/>max_retries?"]
        CONTRACT_G{"CONTRACT_RECOVERY?<br/>━━━━━━━━━━<br/>adjudicated_failure<br/>+ write_count ≥ 1"}
        ZERO_G{"zero_write_gate?<br/>━━━━━━━━━━<br/>success but no<br/>Write/Edit calls"}
    end

    T_SUCCESS([SUCCEEDED])
    T_RETRY([RETRIABLE])
    T_FAIL([FAILED])

    %% FLOW %%
    START --> TERM_CHECK
    TERM_CHECK -->|"STALE"| STALE_PATH
    TERM_CHECK -->|"TIMED_OUT"| TIMEOUT_PATH
    TERM_CHECK -->|"COMPLETED /<br/>NATURAL_EXIT"| PARSE
    STALE_PATH --> T_RETRY
    TIMEOUT_PATH --> PARSE

    PARSE --> CB_MATCH
    CB_MATCH -->|"Yes: all 3 guards pass"| CB_RECON
    CB_MATCH -->|"No: wrong channel /<br/>wrong subtype / no marker"| CB_NONE
    CB_RECON -->|"Marker found standalone<br/>+ substantive content"| CB_PROMOTE
    CB_RECON -->|"No marker or<br/>empty content"| CB_NONE
    CB_PROMOTE --> GATE
    CB_NONE --> GATE

    GATE -->|"True: complete session"| REC_MARKER
    GATE -->|"False: incomplete"| GATE_SKIP
    REC_MARKER --> REC_PATTERN
    REC_PATTERN --> REC_SYNTH
    REC_SYNTH --> COMPUTE
    GATE_SKIP --> COMPUTE

    COMPUTE --> SUCCESS_CHECK
    SUCCESS_CHECK -->|"True"| ZERO_G
    SUCCESS_CHECK -->|"False"| RETRY_CHECK
    RETRY_CHECK -->|"True"| BUDGET_G
    RETRY_CHECK -->|"False"| CONTRACT_G

    BUDGET_G -->|"Under limit"| T_RETRY
    BUDGET_G -->|"Exhausted"| T_FAIL
    CONTRACT_G -->|"Yes: promote to retry"| BUDGET_G
    CONTRACT_G -->|"No"| T_FAIL
    ZERO_G -->|"Writes expected<br/>but count = 0"| T_RETRY
    ZERO_G -->|"OK"| T_SUCCESS

    %% CLASS ASSIGNMENTS %%
    class START,T_SUCCESS,T_RETRY,T_FAIL terminal;
    class TERM_CHECK,CB_MATCH,GATE,SUCCESS_CHECK,RETRY_CHECK stateNode;
    class STALE_PATH,TIMEOUT_PATH,PARSE handler;
    class CB_RECON,CB_PROMOTE newComponent;
    class CB_NONE,GATE_SKIP gap;
    class REC_MARKER,REC_PATTERN,REC_SYNTH handler;
    class COMPUTE phase;
    class BUDGET_G,CONTRACT_G,ZERO_G detector;
```

Closes #619

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-619-20260405-085642-620214/.autoskillit/temp/make-plan/channel_b_drain_race_recovery_plan_2026-04-05_090230.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 42 | 18.8k | 1.6M | 80.7k | 1 | 9m 8s |
| verify | 17 | 17.4k | 687.5k | 79.7k | 1 | 6m 55s |
| implement | 77 | 28.2k | 4.4M | 89.7k | 1 | 15m 40s |
| audit_impl | 14 | 8.9k | 348.9k | 43.4k | 1 | 3m 4s |
| open_pr | 3.0k | 17.7k | 865.3k | 63.1k | 1 | 7m 30s |
| **Total** | 3.1k | 91.0k | 8.0M | 356.6k | | 42m 19s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ulator FakeClaudeCLI (#624)

## Summary

Add 10 end-to-end tests in a new file
`tests/execution/test_session_classification_e2e.py` that exercise the
full session failure classification pipeline — from raw NDJSON
subprocess output produced by api-simulator's `fake_claude` fixture
through `parse_session_result()` and `_build_skill_result()` to final
`SkillResult` classification. Today all headless tests use
`MockSubprocessRunner` with pre-constructed `SubprocessResult` objects;
the NDJSON parsing and classification logic is never exercised against
realistic subprocess output. These tests close that gap in four groups:
NDJSON stream robustness (4 tests), context exhaustion edge cases (2
tests), kill boundary scenarios (2 tests), and process behavior
simulation (2 tests).

No production code changes are required. The `api-simulator` dev
dependency was added by #607.

## Requirements

### BRIDGE — Integration Bridge

- **REQ-BRIDGE-001:** Tests must use `fake_claude.run()` to produce real
subprocess output, not hand-constructed strings.
- **REQ-BRIDGE-002:** Tests must feed `proc.stdout` through
`parse_session_result()` from `autoskillit.execution.session`.
- **REQ-BRIDGE-003:** Tests must wrap the parsed result in a
`SubprocessResult` and pass it to `_build_skill_result()` for full
classification.

### PARSE — NDJSON Parse Robustness

- **REQ-PARSE-001:** The parser must correctly skip `type=system` /
`api_retry` records and still extract the final `type=result` record.
- **REQ-PARSE-002:** The parser must handle non-JSON lines (stream
corruption) gracefully without losing valid records.
- **REQ-PARSE-003:** When multiple `type=result` records appear, the
last one must determine classification.

### CTX — Context Exhaustion

- **REQ-CTX-001:** A flat assistant record containing the context
exhaustion marker with no `type=result` record must classify as
`context_exhaustion` with `needs_retry=True`.
- **REQ-CTX-002:** A `type=result` record with `is_error=True` and
`errors` containing the marker must classify as retriable with
`retry_reason=RESUME`.
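The two CTX paths can be sketched as a small classifier. The marker string, field names, and return shape are assumptions for illustration only:

```python
# Hedged sketch of the two context-exhaustion paths in CTX; the marker
# string and record fields are assumptions, not the real CLI schema.
CTX_MARKER = "context window exhausted"  # assumed marker text

def classify_context_exhaustion(records: list[dict]) -> dict:
    result = next((r for r in records if r.get("type") == "result"), None)
    if result is None:
        # REQ-CTX-001: a flat assistant record carries the marker and
        # no type=result record was ever emitted.
        flat = any(
            r.get("type") == "assistant" and CTX_MARKER in r.get("text", "")
            for r in records
        )
        if flat:
            return {"subtype": "context_exhaustion", "needs_retry": True}
        return {"subtype": "unparseable", "needs_retry": True}
    # REQ-CTX-002: an explicit error result carries the marker.
    if result.get("is_error") and any(CTX_MARKER in e for e in result.get("errors", [])):
        return {"subtype": "context_exhaustion", "needs_retry": True,
                "retry_reason": "RESUME"}
    return {"subtype": result.get("subtype", "success"), "needs_retry": False}
```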

### KILL — Kill Boundary

- **REQ-KILL-001:** A truncated stream (via `truncate_after`) must
produce `subtype=unparseable` or partial classification with nonzero
exit code.
- **REQ-KILL-002:** An `interrupted` subtype with nonzero exit code must
result in `needs_retry=False` (gated by returncode).

### PROC — Process Behavior

- **REQ-PROC-001:** The hang-after-result scenario must verify that the
result record was emitted to stdout before the process hung.
- **REQ-PROC-002:** Mid-stream exit via `inject_exit` must produce the
correct exit code and truncated stdout.

### COMPAT — Compatibility

- **REQ-COMPAT-001:** Existing `test_headless.py` and `test_session.py`
tests must remain unchanged and passing.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([FakeClaudeCLI<br/>━━━━━━━━━━<br/>api-simulator fixture])

    subgraph Bridge ["★ E2E Test Bridge (new)"]
        direction TB
        RUN["★ fake_claude.run()<br/>━━━━━━━━━━<br/>CompletedProcess<br/>with real NDJSON stdout"]
        WRAP["★ _classify() / inline<br/>━━━━━━━━━━<br/>Wrap in SubprocessResult<br/>pid=0, caller termination"]
    end

    subgraph Parse ["parse_session_result()"]
        direction TB
        SCAN{"stdout empty?"}
        LOOP["Scan NDJSON lines<br/>━━━━━━━━━━<br/>JSON decode; skip errors<br/>last type=result wins"]
        CTX_FLAG{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?"}
        RESULT_FOUND{"result record<br/>found?"}
    end

    subgraph Classify ["_compute_outcome()"]
        direction TB
        SUCCESS_GATE{"_compute_success<br/>━━━━━━━━━━<br/>returncode=0?<br/>is_error? result?"}
        RETRY_GATE{"_compute_retry<br/>━━━━━━━━━━<br/>session.needs_retry?<br/>kill anomaly?"}
        CONTRA{"contradiction<br/>success+retry?"}
        DEADEND{"dead-end<br/>failed+confirmed<br/>+ABSENT?"}
    end

    subgraph Normalize ["_normalize_subtype()"]
        NORM["Map raw CLI subtype<br/>━━━━━━━━━━<br/>to final string label"]
    end

    subgraph Gates ["Post-Classification Gates"]
        BUDGET{"budget<br/>exhausted?"}
        ZERO{"zero writes<br/>when expected?"}
    end

    subgraph Outcomes ["SkillResult"]
        direction LR
        OK([success])
        CTX([context_exhaustion<br/>needs_retry=True])
        EMPTY([empty_output /<br/>unparseable])
        INTR([interrupted<br/>needs_retry=False])
        FAIL([failure<br/>terminal])
    end

    START --> RUN
    RUN --> WRAP
    WRAP --> SCAN
    SCAN -->|"empty"| EMPTY
    SCAN -->|"non-empty"| LOOP
    LOOP --> CTX_FLAG
    CTX_FLAG -->|"yes → jsonl_context_exhausted=True"| RESULT_FOUND
    CTX_FLAG -->|"no"| RESULT_FOUND
    RESULT_FOUND -->|"yes"| SUCCESS_GATE
    RESULT_FOUND -->|"no → UNPARSEABLE / CTX_EXHAUSTION"| RETRY_GATE
    SUCCESS_GATE --> RETRY_GATE
    RETRY_GATE --> CONTRA
    CONTRA -->|"demote success"| DEADEND
    CONTRA -->|"consistent"| DEADEND
    DEADEND -->|"DRAIN_RACE"| NORM
    DEADEND -->|"terminal"| NORM
    NORM --> BUDGET
    BUDGET -->|"BUDGET_EXHAUSTED"| FAIL
    BUDGET -->|"ok"| ZERO
    ZERO -->|"zero_writes"| CTX
    ZERO -->|"ok"| OK
    SUCCESS_GATE -->|"returncode!=0"| INTR

    class START terminal;
    class RUN,WRAP newComponent;
    class LOOP handler;
    class SCAN,CTX_FLAG,RESULT_FOUND stateNode;
    class SUCCESS_GATE,RETRY_GATE,CONTRA,DEADEND phase;
    class NORM handler;
    class BUDGET,ZERO detector;
    class OK,CTX,EMPTY,INTR,FAIL terminal;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start (FakeClaudeCLI), final SkillResult outcomes |
| Green | New Component | ★ `_classify()` bridge helper and `fake_claude.run()` — new test code |
| Orange | Handler | NDJSON scan/accumulation and subtype normalization |
| Teal | State | Decision points: empty check, context flag, result found |
| Purple | Phase | Outcome computation gates (success, retry, contradiction, dead-end) |
| Red | Detector | Post-classification guards (budget, zero-write) |

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    START([★ E2E Test Suite<br/>━━━━━━━━━━<br/>10 failure scenarios<br/>via FakeClaudeCLI])

    subgraph ParseGates ["NDJSON Parse Resilience Gates"]
        direction TB
        EMPTY_CHECK{"stdout<br/>empty?"}
        JSON_ERR["Corrupt / non-JSON lines<br/>━━━━━━━━━━<br/>silently skipped<br/>(test 2: corrupt_stream)"]
        API_RETRY["api_retry records<br/>━━━━━━━━━━<br/>skipped — not type=result<br/>(test 1: inject_api_retry)"]
        LAST_WINS["Multiple result records<br/>━━━━━━━━━━<br/>last record wins<br/>(test 3: two results)"]
        EXHAUST["Exhausted retries<br/>━━━━━━━━━━<br/>no result record emitted<br/>(test 4: exhaust=True)"]
    end

    subgraph CtxDetect ["Context Exhaustion Detection"]
        direction TB
        FLAT_DETECT{"flat assistant<br/>output_tokens=0<br/>+ ctx marker?<br/>(test 5)"}
        ERR_DETECT{"is_error=True AND<br/>marker in errors[]?<br/>(test 6)"}
        CTX_FLAG["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>race-resilient flag"]
    end

    subgraph KillGates ["Kill Boundary Gates"]
        direction TB
        RC_CHECK{"returncode != 0?"}
        KILL_ANOM{"_is_kill_anomaly?<br/>━━━━━━━━━━<br/>UNPARSEABLE /\nEMPTY_OUTPUT /\nINTERRUPTED"}
        INTR_GATE{"subtype=interrupted<br/>+ rc != 0?<br/>(test 8)"}
    end

    subgraph PostGates ["Post-Classification Guards"]
        BUDGET{"consecutive failures<br/>> budget max?"}
        ZERO_WRITE{"success AND<br/>write_count=0<br/>AND write expected?"}
    end

    T_SUCCESS([success<br/>━━━━━━━━━━<br/>needs_retry=False])
    T_CTX([context_exhaustion<br/>━━━━━━━━━━<br/>needs_retry=True, RESUME])
    T_EMPTY([empty_output / unparseable<br/>━━━━━━━━━━<br/>needs_retry=True via RESUME])
    T_INTR([interrupted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
    T_BUDGET([budget_exhausted<br/>━━━━━━━━━━<br/>needs_retry=False, terminal])
    T_ZERO([zero_writes<br/>━━━━━━━━━━<br/>needs_retry=True])

    START --> EMPTY_CHECK
    EMPTY_CHECK -->|"empty stdout"| T_EMPTY
    EMPTY_CHECK -->|"has content"| JSON_ERR
    JSON_ERR -->|"skip bad lines, continue"| API_RETRY
    API_RETRY -->|"skip, continue to result"| LAST_WINS
    LAST_WINS -->|"no result"| EXHAUST
    EXHAUST -->|"empty_output / unparseable"| T_EMPTY
    LAST_WINS -->|"result found"| FLAT_DETECT
    FLAT_DETECT -->|"yes"| CTX_FLAG
    FLAT_DETECT -->|"no"| ERR_DETECT
    ERR_DETECT -->|"yes"| CTX_FLAG
    CTX_FLAG -->|"needs_retry=True"| T_CTX
    ERR_DETECT -->|"no"| RC_CHECK
    RC_CHECK -->|"nonzero (test 7,8,10)"| INTR_GATE
    INTR_GATE -->|"yes → no retry"| T_INTR
    INTR_GATE -->|"no"| T_EMPTY
    RC_CHECK -->|"zero"| KILL_ANOM
    KILL_ANOM -->|"anomaly → RESUME retry"| T_EMPTY
    KILL_ANOM -->|"no anomaly"| BUDGET
    BUDGET -->|"exceeded"| T_BUDGET
    BUDGET -->|"ok"| ZERO_WRITE
    ZERO_WRITE -->|"violation"| T_ZERO
    ZERO_WRITE -->|"ok"| T_SUCCESS

    class START newComponent;
    class EMPTY_CHECK,FLAT_DETECT,ERR_DETECT,RC_CHECK,KILL_ANOM,INTR_GATE stateNode;
    class JSON_ERR,API_RETRY,LAST_WINS,EXHAUST,CTX_FLAG handler;
    class BUDGET,ZERO_WRITE detector;
    class T_SUCCESS,T_CTX,T_EMPTY,T_INTR,T_BUDGET,T_ZERO terminal;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite (new) — exercises all failure paths |
| Teal | Decision Gates | Key detection and routing decisions |
| Orange | Handler | Parse resilience processing and flag setting |
| Red | Guard | Post-classification safety guards (budget, zero-write) |
| Dark Blue | Terminal | Final SkillResult outcome states |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    TEST["★ test_session_classification_e2e.py<br/>━━━━━━━━━━<br/>10 scenarios assert field contracts<br/>across all classification paths"]

    subgraph ParseState ["INIT_ONLY — Set by Parser, Never Overwritten"]
        direction LR
        CTX_EX["jsonl_context_exhausted<br/>━━━━━━━━━━<br/>flat assistant → True<br/>read by _is_context_exhausted()"]
        RC["returncode / termination<br/>━━━━━━━━━━<br/>from SubprocessResult<br/>used in all compute_* gates"]
        SID["session_id<br/>━━━━━━━━━━<br/>from result record<br/>passed through unchanged"]
    end

    subgraph DerivedState ["DERIVED — Computed, Not Stored During Parse"]
        direction TB
        SUCCESS_D["success<br/>━━━━━━━━━━<br/>returncode=0 AND content gates<br/>must be False if needs_retry=True"]
        RETRY_D["needs_retry + retry_reason<br/>━━━━━━━━━━<br/>RESUME / ZERO_WRITES / etc.<br/>only valid pair if needs_retry=True"]
        SUBTYPE_D["subtype (normalized)<br/>━━━━━━━━━━<br/>'success' / 'context_exhaustion'<br/>/ 'interrupted' / etc."]
    end

    subgraph Contracts ["CONTRACT ENFORCEMENT GATES"]
        direction TB
        CONTRA_GATE{"Contradiction Guard<br/>━━━━━━━━━━<br/>success=True AND<br/>needs_retry=True?"}
        INTR_GATE{"Interrupted Gate<br/>━━━━━━━━━━<br/>subtype=interrupted AND<br/>rc != 0?"}
        CTX_GATE{"Context Exhaustion<br/>━━━━━━━━━━<br/>jsonl_context_exhausted OR<br/>marker in errors[]?"}
        BUDGET_GATE{"Budget Guard<br/>━━━━━━━━━━<br/>consecutive failures<br/>> budget max?"}
    end

    subgraph ResumeStates ["RESUME SAFETY — needs_retry contract"]
        direction LR
        RESUME_OK(["needs_retry=True<br/>retry_reason=RESUME<br/>━━━━━━━━━━<br/>context_exhaustion path"])
        NO_RETRY(["needs_retry=False<br/>retry_reason=NONE<br/>━━━━━━━━━━<br/>interrupted + rc!=0 path"])
        BUDGET_STOP(["needs_retry=False<br/>retry_reason=BUDGET_EXHAUSTED<br/>━━━━━━━━━━<br/>terminal, no more retries"])
    end

    TEST -->|"asserts all contracts"| CTX_EX
    TEST --> RC
    TEST --> SID

    CTX_EX -->|"read by"| CTX_GATE
    RC -->|"read by"| INTR_GATE
    RC -->|"read by"| CONTRA_GATE

    CTX_GATE -->|"exhausted → needs_retry=True"| RETRY_D
    CTX_GATE -->|"not exhausted"| INTR_GATE
    INTR_GATE -->|"interrupted+rc!=0 → terminal"| NO_RETRY
    INTR_GATE -->|"other"| CONTRA_GATE
    CONTRA_GATE -->|"contradiction → demote success"| SUCCESS_D
    CONTRA_GATE -->|"consistent"| SUCCESS_D
    RETRY_D --> BUDGET_GATE
    SUCCESS_D --> BUDGET_GATE
    SUBTYPE_D --> BUDGET_GATE
    BUDGET_GATE -->|"exceeded → clamp"| BUDGET_STOP
    BUDGET_GATE -->|"within budget"| RESUME_OK

    class TEST newComponent;
    class CTX_EX,RC,SID detector;
    class SUCCESS_D,RETRY_D,SUBTYPE_D phase;
    class CTX_GATE,INTR_GATE,CONTRA_GATE,BUDGET_GATE stateNode;
    class RESUME_OK,NO_RETRY,BUDGET_STOP cli;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Green | New Component | ★ E2E test suite — asserts all field contracts |
| Red | INIT_ONLY | Fields set by parser, never overwritten |
| Purple | Derived | Fields computed from classification, not stored during parse |
| Teal | Gates | Contract enforcement decision points |
| Dark Blue | Resume States | Terminal resume-safety outcomes |

Closes #608

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-608-20260405-085643-660865/.autoskillit/temp/make-plan/test_session_failure_classification_with_api_simulator_plan_2026-04-05_090300.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 31 | 22.4k | 812.6k | 59.1k | 1 | 12m 6s |
| verify | 21 | 17.2k | 863.3k | 66.7k | 1 | 9m 28s |
| implement | 2.5k | 9.4k | 1.1M | 48.2k | 1 | 5m 43s |
| fix | 21 | 7.3k | 703.0k | 42.4k | 1 | 7m 38s |
| audit_impl | 10 | 7.4k | 139.9k | 39.6k | 1 | 3m 29s |
| open_pr | 47 | 27.2k | 2.2M | 74.8k | 1 | 10m 44s |
| **Total** | 2.7k | 90.9k | 5.8M | 330.8k | | 49m 11s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rect Changes (#623)

## Summary

When `implement-worktree-no-merge` runs and the model ignores
instructions to create a worktree (via `git worktree add`), it edits
files directly in the clone directory. This leaves dirty uncommitted
changes (or direct commits) on the clone's branch. On retry, the next
session inherits a contaminated working tree.

This plan adds a **clone contamination guard** to the headless execution
pipeline. The guard:
1. Snapshots the clone's HEAD SHA before each worktree-based skill
session
2. After a failed session where no worktree was created, detects
contamination (uncommitted changes or direct commits)
3. Reverts the clone to its pre-session state
4. Logs the cleanup for pipeline observability

Key architectural insight: `EnterWorktree` does not exist in this
codebase. Worktree creation uses standard `git worktree add` via Bash,
and success is signaled by emitting `worktree_path = <path>` tokens in
assistant messages. Detection of "no worktree created" is therefore: no
`worktree_path` token in `session.assistant_messages`.

## Requirements

### Snapshot (SNAP)

- **REQ-SNAP-001:** The system must capture the clone HEAD SHA before
each `run_skill` invocation for worktree-based skills
(implement-worktree-no-merge, retry-worktree).
- **REQ-SNAP-002:** The system must capture the clone working tree
cleanliness state (clean/dirty) before each `run_skill` invocation for
worktree-based skills.

### Detection (DET)

- **REQ-DET-001:** The system must detect uncommitted changes in the
clone CWD after a worktree-based skill session that was adjudicated as
failure.
- **REQ-DET-002:** The system must detect direct commits in the clone
(HEAD differs from pre-session SHA) after a worktree-based skill session
that was adjudicated as failure.
- **REQ-DET-003:** The system must verify whether a worktree was created
during the session by checking for a `worktree_path` token in
`session.assistant_messages`.

### Revert (REV)

- **REQ-REV-001:** The system must revert uncommitted changes in the
clone when contamination is detected (git checkout + git clean).
- **REQ-REV-002:** The system must revert direct commits in the clone
when contamination is detected (git reset to pre-session SHA).
- **REQ-REV-003:** The revert must only execute when all three
conditions are met: worktree-based skill, adjudicated failure, and no
`worktree_path` token in the session's assistant messages.

### Observability (OBS)

- **REQ-OBS-001:** The system must log all contamination detection and
revert actions in the audit log with sufficient detail for pipeline
visibility.
- **REQ-OBS-002:** The audit log entry must include the pre-session SHA,
post-session SHA, list of contaminated files, and revert action taken.
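The guard logic behind SNAP/DET/REV can be sketched as pure helpers. The git plumbing (`git rev-parse HEAD`, `git status --porcelain`) is assumed to be run by the caller, so these stand-in helpers stay testable without a repository; `CloneSnapshot` and both function names are hypothetical:

```python
# Pure-logic sketch of the clone contamination guard; CloneSnapshot and
# both helpers are hypothetical. The caller is assumed to supply the
# output of `git rev-parse HEAD` and `git status --porcelain`.
from dataclasses import dataclass

@dataclass(frozen=True)
class CloneSnapshot:
    head_sha: str  # REQ-SNAP-001: captured before run_skill

def detect_contamination(snapshot: CloneSnapshot, post_sha: str,
                         porcelain: str) -> list[str]:
    """Return contaminated paths: dirty files from porcelain output
    (REQ-DET-001), plus a marker entry for direct commits (REQ-DET-002)."""
    # Porcelain lines are "XY <path>"; the path starts at column 3.
    files = [line[3:] for line in porcelain.splitlines() if line.strip()]
    if post_sha != snapshot.head_sha:
        files.append(f"<HEAD moved {snapshot.head_sha[:8]} -> {post_sha[:8]}>")
    return files

def revert_commands(snapshot: CloneSnapshot) -> list[list[str]]:
    """git invocations restoring the pre-session clone state
    (REQ-REV-001 / REQ-REV-002)."""
    return [
        ["git", "reset", "--hard", snapshot.head_sha],  # drop direct commits
        ["git", "clean", "-fd"],                        # drop untracked files
    ]
```

The return values of both helpers carry exactly the fields REQ-OBS-002 requires in the audit entry: pre/post SHA, contaminated files, and the revert action taken.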

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START(["● run_headless_core()"])

    subgraph PreSession ["★ Pre-Session Snapshot"]
        direction TB
        IS_WT{"★ is_worktree_skill?<br/>━━━━━━━━━━<br/>implement-worktree-no-merge<br/>or retry-worktree in cmd"}
        IS_CLONE{"★ not is_git_worktree?<br/>━━━━━━━━━━<br/>cwd is clone root,<br/>not a worktree"}
        SNAP["★ snapshot_clone_state()<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>→ CloneSnapshot(head_sha)"]
    end

    subgraph Session ["Existing Session Lifecycle"]
        direction TB
        RUN["● runner() subprocess<br/>━━━━━━━━━━<br/>Headless Claude CLI"]
        BUILD["● _build_skill_result()<br/>━━━━━━━━━━<br/>Adjudication + gates<br/>worktree_path always extracted"]
    end

    subgraph PostGuard ["★ Post-Session Clone Guard"]
        direction TB
        CHK_SNAP{"★ snapshot captured?<br/>━━━━━━━━━━<br/>_clone_snapshot is not None"}
        CHK_SUCC{"★ skill_result.success?"}
        CHK_WT{"★ worktree_path set?<br/>━━━━━━━━━━<br/>skill_result.worktree_path<br/>is not None"}
        DETECT["★ detect_contamination()<br/>━━━━━━━━━━<br/>git rev-parse HEAD → post_sha<br/>git status --porcelain → files"]
        CHK_DIRTY{"★ contamination found?<br/>━━━━━━━━━━<br/>post_sha ≠ pre_sha<br/>OR dirty files"}
        REVERT["★ revert_contamination()<br/>━━━━━━━━━━<br/>git reset --hard pre_sha<br/>git clean -fd"]
        AUDIT["★ audit.record_failure()<br/>━━━━━━━━━━<br/>subtype=clone_contamination<br/>RetryReason.CLONE_CONTAMINATION"]
    end

    FLUSH["● flush_session_log()<br/>━━━━━━━━━━<br/>★ clone_contamination_reverted<br/>→ summary.json"]
    RETURN(["● return skill_result"])
    SKIP_SNAP(["skip → _clone_snapshot=None"])

    START --> IS_WT
    IS_WT -->|"no: not a worktree skill"| SKIP_SNAP
    IS_WT -->|"yes"| IS_CLONE
    IS_CLONE -->|"already a worktree CWD"| SKIP_SNAP
    IS_CLONE -->|"clone root CWD"| SNAP
    SNAP --> RUN
    SKIP_SNAP --> RUN
    RUN --> BUILD
    BUILD --> CHK_SNAP
    CHK_SNAP -->|"no snapshot"| FLUSH
    CHK_SNAP -->|"snapshot exists"| CHK_SUCC
    CHK_SUCC -->|"success=True"| FLUSH
    CHK_SUCC -->|"success=False"| CHK_WT
    CHK_WT -->|"worktree created"| FLUSH
    CHK_WT -->|"no worktree"| DETECT
    DETECT --> CHK_DIRTY
    CHK_DIRTY -->|"clean"| FLUSH
    CHK_DIRTY -->|"contaminated"| REVERT
    REVERT --> AUDIT
    AUDIT --> FLUSH
    FLUSH --> RETURN

    class START,RETURN,SKIP_SNAP terminal;
    class IS_WT,IS_CLONE,CHK_SNAP,CHK_SUCC,CHK_WT,CHK_DIRTY stateNode;
    class RUN,BUILD,FLUSH handler;
    class SNAP,DETECT,REVERT,AUDIT newComponent;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Entry/exit points of `run_headless_core` |
| Teal | State/Decision | Routing decisions that control guard activation |
| Orange | Handler | Existing subprocess, adjudication, and telemetry nodes |
| Green | New Component | New clone contamination guard components (★) |

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    subgraph L3 ["L3 — SERVER (existing, unchanged)"]
        direction LR
        SERVER["server/tools_execution.py<br/>━━━━━━━━━━<br/>run_skill, run_cmd handlers"]
    end

    subgraph L1 ["L1 — EXECUTION"]
        direction TB
        HEADLESS["● execution/headless.py<br/>━━━━━━━━━━<br/>run_headless_core()<br/>_build_skill_result()"]
        CLONE_GUARD["★ execution/clone_guard.py<br/>━━━━━━━━━━<br/>is_worktree_skill()<br/>snapshot_clone_state()<br/>check_and_revert_clone_contamination()"]
        SESSION_LOG["● execution/session_log.py<br/>━━━━━━━━━━<br/>flush_session_log()<br/>★ clone_contamination_reverted"]
        COMMANDS["execution/commands.py<br/>━━━━━━━━━━<br/>build_full_headless_cmd()"]
        SESSION["execution/session.py<br/>━━━━━━━━━━<br/>ClaudeSessionResult"]
    end

    subgraph L0 ["L0 — CORE (zero autoskillit imports)"]
        direction TB
        ENUMS["● core/_type_enums.py<br/>━━━━━━━━━━<br/>RetryReason enum<br/>★ CLONE_CONTAMINATION added"]
        TYPES["core/types.py<br/>━━━━━━━━━━<br/>SkillResult, FailureRecord<br/>AuditStore, SubprocessRunner"]
        PATHS["core/paths.py<br/>━━━━━━━━━━<br/>is_git_worktree()"]
        LOGGING["core/logging.py<br/>━━━━━━━━━━<br/>get_logger()"]
        CORE_INIT["core/__init__.py<br/>━━━━━━━━━━<br/>Re-exports all L0 surface"]
    end

    subgraph Ext ["EXTERNAL (stdlib)"]
        STDLIB["dataclasses, pathlib<br/>datetime, typing"]
    end

    SERVER -->|"imports run_headless"| HEADLESS
    HEADLESS -->|"★ imports 3 functions"| CLONE_GUARD
    HEADLESS -->|"imports"| COMMANDS
    HEADLESS -->|"imports"| SESSION
    HEADLESS -->|"imports"| SESSION_LOG
    HEADLESS -->|"imports core surface"| CORE_INIT
    CLONE_GUARD -->|"★ imports FailureRecord<br/>RetryReason, SkillResult<br/>get_logger, is_git_worktree"| CORE_INIT
    SESSION_LOG -->|"imports"| LOGGING
    CORE_INIT -->|"re-exports"| ENUMS
    CORE_INIT -->|"re-exports"| TYPES
    CORE_INIT -->|"re-exports"| PATHS
    CORE_INIT -->|"re-exports"| LOGGING
    TYPES -->|"imports RetryReason"| ENUMS
    CLONE_GUARD -->|"stdlib only"| STDLIB
    ENUMS -->|"stdlib only"| STDLIB

    class SERVER cli;
    class HEADLESS,SESSION_LOG,COMMANDS,SESSION handler;
    class CLONE_GUARD newComponent;
    class ENUMS,TYPES,PATHS,LOGGING,CORE_INIT stateNode;
    class STDLIB integration;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Server (L3) | MCP tool handlers — top application layer |
| Orange | Execution (L1) | Service/orchestration layer modules |
| Green | New Module | `clone_guard.py` — new L1 execution module (★) |
| Teal | Core (L0) | Stable vocabulary/type layer — high fan-in |
| Red | External | Standard library dependencies |

Closes #617

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-617-20260405-085643-202786/.autoskillit/temp/make-plan/clone_contamination_guard_plan_2026-04-05_090600.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 6.9k | 23.4k | 1.7M | 82.7k | 1 | 10m 39s |
| verify | 33 | 20.7k | 1.4M | 55.6k | 1 | 8m 39s |
| implement | 81 | 24.3k | 4.4M | 89.7k | 1 | 10m 6s |
| fix | 40 | 14.4k | 1.7M | 62.9k | 1 | 9m 17s |
| audit_impl | 13 | 11.0k | 288.2k | 45.3k | 1 | 4m 14s |
| open_pr | 28 | 20.1k | 1.0M | 55.4k | 1 | 7m 18s |
| **Total** | 7.1k | 113.9k | 10.5M | 391.6k | | 50m 15s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rtifact Merge Phase (#625)

## Summary

Add a six-step archival phase to the end of the research recipe
(`research.yaml`) that separates research artifacts from experimental
code before completion. After all review cycles, re-runs, and CI checks
finish, the new phase: (1) captures the experiment branch name, (2)
creates a clean artifact-only branch containing only `research/` from a
temporary worktree, (3) opens an artifact PR targeting the base branch,
(4) tags the full experiment branch under `archive/research/` for
permanent reference, (5) closes the original experiment PR with
cross-reference links, then (6) proceeds to `research_complete`. Every
archival step degrades gracefully — `on_failure` routes to
`research_complete` so the pipeline never blocks on archival failures.

## Requirements

### SPLIT — Artifact Extraction

- **REQ-SPLIT-001:** The recipe must create a new branch from the base
branch (e.g., main) containing only the `research/` directory contents
from the experiment branch, with no production source file changes.
- **REQ-SPLIT-002:** The artifact extraction must use `git checkout
<experiment-branch> -- research/` (or equivalent) to copy only the
research directory's file state, not replay commit history.
- **REQ-SPLIT-003:** The artifact-only branch must produce a single
clean commit with a descriptive message referencing the experiment name.
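
The SPLIT requirements can be sketched in Python as a small helper that drives git from a temporary worktree. This is an illustration only, not the recipe's actual implementation — the function name, signature, and commit message are hypothetical; the mechanism (branch off base, `git checkout <experiment-branch> -- research/`, single commit) follows REQ-SPLIT-001 through REQ-SPLIT-003:

```python
import os
import subprocess
import tempfile

def extract_research_artifacts(repo: str, base: str,
                               experiment_branch: str,
                               artifact_branch: str) -> None:
    """Create artifact_branch off base containing only the research/
    file state from the experiment branch (no replayed history)."""
    def git(*args: str, cwd: str = repo) -> None:
        subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

    with tempfile.TemporaryDirectory() as tmp:
        wt = os.path.join(tmp, "wt")
        # Temporary worktree on a fresh branch from base; the main
        # checkout is never disturbed (REQ-SPLIT-001).
        git("worktree", "add", "-b", artifact_branch, wt, base)
        try:
            # Copy only the research/ directory's file state, not its
            # commit history (REQ-SPLIT-002).
            git("checkout", experiment_branch, "--", "research/", cwd=wt)
            git("add", "research/", cwd=wt)
            # One clean commit referencing the experiment (REQ-SPLIT-003).
            git("commit", "-m",
                f"research: archive {experiment_branch} artifacts", cwd=wt)
        finally:
            git("worktree", "remove", "--force", wt)
```

The worktree keeps the extraction isolated from the clone's checked-out branch, which matters when the recipe is mid-pipeline.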

### PR — Artifact PR

- **REQ-PR-001:** The recipe must open a PR targeting the base branch
with the artifact-only branch, referencing the original experiment PR
number and summarizing key findings in the body.
- **REQ-PR-002:** The artifact PR must contain zero changes to
production source files — only files under `research/`.

### TAG — Branch Archival

- **REQ-TAG-001:** The recipe must create an annotated git tag with the
prefix `archive/research/` capturing the final state of the experiment
branch (after all reviews, re-runs, and CI pass).
- **REQ-TAG-002:** The annotated tag message must include the experiment
name and a note that the report was merged via the artifact PR.
- **REQ-TAG-003:** The tag must be pushed to the remote before the
experiment branch is cleaned up.

### CLOSE — Experiment PR Closure

- **REQ-CLOSE-001:** The recipe must close the original experiment PR
with a comment linking to the artifact PR, the archive tag, and any
follow-up implementation issues.
- **REQ-CLOSE-002:** The closure comment must explain why the PR was not
merged (experimental code in production source files) and where the
research record is preserved.

### ORDER — Execution Ordering

- **REQ-ORDER-001:** The archival phase must execute only after all
review cycles, review resolutions, experiment re-runs (per #618), and CI
checks have completed successfully.
- **REQ-ORDER-002:** The archival phase must be the final phase before
`research_complete`, not interleaved with review or re-validation steps.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    subgraph PostReview ["● Post-Review Phase (modified routing)"]
        direction TB
        GPR{"guard_pr_url<br/>━━━━━━━━━━<br/>pr_url set?"}
        RRP["● review_research_pr<br/>━━━━━━━━━━<br/>run_skill: review-pr<br/>skip_when_false: review_pr"]
        RRR["● resolve_research_review<br/>━━━━━━━━━━<br/>run_skill: resolve-review<br/>retries: 2"]
        CE{"check_escalations<br/>━━━━━━━━━━<br/>needs_rerun?"}
        RERUN["re_run_experiment<br/>━━━━━━━━━━<br/>run-experiment --adjust"]
        REWRITE["re_write_report<br/>━━━━━━━━━━<br/>write-report"]
        RETEST["re_test<br/>━━━━━━━━━━<br/>test_check"]
        REPUSH["● re_push_research<br/>━━━━━━━━━━<br/>git push"]
    end

    subgraph Archival ["★ Archival Phase (new)"]
        direction TB
        BA{"★ begin_archival<br/>━━━━━━━━━━<br/>pr_url truthy?"}
        CEB["★ capture_experiment_branch<br/>━━━━━━━━━━<br/>git rev-parse HEAD<br/>captures: experiment_branch"]
        CAB["★ create_artifact_branch<br/>━━━━━━━━━━<br/>worktree + checkout research/<br/>captures: artifact_branch"]
        OAP["★ open_artifact_pr<br/>━━━━━━━━━━<br/>gh pr create (research/ only)<br/>captures: artifact_pr_url"]
        TEB["★ tag_experiment_branch<br/>━━━━━━━━━━<br/>git tag -a archive/research/*<br/>captures: archive_tag"]
        CEP["★ close_experiment_pr<br/>━━━━━━━━━━<br/>gh pr close + comment"]
    end

    RC([research_complete<br/>━━━━━━━━━━<br/>action: stop])

    GPR -->|"pr_url empty"| RC
    GPR -->|"pr_url truthy"| RRP
    RRP -->|"changes_requested"| RRR
    RRP -->|"needs_human / default / fail"| BA
    RRR -->|"success"| CE
    RRR -->|"exhausted / fail"| BA
    CE -->|"needs_rerun=true"| RERUN
    CE -->|"default"| REPUSH
    RERUN --> REWRITE --> RETEST --> REPUSH
    REPUSH -->|"success / fail"| BA

    BA -->|"pr_url truthy"| CEB
    BA -->|"default"| RC
    CEB -->|"success"| CAB
    CEB -->|"fail"| RC
    CAB -->|"success"| OAP
    CAB -->|"fail"| RC
    OAP -->|"success"| TEB
    OAP -->|"fail"| RC
    TEB -->|"success"| CEP
    TEB -->|"fail"| RC
    CEP -->|"success / fail"| RC

    class GPR,CE,BA stateNode;
    class RRP,RRR,RERUN,REWRITE,RETEST,REPUSH handler;
    class CEB,CAB,OAP,TEB,CEP newComponent;
    class RC terminal;
```

**Color Legend:**

| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | `research_complete` stop state |
| Teal | State/Route | Decision and routing steps (guard_pr_url, check_escalations, begin_archival) |
| Orange | Handler | Existing processing steps — `●` marks modified routing targets |
| Green | New Component | Six new archival steps (`★`) — linear chain with graceful degradation |

Closes #621

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-101015-593986/.autoskillit/temp/make-plan/research_recipe_post_completion_archival_plan_2026-04-05_101500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.2k | 36.6k | 1.4M | 90.3k | 1 | 16m 17s |
| verify | 32 | 25.8k | 1.2M | 55.5k | 1 | 14m 5s |
| implement | 48 | 14.0k | 1.9M | 50.5k | 1 | 5m 52s |
| audit_impl | 16 | 9.7k | 178.9k | 55.3k | 2 | 4m 31s |
| open_pr | 22 | 11.7k | 690.1k | 46.2k | 1 | 4m 26s |
| **Total** | 2.3k | 97.7k | 5.4M | 297.8k | | 45m 13s |

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

- Add `Configure git auth for private deps` step to
`patch-bump-integration.yml` and `version-bump.yml` before `uv lock`
runs
- Fixes authentication failure when resolving the private
`api-simulator` git dependency added in PR #613
- Mirrors the existing auth pattern already present in `tests.yml` (line
76)

## Root Cause

PR #613 added `api-simulator` as a private git dependency in
`pyproject.toml`. The `tests.yml` workflow was updated with git auth,
but both version-bump workflows were missed. Every PR merged to
`integration` since then fails at the `uv lock` step with:

```
fatal: could not read Username for 'https://github.com': terminal prompts disabled
```

## Test plan

- [ ] This PR's own CI passes (tests.yml)
- [ ] After merge, the patch-bump workflow should succeed — verify by
checking the `bump-patch` check on this PR's merge commit
- [ ] Re-run a recent failed bump-patch workflow to confirm the fix

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

Fixes a 3-iteration ejection loop in the merge queue pipeline by
introducing ejection-cause enrichment (`ejected_ci_failure` state and
`ejection_cause` field in `wait_for_merge_queue`), a CI gate after every
force-push (`ci_watch_post_queue_fix` step), and two post-rebase
manifest validation gates (language-aware validity check and duplicate
key scan) in `resolve-merge-conflicts`. Closes all six gaps identified
in #627: blind CI ejection routing, missing CI gate after re-push,
absent manifest/semantic validation, and missing `head_sha` in CI
results.

<details>
<summary>Individual Group Plans</summary>

### Group 1: Implementation Plan: Queue Ejection Loop Fix — PART A ONLY

This part addresses the Python code layer for the queue ejection loop
fix (Gaps 2 and 5 from issue #627).

**Gap 2** — `execution/merge_queue.py` currently returns
`pr_state="ejected"` for every ejection regardless of cause. When
GitHub's CI fails on a merge-group commit, the recipe cannot distinguish
a CI failure ejection from a conflict ejection, so it retries conflict
resolution indefinitely (no-op rebase loop). The fix: when the ejection
is confirmed and `checks_state == "FAILURE"`, return
`pr_state="ejected_ci_failure"` plus an `ejection_cause="ci_failure"`
field, allowing recipe `on_result` routing to send CI failures directly
to `diagnose_ci` instead of `queue_ejected_fix`.

**Gap 5** — `server/tools_ci.py` infers `head_sha` from `git rev-parse
HEAD` but never includes it in the JSON response. Recipe orchestrators
cannot verify that CI results correspond to the current HEAD after a
force-push. The fix: include `head_sha` in the `wait_for_ci` return dict
when it was resolved.

### Group 2: Implementation Plan: Queue Ejection Loop Fix — PART B ONLY

This part addresses the recipe and skill layer of the queue ejection
loop fix (Gaps 1, 3, 4, 6 from issue #627). Part A (code layer) must be
implemented first — this part routes on `pr_state="ejected_ci_failure"`
which Part A introduces.

**Gap 1** — `re_push_queue_fix` routes directly to `reenter_merge_queue`
after force-push, bypassing CI. Fix: insert a new
`ci_watch_post_queue_fix` step between `re_push_queue_fix` and
`reenter_merge_queue`, mirroring the existing `ci_watch` step.

**Gap 6** — `wait_for_queue` routes all `ejected` states to
`queue_ejected_fix` (conflict resolution), even when the ejection was
caused by a CI failure that conflict resolution cannot fix. Fix: add an
`ejected_ci_failure` route before `ejected` in
`wait_for_queue.on_result`, routing to `diagnose_ci` instead.

**Gap 3** — `resolve-merge-conflicts` SKILL.md runs only `pre-commit run
--all-files` post-rebase. Fix: add Step 5a — language-detected manifest
validation using fast non-compiling checks.

**Gap 4** — Even a clean rebase can produce duplicate keys when both
branches independently added the same dependency. Fix: add Step 5b —
targeted duplicate key scan in TOML/JSON manifest files.
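
For the JSON half of Step 5b, the scan hinges on `json.loads`'s `object_pairs_hook`, which sees every key before deduplication. A minimal sketch (function name is illustrative; the SKILL.md step also covers TOML dependency sections, omitted here):

```python
import json

def find_duplicate_json_keys(text: str) -> list[str]:
    """Report keys that appear more than once in any object of a JSON
    manifest. Plain json.loads would silently keep the last value, so a
    clean rebase can hide a duplicated dependency entry."""
    duplicates: list[str] = []

    def hook(pairs):
        # Called once per JSON object with the raw (key, value) pairs,
        # duplicates included.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates
```

Any non-empty result triggers the `git rebase --abort` escalation path rather than a blind re-push.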

Applied to: `recipes/implementation.yaml`, `recipes/remediation.yaml`,
`recipes/implementation-groups.yaml`,
`skills_extended/resolve-merge-conflicts/SKILL.md`.

</details>

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([wait_for_queue\nrecipe step])
    END_OK([release_issue_success])
    END_FAIL([release_issue_failure])
    END_TIMEOUT([release_issue_timeout])
    END_DIAG([diagnose_ci])

    subgraph MQPoll ["● Merge Queue Watcher (merge_queue.py)"]
        direction TB
        POLL["poll GitHub GraphQL\n━━━━━━━━━━\nPR state + queue state\n+ checks_state"]
        MERGED{"merged?"}
        CI_FAIL{"● checks_state\n== 'FAILURE'?"}
        CONFIRM["confirmation window\n━━━━━━━━━━\nnot_in_queue_cycles++"]
        CONFIRMED{"cycles ≥ threshold?"}
        STALL{"stall retries\nexhausted?"}
        TIMEOUT{"deadline\nexceeded?"}
    end

    subgraph EjectRoute ["● Recipe Ejection Routing (implementation.yaml)"]
        direction TB
        ROUTE{"● pr_state?"}
        REENROLL["reenroll_stalled_pr\n━━━━━━━━━━\ntoggle_auto_merge tool"]
    end

    subgraph ConflictFix ["● Conflict Fix Sub-Flow (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC{"escalation_required?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\npush_to_remote force=true"]
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci tool\ntimeout=300s"]
        CI_PASS{"CI pass?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ndiagnose-ci skill"]
        REENTER["reenter_merge_queue\n━━━━━━━━━━\ngh pr merge --squash --auto"]
    end

    subgraph WFCITool ["● wait_for_ci tool handler (tools_ci.py)"]
        direction LR
        INFER["infer head_sha\n━━━━━━━━━━\ngit rev-parse HEAD"]
        CIWAIT["ci_watcher.wait(scope)"]
        ENRICH["● result includes head_sha\n━━━━━━━━━━\nverifies SHA matches HEAD\nafter force-push"]
    end

    %% MAIN FLOW %%
    START --> POLL
    POLL --> MERGED
    MERGED -->|"yes"| END_OK
    MERGED -->|"no"| CONFIRM
    CONFIRM --> CONFIRMED
    CONFIRMED -->|"no"| STALL
    CONFIRMED -->|"yes (not in queue)"| CI_FAIL
    STALL -->|"yes"| END_TIMEOUT
    STALL -->|"no"| TIMEOUT
    TIMEOUT -->|"yes"| END_TIMEOUT
    TIMEOUT -->|"no"| POLL

    CI_FAIL -->|"yes"| ROUTE
    CI_FAIL -->|"no"| ROUTE

    ROUTE -->|"ejected_ci_failure\n(● new route)"| END_DIAG
    ROUTE -->|"ejected"| QFIX
    ROUTE -->|"stalled"| REENROLL
    ROUTE -->|"timeout"| END_TIMEOUT
    REENROLL -->|"success"| START
    REENROLL -->|"failure"| END_FAIL

    QFIX --> ESC
    ESC -->|"true"| END_FAIL
    ESC -->|"false"| REPUSH
    REPUSH -->|"failure"| END_FAIL
    REPUSH -->|"success"| CI_WATCH

    CI_WATCH --> INFER --> CIWAIT --> ENRICH
    ENRICH --> CI_PASS
    CI_PASS -->|"failure"| DETECT
    CI_PASS -->|"success"| REENTER
    DETECT --> END_FAIL
    REENTER -->|"success"| START
    REENTER -->|"failure"| END_FAIL

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class END_OK,END_FAIL,END_TIMEOUT,END_DIAG terminal;
    class POLL,CONFIRM handler;
    class MERGED,CONFIRMED,STALL,TIMEOUT stateNode;
    class CI_FAIL,ROUTE,ESC,CI_PASS detector;
    class QFIX,REPUSH,REENTER handler;
    class REENROLL,DETECT handler;
    class CI_WATCH,INFER,CIWAIT,ENRICH newComponent;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph MQResult ["wait_for_merge_queue Return Dict (merge_queue.py)"]
        direction TB
        PS["● pr_state : str\n━━━━━━━━━━\nmerged | ejected\nejected_ci_failure | stalled\ntimeout | error\n(bare literals, no StrEnum)"]
        SUC["success : bool\n━━━━━━━━━━\ntrue only for 'merged'"]
        REASON["reason : str\n━━━━━━━━━━\nhuman-readable\nalways present"]
        STALL["stall_retries_attempted : int\n━━━━━━━━━━\nalways present\nexcept 'error' path"]
        EC["● ejection_cause : str\n━━━━━━━━━━\n'ci_failure' only\nwhen pr_state==ejected_ci_failure\nCONDITIONAL FIELD"]
    end

    subgraph InternalPoll ["PRFetchState — Internal Polling State (not returned)"]
        direction LR
        CHECKS["checks_state : str|None\n━━━━━━━━━━\nGitHub StatusCheckRollup\nNone = no checks configured"]
        INQUEUE["in_queue : bool\n━━━━━━━━━━\nPR in mergeQueue.entries"]
        QSTATE["queue_state : str|None\n━━━━━━━━━━\nUNMERGEABLE | AWAITING_CHECKS\n| LOCKED | null"]
    end

    subgraph Gate1 ["● Ejection Decision Gate (merge_queue.py)"]
        direction TB
        CFAIL{"checks_state\n== 'FAILURE'?"}
        SET_ECI["● set pr_state='ejected_ci_failure'\n━━━━━━━━━━\nejection_cause='ci_failure'\nINJECTED into result"]
        SET_EJ["set pr_state='ejected'\n━━━━━━━━━━\nno ejection_cause field\n(absent, not null)"]
    end

    subgraph CIScope ["CIRunScope — Frozen Input Scope (core/types)"]
        direction LR
        WF["workflow : str|None\n━━━━━━━━━━\ne.g. 'tests.yml'"]
        HS["● head_sha : str|None\n━━━━━━━━━━\ngit rev-parse HEAD\nor caller-supplied"]
    end

    subgraph CIResult ["● wait_for_ci Return Dict (tools_ci.py)"]
        direction TB
        RUNID["run_id : int|None\n━━━━━━━━━━\nGitHub Actions run ID"]
        CONC["conclusion : str\n━━━━━━━━━━\nsuccess|failure|cancelled\naction_required|timed_out\nno_runs|error|unknown"]
        FJOBS["failed_jobs : list\n━━━━━━━━━━\nalways present\nempty on billing errors"]
        HSHA["● head_sha : str\n━━━━━━━━━━\nCONDITIONAL: present only\nwhen scope.head_sha truthy\ninjected by tool layer"]
    end

    subgraph ConsumerGate ["Recipe Routing Gate (on_result)"]
        direction TB
        ROUTE{"pr_state value?"}
        R1["ejected_ci_failure\n→ diagnose_ci"]
        R2["ejected\n→ queue_ejected_fix"]
        R3["merged|stalled|timeout\n→ other routes"]
    end

    %% FLOW %%
    CHECKS --> CFAIL
    INQUEUE --> CFAIL
    QSTATE --> CFAIL
    CFAIL -->|"FAILURE"| SET_ECI
    CFAIL -->|"other"| SET_EJ
    SET_ECI --> PS
    SET_ECI --> EC
    SET_EJ --> PS
    PS --> SUC
    PS --> REASON
    PS --> STALL

    HS --> CIResult
    WF --> CIResult
    RUNID --> CONC
    CONC --> FJOBS
    FJOBS --> HSHA

    PS --> ROUTE
    EC --> ROUTE
    ROUTE --> R1
    ROUTE --> R2
    ROUTE --> R3

    HSHA -.->|"verifies HEAD\nafter force-push"| R2

    %% CLASS ASSIGNMENTS %%
    class PS,EC,HSHA,SET_ECI,HS,CFAIL gap;
    class SUC,REASON,STALL,RUNID,CONC,FJOBS output;
    class CHECKS,INQUEUE,QSTATE,WF stateNode;
    class SET_EJ handler;
    class ROUTE,R1,R2,R3 detector;
    class InternalPoll phase;
```

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    END_OK([release_issue_success])
    END_FAIL([release_issue_failure\n━━━━━━━━━━\nhuman escalation\nclone preserved])
    END_DIAG([diagnose_ci])

    subgraph MQLoop ["● Merge Queue Poll Loop (merge_queue.py)"]
        direction TB
        POLL["GraphQL fetch\n━━━━━━━━━━\nPR + queue state"]
        POLL_ERR{"Exception\ncaught?"}
        TIMEOUT_CHK{"deadline\nexceeded?"}
        STALL_CHK{"stall retries\n≥ max (3)?"}
    end

    subgraph EjectGate ["● Ejection Classification Gate (merge_queue.py)"]
        direction TB
        EJECT_DECISION{"● checks_state\n== 'FAILURE'?"}
        CI_EJ["● ejected_ci_failure\n━━━━━━━━━━\nejection_cause=ci_failure\nskips conflict resolution"]
        CONF_EJ["ejected\n━━━━━━━━━━\nno cause field\nconflict resolution"]
    end

    subgraph StallBreaker ["Stall Circuit Breaker (merge_queue.py)"]
        direction LR
        TOGGLE["_toggle_auto_merge\n━━━━━━━━━━\ndisable → 2s → re-enable\nbackoff: 30/60/120s"]
        TOGGLE_ERR{"Exception\ncaught?"}
    end

    subgraph ConflictPath ["Conflict Resolution Path (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC_CHK{"escalation\nrequired?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\nforce-push"]
        REPUSH_FAIL{"push\nfailed?"}
    end

    subgraph CIGate ["● CI Gate After Re-Push (implementation.yaml + tools_ci.py)"]
        direction TB
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci, timeout=300s\nincludes head_sha"]
        CI_CONC{"conclusion\n== success?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ngit merge-base check\n(stale base?)"]
        DETECT_CHK{"stale\nbase?"}
        CI_CF["ci_conflict_fix\n━━━━━━━━━━\nresolve-merge-conflicts"]
    end

    subgraph ManifestGates ["● Post-Rebase Manifest Validation (SKILL.md)"]
        direction TB
        STEP5A["● Step 5a: manifest validity\n━━━━━━━━━━\ncargo metadata / node JSON.parse\nuv lock --check / tomllib"]
        STEP5A_CHK{"manifest\nvalid?"}
        STEP5B["● Step 5b: duplicate key scan\n━━━━━━━━━━\nTOML dep sections\nJSON object_pairs_hook"]
        STEP5B_CHK{"duplicates\nfound?"}
        REBASE_ABORT["git rebase --abort\n━━━━━━━━━━\nescalation_required=true"]
    end

    %% POLL LOOP FLOW %%
    POLL --> POLL_ERR
    POLL_ERR -->|"yes: log + retry"| POLL
    POLL_ERR -->|"no"| TIMEOUT_CHK
    TIMEOUT_CHK -->|"yes"| END_FAIL
    TIMEOUT_CHK -->|"no"| STALL_CHK
    STALL_CHK -->|"yes: stalled"| END_FAIL
    STALL_CHK -->|"no: stall attempt"| TOGGLE
    TOGGLE --> TOGGLE_ERR
    TOGGLE_ERR -->|"yes: log + increment"| STALL_CHK
    TOGGLE_ERR -->|"no: success"| POLL

    %% EJECTION GATE %%
    STALL_CHK -->|"ejection confirmed"| EJECT_DECISION
    EJECT_DECISION -->|"FAILURE"| CI_EJ
    EJECT_DECISION -->|"other"| CONF_EJ
    CI_EJ --> END_DIAG
    CONF_EJ --> QFIX

    %% CONFLICT PATH %%
    QFIX --> STEP5A
    STEP5A --> STEP5A_CHK
    STEP5A_CHK -->|"invalid"| REBASE_ABORT
    STEP5A_CHK -->|"valid"| STEP5B
    STEP5B --> STEP5B_CHK
    STEP5B_CHK -->|"duplicates"| REBASE_ABORT
    STEP5B_CHK -->|"clean"| ESC_CHK
    REBASE_ABORT --> ESC_CHK
    ESC_CHK -->|"true"| END_FAIL
    ESC_CHK -->|"false"| REPUSH
    REPUSH --> REPUSH_FAIL
    REPUSH_FAIL -->|"yes"| END_FAIL
    REPUSH_FAIL -->|"no"| CI_WATCH

    %% CI GATE %%
    CI_WATCH --> CI_CONC
    CI_CONC -->|"yes"| END_OK
    CI_CONC -->|"no"| DETECT
    DETECT --> DETECT_CHK
    DETECT_CHK -->|"yes: stale base"| CI_CF
    DETECT_CHK -->|"no: code failure"| END_DIAG
    CI_CF --> ESC_CHK

    %% CLASS ASSIGNMENTS %%
    class END_OK,END_FAIL,END_DIAG terminal;
    class POLL,TOGGLE handler;
    class POLL_ERR,TOGGLE_ERR,TIMEOUT_CHK,STALL_CHK gap;
    class EJECT_DECISION,CI_CONC,DETECT_CHK,STEP5A_CHK,STEP5B_CHK,ESC_CHK,REPUSH_FAIL detector;
    class CI_EJ,CONF_EJ,REBASE_ABORT output;
    class QFIX,REPUSH,CI_WATCH,DETECT,CI_CF handler;
    class STEP5A,STEP5B phase;
```

Closes #627

## Implementation Plan

Plan files:
-
`/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_a.md`
-
`/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_b.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 37 | 31.7k | 1.9M | 113.2k | 1 | 11m 19s |
| review | 3.4k | 5.6k | 147.3k | 41.5k | 1 | 5m 45s |
| verify | 44 | 35.4k | 1.9M | 144.8k | 2 | 11m 15s |
| implement | 100 | 33.5k | 4.6M | 123.5k | 2 | 12m 17s |
| audit_impl | 15 | 14.0k | 279.5k | 44.2k | 1 | 3m 46s |
| open_pr | 33 | 30.5k | 1.2M | 68.1k | 1 | 10m 58s |
| **Total** | 3.6k | 150.8k | 9.9M | 535.3k | | 55m 23s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…Artifact Preservation (#630)

## Summary

The review-design skill has four compounding defects that make GO
verdicts structurally unreachable. This PR fixes all four:

1. **Threshold unreachable** — Replace the static `>= 3` warning
threshold with a proportional formula based on active dimensions
(`active_dimensions * WARNING_BUDGET_PER_DIM` where budget = 5),
calibrated so that the spectral-init v6 baseline (32 warnings across ~7
dimensions, deemed "substantively sound") would receive a GO verdict.
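
The calibration can be checked arithmetically. A simplified sketch of the new threshold and verdict logic (stop-trigger handling and severity capping elided; function names are illustrative, `WARNING_BUDGET_PER_DIM` is from the plan):

```python
WARNING_BUDGET_PER_DIM = 5  # calibrated against the spectral-init v6 baseline

def warning_threshold(active_dimensions: int) -> int:
    """Proportional threshold replacing the static >= 3 rule."""
    return active_dimensions * WARNING_BUDGET_PER_DIM

def verdict(warnings: int, criticals: int, active_dimensions: int) -> str:
    """Simplified verdict: criticals or a blown warning budget force REVISE."""
    if criticals or warnings >= warning_threshold(active_dimensions):
        return "REVISE"
    return "GO"
```

With the baseline numbers, 7 active dimensions give a budget of 35, so 32 warnings now lands under threshold and yields GO instead of a guaranteed REVISE.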

2. **Prescriptive findings** — Add evaluative-only constraints to
Critical Constraints and a shared subagent evaluation scope block before
Step 2, requiring findings to describe WHAT is lacking, never HOW to fix
it.

3. **Scope drift** — Add a design scope boundary to the shared subagent
block, prohibiting evaluation of implementation code snippets and
constraining review to experimental design elements.

4. **Artifact preservation** — Enhance the `create_worktree` step in
research.yaml to copy all review-cycle artifacts (dashboards, revision
guidance, plan versions, resolve-design-review output) into
`research/.../artifacts/`, and add a `commit_research_artifacts` step
before `push_branch` to capture phase-groups and phase-plans from the
worktree.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    START([plan_experiment])
    COMPLETE([research_complete])
    STOP_OUT([design_rejected])

    subgraph DesignReview ["● review_design Step (research.yaml)"]
        direction TB
        RD["● review_design<br/>━━━━━━━━━━<br/>run_skill<br/>retries: 2"]
        REVISE_ROUTE["revise_design<br/>━━━━━━━━━━<br/>route → plan_experiment"]
        RESOLVE["resolve_design_review<br/>━━━━━━━━━━<br/>run_skill, retries: 1"]
    end

    subgraph VerdictSynthesis ["● Step 7: Verdict Synthesis (review-design SKILL.md)"]
        direction TB
        SCOPE["● Evaluative Scope Gate<br/>━━━━━━━━━━<br/>Findings: WHAT is lacking<br/>Design boundary only"]
        RTCAP["rt_cap = RT_MAX_SEVERITY<br/>━━━━━━━━━━<br/>Downgrade red_team<br/>severity by type"]
        CLASSIFY["Classify findings<br/>━━━━━━━━━━<br/>critical_findings<br/>warning_findings"]
        ACTIVE["● active_dimensions<br/>━━━━━━━━━━<br/>count spawned non-SILENT<br/>dims (L1+L2+L3+L4+RT)"]
        THRESH["★ warning_threshold<br/>━━━━━━━━━━<br/>active_dims × 5<br/>WARNING_BUDGET_PER_DIM=5"]
        VERDICT{"● Verdict Decision<br/>━━━━━━━━━━<br/>stop_triggers?<br/>critical? warnings≥threshold?"}
    end

    subgraph ArtifactPath ["★ Artifact Commit Path (research.yaml)"]
        direction TB
        TEST["● test<br/>━━━━━━━━━━<br/>test_check"]
        FIX["fix_tests<br/>━━━━━━━━━━<br/>run_skill"]
        RETEST["● retest<br/>━━━━━━━━━━<br/>test_check"]
        COMMIT["★ commit_research_artifacts<br/>━━━━━━━━━━<br/>run_cmd: copy phase-groups<br/>phase-plans → artifacts/<br/>on_failure: push_branch"]
    end

    PUSH["push_branch<br/>━━━━━━━━━━<br/>run_cmd"]

    START -->|"run review_design"| RD
    RD -->|"STOP verdict"| RESOLVE
    RD -->|"REVISE verdict"| REVISE_ROUTE
    RD -->|"GO verdict"| create_worktree
    REVISE_ROUTE -->|"loop back"| START
    RESOLVE -->|"revised"| REVISE_ROUTE
    RESOLVE -->|"failed"| STOP_OUT
    RD -->|"on_failure / on_exhausted"| create_worktree

    create_worktree["create_worktree<br/>━━━━━━━━━━<br/>★ copies review-cycles<br/>plan-versions artifacts"]

    create_worktree --> decompose["decompose_phases<br/>plan_phase<br/>implement_phase"]
    decompose --> experiment["run_experiment<br/>write_report"]
    experiment --> TEST

    TEST -->|"pass"| COMMIT
    TEST -->|"fail"| FIX
    FIX --> RETEST
    RETEST -->|"pass"| COMMIT
    RETEST -->|"fail"| PUSH

    COMMIT -->|"success or failure"| PUSH
    PUSH --> COMPLETE

    SCOPE -.->|"constraint applied to<br/>all dimension subagents"| CLASSIFY
    RTCAP --> CLASSIFY
    CLASSIFY --> ACTIVE
    ACTIVE --> THRESH
    THRESH --> VERDICT
    VERDICT -->|"stop_triggers"| STOP_OUT
    VERDICT -->|"critical_findings or<br/>warnings ≥ threshold"| REVISE_ROUTE
    VERDICT -->|"else"| create_worktree

    class START,COMPLETE,STOP_OUT terminal;
    class RD,RESOLVE,decompose,experiment,FIX handler;
    class REVISE_ROUTE,RTCAP,CLASSIFY phase;
    class VERDICT,ACTIVE stateNode;
    class SCOPE detector;
    class THRESH,COMMIT,create_worktree newComponent;
    class TEST,RETEST,PUSH output;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Start, complete, and terminal states |
| Orange | Handler | Processing steps (run_skill, run_cmd) |
| Purple | Phase | Control flow, routing, severity capping |
| Teal | State | Decision and counting nodes |
| Red | Detector | Constraint gates (evaluative scope) |
| Green | New | ★ new components, ● modified components |
| Dark Teal | Output | test_check steps and push_branch |
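The verdict math in the Step 7 subgraph can be sketched as follows. This is a minimal illustration of the diagram's `warning_threshold = active_dims × WARNING_BUDGET_PER_DIM` rule and the GO / REVISE / STOP decision; the function name and argument shapes are illustrative, not the skill's actual implementation.

```python
WARNING_BUDGET_PER_DIM = 5  # per-dimension warning budget from the diagram

def verdict(active_dimensions: int, critical_findings: list,
            warning_findings: list, stop_triggers: bool = False) -> str:
    """Map Step 7 inputs to a GO / REVISE / STOP verdict."""
    warning_threshold = active_dimensions * WARNING_BUDGET_PER_DIM
    if stop_triggers:
        return "STOP"
    if critical_findings or len(warning_findings) >= warning_threshold:
        return "REVISE"
    return "GO"
```

With five active dimensions the threshold is 25, so 24 warnings still route to GO while a single critical finding forces REVISE.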

Closes #629

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-160303-009353/.autoskillit/temp/make-plan/fix-review-design-threshold-unreachable-prescriptive-finding_plan_2026-04-05_161500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 22.6k | 1.2M | 85.0k | 1 | 10m 36s |
| verify | 30 | 14.6k | 1.5M | 74.8k | 1 | 8m 28s |
| implement | 62 | 19.9k | 4.1M | 92.5k | 1 | 7m 41s |
| audit_impl | 87 | 10.6k | 473.5k | 47.1k | 1 | 6m 41s |
| open_pr | 25 | 11.7k | 806.3k | 48.9k | 1 | 4m 22s |
| **Total** | 3.0k | 79.4k | 8.1M | 348.3k | | 37m 50s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ound Bash Tasks (#633)

## Summary

Headless sessions running long-lived background Bash tasks (e.g. `cargo
bench` launched via
`run_in_background: true`) are killed as stale because the staleness
signal is JSONL file growth,
not actual session liveness. When the LLM goes idle waiting for a
background child, the JSONL
stops growing and the 20-minute staleness threshold is breached — even
though child processes are
actively running.

Three changes eliminate this class of false kills:

1. **`_has_active_child_processes`** — a second suppression gate in
`_session_log_monitor` that
checks child process CPU activity before issuing a kill. Added alongside
the existing
   `_has_active_api_connection` port-443 gate.

2. **`RecipeStep.stale_threshold`** — an optional per-step threshold
field that recipe authors
can raise for steps known to run long-lived experiments, passed through
`run_skill` →
   `run_headless_core` → `_session_log_monitor`.

3. **Recipe YAML overrides** — `stale_threshold: 2400` (40 min) on
specific long-running steps
in `research.yaml`, `implementation.yaml`, `remediation.yaml`,
`implementation-groups.yaml`,
   and `merge-prs.yaml`.

## Requirements

### STALE — Staleness Suppression via Child Process Detection

- **REQ-STALE-001:** The system must detect active child processes in
the headless session's process tree when the stale threshold is
breached.
- **REQ-STALE-002:** The system must suppress the stale kill when any
child process in the tree reports CPU usage exceeding ~10% via
`cpu_percent(interval=0)`.
- **REQ-STALE-003:** The system must reset the staleness clock
(`last_change`) when child process activity suppresses the stale kill,
identical to the existing `_has_active_api_connection` suppression
behavior.
- **REQ-STALE-004:** The child process detection must follow the
established exception-handling pattern, silently skipping
`NoSuchProcess`, `ZombieProcess`, and `AccessDenied` errors per process.
- **REQ-STALE-005:** The child process detection must only execute when
the stale threshold has already been breached (zero performance impact
during normal operation).
- **REQ-STALE-006:** The child process detection must emit a structured
log warning when suppressing a stale kill, following the pattern
established by `_has_active_api_connection`.
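The REQ-STALE requirements above can be sketched as a single psutil walk. This is a hedged sketch, not the project's `_has_active_child_processes` itself: the function name mirrors the summary, but the constant, logging format, and exact exception set are assumptions drawn from REQ-STALE-002/-004/-006.

```python
import logging

import psutil

CPU_ACTIVE_THRESHOLD = 10.0  # percent, per REQ-STALE-002 (assumed constant name)

def has_active_child_processes(pid: int) -> bool:
    """Return True if any child in `pid`'s process tree is CPU-active."""
    try:
        children = psutil.Process(pid).children(recursive=True)
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        return False
    for child in children:
        try:
            # interval=0 is non-blocking: usage since the previous call
            if child.cpu_percent(interval=0) > CPU_ACTIVE_THRESHOLD:
                # REQ-STALE-006: structured warning when suppressing a kill
                logging.warning(
                    "stale kill suppressed: child pid=%s is CPU-active",
                    child.pid,
                )
                return True
        except (psutil.NoSuchProcess, psutil.ZombieProcess,
                psutil.AccessDenied):
            continue  # REQ-STALE-004: skip per-process errors silently
    return False
```

Because the walk only runs after the stale threshold is breached (REQ-STALE-005), the per-child `cpu_percent` calls add no cost during normal operation.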

### SCHEMA — Per-Step Stale Threshold in RecipeStep

- **REQ-SCHEMA-001:** The `RecipeStep` dataclass must accept an optional
`stale_threshold` field of type `int | None`, defaulting to `None`.
- **REQ-SCHEMA-002:** When `stale_threshold` is `None` on a recipe step,
the global `RunSkillConfig.stale_threshold` (1200s) must apply.
- **REQ-SCHEMA-003:** The `run_skill` MCP tool handler must accept an
optional `stale_threshold` parameter and forward it to
`run_headless_core`.
- **REQ-SCHEMA-004:** The recipe validator must reject `stale_threshold`
values that are not positive integers when set.

### RECIPE — Research Recipe Step Overrides

- **REQ-RECIPE-001:** Research-oriented recipes must set
`stale_threshold: 2400` (40 minutes) on specific long-running steps
(e.g., `implement_phase`, `run_experiment`).
- **REQ-RECIPE-002:** Fast-completing steps (e.g., `plan_phase`) must
not have a `stale_threshold` override, relying on the global default.
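In recipe YAML this looks roughly like the fragment below — the step names come from the requirements, but the surrounding keys are illustrative and may not match the actual recipe schema:

```yaml
steps:
  plan_phase:
    tool: run_skill
    # no stale_threshold → global 1200s default applies (REQ-RECIPE-002)
  implement_phase:
    tool: run_skill
    stale_threshold: 2400   # 40 min for long-lived work (REQ-RECIPE-001)
  run_experiment:
    tool: run_skill
    stale_threshold: 2400
```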

### TEST — Test Coverage

- **REQ-TEST-001:** Unit tests must verify `_has_active_child_processes`
returns `True` when a child process exceeds the CPU threshold.
- **REQ-TEST-002:** Unit tests must verify `_has_active_child_processes`
returns `False` when all children are idle, when no children exist, and
when exceptions are raised.
- **REQ-TEST-003:** An integration test must verify stale suppression
when a child process is CPU-active but has no port-443 connection.
- **REQ-TEST-004:** The existing
`TestSessionLogMonitorStaleSuppressionGate` test class must be extended
with the child-process-active scenario.
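The REQ-TEST-001/-002 cases can be covered without real processes by faking the child list with `unittest.mock`. The gate here is a simplified stand-in, and the test names are illustrative rather than the project's actual suite:

```python
from unittest import mock

def any_child_cpu_active(children, threshold=10.0):
    """Simplified stand-in for the suppression gate under test."""
    return any(c.cpu_percent(interval=0) > threshold for c in children)

def test_false_when_no_children():
    assert any_child_cpu_active([]) is False

def test_false_when_children_idle():
    idle = mock.Mock()
    idle.cpu_percent.return_value = 0.0
    assert any_child_cpu_active([idle]) is False

def test_true_when_child_busy():
    busy = mock.Mock()
    busy.cpu_percent.return_value = 57.0
    assert any_child_cpu_active([busy]) is True
```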

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([SESSION LAUNCHED])
    T_COMPLETE([COMPLETION])
    T_STALE([STALE — KILL])

    %% CONFIG CHAIN %%
    subgraph Config ["● RECIPE STEP CONFIG (stale_threshold flow)"]
        direction TB
        RecipeStep["● RecipeStep YAML<br/>━━━━━━━━━━<br/>stale_threshold: 2400<br/>(or unset → None)"]
        RunSkill["● run_skill handler<br/>━━━━━━━━━━<br/>tools_execution.py<br/>stale_threshold: int | None"]
        Runner["DefaultSubprocessRunner<br/>━━━━━━━━━━<br/>process.py<br/>default: 1200s"]
    end

    %% PHASE 1 %%
    subgraph Phase1 ["PHASE 1 — JSONL File Discovery (poll 1s, timeout 30s)"]
        direction TB
        P1_Poll["Poll session_log_dir<br/>━━━━━━━━━━<br/>ctime > spawn_time?<br/>Match session_id?"]
        P1_Found{"File found<br/>within 30s?"}
    end

    %% PHASE 2 %%
    subgraph Phase2 ["● PHASE 2 — Staleness Monitor Loop (poll every 2s)"]
        direction TB
        P2_Stat["stat(session_file)<br/>━━━━━━━━━━<br/>current_size vs last_size"]
        P2_Grew{"JSONL<br/>grew?"}
        P2_Marker["Read new content<br/>━━━━━━━━━━<br/>scan for completion<br/>marker in JSONL"]
        P2_MarkerFound{"Completion<br/>marker found?"}
        P2_ResetGrow["last_size = current_size<br/>last_change = now()"]
        P2_Elapsed{"elapsed >=<br/>stale_threshold?"}
    end

    %% SUPPRESSION GATES %%
    subgraph Gates ["● SUPPRESSION GATES (only fire when stale threshold breached)"]
        direction TB
        Gate1["_has_active_api_connection<br/>━━━━━━━━━━<br/>Walk proc tree<br/>ESTABLISHED port-443?"]
        Gate1_Active{"API conn<br/>active?"}
        Gate2["● _has_active_child_processes<br/>━━━━━━━━━━<br/>Walk child procs<br/>cpu_percent > 10%?"]
        Gate2_Active{"Child CPU<br/>> 10%?"}
        ResetClock["last_change = now()<br/>━━━━━━━━━━<br/>Suppress stale kill<br/>reset staleness clock"]
    end

    %% CONNECTIONS %%
    START --> RecipeStep
    RecipeStep -->|"stale_threshold (int or None)"| RunSkill
    RunSkill -->|"float(x) or None → default 1200s"| Runner
    Runner -->|"stale_threshold, pid"| P1_Poll

    P1_Poll --> P1_Found
    P1_Found -->|"yes"| P2_Stat
    P1_Found -->|"no (30s timeout)"| T_STALE

    P2_Stat --> P2_Grew
    P2_Grew -->|"yes"| P2_ResetGrow
    P2_ResetGrow --> P2_Marker
    P2_Marker --> P2_MarkerFound
    P2_MarkerFound -->|"yes"| T_COMPLETE
    P2_MarkerFound -->|"no"| P2_Elapsed

    P2_Grew -->|"no"| P2_Elapsed
    P2_Elapsed -->|"no (wait)"| P2_Stat
    P2_Elapsed -->|"yes"| Gate1

    Gate1 --> Gate1_Active
    Gate1_Active -->|"yes"| ResetClock
    Gate1_Active -->|"no"| Gate2
    Gate2 --> Gate2_Active
    Gate2_Active -->|"yes"| ResetClock
    Gate2_Active -->|"no"| T_STALE
    ResetClock -->|"continue loop"| P2_Stat

    %% CLASS ASSIGNMENTS %%
    class START,T_COMPLETE,T_STALE terminal;
    class RecipeStep,RunSkill handler;
    class Runner stateNode;
    class P1_Poll,P2_Stat,P2_Marker,P2_ResetGrow,ResetClock phase;
    class P1_Found,P2_Grew,P2_MarkerFound,P2_Elapsed,Gate1_Active,Gate2_Active stateNode;
    class Gate1 handler;
    class Gate2 newComponent;
```

### Concurrency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([SESSION LAUNCHED])
    COMPLETE([TASK GROUP CANCELLED])

    %% MAIN THREAD: Sequential setup %%
    subgraph MainSeq ["MAIN COROUTINE — Sequential Setup"]
        direction TB
        SpawnProc["Spawn Claude Code process<br/>━━━━━━━━━━<br/>asyncio subprocess<br/>get proc.pid"]
        CreateAcc["Create RaceAccumulator + trigger<br/>━━━━━━━━━━<br/>anyio.Event (idempotent set)<br/>channel_b_ready Event"]
        OpenTG["anyio.create_task_group()<br/>━━━━━━━━━━<br/>Fork: start 4–5 coroutines<br/>as tg.start_soon(...)"]
        TrigWait["await trigger.wait()<br/>━━━━━━━━━━<br/>Block until first watcher wins<br/>(or wall-clock timeout)"]
        DrainWait["Optional drain window<br/>━━━━━━━━━━<br/>await channel_b_ready if<br/>process exited but B pending"]
        CancelTG["tg.cancel_scope.cancel()<br/>━━━━━━━━━━<br/>Tear down all remaining tasks"]
        Resolve["resolve_termination(RaceSignals)<br/>━━━━━━━━━━<br/>Priority: exit > stale > completion"]
    end

    %% TASK GROUP: Concurrent watchers %%
    subgraph TaskGroup ["anyio TASK GROUP — Concurrent Watchers (cooperative, single event loop)"]
        direction LR

        subgraph ChA ["Channel A"]
            WatchProc["_watch_process<br/>━━━━━━━━━━<br/>await proc.wait()<br/>acc.process_exited=True"]
            WatchHB["_watch_heartbeat<br/>━━━━━━━━━━<br/>poll stdout NDJSON 0.5s<br/>acc.channel_a_confirmed=True"]
        end

        subgraph ChB ["● Channel B — Session Log"]
            ExtractID["_extract_stdout_session_id<br/>━━━━━━━━━━<br/>poll stdout for type=system<br/>sets stdout_session_id_ready"]
            WatchSL["● _watch_session_log<br/>━━━━━━━━━━<br/>calls _session_log_monitor<br/>acc.channel_b_status=COMPLETION|STALE"]
        end
    end

    %% STALENESS SUPPRESSION %%
    subgraph StaleGates ["● STALENESS SUPPRESSION — Sync psutil walks (inside _session_log_monitor)"]
        direction TB
        Gate1["_has_active_api_connection(pid)<br/>━━━━━━━━━━<br/>[parent + children(recursive=True)]<br/>net_connections port-443 ESTABLISHED?"]
        Gate2["● _has_active_child_processes(pid)<br/>━━━━━━━━━━<br/>[children(recursive=True) only]<br/>cpu_percent(interval=0) > 10%?"]
        ResetClock["last_change = monotonic()<br/>━━━━━━━━━━<br/>suppress stale kill<br/>continue Phase 2 loop"]
        ReturnStale["return STALE<br/>━━━━━━━━━━<br/>acc.channel_b_status = STALE<br/>trigger.set()"]
    end

    %% FLOW %%
    START --> SpawnProc
    SpawnProc --> CreateAcc
    CreateAcc --> OpenTG

    OpenTG -->|"tg.start_soon"| WatchProc
    OpenTG -->|"tg.start_soon"| WatchHB
    OpenTG -->|"tg.start_soon"| ExtractID
    OpenTG -->|"tg.start_soon"| WatchSL

    WatchProc -->|"trigger.set()"| TrigWait
    WatchHB -->|"trigger.set()"| TrigWait
    WatchSL -->|"trigger.set() after drain"| TrigWait

    WatchSL -->|"stale threshold breached"| Gate1
    Gate1 -->|"no API conn"| Gate2
    Gate2 -->|"child CPU active"| ResetClock
    Gate2 -->|"no activity"| ReturnStale
    Gate1 -->|"API conn active"| ResetClock
    ResetClock -->|"continue loop"| WatchSL

    TrigWait --> DrainWait
    DrainWait --> CancelTG
    CancelTG --> Resolve
    Resolve --> COMPLETE

    %% CLASS ASSIGNMENTS %%
    class START,COMPLETE terminal;
    class SpawnProc,CreateAcc,TrigWait,DrainWait,CancelTG,Resolve phase;
    class OpenTG detector;
    class WatchProc,WatchHB handler;
    class ExtractID handler;
    class WatchSL handler;
    class Gate1 handler;
    class Gate2 newComponent;
    class ResetClock output;
    class ReturnStale detector;
```

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([RECIPE YAML LOADED])
    T_PASS([VALID — forwarded to run_skill])
    T_FAIL([INVALID — validation error])

    %% PARSE LAYER %%
    subgraph Parse ["● YAML → RecipeStep (io.py _parse_step)"]
        direction TB
        YAMLRead["● YAML key read<br/>━━━━━━━━━━<br/>data.get('stale_threshold')<br/>absent → None (no coercion)"]
        Construct["● RecipeStep(...)<br/>━━━━━━━━━━<br/>stale_threshold: int | None = None<br/>No __post_init__ mutations"]
        IntegrityGuard["_PARSE_STEP_HANDLED_FIELDS guard<br/>━━━━━━━━━━<br/>compile-time assert: fields == dataclass<br/>RuntimeError if diverged"]
    end

    %% VALIDATION LAYER %%
    subgraph Validation ["● STRUCTURAL VALIDATION (validator.py validate_recipe)"]
        direction TB
        IsNone{"stale_threshold<br/>is None?"}
        TypeCheck{"isinstance(int)<br/>AND > 0?"}
        AppendError["append error<br/>━━━━━━━━━━<br/>'must be positive integer<br/>when set'"]
        PassThrough["field passes<br/>━━━━━━━━━━<br/>no validation error<br/>for None or valid int"]
    end

    %% SEMANTIC LAYER %%
    subgraph Semantic ["● SEMANTIC RULE — _TOOL_PARAMS registry (rules_tools.py)"]
        direction TB
        ToolParamsCheck["_TOOL_PARAMS['run_skill']<br/>━━━━━━━━━━<br/>frozenset includes 'stale_threshold'<br/>dead-with-param rule: NO warning"]
        OtherToolWarn["Other tools<br/>━━━━━━━━━━<br/>stale_threshold not in their params<br/>dead-with-param: WARNING emitted"]
    end

    %% EXECUTION FORWARDING %%
    subgraph Execution ["EXECUTION FORWARDING (tools_execution.py run_skill)"]
        direction TB
        NullPath["stale_threshold = None<br/>━━━━━━━━━━<br/>→ DefaultSubprocessRunner default<br/>= 1200s (global config)"]
        OverridePath["stale_threshold = int<br/>━━━━━━━━━━<br/>float(stale_threshold)<br/>→ overrides global default"]
        Monitor["_session_log_monitor<br/>━━━━━━━━━━<br/>stale_threshold used as<br/>breach-detection window"]
    end

    %% FLOW %%
    START --> YAMLRead
    YAMLRead --> Construct
    Construct --> IntegrityGuard
    IntegrityGuard -->|"fields match — import OK"| IsNone

    IsNone -->|"yes (absent or None)"| PassThrough
    IsNone -->|"no (value present)"| TypeCheck
    TypeCheck -->|"valid"| PassThrough
    TypeCheck -->|"invalid (non-int or ≤ 0)"| AppendError
    AppendError --> T_FAIL
    PassThrough --> ToolParamsCheck

    ToolParamsCheck -->|"tool: run_skill"| T_PASS
    ToolParamsCheck -->|"other tool"| OtherToolWarn

    T_PASS --> NullPath
    T_PASS --> OverridePath
    NullPath --> Monitor
    OverridePath --> Monitor

    %% CLASS ASSIGNMENTS %%
    class START,T_PASS,T_FAIL terminal;
    class YAMLRead,Construct handler;
    class IntegrityGuard detector;
    class IsNone,TypeCheck stateNode;
    class AppendError detector;
    class PassThrough output;
    class ToolParamsCheck newComponent;
    class OtherToolWarn gap;
    class NullPath,OverridePath,Monitor phase;
```

Closes #631

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-170436-566038/.autoskillit/temp/make-plan/fix_false_stale_kills_plan_2026-04-05_000000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 45.6k | 2.0M | 151.7k | 2 | 19m 31s |
| verify | 62 | 36.0k | 3.3M | 155.3k | 2 | 15m 1s |
| implement | 149 | 47.2k | 9.6M | 183.8k | 2 | 16m 24s |
| audit_impl | 102 | 20.0k | 762.1k | 90.1k | 2 | 10m 31s |
| open_pr | 69 | 39.4k | 2.6M | 116.8k | 2 | 15m 32s |
| review_pr | 38 | 57.4k | 1.8M | 103.1k | 1 | 18m 47s |
| resolve_review | 55 | 32.5k | 3.1M | 84.3k | 1 | 14m 9s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.3k | 292.6k | 24.3M | 943.5k | | 1h 59m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

All four bundled recipes (`implementation`, `remediation`, `merge-prs`,
`implementation-groups`)
currently ship with `audit: default: "true"`, meaning `audit-impl` runs
unless explicitly
disabled. This plan changes all four recipes to `default: "false"` so
`audit-impl` is skipped
by default and becomes opt-in. No structural changes to the step graph,
routing, or test
infrastructure are needed — only the ingredient default changes.

**Scope:** 4 YAML ingredient default changes + 1 test assertion added.

## Requirements

### RCFG — Recipe Configuration

- **REQ-RCFG-001:** The `audit` input in `implementation.yaml` must
default to `"false"`.
- **REQ-RCFG-002:** The `audit` input in `implementation-groups.yaml`
must default to `"false"`.
- **REQ-RCFG-003:** The `audit` input in `remediation.yaml` must default
to `"false"`.
- **REQ-RCFG-004:** The `audit` input in `merge-prs.yaml` must default
to `"false"`.
- **REQ-RCFG-005:** The `audit_impl` step definition and its
`skip_when_false: "inputs.audit"` guard must remain unchanged in all
recipes.
- **REQ-RCFG-006:** Callers must still be able to opt in to audit-impl
by passing `audit: "true"` at pipeline invocation time.
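The resolution behavior the RCFG requirements describe boils down to a default-with-override lookup. A minimal sketch, assuming hypothetical helper names (the real ingredient machinery lives in the recipe runtime):

```python
def resolve_input(caller_args: dict, contract_defaults: dict, name: str) -> str:
    # INIT_ONLY: resolved once at invocation time, then frozen for the run
    return caller_args.get(name, contract_defaults[name])

def audit_step_runs(caller_args: dict) -> bool:
    contract_defaults = {"audit": "false"}  # new default (was "true")
    # skip_when_false: "inputs.audit" — guard unchanged per REQ-RCFG-005
    return resolve_input(caller_args, contract_defaults, "audit") == "true"
```

With no caller argument the audit step is bypassed; passing `audit: "true"` opts back in (REQ-RCFG-006).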

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Pipeline Invoked])
    CONTINUE([Continue to push / merge])
    ERROR([escalate_stop / register_clone_failure])

    subgraph Ingredient ["● Ingredient Resolution"]
        direction TB
        AuditIng["● audit ingredient<br/>━━━━━━━━━━<br/>BEFORE: default='true'<br/>AFTER: default='false'"]
    end

    subgraph Gate ["skip_when_false Gate"]
        direction TB
        SkipCheck{"inputs.audit == 'true'?"}
        SkipBypass["BYPASS<br/>━━━━━━━━━━<br/>Skip audit_impl<br/>(now default path)"]
        RunAudit["● run audit-impl skill<br/>━━━━━━━━━━<br/>runs /autoskillit:audit-impl<br/>(now opt-in path)"]
        Verdict{"GO / NO GO?"}
        Remediate["remediate<br/>━━━━━━━━━━<br/>Route to remediation<br/>or re-plan"]
    end

    %% FLOW %%
    START --> AuditIng
    AuditIng -->|"resolves to 'false'<br/>(new default)"| SkipCheck
    SkipCheck -->|"false (default — bypass)"| SkipBypass
    SkipCheck -->|"true (opt-in — explicit)"| RunAudit
    RunAudit --> Verdict
    Verdict -->|"GO"| CONTINUE
    Verdict -->|"NO GO"| Remediate
    Verdict -->|"error"| ERROR
    Remediate -->|"re-plan loop"| START
    SkipBypass --> CONTINUE

    %% CLASS ASSIGNMENTS %%
    class START,CONTINUE,ERROR terminal;
    class AuditIng handler;
    class SkipCheck,Verdict stateNode;
    class SkipBypass phase;
    class RunAudit detector;
    class Remediate phase;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline start, continuation, and error states |
| Teal | State | Decision gates (skip_when_false, GO/NO GO) |
| Orange | Handler | ● Audit ingredient (modified: default flipped to "false") |
| Red | Detector | ● audit-impl skill execution (now opt-in path) |
| Purple | Phase | Bypass path (now default) and remediation routing |

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([Recipe Invoked])
    GATE([skip_when_false Evaluated])

    subgraph Contracts ["● INGREDIENT CONTRACT DEFINITIONS"]
        direction TB
        ImplYaml["● implementation.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        ImplGroupsYaml["● implementation-groups.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        RemediationYaml["● remediation.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
        MergePrsYaml["● merge-prs.yaml<br/>━━━━━━━━━━<br/>audit:<br/>  default: 'false'<br/>(was: 'true')"]
    end

    subgraph Resolution ["INIT_ONLY: Ingredient Resolution"]
        direction TB
        CallerSupplied["Caller-supplied value<br/>━━━━━━━━━━<br/>audit='true' (opt-in)<br/>INIT_ONLY — frozen for run"]
        DefaultApplied["● Contract default applied<br/>━━━━━━━━━━<br/>audit='false'<br/>INIT_ONLY — frozen for run"]
    end

    subgraph TestGate ["● CONTRACT VALIDATION (test_bundled_recipes.py)"]
        direction TB
        TestAssert["● test_audit_ingredient_defaults_to_false<br/>━━━━━━━━━━<br/>@pytest.mark.parametrize<br/>asserts audit.default == 'false'<br/>for all 4 recipes"]
    end

    %% FLOW %%
    START -->|"caller passes audit='true'"| CallerSupplied
    START -->|"no audit arg (default)"| DefaultApplied
    ImplYaml --> DefaultApplied
    ImplGroupsYaml --> DefaultApplied
    RemediationYaml --> DefaultApplied
    MergePrsYaml --> DefaultApplied
    CallerSupplied --> GATE
    DefaultApplied --> GATE

    Contracts -.->|"validated by"| TestAssert

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class GATE stateNode;
    class ImplYaml,ImplGroupsYaml,RemediationYaml,MergePrsYaml handler;
    class CallerSupplied detector;
    class DefaultApplied phase;
    class TestAssert gap;
```

**Color Legend:**
| Color | Category | Description |
|-------|----------|-------------|
| Dark Blue | Terminal | Pipeline invocation point |
| Teal | Gate | skip_when_false evaluation (INIT_ONLY field read) |
| Orange | Contract | ● Recipe YAML ingredient contract definitions (modified) |
| Red | Opt-in | Caller-supplied value override (explicit audit='true') |
| Purple | Default | ● Contract default applied (now 'false') |
| Yellow | Test | ● Contract validation test assertion (new) |

Closes #632

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-180825-135856/.autoskillit/temp/make-plan/feat_default_audit_impl_off_plan_2026-04-05_181000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.8k | 60.3k | 4.0M | 213.2k | 3 | 24m 25s |
| verify | 82 | 43.0k | 3.9M | 193.2k | 3 | 22m 22s |
| implement | 176 | 53.6k | 10.3M | 221.3k | 3 | 18m 51s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 101 | 60.0k | 3.7M | 168.5k | 3 | 22m 39s |
| review_pr | 71 | 112.5k | 3.4M | 189.2k | 2 | 33m 19s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 38 | 14.6k | 1.3M | 58.3k | 1 | 9m 9s |
| **Total** | 3.5k | 409.5k | 31.4M | 1.3M | | 2h 41m |

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Increase sensitivity to catch quota exhaustion earlier, giving more
buffer before hard API limits are hit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l for Experiment Failures (#636)

## Summary

This plan adds automated failure diagnosis to the research pipeline
(issue #635). There are two distinct requirements:

**DIAG**: Create a `troubleshoot-experiment` skill that reads session
logs and process traces to classify why a research step failed, then
emit a structured diagnostic artifact and `is_fixable` signal. Wire this
skill into `research.yaml` so that `implement_phase` failures route to
it instead of dying at `escalate_stop`.

**SEP**: Fix the structural misuse of `retry-worktree` in
`implement_phase`. The skill `retry-worktree` is designed to *resume*
context-exhausted `implement-worktree` sessions — it is not a primary
implementation driver. The research recipe already has the correct
purpose-built skill: `implement-experiment`, which explicitly forbids
experiment execution during implementation and routes context exhaustion
directly to `run-experiment`. Switching `implement_phase` to use
`implement-experiment` addresses REQ-SEP-001 and REQ-SEP-002 at the
skill level, where the constraint is enforceable.

## Requirements

### DIAG — Experiment Failure Diagnosis

- **REQ-DIAG-001:** The system must provide a skill that investigates
why a research recipe step failed by reading session logs and process
traces.
- **REQ-DIAG-002:** The skill must classify the failure type (stale
timeout, context exhaustion, build failure, data missing, parameter
issue, unknown).
- **REQ-DIAG-003:** The skill must emit a structured diagnostic artifact
that downstream steps or the human can act on.
- **REQ-DIAG-004:** The research recipe must route experiment failures
to the diagnostic skill instead of `escalate_stop`.
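The priority-ordered decision table shown later in the flow diagram can be sketched as a first-match classifier. The `summary` keys mirror the diagram's `summary.json` fields (`termination_reason`, `write_call_count`, `exit_code`); the exact schema and field names are assumptions:

```python
def classify_failure(summary: dict) -> tuple[str, bool]:
    """Return (failure_type, is_fixable); first matching rule wins."""
    reason = summary.get("termination_reason")
    if reason == "context_limit":
        return "context_exhaustion", True
    if reason == "stale" and summary.get("write_call_count", 0) == 0:
        return "stale_timeout", True
    if summary.get("exit_code", 0) != 0 and summary.get("build_error"):
        return "build_failure", True
    if summary.get("infra_error") or summary.get("oom"):
        return "environment_error", False
    return "unknown", False
```

The `is_fixable` half of the tuple is what `route_implement_failure` consumes: `True` loops back to `plan_phase`, `False` escalates.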

### SEP — Structural Separation of Implementation and Execution

- **REQ-SEP-001:** Implementation worktree steps must not perform
experiment execution (benchmarks, profiling, data collection).
- **REQ-SEP-002:** Experiment execution must route through the
`run_experiment` step (or equivalent) which has appropriate timeout and
retry semantics.

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([RESEARCH PIPELINE])
    ESCALATE([escalate_stop])
    COMPLETE([research_complete])

    subgraph PhaseMgmt ["Phase Management"]
        plan_phase["● plan_phase<br/>━━━━━━━━━━<br/>make-plan skill<br/>plans current group"]
        implement_phase["● implement_phase<br/>━━━━━━━━━━<br/>implement-experiment<br/>(was: retry-worktree)<br/>stale_threshold: 2400"]
        next_phase{"next_phase_or_experiment<br/>━━━━━━━━━━<br/>more phases?"}
    end

    subgraph DiagPhase ["★ Failure Diagnosis (NEW)"]
        troubleshoot["★ troubleshoot_implement_failure<br/>━━━━━━━━━━<br/>troubleshoot-experiment skill<br/>worktree_path + implement_phase"]
        route_fix{"★ route_implement_failure<br/>━━━━━━━━━━<br/>is_fixable?"}
    end

    subgraph SkillInternals ["★ troubleshoot-experiment Internals"]
        direction TB
        init_idx["★ initialize code-index<br/>━━━━━━━━━━<br/>set_project_path(worktree_path)"]
        session_lookup["★ locate failed session<br/>━━━━━━━━━━<br/>sessions.jsonl<br/>select success=false + cwd match"]
        read_diags["★ read session diagnostics<br/>━━━━━━━━━━<br/>summary.json: termination_reason<br/>write_call_count, exit_code<br/>anomalies.jsonl: kind, severity"]
        classify{"★ classify failure type<br/>━━━━━━━━━━<br/>priority-ordered<br/>decision table"}
        write_diag["★ diagnosis_{ts}.md<br/>━━━━━━━━━━<br/>failure_type, is_fixable<br/>evidence + recommended action"]
        emit_tokens["★ emit output tokens<br/>━━━━━━━━━━<br/>diagnosis_path=<br/>failure_type=<br/>is_fixable="]
    end

    subgraph ExperimentPhase ["Experiment Phase"]
        run_experiment["run_experiment<br/>━━━━━━━━━━<br/>run-experiment skill<br/>stale_threshold: 2400, retries: 2"]
    end

    START --> plan_phase
    plan_phase --> implement_phase

    implement_phase -->|"on_success"| next_phase
    implement_phase -->|"on_failure"| troubleshoot
    implement_phase -->|"on_exhausted / on_context_limit"| run_experiment

    next_phase -->|"more_phases"| plan_phase
    next_phase -->|"done"| run_experiment

    troubleshoot --> init_idx
    init_idx --> session_lookup
    session_lookup -->|"session found"| read_diags
    session_lookup -->|"no session / missing log"| write_diag
    read_diags --> classify

    classify -->|"context_limit → context_exhaustion, fixable=true"| write_diag
    classify -->|"stale + write=0 → stale_timeout, fixable=true"| write_diag
    classify -->|"exit!=0 + build error → build_failure, fixable=true"| write_diag
    classify -->|"infra error / OOM → environment_error, fixable=false"| write_diag
    classify -->|"unknown"| write_diag

    write_diag --> emit_tokens
    emit_tokens --> route_fix

    route_fix -->|"is_fixable=true"| plan_phase
    route_fix -->|"is_fixable=false"| ESCALATE

    troubleshoot -->|"on_failure (skill crash)"| ESCALATE

    run_experiment --> COMPLETE

    class START,ESCALATE,COMPLETE terminal;
    class plan_phase,implement_phase handler;
    class next_phase,route_fix,classify stateNode;
    class troubleshoot,init_idx,session_lookup,read_diags,write_diag,emit_tokens newComponent;
```
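The priority-ordered decision table in the diagram above can be sketched as a first-match-wins classifier. This is an illustrative sketch only: the field names (`termination_reason`, `write_call_count`, `exit_code`, `stderr_tail`) follow the diagram labels, but the real troubleshoot-experiment skill reads them from `summary.json` and `anomalies.jsonl`, and the exact matching rules are assumptions.

```python
def classify_failure(summary: dict) -> tuple[str, bool]:
    """Return (failure_type, is_fixable); rules are checked in priority order."""
    reason = summary.get("termination_reason", "")
    writes = summary.get("write_call_count", 0)
    exit_code = summary.get("exit_code", 0)
    stderr = summary.get("stderr_tail", "")

    if reason == "context_limit":
        return ("context_exhaustion", True)      # retry with smaller scope
    if reason == "stale" and writes == 0:
        return ("stale_timeout", True)           # hung before writing anything
    if exit_code != 0 and "error:" in stderr.lower():
        return ("build_failure", True)           # compiler/test error is fixable
    if reason == "infra_error" or "oom" in stderr.lower():
        return ("environment_error", False)      # not fixable → escalate
    return ("unknown", False)                    # fixability of "unknown" is an assumption
```

The `is_fixable` boolean is what `route_implement_failure` branches on: `True` loops back to `plan_phase`, `False` escalates.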

### Module Dependency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 70, 'curve': 'basis'}}}%%
graph TB
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph L2_Recipe ["L2 — Recipe System"]
        recipe_io["recipe/io.py<br/>━━━━━━━━━━<br/>load_recipe, builtin_recipes_dir"]
        recipe_validator["recipe/validator.py<br/>━━━━━━━━━━<br/>validate_recipe"]
        recipe_contracts["recipe/contracts.py<br/>━━━━━━━━━━<br/>contract card generation"]
    end

    subgraph L1_Workspace ["L1 — Workspace"]
        workspace_skills["workspace/skills.py<br/>━━━━━━━━━━<br/>SkillResolver<br/>discovers skills_extended/"]
    end

    subgraph L0_Core ["L0 — Core"]
        core_paths["core/paths.py<br/>━━━━━━━━━━<br/>pkg_root()<br/>canonical package root"]
    end

    subgraph DataRecipes ["Data — Recipes (YAML)"]
        research_yaml["● recipes/research.yaml<br/>━━━━━━━━━━<br/>implement-experiment (was: retry-worktree)<br/>on_failure → troubleshoot_implement_failure<br/>on_exhausted → run_experiment"]
    end

    subgraph DataContracts ["Data — Contracts (YAML)"]
        skill_contracts["● recipe/skill_contracts.yaml<br/>━━━━━━━━━━<br/>★ troubleshoot-experiment entry<br/>is_fixable output pattern"]
    end

    subgraph DataSkills ["Data — Skills (SKILL.md)"]
        troubleshoot_skill["★ skills_extended/troubleshoot-experiment/<br/>━━━━━━━━━━<br/>session log reader<br/>failure classifier, is_fixable emitter"]
        implement_exp["skills_extended/implement-experiment/<br/>━━━━━━━━━━<br/>no experiment execution<br/>routes exhaustion → run-experiment"]
    end

    subgraph Tests ["Tests"]
        test_diag["★ tests/recipe/test_research_recipe_diag.py<br/>━━━━━━━━━━<br/>validates research.yaml routing<br/>asserts skill_command swap"]
        test_contracts["★ tests/skills/test_troubleshoot_experiment_contracts.py<br/>━━━━━━━━━━<br/>SkillResolver discovery<br/>SKILL.md existence"]
        test_skills_ws["● tests/workspace/test_skills.py<br/>━━━━━━━━━━<br/>skill count +1"]
    end

    recipe_io -->|"loads at runtime"| research_yaml
    recipe_validator -->|"validates"| research_yaml
    recipe_contracts -->|"loads at runtime"| skill_contracts
    research_yaml -->|"skill_command references"| troubleshoot_skill
    research_yaml -->|"skill_command references"| implement_exp
    skill_contracts -->|"contract entry for"| troubleshoot_skill
    workspace_skills -->|"discovers via pkg_root()"| troubleshoot_skill
    workspace_skills -->|"uses"| core_paths
    test_diag -->|"imports"| recipe_io
    test_diag -->|"imports"| recipe_validator
    test_contracts -->|"imports"| workspace_skills
    test_contracts -->|"imports"| core_paths
    test_skills_ws -->|"imports"| workspace_skills

    class recipe_io,recipe_validator,recipe_contracts phase;
    class workspace_skills handler;
    class core_paths stateNode;
    class research_yaml,skill_contracts output;
    class troubleshoot_skill newComponent;
    class implement_exp handler;
    class test_diag,test_contracts newComponent;
    class test_skills_ws handler;
```

Closes #635

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/impl-20260405-193031-162971/.autoskillit/temp/make-plan/research_recipe_troubleshoot_plan_2026-04-05_193500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| plan | 2.9k | 93.0k | 4.6M | 271.4k | 4 | 37m 40s |
| verify | 109 | 64.2k | 5.4M | 277.1k | 4 | 28m 55s |
| implement | 224 | 71.2k | 12.5M | 282.5k | 4 | 32m 50s |
| audit_impl | 117 | 25.1k | 1.0M | 114.6k | 3 | 12m 6s |
| open_pr | 131 | 76.9k | 4.8M | 232.2k | 4 | 27m 43s |
| review_pr | 100 | 134.7k | 4.3M | 237.6k | 3 | 38m 8s |
| resolve_review | 77 | 40.4k | 3.7M | 117.7k | 2 | 18m 16s |
| fix | 91 | 32.1k | 3.8M | 120.9k | 2 | 21m 36s |
| diagnose_ci | 13 | 1.4k | 161.4k | 15.6k | 1 | 37s |
| resolve_ci | 18 | 3.7k | 293.8k | 29.1k | 1 | 3m 2s |
| **Total** | 3.8k | 542.7k | 40.5M | 1.7M | | 3h 40m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
#639)

## Summary

The quota guard hook fails open 93% of the time because `open_kitchen`
primes the cache once (via `_prime_quota_cache`) but no mechanism keeps
it fresh. The cache TTL is 300 seconds; pipeline sessions run for hours.
After 5 minutes, every hook invocation sees a stale cache, logs
`cache_miss`, and approves unconditionally via `sys.exit(0)`.

The root architectural weakness is that `open_kitchen`/`close_kitchen`
act as a gate toggle (open/close) but not as a **service lifecycle
boundary**. There is no concept of services that start with the kitchen
and stop when it closes. The fix introduces a reusable service lifecycle
pattern: any background service tied to the kitchen session is started
in `_open_kitchen_handler` and cancelled in `_close_kitchen_handler`,
with `ToolContext` holding the task handle. The quota refresh loop is
the first instance of this pattern.

A secondary structural gap is also closed: `cache_max_age` (the hook's
expiry threshold) and the refresh interval previously had no enforced
relationship. A new `cache_refresh_interval` config field is introduced
with a structural contract `cache_refresh_interval < cache_max_age`,
making it architecturally impossible for the loop to fall behind the
TTL. This contract is enforced by a test that asserts it directly
against `defaults.yaml`.
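The lifecycle pattern described above can be sketched as follows. Names and the error-handling shape are illustrative; the real loop is spawned via `create_background_task` in `_open_kitchen_handler` and cancelled in `_close_kitchen_handler`, and the defaults shown are the ones stated in this summary.

```python
import asyncio

CACHE_MAX_AGE = 300            # hook TTL in seconds (default)
CACHE_REFRESH_INTERVAL = 240   # loop period (default)

# The structural contract: the loop must always write before the hook's TTL
# expires, so the interval is strictly less than the max age.
assert CACHE_REFRESH_INTERVAL < CACHE_MAX_AGE

async def quota_refresh_loop(refresh, interval: float = CACHE_REFRESH_INTERVAL):
    """Fetch and write the cache unconditionally on each tick.

    An exception in one refresh is logged and the loop continues; only
    task cancellation (from close_kitchen) exits the loop.
    """
    while True:
        await asyncio.sleep(interval)
        try:
            await refresh()
        except Exception:
            pass  # real code logs 'quota_refresh_loop_error' here
```

Because cancellation is delivered at the `await asyncio.sleep(...)` point, `close_kitchen` can stop the service deterministically with `task.cancel()`.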

## Architecture Impact

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    SESSION_START([Kitchen Session Start])
    SESSION_END([Kitchen Session End])

    %% CACHE DATA NODE %%
    CACHE[("autoskillit_quota_cache.json<br/>━━━━━━━━━━<br/>fetched_at + five_hour.utilization")]

    subgraph KitchenOpen ["● open_kitchen — Kitchen Open Lifecycle"]
        direction TB
        GATE_ON["● gate.enable()<br/>━━━━━━━━━━<br/>Reveal gated MCP tools"]
        WRITE_CFG["_write_hook_config()<br/>━━━━━━━━━━<br/>Write threshold + cache_max_age<br/>to .autoskillit/temp/"]
        PRIME["_prime_quota_cache()<br/>━━━━━━━━━━<br/>check_and_sleep_if_needed()<br/>read-first → fetch if stale → write T=0"]
        START_TASK["● create_background_task(<br/>★ _quota_refresh_loop(config)<br/>)<br/>━━━━━━━━━━<br/>Stored as ctx.quota_refresh_task"]
    end

    subgraph RefreshLoop ["★ _quota_refresh_loop — Background Service (asyncio.Task)"]
        direction TB
        LOOP_SLEEP["asyncio.sleep(<br/>★ cache_refresh_interval<br/>)<br/>━━━━━━━━━━<br/>Structural contract:<br/>interval < cache_max_age"]
        REFRESH["★ _refresh_quota_cache(config)<br/>━━━━━━━━━━<br/>Always fetches unconditionally<br/>(no read-first optimization)<br/>_fetch_quota() → _write_cache()"]
        LOOP_ERR{"Exception<br/>in refresh?"}
        LOG_WARN["logger.warning(<br/>'quota_refresh_loop_error'<br/>)<br/>━━━━━━━━━━<br/>Loop continues"]
    end

    subgraph RunSkill ["● run_skill — MCP Tool Execution"]
        direction TB
        EXEC["executor.run(skill_command)<br/>━━━━━━━━━━<br/>Headless Claude session"]
        AUDIT{"success?"}
        AUDIT_OK["audit.record_success()"]
        AUDIT_FAIL["notify 'run_skill failed'"]
        POST_REFRESH["● background.submit(<br/>★ _refresh_quota_cache(config)<br/>label='quota_post_run_refresh'<br/>)<br/>━━━━━━━━━━<br/>Defense-in-depth:<br/>ensures cache fresh for next hook"]
    end

    subgraph HookProcess ["quota_check.py — PreToolUse Hook Subprocess"]
        direction TB
        READ_CACHE["_read_quota_cache(<br/>max_age=cache_max_age<br/>)<br/>━━━━━━━━━━<br/>Reads cache file from disk"]
        FRESH{"cache age<br/>≤ cache_max_age?"}
        THRESHOLD{"utilization<br/>≥ threshold?"}
        APPROVE["sys.exit(0)<br/>━━━━━━━━━━<br/>→ approve run_skill"]
        DENY["print deny JSON<br/>sys.exit(0)<br/>━━━━━━━━━━<br/>→ block run_skill"]
        FAIL_OPEN["log cache_miss<br/>sys.exit(0)<br/>━━━━━━━━━━<br/>→ fail open (approve)"]
    end

    subgraph KitchenClose ["● close_kitchen — Teardown"]
        direction TB
        CANCEL["● ctx.quota_refresh_task.cancel()<br/>ctx.quota_refresh_task = None<br/>━━━━━━━━━━<br/>Terminates _quota_refresh_loop<br/>via CancelledError"]
        GATE_OFF["gate.disable()<br/>━━━━━━━━━━<br/>Hide gated MCP tools"]
        DEL_CFG["hook_cfg_path.unlink()<br/>━━━━━━━━━━<br/>Remove hook config file"]
    end

    %% MAIN FLOW %%
    SESSION_START --> GATE_ON
    GATE_ON --> WRITE_CFG
    WRITE_CFG --> PRIME
    PRIME -->|"writes T=0"| CACHE
    PRIME --> START_TASK
    START_TASK -.->|"spawns background task"| LOOP_SLEEP

    %% BACKGROUND LOOP %%
    LOOP_SLEEP --> REFRESH
    REFRESH -->|"always writes fresh"| CACHE
    REFRESH --> LOOP_ERR
    LOOP_ERR -->|"no exception"| LOOP_SLEEP
    LOOP_ERR -->|"exception caught"| LOG_WARN
    LOG_WARN --> LOOP_SLEEP

    %% RUN_SKILL FLOW %%
    START_TASK --> EXEC
    EXEC --> AUDIT
    AUDIT -->|"yes"| AUDIT_OK
    AUDIT -->|"no"| AUDIT_FAIL
    AUDIT_OK --> POST_REFRESH
    AUDIT_FAIL --> POST_REFRESH
    POST_REFRESH -->|"fire-and-forget write"| CACHE
    POST_REFRESH --> EXEC

    %% HOOK FLOW %%
    CACHE -.->|"read by subprocess"| READ_CACHE
    READ_CACHE --> FRESH
    FRESH -->|"yes — fresh"| THRESHOLD
    FRESH -->|"no — stale"| FAIL_OPEN
    THRESHOLD -->|"below threshold"| APPROVE
    THRESHOLD -->|"at/above threshold"| DENY

    %% CLOSE FLOW %%
    EXEC -->|"session ends"| CANCEL
    CANCEL -.->|"CancelledError propagates"| LOOP_SLEEP
    CANCEL --> GATE_OFF
    GATE_OFF --> DEL_CFG
    DEL_CFG --> SESSION_END

    %% CLASS ASSIGNMENTS %%
    class SESSION_START,SESSION_END terminal;
    class CACHE stateNode;
    class GATE_ON,WRITE_CFG,PRIME,START_TASK handler;
    class LOOP_SLEEP,LOG_WARN phase;
    class REFRESH,POST_REFRESH newComponent;
    class EXEC,AUDIT_OK,AUDIT_FAIL handler;
    class CANCEL,GATE_OFF,DEL_CFG handler;
    class READ_CACHE integration;
    class FRESH,THRESHOLD,AUDIT,LOOP_ERR detector;
    class APPROVE,DENY,FAIL_OPEN output;
```

**Legend:** ★ New component · ● Modified component

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 58, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    subgraph ConfigContract ["● QuotaGuardConfig — INIT_ONLY Fields (Set Once, Never Mutated)"]
        direction LR
        CFG_MAX_AGE["cache_max_age<br/>━━━━━━━━━━<br/>300 s (default)<br/>Hook TTL — max age<br/>before cache_miss"]
        CFG_INTERVAL["★ cache_refresh_interval<br/>━━━━━━━━━━<br/>240 s (default)<br/>Loop sleep between<br/>proactive writes"]
        STRUCT_CONTRACT["★ STRUCTURAL CONTRACT<br/>━━━━━━━━━━<br/>cache_refresh_interval<br/>MUST be < cache_max_age<br/>(enforced by test)"]
        CFG_INTERVAL -->|"enforced constraint"| STRUCT_CONTRACT
        CFG_MAX_AGE -->|"enforced constraint"| STRUCT_CONTRACT
    end

    subgraph TaskLifecycle ["★ ToolContext.quota_refresh_task — Kitchen-Scoped MUTABLE Field"]
        direction LR
        TASK_NULL_START["State: None<br/>━━━━━━━━━━<br/>Before open_kitchen<br/>No task running"]
        TASK_RUNNING["★ State: asyncio.Task<br/>━━━━━━━━━━<br/>_quota_refresh_loop coroutine<br/>Background refresh active"]
        TASK_NULL_END["State: None<br/>━━━━━━━━━━<br/>After close_kitchen<br/>Task cancelled + cleared"]

        TASK_NULL_START -->|"● open_kitchen:\ncreate_background_task()"| TASK_RUNNING
        TASK_RUNNING -->|"● close_kitchen:\ntask.cancel() + task = None"| TASK_NULL_END
    end

    subgraph CacheStateMachine ["autoskillit_quota_cache.json — Cache File State Transitions"]
        direction TB
        CACHE_ABSENT["State: ABSENT<br/>━━━━━━━━━━<br/>No cache file exists<br/>→ hook: cache_miss (fail-open)"]
        CACHE_FRESH["State: FRESH<br/>━━━━━━━━━━<br/>age ≤ cache_max_age<br/>→ hook: enforce threshold"]
        CACHE_EXPIRING["State: EXPIRING (age approaches max_age)<br/>━━━━━━━━━━<br/>★ Proactive refresh fires<br/>before expiry (interval < max_age)"]
        CACHE_EXPIRED["State: EXPIRED (age > cache_max_age)<br/>━━━━━━━━━━<br/>→ hook: cache_miss (fail-open)<br/>Only possible if loop crashes"]

        CACHE_ABSENT -->|"_prime_quota_cache() at T=0"| CACHE_FRESH
        CACHE_FRESH -->|"age increases over time"| CACHE_EXPIRING
        CACHE_EXPIRING -->|"★ _quota_refresh_loop writes\nbefore expiry (interval < max_age)"| CACHE_FRESH
        CACHE_EXPIRING -->|"★ post-run background.submit()\nafter each run_skill"| CACHE_FRESH
        CACHE_FRESH -.->|"loop crash (telemetry visible)"| CACHE_EXPIRED
        CACHE_EXPIRED -->|"loop recovery or\nnext run_skill post-refresh"| CACHE_FRESH
    end

    subgraph ValidationGates ["Hook Validation Gates (quota_check.py — PreToolUse)"]
        direction TB
        GATE_AGE["● Gate 1: Age Check<br/>━━━━━━━━━━<br/>_read_cache(max_age=cache_max_age)<br/>age ≤ cache_max_age?"]
        GATE_THRESHOLD["Gate 2: Threshold Check<br/>━━━━━━━━━━<br/>utilization < threshold?"]
        GATE_PASS["→ APPROVE<br/>━━━━━━━━━━<br/>sys.exit(0)"]
        GATE_DENY["→ DENY<br/>━━━━━━━━━━<br/>block run_skill"]
        GATE_MISS["→ FAIL-OPEN (cache_miss)<br/>━━━━━━━━━━<br/>Preserved by design\ntelemetry visible via quota_events.jsonl"]

        GATE_AGE -->|"age ≤ max_age"| GATE_THRESHOLD
        GATE_AGE -->|"age > max_age (stale)"| GATE_MISS
        GATE_THRESHOLD -->|"below threshold"| GATE_PASS
        GATE_THRESHOLD -->|"≥ threshold"| GATE_DENY
    end

    %% Cross-subgraph connections %%
    STRUCT_CONTRACT -->|"interval guarantees\ncache never expires\nduring normal operation"| CACHE_EXPIRING
    TASK_RUNNING -->|"★ loop writes every\ncache_refresh_interval"| CACHE_FRESH
    CACHE_FRESH -->|"read by subprocess"| GATE_AGE

    %% CLASS ASSIGNMENTS %%
    class CFG_MAX_AGE,CFG_INTERVAL detector;
    class STRUCT_CONTRACT newComponent;
    class TASK_NULL_START,TASK_NULL_END phase;
    class TASK_RUNNING newComponent;
    class CACHE_ABSENT detector;
    class CACHE_FRESH output;
    class CACHE_EXPIRING gap;
    class CACHE_EXPIRED detector;
    class GATE_AGE,GATE_THRESHOLD stateNode;
    class GATE_PASS output;
    class GATE_DENY handler;
    class GATE_MISS cli;
```

**Legend:** ★ New component · ● Modified component

### Concurrency Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    CACHE_FILE[("autoskillit_quota_cache.json<br/>━━━━━━━━━━<br/>Shared via atomic_write<br/>Cross-process IPC boundary")]

    subgraph EventLoop ["MCP Server — asyncio Event Loop (single-threaded cooperative)"]
        direction TB

        subgraph KitchenLifecycle ["● Kitchen Lifecycle — Coroutines"]
            direction LR
            OPEN["● _open_kitchen_handler()<br/>━━━━━━━━━━<br/>Runs in event loop as coroutine"]
            CLOSE["● _close_kitchen_handler()<br/>━━━━━━━━━━<br/>Runs in event loop as coroutine"]
        end

        subgraph BackgroundTask ["★ Kitchen-Scoped Background Task (asyncio.Task, caller-owned)"]
            direction TB
            CREATE_TASK["★ create_background_task(<br/>_quota_refresh_loop(config)<br/>)<br/>━━━━━━━━━━<br/>asyncio.create_task() — no supervision wrapper<br/>handle stored in ctx.quota_refresh_task"]
            LOOP_CORO["★ _quota_refresh_loop coroutine<br/>━━━━━━━━━━<br/>while True:<br/>  await asyncio.sleep(cache_refresh_interval)<br/>  await _refresh_quota_cache(config)<br/>CancelledError propagates from sleep → exits"]
            CANCEL["● close_kitchen:<br/>ctx.quota_refresh_task.cancel()<br/>━━━━━━━━━━<br/>Delivers CancelledError at\nnext await asyncio.sleep()\nctx.quota_refresh_task = None"]
        end

        subgraph SupervisedTasks ["● DefaultBackgroundSupervisor — Fire-and-Forget (supervised)"]
            direction TB
            SUBMIT["● background.submit(<br/>★ _refresh_quota_cache(config),<br/>label='quota_post_run_refresh'<br/>)<br/>━━━━━━━━━━<br/>Called after each run_skill completes\nReturns immediately (fire-and-forget)"]
            SUPERVISE["_supervise_task wrapper<br/>━━━━━━━━━━<br/>Catches Exception → logs + audits\nCancelledError → write_status('cancelled')"]
        end

        subgraph RunSkillHandler ["● run_skill MCP Tool Handler"]
            direction TB
            EXECUTOR["executor.run(skill_command)<br/>━━━━━━━━━━<br/>await headless session"]
            POST["● Post-completion hook:<br/>background.submit(_refresh_quota_cache)<br/>━━━━━━━━━━<br/>Fire-and-forget refresh triggered<br/>after EVERY run_skill"]
        end
    end

    subgraph HookSubprocess ["quota_check.py — OS Subprocess (separate process, stdlib-only)"]
        direction TB
        HOOK_READ["_read_quota_cache(<br/>max_age=cache_max_age<br/>)<br/>━━━━━━━━━━<br/>Read-only access to cache file\nNo Python object sharing\nNo asyncio — blocking I/O"]
        HOOK_DECIDE{"cache fresh?"}
        HOOK_APPROVE["sys.exit(0) → approve"]
        HOOK_MISS["sys.exit(0) → fail-open"]
    end

    %% FLOW %%
    OPEN --> CREATE_TASK
    CREATE_TASK --> LOOP_CORO
    LOOP_CORO -->|"every cache_refresh_interval\nawait asyncio.sleep() yields to event loop"| LOOP_CORO
    LOOP_CORO -->|"★ unconditional fetch+write"| CACHE_FILE

    EXECUTOR --> POST
    POST --> SUBMIT
    SUBMIT --> SUPERVISE
    SUPERVISE -->|"★ _refresh_quota_cache writes"| CACHE_FILE

    CLOSE --> CANCEL
    CANCEL -.->|"CancelledError at await asyncio.sleep()"| LOOP_CORO

    CACHE_FILE -->|"cross-process read\natomic_write ensures no torn reads"| HOOK_READ
    HOOK_READ --> HOOK_DECIDE
    HOOK_DECIDE -->|"fresh"| HOOK_APPROVE
    HOOK_DECIDE -->|"stale"| HOOK_MISS

    %% CLASS ASSIGNMENTS %%
    class CACHE_FILE stateNode;
    class OPEN,CLOSE handler;
    class CREATE_TASK newComponent;
    class LOOP_CORO newComponent;
    class CANCEL handler;
    class SUBMIT,POST newComponent;
    class SUPERVISE phase;
    class EXECUTOR handler;
    class HOOK_READ integration;
    class HOOK_DECIDE detector;
    class HOOK_APPROVE output;
    class HOOK_MISS phase;
```

**Legend:** ★ New component · ● Modified component

Closes #638

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260405-212016-751478/.autoskillit/temp/rectify/rectify_quota_guard_cache_refresh_2026-04-06_045000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 19 | 9.8k | 265.8k | 42.1k | 1 | 6m 47s |
| rectify | 21 | 29.3k | 494.6k | 55.9k | 1 | 10m 19s |
| dry_walkthrough | 2.1k | 13.9k | 1.1M | 81.2k | 1 | 4m 5s |
| implement | 81 | 29.5k | 5.6M | 89.3k | 1 | 10m 11s |
| assess | 67 | 39.4k | 4.7M | 93.7k | 1 | 16m 8s |
| open_pr | 3.0k | 31.2k | 2.2M | 101.1k | 1 | 9m 56s |
| **Total** | 5.3k | 153.2k | 14.4M | 463.2k | | 57m 30s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions Bot and others added 28 commits April 14, 2026 17:15
…nv (#921)

## Summary

Single-commit fix for a v0.8.31 regression where `run_cmd` stripped the
entire parent process environment when `step_name` was set. The `_env`
dict passed to `_run_subprocess` contained only `SCENARIO_STEP_NAME`,
replacing all inherited environment variables (`HOME`, `PATH`, etc.).
This broke `gh` CLI authentication and any tool depending on standard
environment variables.

The fix merges `SCENARIO_STEP_NAME` into `os.environ` instead of
replacing it. Two test additions guard against regression: a unit test
verifying `PATH`/`HOME` preservation in the child env, and an AST
structural guard ensuring all `_run_subprocess(env=...)` calls in
`server/` start from `os.environ`.

Closes #915

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 48 | 6.6k | 511.0k | 53.7k | 1 | 4m 2s |
| rectify | 5.2k | 9.8k | 591.0k | 59.7k | 1 | 5m 14s |
| dry_walkthrough | 398 | 10.9k | 1.3M | 70.5k | 1 | 4m 41s |
| implement | 66 | 7.3k | 896.0k | 61.3k | 1 | 3m 31s |
| prepare_pr | 25 | 3.7k | 166.6k | 19.8k | 1 | 1m 25s |
| run_arch_lenses | 42 | 4.8k | 472.2k | 35.6k | 1 | 3m 38s |
| compose_pr | 23 | 1.6k | 129.5k | 14.0k | 1 | 41s |
| **Total** | 5.8k | 44.6k | 4.0M | 314.6k | | 23m 14s |

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ypes (#922)

## Summary

`build_interactive_cmd()` is a pure pass-through for environment
variables — it forwards caller-supplied `env_extras` to
`build_claude_env` without injecting any defaults. This is structurally
asymmetric with `build_full_headless_cmd()`, which internally injects
`MAX_MCP_OUTPUT_TOKENS=50000` and `AUTOSKILLIT_HEADLESS=1` before
delegating. The asymmetry means interactive launch paths depend on
caller discipline to inject required vars, and when
`_launch_cook_session` (the `order` command's launch path) was never
updated by PR #910, the gap went undetected.

The fix adds a `_SESSION_BASELINE_ENV` frozen mapping in `commands.py`
that `build_interactive_cmd` and `build_headless_resume_cmd` merge as
defaults. Removes redundant caller-side injection from `_cook.py`. Adds
structural guard tests covering all three session builders.

Closes #916

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260414-084428-773530/.autoskillit/temp/rectify/rectify_max_mcp_output_tokens_interactive_gap_2026-04-14_090500.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 2.6k | 8.3k | 628.8k | 48.1k | 1 | 4m 55s |
| rectify | 4.1k | 26.1k | 1.8M | 161.5k | 3 | 12m 42s |
| dry_walkthrough | 3.2k | 10.1k | 1.2M | 78.6k | 1 | 4m 39s |
| implement | 77 | 10.5k | 1.6M | 54.7k | 1 | 4m 19s |
| prepare_pr | 30 | 4.5k | 374.0k | 34.2k | 1 | 1m 34s |
| run_arch_lenses | 49 | 8.2k | 538.5k | 47.7k | 1 | 2m 52s |
| compose_pr | 23 | 1.5k | 128.6k | 15.6k | 1 | 39s |
| **Total** | 10.1k | 69.2k | 6.3M | 440.6k | | 31m 42s |

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Every `autoskillit order` session fails its first `open_kitchen` call
due to two independent architectural weaknesses: (1) FastMCP emits
`outputSchema` and `annotations` fields in `tools/list` responses that
trigger Claude Code bug #25081, silently dropping ALL tools — even after
the MCP handshake completes; (2) the cook session auto-submits an
initial message as a positional CLI arg, racing the LLM's first tool
call against MCP tool discovery. The fix introduces a centralized
wire-format sanitization middleware in the FastMCP pipeline, a
wire-format compliance test that prevents regression from dependency
upgrades, and prompt restructuring to eliminate the unconditional "MUST
be first" tool-call directive.
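The core transform of the sanitization middleware can be sketched as a pure function over the `tools/list` result. This shows only the field-stripping step, not the FastMCP middleware wiring; the function name is illustrative.

```python
# Fields whose presence in tools/list entries triggers the client-side bug.
_STRIPPED_FIELDS = ("outputSchema", "annotations")

def sanitize_tools_list(result: dict) -> dict:
    """Return a copy of a tools/list result with the offending fields removed."""
    tools = [
        {k: v for k, v in tool.items() if k not in _STRIPPED_FIELDS}
        for tool in result.get("tools", [])
    ]
    return {**result, "tools": tools}
```

Centralizing the strip in one middleware (plus a wire-format compliance test) means a future FastMCP upgrade that reintroduces the fields fails CI instead of silently dropping every tool again.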

Closes #913

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260414-080347-642271/.autoskillit/temp/rectify/rectify_mcp_init_race_wire_format_2026-04-14_153000.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
… Timeout (#918)

## Summary

`DefaultMergeQueueWatcher.wait()` used a single exception class
(`ClassifierInconclusive`) as a control-flow signal for two semantically
incompatible situations: (1) CI checks are still running — an *expected
transient* state that should poll until the outer `timeout_seconds`
deadline — and (2) no positive classification signal matched at all — a
*genuinely ambiguous* state that warrants a bounded retry ceiling before
giving up. Both raised the same exception and incremented the same
`inconclusive_count` counter, so CI that took longer than
`max_inconclusive_retries × poll_interval` (default: 75 seconds)
received a `timeout` result before the outer deadline was reached.

The fix is a **type-level distinction**: `ClassifierInconclusive` is
split into `CIStillRunning` (transient — no budget consumed) and
`NoPositiveSignal` (ambiguous — bounded budget), caught separately in
`wait()`. Additionally, `inconclusive_count` resets on queue re-entry so
budget from a prior exit episode does not bleed into subsequent ones.
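The type-level split can be sketched as two exception classes caught separately in the polling loop. The exception names come from the summary; the loop body and parameters are illustrative.

```python
import time

class CIStillRunning(Exception):
    """Transient: checks still executing — consumes no retry budget."""

class NoPositiveSignal(Exception):
    """Ambiguous: no classification matched — consumes bounded budget."""

def wait(classify, deadline: float, max_inconclusive: int, poll_interval: float = 0.0):
    inconclusive = 0  # reset per wait() call, so budget never bleeds between episodes
    while time.monotonic() < deadline:
        try:
            return classify()
        except CIStillRunning:
            pass  # keep polling until the outer deadline
        except NoPositiveSignal:
            inconclusive += 1
            if inconclusive >= max_inconclusive:
                return "inconclusive"
        time.sleep(poll_interval)
    return "timeout"
```

Under the old single-exception design, slow-but-healthy CI exhausted the same counter as genuine ambiguity; here only `NoPositiveSignal` touches it.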

Closes #911

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260414-074614-472907/.autoskillit/temp/rectify/rectify_merge_queue_inconclusive_budget_2026-04-14_083000_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 120 | 8.4k | 389.7k | 40.7k | 1 | 5m 44s |
| rectify | 5.1k | 24.2k | 1.0M | 70.4k | 1 | 9m 13s |
| review | 98 | 8.2k | 346.5k | 39.7k | 1 | 2m 38s |
| dry_walkthrough | 308 | 28.1k | 1.7M | 105.3k | 2 | 8m 31s |
| implement | 492 | 21.1k | 2.6M | 94.0k | 2 | 6m 37s |
| assess | 226 | 8.5k | 1.2M | 42.8k | 1 | 10m 40s |
| prepare_pr | 52 | 5.6k | 145.0k | 25.7k | 1 | 1m 23s |
| run_arch_lenses | 261 | 24.6k | 1.1M | 163.1k | 3 | 7m 6s |
| compose_pr | 51 | 3.4k | 134.0k | 16.2k | 1 | 1m 5s |
| **Total** | 6.7k | 132.1k | 8.7M | 598.0k | | 53m 0s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

Adds a new `disable_quota_guard` MCP tool that allows order sessions to
bypass quota guard enforcement when needed. The quota guard hook
(`quota_guard.py`) is updated to respect the new disabled state, and
`_hook_settings.py` is extended to expose the disable flag. Kitchen
tooling, CLI cook path, and type constants are updated to register and
expose the new tool.

Closes #919

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| review_approach | 158 | 12.4k | 1.1M | 98.1k | 1 | 4m 53s |
| make_plan | 195 | 14.2k | 1.4M | 83.3k | 1 | 4m 22s |
| dry_walkthrough | 2.3k | 13.0k | 935.8k | 77.9k | 1 | 3m 52s |
| implement | 570 | 25.7k | 4.6M | 81.3k | 1 | 8m 37s |
| resolve_failures | 226 | 10.6k | 1.4M | 52.7k | 1 | 11m 17s |
| prepare_pr | 101 | 5.3k | 174.6k | 28.2k | 1 | 1m 27s |
| run_arch_lenses | 157 | 13.7k | 597.9k | 43.2k | 1 | 9m 22s |
| compose_pr | 67 | 1.8k | 178.2k | 14.4k | 1 | 43s |
| **Total** | 3.8k | 96.7k | 10.3M | 479.1k | | 44m 35s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

Four discrete bugs share a single architectural root: there is no
canonical owner for `installed_plugins.json`. Every caller that touches
the file re-implements its own structural interpretation of the JSON
format, and two of those implementations got the nesting wrong
(`{"plugins": {...}}` vs flat `{}`). The test fixtures matched the wrong
implementations, so the regression passed CI.

The architectural fix is a **typed repository class** —
`InstalledPluginsFile` — that becomes the single authorized interface
for all reads and writes of `installed_plugins.json`. Once all call
sites are routed through this class, the correct nesting is in exactly
one place. A wrong structure cannot be silently introduced by a new call
site: the class's API enforces the contract.
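A minimal sketch of what such a repository class could look like follows. The method names match the diagram below, but the file layout shown (flat `{ref: entry}`) and the implementation details are assumptions; the point is that whichever nesting is canonical lives in exactly one class.

```python
import json
from pathlib import Path

class InstalledPluginsFile:
    """Single authorized interface for installed_plugins.json."""

    def __init__(self, path: Path):
        self._path = path

    def get_plugins(self) -> dict:
        if not self._path.exists():
            return {}
        # The canonical structure is decoded in exactly one place.
        return json.loads(self._path.read_text())

    def contains(self, ref: str) -> bool:
        return ref in self.get_plugins()

    def remove(self, ref: str) -> None:
        plugins = self.get_plugins()
        plugins.pop(ref, None)
        self._path.write_text(json.dumps(plugins, indent=2))
```

Any new call site must go through this API, so it cannot reintroduce a divergent interpretation of the file format.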

Two companion fixes complete the immunity:
- `install()` must return a typed value distinguishing "complete" from
"deferred (CLAUDECODE)" so callers cannot proceed with misleading
success output.
- `find_broken_hook_scripts` must apply the same `_is_own_hook`
ownership filter that `_extract_script_basenames` already uses,
eliminating false positives on user-defined inline hooks.

Part B (separate task) will cover `ensure_project_temp` call site
completeness across CLI entry points.
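The "typed value" companion fix can be sketched with an enum — names here are hypothetical; the source only says `install()` returns a value distinguishing complete from deferred:

```python
import enum

class InstallResult(enum.Enum):
    COMPLETE = "complete"
    DEFERRED = "deferred"   # CLAUDECODE was set: running inside a session
    FAILED = "failed"

def install(env: dict) -> InstallResult:
    # Callers branch on the result instead of assuming success, so a
    # deferred install can no longer print misleading success output.
    if env.get("CLAUDECODE"):
        return InstallResult.DEFERRED
    # ... real registration steps elided ...
    return InstallResult.COMPLETE
```

The enum forces the caller to handle the deferred branch explicitly; a bare `bool` (or `None`) return would let "deferred" coerce to the wrong truthiness.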

## Architecture Impact

### Repository/Data Access Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    subgraph Callers ["CALLERS"]
        direction TB
        APP["● app.py<br/>━━━━━━━━━━<br/>CLI entry: install,<br/>init, doctor, upgrade"]
        MARKET["● _marketplace.py<br/>━━━━━━━━━━<br/>install() / upgrade()<br/>_clear_plugin_cache()"]
        DOCTOR["● _doctor.py<br/>━━━━━━━━━━<br/>_check_installed_plugins_entry()<br/>_check_hook_registry_drift()<br/>_check_hook_health()"]
    end

    subgraph PluginRepo ["★ PLUGIN REGISTRY (new)"]
        direction TB
        IPF["★ InstalledPluginsFile<br/>━━━━━━━━━━<br/>get_plugins() → read<br/>contains(ref) → read<br/>remove(ref) → read+write<br/>cli/_installed_plugins.py"]
    end

    subgraph HookRepo ["● HOOK REGISTRY (modified)"]
        direction TB
        HREG["● HOOK_REGISTRY<br/>━━━━━━━━━━<br/>List[HookDef] — canonical<br/>source of truth<br/>hook_registry.py"]
        GEN["● generate_hooks_json()<br/>━━━━━━━━━━<br/>HookDef → JSON structure<br/>hook_registry.py"]
        DRIFT["● _count_hook_registry_drift()<br/>━━━━━━━━━━<br/>canonical vs deployed<br/>hook_registry.py"]
        FIND["find_broken_hook_scripts()<br/>━━━━━━━━━━<br/>deployed scripts<br/>existence check"]
        EXTRACT["_extract_script_basenames()<br/>━━━━━━━━━━<br/>settings.json hooks<br/>→ basename set"]
    end

    subgraph IOLayer ["I/O PRIMITIVE"]
        AW["atomic_write()<br/>━━━━━━━━━━<br/>fsync + os.replace<br/>core/io.py"]
    end

    subgraph Storage ["STORAGE"]
        direction TB
        INSTJSON[("installed_plugins.json<br/>━━━━━━━━━━<br/>~/.claude/plugins/<br/>installed_plugins.json")]
        HOOKSJSON[("hooks.json<br/>━━━━━━━━━━<br/>hooks/hooks.json<br/>plugin manifest")]
        SETTINGS[("settings.json<br/>━━━━━━━━━━<br/>~/.claude/settings.json<br/>per-scope hook registration")]
        CLAUDEJSON[("~/.claude.json<br/>━━━━━━━━━━<br/>Legacy mcpServers<br/>config")]
        CACHE[("Plugin Cache Dir<br/>━━━━━━━━━━<br/>~/.claude/plugins/<br/>cache/autoskillit-local/")]
    end

    %% Caller → Repository relationships %%
    APP -->|"invokes"| MARKET
    APP -->|"invokes"| DOCTOR

    MARKET -->|"reads/writes"| IPF
    MARKET -->|"reads"| HREG
    MARKET -->|"writes hooks.json"| AW

    DOCTOR -->|"reads"| IPF
    DOCTOR -->|"reads"| DRIFT
    DOCTOR -->|"reads"| FIND
    DOCTOR -->|"reads"| SETTINGS

    %% Hook Registry internal %%
    HREG -->|"drives"| GEN
    HREG -->|"drives"| DRIFT
    GEN -->|"writes via"| AW
    DRIFT -->|"reads"| EXTRACT
    EXTRACT -->|"parses"| SETTINGS
    FIND -->|"parses"| SETTINGS

    %% Repository → Storage %%
    IPF -->|"reads"| INSTJSON
    IPF -->|"writes via atomic_write"| AW
    AW -->|"writes"| INSTJSON
    AW -->|"writes"| HOOKSJSON

    %% Cache eviction (marketplace) %%
    MARKET -->|"deletes"| CACHE

    %% Doctor reads legacy config %%
    DOCTOR -->|"reads"| CLAUDEJSON

    %% CLASS ASSIGNMENTS %%
    class APP,MARKET cli;
    class DOCTOR detector;
    class IPF newComponent;
    class HREG,GEN stateNode;
    class DRIFT,FIND,EXTRACT phase;
    class AW handler;
    class INSTJSON,HOOKSJSON,SETTINGS,CLAUDEJSON,CACHE integration;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 42, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    INSTALL_START([autoskillit install])
    INSTALL_DEFERRED([DEFERRED<br/>━━━━━━━━━━<br/>user must run manually])
    INSTALL_COMPLETE([INSTALL COMPLETE])
    DOCTOR_START([autoskillit doctor])
    DOCTOR_END([DOCTOR COMPLETE])

    %% ─────────────────────────────────────────────
    %%  INSTALL FLOW
    %% ─────────────────────────────────────────────

    subgraph InstallPreflight ["Install — Preflight"]
        direction TB
        ValidateScope{"● scope valid?<br/>━━━━━━━━━━<br/>user / project / local"}
        EnsureMarketplace["_ensure_marketplace()<br/>━━━━━━━━━━<br/>mkdir ~/.autoskillit/marketplace<br/>write marketplace.json<br/>symlink → pkg_root()"]
        GuardCLAUDECODE{"CLAUDECODE env set?<br/>━━━━━━━━━━<br/>running inside session?"}
        GuardClaude{"claude on PATH?<br/>━━━━━━━━━━<br/>shutil.which('claude')"}
        EnsureWorkspace["_ensure_workspace_ready()<br/>━━━━━━━━━━<br/>ensure_project_temp()<br/>upgrade() if scripts/ present"]
    end

    subgraph CacheClear ["Install — Cache Clearing (●)"]
        direction TB
        RmCacheDir["shutil.rmtree(cache_dir)<br/>━━━━━━━━━━<br/>~/.claude/plugins/cache/<br/>autoskillit-local/autoskillit"]
        InstalledPluginsRemove["★ InstalledPluginsFile.remove()<br/>━━━━━━━━━━<br/>reads installed_plugins.json<br/>deletes 'autoskillit@autoskillit-local'<br/>atomic_write() back"]
        HooksJsonRegen["● generate_hooks_json()<br/>━━━━━━━━━━<br/>iterates HOOK_REGISTRY<br/>writes hooks/hooks.json<br/>embeds HOOK_REGISTRY_HASH"]
    end

    subgraph PluginRegistration ["Install — Plugin Registration"]
        direction TB
        MarketplaceAdd["claude plugin marketplace add<br/>━━━━━━━━━━<br/>subprocess, timeout=30s"]
        PluginInstall["claude plugin install<br/>━━━━━━━━━━<br/>plugin_ref @ scope"]
        EvictDirect["evict_direct_mcp_entry()<br/>━━━━━━━━━━<br/>remove stale direct entry<br/>from ~/.claude.json"]
    end

    subgraph HookSync ["Install — Hook Sync (●)"]
        direction TB
        SweepAllScopes["sweep_all_scopes_for_orphans()<br/>━━━━━━━━━━<br/>iterate user + project + local<br/>remove orphaned hook entries"]
        SyncHooks["sync_hooks_to_settings()<br/>━━━━━━━━━━<br/>write canonical hooks<br/>to target scope settings.json"]
    end

    %% ─────────────────────────────────────────────
    %%  SHARED STATE
    %% ─────────────────────────────────────────────

    subgraph SharedState ["★ InstalledPluginsFile (_installed_plugins.py)"]
        direction LR
        IPF_Read["_read()<br/>━━━━━━━━━━<br/>json.loads(path)<br/>→ {} on missing/error"]
        IPF_Contains["contains(plugin_ref)<br/>━━━━━━━━━━<br/>plugin_ref in get_plugins()"]
        IPF_Remove["remove(plugin_ref)<br/>━━━━━━━━━━<br/>del plugins[ref]<br/>atomic_write() back"]
        InstalledPluginsJSON[("installed_plugins.json<br/>━━━━━━━━━━<br/>~/.claude/plugins/<br/>{version,plugins:{}}")]
    end

    %% ─────────────────────────────────────────────
    %%  DOCTOR FLOW
    %% ─────────────────────────────────────────────

    subgraph DoctorMCP ["Doctor — MCP Checks"]
        direction TB
        ChkStaleMCP["check: stale_mcp_servers<br/>━━━━━━━━━━<br/>dead binary paths in ~/.claude.json"]
        ChkMCPReg["check: mcp_server_registered<br/>━━━━━━━━━━<br/>mcpServers entry OR<br/>claude plugin list"]
        ChkDualReg["● check: dual_mcp_registration<br/>━━━━━━━━━━<br/>direct entry AND plugin both present?<br/>→ WARNING: split-brain"]
        ChkPluginCache["check: plugin_cache_exists<br/>━━━━━━━━━━<br/>~/.claude/plugins/cache/... dir?"]
        ChkInstalledPlugins["● check: installed_plugins_entry<br/>━━━━━━━━━━<br/>InstalledPluginsFile.contains()<br/>'autoskillit@autoskillit-local'?"]
    end

    subgraph DoctorHooks ["Doctor — Hook Checks (●)"]
        direction TB
        ChkHookHealth["check: hook_health<br/>━━━━━━━━━━<br/>find_broken_hook_scripts()<br/>script files exist on disk?"]
        ChkHookReg["check: hook_registration<br/>━━━━━━━━━━<br/>canonical_script_basenames()<br/>all scripts in settings.json?"]
        ChkHookDrift["● check: hook_registry_drift<br/>━━━━━━━━━━<br/>_count_hook_registry_drift()<br/>orphaned | missing hooks<br/>iterates all scopes"]
    end

    subgraph DoctorHealth ["Doctor — Health Checks"]
        direction TB
        ChkVersion["check: version_consistency<br/>━━━━━━━━━━<br/>pkg version == plugin.json version?"]
        ChkConfig["check: project_config<br/>━━━━━━━━━━<br/>.autoskillit/config.yaml exists?"]
        ChkGitignore["check: gitignore_completeness<br/>━━━━━━━━━━<br/>.autoskillit/ entries covered?"]
        ChkQuota["check: quota_cache_schema<br/>━━━━━━━━━━<br/>schema_version drift?"]
        ChkInstallClass["check: install_classification<br/>━━━━━━━━━━<br/>detect_install() → InstallType"]
        ChkSourceDrift["check: source_version_drift<br/>━━━━━━━━━━<br/>commit SHA vs reference SHA<br/>(network + disk-cache TTL)"]
        ChkEditableInst["check: editable_install_source_exists<br/>━━━━━━━━━━<br/>direct_url.json → source path exists?"]
        ChkDismissal["check: update_dismissal_state<br/>━━━━━━━━━━<br/>dismissed_at + window expiry"]
    end

    %% ─────────────────────────────────────────────
    %%  INSTALL EDGES
    %% ─────────────────────────────────────────────
    INSTALL_START --> ValidateScope
    ValidateScope -->|"invalid"| INSTALL_COMPLETE
    ValidateScope -->|"valid"| EnsureMarketplace
    EnsureMarketplace --> GuardCLAUDECODE
    GuardCLAUDECODE -->|"yes — inside session"| INSTALL_DEFERRED
    GuardCLAUDECODE -->|"no"| GuardClaude
    GuardClaude -->|"missing"| INSTALL_COMPLETE
    GuardClaude -->|"found"| EnsureWorkspace
    EnsureWorkspace --> RmCacheDir
    RmCacheDir --> InstalledPluginsRemove
    InstalledPluginsRemove -->|"reads/writes"| InstalledPluginsJSON
    InstalledPluginsRemove --> HooksJsonRegen
    HooksJsonRegen --> MarketplaceAdd
    MarketplaceAdd -->|"returncode != 0"| INSTALL_COMPLETE
    MarketplaceAdd -->|"ok"| PluginInstall
    PluginInstall -->|"returncode != 0"| INSTALL_COMPLETE
    PluginInstall -->|"ok"| EvictDirect
    EvictDirect --> SweepAllScopes
    SweepAllScopes --> SyncHooks
    SyncHooks --> INSTALL_COMPLETE

    %% ─────────────────────────────────────────────
    %%  DOCTOR EDGES
    %% ─────────────────────────────────────────────
    DOCTOR_START --> ChkStaleMCP
    ChkStaleMCP --> ChkMCPReg
    ChkMCPReg --> ChkDualReg
    ChkDualReg --> ChkPluginCache
    ChkPluginCache --> ChkInstalledPlugins
    ChkInstalledPlugins -->|"reads"| IPF_Contains
    IPF_Contains -->|"reads"| IPF_Read
    IPF_Read -->|"reads"| InstalledPluginsJSON
    ChkInstalledPlugins --> ChkHookHealth
    ChkHookHealth --> ChkHookReg
    ChkHookReg --> ChkHookDrift
    ChkHookDrift --> ChkVersion
    ChkVersion --> ChkConfig
    ChkConfig --> ChkGitignore
    ChkGitignore --> ChkQuota
    ChkQuota --> ChkInstallClass
    ChkInstallClass --> ChkSourceDrift
    ChkSourceDrift --> ChkEditableInst
    ChkEditableInst --> ChkDismissal
    ChkDismissal --> DOCTOR_END

    %% ─────────────────────────────────────────────
    %%  SHARED STATE INTERNAL EDGES
    %% ─────────────────────────────────────────────
    IPF_Remove -->|"atomic_write()"| InstalledPluginsJSON

    %% ─────────────────────────────────────────────
    %%  CLASS ASSIGNMENTS
    %% ─────────────────────────────────────────────
    class INSTALL_START,INSTALL_DEFERRED,INSTALL_COMPLETE,DOCTOR_START,DOCTOR_END terminal;
    class ValidateScope,GuardCLAUDECODE,GuardClaude stateNode;
    class EnsureMarketplace,EnsureWorkspace,HooksJsonRegen,MarketplaceAdd,PluginInstall,EvictDirect handler;
    class RmCacheDir,SweepAllScopes,SyncHooks phase;
    class InstalledPluginsRemove,IPF_Read,IPF_Contains,IPF_Remove newComponent;
    class InstalledPluginsJSON output;
    class ChkStaleMCP,ChkMCPReg,ChkDualReg,ChkPluginCache,ChkInstalledPlugins detector;
    class ChkHookHealth,ChkHookReg,ChkHookDrift detector;
    class ChkVersion,ChkConfig,ChkGitignore,ChkQuota,ChkInstallClass,ChkSourceDrift,ChkEditableInst,ChkDismissal phase;
```

Closes #912

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-20260414-075239-201688/.autoskillit/temp/rectify/rectify_doctor-install-disconnect_2026-04-14_120000_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| review_approach | 158 | 12.4k | 1.1M | 98.1k | 1 | 4m 53s |
| make_plan | 195 | 14.2k | 1.4M | 83.3k | 1 | 4m 22s |
| dry_walkthrough | 2.3k | 13.0k | 935.8k | 77.9k | 1 | 3m 52s |
| implement | 570 | 25.7k | 4.6M | 81.3k | 1 | 8m 37s |
| resolve_failures | 226 | 10.6k | 1.4M | 52.7k | 1 | 11m 17s |
| prepare_pr | 101 | 5.3k | 174.6k | 28.2k | 1 | 1m 27s |
| run_arch_lenses | 157 | 13.7k | 597.9k | 43.2k | 1 | 9m 22s |
| compose_pr | 67 | 1.8k | 178.2k | 14.4k | 1 | 43s |
| review_pr | 94 | 21.1k | 374.8k | 48.0k | 1 | 6m 6s |
| **Total** | 3.9k | 117.8k | 10.7M | 527.1k | | 50m 42s |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep integration version 0.8.38 (ahead of main's 0.7.0) across
pyproject.toml, plugin.json, and uv.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace direct dict access with entry.get("dismissed_at") and explicit
None check so that a missing key returns False immediately instead of
falling through to the broad except Exception handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
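The pattern above — `dict.get` plus an explicit `None` check instead of letting a `KeyError` fall into a broad handler — can be sketched as follows; the helper name and window logic are invented stand-ins:

```python
import time

def within_dismissal_window(dismissed_at: float, window_s: float = 86400.0) -> bool:
    # Stand-in for the real window-expiry check.
    return (time.time() - dismissed_at) < window_s

def is_dismissed(entry: dict) -> bool:
    # A missing key returns False immediately and deliberately,
    # instead of raising KeyError into a broad `except Exception`
    # that would hide the real cause.
    dismissed_at = entry.get("dismissed_at")
    if dismissed_at is None:
        return False
    return within_dismissal_window(dismissed_at)
```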
The test_check tool handler returns {passed, stdout, stderr} but the
pretty-output formatter was reading the legacy "output" key, silently
receiving an empty string. Update the formatter to read stdout/stderr,
matching the _fmt_run_cmd pattern, and update test payloads accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users with the old quota_guard.threshold key in config.yaml get a
ConfigSchemaError at startup with a difflib hint, but no migration
note existed to guide the transition to the dual-window thresholds
(short_window_threshold / long_window_threshold).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
evict_direct_mcp_entry was called unconditionally in main(), including
on the 'serve' path (MCP server startup). This caused needless reads
of ~/.claude.json on every MCP server spawn. Move it inside the
existing non-serve guard so it only runs on interactive CLI commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_count_hook_registry_drift and _check_hook_health were imported from
_doctor (which itself imports from hook_registry) and re-exported in
__all__ despite no consumers using the autoskillit.cli path. Remove
the unnecessary re-export chain and the private names from __all__.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- _lifespan.py: init bg_tasks=[] before try to prevent NameError in
  finally if imports fail
- tools_ci.py: record step timing before early return when ci_watcher
  or merge_queue_watcher is None
- merge_queue.py: remove dead self._max_inconclusive_retries storage
  and fix _text_has_push_trigger docstring precision claim
- remote_resolver.py: await cancelled io_task after proc.kill() to
  collect CancelledError
- rules_inputs.py: rename rule to hyphenated convention
  (research-output-mode-enum) and use word-boundary-aware regex for
  recommended input check
- _fmt_primitives.py: remove dead _read_hook_config() function
- settings.py: extract _EXIT_GRACE_BUFFER_MS constant with comment
  explaining unit conversion and safety margin
- clone.py: save loop result to avoid redundant _probe_single_remote
  subprocess call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
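The `_lifespan.py` fix follows a general pattern: bind the collection before the `try` so the `finally` clause never sees an unbound name, even if setup raises on its first line. A minimal synchronous sketch (the real code is an async lifespan):

```python
import contextlib

@contextlib.contextmanager
def lifespan():
    # bg_tasks is bound BEFORE the try: if startup inside the try
    # raises, the finally block still sees a defined name instead of
    # a NameError masking the original exception.
    bg_tasks = []
    try:
        bg_tasks.append("watcher")   # stand-in for real task startup
        yield bg_tasks
    finally:
        bg_tasks.clear()             # stand-in for task cancellation
```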
Replace 9 inline if/continue blocks in _detect_dead_outputs with a
consolidated _OBSERVABILITY_CAPTURES frozenset and a single
_is_observability_capture() guard function. The exemption policy is
now inspectable in one place and adding new entries is a one-line
table addition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- kitchen_state.py: add UnicodeDecodeError to sweep_stale_markers
  exception tuple to prevent corrupted binary files from aborting sweep
- _factory.py: defer _is_plugin_installed import to break server→cli
  module-level L3 sibling dependency
- clone_guard.py: skip git clean when git reset fails to preserve
  diagnostic evidence
- experiment_type_registry.py: add isinstance checks for dict-typed
  YAML fields before dict() coercion
- commands.py: document overlap between _HEADLESS_EXCLUSIVE_VARS and
  IDE_ENV_DENYLIST
- _hook_settings.py + _fmt_primitives.py: add cross-reference comments
  linking parallel hook config path definitions
- headless.py: remove redundant comment on _PATH_CAPTURE regex

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Gate rollback: disable gate on open_kitchen setup failure
- Wire compat: strip tool.title alongside output_schema/annotations
- Factory: use keyword arg for DefaultMergeQueueWatcher (consistency)
- tools_ci: standardize exc_info=True logging style
- process.py: make timeout_scope None guard consistent with unconditional access
- recording.py: cleanup TemporaryDirectory on scenario parse failure
- readiness.py: delegate to kitchen_state.get_state_dir() (remove duplication)
- io.py: remove dead _ROOT_GITIGNORE_ENTRIES constant
- contracts.py: promote _RESULT_CAPTURE_RE/_INPUT_REF_RE to public names
- _analysis.py: fix _SIMPLE_WHEN_RE to require paired quotes
- rules_merge.py: BFS multi-hop predecessor search for commit_guard
- rules_graph.py: reachability check for cycle exit targets that loop back
- _update_checks.py: fix stale docstring (network=False → network=True)
- _doctor.py: skip plugin cache check on editable dev installs
- open_kitchen_guard.py: surface marker write failure in hook output
- gate.py: add KillReason.NOT_APPLICABLE for gate/headless errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug fixes:
- anomaly_detection: fire ZOMBIE_PERSISTENT exactly once (== 3, not >= 3)
- tools_issue_lifecycle: add missing 'error' key in remove_label failure
- tools_git: return error when all branch name suffixes 2..100 exhausted
- rules_inputs: use 'ingredient:name' for step_name to avoid misleading output
- clone_registry: add AttributeError to except for non-dict JSON guard
- recording: fix resource leak when make_scenario_player() raises

Slop/docs:
- _lifespan: replace verbatim docstring with meaningful description
- _fmt_status: trim duplicate 3-line routing docstrings to one-liners
- test_server_tool_registration: fix stale "40 tools" count (now 44)

Tests:
- Remove duplicate test_no_anomalies_for_normal_session_still_holds
- Remove redundant atexit assertion from test_recording_runner_recorder_is_public

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
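The `== 3` anomaly fix is an edge-trigger vs level-trigger distinction: fire on the poll where the count reaches the threshold, not on every poll thereafter. A sketch (the surrounding detector is elided; the threshold comes from the bullet above):

```python
ZOMBIE_THRESHOLD = 3

def should_fire_zombie_persistent(consecutive_zombie_polls: int) -> bool:
    # `==` fires exactly once, on the crossing poll. With `>=` the
    # anomaly would re-fire on every subsequent poll while the
    # process stays zombied.
    return consecutive_zombie_polls == ZOMBIE_THRESHOLD
```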
- cli/_update_checks: _api_sha now tries refs/tags for tag revisions
- config/settings: annotate _EXIT_GRACE_BUFFER_MS as ClassVar[int]
- execution/_process_monitor: cache psutil.Process objects across calls
  so cpu_percent(interval=0) returns meaningful deltas
- hooks/_hook_settings: add ENV_DISABLED env-var override for disabled
- workspace/clone_registry: wrap open+flock in try/except in __enter__
  to prevent fd leak if flock() raises
- recipe/_analysis: extract_blocks accepts precomputed predecessors map
  to avoid duplicate computation; add warning logs for fallback
  entry/exit selection
- recipe/rules_fixing: use deque.popleft() instead of list.pop(0)
- recipe/rules_reachability: use ctx.predecessors in _ancestors();
  _find_capture_producers returns all producers
- recipe/rules_contracts: log warning on unreadable SKILL.md
- server/tools_kitchen: add gate.disable() on start_quota_refresh
  failure for consistency
- server/_factory: make recording ImportError degrade gracefully like
  replay path
- server/_wire_compat: use model_copy() instead of in-place mutation
  to avoid modifying shared FastMCP tool registry objects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update JSON write site allowlist line numbers for clone_registry and
  tools_kitchen after code changes shifted lines
- Wire compat middleware tests: use model_copy mock returns instead of
  in-place mutation expectations
- Process monitor tests: account for two-call priming pattern with
  cached psutil.Process objects; clear module cache between tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Promote integration to main (198 PRs, 182 issues, 102 fixes, 107 features, 1 infra, 7 tests, 4 docs)