Conversation
Add structured CheckpointContext with failureKind and touchSet violation metadata to replace error-string parsing. Add completed->failed and failed->running to VALID_JOB_TRANSITIONS to support touchSet violation detection and agent relaunch flows. Closes #61
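A minimal sketch of how a transition table such as VALID_JOB_TRANSITIONS could admit the two new edges. The state names come from this PR. The table's shape, the remaining edges (including failed to ready_to_merge for the accept path), and the canTransition helper are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical subset of job states; the real list lives in src/lib/plan-types.ts.
type JobState = "running" | "completed" | "failed" | "ready_to_merge";

// Assumed shape: each state maps to the states it is allowed to move to.
const VALID_JOB_TRANSITIONS: Record<JobState, readonly JobState[]> = {
  running: ["completed", "failed"],
  // completed -> failed lets touchSet validation reject a job that already finished.
  completed: ["ready_to_merge", "failed"],
  // failed -> running lets a relaunch put the same job back to work.
  // failed -> ready_to_merge is an assumed edge modelling the accept path.
  failed: ["running", "ready_to_merge"],
  ready_to_merge: [],
};

// Returns true when the table lists `to` as a legal successor of `from`.
function canTransition(from: JobState, to: JobState): boolean {
  return VALID_JOB_TRANSITIONS[from].includes(to);
}
```

Without the completed to failed edge, a job that finished cleanly but touched forbidden files would have no legal way to be marked as failed.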
Record failureKind and metadata (touchSetViolations, touchSetPatterns) when setting on_error checkpoints. This replaces error-string parsing with structured data for accept/relaunch/retry decision-making. Update touchSet violation notification to present all three options: accept, relaunch (agent fixes), or retry (user fixes, re-validate).
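To illustrate why structured context is easier to act on than an error string, here is a hedged sketch. The field names failureKind, touchSetViolations, and touchSetPatterns come from this PR; the exact FailureKind literals, the nesting under metadata, and the canRelaunch helper are assumptions.

```typescript
// Assumed literals, named after the four on_error cases this PR lists.
type FailureKind = "touchset" | "merge_conflict" | "test_failure" | "job_failure";

// Hypothetical shape; the real type is defined in src/lib/plan-types.ts.
interface CheckpointContext {
  failureKind: FailureKind;
  metadata?: {
    touchSetViolations?: string[]; // files changed outside the touchSet
    touchSetPatterns?: string[];   // patterns the job was allowed to touch
  };
}

// Relaunch is only meaningful for touchSet violations, so the check reads
// the structured field instead of searching an error message for keywords.
function canRelaunch(ctx: CheckpointContext): boolean {
  return ctx.failureKind === "touchset";
}
```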
New method relaunches a failed job in its existing worktree with a correction prompt containing the original task, touchSet violations, and allowed patterns. Reuses branch/worktree, kills stale tmux session, creates fresh tmux + job entry, and lets the agent fix violations before touchSet re-validates on completion.
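The correction prompt described above could be assembled along these lines. This is a sketch only: buildCorrectionPrompt and its wording are hypothetical, since the PR text does not show the real prompt. It only demonstrates combining the three inputs the commit names (original task, violations, allowed patterns).

```typescript
// Hypothetical helper; the actual prompt text is not shown in this PR.
function buildCorrectionPrompt(
  originalTask: string,
  violations: string[],
  allowedPatterns: string[],
): string {
  return [
    "Original task:",
    originalTask,
    "",
    "Files modified outside the touchSet:",
    ...violations.map((file) => `- ${file}`),
    "",
    "Allowed patterns:",
    ...allowedPatterns.map((pattern) => `- ${pattern}`),
  ].join("\n");
}
```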
…prove
Three distinct actions for touchSet violations:
- Accept (no params): moves checkpoint job to ready_to_merge
- Relaunch (relaunch param): spawns agent in existing worktree with correction prompt to fix violations
- Retry (retry param): re-validates touchSet before proceeding, fixing the bug where retry silently skipped validation
Update test assertions to match new output format.
…prove Add TouchSet Enforcement section explaining the three violation options. Update mc_plan_approve reference with retry and relaunch parameters. Expand touchSet field description in JobSpec table. Add touchSet validation step to Merge Train flow. Add FAQ entry for touchSet violations.
… paths The accept and retry paths in mc_plan_approve called updatePlanJob() to set the job to ready_to_merge, then immediately called savePlan() with the stale in-memory plan object — overwriting the update and leaving the job stuck in failed state. Fix by updating the in-memory plan.jobs directly before savePlan(), removing the now-unnecessary updatePlanJob import. Also adds checkpointContext assertions to the orchestrator touchSet test.
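The stale-write bug described above can be reproduced in miniature. Everything in this sketch is a stand-in: the in-memory store, loadPlan, and the simplified updatePlanJob and savePlan are assumptions that mirror only the ordering problem, not the project's real persistence code.

```typescript
interface Job { name: string; status: string }
interface Plan { jobs: Job[] }

// Stand-in for persisted plan state (assumption: the real store is not in memory).
let stored: Plan = { jobs: [{ name: "api", status: "failed" }] };

const clone = (plan: Plan): Plan => JSON.parse(JSON.stringify(plan)) as Plan;
const loadPlan = (): Plan => clone(stored);
const savePlan = (plan: Plan): void => { stored = clone(plan); };

// Hypothetical updatePlanJob: loads its own copy, patches it, and saves it.
function updatePlanJob(name: string, status: string): void {
  const fresh = loadPlan();
  for (const job of fresh.jobs) if (job.name === name) job.status = status;
  savePlan(fresh);
}

// Buggy order: the caller's copy never saw the update, so saving it reverts the job.
function approveBuggy(name: string): void {
  const plan = loadPlan();
  updatePlanJob(name, "ready_to_merge");
  savePlan(plan); // overwrites the update with the stale copy
}

// Fixed order: patch the in-memory plan directly, then save once.
function approveFixed(name: string): void {
  const plan = loadPlan();
  for (const job of plan.jobs) if (job.name === name) job.status = "ready_to_merge";
  savePlan(plan);
}
```

Calling approveBuggy leaves the stored job in failed, while approveFixed leaves it in ready_to_merge.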
Adds Phase 5G with 47 tests covering touchSet violation detection, accept path, relaunch path, retry with re-validation, mutual exclusion, and non-touchset relaunch rejection. Updates job states reference, state transition maps, coverage matrix, results tracking (276 → 323 tests), quick smoke test, and key risks.
Summary
When a plan job completes but modifies files outside its touchSet, users now have three clear options instead of a single ambiguous "retry" that silently bypassed validation.
Closes #61
Changes
- CheckpointContext type with failureKind and violation metadata replaces error-string parsing for decision-making
- completed → failed and failed → running added to VALID_JOB_TRANSITIONS to support the detection and relaunch flows
- mc_plan_approve(checkpoint: "on_error") without retry/relaunch accepts touchSet violations and moves the job to ready_to_merge
- mc_plan_approve(checkpoint: "on_error", relaunch: "jobName") spawns a new agent in the existing worktree with a correction prompt containing the original task, violation list, and allowed patterns
- mc_plan_approve(checkpoint: "on_error", retry: "jobName") now re-validates touchSet before proceeding, fixing the bug where retry silently skipped validation
- on_error checkpoints (touchset, merge conflict, test failure, job failure) now store structured context
Files modified
- src/lib/plan-types.ts: CheckpointContext, FailureKind types; state machine transitions
- src/lib/schemas.ts: CheckpointContext on PlanSpec
- src/lib/orchestrator.ts: relaunchJobForCorrection method; checkpoint context on all setCheckpoint calls; updated notifications
- src/tools/plan-approve.ts
- tests/tools/plan-approve.test.ts
Testing
- bun run build passes
- bun test passes (600/600)
Notes
- The relaunchJobForCorrection method reuses the existing worktree and branch. It kills any stale tmux session, writes a correction prompt, and creates a fresh tmux session + job entry. The agent sees the full git history and can selectively revert violating files.
- Previously, retry moved the job to ready_to_merge, which bypassed validation entirely (since validation only runs on completed jobs in the reconciler).
- retry and relaunch are mutually exclusive. retry = "I fixed it manually, re-check." relaunch = "spawn an agent to fix it."
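The mutual-exclusion rule can be sketched as a small argument resolver. The retry and relaunch parameter names come from this PR; ApproveArgs, Action, resolveAction, and the error message are hypothetical and only illustrate how the three choices could be told apart.

```typescript
// Hypothetical argument shape for the on_error checkpoint of mc_plan_approve.
interface ApproveArgs { retry?: string; relaunch?: string }

type Action =
  | { kind: "accept" }
  | { kind: "retry"; jobName: string }
  | { kind: "relaunch"; jobName: string };

// Maps the optional parameters onto exactly one action, rejecting ambiguous input.
function resolveAction(args: ApproveArgs): Action {
  if (args.retry !== undefined && args.relaunch !== undefined) {
    throw new Error("retry and relaunch are mutually exclusive");
  }
  if (args.relaunch !== undefined) return { kind: "relaunch", jobName: args.relaunch };
  if (args.retry !== undefined) return { kind: "retry", jobName: args.retry };
  return { kind: "accept" }; // neither parameter given
}
```

Modelling the result as a discriminated union means callers must handle each of the three cases explicitly.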