afx workspace recover: revives builders that were previously cleaned up via afx cleanup #915

@amrmelsayed

What's broken

afx workspace recover resurrects builders that were previously cleaned up via afx cleanup -p <id>. The cleanup left the worktree, branch, and porch state on disk (the documented default behavior, which preserves the user's scratch work), and workspace recover then treats that preserved state as "a builder whose shellper died and should be revived."

The two operations have contradictory views of the same on-disk state:

  • afx cleanup leaves the worktree, branch, and porch state on disk as a deliberate post-cleanup snapshot, and its output tells the user how to finish the job ("To remove: git worktree remove ...").
  • afx workspace recover sees the same files as evidence of a crashed builder and re-spawns the Tower terminal.

Net effect: the user ran workspace recover to revive a different set of crashed builders, and a project that had already been cleaned up was swept in as collateral.

Concrete incident (cluesmith/shannon, 2026-05-27)

  • 2026-05-25 — PIR builder for shannon issue #1778 ("infra(native): wire pnpm -F native test into Turbo + CI") investigated and determined no PR was needed (CI already runs native tests via the existing apps/* sweep). Builder closed the GitHub issue without a PR and emitted the "ready for cleanup" notification.
  • Architect ran afx cleanup -p 1778. Output included the standard "Worktree preserved at: /Users/.../.builders/pir-1778" + "Branch preserved: builder/pir-1778".
  • 2026-05-27 — user ran afx workspace recover to revive a different set of builders that had been killed by a Tower restart.
  • Side effect: builder-pir-1778 re-appeared in afx status with the worktree at its preserved post-cleanup commit. Porch phase: implement. The issue is still CLOSED on GitHub.

The builder then began making forward progress on the reopened porch state, producing commits that contradict the closed-issue disposition (21ed8bd75 adds a turbo.json change, e049c6bd9 reverts a ci.yml change).

Expected behavior

workspace recover should NOT revive a builder that was intentionally cleaned up. Specific signals it could check:

  1. A cleanup marker file dropped by afx cleanup (e.g., .agent-farm/cleaned-up, or a status.yaml field like cleaned_up_at: <ts>). This is the most direct fix: it gives cleanup and recover a shared piece of state that lets recover see cleanup's intent.
  2. GitHub issue state. If the linked issue is CLOSED (and no open PR references it), skip recovery. The linked issue is already known per project from existing porch state.
  3. Tower's own record of how the terminal exited. If cleanup killed the terminal cleanly (vs. a crash / SIGKILL from a reboot), recover should treat that as intentional.
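A minimal TypeScript sketch of how recover might combine the three signals, checked in the order listed. The candidate shape, its field names, and `decideRecovery` are hypothetical, not existing afx code; gathering the inputs (reading the marker, querying GitHub, asking Tower) is left out.

```typescript
// Hypothetical view of one preserved builder, as recover might assemble it.
// Field names are illustrative; they are not taken from afx.
interface RecoverCandidate {
  projectId: string;
  cleanupMarkerPresent: boolean; // signal 1: marker written by afx cleanup
  linkedIssueState?: "OPEN" | "CLOSED"; // signal 2: state of the linked GitHub issue
  hasOpenPr: boolean; // signal 2: an open PR references the issue
  lastExit?: "clean" | "crash" | "unknown"; // signal 3: Tower's record of the exit
}

interface RecoveryDecision {
  recover: boolean;
  reason: string;
}

// Returns on the first signal that indicates intentional cleanup; the builder
// is revived only when none of the three signals fire.
function decideRecovery(c: RecoverCandidate): RecoveryDecision {
  if (c.cleanupMarkerPresent) {
    return { recover: false, reason: "cleanup marker present" };
  }
  if (c.linkedIssueState === "CLOSED" && !c.hasOpenPr) {
    return { recover: false, reason: "linked issue closed with no open PR" };
  }
  if (c.lastExit === "clean") {
    return { recover: false, reason: "terminal exited cleanly" };
  }
  return { recover: true, reason: "no evidence of intentional cleanup" };
}

// The pir-1778 incident as it stood: no marker yet, issue CLOSED, no open PR.
const incident = decideRecovery({
  projectId: "1778",
  cleanupMarkerPresent: false,
  linkedIssueState: "CLOSED",
  hasOpenPr: false,
  lastExit: "unknown",
});
console.log(incident.recover, incident.reason); // false linked issue closed with no open PR
```

Even without a marker, signal (2) alone would have kept pir-1778 down in this incident; the marker matters for projects whose issue is still open when they are cleaned up.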

(1) is the cleanest because it works without depending on GitHub state OR Tower's process bookkeeping.
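A sketch of signal (1) as a JSON marker file, assuming it lives at `.agent-farm/cleaned-up` inside the preserved worktree (the example path from above; the status.yaml variant would follow the same pattern). `writeCleanupMarker` and `readCleanupMarker` are hypothetical helper names, and the JSON layout is an assumption.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed location, inside the preserved worktree. The path is the example
// from signal (1); a status.yaml field would follow the same pattern.
const MARKER_RELATIVE_PATH = path.join(".agent-farm", "cleaned-up");

interface CleanupMarker {
  project_id: string;
  cleaned_up_at: string; // ISO-8601 timestamp
}

// Hypothetical helper that afx cleanup would call after stopping the builder's
// terminal and before printing "Worktree preserved at: ...".
function writeCleanupMarker(worktreeDir: string, projectId: string, now: Date = new Date()): string {
  const markerPath = path.join(worktreeDir, MARKER_RELATIVE_PATH);
  fs.mkdirSync(path.dirname(markerPath), { recursive: true });
  const marker: CleanupMarker = { project_id: projectId, cleaned_up_at: now.toISOString() };
  fs.writeFileSync(markerPath, JSON.stringify(marker, null, 2) + "\n");
  return markerPath;
}

// Hypothetical helper for afx workspace recover: returns the marker if the
// worktree was intentionally cleaned up, otherwise null (still a candidate).
function readCleanupMarker(worktreeDir: string): CleanupMarker | null {
  const markerPath = path.join(worktreeDir, MARKER_RELATIVE_PATH);
  if (!fs.existsSync(markerPath)) {
    return null;
  }
  return JSON.parse(fs.readFileSync(markerPath, "utf8")) as CleanupMarker;
}
```

One tradeoff to check: an untracked marker inside the worktree can make `git worktree remove` refuse without `--force`, so it may be better gitignored or kept outside the worktree (e.g., alongside Tower's own state).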

Workaround

For now, after every workspace recover run, the user has to manually inspect afx status for builders that shouldn't be there and re-cleanup them. Each false-positive revival costs ~5 minutes of investigation + cleanup, plus the risk of accidentally landing the wrong work.

Severity

Medium. It doesn't lose user data, but it breaks the assumption that afx cleanup is a terminal operation: recover silently undoes it. The exposure grows as builders accumulate, since every preserved worktree is a future false-positive candidate.

Related

  • afx workspace recover --max-age defaults to 7 days; the 2026-05-25 → 2026-05-27 gap was within that window. Tuning the default wouldn't help, because an age threshold can't tell a cleaned-up builder from a crashed one: a builder cleaned up an hour ago looks just as recent as one killed an hour ago by a Tower restart.
  • Sibling design point: afx cleanup already prints "To remove: git worktree remove ..." / "To delete: git branch -d ..." as a reminder of follow-up steps. If we expect the user to run those manually, recover could at least filter out worktrees whose branches were deleted. In practice the branch usually isn't deleted either, so this isn't a robust signal.
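To illustrate why an age threshold alone can't fix this, a small sketch with made-up timestamps: one builder killed by a Tower restart three hours ago and one cleaned up an hour ago. `selectByAge` is a hypothetical stand-in for `--max-age`, assumed to keep anything active within the last N days; the real option may measure age differently.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical record of a preserved builder. `cleanedUp` is the fact an
// age-only filter never looks at.
interface PreservedBuilder {
  id: string;
  lastActive: Date;
  cleanedUp: boolean;
}

// Age-only selection in the spirit of --max-age <days>.
function selectByAge(builders: PreservedBuilder[], now: Date, maxAgeDays: number): string[] {
  return builders
    .filter((b) => now.getTime() - b.lastActive.getTime() <= maxAgeDays * DAY_MS)
    .map((b) => b.id);
}

const now = new Date("2026-05-27T12:00:00Z");
const builders: PreservedBuilder[] = [
  // Killed by a Tower restart at 09:00: should be revived.
  { id: "crashed-a", lastActive: new Date("2026-05-27T09:00:00Z"), cleanedUp: false },
  // Cleaned up at 11:00, after the restart: should NOT be revived.
  { id: "cleaned-b", lastActive: new Date("2026-05-27T11:00:00Z"), cleanedUp: true },
];

for (const days of [1, 7, 30]) {
  console.log(days, selectByAge(builders, now, days)); // each threshold keeps both ids
}
```

Any threshold long enough to keep `crashed-a` also keeps `cleaned-b`, because here the cleaned-up builder is the more recent of the two.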

Acceptance

  • afx cleanup -p <id> writes a cleanup marker (file or status.yaml field)
  • afx workspace recover reads the marker and skips marked projects
  • workspace recover --include-stale or a new flag can override the skip if the user really wants to revive a cleaned-up project
  • Test: cleanup → recover round-trip leaves Tower state unchanged
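A minimal in-memory model of the acceptance criteria, roughly the shape the round-trip test could take. Everything here is hypothetical: Tower is modeled as a set of live terminal ids, the disk as a map of preserved worktrees, and `includeCleanedUp` stands in for `--include-stale` or whatever new flag is chosen.

```typescript
// Hypothetical model; no real afx or Tower APIs are called.
interface Worktree {
  cleanupMarker: boolean;
}

interface World {
  towerTerminals: Set<string>; // live builder terminals, keyed by project id
  worktrees: Map<string, Worktree>; // worktrees preserved on disk
}

interface RecoverOptions {
  includeCleanedUp?: boolean; // stand-in for --include-stale or a new flag
}

// afx cleanup -p <id>: stop the terminal, keep the worktree, write the marker.
function cleanup(world: World, projectId: string): void {
  world.towerTerminals.delete(projectId);
  const wt = world.worktrees.get(projectId);
  if (wt) {
    wt.cleanupMarker = true;
  }
}

// afx workspace recover: revive every preserved worktree without a live
// terminal, skipping marked ones unless the override is set.
function recover(world: World, opts: RecoverOptions = {}): string[] {
  const revived: string[] = [];
  for (const [id, wt] of world.worktrees) {
    if (world.towerTerminals.has(id)) continue;
    if (wt.cleanupMarker && !opts.includeCleanedUp) continue;
    world.towerTerminals.add(id);
    revived.push(id);
  }
  return revived;
}
```

Round trip: starting with `pir-1778` live, `cleanup(world, "pir-1778")` followed by `recover(world)` returns `[]` and leaves `towerTerminals` exactly as cleanup left it, while `recover(world, { includeCleanedUp: true })` revives it.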
