
fix(integrity): roll back legacy packages on failed reinstall (InstallStash)#137

Merged: Sunrisepeak merged 4 commits into main from fix/install-integrity-rollback on Jun 21, 2026

@Sunrisepeak

Problem (Issue 2: mcpp run → llvm clang++ exit 127)

Switching to the llvm toolchain produced:

'... clang++ --version' exited with status 127
# real stderr:
clang++: error while loading shared libraries: libz.so.1: cannot open shared object file

clang is fine — its libz.so.1 comes (via RUNPATH) from xim-x-zlib, whose
version directory had been deleted and never restored. Root cause: a
content-complete legacy package (installed before the .mcpp_ok marker
system, so unmarked) is deleted outright by clean_incomplete_install() on the
resolve path before a reinstall. When that reinstall then fails (xlings
extracting to $XLINGS_HOME instead of the sandbox, network error, …), the
working package is gone for good — and mcpp toolchain install llvm doesn't
help because it only reinstalls llvm itself, not the orphaned zlib sibling.
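
The failure mode described above can be reproduced in miniature. This is a sketch only: `clean_incomplete_install_sketch` and `old_resolve_sketch` are hypothetical stand-ins for the pre-fix behaviour, and the `reinstall` callback is an assumed placeholder for the real xlings install step, which this PR does not show.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <system_error>

namespace fs = std::filesystem;

// Stand-in for the old pre-clean (assumption): any version directory
// without the .mcpp_ok marker is treated as residue and deleted.
inline void clean_incomplete_install_sketch(const fs::path& verdir) {
    std::error_code ec;
    if (!fs::exists(verdir / ".mcpp_ok", ec)) {
        fs::remove_all(verdir, ec);
    }
}

// Stand-in for the old resolve flow (assumption): clean first, then reinstall.
// If reinstall() fails, nothing brings the deleted directory back.
inline bool old_resolve_sketch(const fs::path& verdir,
                               const std::function<bool()>& reinstall) {
    clean_incomplete_install_sketch(verdir);
    return reinstall();
}
```

With a content-complete but unmarked directory and a `reinstall` that returns `false`, the directory no longer exists after the call, which is the state that left clang's RUNPATH pointing at nothing.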

Fix

Add InstallStash, an RAII guard replacing the bare clean on the resolve
path. It renames the no-marker dir aside instead of deleting it, and:

  • commit() on each success path → drops the backup;
  • destructor on any failure path:
    • keeps the new install if verdir now has a marker (reinstall won);
    • else restores a looks_complete_legacy() stash and adopts it
      (writes the marker, so the next resolve trusts it instead of repeating the
      delete-then-fail cycle);
    • else discards genuine half-extracted residue (historical semantics).

Wired into Fetcher::resolve_xpkg_path with stash.commit() before each
success return.
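
The guard described above might look roughly like the following. This is a minimal sketch, not the code merged in this PR: the `.stash` suffix, the marker helpers, `resolve_sketch`, and the body of `looks_complete_legacy()` (here simply "a non-empty directory") are all assumptions made for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Marker helpers. The ".mcpp_ok" name is from the PR; the functions are hypothetical.
inline bool has_marker(const fs::path& verdir) {
    std::error_code ec;
    return fs::exists(verdir / ".mcpp_ok", ec);
}
inline void write_marker(const fs::path& verdir) {
    std::ofstream(verdir / ".mcpp_ok") << "ok\n";
}

// Placeholder heuristic (assumption): the real check is not shown in the PR.
inline bool looks_complete_legacy(const fs::path& dir) {
    std::error_code ec;
    return fs::is_directory(dir, ec) && !fs::is_empty(dir, ec);
}

class InstallStash {
public:
    // Renames an unmarked verdir aside instead of deleting it.
    explicit InstallStash(fs::path verdir) : verdir_(std::move(verdir)) {
        std::error_code ec;
        if (!fs::exists(verdir_, ec) || has_marker(verdir_)) return;  // nothing to stash
        backup_ = verdir_;
        backup_ += ".stash";          // hypothetical backup name
        fs::remove_all(backup_, ec);  // clear any stale backup
        fs::rename(verdir_, backup_, ec);
        stashed_ = !ec;
    }
    InstallStash(const InstallStash&) = delete;
    InstallStash& operator=(const InstallStash&) = delete;

    // Success path: the new install is trusted, so the backup is dropped.
    void commit() noexcept {
        if (!stashed_) return;
        std::error_code ec;
        fs::remove_all(backup_, ec);
        stashed_ = false;
    }

    // Failure path: runs only when commit() was never called.
    ~InstallStash() {
        if (!stashed_) return;
        std::error_code ec;
        if (has_marker(verdir_)) {
            fs::remove_all(backup_, ec);       // reinstall won; keep it
        } else if (looks_complete_legacy(backup_)) {
            fs::remove_all(verdir_, ec);       // drop any partial new install
            fs::rename(backup_, verdir_, ec);  // restore the legacy package
            if (!ec) write_marker(verdir_);    // adopt it for the next resolve
        } else {
            fs::remove_all(backup_, ec);       // genuine residue: discard
        }
    }

private:
    fs::path verdir_;
    fs::path backup_;
    bool stashed_ = false;
};

// Hypothetical wiring: commit() immediately before the success return.
inline bool resolve_sketch(const fs::path& verdir,
                           const std::function<bool()>& reinstall) {
    if (has_marker(verdir)) return true;  // already complete, nothing to do
    InstallStash stash(verdir);
    if (!reinstall()) return false;       // destructor restores or discards
    stash.commit();
    return true;
}
```

Putting the recovery logic in the destructor means every early return or exception in the resolve path takes the failure branch without extra bookkeeping; only the success returns need an explicit `commit()`.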

Tests

New tests/unit/test_install_integrity.cpp — 5 gtest cases, all green locally
(mcpp test: 22 passed / 0 failed):

  • RestoresLegacyPackageOnFailedReinstall
  • CommitDropsBackupAndKeepsNewInstall
  • KeepsNewCompleteInstallWhenUncommitted
  • DiscardsNonLegacyResidueOnFailure
  • NoopWhenAlreadyComplete

Scope

Only the resolve/install path's pre-clean changes from delete→stash. Normal
installs (already-complete dirs return earlier; genuine residue still
discarded) are unchanged. Companion to mcpplibs/mcpp-index#43 (the
compat.xcb codegen fix). Full analysis in
.agents/docs/2026-06-21-xcb-and-install-integrity-cross-repo-fix.md.

Follow-ups (separate PRs, tracked in the doc §3.2): recipe-declared verify
product lists, disabling looks_complete_legacy marking on fresh installs,
and a mcpp self doctor RUNPATH / dangling-symlink check.

…lStash)

A content-complete *legacy* package (no .mcpp_ok marker) was deleted
outright by clean_incomplete_install() on the resolve path before a
reinstall. If the reinstall then failed (xlings extracting to
$XLINGS_HOME, network error, etc.), the working package was gone for
good — e.g. xim-x-zlib vanished, leaving clang's RUNPATH dangling and
`clang++ --version` failing with 'libz.so.1: cannot open shared object'
(exit 127), with no way to recover short of a manual reinstall.

Add InstallStash, an RAII guard that replaces the bare clean on the
resolve path: it renames the no-marker dir aside instead of deleting it,
commits (drops the backup) on each success path, and on any failure path
its destructor:
  - keeps the new install if verdir now has a marker (reinstall won);
  - else restores a looks_complete_legacy() stash and adopts it (writes
    the marker) so the next resolve trusts it instead of repeating the
    delete-then-fail cycle;
  - else discards genuine half-extracted residue (historical semantics).

Wire it into Fetcher::resolve_xpkg_path with stash.commit() before each
success return. Add 5 gtest cases covering restore / commit /
keep-new-complete / discard-residue / noop-when-marked.

Also lands the cross-repo root-cause + implementation doc under
.agents/docs/ (covers this and the compat.xcb codegen fix in mcpp-index).

Three of the stash tests read a payload via std::ifstream and then called
fs::remove_all() on the temp tree while the stream was still in scope, and
therefore still open. POSIX allows unlinking open files; Windows does not, so
remove_all() threw 'being used by another process' and the tests failed only
on the Windows CI leg.

Read through a read_first_line() helper that closes the handle before
returning. Production InstallStash code is unchanged.
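
A helper of this kind could be as small as the sketch below. The name `read_first_line` comes from the commit message; the signature and body are assumptions, not the code from the test file.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

// Reads the first line of a file (assumed signature). The std::ifstream is a
// local, so its destructor closes the handle before the function returns and
// the caller is free to delete the file or its parent directory afterwards.
inline std::string read_first_line(const std::filesystem::path& p) {
    std::ifstream in(p);
    std::string line;
    std::getline(in, line);  // leaves `line` empty if the file cannot be read
    return line;
}
```

Because no stream outlives the call, a following `fs::remove_all()` on the temp tree does not race with an open handle on Windows.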

@Sunrisepeak merged commit 2a67dd4 into main on Jun 21, 2026
3 checks passed