Skip to content

Develop-branch code-review fixes: 9 Critical bugs + cross-cutting cleanup#39

Merged
mswdev merged 16 commits intodevelopfrom
feature/develop-review-fixes
May 4, 2026
Merged

Develop-branch code-review fixes: 9 Critical bugs + cross-cutting cleanup#39
mswdev merged 16 commits intodevelopfrom
feature/develop-review-fixes

Conversation

@mswdev
Copy link
Copy Markdown
Owner

@mswdev mswdev commented May 4, 2026

Summary

A 5-reviewer team audit of the develop branch surfaced 9 Critical bugs (data-loss, security, broken happy paths) plus cross-cutting cleanup. This PR fixes everything in three phases. All fixes ship with regression tests; the suite goes from 467 → 501 passing.

Phase 1 — Critical bug fixes (9)

Each commit has a bug-then-fix narrative; the failing test was written first and verified to fail on the previous implementation.

  • fix(core/registry) — macOS registry lock was non-functional. The EXIT trap inside _core_registry_acquire_mkdir_lock fired when that function returned (zsh function-trap semantics), so the lockdir was released before the caller's jq write. Two concurrent writers could clobber each other. Trap moved into _core_registry_update_mkdir_fallback and eager-expanded at trap-set time.
  • fix(core/config)ckipper config set ports 3000,3030,6006 corrupted the global config. Writer emitted a quoted scalar regardless of int_array schema type, so for port in "${CKIPPER_PORTS[@]}" in lib/worktree/ports.zsh iterated once with the whole CSV. Type-aware writer now emits KEY=(a b c) for int_array.
  • fix(account) — keychain picker was silently broken on macOS adopt flow. local -n _picked_ref="$2" is invalid in zsh; the assignment leaked to a global. Switched to stdout-capture pattern.
  • fix(account)account remove could rm -rf a live config dir. Mirrored rename's _core_assert_no_running_claude guard before any state mutation.
  • fix(account)account default and account remove printed success on silent registry-write failure (and remove continued to delete the dir + keychain entry). Wrapped both calls in if ! _core_registry_update … and gated cleanup on success.
  • fix(sync/settings)settings_* strategy interpolated the dotted item id into a jq filter (cleanup-period-days parsed as .cleanup - .period - .days — subtraction). Hyphenated keys (most Claude Code settings) caused mid-stream rollback; dotted keys silently corrupted destination JSON. Switched to JSON path arrays + --argjson p / getpath / setpath throughout the 5-verb contract.
  • fix(hooks)bash-guardrails.sh allowed rm -r /home/user/important (regex required both r and f), and self-blocked git push --force-with-lease (the recommended replacement the error message itself suggests). Both regexes corrected; seven new bats cases.

Phase 2 — Cross-cutting cleanup (7)

  • refactor(core) — moved schema.zsh from lib/config/ to lib/core/ (foundation modules read its arrays directly; the prior placement violated the dependency-direction rule).
  • refactor(sync) — consolidated _CKIPPER_SYNC_CTX declaration into _shared.zsh (was duplicated across engine, dispatcher, and preview modules — half-applied /simplify residue).
  • refactor(core) — namespaced bare module constants with _CORE_ prefix (KEYCHAIN_TIMEOUT_SECONDS, REGISTRY_FILE_PERMS, LOCK_*, _CKIPPER_GLOBAL_PREFIX).
  • refactor(worktree) — namespaced bare module constants with _CKIPPER_WT_ prefix (SHASUM_BITS, MAX_PORT_FALLBACK_ATTEMPTS, etc.).
  • refactor(worktree) — boolean style standardized to [[ "$X" = "true" ]] and unquoted for x in $(find …) loops converted to while IFS= read -r -d '' null-delimited iteration (4 sites).
  • fix(config) — namespace dispatcher honors <subcommand> --help (was returning "Unknown flag: '--help'"). Audited the other dispatchers; account/worktree/setup/run already complied.
  • fix(hooks) — unified the .git/info/ path-protection alternation between bash-guardrails.sh and protect-claude-config.sh (Bash blocked only attributes/exclude; Edit/Write blocked all of info/).
  • chore(ci) — added permissions: contents: read and timeout-minutes: 15 to the workflow.

Phase 3 — Polish

  • _ckipper_account_list_row reduced from 4 to 3 positional args (under the cap) by reading default from a module-level context global.
  • Stderr discipline: lib/worktree/build-image.zsh:10 and lib/worktree/ports.zsh:41 now write errors/warnings to stderr.
  • lib/worktree/normal-mode.zsh cd calls guarded; missing-worktree-path surfaces a stderr error.

Deferred to follow-up PRs

  • SHA-pin actions/checkout@v4 (left as @v4 — needs a known-good SHA picked from the action's release page).
  • Broader hook adversarial-test coverage (credential-read patterns, gh gist create, tee bypass on read-passthrough).
  • set -uo pipefail on install.sh / docker/entrypoint.sh / docker/init-firewall.sh — surfaces latent unset-var issues that need case-by-case judgment; cleanest as a focused PR.
  • Silent-error-swallow audit (2>/dev/null || true patterns) — open-ended; defer.

Test plan

  • 501 bats tests pass (make test-unit — bats portion)
  • make lint shell parts pass (shellcheck, shfmt, zsh -n)
  • make lint-merge-guards (function-prefix discipline) passes
  • Each Critical fix has a regression test that fails on the prior implementation and passes after the fix
  • ruff and pytest pass in CI (not installed locally; CI on macos-latest runs them)

mswdev added 16 commits May 3, 2026 22:31
In zsh an `EXIT` trap set inside a function fires when *that* function
returns, regardless of `local_traps` in the caller. Previously the trap
was set inside `_core_registry_acquire_mkdir_lock`, so the lockdir was
released the moment acquire returned — before the caller's jq write
ever ran. A concurrent writer could grab the lock during the supposed
critical section and clobber the in-flight update.

Move the trap into `_core_registry_update_mkdir_fallback` (the function
whose lifetime spans the critical section) and eager-expand `\$lockdir`
at trap-set time, since by the time the trap actually fires the
function's local is already out of scope.

Add three bats tests:
- acquire leaves the lockdir intact for the caller after returning;
- fallback releases the lockdir after a successful update;
- fallback releases the lockdir after a jq failure.

Verified that the test for the regression fails on the previous
implementation and passes after the fix.
`_core_config_write_global` emitted every value as a quoted scalar
regardless of schema type, so `ckipper config set ports 3000,3030,6006`
rewrote the templated `CKIPPER_PORTS=(3000 3030 6006)` array literal as
`CKIPPER_PORTS="3000,3030,6006"`. After the next shell startup,
`CKIPPER_PORTS` was a string, so `for port in "${CKIPPER_PORTS[@]}"` in
`lib/worktree/ports.zsh` iterated once with the whole CSV — silently
breaking port forwarding under Docker mode.

Extract `_core_config_format_line` and pick the on-disk form by schema
type: int_array → `KEY=(a b c)`, everything else → `KEY="value"`.
Teach `_core_config_read_global` to recognise the array form and
return CSV so `ckipper config get ports` round-trips with whatever the
user passed to `set`.

Add four bats tests covering the regression, the round-trip, the
sourcing-side shape, and idempotent rewrite of an existing array
literal in place.
…meref)

`local -n _picked_ref="$2"` is invalid in zsh — the `local` builtin
errored silently and the subsequent `_picked_ref="$picked"` leaked the
value to a literal global named `_picked_ref` instead of the caller's
variable. The adopt flow on macOS therefore never honoured the user's
Keychain selection: `picked` stayed empty, so the registry recorded
the account with no associated Keychain service.

Drop the failed nameref pattern. Echo the chosen service to stdout,
make the caller capture it: `picked=$(_helper "$name") || return 1`.
This matches the rest of the module's style (snapshot, validate, etc.
all use stdout-capture).

Add four bats tests: stdout shape on valid pick, empty output on skip,
empty output when the keychain has no candidates, and the existing
"invalid service shape" error path.
`_ckipper_account_remove` proceeded straight into the destructive
cleanup helpers — `_ckipper_account_cleanup_dir` runs `rm -rf` on the
account's config directory — without ever asserting that no Claude
process was alive. A user with Claude open in another terminal who ran
`ckipper account remove <name>` would have their live session's
config dir removed from underneath the running process.

`account rename` already has the guard at
`_ckipper_account_rename_check_preconditions:495`. `account remove` is
strictly more destructive; mirror the guard right after the
"is registered?" check and before any state mutation.

Add a bats case that runs `_ckipper_account_remove work` with a stub
pgrep reporting a fake process and `CKIPPER_FORCE=0`, and asserts
non-zero exit, the registry entry preserved, and the config dir intact.
Both `_ckipper_account_default` and `_ckipper_account_remove` invoked
`_core_registry_update` without checking its exit status, then printed
success ("Default account is now …" / "Unregistered …") regardless of
whether the registry write actually landed. `account_remove` was the
worse of the two: after a silent registry-write failure it still
proceeded to `rm -rf` the config dir and prompt to delete the keychain
entry — based on a registry state that hadn't changed.

Wrap both calls in `if ! _core_registry_update …; then …` and gate the
follow-up cleanup helpers on success. Surface a clear stderr error and
return 1 on failure.

Add bats tests that stub `_core_registry_update` to fail and assert
non-zero exit, an "Error" message on stderr, the absence of the
success line, and (for remove) that the config dir survives.
`_ckipper_account_sync_settings_*` interpolated the dotted item id
straight into a jq filter string (`.cleanup-period-days`), which jq
parsed as `.cleanup - .period - .days` — subtraction. Most Claude
Code settings (cleanup-period-days, max-tokens, …) hit this. Result:
`compare` returned "new" for both source and destination (both jq
calls compile-failed silently), and `apply` aborted with a jq compile
error mid-stream, rolling back the entire sync slice.

A second silent-corruption shape: keys containing literal dots (rare
but legal JSON, e.g. `{"some.key": …}`) were split on `.` into a
multi-element path, so `setpath` planted a nested
`{"some":{"key":null}}` and lost the original flat key — with rc=0.

Fix the contract so the id NEVER goes through a jq filter parser:
- `enumerate` emits the path as a JSON-encoded array
  (`["cleanup-period-days"]`, `["permissions","allow"]`); the display
  column keeps the dotted form for the preview UI.
- `compare`, `summary`, `diff`, `apply` consume the id via
  `--argjson p "$id"` and use `getpath`/`setpath` — never
  string-interpolated jq filters, never `split(".")`.
- Drop the dead `_ckipper_account_sync_settings_jq_path` helper.

Update the existing settings tests to pass JSON-array ids. Add three
new tests: hyphenated-key enumerate (id has correct shape), hyphenated
compare (round-trips without jq compile error), hyphenated apply
(value lands without aborting), and dotted-key apply (no nested-object
leak).
Two regex bugs in the bash-guardrails safety hook:

1. The recursive-delete guard required *both* `r` AND `f` in the rm
   flags (regex alternatives `r[a-zA-Z]*f` or `f[a-zA-Z]*r` or
   `--recursive`). Plain `rm -r /home/user/important` matched none of
   them and was silently allowed. With `--dangerously-skip-permissions`
   this could erase arbitrary user state. Replace with a single
   alternation that requires *either* an r/R flag or `--recursive`.

2. The git-history guard used `--force\b`. `\b` fires at any
   word/non-word boundary, including the `-` in `--force-with-lease` —
   the very replacement the error message tells the user to switch to.
   Anchor the right side against whitespace-or-end-of-string instead.

Add seven new bats cases: rm -r blocked, rm -R blocked,
rm --recursive blocked, rm -rf node_modules still allowed,
--force-with-lease passes, --force still blocked, -f still blocked.
The schema arrays (`_CKIPPER_SCHEMA_TYPE`, `_DEFAULT`, `_SCOPE`,
`_DESCRIPTION`) are pure data — no functions, no behavior. They were
sourced from `lib/config/schema.zsh` but read directly by foundation
modules (`lib/core/config.zsh`, `lib/core/registry.zsh`), which
violated the dependency-direction rule from
`.claude/rules/file-organization.md`: "Parent directories MUST NOT
import from child route/feature directories."

Move the file to `lib/core/schema.zsh` (still 9 source files in
`lib/core/`, under the 10-file cap) and chase every `lib/config/schema`
reference across:
  - source: ckipper.zsh, lib/account/sync/strategies/structured.zsh,
    lib/account/sync/registry.zsh, lib/account/doctor.zsh,
    lib/setup/{prompts,apply}.zsh, lib/worktree/worktree.zsh,
    lib/core/{config,registry}.zsh
  - tests: every *_test.bats file that sourced the schema
  - docs: README.md, CONTRIBUTING.md, doc-comments

`lib/config/` still owns the user-facing get/set/unset/list/edit
subcommands — only the data file moved. `make lint-merge-guards` and
the full bats suite remain green (493 tests).
…d.zsh

The same `typeset -gA _CKIPPER_SYNC_CTX` declaration lived in three
sibling modules — engine.zsh, dispatcher.zsh, preview.zsh — each with
an elaborate "re-declaration without `=()` is a no-op" comment
justifying the duplication. Residue of the /simplify pass that left
the parameter-object refactor half-applied: a single `=()` slip in any
of the three would silently reset state set by the others.

Move the canonical declaration into `_shared.zsh` (sourced first in
the production load order) and drop the three redundant copies. The
three module test files (engine_test.bats, dispatcher_test.bats,
preview_test.bats) now also source `_shared.zsh` so the declaration
is in scope when each test exercises its module in isolation.

Full bats suite remains green (493 tests).
Bare-named globals like `KEYCHAIN_TIMEOUT_SECONDS`, `REGISTRY_FILE_PERMS`,
and the various `LOCK_*` constants leaked into the user's zsh
namespace and could collide with random shell vars. The 9894aaf
"standardize prompt helpers and module-constant prefixes" commit moved
prompt/style/fuzzy/help to the `_CORE_<MODULE>_*` convention; keychain,
registry, and config were missed.

Rename in this commit:
  KEYCHAIN_TIMEOUT_SECONDS         → _CORE_KEYCHAIN_TIMEOUT_SECONDS
  REGISTRY_FILE_PERMS              → _CORE_REGISTRY_FILE_PERMS
  LOCK_NOTIFY_THRESHOLD_ATTEMPTS   → _CORE_REGISTRY_LOCK_NOTIFY_THRESHOLD_ATTEMPTS
  LOCK_MAX_ATTEMPTS                → _CORE_REGISTRY_LOCK_MAX_ATTEMPTS
  STALE_LOCK_AGE_THRESHOLD_SECONDS → _CORE_REGISTRY_STALE_LOCK_AGE_THRESHOLD_SECONDS
  LOCK_RETRY_INTERVAL_SECONDS      → _CORE_REGISTRY_LOCK_RETRY_INTERVAL_SECONDS
  _CKIPPER_GLOBAL_PREFIX           → _CORE_CONFIG_GLOBAL_PREFIX

Definitions and references updated. Test files included. Bats suite
remains green (493 tests).
… prefix

Same pattern as the previous core/ rename: bare-named globals leaked
into the user's namespace and could collide with arbitrary shell vars.

Rename:
  SHASUM_BITS                → _CKIPPER_WT_SHASUM_BITS
  CLAUDE_CONTAINER_UID       → _CKIPPER_WT_CLAUDE_CONTAINER_UID
  CLAUDE_CONTAINER_GID       → _CKIPPER_WT_CLAUDE_CONTAINER_GID
  CREDS_TMPFS_MODE           → _CKIPPER_WT_CREDS_TMPFS_MODE
  CREDS_TMPFS_SIZE           → _CKIPPER_WT_CREDS_TMPFS_SIZE
  DOCKER_GROUP_ADD_HOST_ROOT → _CKIPPER_WT_DOCKER_GROUP_ADD_HOST_ROOT
  MAX_PORT_FALLBACK_ATTEMPTS → _CKIPPER_WT_MAX_PORT_FALLBACK_ATTEMPTS

The `_CKIPPER_WT_` prefix matches the existing convention for
worktree-module shared state (`CKIPPER_WT_DOCKER_ARGS`,
`CKIPPER_WT_FLAG_DOCKER`, etc.); the leading underscore distinguishes
"private constant" from "module-shared state". Bats remains green (493).
…nd loops

Two shell-style fixes bundled because they overlap in the worktree files:

1. Boolean comparison style: `[[ "$X" = true ]]` → `[[ "$X" = "true" ]]`.
   The unquoted form is non-canonical per `.claude/rules/shell-conventions.md`
   ("test with `[[ \"$x\" = \"true\" ]]`"). Five sites in
   run.zsh/docker-mode.zsh/worktree.zsh.

2. `for x in $(find ...)` loops silently lose paths containing spaces /
   tabs / newlines because the unquoted command substitution undergoes
   word-splitting on $IFS. Convert to null-delimited iteration:
       while IFS= read -r -d '' x; do ... done < <(find ... -print0)
   Four sites: ckipper.zsh:304/317 (completion), launcher/menu.zsh:96
   (project enumeration), worktree.zsh:331 (.env file copy on create).

Also rename `CKIPPER_WT_FIND_MAX_DEPTH` → `_CKIPPER_WT_FIND_MAX_DEPTH`
(touched here anyway by the find-loop edit; the leading underscore
matches the module-private-constant convention used elsewhere in
worktree/).

Bats remains green (493 tests).
The contract in ckipper.zsh:191-193 ("Run \`ckipper <namespace>
<subcommand> --help\` for per-subcommand details") held for the
account and worktree dispatchers but not for config — running
\`ckipper config get --help\` returned "Unknown flag: '--help'"
because get/set/unset/list/edit forwarded the flag straight to the
handler.

Mirror the account/worktree pattern: short-circuit `<subcmd> --help`
to `_ckipper_config_help` before dispatching the handler. Five new
bats cases — one per subcommand — assert the exit-zero + overview
text. Other dispatchers (account, worktree, setup, run) already
comply; audited as part of this change.
`bash-guardrails.sh:78` blocked only `.git/info/attributes` and
`.git/info/exclude`; `protect-claude-config.sh:47` blocked all of
`.git/info/`. The two hooks gated the same threat (a container-side
edit landing on a host-side `.git/` file that executes on the next git
invocation) but disagreed on the surface — Slice 5 of the develop
review flagged this as the load-bearing inconsistency.

Align the inner subpath alternation:
    .git/(config|info/|hooks/|worktrees/)
The leading anchor still differs (Bash sees command strings,
Edit/Write sees realpath-resolved file paths), but the protected-path
list is identical now. Cross-reference comments added to both files so
future drift is harder to introduce silently.

Three new bats cases: writes anywhere under `.git/info/` are blocked,
writes to `.git/info/attributes` are blocked, reads of `.git/info/`
files still pass through.
Two standard supply-chain hardening steps for the workflow:

1. Top-level `permissions: contents: read`. The default GITHUB_TOKEN
   on `push` carries write access to the repo; with no opt-in
   permissions block any action invoked here would inherit that. The
   `make lint` / `make test-unit` job only needs to read source — make
   that explicit so a future regression that pulls in a malicious
   action can't escalate silently.

2. `timeout-minutes: 15` on the lint-and-test job. A hung step (network
   stall in `pip install`, an infinite loop in a test) currently runs
   for 6 hours before GitHub kills it. 15 min is generous: current CI
   is well under 5 min including brew install + pip install + the full
   bats + ruff + pytest passes.

SHA-pinning `actions/checkout@v4` to a commit hash is the third
standard hardening but I'm leaving that as a follow-up so we can pick
a known-good SHA from the actions/checkout release page rather than
guessing.
Three small follow-ups from the develop-branch review:

1. `_ckipper_account_list_row` took 4 positional args (.claude/rules
   3-param cap). Move `default` into a module-level
   `_CKIPPER_ACCOUNT_LIST_DEFAULT` global that
   `_ckipper_account_list` populates once per invocation; the row
   helper drops to 3 args + 1 documented context read. Same pattern
   already in use elsewhere in this module (_CKIPPER_FINALIZE_CTX,
   _CKIPPER_RENAME_CTX).

2. `lib/worktree/build-image.zsh:10` and `lib/worktree/ports.zsh:41`
   wrote error/warning text to stdout. Per the project checklist,
   error paths surface to monitoring — move both to stderr (`>&2`).

3. `lib/worktree/normal-mode.zsh` `cd` calls were unguarded. If the
   worktree path went missing between create and run, the command
   silently executed in the caller's directory (and the trailing
   restoring `cd "$old_pwd"` likewise). Add an upfront existence check,
   gate each `cd` with `|| return 1` (or `|| true` for the best-effort
   restore), and surface a stderr error when the path is missing.

Bats remains green (501).
@mswdev mswdev marked this pull request as ready for review May 4, 2026 06:06
@mswdev mswdev merged commit dd797ca into develop May 4, 2026
1 check passed
@mswdev mswdev deleted the feature/develop-review-fixes branch May 4, 2026 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant