Develop-branch code-review fixes: 9 Critical bugs + cross-cutting cleanup#39
Merged
Develop-branch code-review fixes: 9 Critical bugs + cross-cutting cleanup#39
Conversation
In zsh an `EXIT` trap set inside a function fires when *that* function returns, regardless of `local_traps` in the caller. Previously the trap was set inside `_core_registry_acquire_mkdir_lock`, so the lockdir was released the moment acquire returned — before the caller's jq write ever ran. A concurrent writer could grab the lock during the supposed critical section and clobber the in-flight update. Move the trap into `_core_registry_update_mkdir_fallback` (the function whose lifetime spans the critical section) and eager-expand `\$lockdir` at trap-set time, since by the time the trap actually fires the function's local is already out of scope. Add three bats tests: - acquire leaves the lockdir intact for the caller after returning; - fallback releases the lockdir after a successful update; - fallback releases the lockdir after a jq failure. Verified that the test for the regression fails on the previous implementation and passes after the fix.
`_core_config_write_global` emitted every value as a quoted scalar
regardless of schema type, so `ckipper config set ports 3000,3030,6006`
rewrote the templated `CKIPPER_PORTS=(3000 3030 6006)` array literal as
`CKIPPER_PORTS="3000,3030,6006"`. After the next shell startup,
`CKIPPER_PORTS` was a string, so `for port in "${CKIPPER_PORTS[@]}"` in
`lib/worktree/ports.zsh` iterated once with the whole CSV — silently
breaking port forwarding under Docker mode.
Extract `_core_config_format_line` and pick the on-disk form by schema
type: int_array → `KEY=(a b c)`, everything else → `KEY="value"`.
Teach `_core_config_read_global` to recognise the array form and
return CSV so `ckipper config get ports` round-trips with whatever the
user passed to `set`.
Add four bats tests covering the regression, the round-trip, the
sourcing-side shape, and idempotent rewrite of an existing array
literal in place.
…meref) `local -n _picked_ref="$2"` is invalid in zsh — the `local` builtin errored silently and the subsequent `_picked_ref="$picked"` leaked the value to a literal global named `_picked_ref` instead of the caller's variable. The adopt flow on macOS therefore never honoured the user's Keychain selection: `picked` stayed empty, so the registry recorded the account with no associated Keychain service. Drop the failed nameref pattern. Echo the chosen service to stdout, make the caller capture it: `picked=$(_helper "$name") || return 1`. This matches the rest of the module's style (snapshot, validate, etc. all use stdout-capture). Add four bats tests: stdout shape on valid pick, empty output on skip, empty output when the keychain has no candidates, and the existing "invalid service shape" error path.
`_ckipper_account_remove` proceeded straight into the destructive cleanup helpers — `_ckipper_account_cleanup_dir` runs `rm -rf` on the account's config directory — without ever asserting that no Claude process was alive. A user with Claude open in another terminal who ran `ckipper account remove <name>` would have their live session's config dir removed from underneath the running process. `account rename` already has the guard at `_ckipper_account_rename_check_preconditions:495`. `account remove` is strictly more destructive; mirror the guard right after the "is registered?" check and before any state mutation. Add a bats case that runs `_ckipper_account_remove work` with a stub pgrep reporting a fake process and `CKIPPER_FORCE=0`, and asserts non-zero exit, the registry entry preserved, and the config dir intact.
Both `_ckipper_account_default` and `_ckipper_account_remove` invoked
`_core_registry_update` without checking its exit status, then printed
success ("Default account is now …" / "Unregistered …") regardless of
whether the registry write actually landed. `account_remove` was the
worse of the two: after a silent registry-write failure it still
proceeded to `rm -rf` the config dir and prompt to delete the keychain
entry — based on a registry state that hadn't changed.
Wrap both calls in `if ! _core_registry_update …; then …` and gate the
follow-up cleanup helpers on success. Surface a clear stderr error and
return 1 on failure.
Add bats tests that stub `_core_registry_update` to fail and assert
non-zero exit, an "Error" message on stderr, the absence of the
success line, and (for remove) that the config dir survives.
`_ckipper_account_sync_settings_*` interpolated the dotted item id
straight into a jq filter string (`.cleanup-period-days`), which jq
parsed as `.cleanup - .period - .days` — subtraction. Most Claude
Code settings (cleanup-period-days, max-tokens, …) hit this. Result:
`compare` returned "new" for both source and destination (both jq
calls compile-failed silently), and `apply` aborted with a jq compile
error mid-stream, rolling back the entire sync slice.
A second silent-corruption shape: keys containing literal dots (rare
but legal JSON, e.g. `{"some.key": …}`) were split on `.` into a
multi-element path, so `setpath` planted a nested
`{"some":{"key":null}}` and lost the original flat key — with rc=0.
Fix the contract so the id NEVER goes through a jq filter parser:
- `enumerate` emits the path as a JSON-encoded array
(`["cleanup-period-days"]`, `["permissions","allow"]`); the display
column keeps the dotted form for the preview UI.
- `compare`, `summary`, `diff`, `apply` consume the id via
`--argjson p "$id"` and use `getpath`/`setpath` — never
string-interpolated jq filters, never `split(".")`.
- Drop the dead `_ckipper_account_sync_settings_jq_path` helper.
Update the existing settings tests to pass JSON-array ids. Add three
new tests: hyphenated-key enumerate (id has correct shape), hyphenated
compare (round-trips without jq compile error), hyphenated apply
(value lands without aborting), and dotted-key apply (no nested-object
leak).
Two regex bugs in the bash-guardrails safety hook: 1. The recursive-delete guard required *both* `r` AND `f` in the rm flags (regex alternatives `r[a-zA-Z]*f` or `f[a-zA-Z]*r` or `--recursive`). Plain `rm -r /home/user/important` matched none of them and was silently allowed. With `--dangerously-skip-permissions` this could erase arbitrary user state. Replace with a single alternation that requires *either* an r/R flag or `--recursive`. 2. The git-history guard used `--force\b`. `\b` fires at any word/non-word boundary, including the `-` in `--force-with-lease` — the very replacement the error message tells the user to switch to. Anchor the right side against whitespace-or-end-of-string instead. Add seven new bats cases: rm -r blocked, rm -R blocked, rm --recursive blocked, rm -rf node_modules still allowed, --force-with-lease passes, --force still blocked, -f still blocked.
The schema arrays (`_CKIPPER_SCHEMA_TYPE`, `_DEFAULT`, `_SCOPE`,
`_DESCRIPTION`) are pure data — no functions, no behavior. They were
sourced from `lib/config/schema.zsh` but read directly by foundation
modules (`lib/core/config.zsh`, `lib/core/registry.zsh`), which
violated the dependency-direction rule from
`.claude/rules/file-organization.md`: "Parent directories MUST NOT
import from child route/feature directories."
Move the file to `lib/core/schema.zsh` (still 9 source files in
`lib/core/`, under the 10-file cap) and chase every `lib/config/schema`
reference across:
- source: ckipper.zsh, lib/account/sync/strategies/structured.zsh,
lib/account/sync/registry.zsh, lib/account/doctor.zsh,
lib/setup/{prompts,apply}.zsh, lib/worktree/worktree.zsh,
lib/core/{config,registry}.zsh
- tests: every *_test.bats file that sourced the schema
- docs: README.md, CONTRIBUTING.md, doc-comments
`lib/config/` still owns the user-facing get/set/unset/list/edit
subcommands — only the data file moved. `make lint-merge-guards` and
the full bats suite remain green (493 tests).
…d.zsh The same `typeset -gA _CKIPPER_SYNC_CTX` declaration lived in three sibling modules — engine.zsh, dispatcher.zsh, preview.zsh — each with an elaborate "re-declaration without `=()` is a no-op" comment justifying the duplication. Residue of the /simplify pass that left the parameter-object refactor half-applied: a single `=()` slip in any of the three would silently reset state set by the others. Move the canonical declaration into `_shared.zsh` (sourced first in the production load order) and drop the three redundant copies. The three module test files (engine_test.bats, dispatcher_test.bats, preview_test.bats) now also source `_shared.zsh` so the declaration is in scope when each test exercises its module in isolation. Full bats suite remains green (493 tests).
Bare-named globals like `KEYCHAIN_TIMEOUT_SECONDS`, `REGISTRY_FILE_PERMS`, and the various `LOCK_*` constants leaked into the user's zsh namespace and could collide with random shell vars. The 9894aaf "standardize prompt helpers and module-constant prefixes" commit moved prompt/style/fuzzy/help to the `_CORE_<MODULE>_*` convention; keychain, registry, and config were missed. Rename in this commit: KEYCHAIN_TIMEOUT_SECONDS → _CORE_KEYCHAIN_TIMEOUT_SECONDS REGISTRY_FILE_PERMS → _CORE_REGISTRY_FILE_PERMS LOCK_NOTIFY_THRESHOLD_ATTEMPTS → _CORE_REGISTRY_LOCK_NOTIFY_THRESHOLD_ATTEMPTS LOCK_MAX_ATTEMPTS → _CORE_REGISTRY_LOCK_MAX_ATTEMPTS STALE_LOCK_AGE_THRESHOLD_SECONDS → _CORE_REGISTRY_STALE_LOCK_AGE_THRESHOLD_SECONDS LOCK_RETRY_INTERVAL_SECONDS → _CORE_REGISTRY_LOCK_RETRY_INTERVAL_SECONDS _CKIPPER_GLOBAL_PREFIX → _CORE_CONFIG_GLOBAL_PREFIX Definitions and references updated. Test files included. Bats suite remains green (493 tests).
… prefix Same pattern as the previous core/ rename: bare-named globals leaked into the user's namespace and could collide with arbitrary shell vars. Rename: SHASUM_BITS → _CKIPPER_WT_SHASUM_BITS CLAUDE_CONTAINER_UID → _CKIPPER_WT_CLAUDE_CONTAINER_UID CLAUDE_CONTAINER_GID → _CKIPPER_WT_CLAUDE_CONTAINER_GID CREDS_TMPFS_MODE → _CKIPPER_WT_CREDS_TMPFS_MODE CREDS_TMPFS_SIZE → _CKIPPER_WT_CREDS_TMPFS_SIZE DOCKER_GROUP_ADD_HOST_ROOT → _CKIPPER_WT_DOCKER_GROUP_ADD_HOST_ROOT MAX_PORT_FALLBACK_ATTEMPTS → _CKIPPER_WT_MAX_PORT_FALLBACK_ATTEMPTS The `_CKIPPER_WT_` prefix matches the existing convention for worktree-module shared state (`CKIPPER_WT_DOCKER_ARGS`, `CKIPPER_WT_FLAG_DOCKER`, etc.); the leading underscore distinguishes "private constant" from "module-shared state". Bats remains green (493).
…nd loops
Two shell-style fixes bundled because they overlap in the worktree files:
1. Boolean comparison style: `[[ "$X" = true ]]` → `[[ "$X" = "true" ]]`.
The unquoted form is non-canonical per `.claude/rules/shell-conventions.md`
("test with `[[ \"$x\" = \"true\" ]]`"). Five sites in
run.zsh/docker-mode.zsh/worktree.zsh.
2. `for x in $(find ...)` loops silently lose paths containing spaces /
tabs / newlines because the unquoted command substitution undergoes
word-splitting on $IFS. Convert to null-delimited iteration:
while IFS= read -r -d '' x; do ... done < <(find ... -print0)
Four sites: ckipper.zsh:304/317 (completion), launcher/menu.zsh:96
(project enumeration), worktree.zsh:331 (.env file copy on create).
Also rename `CKIPPER_WT_FIND_MAX_DEPTH` → `_CKIPPER_WT_FIND_MAX_DEPTH`
(touched here anyway by the find-loop edit; the leading underscore
matches the module-private-constant convention used elsewhere in
worktree/).
Bats remains green (493 tests).
The contract in ckipper.zsh:191-193 ("Run \`ckipper <namespace>
<subcommand> --help\` for per-subcommand details") held for the
account and worktree dispatchers but not for config — running
\`ckipper config get --help\` returned "Unknown flag: '--help'"
because get/set/unset/list/edit forwarded the flag straight to the
handler.
Mirror the account/worktree pattern: short-circuit `<subcmd> --help`
to `_ckipper_config_help` before dispatching the handler. Five new
bats cases — one per subcommand — assert the exit-zero + overview
text. Other dispatchers (account, worktree, setup, run) already
comply; audited as part of this change.
`bash-guardrails.sh:78` blocked only `.git/info/attributes` and
`.git/info/exclude`; `protect-claude-config.sh:47` blocked all of
`.git/info/`. The two hooks gated the same threat (a container-side
edit landing on a host-side `.git/` file that executes on the next git
invocation) but disagreed on the surface — Slice 5 of the develop
review flagged this as the load-bearing inconsistency.
Align the inner subpath alternation:
.git/(config|info/|hooks/|worktrees/)
The leading anchor still differs (Bash sees command strings,
Edit/Write sees realpath-resolved file paths), but the protected-path
list is identical now. Cross-reference comments added to both files so
future drift is harder to introduce silently.
Three new bats cases: writes anywhere under `.git/info/` are blocked,
writes to `.git/info/attributes` are blocked, reads of `.git/info/`
files still pass through.
Two standard supply-chain hardening steps for the workflow: 1. Top-level `permissions: contents: read`. The default GITHUB_TOKEN on `push` carries write access to the repo; with no opt-in permissions block any action invoked here would inherit that. The `make lint` / `make test-unit` job only needs to read source — make that explicit so a future regression that pulls in a malicious action can't escalate silently. 2. `timeout-minutes: 15` on the lint-and-test job. A hung step (network stall in `pip install`, an infinite loop in a test) currently runs for 6 hours before GitHub kills it. 15 min is generous: current CI is well under 5 min including brew install + pip install + the full bats + ruff + pytest passes. SHA-pinning `actions/checkout@v4` to a commit hash is the third standard hardening but I'm leaving that as a follow-up so we can pick a known-good SHA from the actions/checkout release page rather than guessing.
Three small follow-ups from the develop-branch review: 1. `_ckipper_account_list_row` took 4 positional args (.claude/rules 3-param cap). Move `default` into a module-level `_CKIPPER_ACCOUNT_LIST_DEFAULT` global that `_ckipper_account_list` populates once per invocation; the row helper drops to 3 args + 1 documented context read. Same pattern already in use elsewhere in this module (_CKIPPER_FINALIZE_CTX, _CKIPPER_RENAME_CTX). 2. `lib/worktree/build-image.zsh:10` and `lib/worktree/ports.zsh:41` wrote error/warning text to stdout. Per the project checklist, error paths surface to monitoring — move both to stderr (`>&2`). 3. `lib/worktree/normal-mode.zsh` `cd` calls were unguarded. If the worktree path went missing between create and run, the command silently executed in the caller's directory (and the trailing restoring `cd "$old_pwd"` likewise). Add an upfront existence check, gate each `cd` with `|| return 1` (or `|| true` for the best-effort restore), and surface a stderr error when the path is missing. Bats remains green (501).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A 5-reviewer team audit of the
developbranch surfaced 9 Critical bugs (data-loss, security, broken happy paths) plus cross-cutting cleanup. This PR fixes everything in three phases. All fixes ship with regression tests; the suite goes from 467 → 501 passing.Phase 1 — Critical bug fixes (9)
Each commit has a bug-then-fix narrative; the failing test was written first and verified to fail on the previous implementation.
fix(core/registry)— macOS registry lock was non-functional. TheEXITtrap inside_core_registry_acquire_mkdir_lockfired when that function returned (zsh function-trap semantics), so the lockdir was released before the caller's jq write. Two concurrent writers could clobber each other. Trap moved into_core_registry_update_mkdir_fallbackand eager-expanded at trap-set time.fix(core/config)—ckipper config set ports 3000,3030,6006corrupted the global config. Writer emitted a quoted scalar regardless ofint_arrayschema type, sofor port in "${CKIPPER_PORTS[@]}"inlib/worktree/ports.zshiterated once with the whole CSV. Type-aware writer now emitsKEY=(a b c)forint_array.fix(account)— keychain picker was silently broken on macOS adopt flow.local -n _picked_ref="$2"is invalid in zsh; the assignment leaked to a global. Switched to stdout-capture pattern.fix(account)—account removecouldrm -rfa live config dir. Mirroredrename's_core_assert_no_running_claudeguard before any state mutation.fix(account)—account defaultandaccount removeprinted success on silent registry-write failure (andremovecontinued to delete the dir + keychain entry). Wrapped both calls inif ! _core_registry_update …and gated cleanup on success.fix(sync/settings)—settings_*strategy interpolated the dotted item id into a jq filter (cleanup-period-daysparsed as.cleanup - .period - .days— subtraction). Hyphenated keys (most Claude Code settings) caused mid-stream rollback; dotted keys silently corrupted destination JSON. Switched to JSON path arrays +--argjson p/getpath/setpaththroughout the 5-verb contract.fix(hooks)—bash-guardrails.shallowedrm -r /home/user/important(regex required bothrandf), and self-blockedgit push --force-with-lease(the recommended replacement the error message itself suggests). Both regexes corrected; seven new bats cases.Phase 2 — Cross-cutting cleanup (7)
refactor(core)— movedschema.zshfromlib/config/tolib/core/(foundation modules read its arrays directly; the prior placement violated the dependency-direction rule).refactor(sync)— consolidated_CKIPPER_SYNC_CTXdeclaration into_shared.zsh(was duplicated across engine, dispatcher, and preview modules — half-applied/simplifyresidue).refactor(core)— namespaced bare module constants with_CORE_prefix (KEYCHAIN_TIMEOUT_SECONDS,REGISTRY_FILE_PERMS,LOCK_*,_CKIPPER_GLOBAL_PREFIX).refactor(worktree)— namespaced bare module constants with_CKIPPER_WT_prefix (SHASUM_BITS,MAX_PORT_FALLBACK_ATTEMPTS, etc.).refactor(worktree)— boolean style standardized to[[ "$X" = "true" ]]and unquotedfor x in $(find …)loops converted towhile IFS= read -r -d ''null-delimited iteration (4 sites).fix(config)— namespace dispatcher honors<subcommand> --help(was returning "Unknown flag: '--help'"). Audited the other dispatchers; account/worktree/setup/run already complied.fix(hooks)— unified the.git/info/path-protection alternation betweenbash-guardrails.shandprotect-claude-config.sh(Bash blocked only attributes/exclude; Edit/Write blocked all of info/).chore(ci)— addedpermissions: contents: readandtimeout-minutes: 15to the workflow.Phase 3 — Polish
_ckipper_account_list_rowreduced from 4 to 3 positional args (under the cap) by readingdefaultfrom a module-level context global.lib/worktree/build-image.zsh:10andlib/worktree/ports.zsh:41now write errors/warnings to stderr.lib/worktree/normal-mode.zshcdcalls guarded; missing-worktree-path surfaces a stderr error.Deferred to follow-up PRs
actions/checkout@v4(left as@v4— needs a known-good SHA picked from the action's release page).gh gist create,teebypass on read-passthrough).set -uo pipefailoninstall.sh/docker/entrypoint.sh/docker/init-firewall.sh— surfaces latent unset-var issues that need case-by-case judgment; cleanest as a focused PR.2>/dev/null || truepatterns) — open-ended; defer.Test plan
make test-unit— bats portion)make lintshell parts pass (shellcheck,shfmt,zsh -n)make lint-merge-guards(function-prefix discipline) passesruffandpytestpass in CI (not installed locally; CI onmacos-latestruns them)