Skip to content

Audit fixes: rename corruption, retry cancellation, privacy gaps, build/CI hardening#1071

Draft
r3dbars wants to merge 13 commits into
mainfrom
claude/heuristic-wing-fc0954
Draft

Audit fixes: rename corruption, retry cancellation, privacy gaps, build/CI hardening#1071
r3dbars wants to merge 13 commits into
mainfrom
claude/heuristic-wing-fc0954

Conversation

@r3dbars

@r3dbars r3dbars commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Why

A deep multi-agent repo audit (8 subsystem maps, 8 audit dimensions, adversarially verified findings) surfaced a cluster of correctness, privacy, licensing, and build-infrastructure issues. This PR executes everything from the audit's task plan that didn't need a product decision: the Milestone 0/1 correctness and privacy fixes, the quick wins, and the high-leverage build/test-infra items.

Product Impact

  • Affects: meetings + dictation + agent artifacts + docs
  • Lane: meeting reliability (plus release ops for the build/CI changes)
  • Why this matters: two user-data bugs (speaker rename could corrupt transcripts of same-named speakers; cancelled retries kept running and could contend with fresh pipelines), one privacy gap (Sentry extras bypassed the diagnostic allowlist), a license compliance hole in the shipped DMG (GPL eSpeak NG with zero attribution), and a build system where the dev and shipped builds had already silently diverged.

What changed

  • Speaker rename/merge is row-targeted in frontmatter, channel-scoped when display names collide, fails closed when truly ambiguous, and scans subfolders (RetroactiveSpeakerUpdater). 3 new tests; 33 existing tests pin common-case behavior unchanged.
  • Retry cancellation: the retry's real work is registered in activeTasks (was a no-op sentinel), so cancelAll() reaches in-flight inference and the single-pipeline guard holds. New regression test can't pass with the sentinel pattern.
  • Local summaries: model runs moved off the Swift-concurrency cooperative pool (dedicated queue + cancellation bridge), stderr drained concurrently (no more 64KB-pipe stall → 15-min timeout), SIGTERM→grace→SIGKILL escalation.
  • Privacy: Sentry extras now pass the same allowedDiagnosticTagKeys filter as tags; capture(error:context:) no longer stores under a key the sanitizer always drops; events.jsonl/reliability.jsonl rotate at 10MB by atomic rename (incl. a rotation-boundary event-loss fix found by an independent review pass).
  • Licensing: THIRD_PARTY_LICENSES.md with verbatim upstream-verified texts (eSpeak NG GPL-3.0 + corresponding-source note, Sparkle, Sentry, FluidAudio, MLX, swift-transformers, WhisperKit), bundled into the app by both builds; README scopes MIT to repo code.
  • Build system: build-deps.sh builds into staging and swaps atomically (a mid-build failure no longer strands the checkout), traps its multi-GB scratch tree, runs under set -euo pipefail; dev and beta builds share one swiftc arg source (scripts/entrypoints/lib/swiftc-app-args.sh) with a contract-test pin — the ScreenCaptureKit/sqlite3 divergence is resolved explicitly.
  • Test infra: preflight/matrix drift closed (4 missing rule families restored) plus a generic contract test that parses every matrix check; repo-hygiene syntax checks are globs (6 python + 2 ruby scripts were silently unchecked); run-tests.sh catches root test files outside the *Tests.swift glob; new swift-ci.yml builds + tests on PRs (continue-on-error pending first-run runner validation).
  • Tools: CLI --config actually applies (was a silent no-op; unsupported keys now error); MCP index propagates SQL write errors with rollback (no more half-indexed meetings marked indexed) and search returns real frontmatter titles.
  • Smaller fixes: CoreAudio CFString leak per device lookup (takeRetainedValue), Home row menus respect isEnabled (autoenablesItems = false), QA health gate requires macOS 26 (was 14.2), nightly-digest defaults no longer hardcode personal paths.
  • Docs: CHANGELOG caught up (1.1.44–1.1.47 from release history), stale cancelSession() ref removed, QA CLAUDE.md false threading claims fixed, codex-review gate defined in AGENTS.md, Apple Foundation Models provider documented.

How I checked it

  • scripts/dev/agent-preflight.sh
  • Selected checks from .agents/test-matrix.yml for the files changed (full union)
  • bash build.sh --no-open (incl. launch smoke + bundle perf gate)
  • bash run-tests.sh (4,717 tests)
  • Performance budget passed (bundle gate via build.sh)
  • bash run-integration-smoke.sh
  • swift test (388 tests) + swift test --package-path for CLI (45), MCP (74), QA
  • bash build-deps.sh --force (validates the new atomic staging logic end-to-end)
  • SKIP_NOTARIZATION=1 bash build-beta.sh '' redbars (release path with shared swiftc args + bundled licenses)
  • Manual check: independent Codex review pass over the full diff (PASS; its one P2 — rotation-boundary event loss — is fixed in the last commit)

Risk Review

  • Privacy / local-first behavior reviewed (changes tighten the off-device allowlist; nothing new leaves the device)
  • Storage path or migration impact reviewed (rename scan is now recursive but depth-bounded; every write still gated by frontmatter db_id match)
  • Public-facing copy stays concrete and matches current product scope
  • Release/update impact reviewed (no version bump; both builds now bundle THIRD_PARTY_LICENSES.md — DMG grows by ~72KB)
  • Agent PRs link the issue/workpad and stay draft until human review (draft; no linked issue — audit-driven)
  • UI changes include sanitized .agent-review/visuals/ evidence (n/a — the only UI change is an NSMenu enable-state fix with no visual delta)
  • No private transcripts, audio, tokens, personal paths, or customer data are included (the nightly-digest change removes hardcoded personal paths)

Notes

  • swift-ci.yml needs one validation run: if the hosted macos-26 image can't target arm64-apple-macos26.0, switch to a self-hosted runner per its header comment, then flip to a required check.
  • Decisions still open from the audit: deleting the 247 merged remote branches, TranscriptedSettingsView split timing (collides with the Home redesign), FluidAudio upgrade appetite.
  • Follow-ups spun off separately: CLI/MCP transcript-parser dedup, calendar queries off the main actor, save-time restyle off the main actor.
  • One audit finding was corrected rather than "fixed": meeting start already force-revalidates the cached system-audio permission, so no change was made there.

Agent handoff

COORD_DONE: GREEN | PR URL below | 12 commits: 2 data-integrity fixes, 1 perf/process fix, 2 privacy fixes, licensing, build/test-infra hardening, CI workflow, docs | recommend: validate swift-ci runner, then delete merged remote branches | decisions needed: branch cleanup, settings-split timing, FluidAudio upgrade | checks: full matrix union (build, 4717 fast, 388 core, smokes, beta build) | next: human review + first CI run

🤖 Generated with Claude Code

r3dbars and others added 13 commits June 10, 2026 20:45
Settings rename/merge used unscoped global string replacement across whole
transcripts, so renaming one "John" rewrote every same-named speaker's YAML
row, body labels, and wiki links in the file. Renames are now row-targeted in
frontmatter, fall back to channel-scoped label rewrites when another speaker
shares the display name, and fail closed (with a log) when the conflict is on
the same channel. The transcript scan is also recursive now, so renames reach
transcripts users organized into subfolders.

Covered by three new tests: shared-name no-bleed, same-channel fail-closed,
and subfolder scanning; the 33 existing rename/merge tests pin the
common-case behavior unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s it

retryFailedTranscription parked a no-op sentinel Task in activeTasks while the
actual inference ran in the caller's task. cancelAll() cancelled only the
sentinel: the retry's Parakeet/diarizer work kept running, and a subsequent
startTranscription passed the activeTasks.isEmpty guard — exactly the
concurrent single-instance-model contention the guard exists to prevent.

The retry body now runs inside the stored task (registered before the first
suspension point, same as before), so cancellation propagates into the
in-flight inference. New regression test blocks an engine until real
cancellation arrives — it cannot pass with the sentinel pattern.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ng on stderr

Gemma summary generation busy-waited on the cooperative thread pool for up to
15 minutes per run (Thread.sleep loop inside an async chain) and only drained
the child's stderr after exit, so a chatty runner filling the ~64KB pipe
buffer turned fast failures into full timeouts. Timeout/cancel also only
SIGTERMed the top-level nice/uv process.

Blocking model runs now execute on a dedicated dispatch queue with the
awaiting task's cancellation bridged in via an explicit probe, stderr drains
concurrently through a locked buffer, and termination escalates SIGTERM →
grace → SIGKILL so a hung MLX run cannot keep multi-GB memory pinned after
its summary was cancelled.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
captureObservabilityEvent forwarded the full merged event context as Sentry
extras, bypassing the curated allowedDiagnosticTagKeys that gates tags — a
free-text value under an innocuous key (or a device-name key, since "device"
is not a drop fragment) would leave the device after only key-drop + regex
redaction. Extras now carry the same allowlist-filtered subset as tags. The
capture(error:context:) parameter also landed under the key "context", which
the sanitizer's own fragment list always drops; it now lands under "detail"
so the parameter actually reaches Sentry.

events.jsonl and reliability.jsonl grew without bound (13MB/3.5MB on a real
install). Both now rotate by atomic rename at 10MB — one generation kept, no
full-file read, concurrent flock writers keep appending to the rotated file
instead of losing records.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
CoreAudio's get-rule returns a +1 CFString from AudioObjectGetPropertyData;
takeUnretainedValue leaked one string per device on every prewarm, device
change, and recording start. And HomeRowMoreMenuButton built its NSMenu
without autoenablesItems = false, so AppKit re-enabled every item whose
target responds to the action — disabled row actions rendered enabled and
silently no-opped.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…errors

toOfflineDiarizerConfig() returned .default unconditionally, silently
discarding all 16 user-supplied diarizer tuning fields; the 15 fields the
pinned FluidAudio exposes now map through (starting from .default, non-nil
only), and unsupported keys — including exclusiveSegments, which the pinned
FluidAudio predates — fail loudly with a ValidationError instead of being
dropped.

TranscriptedMCP's index swallowed every SQL write error including
BEGIN/COMMIT, so a failed insert left a meeting half-indexed but marked
indexed. Write helpers now throw, transactions roll back, and nothing is
marked indexed unless the whole batch commits — verified by tests that break
the schema mid-lifecycle and assert a later reconcile retries cleanly.
handleSearch also hydrates real frontmatter titles now instead of mangling
filenames ("Call_2026-03-26_16-04-11" no longer renders as "2026:03:26
16:04:11").

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
build-deps.sh deleted all four artifact dirs before building replacements, so
any mid-build failure stranded the checkout with no working build or tests
until a full successful rebuild; it also leaked a multi-GB SwiftPM scratch
tree per run (40 leftovers totaling ~85GB found). It now builds into
.deps-staging and swaps in by rename only after full success, traps scratch
cleanup on every exit path, and runs under set -euo pipefail. Verified by a
real --force rebuild.

The dev and release builds also kept two hand-maintained ~40-line swiftc
invocations that had already diverged (build.sh linked ScreenCaptureKit but
not sqlite3; build-beta.sh the reverse — each worked only via autolink).
Both now source scripts/entrypoints/lib/swiftc-app-args.sh for frameworks,
linker inputs, and the source list, expanded as quoted arrays so paths with
spaces can no longer split or inject flags. build.sh additionally gained
TRANSCRIPTED_SKIP_LAUNCH_SMOKE=1 for CI runners without a GUI session, and
both builds bundle THIRD_PARTY_LICENSES.md into Contents/Resources.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…guard

agent-preflight.sh claimed to mirror .agents/test-matrix.yml but had drifted:
no rules for the slow-pasteback smoke, packaged-app smoke, privacy-leak
sweep, or local-summary fixture, so agents trusting preflight skipped
matrix-required checks. The rules are restored and a new contract test parses
every check string out of the matrix and asserts preflight covers it — adding
a matrix rule without preflight coverage now fails the fast suite.

repo-hygiene.yml's hand-listed syntax checks (6 of 11 python and 2 of 3 ruby
scripts never checked) are replaced with find-based globs; the workflow pin
asserts the glob form. run-tests.sh also gains a guard for root test-shaped
files outside the *Tests.swift glob (FooTest.swift would previously neither
run nor trip the sync check), and a contract suite pins the shared swiftc
argument library so the dev/release builds cannot silently diverge again.
The fast-tests compile list registers ObservabilityLogRotation.swift.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The hygiene workflow never compiled a line of Swift, so merge safety for the
entire test pyramid rested on local discipline. This adds a PR workflow that
restores prebuilt deps from a cache keyed on the staleness inputs, builds the
app (launch smoke skipped — hosted runners have no reliable GUI session), and
runs the fast suite plus Core package tests.

Marked continue-on-error and runs-on macos-26 pending one green run: if the
hosted image cannot target arm64-apple-macos26.0, switch to a self-hosted
runner per the header comment, then flip this to a required check.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ch-up

The DMG shipped eSpeak NG (GPL-3.0-or-later), Sparkle, Sentry, and statically
linked FluidAudio/MLX/swift-transformers/WhisperKit with no license texts —
the only license in the tree was the repo's own MIT file, and README said
simply "MIT". THIRD_PARTY_LICENSES.md now carries verbatim license texts
fetched from each pinned upstream revision (with a corresponding-source note
for eSpeak NG), ships inside the app bundle via both build scripts, and
README scopes MIT to this repo's own code.

CHANGELOG was four shipped releases behind; 1.1.44-1.1.47 entries are
reconstructed from the GitHub releases and the PR history between the
Info.plist version bumps.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…efined

CLAUDE.md referenced a DictationSessionController.cancelSession() hook that
no longer exists anywhere in Sources. TranscriptedQA's CLAUDE.md claimed
validators run on background threads and SQLite readers use dedicated queues
(neither is true), and its health check passed on macOS 14.2+ machines that
cannot run the macOS-26-only app — the gate now requires 26. AGENTS.md's
mandated codex-review gate was defined nowhere; it now says what qualifies
and what to record. The Meeting docs note the shipping Apple Foundation
Models summary provider. generate-nightly-digest.py derives its repo/output
defaults from the script location instead of one machine's personal paths.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Found by an independent review pass over the branch: when a warning/error
event's preceding info-buffer flush pushed events.jsonl over the rotation
threshold, write() closed the handle — and the triggering event's own write
then no-opped against the nil handle, silently losing the highest-priority
record in the cycle. write() now re-prepares (which performs the rotation)
when it finds the handle closed mid-cycle.

Verified: full fast suite + bash build.sh --no-open both green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
First CI run validated the macos-26 runner (deps + app build green) and
exposed two pre-existing environment assumptions in the fast suite:

- DictationTranscriptWriter tests construct fixed -0500 instants and assert
  locally-formatted headings/filenames, so a UTC runner shifts times and the
  midnight-boundary filename. Reproduced locally under TZ=UTC; the workflow
  now pins TZ=America/Chicago to match the supported interactive environment.

- Two clipboard suites are wall-clock timing proofs (20/70ms observer
  windows, an elapsed-time floor) that shared-runner scheduler jitter makes
  unprovable. They now skip — loudly — only when
  TRANSCRIPTED_SKIP_TIMING_SENSITIVE_TESTS=1, which only the CI workflow
  sets; every local bash run-tests.sh still runs them.

Verified: TZ=UTC run reproduces exactly the CI failures; the CI-equivalent
env passes 4709/4709; the default local run still executes the full 4717.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant