Audit fixes: rename corruption, retry cancellation, privacy gaps, build/CI hardening#1071
Draft
r3dbars wants to merge 13 commits into
Draft
Audit fixes: rename corruption, retry cancellation, privacy gaps, build/CI hardening#1071r3dbars wants to merge 13 commits into
r3dbars wants to merge 13 commits into
Conversation
Settings rename/merge used unscoped global string replacement across whole transcripts, so renaming one "John" rewrote every same-named speaker's YAML row, body labels, and wiki links in the file. Renames are now row-targeted in frontmatter, fall back to channel-scoped label rewrites when another speaker shares the display name, and fail closed (with a log) when the conflict is on the same channel. The transcript scan is also recursive now, so renames reach transcripts users organized into subfolders. Covered by three new tests: shared-name no-bleed, same-channel fail-closed, and subfolder scanning; the 33 existing rename/merge tests pin the common-case behavior unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s it retryFailedTranscription parked a no-op sentinel Task in activeTasks while the actual inference ran in the caller's task. cancelAll() cancelled only the sentinel: the retry's Parakeet/diarizer work kept running, and a subsequent startTranscription passed the activeTasks.isEmpty guard — exactly the concurrent single-instance-model contention the guard exists to prevent. The retry body now runs inside the stored task (registered before the first suspension point, same as before), so cancellation propagates into the in-flight inference. New regression test blocks an engine until real cancellation arrives — it cannot pass with the sentinel pattern. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ng on stderr Gemma summary generation busy-waited on the cooperative thread pool for up to 15 minutes per run (Thread.sleep loop inside an async chain) and only drained the child's stderr after exit, so a chatty runner filling the ~64KB pipe buffer turned fast failures into full timeouts. Timeout/cancel also only SIGTERMed the top-level nice/uv process. Blocking model runs now execute on a dedicated dispatch queue with the awaiting task's cancellation bridged in via an explicit probe, stderr drains concurrently through a locked buffer, and termination escalates SIGTERM → grace → SIGKILL so a hung MLX run cannot keep multi-GB memory pinned after its summary was cancelled. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
captureObservabilityEvent forwarded the full merged event context as Sentry extras, bypassing the curated allowedDiagnosticTagKeys that gates tags — a free-text value under an innocuous key (or a device-name key, since "device" is not a drop fragment) would leave the device after only key-drop + regex redaction. Extras now carry the same allowlist-filtered subset as tags. The capture(error:context:) parameter also landed under the key "context", which the sanitizer's own fragment list always drops; it now lands under "detail" so the parameter actually reaches Sentry. events.jsonl and reliability.jsonl grew without bound (13MB/3.5MB on a real install). Both now rotate by atomic rename at 10MB — one generation kept, no full-file read, concurrent flock writers keep appending to the rotated file instead of losing records. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
CoreAudio's get-rule returns a +1 CFString from AudioObjectGetPropertyData; takeUnretainedValue leaked one string per device on every prewarm, device change, and recording start. And HomeRowMoreMenuButton built its NSMenu without autoenablesItems = false, so AppKit re-enabled every item whose target responds to the action — disabled row actions rendered enabled and silently no-opped. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…errors
toOfflineDiarizerConfig() returned .default unconditionally, silently
discarding all 16 user-supplied diarizer tuning fields; the 15 fields the
pinned FluidAudio exposes now map through (starting from .default, non-nil
only), and unsupported keys — including exclusiveSegments, which the pinned
FluidAudio predates — fail loudly with a ValidationError instead of being
dropped.
TranscriptedMCP's index swallowed every SQL write error including
BEGIN/COMMIT, so a failed insert left a meeting half-indexed but marked
indexed. Write helpers now throw, transactions roll back, and nothing is
marked indexed unless the whole batch commits — verified by tests that break
the schema mid-lifecycle and assert a later reconcile retries cleanly.
handleSearch also hydrates real frontmatter titles now instead of mangling
filenames ("Call_2026-03-26_16-04-11" no longer renders as "2026:03:26
16:04:11").
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
build-deps.sh deleted all four artifact dirs before building replacements, so any mid-build failure stranded the checkout with no working build or tests until a full successful rebuild; it also leaked a multi-GB SwiftPM scratch tree per run (40 leftovers totaling ~85GB found). It now builds into .deps-staging and swaps in by rename only after full success, traps scratch cleanup on every exit path, and runs under set -euo pipefail. Verified by a real --force rebuild. The dev and release builds also kept two hand-maintained ~40-line swiftc invocations that had already diverged (build.sh linked ScreenCaptureKit but not sqlite3; build-beta.sh the reverse — each worked only via autolink). Both now source scripts/entrypoints/lib/swiftc-app-args.sh for frameworks, linker inputs, and the source list, expanded as quoted arrays so paths with spaces can no longer split or inject flags. build.sh additionally gained TRANSCRIPTED_SKIP_LAUNCH_SMOKE=1 for CI runners without a GUI session, and both builds bundle THIRD_PARTY_LICENSES.md into Contents/Resources. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…guard agent-preflight.sh claimed to mirror .agents/test-matrix.yml but had drifted: no rules for the slow-pasteback smoke, packaged-app smoke, privacy-leak sweep, or local-summary fixture, so agents trusting preflight skipped matrix-required checks. The rules are restored and a new contract test parses every check string out of the matrix and asserts preflight covers it — adding a matrix rule without preflight coverage now fails the fast suite. repo-hygiene.yml's hand-listed syntax checks (6 of 11 python and 2 of 3 ruby scripts never checked) are replaced with find-based globs; the workflow pin asserts the glob form. run-tests.sh also gains a guard for root test-shaped files outside the *Tests.swift glob (FooTest.swift would previously neither run nor trip the sync check), and a contract suite pins the shared swiftc argument library so the dev/release builds cannot silently diverge again. The fast-tests compile list registers ObservabilityLogRotation.swift. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The hygiene workflow never compiled a line of Swift, so merge safety for the entire test pyramid rested on local discipline. This adds a PR workflow that restores prebuilt deps from a cache keyed on the staleness inputs, builds the app (launch smoke skipped — hosted runners have no reliable GUI session), and runs the fast suite plus Core package tests. Marked continue-on-error and runs-on macos-26 pending one green run: if the hosted image cannot target arm64-apple-macos26.0, switch to a self-hosted runner per the header comment, then flip this to a required check. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ch-up The DMG shipped eSpeak NG (GPL-3.0-or-later), Sparkle, Sentry, and statically linked FluidAudio/MLX/swift-transformers/WhisperKit with no license texts — the only license in the tree was the repo's own MIT file, and README said simply "MIT". THIRD_PARTY_LICENSES.md now carries verbatim license texts fetched from each pinned upstream revision (with a corresponding-source note for eSpeak NG), ships inside the app bundle via both build scripts, and README scopes MIT to this repo's own code. CHANGELOG was four shipped releases behind; 1.1.44-1.1.47 entries are reconstructed from the GitHub releases and the PR history between the Info.plist version bumps. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…efined CLAUDE.md referenced a DictationSessionController.cancelSession() hook that no longer exists anywhere in Sources. TranscriptedQA's CLAUDE.md claimed validators run on background threads and SQLite readers use dedicated queues (neither is true), and its health check passed on macOS 14.2+ machines that cannot run the macOS-26-only app — the gate now requires 26. AGENTS.md's mandated codex-review gate was defined nowhere; it now says what qualifies and what to record. The Meeting docs note the shipping Apple Foundation Models summary provider. generate-nightly-digest.py derives its repo/output defaults from the script location instead of one machine's personal paths. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Found by an independent review pass over the branch: when a warning/error event's preceding info-buffer flush pushed events.jsonl over the rotation threshold, write() closed the handle — and the triggering event's own write then no-opped against the nil handle, silently losing the highest-priority record in the cycle. write() now re-prepares (which performs the rotation) when it finds the handle closed mid-cycle. Verified: full fast suite + bash build.sh --no-open both green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
First CI run validated the macos-26 runner (deps + app build green) and exposed two pre-existing environment assumptions in the fast suite: - DictationTranscriptWriter tests construct fixed -0500 instants and assert locally-formatted headings/filenames, so a UTC runner shifts times and the midnight-boundary filename. Reproduced locally under TZ=UTC; the workflow now pins TZ=America/Chicago to match the supported interactive environment. - Two clipboard suites are wall-clock timing proofs (20/70ms observer windows, an elapsed-time floor) that shared-runner scheduler jitter makes unprovable. They now skip — loudly — only when TRANSCRIPTED_SKIP_TIMING_SENSITIVE_TESTS=1, which only the CI workflow sets; every local bash run-tests.sh still runs them. Verified: TZ=UTC run reproduces exactly the CI failures; the CI-equivalent env passes 4709/4709; the default local run still executes the full 4717. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
A deep multi-agent repo audit (8 subsystem maps, 8 audit dimensions, adversarially verified findings) surfaced a cluster of correctness, privacy, licensing, and build-infrastructure issues. This PR executes everything from the audit's task plan that didn't need a product decision: the Milestone 0/1 correctness and privacy fixes, the quick wins, and the high-leverage build/test-infra items.
Product Impact
meetings+dictation+agent artifacts+ docsmeeting reliability(plusrelease opsfor the build/CI changes)What changed
RetroactiveSpeakerUpdater). 3 new tests; 33 existing tests pin common-case behavior unchanged.activeTasks(was a no-op sentinel), socancelAll()reaches in-flight inference and the single-pipeline guard holds. New regression test can't pass with the sentinel pattern.allowedDiagnosticTagKeysfilter as tags;capture(error:context:)no longer stores under a key the sanitizer always drops;events.jsonl/reliability.jsonlrotate at 10MB by atomic rename (incl. a rotation-boundary event-loss fix found by an independent review pass).THIRD_PARTY_LICENSES.mdwith verbatim upstream-verified texts (eSpeak NG GPL-3.0 + corresponding-source note, Sparkle, Sentry, FluidAudio, MLX, swift-transformers, WhisperKit), bundled into the app by both builds; README scopes MIT to repo code.build-deps.shbuilds into staging and swaps atomically (a mid-build failure no longer strands the checkout), traps its multi-GB scratch tree, runs underset -euo pipefail; dev and beta builds share one swiftc arg source (scripts/entrypoints/lib/swiftc-app-args.sh) with a contract-test pin — the ScreenCaptureKit/sqlite3 divergence is resolved explicitly.run-tests.shcatches root test files outside the*Tests.swiftglob; newswift-ci.ymlbuilds + tests on PRs (continue-on-error pending first-run runner validation).--configactually applies (was a silent no-op; unsupported keys now error); MCP index propagates SQL write errors with rollback (no more half-indexed meetings marked indexed) andsearchreturns real frontmatter titles.takeRetainedValue), Home row menus respectisEnabled(autoenablesItems = false), QA health gate requires macOS 26 (was 14.2), nightly-digest defaults no longer hardcode personal paths.cancelSession()ref removed, QA CLAUDE.md false threading claims fixed,codex-reviewgate defined in AGENTS.md, Apple Foundation Models provider documented.How I checked it
scripts/dev/agent-preflight.sh.agents/test-matrix.ymlfor the files changed (full union)bash build.sh --no-open(incl. launch smoke + bundle perf gate)bash run-tests.sh(4,717 tests)bash run-integration-smoke.shswift test(388 tests) +swift test --package-pathfor CLI (45), MCP (74), QAbash build-deps.sh --force(validates the new atomic staging logic end-to-end)SKIP_NOTARIZATION=1 bash build-beta.sh '' redbars(release path with shared swiftc args + bundled licenses)Risk Review
.agent-review/visuals/evidence (n/a — the only UI change is an NSMenu enable-state fix with no visual delta)Notes
swift-ci.ymlneeds one validation run: if the hostedmacos-26image can't targetarm64-apple-macos26.0, switch to a self-hosted runner per its header comment, then flip to a required check.TranscriptedSettingsViewsplit timing (collides with the Home redesign), FluidAudio upgrade appetite.Agent handoff
COORD_DONE: GREEN | PR URL below | 12 commits: 2 data-integrity fixes, 1 perf/process fix, 2 privacy fixes, licensing, build/test-infra hardening, CI workflow, docs | recommend: validate swift-ci runner, then delete merged remote branches | decisions needed: branch cleanup, settings-split timing, FluidAudio upgrade | checks: full matrix union (build, 4717 fast, 388 core, smokes, beta build) | next: human review + first CI run🤖 Generated with Claude Code