Audit fixes: rename corruption, retry cancellation, privacy gaps, build/CI hardening by r3dbars · Pull Request #1071 · r3dbars/transcripted

r3dbars · 2026-06-11T13:42:06Z

Why

A deep multi-agent repo audit (8 subsystem maps, 8 audit dimensions, adversarially verified findings) surfaced a cluster of correctness, privacy, licensing, and build-infrastructure issues. This PR executes everything from the audit's task plan that didn't need a product decision: the Milestone 0/1 correctness and privacy fixes, the quick wins, and the high-leverage build/test-infra items.

Product Impact

Affects: meetings + dictation + agent artifacts + docs
Lane: meeting reliability (plus release ops for the build/CI changes)
Why this matters: two user-data bugs (speaker rename could corrupt transcripts of same-named speakers; cancelled retries kept running and could contend with fresh pipelines), one privacy gap (Sentry extras bypassed the diagnostic allowlist), a license compliance hole in the shipped DMG (GPL eSpeak NG with zero attribution), and a build system where the dev and shipped builds had already silently diverged.

What changed

Speaker rename/merge is row-targeted in frontmatter, channel-scoped when display names collide, fails closed when truly ambiguous, and scans subfolders (RetroactiveSpeakerUpdater). 3 new tests; 33 existing tests pin common-case behavior unchanged.
Retry cancellation: the retry's real work is registered in activeTasks (was a no-op sentinel), so cancelAll() reaches in-flight inference and the single-pipeline guard holds. New regression test can't pass with the sentinel pattern.
Local summaries: model runs moved off the Swift-concurrency cooperative pool (dedicated queue + cancellation bridge), stderr drained concurrently (no more 64KB-pipe stall → 15-min timeout), SIGTERM→grace→SIGKILL escalation.
Privacy: Sentry extras now pass the same allowedDiagnosticTagKeys filter as tags; capture(error:context:) no longer stores under a key the sanitizer always drops; events.jsonl/reliability.jsonl rotate at 10MB by atomic rename (incl. a rotation-boundary event-loss fix found by an independent review pass).
Licensing: THIRD_PARTY_LICENSES.md with verbatim upstream-verified texts (eSpeak NG GPL-3.0 + corresponding-source note, Sparkle, Sentry, FluidAudio, MLX, swift-transformers, WhisperKit), bundled into the app by both builds; README scopes MIT to repo code.
Build system: build-deps.sh builds into staging and swaps atomically (a mid-build failure no longer strands the checkout), traps its multi-GB scratch tree, runs under set -euo pipefail; dev and beta builds share one swiftc arg source (scripts/entrypoints/lib/swiftc-app-args.sh) with a contract-test pin — the ScreenCaptureKit/sqlite3 divergence is resolved explicitly.
Test infra: preflight/matrix drift closed (4 missing rule families restored) plus a generic contract test that parses every matrix check; repo-hygiene syntax checks are globs (6 python + 2 ruby scripts were silently unchecked); run-tests.sh catches root test files outside the *Tests.swift glob; new swift-ci.yml builds + tests on PRs (continue-on-error pending first-run runner validation).
Tools: CLI --config actually applies (was a silent no-op; unsupported keys now error); MCP index propagates SQL write errors with rollback (no more half-indexed meetings marked indexed) and search returns real frontmatter titles.
Smaller fixes: CoreAudio CFString leak per device lookup (takeRetainedValue), Home row menus respect isEnabled (autoenablesItems = false), QA health gate requires macOS 26 (was 14.2), nightly-digest defaults no longer hardcode personal paths.
Docs: CHANGELOG caught up (1.1.44–1.1.47 from release history), stale cancelSession() ref removed, QA CLAUDE.md false threading claims fixed, codex-review gate defined in AGENTS.md, Apple Foundation Models provider documented.

How I checked it

Risk Review

Privacy / local-first behavior reviewed (changes tighten the off-device allowlist; nothing new leaves the device)
Storage path or migration impact reviewed (rename scan is now recursive but depth-bounded; every write still gated by frontmatter db_id match)
Public-facing copy stays concrete and matches current product scope
Release/update impact reviewed (no version bump; both builds now bundle THIRD_PARTY_LICENSES.md — DMG grows by ~72KB)
Agent PRs link the issue/workpad and stay draft until human review (draft; no linked issue — audit-driven)
UI changes include sanitized .agent-review/visuals/ evidence (n/a — the only UI change is an NSMenu enable-state fix with no visual delta)
No private transcripts, audio, tokens, personal paths, or customer data are included (the nightly-digest change removes hardcoded personal paths)

Notes

swift-ci.yml needs one validation run: if the hosted macos-26 image can't target arm64-apple-macos26.0, switch to a self-hosted runner per its header comment, then flip to a required check.
Decisions still open from the audit: deleting the 247 merged remote branches, TranscriptedSettingsView split timing (collides with the Home redesign), FluidAudio upgrade appetite.
Follow-ups spun off separately: CLI/MCP transcript-parser dedup, calendar queries off the main actor, save-time restyle off the main actor.
One audit finding was corrected rather than "fixed": meeting start already force-revalidates the cached system-audio permission, so no change was made there.

Agent handoff

COORD_DONE: GREEN | PR URL below | 12 commits: 2 data-integrity fixes, 1 perf/process fix, 2 privacy fixes, licensing, build/test-infra hardening, CI workflow, docs | recommend: validate swift-ci runner, then delete merged remote branches | decisions needed: branch cleanup, settings-split timing, FluidAudio upgrade | checks: full matrix union (build, 4717 fast, 388 core, smokes, beta build) | next: human review + first CI run

🤖 Generated with Claude Code

Settings rename/merge used unscoped global string replacement across whole transcripts, so renaming one "John" rewrote every same-named speaker's YAML row, body labels, and wiki links in the file. Renames are now row-targeted in frontmatter, fall back to channel-scoped label rewrites when another speaker shares the display name, and fail closed (with a log) when the conflict is on the same channel. The transcript scan is also recursive now, so renames reach transcripts users organized into subfolders. Covered by three new tests: shared-name no-bleed, same-channel fail-closed, and subfolder scanning; the 33 existing rename/merge tests pin the common-case behavior unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…s it retryFailedTranscription parked a no-op sentinel Task in activeTasks while the actual inference ran in the caller's task. cancelAll() cancelled only the sentinel: the retry's Parakeet/diarizer work kept running, and a subsequent startTranscription passed the activeTasks.isEmpty guard — exactly the concurrent single-instance-model contention the guard exists to prevent. The retry body now runs inside the stored task (registered before the first suspension point, same as before), so cancellation propagates into the in-flight inference. New regression test blocks an engine until real cancellation arrives — it cannot pass with the sentinel pattern. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ng on stderr Gemma summary generation busy-waited on the cooperative thread pool for up to 15 minutes per run (Thread.sleep loop inside an async chain) and only drained the child's stderr after exit, so a chatty runner filling the ~64KB pipe buffer turned fast failures into full timeouts. Timeout/cancel also only SIGTERMed the top-level nice/uv process. Blocking model runs now execute on a dedicated dispatch queue with the awaiting task's cancellation bridged in via an explicit probe, stderr drains concurrently through a locked buffer, and termination escalates SIGTERM → grace → SIGKILL so a hung MLX run cannot keep multi-GB memory pinned after its summary was cancelled. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

captureObservabilityEvent forwarded the full merged event context as Sentry extras, bypassing the curated allowedDiagnosticTagKeys that gates tags — a free-text value under an innocuous key (or a device-name key, since "device" is not a drop fragment) would leave the device after only key-drop + regex redaction. Extras now carry the same allowlist-filtered subset as tags. The capture(error:context:) parameter also landed under the key "context", which the sanitizer's own fragment list always drops; it now lands under "detail" so the parameter actually reaches Sentry. events.jsonl and reliability.jsonl grew without bound (13MB/3.5MB on a real install). Both now rotate by atomic rename at 10MB — one generation kept, no full-file read, concurrent flock writers keep appending to the rotated file instead of losing records. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

CoreAudio's get-rule returns a +1 CFString from AudioObjectGetPropertyData; takeUnretainedValue leaked one string per device on every prewarm, device change, and recording start. And HomeRowMoreMenuButton built its NSMenu without autoenablesItems = false, so AppKit re-enabled every item whose target responds to the action — disabled row actions rendered enabled and silently no-opped. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…errors toOfflineDiarizerConfig() returned .default unconditionally, silently discarding all 16 user-supplied diarizer tuning fields; the 15 fields the pinned FluidAudio exposes now map through (starting from .default, non-nil only), and unsupported keys — including exclusiveSegments, which the pinned FluidAudio predates — fail loudly with a ValidationError instead of being dropped. TranscriptedMCP's index swallowed every SQL write error including BEGIN/COMMIT, so a failed insert left a meeting half-indexed but marked indexed. Write helpers now throw, transactions roll back, and nothing is marked indexed unless the whole batch commits — verified by tests that break the schema mid-lifecycle and assert a later reconcile retries cleanly. handleSearch also hydrates real frontmatter titles now instead of mangling filenames ("Call_2026-03-26_16-04-11" no longer renders as "2026:03:26 16:04:11"). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

build-deps.sh deleted all four artifact dirs before building replacements, so any mid-build failure stranded the checkout with no working build or tests until a full successful rebuild; it also leaked a multi-GB SwiftPM scratch tree per run (40 leftovers totaling ~85GB found). It now builds into .deps-staging and swaps in by rename only after full success, traps scratch cleanup on every exit path, and runs under set -euo pipefail. Verified by a real --force rebuild. The dev and release builds also kept two hand-maintained ~40-line swiftc invocations that had already diverged (build.sh linked ScreenCaptureKit but not sqlite3; build-beta.sh the reverse — each worked only via autolink). Both now source scripts/entrypoints/lib/swiftc-app-args.sh for frameworks, linker inputs, and the source list, expanded as quoted arrays so paths with spaces can no longer split or inject flags. build.sh additionally gained TRANSCRIPTED_SKIP_LAUNCH_SMOKE=1 for CI runners without a GUI session, and both builds bundle THIRD_PARTY_LICENSES.md into Contents/Resources. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…guard agent-preflight.sh claimed to mirror .agents/test-matrix.yml but had drifted: no rules for the slow-pasteback smoke, packaged-app smoke, privacy-leak sweep, or local-summary fixture, so agents trusting preflight skipped matrix-required checks. The rules are restored and a new contract test parses every check string out of the matrix and asserts preflight covers it — adding a matrix rule without preflight coverage now fails the fast suite. repo-hygiene.yml's hand-listed syntax checks (6 of 11 python and 2 of 3 ruby scripts never checked) are replaced with find-based globs; the workflow pin asserts the glob form. run-tests.sh also gains a guard for root test-shaped files outside the *Tests.swift glob (FooTest.swift would previously neither run nor trip the sync check), and a contract suite pins the shared swiftc argument library so the dev/release builds cannot silently diverge again. The fast-tests compile list registers ObservabilityLogRotation.swift. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The hygiene workflow never compiled a line of Swift, so merge safety for the entire test pyramid rested on local discipline. This adds a PR workflow that restores prebuilt deps from a cache keyed on the staleness inputs, builds the app (launch smoke skipped — hosted runners have no reliable GUI session), and runs the fast suite plus Core package tests. Marked continue-on-error and runs-on macos-26 pending one green run: if the hosted image cannot target arm64-apple-macos26.0, switch to a self-hosted runner per the header comment, then flip this to a required check. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ch-up The DMG shipped eSpeak NG (GPL-3.0-or-later), Sparkle, Sentry, and statically linked FluidAudio/MLX/swift-transformers/WhisperKit with no license texts — the only license in the tree was the repo's own MIT file, and README said simply "MIT". THIRD_PARTY_LICENSES.md now carries verbatim license texts fetched from each pinned upstream revision (with a corresponding-source note for eSpeak NG), ships inside the app bundle via both build scripts, and README scopes MIT to this repo's own code. CHANGELOG was four shipped releases behind; 1.1.44-1.1.47 entries are reconstructed from the GitHub releases and the PR history between the Info.plist version bumps. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…efined CLAUDE.md referenced a DictationSessionController.cancelSession() hook that no longer exists anywhere in Sources. TranscriptedQA's CLAUDE.md claimed validators run on background threads and SQLite readers use dedicated queues (neither is true), and its health check passed on macOS 14.2+ machines that cannot run the macOS-26-only app — the gate now requires 26. AGENTS.md's mandated codex-review gate was defined nowhere; it now says what qualifies and what to record. The Meeting docs note the shipping Apple Foundation Models summary provider. generate-nightly-digest.py derives its repo/output defaults from the script location instead of one machine's personal paths. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Found by an independent review pass over the branch: when a warning/error event's preceding info-buffer flush pushed events.jsonl over the rotation threshold, write() closed the handle — and the triggering event's own write then no-opped against the nil handle, silently losing the highest-priority record in the cycle. write() now re-prepares (which performs the rotation) when it finds the handle closed mid-cycle. Verified: full fast suite + bash build.sh --no-open both green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

First CI run validated the macos-26 runner (deps + app build green) and exposed two pre-existing environment assumptions in the fast suite: - DictationTranscriptWriter tests construct fixed -0500 instants and assert locally-formatted headings/filenames, so a UTC runner shifts times and the midnight-boundary filename. Reproduced locally under TZ=UTC; the workflow now pins TZ=America/Chicago to match the supported interactive environment. - Two clipboard suites are wall-clock timing proofs (20/70ms observer windows, an elapsed-time floor) that shared-runner scheduler jitter makes unprovable. They now skip — loudly — only when TRANSCRIPTED_SKIP_TIMING_SENSITIVE_TESTS=1, which only the CI workflow sets; every local bash run-tests.sh still runs them. Verified: TZ=UTC run reproduces exactly the CI failures; the CI-equivalent env passes 4709/4709; the default local run still executes the full 4717. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

r3dbars and others added 13 commits June 10, 2026 20:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Audit fixes: rename corruption, retry cancellation, privacy gaps, build/CI hardening#1071

Audit fixes: rename corruption, retry cancellation, privacy gaps, build/CI hardening#1071
r3dbars wants to merge 13 commits into
mainfrom
claude/heuristic-wing-fc0954

r3dbars commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

r3dbars commented Jun 11, 2026

Why

Product Impact

What changed

How I checked it

Risk Review

Notes

Agent handoff

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant