refactor: extract shared TranscriptedCaptureKit for CLI and MCP tools #1073

Open
r3dbars wants to merge 2 commits into main from claude/focused-aryabhata-a187f1

Conversation

r3dbars commented Jun 11, 2026

Owner

Why

Capture-library resolution and markdown transcript parsing were duplicated nearly verbatim between Tools/TranscriptedCLI (ContextStore.swift) and Tools/TranscriptedMCP (DataDirectories.swift + TranscriptLoader.swift), and drift had already started: MCP resolved symlinks before enumerating legacy candidates while the CLI did not, MCP only attached frontmatter speaker metadata to system speakers, and looksLikeCaptureMarkdown existed four times (plus a third copy of extractTitle hiding in ToolHandlers.swift). One implementation now serves both tools so future fixes land once.
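
To make the symlink drift concrete, here is a minimal sketch (not the kit's code) of why resolving symlinks before enumerating candidates matters: two spellings of the same directory collapse into one entry instead of being scanned twice. The helper name `uniqueResolvedCandidates` is hypothetical; only `URL.resolvingSymlinksInPath()` is Foundation API.

```swift
import Foundation

/// Hypothetical helper (not from this PR): resolves symlinks on each candidate
/// directory, then drops duplicates while preserving the original order.
func uniqueResolvedCandidates(_ candidates: [URL]) -> [URL] {
    var seen = Set<String>()
    var result: [URL] = []
    for candidate in candidates {
        let resolved = candidate.resolvingSymlinksInPath()
        // insert(_:) reports whether the path was newly added to the set.
        if seen.insert(resolved.path).inserted {
            result.append(resolved)
        }
    }
    return result
}
```

Without the resolution step, a symlinked legacy folder and its target would both survive as separate candidates, which is the kind of divergence the CLI variant was exposed to.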

Product Impact

  • Affects: agent artifacts
  • Lane: agent workflow
  • Why this matters: the CLI and MCP server are how agents read saved captures; silent drift between them means the same library resolves or parses differently depending on which tool an agent uses.

What changed

  • New Tools/TranscriptedCaptureKit local SPM package (dependency-free), consumed by both tools via .package(path: "../TranscriptedCaptureKit"):
    • CaptureLibraryResolver — full resolution chain (shared data dir, per-kind overrides, mcp-directories.json manifest, transcriptSaveLocation preference, defaults + legacy Draft/~/Documents/Transcripted fallback), returning sharedDataRoot so MCP derives its index dir
    • CaptureMarkdown — capture-markdown detection, directory probing, frontmatter title: extraction
    • CaptureMarkdownParser — frontmatter, speaker metadata, styled + legacy transcript entries, dictation day entries, producing superset models each tool maps into its own output types
  • Both tools keep their facade types (CLIContextDirectories, CLIContextStore, TranscriptedDataDirectories, TranscriptLoader), so public APIs are unchanged; ContextStore.swift drops from ~1,140 to ~480 lines
  • Drift unified to the safer variant of each: symlink-resolved legacy enumeration (MCP behavior) and metadata matching by system id or unique normalized name for all speakers (CLI behavior, a superset); both pinned by kit tests
  • run-e2e-smoke.sh pre-compiles the kit as a swiftmodule + static lib before its raw swiftc compile (same pattern as TranscriptedCore)
  • Fixes a pre-existing e2e smoke breakage on main (verified via git stash): SWIFT_SOURCES was missing Sources/Support/LocalMeetingSummaryPreferences.swift, which defines LocalMeetingSummaryProvider used by LocalMeetingSummarizer.swift
  • Verification map: new Tools/TranscriptedCaptureKit/** rule in .agents/test-matrix.yml + agent-preflight.sh (kit tests + both consumer suites + e2e smoke); docs updated (CLAUDE.md, Tools/README.md, tool CLAUDE.mds, docs/agent-onboarding.md, new kit CLAUDE.md)
  • One CLI test edit: testMalformedDurationFallsBackToZero asserts on parser source text; repointed at the kit file where the guards now live
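
For readers unfamiliar with local SPM packages, the path dependency from the first bullet could look roughly like this in a consumer manifest. This is a sketch under assumptions: the tools version, the target name, and the product name are guesses for illustration; only the `.package(path: "../TranscriptedCaptureKit")` line is taken from this description.

```swift
// swift-tools-version:5.9
// Sketch of a consumer manifest; tools version, target name, and product name are assumptions.
import PackageDescription

let package = Package(
    name: "TranscriptedCLI",
    dependencies: [
        // Relative path dependency, as quoted in the PR description.
        .package(path: "../TranscriptedCaptureKit"),
    ],
    targets: [
        .executableTarget(
            name: "TranscriptedCLI",
            dependencies: [
                // Assumes the kit exposes a library product named after the package.
                .product(name: "TranscriptedCaptureKit", package: "TranscriptedCaptureKit"),
            ]
        ),
    ]
)
```

Because `swift build` resolves path dependencies on its own, only the raw `swiftc` compile in run-e2e-smoke.sh needs the explicit pre-compile step.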

How I checked it

  • scripts/dev/agent-preflight.sh (suggests the new rule union correctly)
  • Selected checks from .agents/test-matrix.yml for the files changed
  • bash build.sh --no-open (no Sources/** or root-test changes; not required by the matrix for these paths)
  • bash run-tests.sh (same)
  • Performance budget (not runtime-sensitive)
  • bash run-integration-smoke.sh (no Sources/Meeting/ or Sources/TranscriptedCore/ changes)
  • swift test for the core seam (root Package.swift untouched)
  • Manual check:
    • swift test --package-path Tools/TranscriptedCaptureKit: 19/19
    • swift test --package-path Tools/TranscriptedCLI: 40/40
    • swift test --package-path Tools/TranscriptedMCP: 68/68 (zero MCP test edits)
    • bash run-e2e-smoke.sh: green
    • swift build -c release --package-path Tools/TranscriptedMCP (the path build.sh uses for the bundled helper): green
    • transcripted-mcp --self-test against a temp TRANSCRIPTED_DATA_DIR: resolves correctly through the kit

Risk Review

  • Privacy / local-first behavior reviewed (read-only refactor; no new data flows)
  • Storage path or migration impact reviewed (resolution order unchanged; behavior pinned by existing + new tests)
  • Public-facing copy stays concrete and matches current product scope
  • Release/update impact reviewed (build.sh/build-beta.sh build the MCP helper via swift build, which handles the path dependency transparently — verified release build)
  • Agent PRs link the issue/workpad and stay draft until human review (no linked issue; session-initiated)
  • UI changes include sanitized .agent-review/visuals/ evidence (no UI changes)
  • No private transcripts, audio, tokens, personal paths, or customer data are included

Notes

  • The e2e smoke was failing on unmodified main before this branch (missing source in SWIFT_SOURCES from the recent Gemma summary work); this PR fixes it since a green smoke was needed to verify the script change.
  • The app target still has its own near-duplicate detection in Sources/TranscriptedCore/Storage/TranscriptScanner.swift; left out of scope deliberately (build-system boundary) and flagged as a follow-up.

Agent handoff

COORD_DONE: GREEN | https://github.com/r3dbars/transcripted/pull/1073 | extracted shared TranscriptedCaptureKit, refactored CLI+MCP to consume it, fixed pre-existing e2e smoke breakage, updated test matrix + docs | none | none | kit 19/19, CLI 40/40, MCP 68/68, e2e smoke green, MCP release build + self-test green, preflight | review and merge

🤖 Generated with Claude Code

r3dbars and others added 2 commits June 11, 2026 08:41
Capture-library resolution, capture-markdown detection, and transcript
parsing were duplicated nearly verbatim between TranscriptedCLI
(ContextStore.swift) and TranscriptedMCP (DataDirectories.swift +
TranscriptLoader.swift), and had already drifted: MCP resolved symlinks
before enumerating legacy candidates while the CLI did not, MCP only
attached frontmatter speaker metadata to system speakers, and
looksLikeCaptureMarkdown existed four times (plus extractTitle three
times, counting the copy in ToolHandlers.swift).

Add Tools/TranscriptedCaptureKit, a dependency-free local SPM package
both tools consume via a relative path dependency:

- CaptureLibraryResolver: full resolution chain (shared data dir,
  per-kind overrides, mcp-directories.json manifest,
  transcriptSaveLocation preference, defaults + legacy fallback),
  returning sharedDataRoot so MCP can derive its index dir
- CaptureMarkdown: capture-markdown detection, directory probing,
  frontmatter title extraction
- CaptureMarkdownParser: frontmatter, speaker metadata, styled + legacy
  transcript entries, dictation day entries, into superset models each
  tool maps to its own output types
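
A resolution chain like the one listed for CaptureLibraryResolver can be pictured as an ordered list of steps where the first one that yields a directory wins. The sketch below is illustrative only: `ResolutionStep` and `resolveFirst` are hypothetical names, and the step order and environment lookup are assumptions rather than the kit's actual precedence.

```swift
import Foundation

/// Hypothetical model of one step in a resolution chain: a label for
/// diagnostics plus a closure that either yields a directory or defers.
struct ResolutionStep {
    let label: String
    let resolve: () -> URL?
}

/// Returns the first directory any step yields, together with the label of
/// the step that produced it, or nil when every step defers.
func resolveFirst(_ steps: [ResolutionStep]) -> (label: String, url: URL)? {
    for step in steps {
        if let url = step.resolve() {
            return (step.label, url)
        }
    }
    return nil
}

// Example wiring. The order and the stubbed environment are assumptions; the
// real chain has more steps (manifest, preference, legacy fallback).
let environment = ["TRANSCRIPTED_DATA_DIR": "/tmp/transcripted-data"]
let steps = [
    ResolutionStep(label: "shared data dir") {
        environment["TRANSCRIPTED_DATA_DIR"].map { URL(fileURLWithPath: $0, isDirectory: true) }
    },
    ResolutionStep(label: "default") {
        URL(fileURLWithPath: "/tmp/default-library", isDirectory: true)
    },
]
let resolved = resolveFirst(steps)
```

Keeping the winning step's label next to the URL makes it easy to report which source produced a library path, which helps when two tools are expected to agree.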

Both tools keep their existing facade types so public APIs and test
suites are unchanged. Drift resolved by taking the safer variant of
each: symlink-resolved enumeration (MCP behavior) and metadata matching
by system id or unique normalized name for all speakers (CLI behavior).
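
The "system id or unique normalized name" rule can be sketched as follows. This is an assumption-laden illustration, not the kit's parser: `SpeakerMetadata`, `normalized`, and `matchMetadata` are hypothetical names, and the normalization shown (trim plus lowercase) is a guess at what "normalized" means here.

```swift
import Foundation

/// Hypothetical stand-in for a frontmatter speaker record.
struct SpeakerMetadata: Equatable {
    let systemID: String?
    let name: String
}

/// Assumed normalization: trim surrounding whitespace and lowercase.
func normalized(_ name: String) -> String {
    name.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
}

/// Prefers an exact system-id match; otherwise falls back to a name match,
/// but only when exactly one record shares the normalized name.
func matchMetadata(systemID: String?, name: String, in records: [SpeakerMetadata]) -> SpeakerMetadata? {
    if let id = systemID, let byID = records.first(where: { $0.systemID == id }) {
        return byID
    }
    let key = normalized(name)
    let byName = records.filter { normalized($0.name) == key }
    // An ambiguous name (two or more candidates) attaches no metadata at all.
    return byName.count == 1 ? byName[0] : nil
}
```

Requiring uniqueness on the name fallback is what makes this the safer variant: an ambiguous name yields no metadata rather than the wrong speaker's.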

run-e2e-smoke.sh now pre-compiles the kit as a swiftmodule + static lib
before its raw swiftc compile, mirroring the TranscriptedCore pattern.
Also fixes a pre-existing smoke breakage on main: SWIFT_SOURCES was
missing Sources/Support/LocalMeetingSummaryPreferences.swift, which
defines LocalMeetingSummaryProvider used by LocalMeetingSummarizer.

Verification map gains a Tools/TranscriptedCaptureKit/** rule
(kit + both consumer suites + e2e smoke) in .agents/test-matrix.yml and
agent-preflight.sh; docs updated to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Core's CLAUDE.md (TranscriptFormatter/TranscriptFrontmatter) and the
dictation day-file notes now cross-reference the kit so a written-format
change updates the standalone tools' parsers in the same change instead
of drifting silently. Kit-side cross-reference already exists in
Tools/TranscriptedCaptureKit/CLAUDE.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
