Skip to content

Wire Compose Mode end-to-end with full test coverage#229

Open
FuJacob wants to merge 4 commits into
compose-interaction-modefrom
compose-mode-functionality
Open

Wire Compose Mode end-to-end with full test coverage#229
FuJacob wants to merge 4 commits into
compose-interaction-modefrom
compose-mode-functionality

Conversation

@FuJacob
Copy link
Copy Markdown
Owner

@FuJacob FuJacob commented May 25, 2026

Summary

Stacks on #121 to make Compose Mode actually functional. The parent PR put the plumbing in place but left the Tab handler stubbed with a "Compose Mode is selected. Draft generation will be enabled after the Compose request pipeline is installed." disabled state. This PR removes that stub and lights up the full RFC flow.

What changed:

  • First Tab in Compose Mode runs runComposeGeneration — bounded AX context walk via the new ComposeContextCollecting protocol, ComposeRequestFactory build, route to llama through SuggestionEngineRouter.generateCompose.
  • Second Tab on a ready draft calls SuggestionInserter.typeDraft, which chunks synthetic Unicode insertion and re-checks focus identity between chunks.
  • Esc / navigation / text mutation while the preview is visible (or while generation is in flight) cancel the session and the work controller.
  • Focus change while the preview is visible drops the session immediately instead of leaving a stale draft alive.
  • Mode flip in either direction resets interaction state through the existing settings-change path; the work controller cancels in-flight generation.
  • Model gate: RuntimeBootstrapModel.prepareForComposeMode auto-switches to tabby-depth-1 if installed and records the previous autocomplete model so restoreAutocompleteModel puts it back. CotabbyAppEnvironment wires the mode-change subscription so RuntimeBootstrapModel does not need to know about settings.
  • UI: Menu bar and Settings show "Compose requires the Open Source engine" and "Compose requires gemma-3n-E4B-it-Q4_K_M.gguf" banners when conditions aren't met. Model picker is disabled while Compose is active so the user cannot fight the auto-switch.

What is intentionally not in this PR:

  • A download CTA for tabby-depth-1. The model is not in RuntimeModelCatalog.downloadableModels yet, so the banner stops at "add it to your models folder" rather than wiring a button against a placeholder URL.
  • Productizing userTags. The snapshot field is still always []; this PR does not add a tag editor.

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build
# ** BUILD SUCCEEDED **

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build-for-testing
# ** TEST BUILD SUCCEEDED **

xcodebuild test-without-building -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS'
# Executed 310 tests, with 0 failures (was 274 on the parent branch; 36 new tests in this PR)

swiftlint lint --quiet
# 3 pre-existing cyclomatic_complexity warnings (Prediction, GhostSuggestionLayout, ComposeContextCollector); no new violations

New tests:

  • ComposePromptRendererTests (9 cases) — prompt shape, optional sections, empty placeholder, final-instruction position.
  • ComposeTextNormalizerTests (9 cases) — prompt echo, chat tokens, markdown fences, leading labels, wrapping quotes, typed prefix echo, paragraph preservation, blank-line collapse.
  • ComposeRequestFactoryTests (7 cases) — clipping, clipboard gating, sampling minimums, prompt-preview parity, user profile carry-through.
  • SuggestionCoordinatorComposeTests (11 cases) — first-Tab triggers generation, second-Tab types draft, Esc/navigation/text mutation cancel session, focus change clears session, mode change clears session, empty draft returns to idle, engine failure surfaces failed state. Includes a full in-file fake suite for every coordinator collaborator.

Manual QA is still outstanding (see Risk section); this PR has not been exercised in a running build against GitHub / Gmail / Slack.

Linked issues

Refs #66, #67, #68, #69, #70 — Compose Mode RFC issues. Stacks on #121.

Risk / rollout notes

  • Stacked on WIP: Compose interaction mode #121. Do not merge this PR until the parent lands; the base ref is compose-interaction-mode, not main.
  • Compose typing is synthetic Unicode, the same primitive autocomplete already uses. InputSuppressionController accumulates suppression counts per chunk so Compose's own keystrokes don't get reinterpreted as user typing. If Compose typing ever feels off in a target app, the most likely culprit is suppression scope, not the inserter's chunking.
  • composeTypingShouldContinue is the single cancellation pivot: if the session is cleared OR the focused element identity changes, the next chunk is skipped. Reviewers should focus on whether the comparison fields (process, element, focusChangeSequence) catch every "user moved on" case in target apps.
  • Test infrastructure is now in place for full coordinator orchestration tests via ComposeContextCollecting and per-test fakes. Future PRs can lean on the same pattern for autocomplete coordinator tests, which currently have no orchestration coverage.
  • pbxproj: 4 new manual test entries.

Greptile Summary

This PR activates Compose Mode end-to-end by replacing the parent branch's stubbed Tab handler with a single-Tab streaming pipeline: Tab to AX context collection to ComposeRequestFactory to SuggestionEngineRouter.generateComposeStreaming to per-token insertion via suggestionInserter.insert. The 36 new tests cover streaming, overlay lifecycle, all cancellation triggers, and failure propagation using a self-contained in-file fake suite.

  • Streaming runtime: LlamaRuntimeCore.summarizeStreaming adds a per-token onToken callback with the same ephemeral-sequence and lifecycle-locking guarantees as summarize; LlamaRuntimeManager.streamUncached bridges it to AsyncThrowingStream with a termination-handler task cancel.
  • Cancellation coverage: Esc, navigation, text mutation, focus change, and mode flip all call cancelComposeWork; the composeStreamShouldContinue pivot in the for-await loop catches mid-stream field changes by comparing all three identity fields.
  • Model/engine gates: The hard per-filename GGUF gate in LlamaSuggestionEngine is removed in favour of an engine-type gate in SuggestionEngineRouter and advisory UI banners.

Confidence Score: 3/5

The Compose streaming loop is solid for the common path, but a field-change race during context collection can silently swallow a Tab in the new field.

The focus-change handler in handleSupportedSnapshot guards on activeComposeSession != nil, but compose work is in-flight before the session is created during composeContextCollector.collect. A focus change in that window falls through to the autocomplete path without calling cancelComposeWork, causing any Tab press in the new field to be silently swallowed while the lingering session is cleaned up.

SuggestionCoordinator+Input.swift (compose work not cancelled during pre-session context collection on field change) and SuggestionCoordinator+Compose.swift (engine compatibility not gated before starting work).

Important Files Changed

Filename Overview
Cotabby/App/Coordinators/SuggestionCoordinator+Compose.swift New file wiring the full Compose streaming loop: Tab triggers context collection, request build, and token-by-token insertion via the llama stream. Cancellation guards are thorough within the for-await loop, but startComposeGeneration lacks an early engine-compatibility check before starting work.
Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift Adds compose-session handling to focus updates, but the guard only checks activeComposeSession != nil. In-flight compose work during context collection (before session starts) is not cancelled on field change, leaving a window where a Tab press in the new field is silently swallowed.
Cotabby/Services/Runtime/LlamaRuntimeCore.swift Adds summarizeStreaming with per-token onToken callback. Lifecycle locking, shutdown guard, and activeOperationCount tracking mirror the existing pattern correctly.
Cotabby/Services/Runtime/LlamaRuntimeManager.swift Adds streamUncached that verifies the prepared runtime, starts a detached sampling task, and bridges pieces to an AsyncThrowingStream. Termination handler cancels the task and the activeOperationCount defer protects shutdown.
Cotabby/Services/Runtime/LlamaSuggestionEngine.swift Removes the selectedModelSupportsCompose filename gate and adds generateComposeStreaming. The engine-type gate is present in SuggestionEngineRouter, but no in-path check remains for the wrong-model-loaded case.
Cotabby/Models/SuggestionSubsystemContracts.swift Adds generateComposeStreaming to SuggestionGenerating with a one-shot fallback default, showComposeProgress to SuggestionOverlayControlling, and the new ComposeContextCollecting protocol and ComposeContextCollectionResult value type.
CotabbyTests/SuggestionCoordinatorComposeTests.swift Comprehensive in-file fake suite covering streaming, overlay lifecycle, cancellation paths, and failure state. FakeSuggestionInserter.typeDraft always returns true without invoking shouldContinue, leaving mid-draft cancellation uncovered.
Cotabby/Services/UI/OverlayController.swift Adds showComposeProgress and ComposeProgressView for the Drafting pill. Sizing and positioning logic mirrors the existing compose preview approach.
Cotabby/Services/Runtime/SuggestionEngineRouter.swift Routes generateComposeStreaming to llama and throws a clear unavailable error for Apple Intelligence and mlxSwift engines.

Sequence Diagram

sequenceDiagram
    participant User
    participant Coordinator as SuggestionCoordinator
    participant ContextCollector as ComposeContextCollector
    participant Router as SuggestionEngineRouter
    participant RuntimeMgr as LlamaRuntimeManager
    participant Core as LlamaRuntimeCore
    participant Inserter as SuggestionInserter

    User->>Coordinator: Tab keypress (Compose mode)
    Coordinator->>Coordinator: startComposeGeneration()
    Coordinator->>Coordinator: "state = .generating"
    Coordinator->>Coordinator: showComposeProgress(Drafting)
    Coordinator->>ContextCollector: collect(for: context)
    ContextCollector-->>Coordinator: ComposeContextCollectionResult
    Coordinator->>Router: generateComposeStreaming(request)
    Router->>RuntimeMgr: streamUncached(prompt, options)
    RuntimeMgr->>Core: summarizeStreaming(prompt, options, onToken)
    Coordinator->>Coordinator: startComposeSession()
    Coordinator->>Coordinator: "state = .typing"
    loop Per token
        Core-->>RuntimeMgr: onToken(piece)
        RuntimeMgr-->>Router: continuation.yield(piece)
        Router-->>Coordinator: for-await piece
        Coordinator->>Coordinator: composeStreamShouldContinue?
        Coordinator->>Coordinator: hideOverlay (first token only)
        Coordinator->>Inserter: insert(piece)
        Coordinator->>Coordinator: updateComposeSession()
        Coordinator->>Coordinator: Task.yield()
    end
    Coordinator->>Coordinator: finishComposeStream()
    Coordinator->>Coordinator: clearComposeSession()
    Coordinator->>Coordinator: "state = .idle"

    alt Esc / navigation / text mutation / focus change
        User->>Coordinator: cancel event
        Coordinator->>Coordinator: cancelComposeWork()
        Coordinator->>Coordinator: workController.cancelAll()
        Coordinator->>Coordinator: clearComposeSession()
        Coordinator->>Coordinator: hideOverlay()
        Coordinator->>Coordinator: "state = .idle"
    end
Loading

Fix All in Codex Fix All in Claude Code

Reviews (4): Last reviewed commit: "Show a "Drafting…" indicator at the care..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

The compose-interaction-mode branch put the plumbing in place but left
the actual Tab handling stubbed with a "pipeline not installed" disabled
state. This makes Compose Mode functional.

Tab routing
- First Tab in Compose Mode kicks off `runComposeGeneration`, which
  walks the AX context, builds a `ComposeRequest`, and routes it through
  the engine (llama only).
- Second Tab on a ready draft calls `SuggestionInserter.typeDraft`,
  which chunks the synthetic Unicode insertion and re-checks focus
  identity between chunks.
- Esc, navigation, and text mutation events cancel any active session
  or in-flight work. `composeTypingShouldContinue` short-circuits the
  inserter when the session is cleared or focus shifts.

Focus + lifecycle
- A focus change while the preview is visible drops the session
  immediately rather than silently keeping a stale draft alive.
- `schedulePrediction` is a no-op in Compose Mode instead of emitting a
  disabled state; the user typing in their field should not surface a
  misleading "predictions disabled" message.
- Mode flips in either direction reset interaction state through the
  existing settings-change path; the work controller cancels in-flight
  generation.

Model gate
- `RuntimeBootstrapModel.prepareForComposeMode` auto-switches to
  `tabby-depth-1` if installed and records the user's previous
  autocomplete model so `restoreAutocompleteModel` puts it back on the
  flip back. Subscribed from `CotabbyAppEnvironment` so the mode setting
  drives the runtime without `RuntimeBootstrapModel` needing to know
  about settings.
- Menu bar + Settings show "Compose requires the Open Source engine"
  and "Compose requires gemma-3n-E4B-it-Q4_K_M.gguf" banners when those
  conditions aren't met; model picker is disabled while Compose is
  active so the user cannot fight the auto-switch.

Testability
- Introduced `ComposeContextCollecting` protocol so the coordinator can
  be exercised against an in-memory fake AX walker.
- Returned type is `ComposeContextCollectionResult` (top-level) instead
  of `ComposeContextCollector.Result` so fakes don't depend on the
  concrete collector.

Tests (36 new, 310 total, 0 failures)
- ComposePromptRendererTests: prompt shape, optional sections, empty
  placeholder, final instruction position.
- ComposeTextNormalizerTests: prompt echo, chat tokens, markdown
  fences, leading labels, wrapping quotes, typed prefix echo,
  paragraph preservation.
- ComposeRequestFactoryTests: clipping, clipboard gating, sampling
  minimums, prompt-preview parity, user profile carry-through.
- SuggestionCoordinatorComposeTests: first-Tab triggers generation,
  second-Tab types draft, Esc/navigation/text mutation cancel session,
  focus change clears session, mode change clears session, empty draft
  returns to idle, engine failure surfaces failed state. Includes full
  in-file fake suite for every coordinator collaborator.
Comment thread Cotabby/Models/RuntimeBootstrapModel.swift Outdated
Comment thread CotabbyTests/SuggestionCoordinatorComposeTests.swift
Comment thread Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift
Comment thread Cotabby/App/Core/CotabbyAppEnvironment.swift Outdated
The RFC reserved Compose Mode for one specific model. In practice the
gate ended up being three layers of plumbing for a constraint we cannot
even download — `tabby-depth-1` (gemma-3n-E4B-it-Q4_K_M.gguf) isn't in
the model catalog yet. Stripping the gate lets any local llama model
draft a Compose response while we figure out which model actually ships.

Removed:
- `LlamaSuggestionEngine.generateCompose` model guard.
- `LlamaRuntimeManager.selectedModelSupportsCompose`.
- `RuntimeModelCatalog.composeRequiredFilename` and `supportsCompose`.
- `RuntimeBootstrapModel.prepareForComposeMode`, `restoreAutocompleteModel`,
  `isComposeRequiredModelInstalled`, and the pre-Compose model memory.
- The mode-change subscription in `CotabbyAppEnvironment` that drove the
  auto-switch.
- The "Compose requires tabby-depth-1" banner in menu bar and Settings.
- The model-picker disable that pinned the user to one filename in
  Compose Mode.

Kept the engine-level gate: Compose still routes to llama only, so
the "Compose requires the Open Source engine" banner stays. Apple
Intelligence and MLX would need their own `generateCompose` work.

310 tests, 0 failures.
Comment thread Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift
Comment on lines 75 to 80
}

func generateCompose(for request: ComposeRequest) async throws -> ComposeResult {
guard runtimeManager.selectedModelSupportsCompose else {
throw SuggestionClientError.unavailable(
"Compose Mode requires tabby-depth-1. Select or download \(RuntimeModelCatalog.composeRequiredFilename)."
)
}

do {
let startTime = Date()
let prompt = ComposePromptRenderer.prompt(for: request)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Hard model-compatibility gate removed with no in-path replacement

The selectedModelSupportsCompose guard that previously threw SuggestionClientError.unavailable when the wrong GGUF was loaded has been removed. SuggestionAvailabilityEvaluator.disabledReason does not check engine or model-filename compatibility, so nothing in the generation call-path gates on this anymore. If tabby-depth-1 is not installed and RuntimeBootstrapModel.prepareForComposeMode cannot complete the auto-switch, generateCompose will silently run the Compose prompt against whatever autocomplete model is active and return garbled text without any error state — worse than the old failed state with a clear message. Since the only UI guard (from the model-requirements banner) is advisory, a hard check in generateCompose (or before dispatching the work in runComposeGeneration) is still needed for the case where the auto-switch is unavailable.

Fix in Codex Fix in Claude Code

FuJacob added 2 commits May 25, 2026 00:58
Old flow: press Tab in Compose Mode, wait several seconds, an overlay
popup eventually shows the whole draft, press Tab again to type it.
There was no visual feedback while the model was generating, so it felt
like nothing was happening.

New flow: press Tab once, the model starts sampling, and each piece is
typed into the focused field as it arrives. Esc / focus change / typing
cancels the stream. Already-typed characters stay where they are.

How it works
- `LlamaRuntimeCore.summarizeStreaming` runs the same ephemeral-sequence
  sampling as `summarize` but invokes an `onToken` callback per piece
  instead of accumulating into a single returned string.
- `LlamaRuntimeManager.streamUncached` wraps that in an
  `AsyncThrowingStream<String, Error>` after the runtime is prepared.
  The stream's `onTermination` cancels the detached sampling task, so
  workController.cancelAll() stops generation cleanly.
- `SuggestionGenerating.generateComposeStreaming` is added to the
  protocol with a one-shot default (call `generateCompose`, emit once).
  `LlamaSuggestionEngine` overrides it for real streaming. The router
  forwards llama-only the same way it does for `generateCompose`.
- `SuggestionCoordinator+Compose` consumes the stream on the main actor
  and calls `suggestionInserter.insert(piece)` per piece. The active
  `ActiveComposeSession` accumulates `fullText` so logs and diagnostics
  describe the full draft.
- Tab during streaming is absorbed without restarting (no double draft).
  Esc, focus change, typing, mode change all run `cancelComposeWork`,
  which cancels the work controller, clears the session, and lets the
  stream terminate.

Coordinator-level guards
- `composeStreamShouldContinue` checks the focused field identity
  before posting each piece. If focus shifts mid-stream, the loop
  breaks and no further synthetic keys reach the wrong app.
- Tabby's own synthetic key events are already absorbed by
  `InputSuppressionController` before they reach `handleInputEvent`,
  so the stream's own characters never trigger the
  textMutation-cancels-compose path.

Removed
- The two-step Tab `acceptComposeDraft` flow.
- `presentComposePreview` overlay path inside the coordinator (the
  overlay controller's `showComposePreview` stays for future preview
  modes).

Tests
- `SuggestionCoordinatorComposeTests` rewritten for streaming:
  - first Tab streams pieces into the field via the inserter
  - second Tab during streaming is absorbed (no duplicate stream)
  - Esc/navigation/text-mutation/focus-change/mode-change cancel
    cleanly and partial pieces stay typed
  - engine throw surfaces a `.failed` state
- The fake engine uses a `.blocked(initialPieces:)` behavior that
  yields a few pieces then parks until the underlying task is
  cancelled, so cancellation paths can be exercised deterministically.

309 tests, 0 failures.
The context walk plus model warm-up can take a beat before the first
streamed token lands. With nothing on screen during that window, Tab
felt like it did nothing. This adds a small "Drafting…" pill at the
caret the instant Tab registers; it disappears as soon as the first
token starts landing in the field.

- New `OverlayState.composeProgress(label:geometry:)` plus
  `SuggestionOverlayControlling.showComposeProgress`.
- `OverlayController` renders a compact `ComposeProgressView` (spinner +
  label) in a capsule, sized as a status chip rather than the larger
  preview box.
- `SuggestionOverlayPresenter.presentComposeProgress` is idempotent so
  re-presenting the same label/geometry does not churn the overlay log.
- Coordinator shows the pill when `startComposeGeneration` flips to
  `.generating`, and hides it on the first non-empty streamed piece.
  Cancellation / completion already hide the overlay.

Tests: indicator appears before the first token and the overlay ends
hidden after the stream completes. 311 tests, 0 failures.
Comment on lines +73 to +80
if let composeSession = interactionState.activeComposeSession {
let processChanged = composeSession.baseContext.processIdentifier != focusedContext.processIdentifier
let elementChanged = composeSession.baseContext.elementIdentifier != focusedContext.elementIdentifier
if processChanged || elementChanged {
cancelComposeWork(reason: "Compose cancelled because the focused field changed.")
}
return
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 In-flight compose work not cancelled when session hasn't started yet

The guard at line 73 only fires when activeComposeSession != nil, but compose work can be in-flight (state = .generating) before the session is started — specifically during composeContextCollector.collect(for:) inside runComposeStreaming. A focus change during that window falls through to the autocomplete path (lines 87–97), which calls cancelPredictionWork() and sets state = .idle but does not call cancelComposeWork. The compose task continues; once context collection finishes it calls startComposeSession, setting activeComposeSession. Any Tab press in the new field between session start and the composeStreamShouldContinue check (which does catch the field mismatch and breaks the loop) is silently consumed because handleComposeInputEvent(.acceptance) sees a non-nil session. Adding || isAnyComposeWorkInFlight to the guard condition (mirroring the check in handleComposeInputEvent) would close this window.

Fix in Codex Fix in Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant