Wire Compose Mode end-to-end with full test coverage by FuJacob · Pull Request #229 · FuJacob/cotabby

FuJacob · 2026-05-25T06:35:49Z

Summary

Stacks on #121 to make Compose Mode actually functional. The parent PR put the plumbing in place but left the Tab handler stubbed with a "Compose Mode is selected. Draft generation will be enabled after the Compose request pipeline is installed." disabled state. This PR removes that stub and lights up the full RFC flow.

What changed:

First Tab in Compose Mode runs runComposeGeneration — bounded AX context walk via the new ComposeContextCollecting protocol, ComposeRequestFactory build, route to llama through SuggestionEngineRouter.generateCompose.
Second Tab on a ready draft calls SuggestionInserter.typeDraft, which chunks synthetic Unicode insertion and re-checks focus identity between chunks.
Esc / navigation / text mutation while the preview is visible (or while generation is in flight) cancel the session and the work controller.
Focus change while the preview is visible drops the session immediately instead of leaving a stale draft alive.
Mode flip in either direction resets interaction state through the existing settings-change path; the work controller cancels in-flight generation.
Model gate: RuntimeBootstrapModel.prepareForComposeMode auto-switches to tabby-depth-1 if installed and records the previous autocomplete model so restoreAutocompleteModel puts it back. CotabbyAppEnvironment wires the mode-change subscription so RuntimeBootstrapModel does not need to know about settings.
UI: Menu bar and Settings show "Compose requires the Open Source engine" and "Compose requires gemma-3n-E4B-it-Q4_K_M.gguf" banners when conditions aren't met. Model picker is disabled while Compose is active so the user cannot fight the auto-switch.

What is intentionally not in this PR:

A download CTA for tabby-depth-1. The model is not in RuntimeModelCatalog.downloadableModels yet, so the banner stops at "add it to your models folder" rather than wiring a button against a placeholder URL.
Productizing userTags. The snapshot field is still always []; this PR does not add a tag editor.

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build
# ** BUILD SUCCEEDED **

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build-for-testing
# ** TEST BUILD SUCCEEDED **

xcodebuild test-without-building -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS'
# Executed 310 tests, with 0 failures (was 274 on the parent branch; 36 new tests in this PR)

swiftlint lint --quiet
# 3 pre-existing cyclomatic_complexity warnings (Prediction, GhostSuggestionLayout, ComposeContextCollector); no new violations

New tests:

ComposePromptRendererTests (9 cases) — prompt shape, optional sections, empty placeholder, final-instruction position.
ComposeTextNormalizerTests (9 cases) — prompt echo, chat tokens, markdown fences, leading labels, wrapping quotes, typed prefix echo, paragraph preservation, blank-line collapse.
ComposeRequestFactoryTests (7 cases) — clipping, clipboard gating, sampling minimums, prompt-preview parity, user profile carry-through.
SuggestionCoordinatorComposeTests (11 cases) — first-Tab triggers generation, second-Tab types draft, Esc/navigation/text mutation cancel session, focus change clears session, mode change clears session, empty draft returns to idle, engine failure surfaces failed state. Includes a full in-file fake suite for every coordinator collaborator.

Manual QA is still outstanding (see Risk section); this PR has not been exercised in a running build against GitHub / Gmail / Slack.

Linked issues

Refs #66, #67, #68, #69, #70 — Compose Mode RFC issues. Stacks on #121.

Risk / rollout notes

Stacked on WIP: Compose interaction mode #121. Do not merge this PR until the parent lands; the base ref is compose-interaction-mode, not main.
Compose typing is synthetic Unicode, the same primitive autocomplete already uses. InputSuppressionController accumulates suppression counts per chunk so Compose's own keystrokes don't get reinterpreted as user typing. If Compose typing ever feels off in a target app, the most likely culprit is suppression scope, not the inserter's chunking.
composeTypingShouldContinue is the single cancellation pivot: if the session is cleared OR the focused element identity changes, the next chunk is skipped. Reviewers should focus on whether the comparison fields (process, element, focusChangeSequence) catch every "user moved on" case in target apps.
Test infrastructure is now in place for full coordinator orchestration tests via ComposeContextCollecting and per-test fakes. Future PRs can lean on the same pattern for autocomplete coordinator tests, which currently have no orchestration coverage.
pbxproj: 4 new manual test entries.

Greptile Summary

This PR activates Compose Mode end-to-end by replacing the parent branch's stubbed Tab handler with a single-Tab streaming pipeline: Tab to AX context collection to ComposeRequestFactory to SuggestionEngineRouter.generateComposeStreaming to per-token insertion via suggestionInserter.insert. The 36 new tests cover streaming, overlay lifecycle, all cancellation triggers, and failure propagation using a self-contained in-file fake suite.

Streaming runtime: LlamaRuntimeCore.summarizeStreaming adds a per-token onToken callback with the same ephemeral-sequence and lifecycle-locking guarantees as summarize; LlamaRuntimeManager.streamUncached bridges it to AsyncThrowingStream with a termination-handler task cancel.
Cancellation coverage: Esc, navigation, text mutation, focus change, and mode flip all call cancelComposeWork; the composeStreamShouldContinue pivot in the for-await loop catches mid-stream field changes by comparing all three identity fields.
Model/engine gates: The hard per-filename GGUF gate in LlamaSuggestionEngine is removed in favour of an engine-type gate in SuggestionEngineRouter and advisory UI banners.

Confidence Score: 3/5

The Compose streaming loop is solid for the common path, but a field-change race during context collection can silently swallow a Tab in the new field.

The focus-change handler in handleSupportedSnapshot guards on activeComposeSession != nil, but compose work is in-flight before the session is created during composeContextCollector.collect. A focus change in that window falls through to the autocomplete path without calling cancelComposeWork, causing any Tab press in the new field to be silently swallowed while the lingering session is cleaned up.

SuggestionCoordinator+Input.swift (compose work not cancelled during pre-session context collection on field change) and SuggestionCoordinator+Compose.swift (engine compatibility not gated before starting work).

Important Files Changed

Filename	Overview
Cotabby/App/Coordinators/SuggestionCoordinator+Compose.swift	New file wiring the full Compose streaming loop: Tab triggers context collection, request build, and token-by-token insertion via the llama stream. Cancellation guards are thorough within the for-await loop, but `startComposeGeneration` lacks an early engine-compatibility check before starting work.
Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift	Adds compose-session handling to focus updates, but the guard only checks `activeComposeSession != nil`. In-flight compose work during context collection (before session starts) is not cancelled on field change, leaving a window where a Tab press in the new field is silently swallowed.
Cotabby/Services/Runtime/LlamaRuntimeCore.swift	Adds `summarizeStreaming` with per-token `onToken` callback. Lifecycle locking, shutdown guard, and `activeOperationCount` tracking mirror the existing pattern correctly.
Cotabby/Services/Runtime/LlamaRuntimeManager.swift	Adds `streamUncached` that verifies the prepared runtime, starts a detached sampling task, and bridges pieces to an `AsyncThrowingStream`. Termination handler cancels the task and the `activeOperationCount` defer protects shutdown.
Cotabby/Services/Runtime/LlamaSuggestionEngine.swift	Removes the `selectedModelSupportsCompose` filename gate and adds `generateComposeStreaming`. The engine-type gate is present in `SuggestionEngineRouter`, but no in-path check remains for the wrong-model-loaded case.
Cotabby/Models/SuggestionSubsystemContracts.swift	Adds `generateComposeStreaming` to `SuggestionGenerating` with a one-shot fallback default, `showComposeProgress` to `SuggestionOverlayControlling`, and the new `ComposeContextCollecting` protocol and `ComposeContextCollectionResult` value type.
CotabbyTests/SuggestionCoordinatorComposeTests.swift	Comprehensive in-file fake suite covering streaming, overlay lifecycle, cancellation paths, and failure state. `FakeSuggestionInserter.typeDraft` always returns `true` without invoking `shouldContinue`, leaving mid-draft cancellation uncovered.
Cotabby/Services/UI/OverlayController.swift	Adds `showComposeProgress` and `ComposeProgressView` for the Drafting pill. Sizing and positioning logic mirrors the existing compose preview approach.
Cotabby/Services/Runtime/SuggestionEngineRouter.swift	Routes `generateComposeStreaming` to llama and throws a clear unavailable error for Apple Intelligence and mlxSwift engines.

Sequence Diagram

sequenceDiagram
    participant User
    participant Coordinator as SuggestionCoordinator
    participant ContextCollector as ComposeContextCollector
    participant Router as SuggestionEngineRouter
    participant RuntimeMgr as LlamaRuntimeManager
    participant Core as LlamaRuntimeCore
    participant Inserter as SuggestionInserter

    User->>Coordinator: Tab keypress (Compose mode)
    Coordinator->>Coordinator: startComposeGeneration()
    Coordinator->>Coordinator: "state = .generating"
    Coordinator->>Coordinator: showComposeProgress(Drafting)
    Coordinator->>ContextCollector: collect(for: context)
    ContextCollector-->>Coordinator: ComposeContextCollectionResult
    Coordinator->>Router: generateComposeStreaming(request)
    Router->>RuntimeMgr: streamUncached(prompt, options)
    RuntimeMgr->>Core: summarizeStreaming(prompt, options, onToken)
    Coordinator->>Coordinator: startComposeSession()
    Coordinator->>Coordinator: "state = .typing"
    loop Per token
        Core-->>RuntimeMgr: onToken(piece)
        RuntimeMgr-->>Router: continuation.yield(piece)
        Router-->>Coordinator: for-await piece
        Coordinator->>Coordinator: composeStreamShouldContinue?
        Coordinator->>Coordinator: hideOverlay (first token only)
        Coordinator->>Inserter: insert(piece)
        Coordinator->>Coordinator: updateComposeSession()
        Coordinator->>Coordinator: Task.yield()
    end
    Coordinator->>Coordinator: finishComposeStream()
    Coordinator->>Coordinator: clearComposeSession()
    Coordinator->>Coordinator: "state = .idle"

    alt Esc / navigation / text mutation / focus change
        User->>Coordinator: cancel event
        Coordinator->>Coordinator: cancelComposeWork()
        Coordinator->>Coordinator: workController.cancelAll()
        Coordinator->>Coordinator: clearComposeSession()
        Coordinator->>Coordinator: hideOverlay()
        Coordinator->>Coordinator: "state = .idle"
    end

_{Reviews (4): Last reviewed commit: "Show a "Drafting…" indicator at the care..." | Re-trigger Greptile}

Greptile also left 1 inline comment on this PR.

The compose-interaction-mode branch put the plumbing in place but left the actual Tab handling stubbed with a "pipeline not installed" disabled state. This makes Compose Mode functional. Tab routing - First Tab in Compose Mode kicks off `runComposeGeneration`, which walks the AX context, builds a `ComposeRequest`, and routes it through the engine (llama only). - Second Tab on a ready draft calls `SuggestionInserter.typeDraft`, which chunks the synthetic Unicode insertion and re-checks focus identity between chunks. - Esc, navigation, and text mutation events cancel any active session or in-flight work. `composeTypingShouldContinue` short-circuits the inserter when the session is cleared or focus shifts. Focus + lifecycle - A focus change while the preview is visible drops the session immediately rather than silently keeping a stale draft alive. - `schedulePrediction` is a no-op in Compose Mode instead of emitting a disabled state; the user typing in their field should not surface a misleading "predictions disabled" message. - Mode flips in either direction reset interaction state through the existing settings-change path; the work controller cancels in-flight generation. Model gate - `RuntimeBootstrapModel.prepareForComposeMode` auto-switches to `tabby-depth-1` if installed and records the user's previous autocomplete model so `restoreAutocompleteModel` puts it back on the flip back. Subscribed from `CotabbyAppEnvironment` so the mode setting drives the runtime without `RuntimeBootstrapModel` needing to know about settings. - Menu bar + Settings show "Compose requires the Open Source engine" and "Compose requires gemma-3n-E4B-it-Q4_K_M.gguf" banners when those conditions aren't met; model picker is disabled while Compose is active so the user cannot fight the auto-switch. Testability - Introduced `ComposeContextCollecting` protocol so the coordinator can be exercised against an in-memory fake AX walker. - Returned type is `ComposeContextCollectionResult` (top-level) instead of `ComposeContextCollector.Result` so fakes don't depend on the concrete collector. Tests (36 new, 310 total, 0 failures) - ComposePromptRendererTests: prompt shape, optional sections, empty placeholder, final instruction position. - ComposeTextNormalizerTests: prompt echo, chat tokens, markdown fences, leading labels, wrapping quotes, typed prefix echo, paragraph preservation. - ComposeRequestFactoryTests: clipping, clipboard gating, sampling minimums, prompt-preview parity, user profile carry-through. - SuggestionCoordinatorComposeTests: first-Tab triggers generation, second-Tab types draft, Esc/navigation/text mutation cancel session, focus change clears session, mode change clears session, empty draft returns to idle, engine failure surfaces failed state. Includes full in-file fake suite for every coordinator collaborator.

The RFC reserved Compose Mode for one specific model. In practice the gate ended up being three layers of plumbing for a constraint we cannot even download — `tabby-depth-1` (gemma-3n-E4B-it-Q4_K_M.gguf) isn't in the model catalog yet. Stripping the gate lets any local llama model draft a Compose response while we figure out which model actually ships. Removed: - `LlamaSuggestionEngine.generateCompose` model guard. - `LlamaRuntimeManager.selectedModelSupportsCompose`. - `RuntimeModelCatalog.composeRequiredFilename` and `supportsCompose`. - `RuntimeBootstrapModel.prepareForComposeMode`, `restoreAutocompleteModel`, `isComposeRequiredModelInstalled`, and the pre-Compose model memory. - The mode-change subscription in `CotabbyAppEnvironment` that drove the auto-switch. - The "Compose requires tabby-depth-1" banner in menu bar and Settings. - The model-picker disable that pinned the user to one filename in Compose Mode. Kept the engine-level gate: Compose still routes to llama only, so the "Compose requires the Open Source engine" banner stays. Apple Intelligence and MLX would need their own `generateCompose` work. 310 tests, 0 failures.

greptile-apps · 2026-05-25T07:30:36Z

    }

    func generateCompose(for request: ComposeRequest) async throws -> ComposeResult {
-        guard runtimeManager.selectedModelSupportsCompose else {
-            throw SuggestionClientError.unavailable(
-                "Compose Mode requires tabby-depth-1. Select or download \(RuntimeModelCatalog.composeRequiredFilename)."
-            )
-        }
-
        do {
            let startTime = Date()
            let prompt = ComposePromptRenderer.prompt(for: request)


Hard model-compatibility gate removed with no in-path replacement

The selectedModelSupportsCompose guard that previously threw SuggestionClientError.unavailable when the wrong GGUF was loaded has been removed. SuggestionAvailabilityEvaluator.disabledReason does not check engine or model-filename compatibility, so nothing in the generation call-path gates on this anymore. If tabby-depth-1 is not installed and RuntimeBootstrapModel.prepareForComposeMode cannot complete the auto-switch, generateCompose will silently run the Compose prompt against whatever autocomplete model is active and return garbled text without any error state — worse than the old failed state with a clear message. Since the only UI guard (from the model-requirements banner) is advisory, a hard check in generateCompose (or before dispatching the work in runComposeGeneration) is still needed for the case where the auto-switch is unavailable.

Old flow: press Tab in Compose Mode, wait several seconds, an overlay popup eventually shows the whole draft, press Tab again to type it. There was no visual feedback while the model was generating, so it felt like nothing was happening. New flow: press Tab once, the model starts sampling, and each piece is typed into the focused field as it arrives. Esc / focus change / typing cancels the stream. Already-typed characters stay where they are. How it works - `LlamaRuntimeCore.summarizeStreaming` runs the same ephemeral-sequence sampling as `summarize` but invokes an `onToken` callback per piece instead of accumulating into a single returned string. - `LlamaRuntimeManager.streamUncached` wraps that in an `AsyncThrowingStream<String, Error>` after the runtime is prepared. The stream's `onTermination` cancels the detached sampling task, so workController.cancelAll() stops generation cleanly. - `SuggestionGenerating.generateComposeStreaming` is added to the protocol with a one-shot default (call `generateCompose`, emit once). `LlamaSuggestionEngine` overrides it for real streaming. The router forwards llama-only the same way it does for `generateCompose`. - `SuggestionCoordinator+Compose` consumes the stream on the main actor and calls `suggestionInserter.insert(piece)` per piece. The active `ActiveComposeSession` accumulates `fullText` so logs and diagnostics describe the full draft. - Tab during streaming is absorbed without restarting (no double draft). Esc, focus change, typing, mode change all run `cancelComposeWork`, which cancels the work controller, clears the session, and lets the stream terminate. Coordinator-level guards - `composeStreamShouldContinue` checks the focused field identity before posting each piece. If focus shifts mid-stream, the loop breaks and no further synthetic keys reach the wrong app. - Tabby's own synthetic key events are already absorbed by `InputSuppressionController` before they reach `handleInputEvent`, so the stream's own characters never trigger the textMutation-cancels-compose path. Removed - The two-step Tab `acceptComposeDraft` flow. - `presentComposePreview` overlay path inside the coordinator (the overlay controller's `showComposePreview` stays for future preview modes). Tests - `SuggestionCoordinatorComposeTests` rewritten for streaming: - first Tab streams pieces into the field via the inserter - second Tab during streaming is absorbed (no duplicate stream) - Esc/navigation/text-mutation/focus-change/mode-change cancel cleanly and partial pieces stay typed - engine throw surfaces a `.failed` state - The fake engine uses a `.blocked(initialPieces:)` behavior that yields a few pieces then parks until the underlying task is cancelled, so cancellation paths can be exercised deterministically. 309 tests, 0 failures.

The context walk plus model warm-up can take a beat before the first streamed token lands. With nothing on screen during that window, Tab felt like it did nothing. This adds a small "Drafting…" pill at the caret the instant Tab registers; it disappears as soon as the first token starts landing in the field. - New `OverlayState.composeProgress(label:geometry:)` plus `SuggestionOverlayControlling.showComposeProgress`. - `OverlayController` renders a compact `ComposeProgressView` (spinner + label) in a capsule, sized as a status chip rather than the larger preview box. - `SuggestionOverlayPresenter.presentComposeProgress` is idempotent so re-presenting the same label/geometry does not churn the overlay log. - Coordinator shows the pill when `startComposeGeneration` flips to `.generating`, and hides it on the first non-empty streamed piece. Cancellation / completion already hide the overlay. Tests: indicator appears before the first token and the overlay ends hidden after the stream completes. 311 tests, 0 failures.

greptile-apps · 2026-05-25T08:40:29Z

+        if let composeSession = interactionState.activeComposeSession {
+            let processChanged = composeSession.baseContext.processIdentifier != focusedContext.processIdentifier
+            let elementChanged = composeSession.baseContext.elementIdentifier != focusedContext.elementIdentifier
+            if processChanged || elementChanged {
+                cancelComposeWork(reason: "Compose cancelled because the focused field changed.")
+            }
+            return
+        }


In-flight compose work not cancelled when session hasn't started yet

The guard at line 73 only fires when activeComposeSession != nil, but compose work can be in-flight (state = .generating) before the session is started — specifically during composeContextCollector.collect(for:) inside runComposeStreaming. A focus change during that window falls through to the autocomplete path (lines 87–97), which calls cancelPredictionWork() and sets state = .idle but does not call cancelComposeWork. The compose task continues; once context collection finishes it calls startComposeSession, setting activeComposeSession. Any Tab press in the new field between session start and the composeStreamShouldContinue check (which does catch the field mismatch and breaks the loop) is silently consumed because handleComposeInputEvent(.acceptance) sees a non-nil session. Adding || isAnyComposeWorkInFlight to the guard condition (mirroring the check in handleComposeInputEvent) would close this window.

greptile-apps Bot reviewed May 25, 2026

View reviewed changes

Comment thread Cotabby/Models/RuntimeBootstrapModel.swift Outdated

Comment thread CotabbyTests/SuggestionCoordinatorComposeTests.swift

Comment thread Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift

Comment thread Cotabby/App/Core/CotabbyAppEnvironment.swift Outdated

greptile-apps Bot reviewed May 25, 2026

View reviewed changes

FuJacob added 2 commits May 25, 2026 00:58

greptile-apps Bot reviewed May 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Wire Compose Mode end-to-end with full test coverage#229

Wire Compose Mode end-to-end with full test coverage#229
FuJacob wants to merge 4 commits into
compose-interaction-modefrom
compose-mode-functionality

FuJacob commented May 25, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot May 25, 2026

Uh oh!

greptile-apps Bot May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

FuJacob commented May 25, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Linked issues

Risk / rollout notes

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

FuJacob commented May 25, 2026 •

edited by greptile-apps Bot

Loading