Wire Compose Mode end-to-end with full test coverage#229
Conversation
The compose-interaction-mode branch put the plumbing in place but left the actual Tab handling stubbed with a "pipeline not installed" disabled state. This makes Compose Mode functional. Tab routing - First Tab in Compose Mode kicks off `runComposeGeneration`, which walks the AX context, builds a `ComposeRequest`, and routes it through the engine (llama only). - Second Tab on a ready draft calls `SuggestionInserter.typeDraft`, which chunks the synthetic Unicode insertion and re-checks focus identity between chunks. - Esc, navigation, and text mutation events cancel any active session or in-flight work. `composeTypingShouldContinue` short-circuits the inserter when the session is cleared or focus shifts. Focus + lifecycle - A focus change while the preview is visible drops the session immediately rather than silently keeping a stale draft alive. - `schedulePrediction` is a no-op in Compose Mode instead of emitting a disabled state; the user typing in their field should not surface a misleading "predictions disabled" message. - Mode flips in either direction reset interaction state through the existing settings-change path; the work controller cancels in-flight generation. Model gate - `RuntimeBootstrapModel.prepareForComposeMode` auto-switches to `tabby-depth-1` if installed and records the user's previous autocomplete model so `restoreAutocompleteModel` puts it back on the flip back. Subscribed from `CotabbyAppEnvironment` so the mode setting drives the runtime without `RuntimeBootstrapModel` needing to know about settings. - Menu bar + Settings show "Compose requires the Open Source engine" and "Compose requires gemma-3n-E4B-it-Q4_K_M.gguf" banners when those conditions aren't met; model picker is disabled while Compose is active so the user cannot fight the auto-switch. Testability - Introduced `ComposeContextCollecting` protocol so the coordinator can be exercised against an in-memory fake AX walker. - Returned type is `ComposeContextCollectionResult` (top-level) instead of `ComposeContextCollector.Result` so fakes don't depend on the concrete collector. Tests (36 new, 310 total, 0 failures) - ComposePromptRendererTests: prompt shape, optional sections, empty placeholder, final instruction position. - ComposeTextNormalizerTests: prompt echo, chat tokens, markdown fences, leading labels, wrapping quotes, typed prefix echo, paragraph preservation. - ComposeRequestFactoryTests: clipping, clipboard gating, sampling minimums, prompt-preview parity, user profile carry-through. - SuggestionCoordinatorComposeTests: first-Tab triggers generation, second-Tab types draft, Esc/navigation/text mutation cancel session, focus change clears session, mode change clears session, empty draft returns to idle, engine failure surfaces failed state. Includes full in-file fake suite for every coordinator collaborator.
The RFC reserved Compose Mode for one specific model. In practice the gate ended up being three layers of plumbing for a constraint we cannot even download — `tabby-depth-1` (gemma-3n-E4B-it-Q4_K_M.gguf) isn't in the model catalog yet. Stripping the gate lets any local llama model draft a Compose response while we figure out which model actually ships. Removed: - `LlamaSuggestionEngine.generateCompose` model guard. - `LlamaRuntimeManager.selectedModelSupportsCompose`. - `RuntimeModelCatalog.composeRequiredFilename` and `supportsCompose`. - `RuntimeBootstrapModel.prepareForComposeMode`, `restoreAutocompleteModel`, `isComposeRequiredModelInstalled`, and the pre-Compose model memory. - The mode-change subscription in `CotabbyAppEnvironment` that drove the auto-switch. - The "Compose requires tabby-depth-1" banner in menu bar and Settings. - The model-picker disable that pinned the user to one filename in Compose Mode. Kept the engine-level gate: Compose still routes to llama only, so the "Compose requires the Open Source engine" banner stays. Apple Intelligence and MLX would need their own `generateCompose` work. 310 tests, 0 failures.
| } | ||
|
|
||
| func generateCompose(for request: ComposeRequest) async throws -> ComposeResult { | ||
| guard runtimeManager.selectedModelSupportsCompose else { | ||
| throw SuggestionClientError.unavailable( | ||
| "Compose Mode requires tabby-depth-1. Select or download \(RuntimeModelCatalog.composeRequiredFilename)." | ||
| ) | ||
| } | ||
|
|
||
| do { | ||
| let startTime = Date() | ||
| let prompt = ComposePromptRenderer.prompt(for: request) |
There was a problem hiding this comment.
Hard model-compatibility gate removed with no in-path replacement
The selectedModelSupportsCompose guard that previously threw SuggestionClientError.unavailable when the wrong GGUF was loaded has been removed. SuggestionAvailabilityEvaluator.disabledReason does not check engine or model-filename compatibility, so nothing in the generation call-path gates on this anymore. If tabby-depth-1 is not installed and RuntimeBootstrapModel.prepareForComposeMode cannot complete the auto-switch, generateCompose will silently run the Compose prompt against whatever autocomplete model is active and return garbled text without any error state — worse than the old failed state with a clear message. Since the only UI guard (from the model-requirements banner) is advisory, a hard check in generateCompose (or before dispatching the work in runComposeGeneration) is still needed for the case where the auto-switch is unavailable.
Old flow: press Tab in Compose Mode, wait several seconds, an overlay
popup eventually shows the whole draft, press Tab again to type it.
There was no visual feedback while the model was generating, so it felt
like nothing was happening.
New flow: press Tab once, the model starts sampling, and each piece is
typed into the focused field as it arrives. Esc / focus change / typing
cancels the stream. Already-typed characters stay where they are.
How it works
- `LlamaRuntimeCore.summarizeStreaming` runs the same ephemeral-sequence
sampling as `summarize` but invokes an `onToken` callback per piece
instead of accumulating into a single returned string.
- `LlamaRuntimeManager.streamUncached` wraps that in an
`AsyncThrowingStream<String, Error>` after the runtime is prepared.
The stream's `onTermination` cancels the detached sampling task, so
workController.cancelAll() stops generation cleanly.
- `SuggestionGenerating.generateComposeStreaming` is added to the
protocol with a one-shot default (call `generateCompose`, emit once).
`LlamaSuggestionEngine` overrides it for real streaming. The router
forwards llama-only the same way it does for `generateCompose`.
- `SuggestionCoordinator+Compose` consumes the stream on the main actor
and calls `suggestionInserter.insert(piece)` per piece. The active
`ActiveComposeSession` accumulates `fullText` so logs and diagnostics
describe the full draft.
- Tab during streaming is absorbed without restarting (no double draft).
Esc, focus change, typing, mode change all run `cancelComposeWork`,
which cancels the work controller, clears the session, and lets the
stream terminate.
Coordinator-level guards
- `composeStreamShouldContinue` checks the focused field identity
before posting each piece. If focus shifts mid-stream, the loop
breaks and no further synthetic keys reach the wrong app.
- Tabby's own synthetic key events are already absorbed by
`InputSuppressionController` before they reach `handleInputEvent`,
so the stream's own characters never trigger the
textMutation-cancels-compose path.
Removed
- The two-step Tab `acceptComposeDraft` flow.
- `presentComposePreview` overlay path inside the coordinator (the
overlay controller's `showComposePreview` stays for future preview
modes).
Tests
- `SuggestionCoordinatorComposeTests` rewritten for streaming:
- first Tab streams pieces into the field via the inserter
- second Tab during streaming is absorbed (no duplicate stream)
- Esc/navigation/text-mutation/focus-change/mode-change cancel
cleanly and partial pieces stay typed
- engine throw surfaces a `.failed` state
- The fake engine uses a `.blocked(initialPieces:)` behavior that
yields a few pieces then parks until the underlying task is
cancelled, so cancellation paths can be exercised deterministically.
309 tests, 0 failures.
The context walk plus model warm-up can take a beat before the first streamed token lands. With nothing on screen during that window, Tab felt like it did nothing. This adds a small "Drafting…" pill at the caret the instant Tab registers; it disappears as soon as the first token starts landing in the field. - New `OverlayState.composeProgress(label:geometry:)` plus `SuggestionOverlayControlling.showComposeProgress`. - `OverlayController` renders a compact `ComposeProgressView` (spinner + label) in a capsule, sized as a status chip rather than the larger preview box. - `SuggestionOverlayPresenter.presentComposeProgress` is idempotent so re-presenting the same label/geometry does not churn the overlay log. - Coordinator shows the pill when `startComposeGeneration` flips to `.generating`, and hides it on the first non-empty streamed piece. Cancellation / completion already hide the overlay. Tests: indicator appears before the first token and the overlay ends hidden after the stream completes. 311 tests, 0 failures.
| if let composeSession = interactionState.activeComposeSession { | ||
| let processChanged = composeSession.baseContext.processIdentifier != focusedContext.processIdentifier | ||
| let elementChanged = composeSession.baseContext.elementIdentifier != focusedContext.elementIdentifier | ||
| if processChanged || elementChanged { | ||
| cancelComposeWork(reason: "Compose cancelled because the focused field changed.") | ||
| } | ||
| return | ||
| } |
There was a problem hiding this comment.
In-flight compose work not cancelled when session hasn't started yet
The guard at line 73 only fires when activeComposeSession != nil, but compose work can be in-flight (state = .generating) before the session is started — specifically during composeContextCollector.collect(for:) inside runComposeStreaming. A focus change during that window falls through to the autocomplete path (lines 87–97), which calls cancelPredictionWork() and sets state = .idle but does not call cancelComposeWork. The compose task continues; once context collection finishes it calls startComposeSession, setting activeComposeSession. Any Tab press in the new field between session start and the composeStreamShouldContinue check (which does catch the field mismatch and breaks the loop) is silently consumed because handleComposeInputEvent(.acceptance) sees a non-nil session. Adding || isAnyComposeWorkInFlight to the guard condition (mirroring the check in handleComposeInputEvent) would close this window.
Summary
Stacks on #121 to make Compose Mode actually functional. The parent PR put the plumbing in place but left the Tab handler stubbed with a "Compose Mode is selected. Draft generation will be enabled after the Compose request pipeline is installed." disabled state. This PR removes that stub and lights up the full RFC flow.
What changed:
runComposeGeneration— bounded AX context walk via the newComposeContextCollectingprotocol,ComposeRequestFactorybuild, route to llama throughSuggestionEngineRouter.generateCompose.SuggestionInserter.typeDraft, which chunks synthetic Unicode insertion and re-checks focus identity between chunks.RuntimeBootstrapModel.prepareForComposeModeauto-switches totabby-depth-1if installed and records the previous autocomplete model sorestoreAutocompleteModelputs it back.CotabbyAppEnvironmentwires the mode-change subscription soRuntimeBootstrapModeldoes not need to know about settings.What is intentionally not in this PR:
tabby-depth-1. The model is not inRuntimeModelCatalog.downloadableModelsyet, so the banner stops at "add it to your models folder" rather than wiring a button against a placeholder URL.userTags. The snapshot field is still always[]; this PR does not add a tag editor.Validation
New tests:
ComposePromptRendererTests(9 cases) — prompt shape, optional sections, empty placeholder, final-instruction position.ComposeTextNormalizerTests(9 cases) — prompt echo, chat tokens, markdown fences, leading labels, wrapping quotes, typed prefix echo, paragraph preservation, blank-line collapse.ComposeRequestFactoryTests(7 cases) — clipping, clipboard gating, sampling minimums, prompt-preview parity, user profile carry-through.SuggestionCoordinatorComposeTests(11 cases) — first-Tab triggers generation, second-Tab types draft, Esc/navigation/text mutation cancel session, focus change clears session, mode change clears session, empty draft returns to idle, engine failure surfaces failed state. Includes a full in-file fake suite for every coordinator collaborator.Manual QA is still outstanding (see Risk section); this PR has not been exercised in a running build against GitHub / Gmail / Slack.
Linked issues
Refs #66, #67, #68, #69, #70 — Compose Mode RFC issues. Stacks on #121.
Risk / rollout notes
compose-interaction-mode, notmain.InputSuppressionControlleraccumulates suppression counts per chunk so Compose's own keystrokes don't get reinterpreted as user typing. If Compose typing ever feels off in a target app, the most likely culprit is suppression scope, not the inserter's chunking.composeTypingShouldContinueis the single cancellation pivot: if the session is cleared OR the focused element identity changes, the next chunk is skipped. Reviewers should focus on whether the comparison fields (process, element, focusChangeSequence) catch every "user moved on" case in target apps.ComposeContextCollectingand per-test fakes. Future PRs can lean on the same pattern for autocomplete coordinator tests, which currently have no orchestration coverage.Greptile Summary
This PR activates Compose Mode end-to-end by replacing the parent branch's stubbed Tab handler with a single-Tab streaming pipeline: Tab to AX context collection to
ComposeRequestFactorytoSuggestionEngineRouter.generateComposeStreamingto per-token insertion viasuggestionInserter.insert. The 36 new tests cover streaming, overlay lifecycle, all cancellation triggers, and failure propagation using a self-contained in-file fake suite.LlamaRuntimeCore.summarizeStreamingadds a per-tokenonTokencallback with the same ephemeral-sequence and lifecycle-locking guarantees assummarize;LlamaRuntimeManager.streamUncachedbridges it toAsyncThrowingStreamwith a termination-handler task cancel.cancelComposeWork; thecomposeStreamShouldContinuepivot in the for-await loop catches mid-stream field changes by comparing all three identity fields.LlamaSuggestionEngineis removed in favour of an engine-type gate inSuggestionEngineRouterand advisory UI banners.Confidence Score: 3/5
The Compose streaming loop is solid for the common path, but a field-change race during context collection can silently swallow a Tab in the new field.
The focus-change handler in
handleSupportedSnapshotguards onactiveComposeSession != nil, but compose work is in-flight before the session is created duringcomposeContextCollector.collect. A focus change in that window falls through to the autocomplete path without callingcancelComposeWork, causing any Tab press in the new field to be silently swallowed while the lingering session is cleaned up.SuggestionCoordinator+Input.swift(compose work not cancelled during pre-session context collection on field change) andSuggestionCoordinator+Compose.swift(engine compatibility not gated before starting work).Important Files Changed
startComposeGenerationlacks an early engine-compatibility check before starting work.activeComposeSession != nil. In-flight compose work during context collection (before session starts) is not cancelled on field change, leaving a window where a Tab press in the new field is silently swallowed.summarizeStreamingwith per-tokenonTokencallback. Lifecycle locking, shutdown guard, andactiveOperationCounttracking mirror the existing pattern correctly.streamUncachedthat verifies the prepared runtime, starts a detached sampling task, and bridges pieces to anAsyncThrowingStream. Termination handler cancels the task and theactiveOperationCountdefer protects shutdown.selectedModelSupportsComposefilename gate and addsgenerateComposeStreaming. The engine-type gate is present inSuggestionEngineRouter, but no in-path check remains for the wrong-model-loaded case.generateComposeStreamingtoSuggestionGeneratingwith a one-shot fallback default,showComposeProgresstoSuggestionOverlayControlling, and the newComposeContextCollectingprotocol andComposeContextCollectionResultvalue type.FakeSuggestionInserter.typeDraftalways returnstruewithout invokingshouldContinue, leaving mid-draft cancellation uncovered.showComposeProgressandComposeProgressViewfor the Drafting pill. Sizing and positioning logic mirrors the existing compose preview approach.generateComposeStreamingto llama and throws a clear unavailable error for Apple Intelligence and mlxSwift engines.Sequence Diagram
sequenceDiagram participant User participant Coordinator as SuggestionCoordinator participant ContextCollector as ComposeContextCollector participant Router as SuggestionEngineRouter participant RuntimeMgr as LlamaRuntimeManager participant Core as LlamaRuntimeCore participant Inserter as SuggestionInserter User->>Coordinator: Tab keypress (Compose mode) Coordinator->>Coordinator: startComposeGeneration() Coordinator->>Coordinator: "state = .generating" Coordinator->>Coordinator: showComposeProgress(Drafting) Coordinator->>ContextCollector: collect(for: context) ContextCollector-->>Coordinator: ComposeContextCollectionResult Coordinator->>Router: generateComposeStreaming(request) Router->>RuntimeMgr: streamUncached(prompt, options) RuntimeMgr->>Core: summarizeStreaming(prompt, options, onToken) Coordinator->>Coordinator: startComposeSession() Coordinator->>Coordinator: "state = .typing" loop Per token Core-->>RuntimeMgr: onToken(piece) RuntimeMgr-->>Router: continuation.yield(piece) Router-->>Coordinator: for-await piece Coordinator->>Coordinator: composeStreamShouldContinue? Coordinator->>Coordinator: hideOverlay (first token only) Coordinator->>Inserter: insert(piece) Coordinator->>Coordinator: updateComposeSession() Coordinator->>Coordinator: Task.yield() end Coordinator->>Coordinator: finishComposeStream() Coordinator->>Coordinator: clearComposeSession() Coordinator->>Coordinator: "state = .idle" alt Esc / navigation / text mutation / focus change User->>Coordinator: cancel event Coordinator->>Coordinator: cancelComposeWork() Coordinator->>Coordinator: workController.cancelAll() Coordinator->>Coordinator: clearComposeSession() Coordinator->>Coordinator: hideOverlay() Coordinator->>Coordinator: "state = .idle" endReviews (4): Last reviewed commit: "Show a "Drafting…" indicator at the care..." | Re-trigger Greptile