perf(startup): continue reducing desktop startup and interactive shell readiness time #949

@limityan


Global Optimization Scoreboard

Last updated: 2026-05-30

Performance PR acceptance rule:

  • Every performance PR must include repeated A/B timing data for the target business scenario and any materially affected user-facing scenario.
  • If work is deferred from startup to a later interaction, that later interaction must be measured. Bundle size alone is supporting evidence, not an accepted user-perceived performance result.
  • If one scenario improves while another regresses, the regression must be listed with severity. Material UX regressions require product confirmation before the PR is considered an effective optimization.
  • PR descriptions must list measured wins, measured regressions, test method, risk/functional impact, and unresolved data gaps.

| PR | Status | Target scenario | Key result data | Known regression / risk | Decision state |
| --- | --- | --- | --- | --- | --- |
| #934 | Merged | Release-fast cold startup / first visible shell | main_window_shown p50 1504.7 ms -> 1101.8 ms, -402.9 ms (-26.8%); start_application_end p50 1455.7 ms -> 1035.6 ms, -420.1 ms (-28.9%) | interactive_shell_ready p50 1538.9 ms -> 1628.4 ms, +89.4 ms (+5.8%) | Merged, but interactive readiness remains unresolved |
| #958 | Merged | First open of larger historical sessions | Backend first restore avg 69.8 ms -> 5.1 ms, about -92.7% before first render | Deferred full hydrate still costs 35.1-99.5 ms after latest-content rendering | Merged, latest-content-first tradeoff accepted |
| #971 | Open draft / mitigation measured | Startup by deferring editor and root AI startup paths; first editor-heavy interactions protected by post-splash idle warmup | Current head p50: first_script_eval -577.9 ms (-49.0%), main_window_shown -598.2 ms (-45.7%), interactive_shell_ready -660.6 ms (-38.4%); first code editor -176.7 ms (-46.0%) without explicit warmup wait; first Git diff -127.4 ms (-32.0%) with explicit warmup wait | Latest-main strict A/B not rerun after rebase; editor warmup consumes about 230 ms idle time; users clicking before warmup finishes can still pay partial cold cost | Draft; candidate positive, but keep open for latest-main A/B decision and continued interactive_shell_ready work |
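
The acceptance rule above asks every PR to list measured wins and measured regressions per scenario. A minimal sketch of that bookkeeping follows, assuming a hypothetical `ScenarioResult` record and a +/-2% neutral band; neither is defined by this issue, and the issue's "material UX threshold" is not specified here. Deltas are computed from the rounded p50 values in the table, so they can differ by 0.1 ms from the reported figures.

```typescript
// Hypothetical record for one measured scenario; field names are illustrative.
interface ScenarioResult {
  scenario: string;
  baselineP50Ms: number;
  candidateP50Ms: number;
}

type Status = "win" | "neutral" | "regression";

interface ScenarioVerdict {
  scenario: string;
  deltaMs: number;
  deltaPct: number;
  status: Status;
}

// Assumption: deltas inside +/-2% count as neutral. The issue does not define this band.
const NEUTRAL_BAND_PCT = 2;

function round1(value: number): number {
  return Math.round(value * 10) / 10;
}

function classify(result: ScenarioResult, bandPct: number = NEUTRAL_BAND_PCT): ScenarioVerdict {
  const deltaMs = result.candidateP50Ms - result.baselineP50Ms;
  const deltaPct = (deltaMs / result.baselineP50Ms) * 100;
  let status: Status = "neutral";
  if (deltaPct <= -bandPct) status = "win";
  else if (deltaPct >= bandPct) status = "regression";
  return { scenario: result.scenario, deltaMs: round1(deltaMs), deltaPct: round1(deltaPct), status };
}

// Regressions must be listed explicitly, even when other scenarios improve.
function listRegressions(results: readonly ScenarioResult[]): ScenarioVerdict[] {
  return results.map((r) => classify(r)).filter((v) => v.status === "regression");
}

// p50 values for #934 taken from the scoreboard row above.
const stage1Regressions = listRegressions([
  { scenario: "main_window_shown", baselineP50Ms: 1504.7, candidateP50Ms: 1101.8 },
  { scenario: "interactive_shell_ready", baselineP50Ms: 1538.9, candidateP50Ms: 1628.4 },
]);
console.log(stage1Regressions.map((v) => v.scenario)); // [ 'interactive_shell_ready' ]
```

With the #934 numbers, only interactive_shell_ready is flagged, which matches the "Known regression / risk" column.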

Follow-Up Metrics From PR #971

Related PR: #971

Benchmark method: Windows desktop release-fast. Current PR head is 20d6a235, rebased on gcwing/main@18aa2df4. The comparison baseline remains #949's continuity baseline gcwing/main@978e079f; a strict A/B against latest main was not rerun after this rebase. Startup and long-session data come from tests/e2e/specs/performance/startup-session-perf.spec.ts; editor first-open data come from tests/e2e/specs/performance/editor-first-open-perf.spec.ts. Editor timing runs from the app event trigger to Monaco DOM ready, which isolates the deferred editor/rendering cost.

PR #971 Business Scenario Coverage

| Business scenario | Baseline p50 | PR #971 p50 | Delta | Decision impact |
| --- | --- | --- | --- | --- |
| Cold startup first_script_eval | 1179.0 ms | 601.1 ms | -577.9 ms (-49.0%) | Positive startup result. |
| Cold startup main_window_shown | 1309.5 ms | 711.3 ms | -598.2 ms (-45.7%) | Positive first-shell result. |
| Cold startup interactive_shell_ready | 1720.4 ms | 1059.8 ms | -660.6 ms (-38.4%) | Positive readiness result in current head data; issue remains open for deeper interactive_shell_ready optimization. |
| Non-critical initialization completion | 1667.2 ms | 1315.9 ms | -351.3 ms (-21.1%) | Positive in this run despite deferred scheduling. |
| First open of generated long session, backend restore | 45.6 ms | 41.5 ms | -4.1 ms (-9.0%) | Neutral to positive. |
| First open of generated long session, latest frame | 45.7 ms | 46.5 ms | +0.8 ms (+1.8%) | Small regression, below material UX threshold in this run. |
| First open of generated long session, initial hydrate | 105.9 ms | 102.9 ms | -3.0 ms (-2.8%) | Neutral to positive. |
| First open of generated long session, full hydrate | 51.5 ms | 52.7 ms | +1.2 ms (+2.3%) | Small regression, track only. |
| First code editor open, no explicit warmup wait | 384.1 ms | 207.4 ms | -176.7 ms (-46.0%) | Previous PR regression mitigated. |
| First code editor open, explicit warmup wait | 384.1 ms | 203.4 ms | -180.7 ms (-47.0%) | Positive first-use result after idle warmup. |
| First Git diff editor open, no explicit warmup wait | 398.6 ms | 283.7 ms | -114.9 ms (-28.8%) | Previous PR regression mitigated. |
| First Git diff editor open, explicit warmup wait | 398.6 ms | 271.2 ms | -127.4 ms (-32.0%) | Positive first-use result after idle warmup. |

PR #971 Notes

  • The earlier first-use editor regression is now mitigated in current release-fast measurements: code-editor p50 improved from the previous PR measurement of 503.7 ms to 207.4 ms without explicit wait, and Git-diff p50 improved from 526.9 ms to 283.7 ms without explicit wait.
  • Background editor warmup duration p50 is about 226.7-237.0 ms across the measured editor scenarios. It is scheduled only after splash exit, uses low-priority idle scheduling, and yields between editor-surface preload stages, Monaco init, and theme sync.
  • The current no-wait editor measurements all had editor_startup_warmup_end before the trigger. That means the measured common path is good, but a user who clicks an editor immediately after splash exit can still see partial cold cost.
  • Startup splash behavior changed from immediate loading text to logo-only first. The "Loading workspace..." text appears below the logo only after 1800 ms if workspace loading is still active, so a normal startup does not show a brief text flash.
  • Current state: keep PR #971 (perf(startup): defer editor and AI initialization) in draft until the team decides whether the existing continuity baseline is enough or whether a fresh latest-main strict A/B run is required before marking it ready.
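
The warmup described in these notes (scheduled after splash exit, low priority, yielding between stages) can be sketched roughly as follows. This is an illustration only, not the PR's implementation: `runWarmup`, `WarmupStage`, and the stage names are hypothetical, and the scheduler falls back to `setTimeout` where `requestIdleCallback` is unavailable.

```typescript
type Task = () => void;
type Schedule = (task: Task) => void;

// Hypothetical stage shape; the notes name editor-surface preload, Monaco init, and theme sync.
interface WarmupStage {
  name: string;
  run: Task;
}

// Runs one stage per scheduled callback so the main thread is released between stages.
function runWarmup(
  stages: readonly WarmupStage[],
  schedule: Schedule,
  onDone: (completed: string[]) => void,
): void {
  const completed: string[] = [];
  const step = (index: number): void => {
    const stage = stages[index];
    if (stage === undefined) {
      onDone(completed); // no stage left: warmup finished
      return;
    }
    schedule(() => {
      stage.run();
      completed.push(stage.name);
      step(index + 1); // yield again before the next stage
    });
  };
  step(0);
}

// Idle scheduler: uses requestIdleCallback when the runtime provides it, else a zero-delay timer.
const idleSchedule: Schedule = (task) => {
  const g = globalThis as { requestIdleCallback?: (cb: () => void) => unknown };
  if (typeof g.requestIdleCallback === "function") {
    g.requestIdleCallback(() => task());
  } else {
    setTimeout(task, 0);
  }
};

// Usage sketch: call this after splash exit. Stage bodies are placeholders.
runWarmup(
  [
    { name: "editor-surface-preload", run: () => {} },
    { name: "monaco-init", run: () => {} },
    { name: "theme-sync", run: () => {} },
  ],
  idleSchedule,
  (completed) => console.log("warmup finished:", completed.join(", ")),
);
```

Because each stage waits for its own callback, a click that lands between stages is handled before the next stage starts, but it still pays for whatever stages have not run yet. That is the partial cold cost noted above.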

Required Metrics For Next Startup Stage

| Scenario | Required comparison | Acceptance note |
| --- | --- | --- |
| Cold startup to first visible shell | Latest main vs PR, repeated Windows release-fast cold launches | Must improve or be neutral with data. |
| Cold startup to interactive_shell_ready | Latest main vs PR, repeated Windows release-fast cold launches | Regressions need explicit confirmation. |
| First open of long historical session | Latest main vs PR, long-session fixture and real local sample if available | Latest visible turns must render first; total hydrate cost must be reported. |
| First code editor open | Latest main vs PR, first-use timing after fresh startup | Required for any PR that defers Monaco/editor work. |
| First Git diff editor open | Latest main vs PR, first-use timing after fresh startup | Required for any PR that lazy-loads diff editor paths. |

Feature Goal

Reduce overall desktop cold-start time and first-screen wait time without hiding existing navigation content or sacrificing feature behavior. Performance work should improve the time to a usable shell, not only move work out of one metric.

Original Problem

Cold startup currently pays for too much work before or around the first visible shell:

  • Startup eagerly loads systems that are not always needed before first interaction, including IDE / MCP / ACP startup probes and MiniApp catalog refresh.
  • Session navigation previously loaded far more session metadata than was immediately visible or clickable.
  • The startup path showed a black-background logo and then a white-background logo before the main UI, causing a visible flash.
  • Earlier measurements showed that improving isolated marks is not enough: interactive_shell_ready still needs direct follow-up work.

Related Stage 1 PR

Related PR: #934

This issue should remain open after #934 is merged. PR #934 is the stage-1 first-screen optimization and does not complete the broader startup performance goal.

Stage 1 Performance Metrics From PR #934

Benchmark method: Windows desktop release-fast cold launches, 10 runs per build, p50 comparison using the same packaged WebView path.
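
For reference, a p50 over repeated launches can be computed as below. This is a generic sketch using linear interpolation between the two middle samples; the issue does not say which percentile method the benchmark harness uses, and the sample timings here are made up for illustration.

```typescript
// Percentile with linear interpolation between closest ranks.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile needs at least one sample");
  if (p < 0 || p > 100) throw new RangeError("p must be within 0..100");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Ten illustrative (not measured) launch timings in milliseconds.
const illustrativeRunsMs = [120, 100, 110, 130, 105, 115, 125, 135, 140, 95];
console.log(percentile(illustrativeRunsMs, 50)); // 117.5
```

With an even run count such as 10, the p50 is the mean of the 5th and 6th sorted samples, so one slow outlier launch does not move the reported figure.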

| P50 metric | Baseline | Stage 1 PR | Delta |
| --- | --- | --- | --- |
| main_window_shown | 1504.7 ms | 1101.8 ms | -402.9 ms (-26.8%) |
| start_application_end | 1455.7 ms | 1035.6 ms | -420.1 ms (-28.9%) |
| startApplicationDurationMs | 614.7 ms | 27.7 ms | -586.9 ms (-95.5%) |
| afterRenderDurationMs | 584.1 ms | 461.4 ms | -122.7 ms (-21.0%) |
| Primary startup API count | 15.5 | 7.0 | -8.5 (-54.8%) |
| Primary startup response bytes | 12,891 | 7,699 | -5,192 (-40.3%) |
| Session metadata loaded | 187 | 27 | -160 (-85.6%) |
| Max session metadata duration | 175.1 ms | 53.3 ms | -121.8 ms (-69.6%) |
| interactive_shell_ready | 1538.9 ms | 1628.4 ms | +89.4 ms (+5.8%) |

Additional PR #934 backend microbenchmark: paged session metadata for 1000 sessions reports about page5_avg_ms=11.550 versus full_list_avg_ms=88.142, about 7.6x faster for the paged backend path.
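
The paged path wins because startup only materializes the rows it is about to show. A minimal in-memory sketch of offset paging follows; `listSessionsPage`, `SessionMeta`, and `SessionPage` are hypothetical names, not the signature of the real backend command, and the input is assumed to be already sorted newest first.

```typescript
// Hypothetical metadata row; the real session metadata has more fields.
interface SessionMeta {
  id: string;
  lastActiveAt: number;
}

interface SessionPage {
  items: SessionMeta[];
  nextOffset: number | null; // null when there is no further page
  total: number;
}

// Returns one page from a list that is assumed to be sorted newest first.
function listSessionsPage(sorted: readonly SessionMeta[], offset: number, limit: number): SessionPage {
  if (offset < 0 || limit <= 0) throw new RangeError("offset must be >= 0 and limit must be > 0");
  const items = sorted.slice(offset, offset + limit);
  const end = offset + items.length;
  return { items, nextOffset: end < sorted.length ? end : null, total: sorted.length };
}

// Illustrative data: 10 sessions, page size 4.
const demoSessions: SessionMeta[] = Array.from({ length: 10 }, (_, i) => ({
  id: `s${i}`,
  lastActiveAt: 1000 - i,
}));
const firstPage = listSessionsPage(demoSessions, 0, 4);
console.log(firstPage.items.map((s) => s.id), firstPage.nextOffset); // [ 's0', 's1', 's2', 's3' ] 4
```

An explicit show-more action would then request the next page with `nextOffset` and display loading feedback while it resolves.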

Stage 1 Scope Already Covered By PR #934

  • Show the native desktop main window earlier while keeping the frontend show path as a fallback.
  • Defer non-critical IDE / MCP / ACP / MiniApp startup work behind first visible shell or idle boundaries.
  • Page session metadata so startup fetches only initially visible rows, while explicit show-more loads the next page with loading feedback.
  • Keep workspace and session sections visually expanded by default instead of hiding existing navigation content to obtain the win.
  • Remove the extra pre-React static logo layer to avoid the double-logo flash.
  • Keep editor and LSP startup lazy, with a first-open loading prompt for the editor initialization path.

Remaining Follow-Up Work

  • Directly optimize interactive_shell_ready; this is the main remaining blocker because PR #934 (perf(startup): reduce desktop first-screen work) improves first-visible metrics but regresses this readiness mark by about 89 ms p50.
  • Profile React mount and first-shell effects to identify long tasks, synchronous store hydration, expensive render paths, and readiness gating that still run before the shell is actually usable.
  • Revisit startup marks so first-visible, first-interactive, and deferred-work completion are clearly separated and comparable across strict A/B runs.
  • Add or refine measurements for the first on-demand cost of deferred ACP / IDE / MCP / LSP work, especially editor first-open and agent flows that require language-service indexing.
  • Validate future stages with strict cold-start A/B data on the latest main before merging.
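
One way to keep first-visible, first-interactive, and deferred-work completion separate is to record named marks once and report the phases between them. The sketch below is an assumption about how that could look, not the project's instrumentation: `StartupMarks`, `summarizeStartup`, and the `process_start` and `deferred_init_end` mark names are hypothetical, while `main_window_shown` and `interactive_shell_ready` are the marks used in this issue.

```typescript
type Clock = () => number;

// Records each named mark once; later duplicates are ignored so the first occurrence wins.
class StartupMarks {
  private readonly marks = new Map<string, number>();
  private readonly now: Clock;

  constructor(now: Clock) {
    this.now = now;
  }

  mark(name: string): void {
    if (!this.marks.has(name)) this.marks.set(name, this.now());
  }

  between(from: string, to: string): number {
    const start = this.marks.get(from);
    const end = this.marks.get(to);
    if (start === undefined || end === undefined) {
      throw new Error(`missing mark: ${start === undefined ? from : to}`);
    }
    return end - start;
  }
}

interface StartupPhases {
  firstVisibleMs: number;
  visibleToInteractiveMs: number;
  interactiveToDeferredDoneMs: number;
}

// Splits one launch into three non-overlapping phases so A/B runs compare like with like.
function summarizeStartup(marks: StartupMarks): StartupPhases {
  return {
    firstVisibleMs: marks.between("process_start", "main_window_shown"),
    visibleToInteractiveMs: marks.between("main_window_shown", "interactive_shell_ready"),
    interactiveToDeferredDoneMs: marks.between("interactive_shell_ready", "deferred_init_end"),
  };
}

// Usage with a fake clock; the timestamps are illustrative, not measured.
const fakeTimes = [0, 700, 1000, 1300];
let fakeIndex = 0;
const demoMarks = new StartupMarks(() => fakeTimes[fakeIndex++] ?? 0);
for (const markName of ["process_start", "main_window_shown", "interactive_shell_ready", "deferred_init_end"]) {
  demoMarks.mark(markName);
}
console.log(summarizeStartup(demoMarks));
// { firstVisibleMs: 700, visibleToInteractiveMs: 300, interactiveToDeferredDoneMs: 300 }
```

Reporting the three phases separately makes it visible when a change only moves work from one phase to another.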

Constraints

  • Do not trade away visible functionality or default navigation behavior for startup metrics without explicit product confirmation.
  • Lazy loading is acceptable only when the user gets clear loading feedback for actions that can exceed about 1 second.
  • LSP base configuration can be prepared cheaply, but language servers and indexing should start only when the editor or an agent flow actually requires language services.
  • Future PRs should update this issue with the same metric table style so the trend is visible over time.
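
The LSP constraint above amounts to a start-on-first-use guard: configuration can exist early, but the expensive start runs only when something actually asks for language services. A minimal sketch, assuming a hypothetical `createLazyStarter` helper and a placeholder start function rather than the project's real LSP wiring:

```typescript
interface LazyStarter<T> {
  ensureStarted: () => T;
  hasStarted: () => boolean;
}

// Wraps an expensive start function so it runs at most once, on first demand.
function createLazyStarter<T>(start: () => T): LazyStarter<T> {
  let started = false;
  let value: T | undefined;
  return {
    ensureStarted: () => {
      if (!started) {
        value = start();
        started = true;
      }
      return value as T;
    },
    hasStarted: () => started,
  };
}

// Placeholder for a real language-server launch; it only counts invocations here.
let languageServerStarts = 0;
const languageServer = createLazyStarter(() => {
  languageServerStarts += 1;
  return { id: languageServerStarts };
});

console.log(languageServer.hasStarted()); // false: nothing runs during startup
languageServer.ensureStarted(); // first editor open or agent flow triggers the start
languageServer.ensureStarted(); // later callers reuse the same instance
console.log(languageServerStarts); // 1
```

If the guarded start can exceed about 1 second, the caller of `ensureStarted` is also the natural place to show loading feedback.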

Follow-Up Metrics From PR #958

Related PR: #958

Benchmark method: Windows desktop debug preview for startup/session-list observations, plus real persistence restore-path measurements on the 5 largest local sessions by turn count. The restore benchmark compares the pre-PR full restore behavior with PR #958's tailTurnCount=3 first restore, then explicitly records that full history is still hydrated later in the background.

PR #958 Business Scenario Coverage

| Business scenario | Baseline / reference behavior | PR #958 result | Product impact |
| --- | --- | --- | --- |
| Debug startup to visible native window | Existing native show path | Main window show at 944 ms after main-window create start | No latency win claimed; unsafe delayed-window strategy was removed. |
| Debug startup to frontend script | Existing Vite/WebView cold module path | first_script_eval at 9799.4 ms | Still slow; static preload only reduces blank-screen perception. |
| Debug startup to shell readiness | Existing readiness path | interactive_shell_ready at 10557 ms | Still open follow-up work for this issue. |
| Startup workspace/session navigation APIs | Existing paged metadata path | get_opened_workspaces 15.1 ms, get_recent_workspaces 13.9 ms, 7 list_persisted_sessions_page calls max 14.6 ms | Confirms workspace/history list remains visible and responsive. |
| First open of small/current session | Full restore is effectively unavoidable | 1/1 turn restored in 133.4 ms, convert 1.0 ms, state commit 0.5 ms | Neutral; no tail-load benefit for short sessions. |
| First open of larger historical sessions | Full history read before first render | Tail-3 first restore reads only latest turns, then full history hydrates after 150 ms if state is unchanged | Improves time to latest visible conversation, while preserving eventual full history. |
| Completed large model round rendering | Previously rendered from oldest visible groups first | Completed rounds render newest visible groups first; streaming rounds remain head-anchored | Improves perceived latest-content priority; unit-test covered, no separate browser timing claimed. |
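
The tail-first restore in the table above can be sketched as two steps: render the last few turns immediately, then hydrate the full history after a delay, but only if the session state has not changed in the meantime. The names below (`openSession`, `SessionStore`, `memoryStore`) are hypothetical; only the tail count of 3 and the 150 ms delay come from this issue.

```typescript
interface Turn {
  index: number;
}

// Hypothetical storage interface: a cheap tail read and a full read.
interface SessionStore {
  readTail(count: number): Turn[];
  readAll(): Turn[];
}

type DelayedSchedule = (delayMs: number, task: () => void) => void;

interface OpenOptions {
  tailTurnCount?: number;
  hydrateDelayMs?: number;
}

function openSession(
  store: SessionStore,
  render: (turns: Turn[], complete: boolean) => void,
  schedule: DelayedSchedule,
  isStateUnchanged: () => boolean,
  options: OpenOptions = {},
): void {
  const tailTurnCount = options.tailTurnCount ?? 3;
  const hydrateDelayMs = options.hydrateDelayMs ?? 150;
  // Step 1: show the latest turns without reading the older turn files.
  render(store.readTail(tailTurnCount), false);
  // Step 2: pay the full-history cost later, and skip it if the user moved on.
  schedule(hydrateDelayMs, () => {
    if (!isStateUnchanged()) return;
    render(store.readAll(), true);
  });
}

// In-memory store for illustration only.
function memoryStore(turnCount: number): SessionStore {
  const turns: Turn[] = Array.from({ length: turnCount }, (_, index) => ({ index }));
  return {
    readTail: (count) => (count > 0 ? turns.slice(-count) : []),
    readAll: () => turns.slice(),
  };
}

// Usage sketch with a real timer; 26 turns matches the turn count of local sample A.
openSession(
  memoryStore(26),
  (turns, complete) => console.log(complete ? "full" : "tail", turns.length),
  (delayMs, task) => { setTimeout(task, delayMs); },
  () => true,
);
```

The full read still happens, so the total work is not reduced; it is moved behind the first render, which is the tradeoff the Product impact column describes.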

PR #958 Historical Session Restore Measurements

| Local sample | Turns | Size | Baseline full restore avg | PR first restore avg (tail=3) | Deferred full hydrate cost still paid | Turn files avoided before first render |
| --- | --- | --- | --- | --- | --- | --- |
| A | 26 | 3.02 MB | 75.5 ms | 4.2 ms | 75.5 ms | 23 |
| B | 23 | 4.29 MB | 99.5 ms | 5.8 ms | 99.5 ms | 20 |
| C | 17 | 1.33 MB | 35.1 ms | 2.4 ms | 35.1 ms | 14 |
| D | 14 | 2.95 MB | 59.3 ms | 2.9 ms | 59.3 ms | 11 |
| E | 11 | 3.80 MB | 79.7 ms | 10.0 ms | 79.7 ms | 8 |

Summary:

  • Baseline full restore avg range: 35.1-99.5 ms.
  • PR first restore avg range: 2.4-10.0 ms.
  • Average initial backend restore reduction across these samples: 69.8 ms to 5.1 ms, about 92.7% lower before first render.
  • Full-history hydration still costs 35.1-99.5 ms after the 150 ms delay; PR #958 (perf(flow-chat): render recent session history first) moves that work behind latest-content rendering rather than eliminating it.
  • Startup interactive_shell_ready remains unresolved and should continue to be tracked by this issue.
