diff --git a/.agents/skills/ade-autoresearch/SKILL.md b/.agents/skills/ade-autoresearch/SKILL.md
new file mode 100644
index 000000000..ccae8ba12
--- /dev/null
+++ b/.agents/skills/ade-autoresearch/SKILL.md
@@ -0,0 +1,280 @@
+---
+name: ade-autoresearch
+description: Iteratively optimize an ADE tab's CPU/memory/IPC/render performance.
+  Drives the real UI, builds tab-specific probes from the visible product and
+  perf-pass repo, identifies bottlenecks from JSONL metrics, makes ONE targeted
+  code change per iteration, gates on tests + smoke, keeps wins on a branch, and
+  distills patterns into per-tab perf skills. Invoke when the user says
+  "optimize ", "autoresearch ", or "perf pass on ". Drives ADE
+  pointed at the perf-pass throwaway repo; full liberty inside that repo. Uses
+  Codex/GPT models for in-ADE AI activity unless the run explicitly opts into
+  another configured provider for comparison.
+metadata:
+  author: ADE
+  version: 0.3.0
+---
+
+# ade-autoresearch
+
+A Karpathy-style autoresearch loop for ADE perf. You (the agent) ARE the loop runner — there is no hidden script. Follow this algorithm exactly.
+
+## Inputs
+
+- ``: the tab to optimize. Must be one of: `boot`, `lanes`, `missions`, `prs`, `work`, `files`, `run`, `graph`, `review`, `history`, `automations`, `cto`, `settings`. (`boot` = cold launch + welcome + project open + remote runtime + iOS pairing — the "main ADE screen" surface above any specific tab.)
+- ``: throwaway git repo path. Defaults to `/Users/admin/Projects/perf pass` (note the space — quote it). Must exist, must be a git repo, must have a `perf-pass-seed` tag (or you create one on first run). Override via `ADE_PERF_PASS_DIR` env var.
+
+## Real UI audit is the primary loop
+
+The job is to find what a person actually feels in the tab. Do not predefine a
+fixed deterministic scenario suite and mistake that for the audit. Build the
+measurement plan from the live UI, source, and perf-pass repo; then create
+whatever repeatable probes, scripts, seeded repo states, or tests are needed to
+show load-time, CPU, heap, IPC, render, and interaction deltas for the surfaces
+the user actually exercises.
+
+Use this order:
+
+1. **Warm launch the real Electron UI on the target tab** and keep it open while auditing:
+   ```bash
+   NO_DEVTOOLS=1 ADE_DISABLE_LOCAL_RUNTIME_DAEMON=1 ADE_LOCAL_RUNTIME_FALLBACK=1 ADE_MODEL_OVERRIDE=gpt-5-codex \
+     node scripts/perf-launch.mjs --tab --run-id -ui-audit-$(date +%Y%m%d-%H%M)
+   ```
+   Confirm the Electron surface is on the requested tab. The visible active tab must match ``; do not audit a related embedded surface from another tab.
+
+2. **Build an action inventory from the visible UI and source.** Start with the tab's actual first screen, then cover every safe user action, subpane, menu, picker, dialog, mode switch, list interaction, empty state, error/preflight state, expand/minimize/fullscreen state, keyboard/search/filter path, and tab-specific destructive/external preflight. For destructive or externally visible actions, open and measure the prompt/preflight unless the user has explicitly allowed final execution.
+
+   The inventory must be tab-derived. For example, a Work pass should cover Work sidebar/session list, chat/CLI/shell start surfaces, session tabs/grid/layout controls, running/ended session actions, model/attachment/command/parallel pickers, terminal/chat panes, context menus, filters/search, and ADE tools drawers because those are Work-tab surfaces. A Lanes pass should cover lane list, stack graph, lane dialogs, Git Actions, and lane Work panes because those are Lanes-tab surfaces.
+
+   Do not claim complete coverage until the inventory itself says every row is
+   measured, prompt-only, external-skip, or explicitly deferred with a reason.
+   A handful of representative clicks is a partial smoke pass, not an audit. If
+   an action matrix already exists for the tab, update that matrix as evidence
+   arrives instead of replacing it with narrative notes.
+
+   Treat the matrix as the work queue. Pick the next unresolved action/state,
+   drive that exact UI path, record evidence, then either promote the row or
+   mark why it needs a fixture, sandbox, prompt, or external dependency. When a
+   row exposes slowness, churn, overflow, broken behavior, or missing
+   accessibility, make one targeted product change, re-drive that same row, and
+   only then move to the next row.
+
+3. **Mark each UI segment in the perf log** before and after exercising it:
+   ```ts
+   window.ade.perf.recordEvent({ kind: "manualStep", ts: Date.now(), name: "git-actions-stage", phase: "start" });
+   // drive the visible UI
+   window.ade.perf.recordEvent({ kind: "manualStep", ts: Date.now(), name: "git-actions-stage", phase: "end" });
+   ```
+   Segment names should describe the workflow, not the implementation detail.
+
+4. **Use direct IPC only for setup, cleanup, and analysis.** It is fine to create fixture data, reset a throwaway repo, query status, or extract metrics through IPC/shell. Do not replace a UI audit action with `window.ade.*` unless the UI is genuinely impossible to drive; if you must, say so in the run notes.
+
+5. **Create UI-derived probes after findings.** If existing scenarios cover a
+   surface, you may run them. If they do not, write a tab-specific probe,
+   scenario, fixture, or test that reproduces the measured workflow against the
+   perf-pass repo. The probe is evidence, not a product requirement: it exists to
+   quantify a real UI bottleneck and compare before/after behavior.
+
+## Setup (do once at start of run)
+
+1. **Read prior wins** at `.agents/skills/ade-perf-/SKILL.md` if it exists. These are optional best-practice notes from earlier audits, not prerequisites. If no per-tab skill exists, derive the checklist from the tab UI and source and create the per-tab skill only during codification after you have measured real behavior.
+2. **Inspect existing perf probes** under `apps/desktop/src/renderer/perf/scenarios/`,
+   scripts, and tab tests. Reuse what matches the tab, but do not treat missing or
+   incomplete scenarios as a blocker and do not let scenario availability define
+   the work. It is acceptable to add new tab-specific scenarios/probes when they
+   help quantify a real UI workflow.
+3. **Verify perf-pass repo** exists, has a seed tag, and can exercise real GitHub paths when needed:
+   ```bash
+   scripts/reset-perf-pass.sh
+   ```
+   Refuse to start if perf-pass doesn't exist. If the tab uses GitHub behavior, publish the repo as a private `perf-pass` remote before measuring push/pull/fetch UI.
+4. **Create a working branch** off main:
+   ```bash
+   git checkout -b autoresearch/-$(date +%Y%m%d-%H%M)
+   ```
+5. **Set the model override** for all in-ADE AI activity: export `ADE_MODEL_OVERRIDE=gpt-5-codex` (or another GPT/Codex model id available in ADE). Don't touch this during the run.
+
+## Baseline (iteration 0)
+
+Start with the real UI inventory. The baseline is complete when every safe tab
+surface has been exercised or explicitly marked unsafe/external/destructive, and
+each measured segment has corresponding perf evidence.
+
+For each important workflow, capture at least one of:
+
+- A real UI perf-launch run with `manualStep` markers in
+  `~/.ade/perf-runs//events.jsonl`
+- A purpose-built probe or scenario that drives the workflow against the
+  perf-pass repo and writes `summary.json` / `events.jsonl`
+- A focused unit/component/integration test that reproduces the expensive derive,
+  mount, or IPC behavior
+- A shell/IPC setup script only for fixture creation and cleanup
+
+Record `baseline_metrics` as a small table, not a single mandatory fitness
+score. Include the metrics that matter to the surface: route/load time, segment
+duration, main CPU p95, renderer CPU or long tasks, heap growth, IPC count/p95,
+render-on-scroll time, or panel mount cost. If an existing scenario reports a
+fitness score, keep it as one data point; do not let it override real UI evidence.
+
+Then analyze `events.jsonl` by manualStep segment. Record the worst UI segment,
+the slow IPC channels inside it, and whether the cost is expected work (for
+example network push/fetch) or avoidable tab work.
+
+Tag the baseline commit:
+```bash
+git tag perf-baseline--$(date +%Y%m%d)
+```
+
+## Iteration loop
+
+Stop conditions: **no measurable improvement on the current bottleneck for 10
+consecutive attempts** OR user kills the run OR 50 iterations OR 4 hours
+wall-clock.
+
+For each iteration:
+
+### 1. Analyze
+- Read the latest real-UI `events.jsonl`, probe outputs, scenario summaries, and
+  focused test results.
+- Pick the **#1 bottleneck**: the avoidable cost that appears in real UI segments
+  or repeatable probes. Tie-break by user-visible workflow first, then
+  reproducibility and metric severity.
+- Common bottleneck categories:
+  - **Slow IPC channel**: a channel in `summary.ipc.slowChannels` with p95 ≥ 120ms
+  - **Long task spam**: `webVitals.longTaskCount` > 5 per minute
+  - **Memory growth**: `process.rendererHeapGrowthMB` > 10 over a measured workflow
+  - **Render-on-scroll cost**: `marks.scroll.*` p95 high
+  - **Route transition cost**: `marks.nav.*` or `marks.switch.*` p95 high
+  - **Main CPU**: `process.mainCpuPercentP95` > 30 during idle or panel-open probes → background pollers
+  - **UI segment waste**: heavy refreshes, duplicate mounted panes, hidden pollers, repeated global status checks, or expensive dialog prefetches that are not needed for the action the user took
+- Read the code that owns the bottleneck. Form a hypothesis.
+
+### 2. Propose ONE change
+
+Legal moves (examples, not a complete list):
+- Memoize a hot selector with `useMemo` / `useCallback`
+- Batch IPC calls (collapse N independent invokes into one)
+- Debounce / throttle a poller
+- Virtualize a long list (`@tanstack/react-virtual` or similar)
+- Lazy-load a heavy component (`React.lazy`)
+- Replace `O(n²)` work with a Map lookup
+- Hoist a stable callback out of render
+- Skip re-renders with `React.memo` + stable props
+- Defer non-urgent work out of the render path (`requestIdleCallback`, microtask deferral)
+- Replace a polling interval with an event-driven subscription
+- Cache an expensive derive (only invalidate on deps change)
+
+**Forbidden moves:**
+- Editing anything under `apps/desktop/src/main/services/perf/**`
+- Editing metrics plumbing under `apps/desktop/src/renderer/perf/harness/**`,
+  `apps/desktop/src/renderer/perf/markers.ts`, or
+  `apps/desktop/src/renderer/perf/webVitals.ts` to make results look better
+- Editing `scripts/run-perf-scenario.mjs` or `scripts/reset-perf-pass.sh` to
+  weaken measurement or setup
+- Editing test files to make them pass
+- Disabling polling/sync features outright (only debounce/throttle)
+- Removing UI features or hiding elements to bypass measured workflows
+- Changing metric weights, summaries, or existing probes to mask a regression
+
+Allowed measurement moves:
+- Add new tab-specific scenarios/probes under `apps/desktop/src/renderer/perf/scenarios/`
+  when they drive a real UI-derived workflow.
+- Add scripts under `scripts/` or tests under the touched feature area to seed the
+  perf-pass repo or reproduce an expensive UI path.
+- Expand a probe to cover a newly discovered tab surface, provided it remains
+  honest about what it measures.
+
+### 3. Apply the change
+One commit, focused. Conventional message: `perf(): `.
+
+### 4. Test gate
+Run **only the affected test files**. Never the full suite. Use the per-tab Vitest projects.
+```bash
+npm --prefix apps/desktop run typecheck
+npm --prefix apps/desktop run test -- --run path/to/affected.test.ts
+```
+If tests fail, **revert** the commit (`git reset --hard HEAD~1`). Do NOT count the attempt toward the plateau; try a different change targeting the same or next bottleneck.
+
+### 5. Measure
+First re-drive the same UI segment with the same markers and compare the
+IPC/render/memory/load/CPU delta. Then re-run the smallest probe, scenario, or
+test that covers the changed surface. Before declaring the run done, re-run the
+final measured sweep that covers the audited surfaces; this can be a mix of real
+UI markers, custom probes, and existing scenarios.
+
+### 6. Smoke gate
+For each probe or scenario that writes a summary, check
+`summary.scenarios..ok === true` when present and `smokeFailures.length === 0`.
+For tests, require the targeted tests to pass. If the workflow breaks or smoke
+fails because of the code change, **revert** and increment the missed-attempt
+counter.
+
+### 7. Decide
+- Improvement threshold: at least one primary metric for the bottleneck improves
+  by ≥2% without regressing the surrounding smoke metrics. Use the most relevant
+  metric for the workflow (duration, CPU p95, heap, long tasks, IPC count/p95,
+  render cost), not a mandatory global fitness score.
+- If improvement: **keep**. Update best. Reset plateau to 0. Amend the commit
+  message with the metric delta, e.g. `work open 1840ms → 1210ms` or
+  `ipc p95 160ms → 70ms`.
+- Else: **revert** (`git reset --hard HEAD~1`). Plateau += 1.
+
+### 8. Soft iteration cap
+If this iteration has been running >15 minutes wall clock (build loops, scenario flakes, etc.), abort it: revert any in-progress change, mark it as a missed iteration (don't count it toward the plateau), and move on.
+
+## Termination
+
+When a stop condition hits:
+1. Print a run summary: baseline metrics, final metrics, %-improvement for each
+   kept bottleneck, and the list of kept commits (sha + message + metric delta).
+2. Suggest the user merge the working branch into main via PR.
+3. Proceed to codification (next section).
+
+## Completion and handoff discipline
+
+Do not describe the run as "done", "complete", or "covered" while the tab
+inventory still has unresolved rows (`source`, `fixture-needed`, `sandbox-only`,
+or unvisited `prompt-only` / `external-skip`) unless the user explicitly narrowed
+the objective. Open rows mean the run is still in progress.
+
+If there is a feasible next measured iteration and the user has not asked you to
+stop, continue the loop instead of ending with a future-work summary. If you must
+pause because the user asked for a handoff, the environment needs cleanup, or a
+blocking decision is required, do all of the following before the final response:
+
+- Stop any perf/dev/Electron processes you started and record the latest run id.
+- Update the tab audit matrix with what is measured, invalid, skipped, and next.
+- Update the per-tab perf skill with any measured win that future agents must
+  preserve.
+- State clearly that the audit is incomplete and name the next concrete loop.
+- Include a ready-to-run follow-up prompt that points to the matrix, run ids,
+  current bottleneck, validation commands, and the "do not claim full coverage" rule.
+
+## Codify (after the run ends)
+
+Read all kept commits (`git log --oneline perf-baseline--... HEAD`). For each, extract the **pattern** (the technique used, not the literal change). Update `.agents/skills/ade-perf-/SKILL.md`:
+
+- Write this as future engineering guidance for agents editing that tab, not as an audit transcript. One entry per pattern. If a similar pattern already exists, append a refinement instead of duplicating.
+- Each entry:
+  - **Pattern**: one-line name (e.g. "Debounce git-status pollers behind window visibility").
+  - **Why it helped**: which bottleneck it addressed, with the metric delta from the summary.
+  - **How to recognize when to apply**: signs in future code that the same pattern is needed.
+  - **Anti-pattern to avoid**: what NOT to do.
+  - **Verification**: which UI segment, probe, scenario, or test metric this affected.
+- Preserve proven history, but keep the top of the file readable as best practices for future code changes.
+
+## Notes on agent behavior
+
+- **Stay focused.** One bottleneck at a time. Resist the urge to "while I'm here also fix..." — that breaks attribution.
+- **Trust the metric.** If the relevant measured workflow does not improve, revert
+  even when the code feels cleaner. The metric-backed user workflow is the
+  contract.
+- **The perf-pass repo is your sandbox.** Inside it, you may create lanes, open
+  chats, push/pull throwaway branches, run automations, stash changes, and delete
+  fixtures when needed to exercise ADE. Purpose-built probes are encouraged when
+  fixed scenarios do not cover the tab. Real UI audit coverage is required before
+  you call the tab optimized.
+- **Codex model preference.** If a probe or in-ADE action invokes chat/agent work,
+  use the `ADE_MODEL_OVERRIDE` model (gpt-5-codex by default) for the majority of
+  chat work and for deep performance-fix work. Other configured providers may be
+  sampled for comparison when the user asks for broad coverage.
+- **Concurrency**: only one perf run on the machine at a time. If `~/.ade/perf-runs/` contains a `/lock` file with a live pid, refuse to start.
diff --git a/.agents/skills/ade-perf-boot/SKILL.md b/.agents/skills/ade-perf-boot/SKILL.md
new file mode 100644
index 000000000..1f69521e4
--- /dev/null
+++ b/.agents/skills/ade-perf-boot/SKILL.md
@@ -0,0 +1,64 @@
+---
+name: ade-perf-boot
+description: Performance patterns discovered for ADE's cold launch and "main
+  screen" surfaces — welcome / project picker, recent projects list, project
+  open flow, remote runtime connect, iOS pairing. Read before editing files in
+  apps/desktop/src/main/main.ts, the App shell (apps/desktop/src/renderer/components/app/**),
+  project bootstrap services (apps/desktop/src/main/services/projects/**, lanes/**),
+  remote runtime services, or anything in the app's pre-tab boot path.
+  Append-only knowledge base populated by ade-autoresearch runs.
+metadata:
+  author: ade-autoresearch
+  version: 0.1.0
+  status: seed
+---
+
+# ade-perf-boot
+
+Patterns discovered for ADE's cold launch and main-screen surfaces. Each entry has run-traced provenance — do not delete entries without explicit user approval.
+
+## Scope
+
+This tab covers the code paths that run before any per-tab UI is mounted, plus the project-level chrome that appears regardless of which tab is open:
+
+- **Cold launch** — process spawn → main process bootstrap → renderer first paint → first interactive.
+- **Welcome / project picker** — when no project is loaded.
+- **Recent projects** — listing, rendering, icon resolution.
+- **Project open flow** — `project.openRepo` IPC, the load + bind path.
+- **Remote runtime** — `remoteRuntime.listTargets`, snapshot, connection lifecycle.
+- **iOS / phone pairing** — pairing status, auth, etc.
+- **App shell chrome** — sidebar, header, route transitions (the AppShell wrapping all tabs).
+
+Use `ade-autoresearch boot` to run an optimization cycle against this surface.
+
+## Scenarios this tab is benchmarked against
+
+Defined in `apps/desktop/src/renderer/perf/scenarios/boot.ts`:
+
+- `boot.cold-paint` — measures FCP / LCP / INP during cold launch (no driving).
+- `boot.recent-projects` — `project.listRecent` IPC + welcome render.
+- `boot.open-project` — `project.openRepo` round-trip for perf-pass.
+- `boot.remote-runtime` — `remoteRuntime.listTargets` cost.
+- `boot.idle-welcome` — 20s idle on welcome (no project) — catches background pollers.
+- `boot.stress-launch` — 2min idle from cold — catches startup leaks.
+
+The "no project" scenarios should be run with `--no-project` so the welcome screen renders:
+
+```bash
+node scripts/run-perf-scenario.mjs boot.idle-welcome run-id --no-project
+```
+
+## Patterns
+
+_No patterns recorded yet — populated by the first `ade-autoresearch boot` run._
+
+
diff --git a/.agents/skills/ade-perf-lanes/SKILL.md b/.agents/skills/ade-perf-lanes/SKILL.md
new file mode 100644
index 000000000..a681ace02
--- /dev/null
+++ b/.agents/skills/ade-perf-lanes/SKILL.md
@@ -0,0 +1,91 @@
+---
+name: ade-perf-lanes
+description: Performance practices for ADE's Lanes tab. Read before editing
+  files under apps/desktop/src/renderer/components/lanes/**,
+  apps/desktop/src/renderer/state/appStore.ts, or
+  apps/desktop/src/main/services/lanes/**. Preserve these patterns unless a new
+  measured UI audit proves a better one.
+metadata:
+  author: ade-autoresearch
+  version: 0.2.0
+  status: active
+---
+
+# ade-perf-lanes
+
+Use this as engineering guidance for keeping the Lanes tab fast while adding features. The Lanes tab is a dense workspace: lane list, branch selector, stack graph, Work pane, Git Actions, dialogs, history, diff viewer, and runtime/session state all coexist. Small refresh choices can easily multiply into visible UI noise.
+
+## Testing posture
+
+- Test the actual `/lanes` route in the Electron dev app. Do not treat the Work tab with a lane selector as Lanes parity.
+- Drive visible UI actions and mark each segment with `window.ade.perf.recordEvent({ kind: "manualStep", ... })`. Deterministic scenarios are regression guards, not a substitute for clicking through the tab.
+- Keep a private `perf-pass` GitHub repo available for real fetch/push/pull behavior. It is safe to create throwaway lanes, commits, stashes, and branches there.
+- When an action is destructive or externally visible, exercise the prompt/preflight by default. Execute the final action only when the user has allowed it or the target is clearly disposable.
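A minimal sketch of how those `manualStep` markers can be turned into per-segment durations when reading a run's `events.jsonl`. It assumes each non-blank line is one JSON object and that markers carry only the `kind`, `ts`, `name`, and `phase` fields used in the ade-autoresearch marker example. `manualStepDurations` is a hypothetical helper, not part of the perf harness, and it does not attribute IPC events to segments.

```typescript
// Assumed event shape: the fields used by the recordEvent() marker examples.
interface ManualStepEvent {
  kind: "manualStep";
  ts: number;
  name: string;
  phase: "start" | "end";
}

interface SegmentDuration {
  name: string;
  durationMs: number;
}

// Hypothetical helper: pair manualStep start/end markers from events.jsonl text.
// Each "end" closes the most recent unmatched "start" with the same name.
// Lines that are not manualStep events are ignored; every non-blank line is
// assumed to be valid JSON.
function manualStepDurations(jsonl: string): SegmentDuration[] {
  const openStarts = new Map<string, number[]>(); // segment name -> start timestamps
  const durations: SegmentDuration[] = [];

  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const event = JSON.parse(line) as Partial<ManualStepEvent>;
    if (event.kind !== "manualStep" || typeof event.name !== "string" || typeof event.ts !== "number") {
      continue;
    }
    if (event.phase === "start") {
      const starts = openStarts.get(event.name) ?? [];
      starts.push(event.ts);
      openStarts.set(event.name, starts);
    } else if (event.phase === "end") {
      const startTs = openStarts.get(event.name)?.pop();
      if (startTs !== undefined) {
        durations.push({ name: event.name, durationMs: event.ts - startTs });
      }
    }
  }

  // Slowest first, so the worst UI segment is at index 0.
  return durations.sort((a, b) => b.durationMs - a.durationMs);
}
```

Sorting by duration puts the first candidate for the "worst UI segment" at the top. Matching slow IPC channels inside that time window would need the harness's real IPC event format.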
+
+## Refresh rules
+
+- Use full decorated snapshots only when runtime decorations, conflict status, rebase suggestions, or auto-rebase state are truly needed.
+- For Git Actions local operations such as stage, commit, fetch, push, pull, and history refresh, prefer `refreshLanes({ includeStatus: true, includeSnapshots: false })`. This updates lane Git status without rebuilding runtime/rebase/conflict snapshot decorations.
+- For runtime-only updates from Work pane sessions, use `refreshLanes({ includeStatus: false, includeSnapshots: true, includeConflictStatus: false, includeRebaseSuggestions: false, includeAutoRebaseStatus: false })`. Preserve prior lane Git status while refreshing runtime buckets.
+- For metadata-only updates such as lane color/appearance, use `refreshLanes({ includeStatus: false })` and preserve prior `status` / `parentStatus` in the store. A color change must not recompute Git status.
+- Avoid calling bare `refreshLanes()` from new Lanes UI handlers. Treat it as the expensive path and document why a full refresh is required.
+
+## Pane and poller rules
+
+- Expanded/fullscreen panes must unmount the corresponding inline pane body when the duplicate would keep effects alive. CSS hiding is not enough.
+- Git Actions should have at most one active polling/effect owner per visible lane. Timers must clean up on lane switch, pane minimize, and fullscreen transitions.
+- Poll only visible or active surfaces. Hidden lanes, hidden panes, minimized panes, and closed dialogs should not keep expensive Git, PR, Linear, AI, or runtime status requests alive.
+- Background sync/local-runtime failures should be fast-pathed in disabled perf/dev modes at the IPC boundary. Do not make every renderer caller catch slow "service unavailable" failures.
+- Presence updates should be idempotent and de-duped by lane/signature so filter changes, layout switches, and tab clicks do not spam sync IPC.
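The presence rule can be sketched as a small wrapper that remembers the last lane/signature pair and drops repeats. This is a minimal sketch under several assumptions: there is one active-lane presence per window, and a single "signature" string captures everything the sync side cares about. `createDedupedPresence` and `SendPresence` are hypothetical names. `send` stands in for whatever actually issues the presence IPC (the proven patterns name `ade.sync.setActiveLanePresence`).

```typescript
// Hypothetical transport standing in for the real sync presence IPC call.
type SendPresence = (laneId: string, signature: string) => void;

// Wrap the sender so only a change in lane or presence signature reaches IPC.
// Returns true when a presence update was actually sent.
function createDedupedPresence(send: SendPresence): (laneId: string, signature: string) => boolean {
  let lastKey: string | null = null;
  return (laneId, signature) => {
    const key = `${laneId}\u0000${signature}`; // NUL separator keeps lane and signature distinct
    if (key === lastKey) return false; // idempotent: same presence, skip the IPC call
    lastKey = key;
    send(laneId, signature);
    return true;
  };
}
```

Filter changes, layout switches, and tab clicks that re-report the same presence become no-ops. A real implementation may also need to clear `lastKey` when the sync connection restarts, so that presence is re-announced; that requirement is an assumption, not something these traces established.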
+
+## Dialog and menu rules
+
+- Fetch heavy dialog data on open, not on page load. Branch lists, Linear issues, unregistered worktrees, delete-risk preflights, and PR metadata should be lazy and cancelable.
+- Do not precompute delete, rebase, merge, force-push, or cherry-pick risk for every lane. Compute it only when the user opens that flow.
+- Color/appearance changes should update cheaply. They are not a reason to rebuild snapshots or rerun Git status.
+- Create-lane flows may be expensive because they create worktrees and initialize environment state. Keep that cost isolated to submit; opening and editing the dialog should stay light.
+
+## Git Actions rules
+
+- Keep local change operations scoped. Stage, unstage, commit, stash, and discard should refresh the active lane's change model and lane Git status, not all snapshot decorations.
+- History and diff controls should fetch only the selected commit/file data. Split/unified, wrap, line-number, and copy-path controls should be renderer-local after the file/patch is loaded.
+- Network actions are allowed to cost real time. `fetch`, `push`, and child-lane creation can dominate a trace; do not optimize them by hiding progress or skipping correctness checks.
+- Save Changes currently stashes tracked changes and can leave untracked files visible. If changing that behavior, treat it as functionality work and add tests before using it as a perf cleanup.
+
+## Proven patterns
+
+### Skip disabled local runtime bridge calls
+- **Why it helped**: When `ADE_DISABLE_LOCAL_RUNTIME_DAEMON=1`, preload still attempted local-runtime action/sync/event IPC before falling back to desktop IPC. Real `/lanes` runs showed slow `ade.localRuntime.*` spans and `lanes.idle-at-rest` hit V8 OOM before summary.
+- **Apply when**: Perf/dev launches disable the daemon but traces show slow `ade.localRuntime.callAction`, `ade.localRuntime.callSync`, or `ade.localRuntime.streamEvents`.
+- **Avoid**: Renderer-only caches that hide the symptom while the unavailable transport still burns time.
+- **Verification**: Baseline `lanes-20260511-1721-real-baseline-*` had local-runtime slow channels and idle OOM. Post-change `lanes-20260511-1725-real-optimized-{cold,switch,idle,scroll,stress}` passed with total fitness `7028.82` and no `ade.localRuntime.*` channels.
+
+### Suppress hidden duplicate fullscreen pane bodies
+- **Why it helped**: Expanding Git Actions mounted the fullscreen pane while leaving the inline `LaneGitActionsPane` body alive, producing duplicate toolbars and duplicated effects.
+- **Apply when**: A Lanes expanded/fullscreen overlay reuses pane configs and a DOM snapshot shows duplicate pane bodies or repeated test regions while only one is visible.
+- **Avoid**: CSS-only hiding for duplicate pane bodies.
+- **Verification**: Git Actions expand went from 2 toolbars to 1. IPC dropped from 30 calls / 223 ms in `lanes-expand-prefix-20260511` to 29 calls / 192 ms in `lanes-expand-postfix-20260511`.
+
+### Fast-path disabled sync status and presence
+- **Why it helped**: Perf-mode Lanes traces hit `ade.sync.getStatus` and `ade.sync.setActiveLanePresence` even though the local runtime daemon and in-process sync service were unavailable. Failed calls cost about 250-370 ms each.
+- **Apply when**: `ADE_DISABLE_LOCAL_RUNTIME_DAEMON=1` and traces show failed `ade.sync.*` calls with "Sync service is not available."
+- **Avoid**: Removing Lanes presence calls globally or catching every failure in the renderer.
+- **Verification**: `lanes-expand-postfix-20260511` had 4 failed sync IPC calls totaling 1145 ms. `lanes-sync-postfix-20260511` had 3 successful sync IPC calls totaling 2 ms.
+
+### Scope Git Actions refreshes to lane status
+- **Why it helped**: Stage/commit/fetch used to call full `listSnapshots`, rebuilding runtime and decoration state for local Git actions. The scoped path keeps status fresh and skips snapshot decorations.
+- **Apply when**: A Git Actions handler finishes local Git work and calls bare `refreshLanes()`.
+- **Avoid**: Recomputing runtime/rebase/conflict decorations after every stage, commit, stash, fetch, or history refresh.
+- **Verification**: Before the change, `lanes-full-ui-audit-20260511-01` stage cycle spent 551 ms in `ade.lanes.listSnapshots`; commit spent 278 ms in `listSnapshots`. After the change, `lanes-refresh-light-20260511` stage used `ade.lanes.list` at 40 ms with no `listSnapshots`, and typed commit used `ade.lanes.list` at 213 ms with no `listSnapshots`.
+
+### Split runtime refresh from Git status refresh
+- **Why it helped**: Work pane pty/chat updates need runtime buckets, not fresh Git status for every lane. Runtime-only snapshot refresh avoids Git-status recompute while preserving prior status in store.
+- **Apply when**: A session/runtime event updates running/awaiting/ended counts.
+- **Avoid**: Calling full snapshots with `includeStatus:true` for runtime-only changes.
+- **Verification**: `lanes-refresh-light-20260511` runtime snapshot refresh used `ade.lanes.listSnapshots` with `includeStatus:false`; the only runtime snapshot call was 76 ms while prior Git status stayed intact.
+
+### Keep appearance refresh metadata-only
+- **Why it helped**: Lane color changes in manage/context flows previously refreshed full decorated snapshots. Appearance is metadata and should use the statusless list path while preserving previous Git status in the store.
+- **Apply when**: New lane metadata or appearance handlers update color/name/description without changing branch state.
+- **Avoid**: Bare `refreshLanes()` after appearance-only updates.
+- **Verification**: Real UI manage-dialog trace `lanes-refresh-light-20260511` showed `ade.lanes.updateAppearance` at 1 ms followed by an unnecessary `ade.lanes.listSnapshots` at 308 ms. The lightweight path uses `refreshLanes({ includeStatus: false })` instead.
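One way to keep new Lanes handlers off the bare `refreshLanes()` path is to name the reason for each refresh and derive the options from it. In this minimal sketch, the three option objects are copied from the Refresh rules section of this skill. The `LaneUpdateKind` names, the `RefreshLanesOptions` interface, and `refreshOptionsFor` are hypothetical and may not match the store's real types.

```typescript
// Option names mirror the refreshLanes() calls quoted in the refresh rules;
// this interface is an assumed shape, not the store's actual type.
interface RefreshLanesOptions {
  includeStatus?: boolean;
  includeSnapshots?: boolean;
  includeConflictStatus?: boolean;
  includeRebaseSuggestions?: boolean;
  includeAutoRebaseStatus?: boolean;
}

// Hypothetical reasons a Lanes handler might refresh.
type LaneUpdateKind = "gitLocal" | "runtime" | "appearance";

function refreshOptionsFor(kind: LaneUpdateKind): RefreshLanesOptions {
  switch (kind) {
    case "gitLocal":
      // stage/commit/fetch/push/pull/history: fresh Git status, no snapshot decorations
      return { includeStatus: true, includeSnapshots: false };
    case "runtime":
      // Work pane pty/chat events: runtime buckets only, prior Git status kept in the store
      return {
        includeStatus: false,
        includeSnapshots: true,
        includeConflictStatus: false,
        includeRebaseSuggestions: false,
        includeAutoRebaseStatus: false,
      };
    case "appearance":
      // color/name metadata: statusless list path
      return { includeStatus: false };
    default: {
      // Compile-time exhaustiveness check: adding a kind without a case fails to type-check.
      const unreachable: never = kind;
      throw new Error(`unknown lane update kind: ${String(unreachable)}`);
    }
  }
}
```

Usage would look like `refreshLanes(refreshOptionsFor("gitLocal"))` after a stage or commit. Because a full decorated refresh is deliberately not among the kinds, a handler that truly needs it has to call `refreshLanes()` explicitly, which keeps that choice visible in review.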
diff --git a/.agents/skills/ade-perf-work/SKILL.md b/.agents/skills/ade-perf-work/SKILL.md new file mode 100644 index 000000000..8b46b955e --- /dev/null +++ b/.agents/skills/ade-perf-work/SKILL.md @@ -0,0 +1,458 @@ +--- +name: ade-perf-work +description: Performance and UX patterns discovered for ADE's Work tab, including chat/CLI/shell launch surfaces, the Work tools pane, Git/Files/iOS/App Control/Browser/Mac VM panels, and local-runtime-disabled perf runs. Read before editing Work tab code. +metadata: + author: ADE + version: 0.1.0 +--- + +# ade-perf-work + +Read this before editing Work tab surfaces: + +- `apps/desktop/src/renderer/components/terminals/**` +- `apps/desktop/src/renderer/components/chat/**` when mounted from Work +- `apps/desktop/src/renderer/components/lanes/LaneGitActionsPane.tsx` +- `apps/desktop/src/renderer/components/lanes/CommitTimeline.tsx` +- `apps/desktop/src/renderer/components/files/FilesPage.tsx` when embedded +- `apps/desktop/src/preload/preload.ts` +- `apps/desktop/src/main/services/ipc/registerIpc.ts` +- Work-facing tool services for iOS Simulator, App Control, built-in browser, and macOS VM + +## Measurement pattern + +Use the real Work tab first. For local perf runs, reset and open the perf-pass repo: + +```bash +scripts/reset-perf-pass.sh +NO_DEVTOOLS=1 ADE_DISABLE_LOCAL_RUNTIME_DAEMON=1 ADE_LOCAL_RUNTIME_FALLBACK=1 ADE_MODEL_OVERRIDE=gpt-5-codex \ + node scripts/perf-launch.mjs --tab work --run-id work-ui-audit-- +``` + +Keep the Work audit matrix at `docs/perf/work-tab-action-inventory.md` current. +Rows start as source inventory. Promote them only when a real UI run, UI-derived +probe, or focused fixture test covers that exact control/state. Do not describe a +run as complete while rows are still `source`, `fixture-needed`, +`sandbox-only`, `prompt-only`, or `external-skip` without evidence or an explicit +reason. + +Use the matrix as the Work pass queue. 
Enumerate the visible Work actions and +states, then handle rows one by one: drive the real control, record markers, +promote or classify the row, fix at most one discovered bottleneck/bug, re-drive +the same row to prove the change, and only then advance. Backend/source review is +supporting evidence; it does not replace trying the UI action. + +Drive actual Work UI actions and record `work.audit.*` markers for: + +- session search/filter, tab/grid, Chat/CLI/Shell mode switches +- model picker, attachment picker, slash command picker, parallel model configuration +- Work tools pane open/close +- Git: status, More menu, history refresh, diff selection +- Files: mount and path filtering +- iOS Sim, App Control, Browser, Mac VM panel mounts + +Do not start from a fixed deterministic scenario list. Scenario files are only +optional evidence after the Work UI inventory exposes a real workflow. The +important proof is a UI-derived run over +`~/.ade/perf-runs//events.jsonl` plus focused tests for any fixed +behavior. + +## Current known wins + +### Skip classic GitHub repo probes during shell status + +`githubService.getStatus()` must still validate the token and keep the +fine-grained-token repo access probe. For classic tokens, required scopes are +inspectable from `/user`, so do not also probe the active repo on the Work +startup status path. + +Measured Work runs: + +- Before: `ade.github.getStatus` `644ms` during Work startup. +- After: post-fix samples were `473ms` and `192ms` in the hot-reload run, and + `524ms` in a clean Work launch. + +Keep `repoAccessOk` / `repoAccessError` as `null` for classic-token statuses. +Do not remove the fine-grained-token repo probe; it prevents a false connected +state when a fine-grained token cannot read the active repo. + +### Keep GitHub auth status out of the first Work startup window + +The top-bar Publish pill only needs local origin state. 
+It must call the
+lightweight `github.getRemoteStatus` path, not full `github.getStatus`, so Work
+startup does not validate GitHub auth just to decide whether to show Publish.
+
+The AppShell GitHub banner/avatar refresh may still call full `github.getStatus`,
+but keep it post-startup. Settings save/clear broadcasts still update the shell
+immediately through `onStatusChanged`.
+
+Measured Work runs:
+
+- Before shell auth delay: first `10s` IPC total `901ms`, with
+  `ade.github.getStatus` at `293ms` in the startup window.
+- After: first `10s` IPC total `633ms`; `ade.github.getStatus` was absent from
+  the first startup summary. The auth check still ran later at `301ms`.
+- `ade.github.getRemoteStatus` stayed `0ms` in the startup path.
+
+### Bound local usage cost-log scans
+
+The usage tracker starts in Work perf launches after the startup delay and scans
+local Claude/Codex JSONL logs for cost estimates. Do not read every recent log
+file into memory. Scan only bounded recent files, skip oversized files, stream
+line-by-line, and cap retained token entries.
+
+Measured Work runs:
+
+- Before: clean Work launch crashed with V8 heap OOM; main-process heap reached
+  `1,867.7MB` around the `usage.start` window.
+- After: clean Work launch stayed alive past the same window with main-process
+  heap max `130.9MB` and CPU p95 `0.18%`.
+
+If changing usage/cost telemetry, test with a large local `~/.codex` /
+`~/.claude` history and re-run a Work perf launch for at least 45 seconds.
+
+### Skip the local runtime bridge when the local daemon is disabled
+
+In Work perf runs with `ADE_DISABLE_LOCAL_RUNTIME_DAEMON=1`, the preload bridge must not try `ade.localRuntime.callAction`, `ade.localRuntime.callSync`, or `ade.localRuntime.streamEvents` for local project bindings.
+
+Measured Work cold run:
+
+- Before: `67` failed `ade.localRuntime.*` IPC calls, `19,427ms` aggregate failed IPC time.
+- After: `0` failed `ade.localRuntime.*` IPC calls.
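The guard reduces to a predicate that the preload bridge can consult before it routes a call. The sketch below is illustrative only: the `ProjectBinding` shape and the `shouldUseLocalRuntimeBridge` and `countLocalRuntimeCalls` helpers are hypothetical, and it assumes the switch is exactly `ADE_DISABLE_LOCAL_RUNTIME_DAEMON=1`. Only the environment variable and the `ade.localRuntime.*` channel names come from this note.

```typescript
// Hypothetical shape of a project binding; the real preload types may differ.
interface ProjectBinding {
  kind: "local" | "remote";
  projectRoot: string;
}

// Accepts process.env directly, or a plain object in tests.
type Env = Record<string, string | undefined>;

// Assumption: only the exact value "1" disables the daemon in this sketch.
function isLocalRuntimeDaemonDisabled(env: Env): boolean {
  return env.ADE_DISABLE_LOCAL_RUNTIME_DAEMON === "1";
}

// False only when a local binding would otherwise call into the disabled
// daemon via ade.localRuntime.callAction / callSync / streamEvents.
// Remote bindings and daemon-enabled runs keep their existing routing.
function shouldUseLocalRuntimeBridge(binding: ProjectBinding, env: Env): boolean {
  return !(binding.kind === "local" && isLocalRuntimeDaemonDisabled(env));
}

// Audit helper: number of IPC calls that used any ade.localRuntime.* channel.
// Extracting channel names from events.jsonl is not shown because that
// schema is not described here.
function countLocalRuntimeCalls(channels: readonly string[]): number {
  return channels.filter((channel) => channel.startsWith("ade.localRuntime.")).length;
}
```

In a local-runtime-disabled Work audit, `countLocalRuntimeCalls` over the run's IPC channel names should return `0`, matching the measured after-state. Leaving remote bindings on their existing path is a deliberate choice in the sketch, since the note only constrains local bindings.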
+
+Keep the preload guard in `apps/desktop/src/preload/preload.ts` intact. If changing project binding or remote runtime event pump logic, re-run a local-runtime-disabled Work audit and confirm local runtime IPC remains zero.
+
+### Fast-path sync status when the local runtime daemon is disabled
+
+`ade.sync.getStatus` is called during Work startup and periodically from shell chrome. In local-runtime-disabled runs it must not spawn the disabled runtime or fail before falling back.
+
+Measured Work run:
+
+- Before sync fix: `ade.sync.getStatus` failed and consumed `606ms` across two calls in the startup window.
+- After: the same status path returned successfully in `0-1ms`.
+
+Keep the unavailable sync snapshot in `apps/desktop/src/main/services/ipc/registerIpc.ts`. It is a perf-mode/status fallback, not a replacement for real sync service behavior.
+
+### Work tools pane must remain operable when narrow
+
+The Work tools pane can be narrow after the session list, chat surface, and tools pane are all visible. Do not assume all tab labels fit. The tab strip should collapse to icon buttons under narrow widths while preserving `aria-label`, tooltips, and stable hit targets.
+
+Measured UI pass after compact tabs:
+
+- Git, Files, iOS Sim, App Control, Browser, and Mac VM tabs were all visible and clickable in the narrow tools pane.
+- No tools tab had a bounding rect beyond the renderer viewport.
+
+If changing `WorkSidebar`, verify with a small Work pane and a larger audit window. The target is no clipped or unreachable tool tabs, not merely no TypeScript errors.
+
+### Session list filters must wrap in narrow panes
+
+The sessions pane can be squeezed when Work, the sessions list, and the tools pane are all visible. Status/group filter pills must wrap inside the filter panel instead of assuming a single row, and embedded lane selectors must be able to fill a narrow parent without overflowing it.
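The tools-tab and filter-pill checks above both compare child rectangles against a container rectangle. A minimal sketch, assuming a plain `Box` object with the `left`/`top`/`right`/`bottom` fields that a `DOMRect` also exposes; `overflow`, `maxRightOverflow`, and `allInside` are hypothetical helpers, not an existing ADE probe. Positive values mean a control extends past the container edge; negative values mean it ends inside.

```typescript
// Plain rectangle; a DOMRect from getBoundingClientRect() has the same fields.
interface Box {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Per-side overflow of `child` past `container`, in px.
// Positive = sticks out on that side; zero or negative = inside.
function overflow(child: Box, container: Box) {
  return {
    left: container.left - child.left,
    top: container.top - child.top,
    right: child.right - container.right,
    bottom: child.bottom - container.bottom,
  };
}

// Worst right-edge overflow across all children (e.g. filter pills in a panel).
// Math.max() with no arguments is -Infinity, so an empty list reports -Infinity.
function maxRightOverflow(children: readonly Box[], container: Box): number {
  return Math.max(...children.map((child) => overflow(child, container).right));
}

// True when every child lies fully inside the container on all four sides.
function allInside(children: readonly Box[], container: Box): boolean {
  return children.every((child) => {
    const o = overflow(child, container);
    return o.left <= 0 && o.top <= 0 && o.right <= 0 && o.bottom <= 0;
  });
}
```

In a renderer probe the boxes would come from `element.getBoundingClientRect()`, with the viewport as `{ left: 0, top: 0, right: window.innerWidth, bottom: window.innerHeight }`. Recording `maxRightOverflow` before and after a fix gives numbers in the same form as the measured runs: a positive `px` past the edge versus a negative `px` inside it.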
+
+Measured Work run `work-inventory-shell-session-20260512-01`:
+
+- Before: a `120.2px` filter panel had status/group controls extending `122.8px` past the panel edge.
+- After: the rightmost control ended `8.0px` inside the same narrow panel.
+
+When editing `SessionListPane` filter controls or `LaneCombobox` trigger sizing, verify a small Work viewport with the filter panel open and record the panel/control bounds in the action inventory.
+
+### Context menus must clamp to the renderer viewport
+
+Session and Work-tab context menus are fixed-position menus created from the event `clientX/clientY`. Do not place them directly at the click point without measuring the rendered menu size and clamping to the viewport.
+
+Measured Work run `work-context-menu-edge-20260512-01`:
+
+- Before: a Work-tab context menu opened at `left=534.0px` in a `582px` viewport and overflowed right by `132.0px`.
+- After: the same probe placed the `180px` menu at `left=394.0px`, ending `8.0px` inside the viewport.
+
+When changing `SessionContextMenu`, verify with a small Work viewport and a context menu opened near the right edge. Keep the inset stable so users can still reach every menu item.
+
+Files context menus have the same requirement. The Files tree context menu is
+rendered by `FilesPage.tsx`; measure its actual rendered width/height and clamp
+both axes to the viewport.
+
+Measured Work run `work-chat-other-controls-20260512-01`:
+
+- Before: right-clicking the `README.md` row near the viewport edge opened the
+  menu at `x=1163.0px` with `width=200px` in a `1164px` viewport, overflowing
+  right by `199px` and bottom by `159.1px`.
+- After: the same probe placed it at `x=956.0px`, `right=1156.0px`,
+  `bottom=737.0px` in the `1164x745` viewport.
+
+When changing Files context-menu contents or row context-menu handling, verify
+with a right-click near the renderer's right/bottom edges.
+
+Embedded Files has a narrower parent than the full Files route.
+Keep the Work
+tools pane layout responsive: the explorer/editor split should stack when
+`FilesPage` is embedded rather than preserving a fixed `320px` explorer column
+that pushes editor controls offscreen.
+
+Measured Work run `work-chat-other-controls-20260512-01`:
+
+- Before: the embedded explorer ended at `right=1170.8px` in a `1164px`
+  viewport, the editor collapsed to `1.8px`, and the `CODE` button overflowed
+  right by `92.2px`.
+- After: explorer and editor both fit inside the tools pane at `right=1151.6px`;
+  the `CODE` button ended at `right=953.8px` with no overflow.
+
+### Chat controls depend on provider/model state
+
+The Work Chat composer changes its visible controls after provider/model
+selection. Do not mark a control missing until the matching provider state is
+set through the real model picker.
+
+Measured Work run `work-chat-controls-20260512-02`:
+
+- `Fast mode` appeared only after selecting a Codex model with a fast service
+  tier (`GPT-5.4` in the measured run). The valid marker toggled
+  `aria-pressed` from `false` to `true`.
+- The Codex approval preset menu exposed `Default permissions`, `Plan mode`,
+  `Full access`, and `Custom (config.toml)` after a Codex model was selected.
+- Parallel setup starts with two visible model slots; `Add model` should
+  increase the visible slot count, `Configure` should become `Editing`, and the
+  uppercase `FOCUSED` / `PARALLEL` execution controls should be checked with a
+  viewport-aware probe before promotion.
+- In a fresh empty perf-pass chat, the slash command menu may expose only
+  `/clear Clear chat history`. Do not promote `work.chat.command.select` from
+  that state; use a fixture or real session state with a non-clear command.
+- Handoff controls are not an empty-draft surface. They appear only for standard
+  locked Work chats.
+  Use focused `AgentChatPane.submit.test.tsx -t "handoff"`
+  evidence for open/permission/launch logic unless you intentionally create a
+  real throwaway chat in perf-pass.
+- The proof/artifacts drawer is also not exposed on the empty draft surface in
+  Work. Use a selected standard chat/session or a focused companion-toolbar
+  fixture before measuring proof drawer open/close or right-pane resize.
+- In the Work tab embedding, `AgentChatPane` is rendered with
+  `hideLaneToolDrawers` from `WorkViewArea` / `WorkStartSurface`, so the
+  chat-toolbar iOS and App Control drawer buttons are intentionally absent.
+  Verify Work coverage through the WorkSidebar tabs and panels; keep exact
+  drawer-button rows as fixture-needed/non-Work `AgentChatPane` fixture
+  evidence if they remain in the inventory.
+- Cursor permission/mode controls and Cursor Cloud actions only appear after a
+  Cursor-backed model is selected through the real model picker. In the measured
+  run, selecting a Cursor SDK model exposed a native mode select with `Agent`,
+  `Ask`, `Plan`, and `Full auto`; the `Cursor Cloud actions` menu exposed
+  `Send to Cursor Cloud` and `Open existing cloud chat` without launching a
+  cloud session. `CursorCloudInlineLaunch.test.tsx` covers canceling an inline
+  Cursor Cloud send without creating a run.
+- `work.chat.composer.clear` is not an empty-draft affordance in the current
+  Work composer. The visible `Clear` button is gated by `turnActive` alongside
+  steer/stop controls, so measure it with an active-turn fixture or sandbox chat
+  session rather than by typing into the fresh draft surface.
+- The Browser tools panel can be measured from the empty Work surface for tab
+  creation, tab switching/closing, URL typing, and inspect toggling. Browser
+  screenshot crop is different: in the empty tools panel it is disabled with
+  `Chat context is unavailable here`, so measure screenshot start/cancel from a
+  selected chat/context-capable surface.
+- Chat attachment search uses the composer lane, not the Files tools workspace.
+  In perf-pass, switch the composer lane to `Primary` before searching for
+  `README.md`; a missing lane worktree legitimately returns no file results.
+  For React-controlled picker inputs, prefer focused CDP text input over simply
+  assigning `input.value`, because DOM-only value changes can leave the picker
+  state at `Type to search files...`. Text-file chips cover select/remove only;
+  `open-preview` and `copy-image` need an image attachment fixture.
+- `ChatAttachmentTray.test.tsx` is valid focused fixture evidence for image
+  attachment preview/copy behavior: it mocks `getImageDataUrl`, verifies the
+  image lightbox opens, and verifies `writeClipboardImage` is called from the
+  copy control.
+- `AgentChatMessageList.test.tsx` has valid focused fixture evidence for
+  transcript message copy, cloud PR browser navigation, file-link routing,
+  transcript code-block copy, tool-call disclosure expansion, full-prompt
+  toggles, memory/thought disclosure toggles, manual transcript scroll,
+  jump-to-latest, user-message minimap jump, and inline question tab/prev-next
+  controls. Keep assistant markdown code blocks routed through
+  `HighlightedCode`; that component owns the `Copy code` control and copy-button
+  placement preference. Match the inventory row to the exact clicked control; do
+  not use the localhost URL test as PR-browser evidence. Grouped tool results
+  render through `ChatWorkLogBlock`, not the old standalone `ToolResultCard`;
+  keep long-result `show all` / `collapse` behavior on that reachable
+  `ChatWorkLogBlock` row path.
+- `AgentChatPane.submit.test.tsx` has valid focused fixture evidence for
+  selecting chat tabs in the Work chat pane. Its CLI-created terminal tests only
+  auto-reveal terminal tabs; use `ChatTerminalDrawer.test.tsx` for terminal
+  drawer toggle, resize, and manual tab-switch evidence.
+- `AgentChatPane.companionDrawers.test.tsx` covers chat companion iOS, App
+  Control, proof drawer open/close, right-pane split resize, archived-chat
+  restore, and persistent identity `Clear view` through the real
+  `AgentChatPane` chrome. This is non-Work fixture evidence for the lane tool
+  drawers: normal Work embedding passes `hideLaneToolDrawers`, so iOS/App
+  Control drawer buttons are intentionally absent there.
+- `ChatSubagentsPanel.test.tsx` can cover the subagents drawer toggle, detail
+  view, Back navigation, hidden timeline expansion, and copy-id behavior without
+  spawning real subagents.
+- `AgentChatComposer.test.tsx` can cover the active-turn `Clear` composer
+  control, queued steer edit/remove callbacks, prompt-suggestion Tab
+  acceptance, rich visual-context chip select/remove, slash-command selection,
+  and dismissing an attachment error. `FilesPage.test.tsx` can cover the
+  non-embedded Files editor theme toggle, primary-workspace `TRUST & EDIT`
+  toggle, and Files tree context-menu `COPY PATH`. Embedded Work Files does not
+  render those non-embedded chrome controls, so keep live embedded fail markers
+  separate from focused Files chrome coverage.
+- `SessionListPane.test.tsx` can cover the stale running-session warning, child
+  shell section collapse/expand, and bulk restore footer wiring.
+  `PackedSessionGrid.test.tsx` covers persisted tile resize, while
+  `WorkViewArea.test.tsx` covers selecting a tiled grid session by clicking its
+  body, closing an ended tab from the tab strip, and embedded floating-pane
+  minimize/expand through the actual `FloatingPane` chrome context.
+- `WorkStartSurface.test.tsx` can cover the `lanes=[]` no-lanes empty state
+  without mutating the perf-pass lane database.
+- Browser back/forward/reload can use a localhost two-page fixture. For
+  `Stop loading`, do not count a URL-open click while the toolbar is still
+  `busy`; that leaves the real Stop button disabled.
+  Use direct browser API
+  navigation only as setup for a slow localhost page, wait for the visible Stop
+  button to become enabled, then click the real button and verify loading
+  returns to false. If `captureScreenshot` times out before crop mode appears,
+  record it as invalid screenshot evidence and keep screenshot start/cancel
+  fixture-needed.
+- Browser selected-context rows can use direct `setBounds`/`selectPoint` only to
+  seed a localhost selected element; measure the real UI controls afterward.
+  The composer may be a `textarea[aria-label="Type to vibecode"]` rather than a
+  rich contenteditable after cleanup, so verify `Insert draft` against the
+  actual textarea value. Auto-attached browser context chips must be removed
+  before handing off.
+- `ChatBuiltInBrowserPanel.test.tsx` can cover starting screenshot crop mode,
+  canceling crop mode, and clicking the visible Attach control for an already
+  selected browser element. It does not cover dragging a crop region; keep crop
+  drag sandbox-only or measure it in a browser-capable throwaway run.
+- After forcing BrowserView bounds, switch the Work tools pane away from
+  Browser and verify `builtInBrowser.getStatus().visible === false` before
+  measuring unrelated chat-toolbar controls. The chat Git commit-message input
+  can then be measured with CDP mouse input without submitting; clear the input
+  value afterward, and treat close/submit behavior as separate evidence. Broad
+  Work UI coordinate probes did not open the Quick Run Radix menu, but
+  `ChatGitToolbar.test.tsx` covers the current-lane navigation button and the
+  embedded `QuickRunMenu` trigger with pointerdown/up; use that as fixture
+  evidence unless a future real-UI probe opens the menu reliably.
+- The detached App Control tools pane can safely measure local text entry in
+  `App Control launch command` and `CDP port` if the fields are restored and
+  Run/Connect are not clicked.
+  `Help wire CDP` is not rendered there unless
+  `canAttachToChat` is true, and Control/Inspect mode plus focused-element
+  typing need an active app session.
+- `ChatAppControlPanel.test.tsx` can cover safe App Control state controls
+  without launching or driving an app: selecting a configured Run-tab command,
+  inserting the Help wire CDP draft, showing the launch terminal, refreshing a
+  snapshot, dismissing the message, toggling Control/Inspect, typing in the
+  local focused-element text field, switching controlled windows, and re-scanning
+  controlled windows. It can also cover hovering an inspected screenshot
+  element, selecting a source-context point, and re-attaching that selected
+  point. Do not use it as evidence for screenshot control clicks,
+  element-crop attachment, Type/send, Stop, Run, or Connect.
+- Mac VM provisioning fields are safe to measure as state-only controls when
+  restored afterward: CPU, memory, disk, display, source mode, source image, and
+  local viewer checkbox. Do not click Start, Provision, Stop, Focus,
+  Screenshot, Delete, or Setup docs during a normal inventory pass unless the
+  row state explicitly allows that sandbox/external/prompt path.
+- `MacosVmPanel.test.tsx` can cover selecting a mocked VM screenshot point and
+  editing the local VM text input without clicking the externally visible
+  Click/Type actions. Keep actual VM click/type/send rows sandbox-only unless a
+  throwaway VM run is explicitly allowed.
+- For iOS Simulator tools, switch the Work lane to a real existing worktree
+  before measuring status, launch-target, or Preview Lab refresh. Missing lane
+  worktrees produce useful fixture evidence but should not be the only proof for
+  `refresh-state` or `preview-refresh`. Surface switches, Preview Control /
+  Capture mode, and the preview agent-action selector are safe state-only
+  controls; Ask-agent/#Preview prompt buttons need a chat-capable fixture or
+  explicit clipboard/draft handling.
+- `ChatIosSimulatorPanel.test.tsx` can cover iOS stream retry/recovery by
+  emitting a `stream-error` event and verifying the panel restarts the stream
+  for the selected device. It can also cover state-only launch-target select,
+  Preview-target select, setup install-command copy, Preview `Ask agent`, and
+  no-target `Ask agent to add a #Preview` without launching Simulator or Xcode.
+  The same fixture can cover live simulator Control/Inspect mode switches,
+  active-app text-field entry before Send, inspector refresh, and starting
+  screenshot capture mode. It can also cover selecting a mocked ADE-inspector
+  element and opening Preview Lab scoped to that element's source file; keep
+  Preview rendering as separate sandbox-only evidence. It also covers clearing
+  stale launch-target/project-root errors after the project root changes and a
+  later `listLaunchTargets` call succeeds; preserve that behavior when editing
+  iOS refresh or lane/project-root handling. It does not cover sending text or
+  dragging a screenshot crop.
+- The iOS Simulator device selector can be measured without booting a simulator:
+  switch to the iOS Sim tools tab, change the populated device `