perf(workspace): batch deps-cache invalidation into one workspace fs scan#10445
perf(workspace): batch deps-cache invalidation into one workspace fs scan#10445davidfirst wants to merge 6 commits into
Conversation
…scan The deps fs-cache freshness check ran a recursive globby per component that followed each component's node_modules symlink into the shared workspace node_modules (226k of 230k scanned entries), 313x per command. Replace it with a single node_modules-ignoring workspace scan, memoized as a command-scoped mtime index on FsCache and invalidated via the workspace's clear-cache hooks (so watch stays correct), with a per-component fallback. Cuts warm `bit status` fs syscalls ~40% (74.3k -> 44.8k); read traffic unchanged. Warm-wall-neutral on fast SSD (I/O-wait overlapping CPU), a real win on cold/CI/networked filesystems.
Code Review by Qodo
1. Symlinked scopes not tracked
|
PR Summary by Qodoperf(workspace): batch deps-cache invalidation with shared workspace mtime index Description
Diagram
High-Level Assessment
Files changed (6)
|
Code Review by Qodo
1. Stuck index build promise
|
…nd watch races Address qodo review on the deps-cache invalidation index: - clear the in-flight build promise in `finally`, so a transient build rejection (glob/stat error) no longer poisons all later reads in a long-lived process (watch/start). - in `deleteComponentMtimeIndexEntry`, bump the generation when a build is in-flight, so a watch-triggered clear that races the first build discards that build's result instead of caching the now-stale entry as canonical. - fall back to the per-component scan if the centralized index build throws, preserving fault isolation (one bad dir no longer fails every load).
|
Code review by qodo was updated up to the latest commit db242dc |
…eeds The first cut ignored node_modules entirely, which is a correctness regression: auto-detect resolves imports *through* node_modules and reads each direct dep's package.json (name/componentId), so the cached result depends on node_modules content — ignoring it drops the invalidation safety net. Restrict instead of ignore: scan component source (excluding the deep node_modules subtree) plus each component's `node_modules` and `node_modules/@scope` directory mtimes. A dep add/remove/version-relink/ componentId-change all go through a relink that bumps those dirs, and the dependency-tree builder stops at the package boundary, so the deep tree and transitive store are irrelevant to the cache. Warm `bit status` fs syscalls 74.3k -> 46.9k (~37%); correctness preserved (verified: source, node_modules, and @scope changes all invalidate).
|
Code review by qodo was updated up to the latest commit 3b9a34f |
…mponent scan instead The shared command-scoped index added memoization/invalidation complexity (and the two reliability bugs qodo flagged) for no syscall benefit — batching saves nothing here, and it over-scanned single-component commands (bit show would scan all components to check one). Reverted it (fs-cache, workspace, dependencies-loader back to their original state; deps fs-cache and per-component invalidation untouched). Keep only the actual win: the per-component freshness scan (getLastModifiedDirTimestampMs) now stops at the node_modules boundary and takes just the node_modules + node_modules/@scope dir mtimes — auto-detect never traverses into a package, so the deep tree/transitive store are irrelevant. Warm `bit status` fs syscalls 74.3k -> 46.4k (~37%); correctness verified (source / node_modules / @scope changes all invalidate).
|
Simplified the approach (commit 7e1857b): dropped the shared command-scoped mtime index entirely and instead just restrict the existing per-component freshness scan to the node_modules footprint the cache actually needs. Why: the shared index added memoization/invalidation complexity (including the two reliability issues flagged above) for no syscall benefit — batching saves nothing here (measured: per-component 46.4k vs batched 46.9k), and it over-scanned single-component commands ( Net change is now ~1 file: |
|
Code review by qodo was updated up to the latest commit 7e1857b |
| const scopeDirs = await globby(`${rootDir}/node_modules/@*`, { | ||
| onlyDirectories: true, | ||
| followSymbolicLinks: false, | ||
| }); |
There was a problem hiding this comment.
1. Symlinked scopes not tracked 🐞 Bug ≡ Correctness
getLastModifiedDirTimestampMs() uses globby() with { onlyDirectories: true, followSymbolicLinks:
false } for ${rootDir}/node_modules/@*, which can omit @scope entries that are symbolic links;
changes under those scopes won’t affect the freshness signal and can incorrectly reuse stale
deps-cache.
Agent Prompt
### Issue description
`getLastModifiedDirTimestampMs()` builds `scopeDirs` via `globby(`${rootDir}/node_modules/@*`, { onlyDirectories: true, followSymbolicLinks: false })`. If an `@scope` folder in `node_modules` is represented as a symlink, it may be excluded from `scopeDirs`, so changes inside that scope (e.g., relinks under `@scope/`) won't bump the computed last-modified timestamp and the deps fs-cache can be treated as fresh incorrectly.
### Issue Context
Elsewhere in the repo, node_modules entries (including scoped ones) are explicitly treated as potentially being either directories or symlinks (and scoped entries are recursed into), so the scan should include symlinked `@scope` roots as well.
### Fix Focus Areas
- scopes/toolbox/fs/last-modified/last-modified.ts[17-32]
### Suggested fix
Replace the `globby(`${rootDir}/node_modules/@*`, ...)` call with a non-recursive directory listing of `${rootDir}/node_modules`:
- `readdir(node_modulesPath, { withFileTypes: true })`
- filter entries whose name starts with `@` and where `dirent.isDirectory()` **or** `dirent.isSymbolicLink()`
- add `${rootDir}/node_modules/<scopeName>` paths to `scopeDirs`
This avoids deep traversal while still capturing symlinked scope directories.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
|
Code review by qodo was updated up to the latest commit 5bae110 |
Part of the component-loading redesign (
scopes/workspace/workspace/component-loading-redesign.md, Phase 2).The deps fs-cache freshness check ran a recursive
globbyper component that followed each component'snode_modulessymlink into the shared workspacenode_modules— 226k of 230k scanned entries, run 313× per command. This replaces it with a command-scoped mtime index built from a few batched scans, memoized onFsCacheand invalidated through the workspace's existing clear-cache hooks (sowatch/startstay correct), with a per-component fallback.Restricting the node_modules traversal — correctness matters here. The cached "dependencies-data" is the auto-detect result, which resolves imports through
node_modulesand reads each direct dep'spackage.json(name/componentId), so the cache genuinely depends onnode_modulescontent — it can't simply be ignored. But the dependency-tree builder stops at the package boundary (it never traverses into a package or the transitive store), so the onlynode_modulessignal the cache needs is each component'snode_modulesandnode_modules/@scopedirectory mtimes — a dep added / removed / version-relinked / componentId-changed all go through a relink that bumps those dirs. The deep tree is irrelevant. So the scan is: component source (excluding the deepnode_modulessubtree) + those dir mtimes.Measured on this repo's workspace (~313 components): warm
bit statusfilesystem syscalls 74.3k → 46.9k (~37% fewer);readFiletraffic unchanged (checked against the bootstrap fs-read e2e metric). Correctness verified: source,node_modules, and@scopechanges all invalidate the cache.Warm wall is ~flat on a fast local SSD (the eliminated work is I/O-wait that overlaps with CPU on the single JS thread); the win lands on cold / CI / networked filesystems. §4.1 of the redesign doc has the full breakdown (and corrects an earlier "object materialization" misread — deserialize is 9ms).