6.4x speedup of AST materialization in JS API#2835
6.4x speedup of AST materialization in JS API#2835auvred wants to merge 9 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes the _packages/api JS client-side “remote AST” materialization path, aiming to significantly reduce the overhead of constructing RemoteSourceFile/RemoteNode trees, and updates the benchmarks to measure that materialization work more directly.
Changes:
- Updates the sync/async benchmarks to “materialize” by reconstructing a fresh
RemoteSourceFilefrom the underlying binary buffer instead of walking/caching children. - Refactors
RemoteNode/RemoteNodeListto reduce per-node/list work (e.g.,idbecomes a getter, children are sourced from a pre-builtnodesarray, and index accessors are adjusted). - Adds a hot-loop initialization in
RemoteSourceFileto pre-create node wrappers and avoid per-access map caching.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
_packages/api/test/api.sync.bench.ts |
Adjusts materialization benchmark to reconstruct RemoteSourceFile from the binary backing data. |
_packages/api/test/api.async.bench.ts |
Same as sync benchmark, but for the async API bench harness. |
_packages/api/src/node.ts |
Implements new materialization strategy (pre-built nodes array), changes id to getters, and revises RemoteNodeList index accessor behavior. |
| return this.at(0); | ||
| } | ||
| get 1(): RemoteNode { | ||
| return this.at(1); | ||
| } | ||
| get 2(): RemoteNode { | ||
| return this.at(2); | ||
| } | ||
| get 3(): RemoteNode { | ||
| return this.at(3); | ||
| } | ||
| get 4(): RemoteNode { | ||
| return this.at(4); | ||
| } | ||
| get 5(): RemoteNode { | ||
| return this.at(5); | ||
| } | ||
| get 6(): RemoteNode { | ||
| return this.at(6); | ||
| } | ||
| get 7(): RemoteNode { | ||
| return this.at(7); |
There was a problem hiding this comment.
RemoteNodeList now defines fixed index getters (get 0() … get 15()) unconditionally. For lists with length < 16 (including length === 0), these getters will still call this.at(n) and can return an unrelated node instead of undefined, changing normal array out-of-bounds behavior (e.g. list[0] on an empty list). Add bounds checks in these getters (or only define them when n < this.length).
| return this.at(0); | |
| } | |
| get 1(): RemoteNode { | |
| return this.at(1); | |
| } | |
| get 2(): RemoteNode { | |
| return this.at(2); | |
| } | |
| get 3(): RemoteNode { | |
| return this.at(3); | |
| } | |
| get 4(): RemoteNode { | |
| return this.at(4); | |
| } | |
| get 5(): RemoteNode { | |
| return this.at(5); | |
| } | |
| get 6(): RemoteNode { | |
| return this.at(6); | |
| } | |
| get 7(): RemoteNode { | |
| return this.at(7); | |
| return this.length > 0 ? this.at(0) : (undefined as any); | |
| } | |
| get 1(): RemoteNode { | |
| return this.length > 1 ? this.at(1) : (undefined as any); | |
| } | |
| get 2(): RemoteNode { | |
| return this.length > 2 ? this.at(2) : (undefined as any); | |
| } | |
| get 3(): RemoteNode { | |
| return this.length > 3 ? this.at(3) : (undefined as any); | |
| } | |
| get 4(): RemoteNode { | |
| return this.length > 4 ? this.at(4) : (undefined as any); | |
| } | |
| get 5(): RemoteNode { | |
| return this.length > 5 ? this.at(5) : (undefined as any); | |
| } | |
| get 6(): RemoteNode { | |
| return this.length > 6 ? this.at(6) : (undefined as any); | |
| } | |
| get 7(): RemoteNode { | |
| return this.length > 7 ? this.at(7) : (undefined as any); |
There was a problem hiding this comment.
This is consistent with old approach.
typescript-go/_packages/api/src/node.ts
Lines 309 to 313 in 12e2936
There was a problem hiding this comment.
Fair, but it looks like at itself may have a bug for out-of-bounds indexes, which is now replicated onto accessors from 0–16. Would you mind adding the bounds check into at?
This is a good way to make the benchmark faster, but that benchmark is intended to represent a worst-case scenario where someone touches every node. It was designed to be lazy on purpose. Imagine this being used in an auxiliary LSP server where someone registers some additional completions or refactors. Those handlers start with a document position, so in a file like checker.ts, a realistic implementation will likely only have to materialize a tiny, tiny fraction of all the nodes in the file during any given request. |
I'm currently researching the performance of tsgo-powered JS lint rules. From my private evaluations (which I plan to opensource soon), lint rules tend to cumulatively touch up to 80% of all AST nodes (with a few exception rules that touch almost 100% of AST nodes). Maybe we can add some kind of opt-in eager materialization for these usecases? On |
|
Yeah, an option to do that would be great! It makes sense that a linter would want to look at the whole file. |
|
Ok, it turns out that full AST traversal + lazy materialization is now only a few percent slower than bulk materialization + full AST traversal. Lint rules still have to do a full AST traversal, so there's no big win from bulk materialization. I reverted |
| super(view, decoder, 1, undefined!, {} as unknown as RemoteSourceFile); | ||
| this.sourceFile = this; | ||
| this.nodes = Array((this.view.byteLength - this.offsetNodes) / NODE_LEN); | ||
| this.nodes[0] = new RemoteNode(this.view, this.decoder, 0, this, this); |
There was a problem hiding this comment.
Is this right? The 0-index slot is supposed to indicate an undefined node. I would have thought something would have broken 🤔
| get id(): string { | ||
| const extendedDataOffset = this.offsetExtendedData + (this.data & NODE_EXTENDED_DATA_MASK); | ||
| return this.getString(this.view.getUint32(extendedDataOffset + 12, true)); | ||
| } |
There was a problem hiding this comment.
I just merged something that will conflict with this in #2831, sorry. Should be a straightforward removal.
ida getterRemoteNodeListonly for lists with >15 nodesRemoteNodeconstructor callinitialize all source file nodes in one hot loopOn
main, thematerialize program.tsandmaterialize checker.tsbenchmarks are incorrect because they don't reset the source file's_childrencache.To compare this PR with
main, addfile._children = undefinedto the benchmark code, like so:Benchmarks on
main(incorrect):Benchmarks on
main(fixed):Benchmarks on this branch:
parser.ts- 16.393ms -> 2.553ms (6.42x)checker.ts- 250.616ms -> 38.941ms (6.43x)