poc: WASM/wazero tree-sitter backend (speed + stability vs cgo PR #80) #81

Draft
dvcdsys wants to merge 4 commits into develop from feat/chunker-wasm-treesitter

dvcdsys (Owner) commented Jun 7, 2026

Draft / PoC for comparison — not for merge. Alternative to the cgo backend in #80, to decide direction.

Official tree-sitter C runtime + TypeScript grammar → standalone wasm32-wasi module (zig cc), driven from Go via wazero. No cgo, no JS, no third-party parser — only the wazero host (poc/wasm-treesitter/wasmts.go) is ours.

Speed — same 852-file vscode TS corpus, full-tree walk

| backend | wall | files/s | ERROR trees | editorOptions.ts |
| --- | --- | --- | --- | --- |
| gotreesitter (pure-Go) | 13.83s | 62 | 13 | 8.77s → ERROR |
| WASM (wazero) | ~2.5s | ~330 | 0 | 49ms |
| cgo (native, #80) | 1.26s | 675 | 0 | 17ms |

WASM is ~2× slower than cgo and ~5× faster than gotreesitter, and it parses the whole corpus correctly (0 ERROR trees). The overhead is the per-node host↔guest call boundary, which a batched subtree export could mitigate.

Stability

tree-sitter is robust on adversarial input under both backends. WASM additionally contains guest faults (resource/trap → recoverable Go error, host alive) where cgo would SIGSEGV the whole process. Insurance vs unknown C bugs.

Decision framing

~2× parse cost (largely invisible end-to-end — embeddings dominate) in exchange for CGO_ENABLED=0 builds, crash-isolation, and a likely smaller binary. Cost: engineering effort to build/bundle all 31 grammars + flesh out the node API. Full write-up in poc/wasm-treesitter/README.md.

🤖 Generated with Claude Code

dvcdsys and others added 4 commits June 7, 2026 23:49
Alternative to feat/chunker-cgo-treesitter: the official tree-sitter C runtime
+ TypeScript grammar compiled to a standalone wasm32-wasi reactor module
(build.sh, via zig cc) and driven from Go through wazero — no cgo, no JS, no
third-party parser. Only the wazero host (wasmts.go) is bespoke; the parser is
unmodified upstream C. wasm_store.c is gated by TREE_SITTER_FEATURE_WASM (we
don't define it), so the stock amalgamation compiles to wasi with no stubs.

Measured on the same 852-file vscode TypeScript corpus (full-tree walk):

  backend                     wall    files/s  ERROR trees  editorOptions.ts
  gotreesitter (pure-Go)     13.83s     62        13        8.77s -> ERROR
  WASM (wazero, pure-Go)     ~2.5s     ~330        0         49ms
  cgo (native)                1.26s    675         0         17ms

- WASM ~2x slower than cgo, ~5x faster than gotreesitter, correct (0 errors).
- Overhead is the per-node host<->guest call boundary (~3 calls/node x 2.68M
  nodes), not memory — slot-pooling barely moved it. A batched "serialize
  subtree" export would close most of the gap (future work).
- Stability: tree-sitter is robust on adversarial input under both backends;
  WASM additionally CONTAINS faults (resource/guest trap -> recoverable Go
  error, host alive) where cgo would SIGSEGV the whole process. Insurance vs
  unknown C bugs, not a fix for an observed crash.
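
As a rough back-of-the-envelope check of the call-boundary figures above, the sketch below turns the measured numbers into an average cost per host<->guest call. It assumes, as a simplification not stated in the commit, that the entire wall-time gap between WASM and cgo is boundary overhead; `perCallOverhead` is a hypothetical helper name.

```go
package main

import "fmt"

// perCallOverhead estimates the average cost of one host<->guest call.
// Assumption: the whole wall-time gap between the two backends is call
// boundary overhead, which is an upper bound rather than a measurement.
func perCallOverhead(wasmSec, cgoSec, nodes, callsPerNode float64) (calls, nsPerCall float64) {
	calls = nodes * callsPerNode
	nsPerCall = (wasmSec - cgoSec) / calls * 1e9
	return calls, nsPerCall
}

func main() {
	// ~2.5s (WASM) vs 1.26s (cgo), 2.68M nodes, ~3 calls per node.
	calls, ns := perCallOverhead(2.5, 1.26, 2.68e6, 3)
	fmt.Printf("%.2fM calls, ~%.0f ns per call\n", calls/1e6, ns)
	// prints "8.04M calls, ~154 ns per call"
}
```

Under that assumption, a batched export that returns a whole subtree per call would cut the roughly 8 million crossings to a small multiple of the file count.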

Trade-off vs cgo: ~2x parse cost (largely invisible end-to-end since embeddings
dominate) in exchange for CGO_ENABLED=0 builds, crash-isolation, and a likely
smaller binary; cost is the engineering effort to build/bundle all 31 grammars
and flesh out the node API. README.md has the full comparison.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d skip, doc-comment attachment

Replace gotreesitter with the official tree-sitter C runtime + 31 grammars
compiled to one wasm32-wasi module (ts-core.wasm.br, brotli ~3MB) driven via
wazero. No cgo: traps are contained (parse falls back to sliding window, the
process survives), and the binary stays CGO_ENABLED=0.
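
The fallback described above can be sketched as follows. This is a minimal illustration, not the PR's code: `Chunk`, `parseFunc`, `trappingParser`, and the window sizes are hypothetical, and the parser is a stand-in for the wazero-backed call, which surfaces a guest trap as an ordinary Go error.

```go
package main

import (
	"errors"
	"fmt"
)

// Chunk is a hypothetical chunk record: a byte range of the source file.
type Chunk struct{ Start, End int }

// parseFunc stands in for the wasm-backed parser. A guest trap or resource
// limit is assumed to come back as a non-nil error, never a process crash.
type parseFunc func(src []byte) ([]Chunk, error)

var errGuestTrap = errors.New("guest trap")

// trappingParser simulates a parse whose guest instance trapped.
func trappingParser(src []byte) ([]Chunk, error) { return nil, errGuestTrap }

// slidingWindow splits src into fixed-size windows that overlap by overlap
// bytes. It requires size > overlap so that each step advances.
func slidingWindow(src []byte, size, overlap int) []Chunk {
	var out []Chunk
	step := size - overlap
	for start := 0; start < len(src); start += step {
		end := start + size
		if end > len(src) {
			end = len(src)
		}
		out = append(out, Chunk{start, end})
		if end == len(src) {
			break
		}
	}
	return out
}

// chunkWithFallback tries the parser first and degrades to sliding windows
// on any error. The second result reports whether the fallback was used.
func chunkWithFallback(parse parseFunc, src []byte) ([]Chunk, bool) {
	chunks, err := parse(src)
	if err != nil {
		return slidingWindow(src, 1024, 128), true
	}
	return chunks, false
}

func main() {
	src := make([]byte, 2000)
	chunks, fellBack := chunkWithFallback(trappingParser, src)
	fmt.Println(len(chunks), fellBack) // prints "3 true"
}
```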

Memory design (measured on the prod-shaped churn workload):
- linear memory is mmap-backed (experimental.WithMemoryAllocator) instead of
  wazero's default Go-heap append-grow: no realloc-copy garbage on growth and
  munmap-on-close returns recycled instances' memory to the OS immediately.
  Churn heapSys 1135→391MB, peak RSS 1070→535MB; full-repo chunking peak RSS
  1516→787MB.
- engine pool: hard concurrency cap (dashboard-tunable), 256MiB per-instance
  linear-memory ceiling (2× headroom over the worst measured instance at the
  indexer's 512KiB file cap), high-water-mark recycling, 1 idle instance.
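
One way to picture the engine pool is the sketch below: a channel semaphore for the hard concurrency cap, a bounded idle list, and recycling of any instance whose high-water mark passed a threshold. All type names, fields, and the 64MiB threshold are assumptions for illustration; the real pool manages wazero module instances and closes them on recycle.

```go
package main

import (
	"fmt"
	"sync"
)

// instance is a hypothetical stand-in for one wasm module instance.
type instance struct {
	id        int
	highWater int // largest linear-memory size observed, in bytes
}

type pool struct {
	sem          chan struct{} // hard concurrency cap
	mu           sync.Mutex
	idle         []*instance
	maxIdle      int
	recycleAbove int // instances that grew past this are closed, not reused
	nextID       int
	closed       int // count of recycled instances, kept for illustration
}

func newPool(maxConcurrent, maxIdle, recycleAbove int) *pool {
	return &pool{
		sem:          make(chan struct{}, maxConcurrent),
		maxIdle:      maxIdle,
		recycleAbove: recycleAbove,
	}
}

// acquire blocks until a concurrency slot is free, then reuses an idle
// instance or creates a fresh one.
func (p *pool) acquire() *instance {
	p.sem <- struct{}{}
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.idle); n > 0 {
		inst := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return inst
	}
	p.nextID++
	return &instance{id: p.nextID}
}

// release keeps small instances idle (up to maxIdle) and recycles the rest.
func (p *pool) release(inst *instance) {
	p.mu.Lock()
	if inst.highWater > p.recycleAbove || len(p.idle) >= p.maxIdle {
		p.closed++ // a real pool would close the module here, freeing its memory
	} else {
		p.idle = append(p.idle, inst)
	}
	p.mu.Unlock()
	<-p.sem
}

func main() {
	p := newPool(2, 1, 64<<20)
	a := p.acquire()
	a.highWater = 8 << 20
	p.release(a) // small instance: kept idle
	b := p.acquire()
	fmt.Println("reused:", b.id == a.id)
	b.highWater = 200 << 20
	p.release(b) // grew past the threshold: recycled
	c := p.acquire()
	fmt.Println("fresh:", c.id != b.id, "recycled:", p.closed)
	p.release(c)
}
```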

Chunker quality fixes:
- minified/bundled js/ts/css (.min., .bundle.js, >2KiB lines) skip the parser
  straight to sliding window — the pathological input class that ballooned
  instances for near-zero semantic value.
- a declaration's doc comment now attaches to its chunk (language-agnostic via
  tree-sitter's extra flag + same-row wrapper climb; verified for Go, TS, C,
  Python, Rust, Java). Generated files stop spraying comment-only micro
  chunks: openapi.gen.go 893→517 chunks, median 114→256B, symbols/refs
  byte-identical.
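
The minified/bundled skip rule could look roughly like the sketch below. The function name and the exact matching are assumptions; only the three signals (`.min.`, `.bundle.js`, lines over 2KiB) come from the commit message, and the restriction to js/ts/css files is omitted for brevity.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

const maxLineBytes = 2 << 10 // 2 KiB

// looksMinified reports whether a file should bypass the parser and go
// straight to sliding-window chunking. Hypothetical helper, not the PR's code.
func looksMinified(path string, src []byte) bool {
	name := strings.ToLower(path)
	if strings.Contains(name, ".min.") || strings.HasSuffix(name, ".bundle.js") {
		return true
	}
	// Scan line by line without allocating; any line over the limit counts.
	for len(src) > 0 {
		line := src
		if i := bytes.IndexByte(src, '\n'); i >= 0 {
			line, src = src[:i], src[i+1:]
		} else {
			src = nil
		}
		if len(line) > maxLineBytes {
			return true
		}
	}
	return false
}

func main() {
	long := bytes.Repeat([]byte("x"), 3000)
	fmt.Println(looksMinified("jquery.min.js", nil))        // true
	fmt.Println(looksMinified("app.bundle.js", nil))        // true
	fmt.Println(looksMinified("editor.ts", long))           // true
	fmt.Println(looksMinified("editor.ts", []byte("a\nb"))) // false
}
```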

Memory-stress harnesses are committed but gated behind CIX_MEMSTRESS=1.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…am (OOM fix)

Two new runtime-config fields, end to end (DB migrations 16/17 → runtimecfg →
admin API → openapi → dashboard):

- chunk_max_concurrent — the wasm chunker's instance-concurrency cap,
  decoupled from embedding concurrency; resizes the live limiter without a
  restart. Env: CIX_CHUNK_MAX_CONCURRENT; per-instance memory knobs stay
  env-only (CIX_CHUNK_MEM_LIMIT_PAGES, CIX_CHUNK_RECYCLE_GROWTH_MB,
  CIX_CHUNK_MAX_IDLE).

- llama_cache_ram_mib — llama-server's HOST prompt cache cap (--cache-ram).
  Upstream defaults this to 8 GiB (ggml-org/llama.cpp#16391), which is pure
  waste for an embeddings-only sidecar: prompts are never reused, but the
  cache fills anyway. Observed on prod: llama-server RSS 365MB→11.3GB within
  minutes of indexing vscode@main, then cgroup OOM kill — twice at the 10G
  limit, again at 16G. With --cache-ram 0 (our default; -1 = unlimited) it
  plateaus at ~900MB under the same load. Env: CIX_LLAMA_CACHE_RAM; shown in
  the dashboard's Runtime parameters card, applied via Save & Restart.
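
Resizing a live limiter without a restart, as described for chunk_max_concurrent, can be sketched with a mutex and condition variable. This is an illustrative design under assumed names (`limiter`, `Resize`, `TryAcquire`), not the PR's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a counting semaphore whose capacity can change at runtime.
type limiter struct {
	mu    sync.Mutex
	cond  *sync.Cond
	limit int
	inUse int
}

func newLimiter(n int) *limiter {
	l := &limiter{limit: n}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Acquire blocks until a slot is available under the current limit.
func (l *limiter) Acquire() {
	l.mu.Lock()
	for l.inUse >= l.limit {
		l.cond.Wait()
	}
	l.inUse++
	l.mu.Unlock()
}

// TryAcquire takes a slot only if one is free right now.
func (l *limiter) TryAcquire() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inUse >= l.limit {
		return false
	}
	l.inUse++
	return true
}

// Release frees a slot and wakes one waiter.
func (l *limiter) Release() {
	l.mu.Lock()
	l.inUse--
	l.mu.Unlock()
	l.cond.Signal()
}

// Resize changes the cap. Growing wakes waiters immediately; shrinking takes
// effect as current holders release, so nothing in flight is interrupted.
func (l *limiter) Resize(n int) {
	l.mu.Lock()
	l.limit = n
	l.mu.Unlock()
	l.cond.Broadcast()
}

func main() {
	l := newLimiter(1)
	fmt.Println(l.TryAcquire()) // true
	fmt.Println(l.TryAcquire()) // false: cap is 1
	l.Resize(2)
	fmt.Println(l.TryAcquire()) // true: cap raised while in use
}
```

A fixed-size buffered channel is simpler but cannot change capacity in place, which is why this sketch uses a counter guarded by a condition variable.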

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…vation

A full-reindex wipe ran as ONE transaction: DELETE of all refs/symbols/
file_hashes plus the trigram-FTS rows. On a vscode-sized project (~445k refs,
tens of thousands of FTS rows — each FTS delete re-tokenizes its content)
that held SQLite's single writer for minutes, starving every concurrent
writer past busy_timeout. Prod symptom: the jobs worker logged
`claim failed: SQLITE_BUSY` on every 5s poll tick for the whole wipe.

- BeginIndexing full wipe: file_hashes first (its own statement — once gone,
  every file looks dirty, so a crash mid-wipe just resumes on the next run),
  then symbols/refs in 20k-row batches, then chunks_fts/chunks_meta via the
  batched chunksfts.DeleteByProject (500 rows per tx — FTS deletes are the
  expensive ones). The writer is released between batches.
- projects.Delete: same batched FTS wipe, project row deleted last so a
  failed wipe is resumable.
- jobs worker: SQLITE_BUSY on claim is expected contention, not a fault —
  log the streak start as WARN with a once-a-minute heartbeat instead of an
  ERROR per tick, and log when it clears.
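
The batched wipe can be illustrated with the sketch below. Each `ExecContext` call on a `*sql.DB` runs as its own implicit transaction, so the SQLite writer is released between batches. The `project_id` column, the `rowid` subquery, and the helper names are assumptions; `fakeDB` exists only so the sketch runs without a SQLite driver.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB used here; *sql.DB satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// deleteInBatches removes one project's rows from table, batchSize rows per
// statement. table is interpolated, so it must be a trusted constant.
func deleteInBatches(ctx context.Context, db execer, table string, projectID int64, batchSize int) (int64, error) {
	q := fmt.Sprintf(
		"DELETE FROM %s WHERE rowid IN (SELECT rowid FROM %s WHERE project_id = ? LIMIT ?)",
		table, table)
	var total int64
	for {
		res, err := db.ExecContext(ctx, q, projectID, batchSize)
		if err != nil {
			return total, err
		}
		n, err := res.RowsAffected()
		if err != nil {
			return total, err
		}
		total += n
		if n < int64(batchSize) {
			return total, nil // a short batch means nothing is left
		}
	}
}

// fakeDB simulates a table with a fixed number of matching rows.
type fakeDB struct {
	rows  int64
	calls int
}

type affected int64

func (a affected) LastInsertId() (int64, error) { return 0, nil }
func (a affected) RowsAffected() (int64, error) { return int64(a), nil }

func (f *fakeDB) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	f.calls++
	n := int64(args[1].(int)) // args are (projectID, batchSize)
	if n > f.rows {
		n = f.rows
	}
	f.rows -= n
	return affected(n), nil
}

// runDemo wipes a fake table and reports rows deleted and statements issued.
func runDemo(rows int64, batch int) (int64, int, error) {
	db := &fakeDB{rows: rows}
	total, err := deleteInBatches(context.Background(), db, "refs", 1, batch)
	return total, db.calls, err
}

func main() {
	fmt.Println(runDemo(45000, 20000)) // prints "45000 3 <nil>"
}
```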

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>