`.agent/notes/bench-report.md` (new file, +170 lines)
# SQLite Benchmark Report

Generated: 2026-04-02 21:48:19 UTC
Engine: http://127.0.0.1:6420
Mode: quick

## Results

| Benchmark | Native (ms) | WASM (ms) | Speedup | Native/op | WASM/op |
|-----------|------------:|----------:|--------:|----------:|--------:|
| Migration (50 tables + indexes) | 21.4 | 89.6 | **4.2x** | 0.428ms/op | 1.793ms/op |
| Migration TX (50 tables + indexes) | 10.5 | 15.5 | **1.5x** | 0.209ms/op | 0.310ms/op |
| Insert single x1 | 2.9 | 6.3 | **2.2x** | 2.883ms/op | 6.268ms/op |
| Insert single x10 | 21.1 | 33.4 | **1.6x** | 2.110ms/op | 3.341ms/op |
| Insert single x100 | 129.2 | 158.0 | **1.2x** | 1.292ms/op | 1.580ms/op |
| Insert single x1000 | 665.0 | 1024.4 | **1.5x** | 0.665ms/op | 1.024ms/op |
| Insert single x10000 | 7525.5 | 21641.3 | **2.9x** | 0.753ms/op | 2.164ms/op |
| Insert batch x1 | 38.1 | 4.6 | 0.1x | 38.078ms/op | 4.565ms/op |
| Insert batch x10 | 3.6 | 5.6 | **1.6x** | 0.359ms/op | 0.564ms/op |
| Insert batch x100 | 7.0 | 10.4 | **1.5x** | 0.070ms/op | 0.104ms/op |
| Insert batch x1000 | 72.2 | 90.9 | **1.3x** | 0.072ms/op | 0.091ms/op |
| Insert batch x10000 | 768.9 | 959.7 | **1.2x** | 0.077ms/op | 0.096ms/op |
| Insert TX x1 | 5.8 | 3.6 | 0.6x | 5.769ms/op | 3.572ms/op |
| Insert TX x10 | 1.7 | 1.2 | 0.7x | 0.171ms/op | 0.119ms/op |
| Insert TX x100 | 10.7 | 7.9 | 0.7x | 0.107ms/op | 0.079ms/op |
| Insert TX x1000 | 29.3 | 29.5 | ~1.0x | 0.029ms/op | 0.029ms/op |
| Insert TX x10000 | 560.0 | 442.0 | 0.8x | 0.056ms/op | 0.044ms/op |
| Point read x1 | 0.1 | 0.4 | **2.9x** | 0.127ms/op | 0.375ms/op |
| Point read x10 | 0.2 | 0.4 | **1.6x** | 0.023ms/op | 0.035ms/op |
| Point read x100 | 4.5 | 1.7 | 0.4x | 0.045ms/op | 0.017ms/op |
| Point read x1000 | 21.4 | 14.1 | 0.7x | 0.021ms/op | 0.014ms/op |
| Point read x10000 | 196.3 | 122.3 | 0.6x | 0.020ms/op | 0.012ms/op |
| Full scan (500 rows) | 0.5 | 1.2 | **2.6x** | - | - |
| Range scan indexed (row count not recorded) | 0.4 | 0.4 | **1.1x** | - | - |
| Range scan unindexed (row count not recorded) | 0.7 | 0.5 | 0.8x | - | - |
| Large payload insert (4KB x 100) | 47.3 | 25.4 | 0.5x | 0.473ms/op | 0.254ms/op |
| Large payload read (4KB x 100) | 24.5 | 4.1 | 0.2x | - | - |
| Large payload insert (32KB x 20) | 146.3 | 307.5 | **2.1x** | 7.315ms/op | 15.376ms/op |
| Large payload read (32KB x 20) | 42.6 | 4.4 | 0.1x | - | - |
| Complex: join (200 rows) | 0.3 | 0.9 | **3.1x** | - | - |
| Complex: aggregation (10 rows) | 0.3 | 1.3 | **4.2x** | - | - |
| Complex: cte_window (50 rows) | 0.4 | 2.2 | **5.3x** | - | - |
| Complex: subquery (100 rows) | 0.2 | 0.6 | **3.4x** | - | - |
| Bulk update (~250 rows) | 1.0 | 1.9 | **2.0x** | - | - |
| Bulk delete (~250 rows) | 3.5 | 3.0 | 0.9x | - | - |
| VACUUM after delete | 5.5 | 6.8 | **1.2x** | - | - |
| Mixed OLTP x1 (1R/0W) | 0.1 | 0.1 | ~1.0x | 0.111ms/op | 0.115ms/op |
| Mixed OLTP x10 (8R/2W) | 1.9 | - | - | 0.189ms/op | - |
| Mixed OLTP x100 (75R/25W) | 48.6 | - | - | 0.486ms/op | - |
| Mixed OLTP x1000 (698R/302W) | 309.9 | - | - | 0.310ms/op | - |
| Mixed OLTP x10000 (7016R/2984W) | 2693.0 | - | - | 0.269ms/op | - |
| Hot row updates x1 | 0.7 | 3.5 | **5.1x** | 0.687ms/op | 3.477ms/op |
| Hot row updates x10 | 3.9 | 27.0 | **6.9x** | 0.392ms/op | 2.705ms/op |
| Hot row updates x100 | 33.8 | 94.1 | **2.8x** | 0.338ms/op | 0.941ms/op |
| Hot row updates x1000 | 621.3 | 1645.7 | **2.6x** | 0.621ms/op | 1.646ms/op |
| Hot row updates x10000 | 6628.4 | 18517.6 | **2.8x** | 0.663ms/op | 1.852ms/op |
| JSON insert x100 | 17.1 | 9.0 | 0.5x | 0.171ms/op | 0.090ms/op |
| JSON extract query (58 rows) | 0.4 | 5.0 | **13.5x** | - | - |
| JSON each aggregation (5 groups) | 0.3 | 4.5 | **13.3x** | - | - |
| FTS5 insert x100 | 6.6 | - | - | 0.066ms/op | - |
| FTS5 search (50 hits) | 0.5 | - | - | - | - |
| FTS5 prefix search (50 hits) | 0.5 | - | - | - | - |
| Growth @500 rows: insert batch | 24.2 | 58.1 | **2.4x** | 0.048ms/op | 0.116ms/op |
| Growth @500 rows: 100 point reads | 2.1 | 6.0 | **2.8x** | 0.021ms/op | 0.060ms/op |
| Growth @1000 rows: insert batch | 9.7 | 11.1 | **1.1x** | 0.019ms/op | 0.022ms/op |
| Growth @1000 rows: 100 point reads | 1.7 | 1.3 | 0.8x | 0.017ms/op | 0.013ms/op |
| Growth @1500 rows: insert batch | 10.1 | 8.8 | 0.9x | 0.020ms/op | 0.018ms/op |
| Growth @1500 rows: 100 point reads | 1.5 | 1.8 | **1.2x** | 0.015ms/op | 0.018ms/op |
| Growth @2000 rows: insert batch | 8.7 | 9.3 | ~1.0x | 0.017ms/op | 0.019ms/op |
| Growth @2000 rows: 100 point reads | 1.5 | 1.8 | **1.2x** | 0.015ms/op | 0.018ms/op |
| Concurrent 5 actors total wall time | 341.0 | - | - | 500 total rows | - |
| Concurrent 5 actors avg per-actor | 12.5 | - | - | 0.125ms/op | - |
| [baseline] Migration (50 tables) | 7.3 | 2.2 | 0.3x | 0.145ms/op | 0.043ms/op |
| [baseline] Migration TX (50 tables) | 3.3 | 2.0 | 0.6x | 0.067ms/op | 0.041ms/op |
| [baseline] Insert single-row x100 | 1.9 | 2.0 | ~1.0x | 0.019ms/op | 0.020ms/op |
| [baseline] Insert TX x100 | 0.2 | 0.3 | **1.3x** | 0.002ms/op | 0.003ms/op |
| [baseline] Point read x100 | 0.2 | 0.3 | **1.3x** | 0.002ms/op | 0.003ms/op |
| [baseline] Full scan (200 rows) | 0.1 | 0.1 | **1.3x** | - | - |
| [baseline] Hot row updates x100 | 7.3 | 6.3 | 0.9x | 0.073ms/op | 0.063ms/op |
| [baseline] Mixed OLTP x100 (66R/34W) | 0.6 | - | - | 0.006ms/op | - |
| Mixed OLTP x10 (6R/4W) | - | 3.7 | - | - | 0.366ms/op |
| Mixed OLTP x100 (73R/27W) | - | 59.3 | - | - | 0.593ms/op |
| Mixed OLTP x1000 (693R/307W) | - | 798.3 | - | - | 0.798ms/op |
| Mixed OLTP x10000 (7053R/2947W) | - | 6302.8 | - | - | 0.630ms/op |
| Concurrent x1 wall | - | 118.7 | - | - | 100 total rows |
| Concurrent x1 avg/actor | - | 7.9 | - | - | 0.079ms/op |
| Concurrent x1 throughput | - | 842.7 | - | - | rows/sec |
| Concurrent x5 wall | - | 575.3 | - | - | 500 total rows |
| Concurrent x5 avg/actor | - | 80.8 | - | - | 0.808ms/op |
| Concurrent x5 throughput | - | 869.1 | - | - | rows/sec |
| Concurrent x10 wall | - | 1231.3 | - | - | 1000 total rows |
| Concurrent x10 avg/actor | - | 350.4 | - | - | 3.504ms/op |
| Concurrent x10 throughput | - | 812.2 | - | - | rows/sec |
| Concurrent x50 wall | - | 4907.3 | - | - | 5000 total rows |
| Concurrent x50 avg/actor | - | 1432.0 | - | - | 14.320ms/op |
| Concurrent x50 throughput | - | 1018.9 | - | - | rows/sec |
| Concurrent x100 wall | - | 10907.3 | - | - | 10000 total rows |
| Concurrent x100 avg/actor | - | 2006.7 | - | - | 20.067ms/op |
| Concurrent x100 throughput | - | 916.8 | - | - | rows/sec |
| [baseline] Mixed OLTP x100 (64R/36W) | - | 0.6 | - | - | 0.006ms/op |

### Insert single (scale sweep)

| N | Native (ms) | Native/op | WASM (ms) | WASM/op | Speedup |
|--:|------------:|----------:|----------:|--------:|--------:|
| 1 | 2.9 | 2.883ms/op | 6.3 | 6.268ms/op | **2.2x** |
| 10 | 21.1 | 2.110ms/op | 33.4 | 3.341ms/op | **1.6x** |
| 100 | 129.2 | 1.292ms/op | 158.0 | 1.580ms/op | **1.2x** |
| 1000 | 665.0 | 0.665ms/op | 1024.4 | 1.024ms/op | **1.5x** |
| 10000 | 7525.5 | 0.753ms/op | 21641.3 | 2.164ms/op | **2.9x** |

### Insert batch (scale sweep)

| N | Native (ms) | Native/op | WASM (ms) | WASM/op | Speedup |
|--:|------------:|----------:|----------:|--------:|--------:|
| 1 | 38.1 | 38.078ms/op | 4.6 | 4.565ms/op | 0.1x |
| 10 | 3.6 | 0.359ms/op | 5.6 | 0.564ms/op | **1.6x** |
| 100 | 7.0 | 0.070ms/op | 10.4 | 0.104ms/op | **1.5x** |
| 1000 | 72.2 | 0.072ms/op | 90.9 | 0.091ms/op | **1.3x** |
| 10000 | 768.9 | 0.077ms/op | 959.7 | 0.096ms/op | **1.2x** |

### Insert TX (scale sweep)

| N | Native (ms) | Native/op | WASM (ms) | WASM/op | Speedup |
|--:|------------:|----------:|----------:|--------:|--------:|
| 1 | 5.8 | 5.769ms/op | 3.6 | 3.572ms/op | 0.6x |
| 10 | 1.7 | 0.171ms/op | 1.2 | 0.119ms/op | 0.7x |
| 100 | 10.7 | 0.107ms/op | 7.9 | 0.079ms/op | 0.7x |
| 1000 | 29.3 | 0.029ms/op | 29.5 | 0.029ms/op | ~1.0x |
| 10000 | 560.0 | 0.056ms/op | 442.0 | 0.044ms/op | 0.8x |

### Point read (scale sweep)

| N | Native (ms) | Native/op | WASM (ms) | WASM/op | Speedup |
|--:|------------:|----------:|----------:|--------:|--------:|
| 1 | 0.1 | 0.127ms/op | 0.4 | 0.375ms/op | **2.9x** |
| 10 | 0.2 | 0.023ms/op | 0.4 | 0.035ms/op | **1.6x** |
| 100 | 4.5 | 0.045ms/op | 1.7 | 0.017ms/op | 0.4x |
| 1000 | 21.4 | 0.021ms/op | 14.1 | 0.014ms/op | 0.7x |
| 10000 | 196.3 | 0.020ms/op | 122.3 | 0.012ms/op | 0.6x |

### Mixed OLTP (scale sweep)

| N | Native (ms) | Native/op | WASM (ms) | WASM/op | Speedup |
|--:|------------:|----------:|----------:|--------:|--------:|
| 1 | 0.1 | 0.111ms/op | 0.1 | 0.115ms/op | ~1.0x |
| 10 | 1.9 | 0.189ms/op | 3.7 | 0.366ms/op | - |
| 100 | 48.6 | 0.486ms/op | 59.3 | 0.593ms/op | - |
| 1000 | 309.9 | 0.310ms/op | 798.3 | 0.798ms/op | - |
| 10000 | 2693.0 | 0.269ms/op | 6302.8 | 0.630ms/op | - |

Native and WASM runs sampled their read/write mixes independently (e.g. 8R/2W vs 6R/4W at N=10), so per-row speedups above N=1 are omitted.

### Hot row updates (scale sweep)

| N | Native (ms) | Native/op | WASM (ms) | WASM/op | Speedup |
|--:|------------:|----------:|----------:|--------:|--------:|
| 1 | 0.7 | 0.687ms/op | 3.5 | 3.477ms/op | **5.1x** |
| 10 | 3.9 | 0.392ms/op | 27.0 | 2.705ms/op | **6.9x** |
| 100 | 33.8 | 0.338ms/op | 94.1 | 0.941ms/op | **2.8x** |
| 1000 | 621.3 | 0.621ms/op | 1645.7 | 1.646ms/op | **2.6x** |
| 10000 | 6628.4 | 0.663ms/op | 18517.6 | 1.852ms/op | **2.8x** |

## Totals

- **Native total**: 21.2s
- **WASM total**: 78.7s
- **Overall speedup**: 3.7x
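As a quick arithmetic check, the overall figure is just the ratio of the two totals:

```typescript
// Sanity-check the overall speedup from the totals above.
const nativeTotalSec = 21.2;
const wasmTotalSec = 78.7;
const overallSpeedup = wasmTotalSec / nativeTotalSec; // ≈ 3.71
console.log(overallSpeedup.toFixed(1)); // → "3.7"
```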
`.agent/notes/native-sqlite-kv-channel-bench-results.md` (new file, +154 lines)
# Native SQLite KV Channel Benchmark Results

## 2026-04-01: Local Manager (file-system driver)

Both sides run `driver-file-system.test.ts` with the `Actor Database` filter.
KV ops go through `FileSystemManagerDriver` → `FileSystemGlobalState` → in-memory SQLite via `better-sqlite3`.
No Rust engine involved.

**Before**: main branch (`2e29ab3f8`), WASM VFS only
**After**: `native-sqlite-kv-channel` branch (`4c5ec7a78` + fixes), native SQLite via KV channel WebSocket

### Results (http/bare encoding)

| Test | WASM (ms) | Native (ms) | Speedup |
|------|-----------|-------------|---------|
| bootstraps schema on startup (raw) | 881 | 225 | **3.9x** |
| bootstraps schema on startup (drizzle) | 858 | 324 | **2.6x** |
| CRUD + multi-statement exec (drizzle) | 771 | 858 | 0.9x |
| handles transactions (drizzle) | 2383 | 653 | **3.6x** |
| persists across sleep/wake (drizzle) | 3071 | 1613 | **1.9x** |
| onDisconnect DB writes (drizzle) | 2819 | 1254 | **2.2x** |
| high-volume inserts (drizzle) | 1862 | 343 | **5.4x** |
| shrink+regrow with vacuum (drizzle) | 868 | 883 | ~1.0x |
| repeated updates same row (drizzle) | 4115 | 4492 | ~0.9x |
| integrity checks mixed workload (drizzle) | 3778 | 2211 | **1.7x** |

### Summary

- 2-5x speedups on most DB operations
- Biggest gains: high-volume inserts (5.4x), transactions (3.6x), startup (3.9x)
- Roughly equivalent: vacuum, repeated single-row updates (dominated by I/O, not VFS overhead)
- Slight regression on simple CRUD (KV channel round-trip overhead for small ops)

### Notes

- Native raw DB tests fail due to singleton KV channel lifecycle issues in the test framework: each test starts a new server on a different port, but the singleton channel can only point at one at a time
- Drizzle tests pass because they use the same native SQLite path
- The "native" speedup is native SQLite (napi-rs) vs WASM SQLite (`@rivetkit/sqlite`), both storing pages in the same local KV store

---

## 2026-04-01: Real Rust Engine (native SQLite via KV channel)

Runner connected to local Rust engine on 127.0.0.1:6420 (RocksDB backend).
Native SQLite addon connects to engine's `/kv/connect` endpoint via WebSocket.
KV ops go through guard → pegboard-kv-channel → universaldb (RocksDB).

**Branch**: `native-sqlite-kv-channel` (`4c5ec7a78` + fixes)
**Config**: `--quick` mode (100-500 rows), `runnerKind: "normal"`, `runEngine: false`

### Results (kitchen-sink/scripts/bench-sqlite.ts)

| Benchmark | Time (ms) | Per-Op |
|-----------|-----------|--------|
| Migration (50 tables + indexes) | 2192.2 | 43.8ms/op |
| Insert single-row x100 | 4614.2 | 46.1ms/op |
| Insert batch x100 (batch=50) | 92.3 | 0.9ms/op |
| Insert transaction x100 | 49.9 | 0.5ms/op |
| Point read x100 | 302.8 | 3.0ms/op |
| Full scan (500 rows) | 3.7 | - |
| Range scan indexed | 3.0 | - |
| Range scan unindexed | 3.2 | - |
| Large payload insert (4KB x 100) | 191.9 | 1.9ms/op |
| Large payload read (4KB x 100) | 120.1 | - |
| Large payload insert (32KB x 20) | 246.5 | 12.3ms/op |
| Large payload read (32KB x 20) | 195.6 | - |
| Complex: join (200 rows) | 3.4 | - |
| Complex: aggregation (10 rows) | 3.8 | - |
| Complex: cte_window (50 rows) | 4.8 | - |
| Complex: subquery (100 rows) | 3.8 | - |
| Bulk update (~250 rows) | 66.3 | - |
| Bulk delete (~250 rows) | 93.7 | - |
| VACUUM after delete | 80.4 | - |
| Mixed OLTP x100 (76R/24W) | 1638.1 | 16.4ms/op |
| Hot row updates x100 | 3594.2 | 35.9ms/op |
| JSON insert x100 | 38.4 | 0.4ms/op |
| JSON extract query (58 rows) | 3.1 | - |
| JSON each aggregation (5 groups) | 3.4 | - |
| FTS5 insert x100 | 69.4 | 0.7ms/op |
| FTS5 search (50 hits) | 3.1 | - |
| FTS5 prefix search (50 hits) | 3.7 | - |
| Growth @2000 rows: insert batch | 141.1 | 0.3ms/op |
| Growth @2000 rows: 100 point reads | 317.7 | 3.2ms/op |
| Concurrent 5 actors total wall | 880.7 | 500 total rows |
| Concurrent 5 actors avg per-actor | 114.3 | 1.1ms/op |

**Total benchmark time: 16,345ms**

### Observations

- Single-row inserts are expensive (46ms/op) due to per-op KV channel round-trip through WebSocket to engine
- Batch inserts are ~50x faster (0.9ms/op) because they amortize round-trip cost
- Transactions are very fast (0.5ms/op) because SQLite journals locally and only flushes KV at commit
- Read-heavy queries (scans, joins, aggregations) are <5ms since pages are cached locally in the VFS
- Point reads cost ~3ms/op due to KV channel round-trip for uncached pages
- FTS5 works (native SQLite has the extension compiled in)
- 5-actor concurrency shows good parallelism (880ms wall time for 500 total rows vs 4614ms for 100 serial inserts)
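The amortization described above can be sketched with a toy cost model. The constants below are illustrative assumptions, not measured values; the batch size of 50 matches the engine bench configuration:

```typescript
// Illustrative cost model for the batch-vs-single amortization above.
// Constants are assumptions for the sketch, not measured values.
const roundTripMs = 45; // assumed WebSocket round-trip to the engine
const perRowMs = 0.4;   // assumed local SQLite work per row

// Single-row inserts pay one round-trip per row; batches pay one per flush.
const singleInsertPerOp = (n: number) => (n * (roundTripMs + perRowMs)) / n;
const batchInsertPerOp = (n: number, batchSize: number) => {
  const flushes = Math.ceil(n / batchSize);
  return (flushes * roundTripMs + n * perRowMs) / n;
};

console.log(singleInsertPerOp(100).toFixed(1));    // ≈45.4 ms/op, near the measured 46.1ms/op
console.log(batchInsertPerOp(100, 50).toFixed(1)); // ≈1.3 ms/op, same order as the measured 0.9ms/op
```

The model is crude, but it captures why per-op cost collapses once the round-trip is paid once per batch instead of once per row.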

### WASM vs Native comparison (both against real Rust engine)

| Benchmark | WASM (ms) | Native (ms) | Speedup |
|-----------|-----------|-------------|---------|
| Migration (50 tables) | 8,438 | 2,192 | **3.8x** |
| Insert single-row x100 | 9,737 | 4,614 | **2.1x** |
| Insert batch x100 | 210 | 92 | **2.3x** |
| Insert transaction x100 | 107 | 50 | **2.1x** |
| Point read x100 | 703 | 303 | **2.3x** |
| Full scan (500 rows) | 11 | 4 | **2.8x** |
| Range scan indexed | 7 | 3 | **2.3x** |
| Range scan unindexed | 9 | 3 | **3.0x** |
| Large payload insert (4KB x100) | 383 | 192 | **2.0x** |
| Large payload read (4KB x100) | 9 | 120 | 0.1x (regression) |
| Large payload insert (32KB x20) | 471 | 247 | **1.9x** |
| Large payload read (32KB x20) | 9 | 196 | 0.05x (regression) |
| Complex: join (200 rows) | 7 | 3 | **2.3x** |
| Complex: aggregation | 7 | 4 | **1.8x** |
| Complex: cte_window | 9 | 5 | **1.8x** |
| Bulk update (~250 rows) | 124 | 66 | **1.9x** |
| Bulk delete (~250 rows) | 175 | 94 | **1.9x** |
| VACUUM | 157 | 80 | **2.0x** |
| Mixed OLTP x100 | 4,288 | 1,638 | **2.6x** |
| Hot row updates x100 | 7,093 | 3,594 | **2.0x** |
| JSON insert x100 | 86 | 38 | **2.3x** |
| FTS5 insert x100 | N/A | 69 | native only |
| Growth @2000: insert batch | 251 | 141 | **1.8x** |
| Growth @2000: point reads | 679 | 318 | **2.1x** |
| Concurrent 5 actors | 1,101 | 881 | **1.2x** |
| **Total** | **36,979** | **16,345** | **2.3x** |

Large payload reads regress because native SQLite does a full KV round-trip per page read while WASM caches pages in-process. This needs investigation.

### Three-way comparison: Baseline vs Native vs WASM (engine v2.2.0, debug build, RocksDB)

| Benchmark | Baseline (raw) | Native KV | WASM VFS | Native vs WASM | vs Baseline |
|-----------|---------------:|-----------:|---------:|---------------:|------------:|
| Migration (50 tables) | 2.1ms | 2,256ms | 8,438ms | **3.7x** | 1,074x |
| Migration TX (50 tables) | 2.0ms | **136ms** | N/A | N/A | 68x |
| Insert single-row x100 | 1.5ms | 4,719ms | 9,737ms | **2.1x** | 3,146x |
| Insert batch x100 | N/A | 95ms | 210ms | **2.2x** | N/A |
| Insert TX x100 | 0.2ms | 51ms | 107ms | **2.1x** | 256x |
| Point read x100 | 0.2ms | 310ms | 703ms | **2.3x** | 1,550x |
| Full scan (500 rows) | 0.1ms | 3.7ms | 11ms | **2.8x** | 37x |
| Hot row updates x100 | 5.8ms | 3,633ms | 7,093ms | **2.0x** | 626x |
| Mixed OLTP x100 | 0.5ms | 2,446ms | 4,288ms | **1.8x** | 4,892x |

Key finding: wrapping the migration in BEGIN/COMMIT gives a **16.6x speedup** (136ms vs 2,256ms).
Against the raw baseline, per-operation KV channel round-trips add roughly 37x to nearly 5,000x overhead; batching and transactions amortize this cost effectively.
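A minimal sketch of why the transaction helps, assuming a VFS that buffers dirty pages and flushes them over the KV channel; the `KvVfs` class here is hypothetical, for illustration only:

```typescript
// Hypothetical page store illustrating flush amortization.
// Outside a transaction every statement flushes its dirty pages;
// inside one, all writes flush once at COMMIT.
class KvVfs {
  flushes = 0;
  private inTx = false;
  private dirty = new Set<number>();

  begin() { this.inTx = true; }
  write(page: number) {
    this.dirty.add(page);
    if (!this.inTx) this.flush(); // autocommit: one round-trip per statement
  }
  commit() { this.inTx = false; this.flush(); }
  private flush() { if (this.dirty.size) { this.flushes++; this.dirty.clear(); } }
}

// 50 CREATE TABLE statements in autocommit: 50 flushes.
const auto = new KvVfs();
for (let i = 0; i < 50; i++) auto.write(i);

// The same 50 statements inside BEGIN/COMMIT: 1 flush.
const tx = new KvVfs();
tx.begin();
for (let i = 0; i < 50; i++) tx.write(i);
tx.commit();

console.log(auto.flushes, tx.flushes); // 50 1
```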

### Fixes required to run against engine

1. Native addon must request the `Sec-WebSocket-Protocol: rivet` subprotocol; guard adds it unconditionally to responses, and a WebSocket client must fail the handshake if the server selects a protocol it never offered
2. Native addon needs `connected_notify` to wait for initial WebSocket connection before sending KV ops
3. URL fix: the TS side should not append `/kv/connect`, since the Rust `build_ws_url` already appends it
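Fix 3 can be sketched as a defensive guard on the TS side (the `toWsBase` helper is hypothetical; the real path handling lives in the Rust `build_ws_url`):

```typescript
// Hypothetical sketch of the URL bug in fix 3: the Rust side already
// appends the /kv/connect path, so the TS caller must pass the bare base URL.
function toWsBase(engineUrl: string): string {
  const u = new URL(engineUrl);
  u.protocol = u.protocol === "https:" ? "wss:" : "ws:";
  // Strip a trailing /kv/connect if a caller added one; Rust appends it itself.
  u.pathname = u.pathname.replace(/\/kv\/connect\/?$/, "");
  return u.toString().replace(/\/$/, "");
}

console.log(toWsBase("http://127.0.0.1:6420/kv/connect"));
// → "ws://127.0.0.1:6420"
```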
`.cargo/config.toml` (+2 lines)
```toml
[build]
rustflags = ["--cfg", "tokio_unstable"]

[env]
LIBSQLITE3_FLAGS = "SQLITE_ENABLE_BATCH_ATOMIC_WRITE"
```