feat(ruvector-diskann): land disk-backed rerank — DRAM compression now real#385
Open
ruvnet wants to merge 1 commit into feature/diskann-quantizer-search-path
…w real

Closes the gap PR #384 deferred: today's `DiskAnnConfig::with_originals_in_memory(false)` returns `InvalidConfig`. After this PR, the option works and the index holds quantized codes + graph in DRAM while the final exact-L2² rerank reads originals from a memory-mapped sidecar file.

Delivers the 17.5× DRAM compression target the research roadmap projected (`docs/research/rabitq-integration/05-roadmap.md` Phase 1).

## Measured DRAM compression

D=128, n=2000, RaBitQ:

    codes                   =    40 000 B (40 KB)
    originals (in-memory)   = 1 024 000 B (1 MB)
    ratio                   = 25.60× (exceeds 17.5× target)
    originals (disk-backed) = 0 DRAM bytes (kernel-owned mmap; reads through page cache on demand)

The disk-backed variant pays only the codes + graph + mmap-handle size in DRAM. Originals live in `<storage_path>.originals` as a raw f32 layout with a 24-byte header.

## What changes

1. **`OriginalsStore` enum**: `InMemory(FlatVectors)` | `DiskBacked { mmap, n, dim }`. Enum chosen over `Box<dyn>` for:
   - Monomorphic dispatch on the rerank hot path
   - Automatic `Send + Sync` (both `Vec<f32>` and `Mmap` are)
   - One less indirection

   The trait the brief described would be equivalent in expressiveness; the enum was just lower-friction.
2. **`memmap2 = "0.9"`**: already in the workspace and an existing direct dep of ruvector-diskann. No new workspace deps.
3. **Builder wiring**: `with_originals_in_memory(false)` now requires `storage_path` and validates at `build()` time with `InvalidConfig` if absent. Default `true` behavior unchanged.
4. **`build()`** writes `<storage_path>.originals` (header + f32 payload), mmaps it, drops the in-memory `FlatVectors`. The sidecar header is `[u32 magic][u32 version][u32 dim][u32 n] [8 bytes pad]`.
5. **`load()`** checks for the sidecar; if it exists AND the saved config marks `keep_originals_in_memory: false`, mmaps it and skips the heap copy. v1 indexes (no sidecar) fall back to the read-into-Vec path for full back-compat.
6. **`search()`** reranks through `OriginalsStore::read(pos, &mut buf)` regardless of variant. The traversal layer (PR #384) needs no change.

## One observation worth recording

The pre-PR `load()` was already mmapping `vectors.bin`, then immediately copying it byte-by-byte into a heap `Vec<f32>` (lines 574-579 of the pre-edit file). The mmap was retained but never read again: the field served no functional purpose, only kept the file descriptor alive. This PR turns that latent mmap into the active reader (via the sidecar) and the v1 path becomes the legacy fallback.

## Verification

    cargo build --workspace                                          → 0 errors
    cargo build -p ruvector-diskann --no-default-features            → OK
    cargo clippy --workspace --all-targets --no-deps -- -D warnings  → exit 0
    cargo fmt --all --check                                          → exit 0
    cargo test -p ruvector-diskann --features rabitq                 → 35 / 35 (was 30 in PR #384)
    cargo test -p ruvector-diskann --no-default-features             → 19 / 19

New tests in `tests/disk_backed_rerank.rs`:

- `disk_backed_yields_zero_dram_for_originals`
- `disk_backed_compression_exceeds_17x_at_d128`
- `disk_backed_recall_matches_in_memory` (≥ 0.85 floor maintained)
- `disk_backed_save_load_round_trip_preserves_results` (uses PQ because RaBitQ persistence is a follow-up; see limitations)
- `with_originals_in_memory_false_requires_storage_path`

## Limitations flagged

1. **RaBitQ codes still don't persist across save/load**. A reloaded RaBitQ-built index falls back to the f32 traversal path. The mmap loads correctly, but writing/reading the rotation matrix to disk is a separate follow-up. The save/load round-trip test uses PQ to avoid hitting this gap.
2. **`delete()` rejected on disk-backed indexes** (`InvalidConfig`). Writing through the mmap to NaN out a slot would break determinism guarantees under concurrent readers. Disk-backed callers must rebuild to delete; in-memory callers retain existing semantics.
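
The enum-based store and the 24-byte sidecar header described under "What changes" can be sketched roughly as follows. This is a minimal illustration, not the crate's code: a `Vec<u8>` stands in for the `Mmap`, the magic and version values, the little-endian encoding, and the names `encode_sidecar`, `open_sidecar`, and `rerank` are all assumptions; only the variant names, the header field order, and the `read(pos, &mut buf)` shape come from the text above.

```rust
/// Hypothetical magic and version; the PR does not state the real values.
const MAGIC: u32 = 0x4F52_4947;
const VERSION: u32 = 1;
/// [u32 magic][u32 version][u32 dim][u32 n][8 bytes pad] = 24 bytes.
const HEADER_LEN: usize = 24;

fn u32_at(bytes: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
}

/// Serialize header + raw f32 payload (little-endian is an assumption).
pub fn encode_sidecar(dim: usize, vectors: &[f32]) -> Vec<u8> {
    assert!(dim > 0 && vectors.len() % dim == 0, "payload must be n * dim values");
    let n = vectors.len() / dim;
    let mut out = Vec::with_capacity(HEADER_LEN + vectors.len() * 4);
    for field in [MAGIC, VERSION, dim as u32, n as u32] {
        out.extend_from_slice(&field.to_le_bytes());
    }
    out.extend_from_slice(&[0u8; 8]); // padding up to 24 bytes
    for v in vectors {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

pub enum OriginalsStore {
    InMemory { data: Vec<f32>, dim: usize },
    /// `bytes` stands in for the `Mmap` held by the real variant.
    DiskBacked { bytes: Vec<u8>, n: usize, dim: usize },
}

impl OriginalsStore {
    /// Validate the header and wrap the bytes without copying the payload.
    pub fn open_sidecar(bytes: Vec<u8>) -> Result<Self, String> {
        if bytes.len() < HEADER_LEN {
            return Err("sidecar shorter than its header".to_string());
        }
        if u32_at(&bytes, 0) != MAGIC || u32_at(&bytes, 4) != VERSION {
            return Err("unexpected magic or version".to_string());
        }
        let dim = u32_at(&bytes, 8) as usize;
        let n = u32_at(&bytes, 12) as usize;
        let expected = n
            .checked_mul(dim)
            .and_then(|x| x.checked_mul(4))
            .and_then(|x| x.checked_add(HEADER_LEN));
        if expected != Some(bytes.len()) {
            return Err("payload length does not match header".to_string());
        }
        Ok(OriginalsStore::DiskBacked { bytes, n, dim })
    }

    /// Copy the vector at `pos` into `buf`, whichever variant backs the store.
    pub fn read(&self, pos: usize, buf: &mut [f32]) {
        match self {
            OriginalsStore::InMemory { data, dim } => {
                let dim = *dim;
                buf.copy_from_slice(&data[pos * dim..(pos + 1) * dim]);
            }
            OriginalsStore::DiskBacked { bytes, n, dim } => {
                let dim = *dim;
                assert!(pos < *n && buf.len() == dim, "position or buffer size out of range");
                let start = HEADER_LEN + pos * dim * 4;
                let raw = &bytes[start..start + dim * 4];
                for (dst, src) in buf.iter_mut().zip(raw.chunks_exact(4)) {
                    *dst = f32::from_le_bytes([src[0], src[1], src[2], src[3]]);
                }
            }
        }
    }
}

/// Exact squared-L2 rerank of candidate positions, nearest first.
pub fn rerank(store: &OriginalsStore, query: &[f32], candidates: &[usize]) -> Vec<(usize, f32)> {
    let mut buf = vec![0f32; query.len()];
    let mut scored: Vec<(usize, f32)> = candidates
        .iter()
        .map(|&pos| {
            store.read(pos, &mut buf);
            let d = query.iter().zip(buf.iter()).map(|(q, v)| (q - v) * (q - v)).sum::<f32>();
            (pos, d)
        })
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored
}
```

Because `rerank` only calls `read`, the caller does not need to know which variant it holds, which is the property item 6 relies on.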
## NAPI binding

`ruvector-diskann-node/src/lib.rs` was untouched: the `..Default::default()` patch landed in PR #384 already absorbs the new `keep_originals_in_memory` field via its `Default` impl.

Refs: PR #383 (Quantizer trait + RaBitQ backend), PR #384 (search-path rewrite), `docs/research/rabitq-integration/05-roadmap.md` Phase 1.

Co-Authored-By: claude-flow <ruv@ruv.net>
This was referenced Apr 26, 2026
ruvnet added a commit that referenced this pull request Apr 26, 2026
Unblocks the 7 stacked PRs (#381-#387) and turns `main`'s CI green for the first time in days. Two issues fixed:

## Failure 1: Security audit (was: 8 vulnerabilities)

`cargo audit` is now exit 0. 4 of the 5 critical advisories were fixed by version bumps; only the unfixable one is ignored.

**Dep-bumped:**

- `rustls-webpki 0.101.7` + `0.103.10` → `0.103.13` via `cargo update -p rustls-webpki@0.103.10`. Patches:
  - RUSTSEC-2026-0098 (URI name constraints)
  - RUSTSEC-2026-0099 (wildcard name constraints)
  - RUSTSEC-2026-0104 (CRL parsing panic)
- `idna 0.5.0` → `1.1.0` via `validator 0.18 → 0.20` in `examples/scipix`. Patches RUSTSEC-2024-0421 (Punycode acceptance).
- Bonus: `reqwest 0.11 → 0.12` (in `ruvector-core` + `examples/benchmarks`) and `hf-hub 0.3 → 0.4` (in `ruvector-core` + `ruvllm` + `ruvllm-cli`). Removes the entire legacy `rustls 0.21` / `rustls-webpki 0.101.7` subtree from the lockfile.

**Ignored** (single advisory, with rationale):

- `RUSTSEC-2023-0071` (rsa Marvin timing sidechannel): no upstream fix available; we don't expose RSA decryption services. Documented in `.cargo/audit.toml`.

**Unmaintained warnings** (16 total: proc-macro-error, derivative, instant, paste, bincode 1, pqcrypto-{kyber,dilithium}, rustls-pemfile 1, rusttype, wee_alloc, number_prefix, rand_os, core2, lru, pprof, rand): each given a one-line justification in `.cargo/audit.toml` so CI stays green on them while the team decides whether to chase upstream replacements.

## Failure 2: Tests timeout (was: 30-min job timeout cancellation)

`.github/workflows/ci.yml` `test` job is now a `matrix` with `fail-fast: false` and `timeout-minutes: 45`. Six parallel shards under `cargo nextest run` (installed via `taiki-e/install-action@v2`) plus a separate `cargo test --doc` step (nextest doesn't run doctests):

| Shard | Crates |
|---------------|------------------------------------------|
| vector-index | rabitq, rulake, diskann, graph, gnn, cnn |
| rvagent | 10 rvagent-* crates |
| ruvix | 16 ruvix-* crates |
| ruqu-quantum | 5 ruqu* crates |
| ml-research | attention, mincut, scipix, fpga-transformer, sparse-inference, sparsifier, solver, graph-transformer, domain-expansion, robotics |
| core-and-rest | --workspace minus the above |

`Swatinem/rust-cache@v2` is keyed per shard. Audit job switched to `taiki-e/install-action` for `cargo-audit` (faster than `cargo install --locked`).

## Verification

    cargo audit                                                                          → exit 0
    cargo build --workspace --exclude ruvector-postgres                                  → clean
    cargo clippy --workspace --exclude ruvector-postgres --no-deps -- -D warnings        → exit 0
    cargo fmt --all --check                                                              → exit 0

## Cargo.lock churn

166-line diff, net ~120 lines removed (more deletions than additions).

Removed: `idna 0.5.0`, `rustls-webpki 0.101.7`, `validator 0.18`, `validator_derive 0.18`, `proc-macro-error 1.0.4`.

Added: `rustls-webpki 0.103.13`, `validator 0.20`, `proc-macro-error2`, `hf-hub 0.4.3`, `reqwest 0.12.28`.

No suspicious crates.

## Recommended merge order

1. **This PR first**: unblocks every other PR's CI.
2. After this lands and main is green, rebase the 7 open PRs (#381-#387) one at a time. The DiskANN stack (#383→#384→#385→#386) must merge in numeric order. #381 (Python SDK), #382 (research), #387 (graph property index) are independent and can merge in any order after their CI goes green on the rebase.

Co-Authored-By: claude-flow <ruv@ruv.net>
## Summary

Stacked on PR #384. Closes the gap PR #384 deferred: today's `with_originals_in_memory(false)` returns `InvalidConfig`. After this PR, the option works: the index holds quantized codes + graph in DRAM and the final exact-L2² rerank reads originals from a memory-mapped sidecar file.

This is the PR that makes the "17.5× DRAM compression" the research roadmap projected actually real.
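
As an illustration of the builder behaviour after this change, the sketch below models a trimmed-down config in which disabling in-memory originals without a storage path is rejected. It is an assumption-laden stand-in rather than the crate's code: `DiskAnnError`, `with_storage_path`, `validate`, and `originals_sidecar` are hypothetical names; only `with_originals_in_memory`, `storage_path`, `keep_originals_in_memory`, `InvalidConfig`, and the `<storage_path>.originals` naming come from the PR text.

```rust
use std::path::PathBuf;

/// Hypothetical error type; only the `InvalidConfig` name comes from the PR text.
#[derive(Debug, PartialEq)]
pub enum DiskAnnError {
    InvalidConfig(String),
}

/// Hypothetical, trimmed-down config; the real `DiskAnnConfig` has more fields.
#[derive(Debug, Clone)]
pub struct DiskAnnConfig {
    storage_path: Option<PathBuf>,
    keep_originals_in_memory: bool,
}

impl Default for DiskAnnConfig {
    fn default() -> Self {
        // The PR states that the default (`true`) behaviour is unchanged.
        Self { storage_path: None, keep_originals_in_memory: true }
    }
}

impl DiskAnnConfig {
    /// Assumed builder name for setting the storage path.
    pub fn with_storage_path(mut self, path: impl Into<PathBuf>) -> Self {
        self.storage_path = Some(path.into());
        self
    }

    pub fn with_originals_in_memory(mut self, keep: bool) -> Self {
        self.keep_originals_in_memory = keep;
        self
    }

    /// Build-time check: disk-backed originals need a storage path.
    pub fn validate(&self) -> Result<(), DiskAnnError> {
        if !self.keep_originals_in_memory && self.storage_path.is_none() {
            return Err(DiskAnnError::InvalidConfig(
                "with_originals_in_memory(false) requires storage_path".to_string(),
            ));
        }
        Ok(())
    }

    /// `<storage_path>.originals`, or `None` when originals stay on the heap.
    pub fn originals_sidecar(&self) -> Option<PathBuf> {
        if self.keep_originals_in_memory {
            return None;
        }
        self.storage_path.as_ref().map(|p| {
            let mut s = p.clone().into_os_string();
            s.push(".originals");
            PathBuf::from(s)
        })
    }
}
```

Validating at build time rather than in the setter keeps the builder calls order-independent: the storage path may be supplied before or after the originals flag.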
## Measured DRAM compression (D=128, n=2000, RaBitQ)

25.6× exceeds the 17.5× target. The disk-backed variant pays only codes + graph + mmap-handle in DRAM; originals live in `<storage_path>.originals` as raw f32 with a 24-byte header.

## What changes

- `OriginalsStore` enum: `InMemory(FlatVectors)` | `DiskBacked { mmap, n, dim }`. Chosen over `Box<dyn>` for monomorphic dispatch on the rerank hot path + automatic `Send + Sync`.
- `with_originals_in_memory(false)` requires `storage_path`; missing → `InvalidConfig` at build time.
- Sidecar header `[u32 magic][u32 version][u32 dim][u32 n][8 pad]` followed by `dim * n` f32 values.

## Observation worth recording

Pre-PR `load()` was already mmapping `vectors.bin`, then immediately copying it byte-by-byte into a heap `Vec<f32>`. The mmap field served no functional purpose; it just kept the file descriptor alive. This PR turns that latent mmap into the active reader and the v1 path becomes the legacy fallback.

## Verification

- `cargo build --workspace` → 0 errors
- `cargo build -p ruvector-diskann --no-default-features` → OK
- `cargo clippy --workspace --all-targets --no-deps -- -D warnings` → exit 0
- `cargo fmt --all --check` → exit 0
- `cargo test -p ruvector-diskann --features rabitq` → 35 / 35 (was 30 in PR #384)
- `cargo test -p ruvector-diskann --no-default-features` → 19 / 19

New tests in `tests/disk_backed_rerank.rs`:

- `disk_backed_yields_zero_dram_for_originals`
- `disk_backed_compression_exceeds_17x_at_d128`
- `disk_backed_recall_matches_in_memory` (≥ 0.85 floor)
- `disk_backed_save_load_round_trip_preserves_results`
- `with_originals_in_memory_false_requires_storage_path`

## Limitations flagged

- `delete()` rejected on disk-backed indexes (`InvalidConfig`). Writing through the mmap to NaN out a slot would break determinism under concurrent readers. Disk-backed callers must rebuild to delete; in-memory callers keep existing semantics.

## Stack

- base: `feature/diskann-quantizer-search-path` (PR #384)
- base of base: `feature/diskann-rabitq-backend` (PR #383)
- base of all: `main`

The three PRs together (#383 → #384 → #385) are Phase 1 item #1 from the research roadmap, fully realized: `Quantizer` abstraction → trait load-bearing in search → DRAM compression delivered.

🤖 Generated with claude-flow
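
The 25.6× figure can be re-derived from the numbers quoted for D=128, n=2000. The helpers below are a small worked check, not part of the crate; their names are hypothetical, and the 40 000 B code size is the value reported in the commit message rather than something computed here.

```rust
/// Bytes needed to hold `n` vectors of `dim` f32 components.
pub fn originals_bytes(n: usize, dim: usize) -> usize {
    n * dim * std::mem::size_of::<f32>()
}

/// Ratio of full-precision bytes to quantized-code bytes.
pub fn compression_ratio(originals: usize, codes: usize) -> f64 {
    originals as f64 / codes as f64
}

/// Average code size per vector, in bytes.
pub fn code_bytes_per_vector(codes: usize, n: usize) -> f64 {
    codes as f64 / n as f64
}
```

With n = 2000 and dim = 128, `originals_bytes` gives 2000 × 128 × 4 = 1 024 000 B; dividing by the reported 40 000 B of codes gives 25.6, and the same code size works out to 20 B per vector. The ratio compares codes against originals only and does not include the graph.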