
feat(ruvector-diskann): land disk-backed rerank — DRAM compression now real #385

Open
ruvnet wants to merge 1 commit into feature/diskann-quantizer-search-path from feature/diskann-disk-backed-rerank


ruvnet commented Apr 26, 2026

Summary

Stacked on PR #384. Closes the gap PR #384 deferred: today's with_originals_in_memory(false) returns InvalidConfig. After this PR, the option works — the index holds quantized codes + graph in DRAM and the final exact-L2² rerank reads originals from a memory-mapped sidecar file.

This is the PR that makes the "17.5× DRAM compression" the research roadmap projected actually real.

Measured DRAM compression (D=128, n=2000, RaBitQ)

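| Variant               | DRAM cost (originals) | Codes (always DRAM) | Ratio                                    |
|-----------------------|-----------------------|---------------------|------------------------------------------|
| In-memory (default)   | 1 024 000 B (1 MB)    | 40 000 B            | 25.6× compression of codes vs originals  |
| Disk-backed (this PR) | 0 DRAM bytes          | 40 000 B            | originals in kernel-owned mmap           |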

25.6× exceeds the 17.5× target. The disk-backed variant pays only codes + graph + mmap-handle in DRAM; originals live in <storage_path>.originals as raw f32 with a 24-byte header.
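
The figures above can be reproduced by hand. The following is a minimal sketch of that arithmetic, not code from the PR: `originals_bytes` and `compression_ratio` are hypothetical helper names, and the 40 000 B code size is taken from the measurement above rather than derived.

```rust
/// Bytes needed to hold `n` vectors of `dim` f32 components in DRAM.
fn originals_bytes(n: usize, dim: usize) -> usize {
    n * dim * std::mem::size_of::<f32>()
}

/// Ratio of original-vector bytes to quantized-code bytes.
fn compression_ratio(originals: usize, codes: usize) -> f64 {
    originals as f64 / codes as f64
}
```

For D=128 and n=2000, `originals_bytes(2000, 128)` is 2000 × 128 × 4 = 1 024 000 B, and dividing by the measured 40 000 B of codes (20 B per vector) gives 25.6.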

What changes

  1. OriginalsStore enum: InMemory(FlatVectors) | DiskBacked { mmap, n, dim }. Chosen over Box<dyn> for monomorphic dispatch on the rerank hot path and automatic Send + Sync.
  2. memmap2: already a workspace dep. No new deps.
  3. Builder validation: with_originals_in_memory(false) requires storage_path; if it is missing, build fails with InvalidConfig.
  4. Sidecar layout: [u32 magic][u32 version][u32 dim][u32 n][8 pad], followed by dim * n f32 values.
  5. Backward compatibility: v1 indexes (no sidecar) fall back to the read-into-Vec path; load uses the sidecar only when it is present and the saved config requested it.
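
To make the sidecar layout in item 4 concrete, here is a rough sketch of encoding and decoding the 24-byte header. It is an illustration under assumptions, not the PR's code: the `MAGIC` and `VERSION` values are placeholders (the PR does not state them), little-endian byte order is assumed, and the function names are hypothetical.

```rust
const HEADER_LEN: usize = 24; // 4 x u32 (16 bytes) + 8 bytes of padding
const MAGIC: u32 = 0x4F52_4947; // placeholder; the real magic is not given in the PR
const VERSION: u32 = 1; // placeholder version number

/// Builds the header: [magic][version][dim][n][8 pad], little-endian (assumed).
fn encode_header(dim: u32, n: u32) -> [u8; HEADER_LEN] {
    let mut h = [0u8; HEADER_LEN];
    h[0..4].copy_from_slice(&MAGIC.to_le_bytes());
    h[4..8].copy_from_slice(&VERSION.to_le_bytes());
    h[8..12].copy_from_slice(&dim.to_le_bytes());
    h[12..16].copy_from_slice(&n.to_le_bytes());
    // Bytes 16..24 stay zero: the padding.
    h
}

/// Returns `(dim, n)` if the header is long enough and magic/version match.
fn decode_header(bytes: &[u8]) -> Option<(u32, u32)> {
    if bytes.len() < HEADER_LEN {
        return None;
    }
    let word = |i: usize| {
        u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]])
    };
    if word(0) != MAGIC || word(4) != VERSION {
        return None;
    }
    Some((word(8), word(12)))
}

/// Byte offset of vector `pos` inside the sidecar file.
fn vector_offset(pos: usize, dim: usize) -> usize {
    HEADER_LEN + pos * dim * std::mem::size_of::<f32>()
}
```

Because the header is 24 bytes, the f32 payload starts at an offset that is a multiple of 4, so vector `pos` begins at `24 + pos * dim * 4`.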

Observation worth recording

Pre-PR load() was already mmapping vectors.bin, then immediately copying it byte-by-byte into a heap Vec<f32>. The mmap field served no functional purpose — it just kept the file descriptor alive. This PR turns that latent mmap into the active reader and the v1 path becomes the legacy fallback.

Verification

New tests in tests/disk_backed_rerank.rs:

  • disk_backed_yields_zero_dram_for_originals
  • disk_backed_compression_exceeds_17x_at_d128
  • disk_backed_recall_matches_in_memory (≥ 0.85 floor)
  • disk_backed_save_load_round_trip_preserves_results
  • with_originals_in_memory_false_requires_storage_path
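
As a rough illustration of what the last test checks, the sketch below models the validation rule on a stand-in config type. `SketchConfig`, `SketchError`, and `validate` are hypothetical names for this example only; the real `DiskAnnConfig` and its builder may be shaped differently.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidConfig(String),
}

/// Stand-in for the two config fields the rule depends on.
#[derive(Debug, Clone)]
struct SketchConfig {
    keep_originals_in_memory: bool,
    storage_path: Option<PathBuf>,
}

impl Default for SketchConfig {
    fn default() -> Self {
        // The default keeps originals in memory, so no path is needed.
        Self { keep_originals_in_memory: true, storage_path: None }
    }
}

impl SketchConfig {
    fn with_originals_in_memory(mut self, keep: bool) -> Self {
        self.keep_originals_in_memory = keep;
        self
    }

    fn with_storage_path(mut self, path: impl Into<PathBuf>) -> Self {
        self.storage_path = Some(path.into());
        self
    }

    /// Disk-backed originals need somewhere to put the sidecar file.
    fn validate(&self) -> Result<(), SketchError> {
        if !self.keep_originals_in_memory && self.storage_path.is_none() {
            return Err(SketchError::InvalidConfig(
                "with_originals_in_memory(false) requires storage_path".to_string(),
            ));
        }
        Ok(())
    }
}
```

In this sketch, `SketchConfig::default().with_originals_in_memory(false).validate()` returns an error, while adding `.with_storage_path(...)` before `validate()` succeeds.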

Limitations flagged

  1. RaBitQ codes don't persist across save/load (carried over from PRs #383 and #384). A reloaded RaBitQ-built index falls back to the f32 traversal path. The mmap loads correctly, but writing/reading the rotation matrix to disk is a separate follow-up. The save/load round-trip test uses PQ for this reason.
  2. delete() rejected on disk-backed indexes (InvalidConfig). Writing through the mmap to NaN out a slot would break determinism under concurrent readers. Disk-backed callers must rebuild to delete; in-memory callers keep existing semantics.

Stack

base: feature/diskann-quantizer-search-path (PR #384)
base of base: feature/diskann-rabitq-backend (PR #383)
base of all: main

The three PRs together (#383 → #384 → #385) are Phase 1 item #1 from the research roadmap, fully realized: Quantizer abstraction → trait load-bearing in search → DRAM compression delivered.

🤖 Generated with claude-flow


Closes the gap PR #384 deferred: today's
`DiskAnnConfig::with_originals_in_memory(false)` returns
`InvalidConfig`. After this PR, the option works and the index
holds quantized codes + graph in DRAM while the final exact-L2²
rerank reads originals from a memory-mapped sidecar file.

Delivers the 17.5× DRAM compression target the research roadmap
projected (`docs/research/rabitq-integration/05-roadmap.md` Phase 1).

## Measured DRAM compression

D=128, n=2000, RaBitQ:

  codes  =      40 000 B   (40 KB)
  originals (in-memory)  = 1 024 000 B  (1 MB)
  ratio  = 25.60×    (exceeds 17.5× target)

  originals (disk-backed) = 0 DRAM bytes
                            (kernel-owned mmap; reads through
                             page cache on demand)

The disk-backed variant pays only the codes + graph + mmap-handle
size in DRAM. Originals live in `<storage_path>.originals` as a
raw f32 layout with a 24-byte header.

## What changes

1. **`OriginalsStore` enum** — `InMemory(FlatVectors)` |
   `DiskBacked { mmap, n, dim }`. Enum chosen over `Box<dyn>` for:
   - Monomorphic dispatch on the rerank hot path
   - Automatic `Send + Sync` (both `Vec<f32>` and `Mmap` are)
   - One less indirection
   The trait the brief described would be equivalent in
   expressiveness; the enum was just lower-friction.

2. **`memmap2 = "0.9"`** — already in the workspace and an existing
   direct dep of ruvector-diskann. No new workspace deps.

3. **Builder wiring** — `with_originals_in_memory(false)` now
   requires `storage_path` and validates at `build()` time with
   `InvalidConfig` if absent. Default `true` behavior unchanged.

4. **`build()`** writes `<storage_path>.originals` (header + f32
   payload), mmaps it, drops the in-memory `FlatVectors`. The
   sidecar header is `[u32 magic][u32 version][u32 dim][u32 n]
   [8 bytes pad]`.

5. **`load()`** checks for the sidecar; if it exists AND the saved
   config marks `keep_originals_in_memory: false`, mmaps it and
   skips the heap copy. v1 indexes (no sidecar) fall back to the
   read-into-Vec path for full back-compat.

6. **`search()`** reranks through `OriginalsStore::read(pos, &mut
   buf)` regardless of variant. The traversal layer (PR #384)
   needs no change.
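
The enum-plus-`read` design from items 1 and 6 can be sketched as follows. This is a simplified stand-in, not the PR's implementation: `DiskBacked` holds a plain `Vec<u8>` where the PR holds an mmap (both expose the file as a byte slice), `InMemory` uses a bare `Vec<f32>` instead of `FlatVectors`, little-endian f32 encoding is assumed, the header is skipped without being validated, and `rerank` is a hypothetical helper.

```rust
const HEADER_LEN: usize = 24; // sidecar header size stated in the PR

enum OriginalsStoreSketch {
    InMemory { data: Vec<f32>, dim: usize },
    // Stand-in: the PR keeps an mmap here rather than an owned buffer.
    DiskBacked { bytes: Vec<u8>, n: usize, dim: usize },
}

impl OriginalsStoreSketch {
    /// Copies vector `pos` into `buf`; returns false if it cannot be read.
    fn read(&self, pos: usize, buf: &mut [f32]) -> bool {
        match self {
            Self::InMemory { data, dim } => {
                let dim = *dim;
                let start = pos * dim;
                match data.get(start..start + dim) {
                    Some(src) if buf.len() == dim => {
                        buf.copy_from_slice(src);
                        true
                    }
                    _ => false,
                }
            }
            Self::DiskBacked { bytes, n, dim } => {
                let dim = *dim;
                if pos >= *n || buf.len() != dim {
                    return false;
                }
                let start = HEADER_LEN + pos * dim * 4;
                match bytes.get(start..start + dim * 4) {
                    Some(src) => {
                        // Decode each 4-byte chunk as a little-endian f32.
                        for (dst, c) in buf.iter_mut().zip(src.chunks_exact(4)) {
                            *dst = f32::from_le_bytes([c[0], c[1], c[2], c[3]]);
                        }
                        true
                    }
                    None => false,
                }
            }
        }
    }
}

/// Exact squared L2 distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Re-scores candidate positions with exact distances, nearest first.
fn rerank(store: &OriginalsStoreSketch, query: &[f32], candidates: &[usize]) -> Vec<(usize, f32)> {
    let mut buf = vec![0.0f32; query.len()];
    let mut scored = Vec::with_capacity(candidates.len());
    for &pos in candidates {
        if store.read(pos, &mut buf) {
            scored.push((pos, l2_squared(query, &buf)));
        }
    }
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored
}
```

The point of the shape is that `rerank` never inspects which variant it holds: one scratch buffer is reused across candidates, and each `read` is a single `match` rather than a virtual call.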

## One observation worth recording

The pre-PR `load()` was already mmapping `vectors.bin`, then
immediately copying it byte-by-byte into a heap `Vec<f32>` (lines
574-579 of the pre-edit file). The mmap was retained but never
read again — the field served no functional purpose, only kept
the file descriptor alive. This PR turns that latent mmap into
the active reader (via the sidecar) and the v1 path becomes the
legacy fallback.

## Verification

  cargo build --workspace                                              → 0 errors
  cargo build -p ruvector-diskann --no-default-features                → OK
  cargo clippy --workspace --all-targets --no-deps -- -D warnings      → exit 0
  cargo fmt --all --check                                              → exit 0
  cargo test -p ruvector-diskann --features rabitq                     → 35 / 35
                                                                          (was 30 in PR #384)
  cargo test -p ruvector-diskann --no-default-features                 → 19 / 19

New tests in `tests/disk_backed_rerank.rs`:
- `disk_backed_yields_zero_dram_for_originals`
- `disk_backed_compression_exceeds_17x_at_d128`
- `disk_backed_recall_matches_in_memory` (≥ 0.85 floor maintained)
- `disk_backed_save_load_round_trip_preserves_results` (uses PQ
  because RaBitQ persistence is a follow-up — see limitations)
- `with_originals_in_memory_false_requires_storage_path`
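
For context on the 0.85 floor, the sketch below shows one common way to compute recall. It assumes the floor refers to recall@k against a brute-force ground truth, which the PR text does not spell out; `recall_at_k` is a hypothetical helper, not a function from the test file.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k ids that also appear in the approximate top-k.
fn recall_at_k(approx: &[u32], exact: &[u32], k: usize) -> f64 {
    let k = k.min(exact.len());
    if k == 0 {
        return 0.0; // nothing to compare against
    }
    let truth: HashSet<u32> = exact[..k].iter().copied().collect();
    let hits = approx.iter().take(k).filter(|id| truth.contains(*id)).count();
    hits as f64 / k as f64
}
```

With `approx = [1, 2, 3, 9]` and `exact = [1, 2, 3, 4]`, three of the four true neighbours are found, so recall@4 is 0.75, which would fall below a 0.85 floor.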

## Limitations flagged

1. **RaBitQ codes still don't persist across save/load**. A
   reloaded RaBitQ-built index falls back to the f32 traversal
   path. The mmap loads correctly, but writing/reading the
   rotation matrix to disk is a separate follow-up. The save/load
   round-trip test uses PQ to avoid hitting this gap.
2. **`delete()` rejected on disk-backed indexes** (`InvalidConfig`).
   Writing through the mmap to NaN out a slot would break
   determinism guarantees under concurrent readers. Disk-backed
   callers must rebuild to delete; in-memory callers retain
   existing semantics.
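
The second limitation can be pictured with the sketch below. It is only an illustration: the type and function names are hypothetical, the disk-backed payload is elided, and marking a deleted in-memory slot with NaN is an assumption inferred from the wording above rather than confirmed behaviour.

```rust
#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidConfig(&'static str),
}

enum Originals {
    InMemory { data: Vec<f32>, dim: usize },
    DiskBacked, // read-only mapping; payload elided in this sketch
}

/// Deletes vector `pos`, or refuses when the originals are disk-backed.
fn delete(store: &mut Originals, pos: usize) -> Result<(), SketchError> {
    match store {
        Originals::InMemory { data, dim } => {
            let start = pos * *dim;
            match data.get_mut(start..start + *dim) {
                Some(slot) => {
                    // Assumed tombstone convention: overwrite the slot with NaN.
                    slot.fill(f32::NAN);
                    Ok(())
                }
                None => Err(SketchError::InvalidConfig("position out of range")),
            }
        }
        // No write path through the mapping: callers must rebuild instead.
        Originals::DiskBacked => Err(SketchError::InvalidConfig(
            "delete() is not supported on disk-backed indexes",
        )),
    }
}
```

Rejecting the call outright keeps the mapped file immutable, so concurrent readers never observe a half-written slot.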

## NAPI binding

`ruvector-diskann-node/src/lib.rs` was untouched — the
`..Default::default()` patch landed in PR #384 already absorbs
the new `keep_originals_in_memory` field via its `Default` impl.

Refs: PR #383 (Quantizer trait + RaBitQ backend), PR #384
(search-path rewrite), `docs/research/rabitq-integration/05-roadmap.md`
Phase 1.

Co-Authored-By: claude-flow <ruv@ruv.net>
ruvnet added a commit that referenced this pull request Apr 26, 2026
Unblocks the 7 stacked PRs (#381-#387) and turns `main`'s CI green
for the first time in days. Two issues fixed:

## Failure 1 — Security audit (was: 8 vulnerabilities)

`cargo audit` is now exit 0. 4 of the 5 critical advisories were
fixed by version bumps; only the unfixable one is ignored.

**Dep-bumped:**
- `rustls-webpki 0.101.7` + `0.103.10` → `0.103.13` via
  `cargo update -p rustls-webpki@0.103.10`. Patches:
    RUSTSEC-2026-0098 (URI name constraints)
    RUSTSEC-2026-0099 (wildcard name constraints)
    RUSTSEC-2026-0104 (CRL parsing panic)
- `idna 0.5.0` → `1.1.0` via `validator 0.18 → 0.20` in
  `examples/scipix`. Patches RUSTSEC-2024-0421 (Punycode acceptance).
- Bonus: `reqwest 0.11 → 0.12` (in `ruvector-core` + `examples/benchmarks`)
  and `hf-hub 0.3 → 0.4` (in `ruvector-core` + `ruvllm` +
  `ruvllm-cli`). Removes the entire legacy `rustls 0.21` /
  `rustls-webpki 0.101.7` subtree from the lockfile.

**Ignored** (single advisory, with rationale):
- `RUSTSEC-2023-0071` (rsa Marvin timing sidechannel) — no upstream
  fix available; we don't expose RSA decryption services. Documented
  in `.cargo/audit.toml`.

**Unmaintained warnings** (16 total — proc-macro-error, derivative,
instant, paste, bincode 1, pqcrypto-{kyber,dilithium}, rustls-pemfile 1,
rusttype, wee_alloc, number_prefix, rand_os, core2, lru, pprof, rand) —
each given a one-line justification in `.cargo/audit.toml` so CI stays
green on them while the team decides whether to chase upstream
replacements.

## Failure 2 — Tests timeout (was: 30-min job timeout cancellation)

`.github/workflows/ci.yml` `test` job is now a `matrix` with
`fail-fast: false` and `timeout-minutes: 45`. Six parallel shards
under `cargo nextest run` (installed via `taiki-e/install-action@v2`)
plus a separate `cargo test --doc` step (nextest doesn't run
doctests):

  | Shard            | Crates                                      |
  |------------------|---------------------------------------------|
  | vector-index     | rabitq, rulake, diskann, graph, gnn, cnn    |
  | rvagent          | 10 rvagent-* crates                         |
  | ruvix            | 16 ruvix-* crates                           |
  | ruqu-quantum     | 5 ruqu* crates                              |
  | ml-research      | attention, mincut, scipix, fpga-transformer,|
  |                  | sparse-inference, sparsifier, solver,       |
  |                  | graph-transformer, domain-expansion,        |
  |                  | robotics                                    |
  | core-and-rest    | --workspace minus the above                 |

`Swatinem/rust-cache@v2` is keyed per shard. Audit job switched to
`taiki-e/install-action` for `cargo-audit` (faster than
`cargo install --locked`).

## Verification

  cargo audit                                                   → exit 0
  cargo build --workspace --exclude ruvector-postgres           → clean
  cargo clippy --workspace --exclude ruvector-postgres --no-deps -- -D warnings → exit 0
  cargo fmt --all --check                                       → exit 0

## Cargo.lock churn

166-line diff, net ~120 lines removed (more deletions than
additions). Removed: `idna 0.5.0`, `rustls-webpki 0.101.7`,
`validator 0.18`, `validator_derive 0.18`, `proc-macro-error 1.0.4`.
Added: `rustls-webpki 0.103.13`, `validator 0.20`,
`proc-macro-error2`, `hf-hub 0.4.3`, `reqwest 0.12.28`. No
suspicious crates.

## Recommended merge order

1. **This PR first** — unblocks every other PR's CI.
2. After this lands and main is green, rebase the 7 open PRs
   (#381-#387) one at a time. The DiskANN stack (#383 → #384 → #385 → #386)
   must merge in numeric order. #381 (Python SDK), #382 (research),
   #387 (graph property index) are independent and can merge in
   any order after their CI goes green on the rebase.

Co-Authored-By: claude-flow <ruv@ruv.net>