Skip to content

feat(ruvector-diskann): wire Quantizer trait into search path — codes load-bearing#384

Open
ruvnet wants to merge 1 commit intofeature/diskann-rabitq-backendfrom
feature/diskann-quantizer-search-path
Open

feat(ruvector-diskann): wire Quantizer trait into search path — codes load-bearing#384
ruvnet wants to merge 1 commit intofeature/diskann-rabitq-backendfrom
feature/diskann-quantizer-search-path

Conversation

@ruvnet
Copy link
Copy Markdown
Owner

@ruvnet ruvnet commented Apr 26, 2026

Summary

Stacked on PR #383. Closes the architectural gap that PR called out: today DiskAnnIndex.pq_codes is dead storage — search() calls graph greedy-search over FlatVectors (full f32 originals) and exact-L2² rerank, with self.pq_codes read by neither.

This PR makes the Quantizer trait actually load-bearing in the DiskANN hot path so both backends (PQ + RaBitQ) consult their codes during graph traversal.

What changes

  1. New greedy_search_with_codes in graph.rs — generalizes the existing greedy_search_fast to accept an arbitrary distance_fn closure. The original stays unchanged for back-compat.
  2. DiskAnnIndex carries enum QuantizerBackend instead of Option<ProductQuantizer>. Hybrid pattern (match-once-per-search, monomorphic inner loops).
  3. Builder API: with_quantizer_kind, with_rerank_factor, with_rabitq_seed, with_originals_in_memory.
  4. NAPI binding patched with ..Default::default() so the workspace still builds.

Why hybrid pattern (not generic, not dyn)

  • Option A (generic DiskAnnIndex<Q>): would cascade through ruvector-diskann-node and any other consumer using DiskAnnIndex by name.
  • Option B (Box<dyn Quantizer>): impossible — Quantizer::Query is an associated type; trait isn't object-safe.
  • Hybrid: match once per search() on a backend enum; the inner closure is monomorphic per arm, hot loop stays branch-free.

Recall numbers (the proof codes are now consulted)

Path recall@10
Flat f32 (legacy) 1.000
PQ before this PR ≈ 1.000 (codes ignored, PQ effectively no-op)
PQ after this PR 0.897 (trait-driven, real quantization noise)
RaBitQ after this PR 0.967 (trait-driven, rerank_factor=40)

PQ recall now reflects real quantization noise because codes are actually used. The 0.897 number is well above the 0.85 floor required by the regression test.

What's deferred

with_originals_in_memory(false) is plumbed and validated, but build() rejects false with InvalidConfig. The disk-backed rerank path that lands the actual 17.5× DRAM compression is the natural next PR. The trait-driven traversal shipped here is the prerequisite.

Measured codes-vs-originals ratio at D=128, n=2000: codes 40 KB, originals 1024 KB, ratio 0.039 (~25× smaller). Once originals can be evicted, dataset-level compression will exceed the 17.5× target.

Verification

New tests in tests/quantizer_search_uses_codes.rs:

  • Spy-quantizer codes-consulted assertion (would fail before this PR)
  • recall@10 vs brute-force baseline (≥ 0.85 floor)
  • PQ recall regression check
  • Codes-vs-originals memory ratio sanity check
  • greedy_search_with_codes ≡ greedy_search_fast under f32 distance

Surprise observation worth recording

The "abstraction gap" PR #383 left wasn't an architectural mismatch — it was three lines of l2_squared(...) inside greedy_search_fast's loop, not coupled at the type-signature level. Threading a closure through took ~120 LoC, not the refactor the research doc implied.

Stacked on PR #383

base: feature/diskann-rabitq-backend. Will rebase to main once #383 merges. The two together are Phase 1 item #1 from docs/research/rabitq-integration/05-roadmap.md.

🤖 Generated with claude-flow

… load-bearing

Closes the architectural gap surfaced in PR #383: previously
`DiskAnnIndex.pq_codes` was dead storage. `search()` called
`graph.greedy_search` over `FlatVectors` (full f32 originals)
and exact-L2² rerank — `self.pq_codes` was read by neither.

Today's PQ "savings" were on-disk only; in DRAM the index still
held full f32 vectors. RaBitQ inherited the same gap.

This PR makes the `Quantizer` trait load-bearing in the DiskANN
hot path so both backends (PQ + RaBitQ) actually consult their
codes during graph traversal.

## What changes

1. **New `greedy_search_with_codes` in `graph.rs`** — generalizes
   `greedy_search_fast` to accept an arbitrary `distance_fn` closure
   over node ids. The original `greedy_search_fast` stays unchanged
   for back-compat. Surprise observation worth recording: the old
   `greedy_search_fast` was coupled to f32 distance via inline
   `l2_squared(...)` calls inside the loop, NOT via type signatures.
   The "abstraction gap" PR #383 left was three lines, not an
   architectural mismatch.

2. **`DiskAnnIndex` carries `enum QuantizerBackend`** rather than
   `Option<ProductQuantizer>`. Hybrid pattern chosen because:
       Option A (generic `DiskAnnIndex<Q>`) — would cascade through
         `ruvector-diskann-node`'s NAPI binding and any other crate
         using `DiskAnnIndex` by name. Too much ripple.
       Option B (`Box<dyn Quantizer>`) — impossible because
         `Quantizer::Query` is an associated type; trait isn't
         object-safe.
       Hybrid (this PR): match-once-per-search on the backend enum,
         dispatch into a monomorphic closure. Hot loop stays
         branch-free.

3. **Builder API**: `DiskAnnConfig::with_quantizer_kind()`,
   `with_rerank_factor()`, `with_rabitq_seed()`,
   `with_originals_in_memory()`. All have `Default` impls so existing
   call sites compile unchanged after a `..Default::default()` patch.

4. **`with_originals_in_memory(false)` plumbed but not yet honored**.
   `build()` rejects with `InvalidConfig` for now — the disk-backed
   rerank path that would land the actual 17.5× DRAM compression is
   the natural next PR. The trait-driven traversal shipped here is
   the prerequisite. Measured codes-vs-originals memory ratio at
   D=128, n=2000: **codes 40 KB, originals 1024 KB, ratio 0.039
   (~25× smaller)** — once originals can be evicted, dataset-level
   compression will exceed the 17.5× target.

## Recall numbers

| Path | recall@10 | Notes |
|---|---|---|
| Flat f32 (legacy) | 1.000 | n=1k, dim=64, 30 queries |
| **PQ before this PR** | (≈ 1.000) | Codes never read — PQ was effectively no-op on recall |
| **PQ after this PR** | 0.897 | Trait-driven, M=8, rerank_factor=4. Reflects actual quantization noise. |
| **RaBitQ after this PR** | 0.967 | Trait-driven, search_beam=512, rerank_factor=40 |

PQ recall now reflects real quantization noise because PQ codes
are *used* during traversal. 0.897 is well above the 0.85 floor
required by the regression test.

## NAPI binding patch

`ruvector-diskann-node/src/lib.rs:48` was using a struct literal
that listed every `DiskAnnConfig` field. Adding the four new fields
broke that initializer. One-line `..Default::default()` fix
restores it without growing the NAPI override surface (all new
fields have sensible defaults).

## Verification

  cargo build --workspace                                              → 0 errors
  cargo build -p ruvector-diskann --no-default-features                → OK (PQ-only)
  cargo clippy --workspace --all-targets --no-deps -- -D warnings      → exit 0
  cargo fmt --all --check                                              → exit 0
  cargo test -p ruvector-diskann --features rabitq                     → 30 / 30
                                                                          (was 26 in PR #383)

New tests in `tests/quantizer_search_uses_codes.rs`:
  - spy-quantizer codes-consulted assertion (proves codes are now
    consulted; would fail before this PR)
  - recall@10 vs brute-force baseline (≥ 0.85 floor)
  - PQ recall regression (no drop vs the old "ignore codes,
    brute-force rerank" behavior, after accounting for actual
    quantization noise)
  - codes-vs-originals memory ratio sanity check
  - greedy_search_with_codes ≡ greedy_search_fast under f32 distance

Refs: PR #383 (DiskANN Quantizer trait + RaBitQ backend),
docs/research/rabitq-integration/05-roadmap.md Phase 1

Co-Authored-By: claude-flow <ruv@ruv.net>
ruvnet added a commit that referenced this pull request Apr 26, 2026
Unblocks the 7 stacked PRs (#381-#387) and turns `main`'s CI green
for the first time in days. Two issues fixed:

## Failure 1 — Security audit (was: 8 vulnerabilities)

`cargo audit` is now exit 0. 4 of the 5 critical advisories were
fixed by version bumps; only the unfixable one is ignored.

**Dep-bumped:**
- `rustls-webpki 0.101.7` + `0.103.10` → `0.103.13` via
  `cargo update -p rustls-webpki@0.103.10`. Patches:
    RUSTSEC-2026-0098 (URI name constraints)
    RUSTSEC-2026-0099 (wildcard name constraints)
    RUSTSEC-2026-0104 (CRL parsing panic)
- `idna 0.5.0` → `1.1.0` via `validator 0.18 → 0.20` in
  `examples/scipix`. Patches RUSTSEC-2024-0421 (Punycode acceptance).
- Bonus: `reqwest 0.11 → 0.12` (in `ruvector-core` + `examples/benchmarks`)
  and `hf-hub 0.3 → 0.4` (in `ruvector-core` + `ruvllm` +
  `ruvllm-cli`). Removes the entire legacy `rustls 0.21` /
  `rustls-webpki 0.101.7` subtree from the lockfile.

**Ignored** (single advisory, with rationale):
- `RUSTSEC-2023-0071` (rsa Marvin timing sidechannel) — no upstream
  fix available; we don't expose RSA decryption services. Documented
  in `.cargo/audit.toml`.

**Unmaintained warnings** (16 total — proc-macro-error, derivative,
instant, paste, bincode 1, pqcrypto-{kyber,dilithium}, rustls-pemfile 1,
rusttype, wee_alloc, number_prefix, rand_os, core2, lru, pprof, rand) —
each given a one-line justification in `.cargo/audit.toml` so CI stays
green on them while the team decides whether to chase upstream
replacements.

## Failure 2 — Tests timeout (was: 30-min job timeout cancellation)

`.github/workflows/ci.yml` `test` job is now a `matrix` with
`fail-fast: false` and `timeout-minutes: 45`. Six parallel shards
under `cargo nextest run` (installed via `taiki-e/install-action@v2`)
plus a separate `cargo test --doc` step (nextest doesn't run
doctests):

  | Shard            | Crates                                      |
  |------------------|---------------------------------------------|
  | vector-index     | rabitq, rulake, diskann, graph, gnn, cnn    |
  | rvagent          | 10 rvagent-* crates                         |
  | ruvix            | 16 ruvix-* crates                           |
  | ruqu-quantum     | 5 ruqu* crates                              |
  | ml-research      | attention, mincut, scipix, fpga-transformer,|
  |                  | sparse-inference, sparsifier, solver,       |
  |                  | graph-transformer, domain-expansion,        |
  |                  | robotics                                    |
  | core-and-rest    | --workspace minus the above                 |

`Swatinem/rust-cache@v2` is keyed per shard. Audit job switched to
`taiki-e/install-action` for `cargo-audit` (faster than
`cargo install --locked`).

## Verification

  cargo audit                                                   → exit 0
  cargo build --workspace --exclude ruvector-postgres           → clean
  cargo clippy --workspace --exclude ruvector-postgres --no-deps -- -D warnings → exit 0
  cargo fmt --all --check                                       → exit 0

## Cargo.lock churn

166-line diff, net ~120 lines removed (more deletions than
additions). Removed: `idna 0.5.0`, `rustls-webpki 0.101.7`,
`validator 0.18`, `validator_derive 0.18`, `proc-macro-error 1.0.4`.
Added: `rustls-webpki 0.103.13`, `validator 0.20`,
`proc-macro-error2`, `hf-hub 0.4.3`, `reqwest 0.12.28`. No
suspicious crates.

## Recommended merge order

1. **This PR first** — unblocks every other PR's CI.
2. After this lands and main is green, rebase the 7 open PRs
   (#381-#387) one at a time. The DiskANN stack (#383#384#385#386)
   must merge in numeric order. #381 (Python SDK), #382 (research),
   #387 (graph property index) are independent and can merge in
   any order after their CI goes green on the rebase.

Co-Authored-By: claude-flow <ruv@ruv.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant