From 454bba8a19252f3e2ff569266538fad7c754d1b4 Mon Sep 17 00:00:00 2001 From: ruvnet Date: Sat, 25 Apr 2026 20:55:37 -0400 Subject: [PATCH] docs(research): deep review of RaBitQ integration paths into ruvector MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Seven-file research at docs/research/rabitq-integration/ surveying where RaBitQ (ADR-154, crates.io v2.2.0) is consumed today, where else it could go, and what architectural pattern each candidate should use. ## Top 3 integration recommendations 1. **ruvector-diskann RaBitQ backend** — replace/augment the PQ quantizer with `RabitqPlusIndex` (≤500 LoC). ADR-154 already named DiskANN as a target consumer; the spot is open. 2. **ruvector-graph `VectorPropertyIndex`** — vector-keyed property lookup for graph nodes via RaBitQ codes alongside the property table (≤600 LoC). Unlocks "find nodes whose embedding is closest to query" without a separate index crate. 3. **ruvector-gnn `differentiable_search` rewrite** — replace the cosine fan-out at `differentiable_search.rs` with `RabitqPlusIndex::search_with_rerank` (≤300 LoC). Keeps the gradient path; collapses memory by 32×. ## Key nuance discovered The `VectorKernel` trait + `CpuKernel` shipped at `crates/ruvector-rabitq/src/kernel.rs:78` and ADR-157's dispatch policy is fully specified — but **no caller wires it up**. The only reference is a doc comment at `crates/ruvector-rulake/src/lake.rs:595`. Any new consumer choosing Pattern 2 (the trait dispatch route) would be the first non-test caller and would have to implement dispatch from scratch — almost certainly diverging from ADR-157's determinism gate. This forced an ordering decision: **ruLake must implement `register_kernel` first**; Phase 1 below stays Pattern 1 (direct embed) only. ## Phased roadmap - **Phase 1 (4–5 wk):** the 3 high-value Pattern-1 integrations above. All direct-embed; no trait dispatch yet. 
- **Phase 2 (4–6 wk):** ruLake wires `register_kernel`; CpuKernel + at least one new kernel (CPU-SIMD or WASM) become real; ≥2 consumers route through the trait. - **Phase 3 (~2 wk):** propose new ADR ("RaBitQ as ruvector's canonical vector compression substrate") and catalog what ruvector-graph / -gnn / -attention need to share one compression layer. Total: ~10–13 engineer-weeks. ## What this is NOT - Not implementation. No Rust code in this PR — just markdown. - Not an ADR. Phase 3 may produce one; this is the research that precedes it. - Not a binding decision. Each integration in §02 is annotated with effort + value so the team can re-prioritize. ## File breakdown INDEX.md 51 LoC 01-current-integration.md 134 LoC (call sites today) 02-integration-opportunities.md 300 LoC (15 candidates surveyed) 03-architectural-patterns.md 289 LoC (3 patterns + anti-patterns) 04-cross-cutting-concerns.md 230 LoC (determinism, witness, perf) 05-roadmap.md 238 LoC (3 phases, milestones) 06-decision-record.md 107 LoC (1-page call to action) Refs: ADR-154 (RaBitQ), ADR-155 (ruLake), ADR-157 (accelerator plane), PR #380 (ADR-159 + workspace cleanup), PR #381 (Python SDK M1). 
Co-Authored-By: claude-flow --- .../01-current-integration.md | 134 ++++++++ .../02-integration-opportunities.md | 300 ++++++++++++++++++ .../03-architectural-patterns.md | 289 +++++++++++++++++ .../04-cross-cutting-concerns.md | 230 ++++++++++++++ .../research/rabitq-integration/05-roadmap.md | 238 ++++++++++++++ .../rabitq-integration/06-decision-record.md | 107 +++++++ docs/research/rabitq-integration/INDEX.md | 51 +++ 7 files changed, 1349 insertions(+) create mode 100644 docs/research/rabitq-integration/01-current-integration.md create mode 100644 docs/research/rabitq-integration/02-integration-opportunities.md create mode 100644 docs/research/rabitq-integration/03-architectural-patterns.md create mode 100644 docs/research/rabitq-integration/04-cross-cutting-concerns.md create mode 100644 docs/research/rabitq-integration/05-roadmap.md create mode 100644 docs/research/rabitq-integration/06-decision-record.md create mode 100644 docs/research/rabitq-integration/INDEX.md diff --git a/docs/research/rabitq-integration/01-current-integration.md b/docs/research/rabitq-integration/01-current-integration.md new file mode 100644 index 000000000..ff2809520 --- /dev/null +++ b/docs/research/rabitq-integration/01-current-integration.md @@ -0,0 +1,134 @@ +# 01 — Current RaBitQ Integration in RuVector + +## What `ruvector-rabitq` ships (the supplier) + +Crate `ruvector-rabitq` 2.2.0 (workspace version, `Cargo.toml:215`) lives +at `crates/ruvector-rabitq/` and exports four pieces from +`crates/ruvector-rabitq/src/lib.rs:45-59`: + +| Item | Source | Status | +|------|--------|--------| +| `FlatF32Index`, `RabitqIndex`, `RabitqAsymIndex`, `RabitqPlusIndex` | `src/index.rs` | shipped, all four behind `AnnIndex` | +| `BinaryCode`, `pack_bits`, `unpack_bits` | `src/quantize.rs` | shipped | +| `RandomRotation`, `RandomRotationKind` | `src/rotation.rs` | shipped (Haar + Hadamard-signed) | +| `persist::save_index` / `load_index` (`.rbpx` v1) | `src/persist.rs:118,187` | shipped, 
deterministic seed-based | + | `VectorKernel`, `KernelCaps`, `ScanRequest`, `ScanResponse`, `CpuKernel` | `src/kernel.rs:78-126` | **trait shipped, only `CpuKernel` implements it** | + +The "shipped vs. scaffolded" map for the kernel surface is critical: +the trait is ready and a default kernel exists, but the dispatch lives +in **no caller** today (see ruLake gap below). + +## Real consumers in the workspace + +Three call sites import `ruvector_rabitq`. They are the universe of +integration as of HEAD. + +### 1. `ruvector-rulake` — the showpiece + +`crates/ruvector-rulake/Cargo.toml:16` pins +`ruvector-rabitq = { path = "../ruvector-rabitq", version = "2.2" }`. +The crate is the only one in the tree that already exercises every +public surface of rabitq: + +| Surface | Used in | Lines | +|---------|---------|-------| +| `RabitqPlusIndex::from_vectors_parallel` (build) | `crates/ruvector-rulake/src/cache.rs:402` | rayon-parallel rotate+pack on cache prime | +| `RabitqPlusIndex::new` + `add` (incremental) | `crates/ruvector-rulake/src/cache.rs:409` | small-batch path | +| `Arc<RabitqPlusIndex>` cache slot | `crates/ruvector-rulake/src/cache.rs:213,488,499,667` | concurrency story (see ADR-155 §"Arc-concurrency 12×") | +| `AnnIndex::search` / `search_with_rerank` | `crates/ruvector-rulake/src/cache.rs:708,833` | hot path | +| `persist::save_index` / `load_index` (`.rbpx`) | `crates/ruvector-rulake/src/lake.rs:304,399` | bundle warm/freeze | +| `RabitqError` `From` conversion | `crates/ruvector-rulake/src/error.rs:17-18` | error propagation | +| `RandomRotationKind::HadamardSigned` | `crates/ruvector-rulake/benches/*` (per BENCHMARK.md) | rotation-flavor toggle | + +Total: **15 references** across `cache.rs`, `lake.rs`, `error.rs`, the +demo bin, and the federation smoke test (count from +`grep -n rabitq crates/ruvector-rulake/src/{lib,cache,lake}.rs`). 
+ +ruLake exposes ruvector-rabitq's contract under three witness modes +(`Consistency::{Fresh, Eventual, Frozen}` — `lake.rs`, ADR-155). The +measured intermediary tax on a cache hit is **1.02× direct +`RabitqPlusIndex::search`** (`crates/ruvector-rulake/BENCHMARK.md` and +ADR-157 §Context). This is the cost ceiling against which every other +integration should be measured. + +**Gap: `VectorKernel` is referenced but not wired.** `lake.rs:595` is +literally a doc comment "this is also the plug-point for the future +`VectorKernel` trait (ADR-157)". `register_kernel` does not exist as a +method in `crates/ruvector-rulake/src/lake.rs`. The README confirms +under "M2+ on the roadmap": +`crates/ruvector-rulake/README.md:507` — `VectorKernel` trait +scaffolding (M1, done) → `crates/ruvector-rulake/README.md:515` — GPU +kernels in separate crates (M2+, deferred). The dispatch policy from +ADR-157 has no caller. + +### 2. `ruvector-py` — the third major consumer (PR #381 / commit `e7f5a391f`) + +`crates/ruvector-py/Cargo.toml:25` pins `ruvector-rabitq = { path = +"../ruvector-rabitq" }` and exposes a single `RabitqIndex` PyO3 class +backed by `RabitqPlusIndex`. 
Surface used: + +| Surface | Used in | Lines | +|---------|---------|-------| +| `RabitqPlusIndex::from_vectors_parallel` (with GIL release) | `crates/ruvector-py/src/rabitq.rs:118` | `py.allow_threads` wraps the rotate+pack | +| `AnnIndex::search_with_rerank` | `crates/ruvector-py/src/rabitq.rs:154` | per-call rerank override | +| `RabitqPlusIndex::export_items` | `crates/ruvector-py/src/rabitq.rs` (in `save`) | replay-source recovery | +| `persist::save_index` / `load_index` | `crates/ruvector-py/src/rabitq.rs:198` | NumPy-friendly disk roundtrip | +| `RabitqError → PyErr` | `crates/ruvector-py/src/error.rs:25` | typed Python error | + +This consumer's lesson, recorded directly in the source comment at +`src/rabitq.rs:35-43`: *RaBitQ does not expose `originals_flat` +directly; the wrapper must call `export_items()` to re-materialise the +items vector for `save_index`.* This drives the §04 design rule. + +### 3. The rabitq demo binary + +`crates/ruvector-rabitq/src/main.rs:28-29` imports every index variant +(`FlatF32Index`, `RabitqAsymIndex`, `RabitqIndex`, `RabitqPlusIndex`) +and benches them on clustered Gaussian data. This is internal +benchmarking, not an integration in the workspace sense, but it's the +canonical place to read all four indexes used together. + +## The integration map at HEAD + +``` + consumers supplier + ───────── ──────── + ruvector-rulake ────────► ┌────────────────────────────┐ + (cache, lake, │ ruvector-rabitq 2.2.0 │ + bundle, witness) │ │ + │ - RabitqPlusIndex (build, │ + ruvector-py ────────► │ add, search, persist) │ + (Python wheel, │ - VectorKernel trait │ + M1) │ - CpuKernel only │ + │ │ + rabitq-demo ────────► │ - rotation, pack/unpack │ + (internal bench) └────────────────────────────┘ +``` + +Every other crate in the workspace **does not** depend on +`ruvector-rabitq`. The 126 other crates listed under `crates/` are +empty space from rabitq's perspective. That gap is what §02 surveys. 
+ +## Three properties every existing consumer relies on + +These show up in the source comments of all three call sites and they +are the load-bearing API contract: + +1. **Determinism across processes.** `(dim, seed, items) → + bit-identical index` (`crates/ruvector-rabitq/src/persist.rs:14-17`, + re-cited in `ruvector-rulake::cache::CacheEntry` and the + roundtrip-preserves-search-results test at + `persist.rs:258-318`). ruLake's witness chain (ADR-155) and + cross-backend cache sharing depend on this. +2. **Encapsulation: no exposed `originals_flat`.** Consumers that need + raw vectors call `export_items()` (`src/index.rs:589`) — the field + itself is private (`src/index.rs:546`). Both rulake and the Python + SDK live with this; new consumers must too. +3. **`AnnIndex` is the only stable trait.** `RabitqPlusIndex::search`, + `search_with_rerank`, `len`, `dim`, `external_ids`, `ids_u64` — + these are the public hot-path surface. Internals (`originals_flat`, + `last_word_mask`, `cos_lut`) are private and the persist format + exists precisely to avoid widening that encapsulation + (`crates/ruvector-rabitq/src/persist.rs:1-18`). + +These three are what §04 elaborates as "must not break". diff --git a/docs/research/rabitq-integration/02-integration-opportunities.md b/docs/research/rabitq-integration/02-integration-opportunities.md new file mode 100644 index 000000000..915917d91 --- /dev/null +++ b/docs/research/rabitq-integration/02-integration-opportunities.md @@ -0,0 +1,300 @@ +# 02 — Integration Opportunities + +For each candidate consumer crate this section answers: what it +stores, where similarity matters, what 32× compression buys, the +friction, the effort, and the strategic value. Candidates are +clustered by the (value × effort) quadrant they fall into so a +roadmap can pick from the top of the list without re-deriving the +trade-offs. 
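The per-vector arithmetic behind the 32× figure is worth pinning down before the survey: a D-dim f32 vector costs 4·D bytes; a 1-bit code costs ⌈D/64⌉ u64 words. A self-contained sketch of that arithmetic (codes only — the shipped index also carries ids, the rotation seed, and optionally the originals for rerank, which is why BENCHMARK.md's measured totals differ):

```rust
/// Bytes for n original f32 vectors of dimension d.
fn f32_bytes(n: usize, d: usize) -> usize {
    n * d * 4
}

/// Bytes for n 1-bit codes of dimension d, packed into
/// 64-bit words: ceil(d/64) words of 8 bytes per vector.
fn code_bytes(n: usize, d: usize) -> usize {
    n * d.div_ceil(64) * 8
}

fn main() {
    // D=128, n=100k — the BENCHMARK.md configuration.
    let (n, d) = (100_000, 128);
    assert_eq!(f32_bytes(1, d), 512); // one original: 512 B
    assert_eq!(code_bytes(1, d), 16); //     one code:  16 B
    // codes-only ratio is exactly 32x whenever d is a multiple of 64
    assert_eq!(f32_bytes(n, d) / code_bytes(n, d), 32);
}
```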
+ +The numbers behind "32× compression" are from +`crates/ruvector-rabitq/BENCHMARK.md`: at D=128 n=100k, RabitqPlus +with rerank×20 holds **100% recall@10** at **957 QPS** vs +FlatF32's 306 QPS, with 53.5 MB vs 50.4 MB total memory **including +the originals reranked from**. Strip rerank (RabitqIndex, no rerank) +and the codes alone are 2.4 MB vs 50.4 MB — that's the **17.5×–32× +compression** number cited in ADR-154. + +--- + +## Tier A — High-value, low-effort (do first) + +### A1. `ruvector-diskann` — replace or sit alongside the PQ quantizer + +**What it stores.** `crates/ruvector-diskann/src/index.rs:57` +`DiskAnnIndex` keeps a Vamana graph plus per-vector PQ codes via +`crates/ruvector-diskann/src/pq.rs:14` (`ProductQuantizer` with +k-means codebooks). Insert (`index.rs:98`), batch insert +(`index.rs:118`), and search (`index.rs:169`) are all vector-keyed. + +**Hot-path similarity.** Beam search inside Vamana scores candidates +via `pq::distance_with_table` (`pq.rs:220`). The PQ table is built +per query (`pq.rs:194`). + +**What 32× buys.** ADR-154 §"Integration path" already calls this +shot: *use BinaryCode for the in-memory candidate list during beam +search; full vectors stay on SSD; binary codes in DRAM for filtering*. +RaBitQ's popcount kernel is faster than the table-lookup PQ inner +loop (`O(D/64)` vs `O(M)` with cache-bound LUT) and ships +deterministic codes — k-means PQ is non-deterministic across runs. + +**Friction.** PQ has a `train` step (`pq.rs:46`) RaBitQ doesn't — +RaBitQ's "training" is a single rotation matrix from a seed, so the +DiskAnnIndex API can shed `train(...)` entirely on the rabitq path. +The on-disk format (`save`/`load` at `index.rs:219,297`) currently +serialises PQ codebooks; would need a parallel `.rbpx` slot or a tag +discriminating the two encodings. + +**Effort.** Small — one new module, one feature flag (or a constructor +variant `DiskAnnIndex::new_rabitq(seed, rerank)`), and a code path in +beam search. ≤500 LoC. 
**Strategic value.** **High.** DiskANN is the SSD-friendly cousin of +HNSW; pairing it with RaBitQ closes the "billion-scale on commodity +disk + DRAM" pitch in ADR-154. Also breaks the PQ training-data +bootstrap problem at index build. + +--- + +### A2. `ruvector-graph` — vector-property index for nodes + +**What it stores.** `crates/ruvector-graph/src/index.rs:15-79` +ships `LabelIndex`, `PropertyIndex`, `EdgeTypeIndex`, `AdjacencyIndex`. +There is no vector-property index today. + +**Hot-path similarity.** The `PropertyIndex::get_nodes_by_property` +path (`index.rs:118`) does exact matching on `PropertyValue`. The +moment a property is a `Vec<f32>` (an embedding stored on a node), +this collapses to "scan every node, compute distance, return top-k" — +which the crate cannot do today without a sister index. + +**What 32× buys.** A graph database with millions of nodes that each +carry a 768-dim embedding (LLM context, agent memory, code symbol) +needs vector-near-neighbor lookup as a property-search primitive. +RaBitQ codes turn that lookup from "scan everything" into "scan 1-bit +codes, rerank top candidates", and the codes themselves cost ~32× less +RAM than the originals. + +**Friction.** Graph database semantics: insert/update/delete on a +single node should not rebuild the rotation. `RabitqPlusIndex::add` +(`crates/ruvector-rabitq/src/index.rs`) already supports incremental +insertion under the existing rotation. Witness chain doesn't apply +here — graph nodes have their own ID semantics, so the rabitq index is +a sub-index keyed by `NodeId`. + +**Effort.** Medium-low — a new `VectorPropertyIndex` next to the +existing four (`crates/ruvector-graph/src/index.rs`), with the same +lifecycle hooks (`add_node`, `remove_node`). ~600 LoC. + +**Strategic value.** **High.** Unlocks "graph-structured RAG" inside +the same crate, which is what `crates/ruvector-graph-transformer/` and +the GNN consumers actually want. + +--- + +### A3. 
`ruvector-gnn` — KNN for `differentiable_search` + +**What it stores.** `crates/ruvector-gnn/src/search.rs:4` exposes +`cosine_similarity(a, b)` and `differentiable_search(query, candidates, +top_k, temperature)` (`search.rs:56`). The candidates list is held by +the caller — typically as a `Vec<Vec<f32>>`. + +**Hot-path similarity.** `differentiable_search` sorts every candidate +by cosine, takes top-k, and reweights the survivors with softmax. +`hierarchical_forward` at `search.rs:105` does this **once per +hierarchy layer per forward pass** during inference and training. + +**What 32× buys.** GNN inference at scale (10⁵+ nodes, 768-dim +features) hits a hard memory ceiling on the candidate set; replacing +the f32 candidate fan-out with RaBitQ codes lets a 100× larger +candidate pool fit in DRAM. Symmetric estimator +(`crates/ruvector-rabitq/src/lib.rs:11-14`) is `O(D/64)` vs cosine's +`O(D)` — the same algorithmic win the rabitq-demo measures (3.1× +QPS). + +**Friction.** `differentiable_search` returns *weights* via softmax, +not just ids. The 1-bit angular estimator `cos(π·(1 − B/D))` is a +proxy — top-k selection is fine, but the softmax weights would need to +come from RabitqPlus's exact-rerank f32 scores so gradients stay +meaningful. Practical: rerank top-k×10 with the f32 estimator, softmax +those. + +**Effort.** Small — replace the candidate-scan loop in +`search.rs:56` with `RabitqPlusIndex::search_with_rerank`. ≤300 LoC +plus a test that shows recall@k matches the brute-force cosine +within tolerance. + +**Strategic value.** **High.** Unlocks attention-over-large-graphs +patterns inside the GNN trainer. Pair with A2 for graph + GNN sharing +one rabitq sub-index. + +--- + +## Tier B — High-value, high-effort (medium-term) + +### B1. 
`ruvector-attention` — KV-cache compression behind 1-bit + +**What it stores.** `crates/ruvector-attention/src/attention/kv_cache.rs:253` +`CacheManager` owns key/value tensors per layer, with `append` +(`kv_cache.rs:284`), `get` (`:309`), `evict` (`:325`), and +`pyramid_budget` (`:398`). It already has its own asymmetric/symmetric +quantize (`:130, :182`) producing `QuantizedTensor` (`:90`) at 4–8 +bits. + +**Hot-path similarity.** Attention is `softmax(QK^T / sqrt(d)) V`. +The K-cache is the database, the Q is the query — exactly RaBitQ's +*asymmetric* setting (`crates/ruvector-rabitq/src/lib.rs:16-18`, +`RabitqAsymIndex` in `src/index.rs`). + +**What 32× buys.** A 32k-token cache at D=4096 is **524 MB per layer** +in f16; RaBitQ-Asym takes 16 MB for the codes. The asymmetric +estimator `‖q‖·‖x‖·(1/√D)·Σ sign(x_rot)·q_rot` keeps the query in +f32 — exactly what attention needs. + +**Friction.** Attention's existing 4–8-bit quantize is bf16/f16 native +across the rest of the LLM stack; introducing a third datatype path is +real work. Also, RabitqAsym's QPS at D=128 was only **26 QPS** +(`BENCHMARK.md` headline) — that path needs the SIMD/GPU kernel from +ADR-157 before it's competitive with the existing 4-bit path. +Determinism on rerank (float reduction order) is a problem on GPU. + +**Effort.** Large — touches an existing performance-sensitive cache, +needs SIMD kernel development, needs fallback to existing +`QuantizedTensor` when D is too small for the rotation cost to pay +off. ~2000+ LoC across kv_cache + a feature flag. + +**Strategic value.** **Medium-high but speculative.** ruvllm's KV +cache is the bigger target (B2); attention is the upstream library. +If B2 lands first, B1 follows. + +--- + +### B2. `ruvllm` — LLM serving KV cache + retrieval-augmented prompt cache + +**What it stores.** `crates/ruvllm/src/kv_cache.rs:203` `KvMemoryPool` +holds aligned f32 buffers (`AlignedBuffer` at `kv_cache.rs:45`). 
The +ruvllm hot path is the same shape as B1 but at the serving layer: +multi-tenant, eviction-pressured, latency-sensitive. + +**Hot-path similarity.** Same as B1 (attention K-cache). Plus, ruvllm's +RAG path is whatever the embedding model + a separate ANN index look +like — and that's a free win for ruLake (since ruvllm could just +embed and query a `RuLake` instance instead of holding its own). + +**What 32× buys.** Multi-tenant serving is RAM-bound; 32× compression +of long-context K-caches lets one box serve 32× more concurrent +sessions before eviction. + +**Friction.** ruvllm has its own backend abstraction +(`crates/ruvllm/src/backends/`), GGUF loaders +(`crates/ruvllm/src/gguf/`), Metal kernels (`/metal/`), and bitnet +support (`/bitnet/`). Adding RaBitQ as another quantization path +needs to live behind that backend trait, not in the cache directly. + +**Effort.** Large — needs ADR-class decision on whether ruvllm +adopts ruLake as its retrieval substrate (which solves both the K-cache +and the RAG question with one integration). Otherwise: a dedicated +RaBitQ K-cache implementation. 1500–3000 LoC depending on path. + +**Strategic value.** **High.** ruvllm is the LLM serving frontend; +RaBitQ-as-K-cache-compression is a marketing-grade moat ("32× more +concurrent contexts on the same hardware"). + +--- + +### B3. `ruvector-temporal-tensor` — time-windowed compressed segments + +`crates/ruvector-temporal-tensor/src/{lib,tiering,quantizer,compressor,f16,segment,bitpack}.rs` +ships a temperature-tiered compression stack already (hot/warm/cold via +`tier_policy.rs`, with its own quantizer in `core_trait.rs`). Cold-tier +reads currently pay an unpack cost; if the segment payload is 1-bit +RaBitQ codes the read can stay in compressed form for proximity-of- +time-window search. 32× compression pushes billion-sample D=128/768 +working sets onto one machine at the tier boundary. 
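The "billion-sample working sets onto one machine" claim above survives a quick arithmetic check. A sketch (codes only; per-segment metadata and tier bookkeeping excluded):

```rust
/// Gigabytes for n vectors of dimension d at the given bits-per-component.
fn gb(n: u64, d: u64, bits: u64) -> f64 {
    (n * d * bits) as f64 / 8.0 / 1e9
}

fn main() {
    let n = 1_000_000_000; // a billion samples
    // f32 originals vs 1-bit RaBitQ codes — the 32x ratio at scale
    assert_eq!(gb(n, 128, 32), 512.0);  // f32, D=128: 512 GB — multi-box
    assert_eq!(gb(n, 128, 1), 16.0);    // 1-bit codes: 16 GB — one machine
    assert_eq!(gb(n, 768, 32), 3072.0); // f32, D=768: ~3 TB
    assert_eq!(gb(n, 768, 1), 96.0);    // 1-bit codes: 96 GB
}
```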
**Friction.** RaBitQ is a new tier alongside scalar/PQ; determinism +still matters because of the cross-tier coherence story +(`coherence.rs`). **Effort:** medium — codec for `.rbpx`, hook into +`tier_policy`. 800–1200 LoC. **Value:** medium-high. Pairs naturally +with ruLake (different problem, same compression substrate). + +### B4. `ruvector-domain-expansion` — embedding-based domain shift + +`crates/ruvector-domain-expansion/src/lib.rs:90` `DomainExpansionEngine` +exposes `embed(...)` (`lib.rs:199`) and `initiate_transfer(...)` +(`:205`). The kNN-over-domains lookup at transfer time would benefit +from RaBitQ, but domain counts are 10²–10⁴ — the compression win is +modest. The real win is **consistency**: embeddings would gain witness ++ cross-process sharing for free if stored in ruLake. + +**Friction.** The embedding type is `DomainEmbedding`, not raw +`Vec<f32>`; light refactor. **Effort:** small (300 LoC) — Tier B not +because of effort but because of value. **Value:** medium-low; this +is consistency hygiene, not load-bearing. + +--- + +## Tier C — Speculative (defer or kill) + +### C1. `ruvector-mincut` — graph cut over vector similarity + +`crates/ruvector-mincut/src/core/`, `sparsify/`, `localkcut/`, +`cluster/` — graph-cut algorithms over edge-weighted graphs. MinCut +operates on edges, not raw vectors. The vector → kNN-graph build step +could feed RaBitQ, but that's an instance of A2/A3, not a separate +consumer. **Verdict:** defer (downstream of A2). + +### C2. `ruvector-cnn` — embedding producer, not indexer + +`crates/ruvector-cnn/src/embedding.rs:122` `MobileNetEmbedder` +produces `Vec<f32>` via `extract`. The crate ends at producing the +embedding; consumers do their own indexing. The "integration" is a +one-liner in user code, not a crate change. **Verdict:** kill as a +crate-level integration; add a README example showing the +producer→`RabitqPlusIndex::add` plug. + +### C3. 
`ruvector-fpga-transformer` — RaBitQ popcount on FPGA + +`crates/ruvector-fpga-transformer/src/lib.rs:86` `Engine` for +transformer inference. RaBitQ's popcount kernel is **the** kernel a +small FPGA can do well — 64-bit XOR + popcount is two LUT levels deep. +A `ruvector-rabitq-fpga` kernel under ADR-157 is a research project, +not a near-term integration. **Verdict:** defer to ADR-157 follow-on. + +### C4. `ruvector-sparsifier` — spectral sparsification + +Same logic as C1: vectors only enter via the kNN-graph build step. +**Verdict:** defer (downstream of A2). + +### C5. `rvagent-a2a` — already integrated by reference + +`crates/rvAgent/rvagent-a2a/src/artifact_types.rs:64` defines +`ArtifactKind::RuLakeWitness { witness, data_ref, capabilities }` — +a by-reference vector handle that travels between agents without +moving bytes (ADR-159 §"Typed artifact semantics"). A2A doesn't carry +RaBitQ codes directly; it carries the witness that resolves to a +ruLake bundle. The integration **is** the witness type. **Verdict:** +no-op; the "witness-as-handle" pattern is already paying off here. 
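For intuition on why the popcount path keeps recurring in this tier list (A3's estimator, C3's FPGA kernel): a self-contained sketch of a 1-bit sign code and the `cos(π·(1 − B/D))` angular estimate. Illustrative only — the real `pack_bits` packs *rotated* vectors (the rotation step, which makes the estimator unbiased, is omitted here):

```rust
/// Pack the sign bits of a (rotated) vector into u64 words.
fn pack_signs(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; v.len().div_ceil(64)];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Symmetric 1-bit angular estimate with B matching bits out of D:
/// cos(pi * (1 - B/D)). The inner loop is O(D/64) XOR + popcount,
/// not O(D) float multiply-adds.
fn est_cos(a: &[u64], b: &[u64], d: usize) -> f32 {
    let hamming: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    let matching = d as f32 - hamming as f32;
    (std::f32::consts::PI * (1.0 - matching / d as f32)).cos()
}

fn main() {
    let v = [0.3f32, -1.2, 0.7, -0.1];
    let code = pack_signs(&v);
    // identical vectors: hamming 0, estimate cos(0) = 1
    assert!((est_cos(&code, &code, v.len()) - 1.0).abs() < 1e-6);
    // fully sign-flipped vector: hamming D, estimate cos(pi) = -1
    let neg: Vec<f32> = v.iter().map(|x| -x).collect();
    let anti = pack_signs(&neg);
    assert!((est_cos(&code, &anti, v.len()) + 1.0).abs() < 1e-6);
}
```

The same XOR+popcount loop is what C3 proposes to put in LUTs — the algorithm doesn't change, only where the popcount executes.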
+ +--- + +## Summary table + +| Tier | Crate | Effort | Value | Notes | +|------|-------|--------|-------|-------| +| A1 | `ruvector-diskann` | small | high | Replace/augment PQ; ADR-154 already named this | +| A2 | `ruvector-graph` | medium-low | high | New `VectorPropertyIndex` | +| A3 | `ruvector-gnn` | small | high | `differentiable_search` rewrite | +| B1 | `ruvector-attention` | large | medium-high | KV cache, asymmetric path | +| B2 | `ruvllm` | large | high | K-cache + RAG via ruLake | +| B3 | `ruvector-temporal-tensor` | medium | medium-high | New temperature tier | +| B4 | `ruvector-domain-expansion` | small | medium-low | Hygiene rather than load-bearing | +| C1 | `ruvector-mincut` | — | low | Downstream of A2 | +| C2 | `ruvector-cnn` | — | none | Pure user-code example | +| C3 | `ruvector-fpga-transformer` | research | speculative | ADR-157 kernel | +| C4 | `ruvector-sparsifier` | — | low | Downstream of A2 | +| C5 | `rvagent-a2a` | — | done | Witness-by-reference shipped | + +12 candidates surveyed. Phase 1 picks A1 + A2 + A3. Phase 2 picks B1 +*or* B2 (one of them, not both — they answer the same question). Phase +3 is the workspace-canonical-compression ADR question (§05). diff --git a/docs/research/rabitq-integration/03-architectural-patterns.md b/docs/research/rabitq-integration/03-architectural-patterns.md new file mode 100644 index 000000000..2797add24 --- /dev/null +++ b/docs/research/rabitq-integration/03-architectural-patterns.md @@ -0,0 +1,289 @@ +# 03 — Architectural Patterns + +Three sane shapes to add a new RaBitQ consumer. Each preserves the +ADR-154 / ADR-155 / ADR-157 invariants and matches a different +consumer profile. The choice is consequential because every shape +implies a different contract about who owns the index, who owns the +witness, and who picks the kernel. 
The goal of this section is to make the choice explicit at integration +time, so we don't accidentally fragment what is currently one +deterministic compression substrate. + +--- + +## Pattern 1 — Direct embed + +The consumer crate adds `ruvector-rabitq` to its `Cargo.toml` and uses +`RabitqPlusIndex` (or any of the four indexes) as a private field of +its own type. The consumer owns the index lifecycle: build, add, +search, persist. + +**Sketch.** + +```toml +# Cargo.toml of consumer +[dependencies] +ruvector-rabitq = { path = "../ruvector-rabitq", version = "2.2" } +``` + +```rust +// inside consumer +use std::collections::HashMap; + +use ruvector_rabitq::{AnnIndex, RabitqPlusIndex}; + +pub struct VectorPropertyIndex { + by_property: HashMap<PropertyKey, RabitqPlusIndex>, + dim: usize, + seed: u64, + rerank_factor: usize, +} + +impl VectorPropertyIndex { + pub fn add_node(&mut self, node_id: NodeId, property: PropertyKey, vector: Vec<f32>) { + self.by_property + .entry(property) + .or_insert_with(|| RabitqPlusIndex::new(self.dim, self.seed, self.rerank_factor)) + .add(node_id.0, vector) + .unwrap(); + } + + pub fn knn(&self, property: &PropertyKey, q: &[f32], k: usize) -> Vec<NodeId> { + self.by_property + .get(property) + .map(|idx| idx.search(q, k).unwrap()) + .unwrap_or_default() + .into_iter() + .map(|r| NodeId(r.id)) + .collect() + } +} +``` + +**Best when.** The consumer owns its index lifecycle, doesn't need +witness chaining, and doesn't need to share the index across processes +or backends. Examples in §02: + +- **A1 — `ruvector-diskann`:** the index *is* the consumer's product; + it manages its own SSD-backed storage and its own rebuild policy. + RaBitQ is a backend choice, not a foreign service. +- **A2 — `ruvector-graph`:** the property index is a sub-component of + a graph database that already owns its lifecycle. +- **A3 — `ruvector-gnn`:** the candidate set passed to + `differentiable_search` is owned by the GNN forward pass; building a + fresh `RabitqPlusIndex` per layer is fine for inference and the + index is short-lived. 
- **B4 — `ruvector-domain-expansion`:** the embedding store is + internal state, no cross-crate sharing required. + +**What this pattern doesn't give you.** The witness chain. Cross- +process cache sharing. Pluggable kernels (you get whatever ships in +`ruvector-rabitq` proper, which today means `CpuKernel`). + +--- + +## Pattern 2 — Behind the `VectorKernel` trait (ADR-157) + +The consumer registers a `VectorKernel` implementation — typically the +default `CpuKernel`, optionally a SIMD or GPU one — and dispatches +queries through it. The trait shape is at +`crates/ruvector-rabitq/src/kernel.rs:78-126`: + +```rust +pub trait VectorKernel: Send + Sync { + fn id(&self) -> &str; + fn caps(&self) -> KernelCaps; + fn scan(&self, req: ScanRequest<'_>) -> Result<ScanResponse>; +} +``` + +`ScanRequest` carries a borrowed `&RabitqPlusIndex` plus a query +batch; the consumer (or a coordinator) picks the kernel based on +batch size + dim + determinism requirement. + +**Sketch.** A consumer that wants pluggable backends keeps an +`Arc<dyn VectorKernel>` field and calls `.scan(...)` in the hot +path: + +```rust +use std::sync::Arc; + +use ruvector_rabitq::{CpuKernel, RabitqPlusIndex, ScanRequest, VectorKernel}; + +pub struct AcceleratedSearcher { + kernel: Arc<dyn VectorKernel>, + // … +} + +impl AcceleratedSearcher { + pub fn new() -> Self { + Self { kernel: Arc::new(CpuKernel::new()) } + } + pub fn register_kernel(&mut self, k: Arc<dyn VectorKernel>) { + // ranked dispatch by caps() + if self.should_prefer(&*k) { self.kernel = k; } + } + pub fn search(&self, idx: &RabitqPlusIndex, queries: &[Vec<f32>], k: usize) + -> Result<ScanResponse> + { + self.kernel.scan(ScanRequest { index: idx, queries, k, rerank_factor: None }) + } +} +``` + +**Best when.** The consumer wants pluggable acceleration but doesn't +need cross-process witness/cache. Examples: + +- **B1 — `ruvector-attention` KV cache:** wants SIMD on server, WASM + SIMD in browser, GPU on a Cognitum box. Same source, different + kernels. The trait was literally designed for this in ADR-157. 
- **B2 — `ruvllm`:** if RaBitQ becomes the K-cache compression, + ruvllm picks Metal or CUDA per platform. +- **C3 — `ruvector-fpga-transformer`:** a `RabitqFpgaKernel` + registered at startup, with `caps().min_batch ≥ 1024` so it only + fires on bulk inference. + +**Critical caveat.** The trait is shipped (`src/kernel.rs`) but +**no caller wires it up today** — `ruvector-rulake` references it +only in a doc comment at `lake.rs:595`. The first consumer that +uses Pattern 2 must also write the dispatch policy (ADR-157 +§"Dispatch policy normative") in its own crate; this is *not* free. +Roadmap Phase 2 (§05) is exactly this work. + +--- + +## Pattern 3 — Through `ruLake` + +The consumer doesn't manage a RaBitQ index at all. It delegates to a +`RuLake` instance with a `LocalBackend` (or a remote one) holding the +vectors, and calls `lake.search_one(backend, collection, query, k)`. + +**Sketch.** + +```rust +use std::sync::Arc; + +use ruvector_rulake::{LocalBackend, RuLake}; + +let backend = LocalBackend::with_vectors("agent-mem", "episodic", dim, vecs); +let lake = RuLake::builder() + .register_backend(Arc::new(backend)) + .with_seed(42) + .with_rerank_factor(20) + .build()?; + +let hits = lake.search_one("agent-mem", "episodic", &q, 10)?; +``` + +The consumer gets: + +- 1.02× tax on the cache-hit path (measured — + `crates/ruvector-rulake/BENCHMARK.md`). +- A SHAKE-256 witness chain via `RuLakeBundle` + (`crates/ruvector-rulake/src/bundle.rs`). +- Cross-process cache sharing: two ruLake instances reading the same + bundle reuse one compressed copy + (`crates/ruvector-rulake/src/cache.rs`, + test `two_backends_share_cache_when_witness_matches`). +- `Consistency::{Fresh, Eventual, Frozen}` knob for staleness SLA + (ADR-156). +- Witness-by-reference for cross-agent handoff via + `ArtifactKind::RuLakeWitness` + (`crates/rvAgent/rvagent-a2a/src/artifact_types.rs:64`). 
+ +**Best when.** The consumer wants witness-sealed memory, cross-process +sharing, freshness modes, or zero-copy handoff to other agents. + +- **B2 — `ruvllm` RAG:** any retrieval ruvllm does should sit on a + `RuLake`, not on its own `RabitqPlusIndex` — gets the witness + + freshness modes for free. +- **Any rvAgent subagent:** the agent memory hierarchy from ADR-156 is + literally this pattern. Direct embed would re-implement bundle + + witness; through-ruLake is "the brain on the substrate". +- **A future `ruvector-postgres` extension:** a Postgres function that + returns top-k from a managed lake of vectors — ruLake is the right + shape because the function may run in many backend processes + sharing one cache. + +**What this pattern doesn't give you.** Bare-metal min latency. The +1.02× tax is measured on `LocalBackend`; on a Parquet-on-GCS backend +the cold path is network-bound. Direct embed wins for in-process +single-user workloads where the consumer already has the vectors +materialised. 
+ +--- + +## Mapping §02 candidates to patterns + +| Candidate | Pattern | Why | +|-----------|---------|-----| +| A1 `ruvector-diskann` | **1** direct embed | Owns its index lifecycle; SSD/PQ already custom | +| A2 `ruvector-graph` | **1** direct embed | Sub-index of an existing graph store | +| A3 `ruvector-gnn` | **1** direct embed | Short-lived per-layer index in forward pass | +| B1 `ruvector-attention` | **2** VectorKernel | Needs SIMD/GPU/WASM kernel choice per target | +| B2 `ruvllm` | **3** through ruLake (RAG) + **2** kernel (KV cache) | Two integrations, two patterns | +| B3 `ruvector-temporal-tensor` | **1** direct embed | New temperature tier inside the existing crate | +| B4 `ruvector-domain-expansion` | **3** through ruLake | Already produces witness-shaped outputs | +| C1, C4 (mincut, sparsifier) | downstream of A2 | n/a until A2 lands | +| C2 `ruvector-cnn` | none (user code) | Producers, not indexers | +| C3 `ruvector-fpga-transformer` | **2** VectorKernel | The kernel pattern's poster child | +| C5 `rvagent-a2a` | **3** (already, via witness) | Done | + +Note the split for B2 — ruvllm probably wants both. That's fine; the +patterns compose. + +--- + +## Anti-patterns to refuse + +The following shapes look reasonable in a PR review but each one +breaks an existing ADR invariant or fragments the substrate. None of +them should pass review. + +### Anti-pattern A — re-implementing rotation + +A consumer crate copy-pastes the rotation code from +`crates/ruvector-rabitq/src/rotation.rs` into its own module to "avoid +the dependency". Breaks ADR-154's determinism guarantee — a divergent +copy means `(seed, dim, vectors) → bit-identical codes` no longer holds +across crates. **Always import from `ruvector-rabitq`.** + +### Anti-pattern B — ad-hoc 1-bit compression + +A consumer crate ships its own `pack_bits` function and its own +distance estimator because "we just need a quick binary code". 
This
+re-creates the original `BinaryQuantized` problem ADR-154 §"Measured
+gap" was written to fix: ~15–20% recall vs RaBitQ's 40.8%/98.9%. **If
+you're doing 1-bit compression of vectors in this workspace, it's
+RaBitQ.**
+
+### Anti-pattern C — exposing `originals_flat`
+
+A consumer crate's PR widens `RabitqPlusIndex` to expose its private
+`originals_flat: Vec<f32>` field (`src/index.rs:546`) "for
+zero-copy". Breaks the encapsulation that the persist format relies
+on (`src/persist.rs:1-18`) — and the persist format is the contract
+that lets two processes warm-load each other's bundles. The Python
+SDK explicitly works around this at `src/rabitq.rs:35-43` by calling
+`export_items()` instead. **Use `export_items()` or extend
+`AnnIndex`; do not widen the struct.**
+
+### Anti-pattern D — fragmenting the witness
+
+A consumer crate runs RaBitQ to compress vectors and ships them under
+a *different* witness scheme (e.g. its own SHA-3 over a private
+serialization format). Breaks ADR-155 cross-backend cache sharing and
+ADR-159 by-reference artifact handoff. **All compressed-vector
+artifacts that traverse process boundaries use ruLake's
+`RuLakeBundle` witness or none at all.**
+
+### Anti-pattern E — RaBitQ everywhere by default
+
+The mirror of D — adding `ruvector-rabitq` as a default dep on every
+crate "because it's available". Adds ~50 KB compiled size and the
+rotation tables to every WASM bundle and embedded build. The §04
+performance budget is explicit: only candidate consumers with
+demonstrated benefit (Tier A) get the dep on the default build path.
+WASM consumers must feature-gate.
+
+### Anti-pattern F — ignoring the kernel determinism gate
+
+A consumer crate registers a non-deterministic GPU kernel and
+serves Fresh/Frozen consistency from it. Breaks ADR-157 §"Determinism
+as a hard gate". **Caps are advisory at compile time but enforced at
+dispatch.** The consumer must implement the dispatch filter from
+ADR-157, not just `kernels.iter().next()`.
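To make that refusal concrete: a minimal sketch of an ADR-157-style dispatch filter, using stand-in types (the real `KernelCaps` and `Consistency` live in `ruvector-rabitq` and ADR-156; only the `deterministic` and `min_batch` fields are taken from this document, and the ranking heuristic is an assumption):

```rust
// Stand-ins for the real `KernelCaps` / `Consistency` types; field
// names follow the caps this document actually cites.
#[derive(Clone, Copy)]
pub struct KernelCaps {
    pub deterministic: bool,
    pub min_batch: usize,
}

#[derive(Clone, Copy, PartialEq)]
pub enum Consistency { Fresh, Eventual, Frozen }

/// ADR-157-shaped dispatch filter: drop kernels that cannot serve
/// this request, then rank the survivors. Never just
/// `kernels.iter().next()`.
pub fn pick_kernel(caps: &[KernelCaps], batch: usize, mode: Consistency) -> Option<usize> {
    caps.iter()
        .enumerate()
        // Determinism is a hard gate on Fresh/Frozen paths.
        .filter(|(_, c)| c.deterministic || mode == Consistency::Eventual)
        // A kernel only fires at or above its minimum batch size.
        .filter(|(_, c)| batch >= c.min_batch)
        // Prefer the most specialised surviving kernel (largest
        // `min_batch` it can still satisfy), as one plausible ranking.
        .max_by_key(|(_, c)| c.min_batch)
        .map(|(i, _)| i)
}
```

A caller that keeps `CpuKernel` at index 0 gets it back as the fallback whenever a bulk-only or non-deterministic kernel is filtered out.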
diff --git a/docs/research/rabitq-integration/04-cross-cutting-concerns.md b/docs/research/rabitq-integration/04-cross-cutting-concerns.md new file mode 100644 index 000000000..94d9e760a --- /dev/null +++ b/docs/research/rabitq-integration/04-cross-cutting-concerns.md @@ -0,0 +1,230 @@ +# 04 — Cross-Cutting Concerns + +The invariants every new RaBitQ integration must hold. These come from +reading the existing call sites and the ADRs that govern them; if a +new integration breaks any of these, it almost certainly invalidates +ADR-154/155/157 by side effect. + +--- + +## 1. Determinism across architectures + +**The contract.** `(dim, seed, items) → bit-identical rotation matrix ++ packed codes + index build + search output across runs and across +machines.` Stated explicitly at +`crates/ruvector-rabitq/src/persist.rs:14-17` and re-stated at +`crates/ruvector-rabitq/src/lib.rs:34-37`. Tested by +`persist::tests::serialize_roundtrip_preserves_search_results` +(`persist.rs:258`) which compares score bits with `to_bits()` — not a +tolerance compare. + +**Why it matters for new integrations.** ruLake's witness chain +(ADR-155) and cross-backend cache sharing (the +`two_backends_share_cache_when_witness_matches` test) depend on +this. So does the rabitq-by-reference handoff in +`ArtifactKind::RuLakeWitness` +(`crates/rvAgent/rvagent-a2a/src/artifact_types.rs:64`) — agents on +different boxes reading the same witness must compute the same +top-k. + +**The trap.** Floating-point reduction order is not stable across +SIMD widths or GPU lane counts. ADR-157 already calls this out: +the **scan phase** (1-bit popcount) is integer math and trivially +deterministic; the **rerank phase** (exact L2²) is float reduction +and can diverge in the last ulp on GPU. ADR-157's resolution: kernels +that can't guarantee identical rerank set `caps().deterministic = +false`, and the dispatch policy refuses to use them on Fresh/Frozen +paths. 
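The `to_bits()` comparison the contract leans on is small enough to spell out. A crate-independent sketch (not the actual test helper):

```rust
/// Bit-exact score comparison: two runs agree only if every f32 is
/// identical at the bit level. Unlike `==` or a tolerance compare,
/// this catches last-ulp rerank divergence and even distinguishes
/// -0.0 from 0.0.
pub fn scores_bit_identical(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```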
+
+**Enforcement.** Every integration adds a regression test of the
+shape "build the same data twice (different thread counts, same
+seed) and assert `to_bits()` matches on at least 100 query
+results". The existing test at `persist.rs:258` is the model.
+
+---
+
+## 2. Witness format compatibility
+
+**The contract.** `.rbpx` v1 is the on-disk and on-wire format
+(`crates/ruvector-rabitq/src/persist.rs:23-33`). It carries
+`(magic, version, dim, seed, rerank_factor, n, items)`. The format is
+**deliberately seed-based** rather than field-based — it stores the
+*replay inputs*, not the index internals, because the deterministic
+build is cheaper to re-run than the rotation matrix is to ship.
+
+**Why it matters.** Every cross-process integration that wants
+witness-sealed memory rides this format. `ruvector-rulake`'s
+`save_index`/`load_index` calls (`lake.rs:304,399`) are the only
+producer/consumer today, but ADR-159's `RuLakeWitness` artifact (and
+its `data_ref` field) implicitly depends on this format being stable.
+
+**The trap.** A consumer that needs a fielded format (e.g. for a
+columnar store like Parquet) will be tempted to widen `.rbpx` v1 with
+extra fields. Don't. The right shape is:
+
+- For a richer container, wrap `.rbpx` inside another format
+  (e.g. a tar-like bundle that holds `.rbpx` + a sidecar metadata file).
+- For a different field set entirely, bump to `.rbpx` v2 in the same
+  module, with a feature flag, and keep v1 readable.
+- Never extend v1 in place. The persist format's `MAGIC` + `VERSION`
+  bytes (`persist.rs:49-51`) are a contract.
+
+**Enforcement.** PR review on every change touching `persist.rs`. The
+`reject_version_too_new` test (`persist.rs:425`) defends this.
+
+---
+
+## 3. Memory ownership: who holds the codes
+
+**The lesson from PR #381 (Python SDK).** `RabitqPlusIndex` does not
+expose `originals_flat` directly — the field is private at
+`crates/ruvector-rabitq/src/index.rs:546`.
Consumers that need to
+re-export the originals (e.g. for `save_index`) call
+`export_items()` (`src/index.rs:589`), which **clones**
+`n*dim*sizeof(f32)` bytes. This is documented in
+`crates/ruvector-py/src/rabitq.rs:35-43` as a deliberate cost trade.
+
+**The contract.** Three rules.
+
+a. The cache (or the consumer's struct) owns the
+   `Arc<RabitqPlusIndex>`. `ruvector-rulake::cache.rs:213` is the
+   model.
+
+b. New consumers that need raw vectors call `export_items()`. They do
+   not get a borrowed slice; the copy is intentional.
+
+c. New consumers that need to *avoid* the export-items copy need to
+   restructure to keep the source-of-truth `Vec<f32>` themselves and
+   use `RabitqPlusIndex` only for the codes + search. The Python SDK
+   chose to do the copy; ruLake chose to keep the source-of-truth in
+   `LocalBackend::PulledBatch`.
+
+**The trap.** A PR that "adds a `pub fn raw_vector(&self, i: usize) ->
+&[f32]` to `RabitqPlusIndex` for performance" — see Anti-pattern C in
+§03. Refuse it. If the perf is real, the right move is to widen
+`AnnIndex`, not the struct internals.
+
+---
+
+## 4. API stability and version pinning
+
+**The state.** `ruvector-rabitq` is at `2.2.0` on crates.io
+(`Cargo.toml:215` workspace version). Both consumer Cargo.tomls
+(`ruvector-rulake/Cargo.toml:16`, `ruvector-py/Cargo.toml:25`) pin via
+`path = "../ruvector-rabitq"`. The rulake Cargo.toml also adds a
+`version = "2.2"` constraint; the Python SDK doesn't yet — that's
+worth normalising.
+
+**The contract.** New integrations pin `ruvector-rabitq = { path =
+"../ruvector-rabitq", version = "^2.2" }` to allow patch + minor
+upgrades but block major ones. This is what semver bought: anything
+that needs to break the persist format or the index trait surface
+becomes a major bump and forces a synchronised upgrade across all
+consumers.
+
+**The trap.** Workspace-only `path` deps without a version constraint
+work locally, but the moment the supplier crate publishes a major
+version on crates.io and a downstream user pulls
+`ruvector-rabitq = "3"` the workspace is silently inconsistent. Add
+the version constraint at integration time.
+
+---
+
+## 5. Performance footprint on small targets
+
+**The numbers.** `ruvector-rabitq`'s `Cargo.toml` deps are `rand`,
+`rand_distr`, `rayon`, `serde`, `serde_json`, `thiserror` — small.
+But the rotation tables, the cos-LUT, and the binary code paths add
+~50 KB to a release WASM bundle (estimated; not yet measured for the
+ruvector-py wheel). The crate explicitly disables `unsafe` and pulls
+no BLAS, which keeps it portable.
+
+**The contract.** WASM, embedded, and `wasm32-*` consumers must
+feature-gate the rabitq dep. The `Cargo.toml` excludes list at
+`/home/ruvultra/projects/ruvector/Cargo.toml:1-8` already keeps things
+out of `cargo build --workspace` selectively; new WASM consumers
+should follow that pattern.
+
+**The trap.** Adding `ruvector-rabitq` as a default dep on a
+hypothetical `ruvector-edge-something` crate, then discovering the
+WASM build is 50 KB heavier and the embedded ESP32 build (cf. the
+`examples/ruvLLM/esp32-flash` excluded list) doesn't link.
+Feature-gate before integrating, not after.
+
+---
+
+## 6. Cross-language story
+
+**The state today.** `ruvector-py` is the only non-Rust consumer
+(M1 shipped, commit `e7f5a391f`). Wheel binding via PyO3 + maturin,
+ABI3 across Python 3.9..3.13 (`crates/ruvector-py/Cargo.toml:21`).
+
+**The contract for future bindings (Node, WASM, Java).**
+
+- Bindings expose **only the `AnnIndex` trait surface plus persist**.
+  Internal types (`BinaryCode`, `RandomRotation`) stay Rust-only —
+  exposing them widens the FFI surface beyond what the determinism
+  contract can survive across language runtimes.
+
+- Persist roundtrip is the cross-language compatibility test.
A `.rbpx` + written by Rust must load identically in Python; a `.rbpx` written + by Python must load identically in Rust. The + `persist::tests::serialize_roundtrip_preserves_search_results` test + is the in-Rust version; the cross-language version is a + cross-runtime test (the Python SDK already does the round-trip in + its test suite, just within Python). + +- WASM bindings inherit the §5 footprint constraint: no rabitq in the + default WASM bundle unless the consumer opts in. + +**The trap.** Each new binding tempted to expose more of the API. The +Python SDK got this right by exposing exactly one class +(`crates/ruvector-py/src/rabitq.rs:36`); future bindings should match. + +--- + +## 7. The `VectorKernel` story is asymmetrical + +**The state.** Trait shipped in `ruvector-rabitq` (`src/kernel.rs`). +One implementation (`CpuKernel`). **Zero callers** that wire dispatch +— only a doc comment at `crates/ruvector-rulake/src/lake.rs:595`. +That's a real gap. + +**The implication for new integrations.** A consumer that uses +Pattern 2 (§03) is **the first non-test caller of `VectorKernel`**. +That consumer must: + +- Implement the dispatch policy from ADR-157 §"Dispatch policy + normative" (preference order, batch-size + dim + determinism filter). +- Decide where to surface kernel identity in stats (the comment in + `src/kernel.rs:23-25` says "kernel identity is surfaced in caps + + stats, not in the witness" — caller's responsibility). +- Write the test that verifies determinism across two registered + kernels on Fresh/Frozen consistency. + +This is real engineering — Phase 2 of §05 explicitly budgets it. A +consumer that thinks it's getting "free GPU" by adopting the trait +is going to be disappointed unless someone has done this work first. + +**The graceful path.** `ruvector-rulake` should be that someone. 
It +already references the trait in the doc comment; making the dispatch +real in rulake first means every other Pattern-2 consumer inherits a +working pattern and a test harness. + +--- + +## 8. The witness chain is anchored on data, not on kernels + +**Restated from ADR-157 §"Determinism as a hard gate":** the +witness is computed over `(data_ref, dim, rotation_seed, +rerank_factor, generation)`. Kernel identity is **not** in the +witness — kernels are execution substrate. + +**The contract for new integrations.** A consumer that adds a new +kernel does *not* invalidate any existing witness. A consumer that +changes the rotation seed, the rerank factor, or the data does. New +integrations must not couple kernel selection to data identity — that +includes "use a different rotation seed for the GPU path because it +benchmarks better at that seed", which is a ruled-out direction. + +This is what makes Phase 2's GPU work safe: a CUDA kernel that ships +later does not break already-published bundles. diff --git a/docs/research/rabitq-integration/05-roadmap.md b/docs/research/rabitq-integration/05-roadmap.md new file mode 100644 index 000000000..b401754d8 --- /dev/null +++ b/docs/research/rabitq-integration/05-roadmap.md @@ -0,0 +1,238 @@ +# 05 — Roadmap + +Three phases. Each picks a coherent slice of the §02 candidate list, +specifies the files to touch, an acceptance test, and an LoC budget. +Each phase is sized to ~3–6 engineer-weeks. Phases are independent — +Phase 2 doesn't block on Phase 1 except where noted. + +--- + +## Phase 1 — Low-hanging integrations (3 candidates, 4–5 weeks) + +Pick the three Tier-A candidates from §02. They share three desirable +properties: + +- All use Pattern 1 (direct embed) — no new infrastructure required. +- All can pin the same major version of `ruvector-rabitq` (`^2.2`). +- All have the consumer code already structured around vectors, so + the integration is *adding a new index path*, not redesigning a + hot loop. 
+
+### P1.A — `ruvector-diskann` RaBitQ backend
+
+**Files to touch:**
+
+- `crates/ruvector-diskann/Cargo.toml` — add `ruvector-rabitq = {
+  path = "../ruvector-rabitq", version = "^2.2" }`.
+- `crates/ruvector-diskann/src/index.rs` — add a `Backend` enum
+  alongside the existing PQ path (`pq.rs:14`). New variant
+  `Backend::Rabitq { plus: RabitqPlusIndex, seed: u64 }`. Constructor
+  `DiskAnnIndex::new_rabitq(config, seed, rerank_factor)`.
+- `crates/ruvector-diskann/src/index.rs:169` `search` — branch on
+  backend; RaBitQ path calls `RabitqPlusIndex::search_with_rerank`.
+- `crates/ruvector-diskann/src/index.rs:219,297` `save`/`load` —
+  delegate to `ruvector_rabitq::persist::save_index/load_index`
+  on the rabitq path.
+
+**Acceptance test:** on the same dataset (Gaussian-clustered D=128
+n=100k, the one in `crates/ruvector-rabitq/src/main.rs`), the rabitq
+path achieves recall@10 ≥ 95% at QPS ≥ 2× the existing PQ path. New
+file `crates/ruvector-diskann/tests/rabitq_backend_smoke.rs`.
+
+**LoC budget:** ≤500 LoC source + ≤200 LoC tests.
+
+### P1.B — `ruvector-graph` `VectorPropertyIndex`
+
+**Files to touch:**
+
+- `crates/ruvector-graph/Cargo.toml` — add the rabitq dep.
+- `crates/ruvector-graph/src/index.rs` — new `VectorPropertyIndex`
+  struct alongside `LabelIndex` (`:15`), `PropertyIndex` (`:79`),
+  `EdgeTypeIndex` (`:180`), `AdjacencyIndex` (`:240`). Same
+  lifecycle methods (`new`, `add_node`, `remove_node`, plus a new
+  `knn(&self, property, query, k) -> Vec<NodeId>`).
+- `crates/ruvector-graph/src/node.rs` — extend `Node` to carry
+  optional vector-typed properties; or a side-table indexed by
+  `NodeId`.
+- A new regression test in `crates/ruvector-graph/src/index.rs`:
+  build a graph with 10k nodes carrying 128-dim embeddings, query
+  top-k, assert recall ≥ 90% vs brute-force cosine.
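Both the P1.B and P1.C acceptance tests compare against an in-test brute-force cosine baseline; that baseline and the recall computation are small enough to sketch in full (a self-contained reference, not any crate's API):

```rust
/// Exact top-k by cosine similarity: the in-test baseline the
/// acceptance criteria compare against. O(n·d) per query, which is
/// fine at 10k × 128-dim test scale.
pub fn brute_force_topk(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .collect();
    // Descending by score; assumes no zero vectors (cosine is NaN there).
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// recall@k: fraction of the true top-k ids the candidate list found.
pub fn recall_at_k(found: &[usize], truth: &[usize]) -> f32 {
    let hits = found.iter().filter(|&i| truth.contains(i)).count();
    hits as f32 / truth.len() as f32
}
```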
+
+**Acceptance test:** insert 10k node embeddings, run 100 queries,
+recall@10 ≥ 90% vs an in-test brute-force cosine baseline; round-trip
+the index to a `.rbpx` file via the new `save_property_index_rabitq`
+and reload bit-identically.
+
+**LoC budget:** ≤600 LoC source + ≤250 LoC tests.
+
+### P1.C — `ruvector-gnn` `differentiable_search`
+
+**Files to touch:**
+
+- `crates/ruvector-gnn/Cargo.toml` — add the rabitq dep behind a
+  default-on feature `rabitq` so the WASM build can opt out.
+- `crates/ruvector-gnn/src/search.rs:56` `differentiable_search` —
+  add a sibling
+  `differentiable_search_rabitq(query, &RabitqPlusIndex, top_k, temperature)`.
+  Top-k via `search_with_rerank`, softmax weights from the rerank f32
+  scores so gradients stay meaningful.
+- `crates/ruvector-gnn/src/search.rs:105` `hierarchical_forward` —
+  parameterise so callers can pass a per-layer
+  `&RabitqPlusIndex` instead of a `&[Vec<f32>]`.
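The "softmax weights from the rerank f32 scores" step can be sketched as a pure function. Assumptions: scores are similarity-like (higher is closer) and temperature is positive; the real signature lives in `ruvector-gnn`:

```rust
/// Temperature-scaled softmax over rerank scores: the piece of a
/// `differentiable_search_rabitq` that keeps gradients meaningful.
/// Max-subtraction for numerical stability; assumes temperature > 0.
pub fn softmax_weights(scores: &[f32], temperature: f32) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores
        .iter()
        .map(|s| ((s - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

Lowering the temperature sharpens the distribution toward the rerank winner, which is the knob the existing cosine path already exposes.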
+
+- **Week 5.** Documentation pass: a §"Choosing a pattern" page added
+  to each consumer's README citing §03; bench summary in
+  `crates/ruvector-rabitq/BENCHMARK.md` extended with the three new
+  call sites' numbers.
+
+---
+
+## Phase 2 — Make `VectorKernel` real (~4–6 weeks)
+
+The trait at `crates/ruvector-rabitq/src/kernel.rs` is shipped but has
+**zero non-test callers**. Phase 2 changes that by wiring two kernels
+to two consumers — exactly the minimum to prove the dispatch policy
+isn't paper.
+
+### P2.A — Wire `VectorKernel` dispatch into ruLake
+
+**Files to touch:**
+
+- `crates/ruvector-rulake/src/lake.rs:590-630` — replace the doc-only
+  `// plug-point` with a real
+  `register_kernel(Arc<dyn VectorKernel>)` method, a
+  `kernels: Vec<Arc<dyn VectorKernel>>` field, and the dispatch
+  policy from ADR-157. Used inside `search_batch` (already has the
+  right shape per its doc comment at `:595`).
+- `crates/ruvector-rulake/src/cache.rs:833` — re-route the batch scan
+  through the dispatcher.
+- `crates/ruvector-rulake/tests/` — new `kernel_dispatch.rs` testing:
+  (a) default kernel is `CpuKernel`; (b) a registered
+  determinism-false kernel is filtered on `Consistency::Frozen`;
+  (c) batch_size < caps().min_batch is filtered.
+
+**Acceptance test:** `RuLake::cache_stats()` exposes which kernel
+served the last query (or last batch). Witness output is
+bit-identical regardless of which deterministic kernel served.
+
+### P2.B — Ship a portable SIMD `CpuSimdKernel`
+
+**Files to touch:**
+
+- `crates/ruvector-rabitq/src/kernel.rs` — add `CpuSimdKernel` behind
+  feature flag `simd`. Implementation uses `std::simd` (when stable)
+  or a `target_feature(enable = "avx2,popcnt")` portable path
+  otherwise; falls back to scalar via the existing `CpuKernel` if
+  detection fails.
+- `crates/ruvector-rabitq/Cargo.toml` — add the `simd` feature.
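Whatever SIMD path ships, the scalar semantics it must reproduce bit-for-bit are XOR plus popcount, pure integer math, which is why ADR-157's hard gate is satisfiable on the scan phase at all. A sketch, assuming codes packed into `u64` words:

```rust
/// Scalar reference for the 1-bit scan phase: Hamming distance
/// between packed codes. Integer-only, so a SIMD (or GPU) kernel can
/// reproduce it exactly regardless of lane width.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    debug_assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```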
+
+**Acceptance test:** on the same Gaussian D=128 n=100k bench from
+`crates/ruvector-rabitq/src/main.rs`, the SIMD kernel achieves ≥ 1.5×
+QPS vs `CpuKernel` at bit-identical scan output (per the ADR-157 hard
+gate).
+
+### P2.C — Connect a second consumer
+
+Pick **one** of B1 (`ruvector-attention` KV cache) or C3
+(`ruvector-fpga-transformer`) for the second `VectorKernel` consumer.
+This is the smaller-effort half — the dispatch is already real in
+ruLake, the second consumer just adopts the same pattern.
+
+Likely B1, because the kernel surface there is the closest match to
+the existing rabitq hot path (asymmetric scan over a K-cache).
+
+**Acceptance test:** B1's KV cache, with `RabitqAsymIndex` behind
+`VectorKernel` dispatch, demonstrates the same bit-identical output on
+CPU vs SIMD, and ships a benchmark showing the ratio.
+
+### Phase 2 acceptance gate
+
+Two consumers using `VectorKernel`, two kernels available
+(`CpuKernel` + `CpuSimdKernel`). A first GPU kernel can land in
+**Phase 2.5** as a separate `ruvector-rabitq-cuda` crate that passes
+the ADR-157 acceptance gate (2× p95 OR 30% cost). Phase 2 itself does
+not commit to GPU.
+
+### Phase 2 LoC budget
+
+~600 LoC ruLake dispatch + 800 LoC SIMD kernel + 400 LoC
+second-consumer adoption + 500 LoC tests = ~2300 LoC across two
+crates and one new feature.
+
+---
+
+## Phase 3 — Cross-cutting story (1–2 ADRs, no code commitment)
+
+Phase 3 is a research-not-code phase. It takes up the question
+"should RaBitQ be the workspace's canonical vector compression
+substrate?" and produces an ADR that either says yes (and lists the
+consequences) or no (and lists the alternatives).
+
+### P3.A — Draft ADR-160 "RaBitQ as the workspace's canonical 1-bit compression"
+
+The ADR would say:
+
+- All workspace crates that ship 1-bit binary vector compression use
+  `ruvector-rabitq`. Re-implementations are PR-blocked (Anti-pattern
+  A from §03).
+- 4-bit, 8-bit, and PQ tiers are **not** subsumed — RaBitQ is the + canonical *1-bit* path; ADR-001's tiered scheme stays for higher + bitwidths. +- A migration plan for `ruvector-core::quantization::BinaryQuantized` + (the original 15–20% recall path called out in ADR-154 + §"Measured gap"): deprecate, then delete, then point to RaBitQ. +- Cross-cutting impact on `ruvector-graph`, `ruvector-gnn`, + `ruvector-attention`, `ruvector-temporal-tensor`, `ruvllm` — each + named, each with its preferred §03 pattern, each with effort + estimate. + +### P3.B — Optionally, ADR-161 "Memory-substrate consolidation around ruLake" + +Strictly downstream of ADR-156. If multiple new consumers (B2, B4, +plus future agent crates) end up sitting on `RuLake`, this ADR commits +that pattern: the agent-memory hierarchy, the ruvllm RAG cache, and +the rvAgent witness handoff are one substrate, not three. + +### Phase 3 acceptance + +ADR-160 in `docs/adr/` with status "Proposed", reviewed by maintainers +of the named consumer crates, and a one-page consequences section +rolled into the relevant crates' `Cargo.toml` comments. No code +changes — Phase 3 is the *decision* phase that drives Phases 4+ on the +quarterly roadmap. + +### Phase 3 effort + +~2 engineer-weeks split across writing + review. ADR-class work, not +implementation. + +--- + +## Total roadmap effort + +- **Phase 1:** 4–5 engineer-weeks, ~2000 LoC. +- **Phase 2:** 4–6 engineer-weeks, ~2300 LoC. +- **Phase 3:** ~2 engineer-weeks (docs). + +**Total: ~10–13 engineer-weeks** to land three new consumers, make +the kernel trait load-bearing, and lock the workspace position. This +fits inside one quarter for a single engineer or 6 weeks for two. 
diff --git a/docs/research/rabitq-integration/06-decision-record.md b/docs/research/rabitq-integration/06-decision-record.md new file mode 100644 index 000000000..b30b16048 --- /dev/null +++ b/docs/research/rabitq-integration/06-decision-record.md @@ -0,0 +1,107 @@ +# 06 — Decision Record + +## The sharpest insight from the research + +**The `VectorKernel` trait is shipped, the `CpuKernel` exists, and the +dispatch policy from ADR-157 is already specified — but no caller in +the workspace wires it up.** The only reference is a doc comment at +`crates/ruvector-rulake/src/lake.rs:595`. This means every consumer +that thinks it's getting "free pluggable acceleration" by adopting the +trait would actually be the **first non-test caller**, and would have +to implement the dispatch policy itself. + +The implication is non-obvious: Pattern 2 (§03) is currently more +expensive than Pattern 1 because there is no working dispatch +implementation to copy. The right fix is to wire dispatch into ruLake +*first* (Phase 2.A in §05), making it the canonical reference, then +let other Pattern-2 consumers inherit the pattern. Otherwise we'll +end up with two consumers each writing their own divergent dispatch +policies and quietly breaking the determinism gate from ADR-157 +§"Determinism as a hard gate". + +This finding shifts the recommendation: don't start a Pattern-2 +integration in any new crate until ruLake's `register_kernel` is real. +The §05 phase ordering is built around that. + +--- + +## Top 3 integrations to start now + +1. **`ruvector-diskann` RaBitQ backend** (§02 A1; §05 P1.A). ADR-154 + already named this as the next step; the consumer code is shaped + right; PQ replacement is a controlled scope; ≤500 LoC. Strategic + value: closes the "billion-scale on disk + DRAM" pitch. + +2. **`ruvector-graph` `VectorPropertyIndex`** (§02 A2; §05 P1.B). + Unblocks vector-keyed property lookup that the graph-transformer + and GNN consumers want; pairs naturally with #3; ≤600 LoC. 
+ +3. **`ruvector-gnn` `differentiable_search`** (§02 A3; §05 P1.C). The + smallest of the three by LoC, the highest QPS multiplier of the + three by §02's analysis, and complements #2 directly. ≤300 LoC. + +All three use Pattern 1 (direct embed); all three pin +`ruvector-rabitq = "^2.2"`; all three avoid the §03 anti-patterns. + +--- + +## One thing we should refuse + +**Don't build per-consumer 1-bit compression.** A PR that adds an +`ad-hoc binary code` module to any workspace crate — most likely +under the rationale "we just need a quick binary path before ruLake +is ready" — re-creates the original `BinaryQuantized` failure mode +that ADR-154 was specifically written to retire (15–20% recall vs +RaBitQ's 40.8% no-rerank / 98.9% rerank×5 on the same dataset, per +the §"Measured gap" comparison). + +The cost of refusing is real (some consumers will wait one quarter +for ADR-160 / Phase 3 before getting their dep wired). The cost of +allowing is permanent: a fragmented compression substrate where the +witness chain (ADR-155) and the kernel-dispatch determinism contract +(ADR-157) both stop holding across crate boundaries. + +If a consumer genuinely cannot wait, they get Pattern 1 (direct embed +of `ruvector-rabitq`) — not their own fork. + +--- + +## Open questions for stakeholders + +1. **Do we commit to Phase 2 (`VectorKernel` real in ruLake) before + Phase 1 (three new direct-embed consumers) finishes?** Phase 1 + produces no Pattern-2 consumers; Phase 2 has one (ruLake) plus + one other. Sequencing them concurrently is fine if there are two + engineers; sequentially Phase 1 first is the safer single-engineer + path because the §02 candidates with the highest §"Strategic + value" (A1, A2, A3) all happen to be Pattern 1. + +2. **Does ADR-160 (Phase 3.A) need to land before B2 (`ruvllm` → + ruLake)?** The ruvllm KV cache + RAG integration is the largest + single ROI in §02 but also the one most disrupted by getting + the substrate question wrong. 
If ADR-160 says "ruLake is the + canonical retrieval cache", B2 is straightforward; if it says "no + canonical cache, choose per consumer", B2 becomes a multi-week + design conversation. + +3. **Should `ruvector-rabitq` ship a portable SIMD kernel as part of + the default build, or behind a feature flag?** §05 P2.B sets it + behind `simd`. Default-on simplifies dispatch (every CPU caller + just gets the SIMD path) at the cost of the WASM/embedded + footprint (§04 §5). The WASM consumers don't yet exist, so + default-on is plausible — but reversing it later is a SemVer + minor bump. + +4. **Does Phase 2's second consumer choice between B1 + (`ruvector-attention`) and C3 (`ruvector-fpga-transformer`) matter + strategically?** B1 is the realistic near-term win; C3 is the + research-mode showcase. Recommendation: B1 in Phase 2; C3 lands + only if a customer asks for FPGA inference. + +5. **Is there a customer pressure to ship a Node.js / WASM binding + parallel to the Python SDK (M1)?** None of §02 surveys this + directly. ruvector-py shipping in PR #381 is a precedent that + establishes the binding pattern; replicating it for Node and + WASM is mostly mechanical *if* §04's cross-language contract is + followed. Estimate: ~2 engineer-weeks per binding once §05 Phase + 1 has landed. diff --git a/docs/research/rabitq-integration/INDEX.md b/docs/research/rabitq-integration/INDEX.md new file mode 100644 index 000000000..789e02cef --- /dev/null +++ b/docs/research/rabitq-integration/INDEX.md @@ -0,0 +1,51 @@ +# RaBitQ Integration — Research Index + +This directory surveys how `ruvector-rabitq` (the rotation-based 1-bit +quantizer published as crate `2.2.0`) is wired into the RuVector +workspace today, and where else it could plausibly land. Output is a +focused review, not a brain dump — read it in order. + +- [`01-current-integration.md`](01-current-integration.md) — every + call site that imports `ruvector_rabitq` today, with `crate:file:line` + references. 
Establishes the baseline of three real consumers
+  (`ruvector-rulake`, `ruvector-py`, the rabitq demo bin) and surfaces
+  what is shipped vs. scaffolded inside the rabitq crate itself
+  (notably `VectorKernel` exists; only `CpuKernel` implements it).
+
+- [`02-integration-opportunities.md`](02-integration-opportunities.md) —
+  candidate consumer crates, ranked by strategic value × engineering
+  effort. For each: what they store, where similarity matters in the
+  hot path, what 32× compression buys, the friction (typing,
+  determinism, witness propagation), and an honest tier
+  classification (now / mid-term / defer / kill).
+
+- [`03-architectural-patterns.md`](03-architectural-patterns.md) — the
+  three sane shapes for adding a new consumer: direct-embed, behind
+  the `VectorKernel` trait, or through ruLake. Maps each candidate
+  from §02 to its preferred pattern, and calls out the anti-patterns
+  (re-implementing rotation, ad-hoc compression, witness fragmentation)
+  that would silently break the existing ADRs.
+
+- [`04-cross-cutting-concerns.md`](04-cross-cutting-concerns.md) —
+  invariants every new integration must hold: determinism across
+  architectures, witness format compatibility, memory ownership,
+  API-version pinning, performance footprint on WASM/edge,
+  cross-language story. The `originals_flat`-encapsulation lesson
+  from the Python SDK PR is recorded as a load-bearing constraint.
+
+- [`05-roadmap.md`](05-roadmap.md) — three phases, each with scope,
+  files to touch, acceptance test, and LoC budget. Phase 1 picks the
+  three top-of-bucket integrations from §02. Phase 2 makes the
+  `VectorKernel` trait load-bearing for two consumers across two
+  hardware targets. Phase 3 is the optional ADR-class question of
+  whether RaBitQ should be the workspace's canonical compression.
+
+- [`06-decision-record.md`](06-decision-record.md) — one page.
The + single sharpest insight from this research, the three integrations + to start now, the one path we should refuse, and the open + questions for stakeholders. + +All references are to absolute paths under +`/home/ruvultra/projects/ruvector/`. Numbers cited (957 QPS, 32×, +1.02× tax, etc.) trace back to `crates/ruvector-rabitq/BENCHMARK.md` +and `crates/ruvector-rulake/BENCHMARK.md`.