From 454bba8a19252f3e2ff569266538fad7c754d1b4 Mon Sep 17 00:00:00 2001 From: ruvnet Date: Sat, 25 Apr 2026 20:55:37 -0400 Subject: [PATCH] docs(research): deep review of RaBitQ integration paths into ruvector MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Seven-file research at docs/research/rabitq-integration/ surveying where RaBitQ (ADR-154, crates.io v2.2.0) is consumed today, where else it could go, and what architectural pattern each candidate should use. ## Top 3 integration recommendations 1. **ruvector-diskann RaBitQ backend** — replace/augment the PQ quantizer with `RabitqPlusIndex` (≤500 LoC). ADR-154 already named DiskANN as a target consumer; the spot is open. 2. **ruvector-graph `VectorPropertyIndex`** — vector-keyed property lookup for graph nodes via RaBitQ codes alongside the property table (≤600 LoC). Unlocks "find nodes whose embedding is closest to query" without a separate index crate. 3. **ruvector-gnn `differentiable_search` rewrite** — replace the cosine fan-out at `differentiable_search.rs` with `RabitqPlusIndex::search_with_rerank` (≤300 LoC). Keeps the gradient path; collapses memory by 32×. ## Key nuance discovered The `VectorKernel` trait + `CpuKernel` shipped at `crates/ruvector-rabitq/src/kernel.rs:78` and ADR-157's dispatch policy is fully specified — but **no caller wires it up**. The only reference is a doc comment at `crates/ruvector-rulake/src/lake.rs:595`. Any new consumer choosing Pattern 2 (the trait dispatch route) would be the first non-test caller and would have to implement dispatch from scratch — almost certainly diverging from ADR-157's determinism gate. This forced an ordering decision: **ruLake must implement `register_kernel` first**; Phase 1 below stays Pattern 1 (direct embed) only. ## Phased roadmap - **Phase 1 (4–5 wk):** the 3 high-value Pattern-1 integrations above. All direct-embed; no trait dispatch yet. 
- **Phase 2 (4–6 wk):** ruLake wires `register_kernel`; CpuKernel + at least one new kernel (CPU-SIMD or WASM) become real; ≥2 consumers route through the trait. - **Phase 3 (~2 wk):** propose new ADR ("RaBitQ as ruvector's canonical vector compression substrate") and catalog what ruvector-graph / -gnn / -attention need to share one compression layer. Total: ~10–13 engineer-weeks. ## What this is NOT - Not implementation. No Rust code in this PR — just markdown. - Not an ADR. Phase 3 may produce one; this is the research that precedes it. - Not a binding decision. Each integration in §02 is annotated with effort + value so the team can re-prioritize. ## File breakdown INDEX.md 51 LoC 01-current-integration.md 134 LoC (call sites today) 02-integration-opportunities.md 300 LoC (15 candidates surveyed) 03-architectural-patterns.md 289 LoC (3 patterns + anti-patterns) 04-cross-cutting-concerns.md 230 LoC (determinism, witness, perf) 05-roadmap.md 238 LoC (3 phases, milestones) 06-decision-record.md 107 LoC (1-page call to action) Refs: ADR-154 (RaBitQ), ADR-155 (ruLake), ADR-157 (accelerator plane), PR #380 (ADR-159 + workspace cleanup), PR #381 (Python SDK M1). 
Co-Authored-By: claude-flow --- .../01-current-integration.md | 134 ++++++++ .../02-integration-opportunities.md | 300 ++++++++++++++++++ .../03-architectural-patterns.md | 289 +++++++++++++++++ .../04-cross-cutting-concerns.md | 230 ++++++++++++++ .../research/rabitq-integration/05-roadmap.md | 238 ++++++++++++++ .../rabitq-integration/06-decision-record.md | 107 +++++++ docs/research/rabitq-integration/INDEX.md | 51 +++ 7 files changed, 1349 insertions(+) create mode 100644 docs/research/rabitq-integration/01-current-integration.md create mode 100644 docs/research/rabitq-integration/02-integration-opportunities.md create mode 100644 docs/research/rabitq-integration/03-architectural-patterns.md create mode 100644 docs/research/rabitq-integration/04-cross-cutting-concerns.md create mode 100644 docs/research/rabitq-integration/05-roadmap.md create mode 100644 docs/research/rabitq-integration/06-decision-record.md create mode 100644 docs/research/rabitq-integration/INDEX.md diff --git a/docs/research/rabitq-integration/01-current-integration.md b/docs/research/rabitq-integration/01-current-integration.md new file mode 100644 index 000000000..ff2809520 --- /dev/null +++ b/docs/research/rabitq-integration/01-current-integration.md @@ -0,0 +1,134 @@ +# 01 — Current RaBitQ Integration in RuVector + +## What `ruvector-rabitq` ships (the supplier) + +Crate `ruvector-rabitq` 2.2.0 (workspace version, `Cargo.toml:215`) lives +at `crates/ruvector-rabitq/` and exports four pieces from +`crates/ruvector-rabitq/src/lib.rs:45-59`: + +| Item | Source | Status | +|------|--------|--------| +| `FlatF32Index`, `RabitqIndex`, `RabitqAsymIndex`, `RabitqPlusIndex` | `src/index.rs` | shipped, all four behind `AnnIndex` | +| `BinaryCode`, `pack_bits`, `unpack_bits` | `src/quantize.rs` | shipped | +| `RandomRotation`, `RandomRotationKind` | `src/rotation.rs` | shipped (Haar + Hadamard-signed) | +| `persist::save_index` / `load_index` (`.rbpx` v1) | `src/persist.rs:118,187` | shipped, 
deterministic seed-based | + | `VectorKernel`, `KernelCaps`, `ScanRequest`, `ScanResponse`, `CpuKernel` | `src/kernel.rs:78-126` | **trait shipped, only `CpuKernel` implements it** | + +The "shipped vs. scaffolded" map for the kernel surface is critical: +the trait is ready and a default kernel exists, but the dispatch lives +in **no caller** today (see ruLake gap below). + +## Real consumers in the workspace + +Three call sites import `ruvector_rabitq`. They are the universe of +integration as of HEAD. + +### 1. `ruvector-rulake` — the showpiece + +`crates/ruvector-rulake/Cargo.toml:16` pins +`ruvector-rabitq = { path = "../ruvector-rabitq", version = "2.2" }`. +The crate is the only one in the tree that already exercises every +public surface of rabitq: + +| Surface | Used in | Lines | +|---------|---------|-------| +| `RabitqPlusIndex::from_vectors_parallel` (build) | `crates/ruvector-rulake/src/cache.rs:402` | rayon-parallel rotate+pack on cache prime | +| `RabitqPlusIndex::new` + `add` (incremental) | `crates/ruvector-rulake/src/cache.rs:409` | small-batch path | +| `Arc<RabitqPlusIndex>` cache slot | `crates/ruvector-rulake/src/cache.rs:213,488,499,667` | concurrency story (see ADR-155 §"Arc-concurrency 12×") | +| `AnnIndex::search` / `search_with_rerank` | `crates/ruvector-rulake/src/cache.rs:708,833` | hot path | +| `persist::save_index` / `load_index` (`.rbpx`) | `crates/ruvector-rulake/src/lake.rs:304,399` | bundle warm/freeze | +| `RabitqError` `From` conversion | `crates/ruvector-rulake/src/error.rs:17-18` | error propagation | +| `RandomRotationKind::HadamardSigned` | `crates/ruvector-rulake/benches/*` (per BENCHMARK.md) | rotation-flavor toggle | + +Total: **15 references** across `cache.rs`, `lake.rs`, `error.rs`, the +demo bin, and the federation smoke test (count from +`grep -n rabitq crates/ruvector-rulake/src/{lib,cache,lake}.rs`). 
+ +ruLake exposes ruvector-rabitq's contract under three witness modes +(`Consistency::{Fresh, Eventual, Frozen}` — `lake.rs`, ADR-155). The +measured intermediary tax on a cache hit is **1.02× direct +`RabitqPlusIndex::search`** (`crates/ruvector-rulake/BENCHMARK.md` and +ADR-157 §Context). This is the cost ceiling against which every other +integration should be measured. + +**Gap: `VectorKernel` is referenced but not wired.** `lake.rs:595` is +literally a doc comment "this is also the plug-point for the future +`VectorKernel` trait (ADR-157)". `register_kernel` does not exist as a +method in `crates/ruvector-rulake/src/lake.rs`. The README confirms +under "M2+ on the roadmap": +`crates/ruvector-rulake/README.md:507` — `VectorKernel` trait +scaffolding (M1, done) → `crates/ruvector-rulake/README.md:515` — GPU +kernels in separate crates (M2+, deferred). The dispatch policy from +ADR-157 has no caller. + +### 2. `ruvector-py` — the third major consumer (PR #381 / commit `e7f5a391f`) + +`crates/ruvector-py/Cargo.toml:25` pins `ruvector-rabitq = { path = +"../ruvector-rabitq" }` and exposes a single `RabitqIndex` PyO3 class +backed by `RabitqPlusIndex`. 
Surface used: + +| Surface | Used in | Lines | +|---------|---------|-------| +| `RabitqPlusIndex::from_vectors_parallel` (with GIL release) | `crates/ruvector-py/src/rabitq.rs:118` | `py.allow_threads` wraps the rotate+pack | +| `AnnIndex::search_with_rerank` | `crates/ruvector-py/src/rabitq.rs:154` | per-call rerank override | +| `RabitqPlusIndex::export_items` | `crates/ruvector-py/src/rabitq.rs` (in `save`) | replay-source recovery | +| `persist::save_index` / `load_index` | `crates/ruvector-py/src/rabitq.rs:198` | NumPy-friendly disk roundtrip | +| `RabitqError → PyErr` | `crates/ruvector-py/src/error.rs:25` | typed Python error | + +This consumer's lesson, recorded directly in the source comment at +`src/rabitq.rs:35-43`: *RaBitQ does not expose `originals_flat` +directly; the wrapper must call `export_items()` to re-materialise the +items vector for `save_index`.* This drives the §04 design rule. + +### 3. The rabitq demo binary + +`crates/ruvector-rabitq/src/main.rs:28-29` imports every index variant +(`FlatF32Index`, `RabitqAsymIndex`, `RabitqIndex`, `RabitqPlusIndex`) +and benches them on clustered Gaussian data. This is internal +benchmarking, not an integration in the workspace sense, but it's the +canonical place to read all four indexes used together. + +## The integration map at HEAD + +``` + consumers supplier + ───────── ──────── + ruvector-rulake ────────► ┌────────────────────────────┐ + (cache, lake, │ ruvector-rabitq 2.2.0 │ + bundle, witness) │ │ + │ - RabitqPlusIndex (build, │ + ruvector-py ────────► │ add, search, persist) │ + (Python wheel, │ - VectorKernel trait │ + M1) │ - CpuKernel only │ + │ │ + rabitq-demo ────────► │ - rotation, pack/unpack │ + (internal bench) └────────────────────────────┘ +``` + +Every other crate in the workspace **does not** depend on +`ruvector-rabitq`. The 126 other crates listed under `crates/` are +empty space from rabitq's perspective. That gap is what §02 surveys. 
+ +## Three properties every existing consumer relies on + +These show up in the source comments of all three call sites and they +are the load-bearing API contract: + +1. **Determinism across processes.** `(dim, seed, items) → + bit-identical index` (`crates/ruvector-rabitq/src/persist.rs:14-17`, + re-cited in `ruvector-rulake::cache::CacheEntry` and the + roundtrip-preserves-search-results test at + `persist.rs:258-318`). ruLake's witness chain (ADR-155) and + cross-backend cache sharing depend on this. +2. **Encapsulation: no exposed `originals_flat`.** Consumers that need + raw vectors call `export_items()` (`src/index.rs:589`) — the field + itself is private (`src/index.rs:546`). Both rulake and the Python + SDK live with this; new consumers must too. +3. **`AnnIndex` is the only stable trait.** `RabitqPlusIndex::search`, + `search_with_rerank`, `len`, `dim`, `external_ids`, `ids_u64` — + these are the public hot-path surface. Internals (`originals_flat`, + `last_word_mask`, `cos_lut`) are private and the persist format + exists precisely to avoid widening that encapsulation + (`crates/ruvector-rabitq/src/persist.rs:1-18`). + +These three are what §04 elaborates as "must not break". diff --git a/docs/research/rabitq-integration/02-integration-opportunities.md b/docs/research/rabitq-integration/02-integration-opportunities.md new file mode 100644 index 000000000..915917d91 --- /dev/null +++ b/docs/research/rabitq-integration/02-integration-opportunities.md @@ -0,0 +1,300 @@ +# 02 — Integration Opportunities + +For each candidate consumer crate this section answers: what it +stores, where similarity matters, what 32× compression buys, the +friction, the effort, and the strategic value. Candidates are +clustered by the (value × effort) quadrant they fall into so a +roadmap can pick from the top of the list without re-deriving the +trade-offs. 
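The per-vector arithmetic behind the 32× figure is worth pinning down before the survey: a D-dim f32 vector costs 4·D bytes; a 1-bit code costs ⌈D/64⌉ u64 words. A self-contained sketch of that arithmetic (codes only — the shipped index also carries ids, the rotation seed, and optionally the originals for rerank, which is why BENCHMARK.md's measured totals differ):

```rust
/// Bytes for n original f32 vectors of dimension d.
fn f32_bytes(n: usize, d: usize) -> usize {
    n * d * 4
}

/// Bytes for n 1-bit codes of dimension d, packed into
/// 64-bit words: ceil(d/64) words of 8 bytes per vector.
fn code_bytes(n: usize, d: usize) -> usize {
    n * d.div_ceil(64) * 8
}

fn main() {
    // D=128, n=100k — the BENCHMARK.md configuration.
    let (n, d) = (100_000, 128);
    assert_eq!(f32_bytes(1, d), 512); // one original: 512 B
    assert_eq!(code_bytes(1, d), 16); //     one code:  16 B
    // codes-only ratio is exactly 32x whenever d is a multiple of 64
    assert_eq!(f32_bytes(n, d) / code_bytes(n, d), 32);
}
```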
+ +The numbers behind "32× compression" are from +`crates/ruvector-rabitq/BENCHMARK.md`: at D=128 n=100k, RabitqPlus +with rerank×20 holds **100% recall@10** at **957 QPS** vs +FlatF32's 306 QPS, with 53.5 MB vs 50.4 MB total memory **including +the originals reranked from**. Strip rerank (RabitqIndex, no rerank) +and the codes alone are 2.4 MB vs 50.4 MB — that's the **17.5×–32× +compression** number cited in ADR-154. + +--- + +## Tier A — High-value, low-effort (do first) + +### A1. `ruvector-diskann` — replace or sit alongside the PQ quantizer + +**What it stores.** `crates/ruvector-diskann/src/index.rs:57` +`DiskAnnIndex` keeps a Vamana graph plus per-vector PQ codes via +`crates/ruvector-diskann/src/pq.rs:14` (`ProductQuantizer` with +k-means codebooks). Insert (`index.rs:98`), batch insert +(`index.rs:118`), and search (`index.rs:169`) are all vector-keyed. + +**Hot-path similarity.** Beam search inside Vamana scores candidates +via `pq::distance_with_table` (`pq.rs:220`). The PQ table is built +per query (`pq.rs:194`). + +**What 32× buys.** ADR-154 §"Integration path" already calls this +shot: *use BinaryCode for the in-memory candidate list during beam +search; full vectors stay on SSD; binary codes in DRAM for filtering*. +RaBitQ's popcount kernel is faster than the table-lookup PQ inner +loop (`O(D/64)` vs `O(M)` with cache-bound LUT) and ships +deterministic codes — k-means PQ is non-deterministic across runs. + +**Friction.** PQ has a `train` step (`pq.rs:46`) RaBitQ doesn't — +RaBitQ's "training" is a single rotation matrix from a seed, so the +DiskAnnIndex API can shed `train(...)` entirely on the rabitq path. +The on-disk format (`save`/`load` at `index.rs:219,297`) currently +serialises PQ codebooks; would need a parallel `.rbpx` slot or a tag +discriminating the two encodings. + +**Effort.** Small — one new module, one feature flag (or a constructor +variant `DiskAnnIndex::new_rabitq(seed, rerank)`), and a code path in +beam search. ≤500 LoC. 
**Strategic value.** **High.** DiskANN is the SSD-friendly cousin of +HNSW; pairing it with RaBitQ closes the "billion-scale on commodity +disk + DRAM" pitch in ADR-154. Also breaks the PQ training-data +bootstrap problem at index build. + +--- + +### A2. `ruvector-graph` — vector-property index for nodes + +**What it stores.** `crates/ruvector-graph/src/index.rs:15-79` +ships `LabelIndex`, `PropertyIndex`, `EdgeTypeIndex`, `AdjacencyIndex`. +There is no vector-property index today. + +**Hot-path similarity.** The `PropertyIndex::get_nodes_by_property` +path (`index.rs:118`) does exact matching on `PropertyValue`. The +moment a property is a `Vec<f32>` (an embedding stored on a node), +this collapses to "scan every node, compute distance, return top-k" — +which the crate cannot do today without a sister index. + +**What 32× buys.** A graph database with millions of nodes that each +carry a 768-dim embedding (LLM context, agent memory, code symbol) +needs vector-near-neighbor lookup as a property-search primitive. +RaBitQ codes turn that lookup from "scan everything" into "scan 1-bit +codes, rerank top candidates", and the codes themselves cost ~32× less +RAM than the originals. + +**Friction.** Graph database semantics: insert/update/delete on a +single node should not rebuild the rotation. `RabitqPlusIndex::add` +(`crates/ruvector-rabitq/src/index.rs`) already supports incremental +insertion under the existing rotation. Witness chain doesn't apply +here — graph nodes have their own ID semantics, so the rabitq index is +a sub-index keyed by `NodeId`. + +**Effort.** Medium-low — a new `VectorPropertyIndex` next to the +existing four (`crates/ruvector-graph/src/index.rs`), with the same +lifecycle hooks (`add_node`, `remove_node`). ~600 LoC. + +**Strategic value.** **High.** Unlocks "graph-structured RAG" inside +the same crate, which is what `crates/ruvector-graph-transformer/` and +the GNN consumers actually want. + +--- + +### A3. 
`ruvector-gnn` — KNN for `differentiable_search` + +**What it stores.** `crates/ruvector-gnn/src/search.rs:4` exposes +`cosine_similarity(a, b)` and `differentiable_search(query, candidates, +top_k, temperature)` (`search.rs:56`). The candidates list is held by +the caller — typically as a `Vec<Vec<f32>>`. + +**Hot-path similarity.** `differentiable_search` sorts every candidate +by cosine, takes top-k, and reweights the survivors with softmax. +`hierarchical_forward` at `search.rs:105` does this **once per +hierarchy layer per forward pass** during inference and training. + +**What 32× buys.** GNN inference at scale (10⁵+ nodes, 768-dim +features) hits a hard memory ceiling on the candidate set; replacing +the f32 candidate fan-out with RaBitQ codes lets a 100× larger +candidate pool fit in DRAM. Symmetric estimator +(`crates/ruvector-rabitq/src/lib.rs:11-14`) is `O(D/64)` vs cosine's +`O(D)` — the same algorithmic win the rabitq-demo measures (3.1× +QPS). + +**Friction.** `differentiable_search` returns *weights* via softmax, +not just ids. The 1-bit angular estimator `cos(π·(1 − B/D))` is a +proxy — top-k selection is fine, but the softmax weights would need to +come from RabitqPlus's exact-rerank f32 scores so gradients stay +meaningful. Practical: rerank top-k×10 with the f32 estimator, softmax +those. + +**Effort.** Small — replace the candidate-scan loop in +`search.rs:56` with `RabitqPlusIndex::search_with_rerank`. ≤300 LoC +plus a test that shows recall@k matches the brute-force cosine +within tolerance. + +**Strategic value.** **High.** Unlocks attention-over-large-graphs +patterns inside the GNN trainer. Pair with A2 for graph + GNN sharing +one rabitq sub-index. + +--- + +## Tier B — High-value, high-effort (medium-term) + +### B1. 
`ruvector-attention` — KV-cache compression behind 1-bit + +**What it stores.** `crates/ruvector-attention/src/attention/kv_cache.rs:253` +`CacheManager` owns key/value tensors per layer, with `append` +(`kv_cache.rs:284`), `get` (`:309`), `evict` (`:325`), and +`pyramid_budget` (`:398`). It already has its own asymmetric/symmetric +quantize (`:130, :182`) producing `QuantizedTensor` (`:90`) at 4–8 +bits. + +**Hot-path similarity.** Attention is `softmax(QK^T / sqrt(d)) V`. +The K-cache is the database, the Q is the query — exactly RaBitQ's +*asymmetric* setting (`crates/ruvector-rabitq/src/lib.rs:16-18`, +`RabitqAsymIndex` in `src/index.rs`). + +**What 32× buys.** A 32k-token cache at D=4096 is **524 MB per layer** +in f16; RaBitQ-Asym takes 16 MB for the codes. The asymmetric +estimator `‖q‖·‖x‖·(1/√D)·Σ sign(x_rot)·q_rot` keeps the query in +f32 — exactly what attention needs. + +**Friction.** Attention's existing 4–8-bit quantize is bf16/f16 native +across the rest of the LLM stack; introducing a third datatype path is +real work. Also, RabitqAsym's QPS at D=128 was only **26 QPS** +(`BENCHMARK.md` headline) — that path needs the SIMD/GPU kernel from +ADR-157 before it's competitive with the existing 4-bit path. +Determinism on rerank (float reduction order) is a problem on GPU. + +**Effort.** Large — touches an existing performance-sensitive cache, +needs SIMD kernel development, needs fallback to existing +`QuantizedTensor` when D is too small for the rotation cost to pay +off. ~2000+ LoC across kv_cache + a feature flag. + +**Strategic value.** **Medium-high but speculative.** ruvllm's KV +cache is the bigger target (B2); attention is the upstream library. +If B2 lands first, B1 follows. + +--- + +### B2. `ruvllm` — LLM serving KV cache + retrieval-augmented prompt cache + +**What it stores.** `crates/ruvllm/src/kv_cache.rs:203` `KvMemoryPool` +holds aligned f32 buffers (`AlignedBuffer` at `kv_cache.rs:45`). 
The +ruvllm hot path is the same shape as B1 but at the serving layer: +multi-tenant, eviction-pressured, latency-sensitive. + +**Hot-path similarity.** Same as B1 (attention K-cache). Plus, ruvllm's +RAG path is whatever the embedding model + a separate ANN index look +like — and that's a free win for ruLake (since ruvllm could just +embed and query a `RuLake` instance instead of holding its own). + +**What 32× buys.** Multi-tenant serving is RAM-bound; 32× compression +of long-context K-caches lets one box serve 32× more concurrent +sessions before eviction. + +**Friction.** ruvllm has its own backend abstraction +(`crates/ruvllm/src/backends/`), GGUF loaders +(`crates/ruvllm/src/gguf/`), Metal kernels (`/metal/`), and bitnet +support (`/bitnet/`). Adding RaBitQ as another quantization path +needs to live behind that backend trait, not in the cache directly. + +**Effort.** Large — needs ADR-class decision on whether ruvllm +adopts ruLake as its retrieval substrate (which solves both the K-cache +and the RAG question with one integration). Otherwise: a dedicated +RaBitQ K-cache implementation. 1500–3000 LoC depending on path. + +**Strategic value.** **High.** ruvllm is the LLM serving frontend; +RaBitQ-as-K-cache-compression is a marketing-grade moat ("32× more +concurrent contexts on the same hardware"). + +--- + +### B3. `ruvector-temporal-tensor` — time-windowed compressed segments + +`crates/ruvector-temporal-tensor/src/{lib,tiering,quantizer,compressor,f16,segment,bitpack}.rs` +ships a temperature-tiered compression stack already (hot/warm/cold via +`tier_policy.rs`, with its own quantizer in `core_trait.rs`). Cold-tier +reads currently pay an unpack cost; if the segment payload is 1-bit +RaBitQ codes the read can stay in compressed form for proximity-of- +time-window search. 32× compression pushes billion-sample D=128/768 +working sets onto one machine at the tier boundary. 
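The "billion-sample working sets onto one machine" claim above survives a quick arithmetic check. A sketch (codes only; per-segment metadata and tier bookkeeping excluded):

```rust
/// Gigabytes for n vectors of dimension d at the given bits-per-component.
fn gb(n: u64, d: u64, bits: u64) -> f64 {
    (n * d * bits) as f64 / 8.0 / 1e9
}

fn main() {
    let n = 1_000_000_000; // a billion samples
    // f32 originals vs 1-bit RaBitQ codes — the 32x ratio at scale
    assert_eq!(gb(n, 128, 32), 512.0);  // f32, D=128: 512 GB — multi-box
    assert_eq!(gb(n, 128, 1), 16.0);    // 1-bit codes: 16 GB — one machine
    assert_eq!(gb(n, 768, 32), 3072.0); // f32, D=768: ~3 TB
    assert_eq!(gb(n, 768, 1), 96.0);    // 1-bit codes: 96 GB
}
```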
**Friction.** RaBitQ is a new tier alongside scalar/PQ; determinism +still matters because of the cross-tier coherence story +(`coherence.rs`). **Effort:** medium — codec for `.rbpx`, hook into +`tier_policy`. 800–1200 LoC. **Value:** medium-high. Pairs naturally +with ruLake (different problem, same compression substrate). + +### B4. `ruvector-domain-expansion` — embedding-based domain shift + +`crates/ruvector-domain-expansion/src/lib.rs:90` `DomainExpansionEngine` +exposes `embed(...)` (`lib.rs:199`) and `initiate_transfer(...)` +(`:205`). The kNN-over-domains lookup at transfer time would benefit +from RaBitQ, but domain counts are 10²–10⁴ — the compression win is +modest. The real win is **consistency**: embeddings would gain witness ++ cross-process sharing for free if stored in ruLake. + +**Friction.** The embedding type is `DomainEmbedding`, not raw +`Vec<f32>`; light refactor. **Effort:** small (300 LoC) — Tier B not +because of effort but because of value. **Value:** medium-low; this +is consistency hygiene, not load-bearing. + +--- + +## Tier C — Speculative (defer or kill) + +### C1. `ruvector-mincut` — graph cut over vector similarity + +`crates/ruvector-mincut/src/core/`, `sparsify/`, `localkcut/`, +`cluster/` — graph-cut algorithms over edge-weighted graphs. MinCut +operates on edges, not raw vectors. The vector → kNN-graph build step +could feed RaBitQ, but that's an instance of A2/A3, not a separate +consumer. **Verdict:** defer (downstream of A2). + +### C2. `ruvector-cnn` — embedding producer, not indexer + +`crates/ruvector-cnn/src/embedding.rs:122` `MobileNetEmbedder` +produces `Vec<f32>` via `extract`. The crate ends at producing the +embedding; consumers do their own indexing. The "integration" is a +one-liner in user code, not a crate change. **Verdict:** kill as a +crate-level integration; add a README example showing the +producer→`RabitqPlusIndex::add` plug. + +### C3. 
`ruvector-fpga-transformer` — RaBitQ popcount on FPGA + +`crates/ruvector-fpga-transformer/src/lib.rs:86` `Engine` for +transformer inference. RaBitQ's popcount kernel is **the** kernel a +small FPGA can do well — 64-bit XOR + popcount is two LUT levels deep. +A `ruvector-rabitq-fpga` kernel under ADR-157 is a research project, +not a near-term integration. **Verdict:** defer to ADR-157 follow-on. + +### C4. `ruvector-sparsifier` — spectral sparsification + +Same logic as C1: vectors only enter via the kNN-graph build step. +**Verdict:** defer (downstream of A2). + +### C5. `rvagent-a2a` — already integrated by reference + +`crates/rvAgent/rvagent-a2a/src/artifact_types.rs:64` defines +`ArtifactKind::RuLakeWitness { witness, data_ref, capabilities }` — +a by-reference vector handle that travels between agents without +moving bytes (ADR-159 §"Typed artifact semantics"). A2A doesn't carry +RaBitQ codes directly; it carries the witness that resolves to a +ruLake bundle. The integration **is** the witness type. **Verdict:** +no-op; the "witness-as-handle" pattern is already paying off here. 
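For intuition on why the popcount path keeps recurring in this tier list (A3's estimator, C3's FPGA kernel): a self-contained sketch of a 1-bit sign code and the `cos(π·(1 − B/D))` angular estimate. Illustrative only — the real `pack_bits` packs *rotated* vectors (the rotation step, which makes the estimator unbiased, is omitted here):

```rust
/// Pack the sign bits of a (rotated) vector into u64 words.
fn pack_signs(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; v.len().div_ceil(64)];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Symmetric 1-bit angular estimate with B matching bits out of D:
/// cos(pi * (1 - B/D)). The inner loop is O(D/64) XOR + popcount,
/// not O(D) float multiply-adds.
fn est_cos(a: &[u64], b: &[u64], d: usize) -> f32 {
    let hamming: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    let matching = d as f32 - hamming as f32;
    (std::f32::consts::PI * (1.0 - matching / d as f32)).cos()
}

fn main() {
    let v = [0.3f32, -1.2, 0.7, -0.1];
    let code = pack_signs(&v);
    // identical vectors: hamming 0, estimate cos(0) = 1
    assert!((est_cos(&code, &code, v.len()) - 1.0).abs() < 1e-6);
    // fully sign-flipped vector: hamming D, estimate cos(pi) = -1
    let neg: Vec<f32> = v.iter().map(|x| -x).collect();
    let anti = pack_signs(&neg);
    assert!((est_cos(&code, &anti, v.len()) + 1.0).abs() < 1e-6);
}
```

The same XOR+popcount loop is what C3 proposes to put in LUTs — the algorithm doesn't change, only where the popcount executes.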
+ +--- + +## Summary table + +| Tier | Crate | Effort | Value | Notes | +|------|-------|--------|-------|-------| +| A1 | `ruvector-diskann` | small | high | Replace/augment PQ; ADR-154 already named this | +| A2 | `ruvector-graph` | medium-low | high | New `VectorPropertyIndex` | +| A3 | `ruvector-gnn` | small | high | `differentiable_search` rewrite | +| B1 | `ruvector-attention` | large | medium-high | KV cache, asymmetric path | +| B2 | `ruvllm` | large | high | K-cache + RAG via ruLake | +| B3 | `ruvector-temporal-tensor` | medium | medium-high | New temperature tier | +| B4 | `ruvector-domain-expansion` | small | medium-low | Hygiene rather than load-bearing | +| C1 | `ruvector-mincut` | — | low | Downstream of A2 | +| C2 | `ruvector-cnn` | — | none | Pure user-code example | +| C3 | `ruvector-fpga-transformer` | research | speculative | ADR-157 kernel | +| C4 | `ruvector-sparsifier` | — | low | Downstream of A2 | +| C5 | `rvagent-a2a` | — | done | Witness-by-reference shipped | + +12 candidates surveyed. Phase 1 picks A1 + A2 + A3. Phase 2 picks B1 +*or* B2 (one of them, not both — they answer the same question). Phase +3 is the workspace-canonical-compression ADR question (§05). diff --git a/docs/research/rabitq-integration/03-architectural-patterns.md b/docs/research/rabitq-integration/03-architectural-patterns.md new file mode 100644 index 000000000..2797add24 --- /dev/null +++ b/docs/research/rabitq-integration/03-architectural-patterns.md @@ -0,0 +1,289 @@ +# 03 — Architectural Patterns + +Three sane shapes to add a new RaBitQ consumer. Each preserves the +ADR-154 / ADR-155 / ADR-157 invariants and matches a different +consumer profile. The choice is consequential because every shape +implies a different contract about who owns the index, who owns the +witness, and who picks the kernel. 
The goal of this section is to make the choice explicit at integration +time, so we don't accidentally fragment what is currently one +deterministic compression substrate. + +--- + +## Pattern 1 — Direct embed + +The consumer crate adds `ruvector-rabitq` to its `Cargo.toml` and uses +`RabitqPlusIndex` (or any of the four indexes) as a private field of +its own type. The consumer owns the index lifecycle: build, add, +search, persist. + +**Sketch.** + +```toml +# Cargo.toml of consumer +[dependencies] +ruvector-rabitq = { path = "../ruvector-rabitq", version = "2.2" } +``` + +```rust +// inside consumer +use std::collections::HashMap; + +use ruvector_rabitq::{AnnIndex, RabitqPlusIndex}; + +pub struct VectorPropertyIndex { + by_property: HashMap<PropertyKey, RabitqPlusIndex>, + dim: usize, + seed: u64, + rerank_factor: usize, +} + +impl VectorPropertyIndex { + pub fn add_node(&mut self, node_id: NodeId, property: PropertyKey, vector: Vec<f32>) { + self.by_property + .entry(property) + .or_insert_with(|| RabitqPlusIndex::new(self.dim, self.seed, self.rerank_factor)) + .add(node_id.0, vector) + .unwrap(); + } + + pub fn knn(&self, property: &PropertyKey, q: &[f32], k: usize) -> Vec<NodeId> { + self.by_property + .get(property) + .map(|idx| idx.search(q, k).unwrap()) + .unwrap_or_default() + .into_iter() + .map(|r| NodeId(r.id)) + .collect() + } +} +``` + +**Best when.** The consumer owns its index lifecycle, doesn't need +witness chaining, and doesn't need to share the index across processes +or backends. Examples in §02: + +- **A1 — `ruvector-diskann`:** the index *is* the consumer's product; + it manages its own SSD-backed storage and its own rebuild policy. + RaBitQ is a backend choice, not a foreign service. +- **A2 — `ruvector-graph`:** the property index is a sub-component of + a graph database that already owns its lifecycle. +- **A3 — `ruvector-gnn`:** the candidate set passed to + `differentiable_search` is owned by the GNN forward pass; building a + fresh `RabitqPlusIndex` per layer is fine for inference and the + index is short-lived. 
- **B4 — `ruvector-domain-expansion`:** the embedding store is + internal state, no cross-crate sharing required. + +**What this pattern doesn't give you.** The witness chain. Cross- +process cache sharing. Pluggable kernels (you get whatever ships in +`ruvector-rabitq` proper, which today means `CpuKernel`). + +--- + +## Pattern 2 — Behind the `VectorKernel` trait (ADR-157) + +The consumer registers a `VectorKernel` implementation — typically the +default `CpuKernel`, optionally a SIMD or GPU one — and dispatches +queries through it. The trait shape is at +`crates/ruvector-rabitq/src/kernel.rs:78-126`: + +```rust +pub trait VectorKernel: Send + Sync { + fn id(&self) -> &str; + fn caps(&self) -> KernelCaps; + fn scan(&self, req: ScanRequest<'_>) -> Result<ScanResponse>; +} +``` + +`ScanRequest` carries a borrowed `&RabitqPlusIndex` plus a query +batch; the consumer (or a coordinator) picks the kernel based on +batch size + dim + determinism requirement. + +**Sketch.** A consumer that wants pluggable backends keeps an +`Arc<dyn VectorKernel>` field and calls `.scan(...)` in the hot +path: + +```rust +use std::sync::Arc; + +use ruvector_rabitq::{CpuKernel, RabitqPlusIndex, ScanRequest, VectorKernel}; + +pub struct AcceleratedSearcher { + kernel: Arc<dyn VectorKernel>, + // … +} + +impl AcceleratedSearcher { + pub fn new() -> Self { + Self { kernel: Arc::new(CpuKernel::new()) } + } + pub fn register_kernel(&mut self, k: Arc<dyn VectorKernel>) { + // ranked dispatch by caps() + if self.should_prefer(&*k) { self.kernel = k; } + } + pub fn search(&self, idx: &RabitqPlusIndex, queries: &[Vec<f32>], k: usize) + -> Result<ScanResponse> + { + self.kernel.scan(ScanRequest { index: idx, queries, k, rerank_factor: None }) + } +} +``` + +**Best when.** The consumer wants pluggable acceleration but doesn't +need cross-process witness/cache. Examples: + +- **B1 — `ruvector-attention` KV cache:** wants SIMD on server, WASM + SIMD in browser, GPU on a Cognitum box. Same source, different + kernels. The trait was literally designed for this in ADR-157. 
- **B2 — `ruvllm`:** if RaBitQ becomes the K-cache compression, + ruvllm picks Metal or CUDA per platform. +- **C3 — `ruvector-fpga-transformer`:** a `RabitqFpgaKernel` + registered at startup, with `caps().min_batch ≥ 1024` so it only + fires on bulk inference. + +**Critical caveat.** The trait is shipped (`src/kernel.rs`) but +**no caller wires it up today** — `ruvector-rulake` references it +only in a doc comment at `lake.rs:595`. The first consumer that +uses Pattern 2 must also write the dispatch policy (ADR-157 +§"Dispatch policy normative") in its own crate; this is *not* free. +Roadmap Phase 2 (§05) is exactly this work. + +--- + +## Pattern 3 — Through `ruLake` + +The consumer doesn't manage a RaBitQ index at all. It delegates to a +`RuLake` instance with a `LocalBackend` (or a remote one) holding the +vectors, and calls `lake.search_one(backend, collection, query, k)`. + +**Sketch.** + +```rust +use std::sync::Arc; + +use ruvector_rulake::{LocalBackend, RuLake}; + +let backend = LocalBackend::with_vectors("agent-mem", "episodic", dim, vecs); +let lake = RuLake::builder() + .register_backend(Arc::new(backend)) + .with_seed(42) + .with_rerank_factor(20) + .build()?; + +let hits = lake.search_one("agent-mem", "episodic", &q, 10)?; +``` + +The consumer gets: + +- 1.02× tax on the cache-hit path (measured — + `crates/ruvector-rulake/BENCHMARK.md`). +- A SHAKE-256 witness chain via `RuLakeBundle` + (`crates/ruvector-rulake/src/bundle.rs`). +- Cross-process cache sharing: two ruLake instances reading the same + bundle reuse one compressed copy + (`crates/ruvector-rulake/src/cache.rs`, + test `two_backends_share_cache_when_witness_matches`). +- `Consistency::{Fresh, Eventual, Frozen}` knob for staleness SLA + (ADR-156). +- Witness-by-reference for cross-agent handoff via + `ArtifactKind::RuLakeWitness` + (`crates/rvAgent/rvagent-a2a/src/artifact_types.rs:64`). 
+ +**Best when.** The consumer wants witness-sealed memory, cross-process +sharing, freshness modes, or zero-copy handoff to other agents. + +- **B2 — `ruvllm` RAG:** any retrieval ruvllm does should sit on a + `RuLake`, not on its own `RabitqPlusIndex` — gets the witness + + freshness modes for free. +- **Any rvAgent subagent:** the agent memory hierarchy from ADR-156 is + literally this pattern. Direct embed would re-implement bundle + + witness; through-ruLake is "the brain on the substrate". +- **A future `ruvector-postgres` extension:** a Postgres function that + returns top-k from a managed lake of vectors — ruLake is the right + shape because the function may run in many backend processes + sharing one cache. + +**What this pattern doesn't give you.** Bare-metal min latency. The +1.02× tax is measured on `LocalBackend`; on a Parquet-on-GCS backend +the cold path is network-bound. Direct embed wins for in-process +single-user workloads where the consumer already has the vectors +materialised. 
+ +--- + +## Mapping §02 candidates to patterns + +| Candidate | Pattern | Why | +|-----------|---------|-----| +| A1 `ruvector-diskann` | **1** direct embed | Owns its index lifecycle; SSD/PQ already custom | +| A2 `ruvector-graph` | **1** direct embed | Sub-index of an existing graph store | +| A3 `ruvector-gnn` | **1** direct embed | Short-lived per-layer index in forward pass | +| B1 `ruvector-attention` | **2** VectorKernel | Needs SIMD/GPU/WASM kernel choice per target | +| B2 `ruvllm` | **3** through ruLake (RAG) + **2** kernel (KV cache) | Two integrations, two patterns | +| B3 `ruvector-temporal-tensor` | **1** direct embed | New temperature tier inside the existing crate | +| B4 `ruvector-domain-expansion` | **3** through ruLake | Already produces witness-shaped outputs | +| C1, C4 (mincut, sparsifier) | downstream of A2 | n/a until A2 lands | +| C2 `ruvector-cnn` | none (user code) | Producers, not indexers | +| C3 `ruvector-fpga-transformer` | **2** VectorKernel | The kernel pattern's poster child | +| C5 `rvagent-a2a` | **3** (already, via witness) | Done | + +Note the split for B2 — ruvllm probably wants both. That's fine; the +patterns compose. + +--- + +## Anti-patterns to refuse + +The following shapes look reasonable in a PR review but each one +breaks an existing ADR invariant or fragments the substrate. None of +them should pass review. + +### Anti-pattern A — re-implementing rotation + +A consumer crate copy-pastes the rotation code from +`crates/ruvector-rabitq/src/rotation.rs` into its own module to "avoid +the dependency". Breaks ADR-154's determinism guarantee — a divergent +copy means `(seed, dim, vectors) → bit-identical codes` no longer holds +across crates. **Always import from `ruvector-rabitq`.** + +### Anti-pattern B — ad-hoc 1-bit compression + +A consumer crate ships its own `pack_bits` function and its own +distance estimator because "we just need a quick binary code". 
This
+re-creates the original `BinaryQuantized` problem ADR-154 §"Measured
+gap" was written to fix: ~15–20% recall vs RaBitQ's 40.8%/98.9%. **If
+you're doing 1-bit compression of vectors in this workspace, it's
+RaBitQ.**
+
+### Anti-pattern C — exposing `originals_flat`
+
+A consumer crate's PR widens `RabitqPlusIndex` to expose its private
+`originals_flat: Vec<f32>` field (`src/index.rs:546`) "for
+zero-copy". Breaks the encapsulation that the persist format relies
+on (`src/persist.rs:1-18`) — and the persist format is the contract
+that lets two processes warm-load each other's bundles. The Python
+SDK explicitly works around this at `src/rabitq.rs:35-43` by calling
+`export_items()` instead. **Use `export_items()` or extend
+`AnnIndex`; do not widen the struct.**
+
+### Anti-pattern D — fragmenting the witness
+
+A consumer crate runs RaBitQ to compress vectors and ships them under
+a *different* witness scheme (e.g. its own SHA-3 over a private
+serialization format). Breaks ADR-155 cross-backend cache sharing and
+ADR-159 by-reference artifact handoff. **All compressed-vector
+artifacts that traverse process boundaries use ruLake's
+`RuLakeBundle` witness or none at all.**
+
+### Anti-pattern E — RaBitQ everywhere by default
+
+The mirror of D — adding `ruvector-rabitq` as a default dep on every
+crate "because it's available". Adds ~50 KB compiled size and the
+rotation tables to every WASM bundle and embedded build. The §04
+performance budget is explicit: only candidate consumers with
+demonstrated benefit (Tier A) get the dep on the default build path.
+WASM consumers must feature-gate.
+
+### Anti-pattern F — ignoring the kernel determinism gate
+
+A consumer crate registers a non-deterministic GPU kernel and
+serves Fresh/Frozen consistency from it. Breaks ADR-157 §"Determinism
+as a hard gate". **Caps are advisory at compile time but enforced at
+dispatch.** The consumer must implement the dispatch filter from
+ADR-157, not just `kernels.iter().next()`.
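To make that refusal concrete: a minimal sketch of an ADR-157-style dispatch filter, using stand-in types (the real `KernelCaps` and `Consistency` live in `ruvector-rabitq` and ADR-156; only the `deterministic` and `min_batch` fields are taken from this document, and the ranking heuristic is an assumption):

```rust
// Stand-ins for the real `KernelCaps` / `Consistency` types; field
// names follow the caps this document actually cites.
#[derive(Clone, Copy)]
pub struct KernelCaps {
    pub deterministic: bool,
    pub min_batch: usize,
}

#[derive(Clone, Copy, PartialEq)]
pub enum Consistency { Fresh, Eventual, Frozen }

/// ADR-157-shaped dispatch filter: drop kernels that cannot serve
/// this request, then rank the survivors. Never just
/// `kernels.iter().next()`.
pub fn pick_kernel(caps: &[KernelCaps], batch: usize, mode: Consistency) -> Option<usize> {
    caps.iter()
        .enumerate()
        // Determinism is a hard gate on Fresh/Frozen paths.
        .filter(|(_, c)| c.deterministic || mode == Consistency::Eventual)
        // A kernel only fires at or above its minimum batch size.
        .filter(|(_, c)| batch >= c.min_batch)
        // Prefer the most specialised surviving kernel (largest
        // `min_batch` it can still satisfy), as one plausible ranking.
        .max_by_key(|(_, c)| c.min_batch)
        .map(|(i, _)| i)
}
```

A caller that keeps `CpuKernel` at index 0 gets it back as the fallback whenever a bulk-only or non-deterministic kernel is filtered out.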
diff --git a/docs/research/rabitq-integration/04-cross-cutting-concerns.md b/docs/research/rabitq-integration/04-cross-cutting-concerns.md new file mode 100644 index 000000000..94d9e760a --- /dev/null +++ b/docs/research/rabitq-integration/04-cross-cutting-concerns.md @@ -0,0 +1,230 @@ +# 04 — Cross-Cutting Concerns + +The invariants every new RaBitQ integration must hold. These come from +reading the existing call sites and the ADRs that govern them; if a +new integration breaks any of these, it almost certainly invalidates +ADR-154/155/157 by side effect. + +--- + +## 1. Determinism across architectures + +**The contract.** `(dim, seed, items) → bit-identical rotation matrix ++ packed codes + index build + search output across runs and across +machines.` Stated explicitly at +`crates/ruvector-rabitq/src/persist.rs:14-17` and re-stated at +`crates/ruvector-rabitq/src/lib.rs:34-37`. Tested by +`persist::tests::serialize_roundtrip_preserves_search_results` +(`persist.rs:258`) which compares score bits with `to_bits()` — not a +tolerance compare. + +**Why it matters for new integrations.** ruLake's witness chain +(ADR-155) and cross-backend cache sharing (the +`two_backends_share_cache_when_witness_matches` test) depend on +this. So does the rabitq-by-reference handoff in +`ArtifactKind::RuLakeWitness` +(`crates/rvAgent/rvagent-a2a/src/artifact_types.rs:64`) — agents on +different boxes reading the same witness must compute the same +top-k. + +**The trap.** Floating-point reduction order is not stable across +SIMD widths or GPU lane counts. ADR-157 already calls this out: +the **scan phase** (1-bit popcount) is integer math and trivially +deterministic; the **rerank phase** (exact L2²) is float reduction +and can diverge in the last ulp on GPU. ADR-157's resolution: kernels +that can't guarantee identical rerank set `caps().deterministic = +false`, and the dispatch policy refuses to use them on Fresh/Frozen +paths. 
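The `to_bits()` comparison the contract leans on is small enough to spell out. A crate-independent sketch (not the actual test helper):

```rust
/// Bit-exact score comparison: two runs agree only if every f32 is
/// identical at the bit level. Unlike `==` or a tolerance compare,
/// this catches last-ulp rerank divergence and even distinguishes
/// -0.0 from 0.0.
pub fn scores_bit_identical(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```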
+
+**Enforcement.** Every integration adds a regression test of the
+shape "build the same data twice (different thread counts, same
+seed) and assert `to_bits()` matches on at least 100 query
+results". The existing test at `persist.rs:258` is the model.
+
+---
+
+## 2. Witness format compatibility
+
+**The contract.** `.rbpx` v1 is the on-disk and on-wire format
+(`crates/ruvector-rabitq/src/persist.rs:23-33`). It carries
+`(magic, version, dim, seed, rerank_factor, n, items)`. The format is
+**deliberately seed-based** rather than field-based — it stores the
+*replay inputs*, not the index internals, because the deterministic
+build is cheaper to re-run than the rotation matrix is to ship.
+
+**Why it matters.** Every cross-process integration that wants
+witness-sealed memory rides this format. `ruvector-rulake`'s
+`save_index`/`load_index` calls (`lake.rs:304,399`) are the only
+producer/consumer today, but ADR-159's `RuLakeWitness` artifact (and
+its `data_ref` field) implicitly depends on this format being stable.
+
+**The trap.** A consumer that needs a fielded format (e.g. for a
+columnar store like Parquet) will be tempted to widen `.rbpx` v1 with
+extra fields. Don't. The right shape is:
+
+- For a richer container, wrap `.rbpx` inside another format
+  (e.g. a tar-like bundle that holds `.rbpx` + a sidecar metadata file).
+- For a different field set entirely, bump to `.rbpx` v2 in the same
+  module, with a feature flag, and keep v1 readable.
+- Never extend v1 in place. The persist format's `MAGIC` + `VERSION`
+  bytes (`persist.rs:49-51`) are a contract.
+
+**Enforcement.** PR review on every change touching `persist.rs`. The
+`reject_version_too_new` test (`persist.rs:425`) defends this.
+
+---
+
+## 3. Memory ownership: who holds the codes
+
+**The lesson from PR #381 (Python SDK).** `RabitqPlusIndex` does not
+expose `originals_flat` directly — the field is private at
+`crates/ruvector-rabitq/src/index.rs:546`.
Consumers that need to
+re-export the originals (e.g. for `save_index`) call
+`export_items()` (`src/index.rs:589`), which **clones**
+`n*dim*sizeof(f32)` bytes. This is documented in
+`crates/ruvector-py/src/rabitq.rs:35-43` as a deliberate cost trade.
+
+**The contract.** Three rules.
+
+a. The cache (or the consumer's struct) owns the
+   `Arc<RabitqPlusIndex>`. `ruvector-rulake::cache.rs:213` is the
+   model.
+
+b. New consumers that need raw vectors call `export_items()`. They do
+   not get a borrowed slice; the copy is intentional.
+
+c. New consumers that need to *avoid* the export-items copy need to
+   restructure to keep the source-of-truth `Vec<f32>` themselves and
+   use `RabitqPlusIndex` only for the codes + search. The Python SDK
+   chose to do the copy; ruLake chose to keep the source-of-truth in
+   `LocalBackend::PulledBatch`.
+
+**The trap.** A PR that "adds a `pub fn raw_vector(&self, i: usize) ->
+&[f32]` to `RabitqPlusIndex` for performance" — see Anti-pattern C in
+§03. Refuse it. If the perf is real, the right move is to widen
+`AnnIndex`, not the struct internals.
+
+---
+
+## 4. API stability and version pinning
+
+**The state.** `ruvector-rabitq` is at `2.2.0` on crates.io
+(`Cargo.toml:215` workspace version). Both consumer Cargo.tomls
+(`ruvector-rulake/Cargo.toml:16`, `ruvector-py/Cargo.toml:25`) pin via
+`path = "../ruvector-rabitq"`. The rulake Cargo.toml also adds a
+`version = "2.2"` constraint; the Python SDK doesn't yet — that's
+worth normalising.
+
+**The contract.** New integrations pin `ruvector-rabitq = { path =
+"../ruvector-rabitq", version = "^2.2" }` to allow patch + minor
+upgrades but block major ones. This is what semver bought: anything
+that needs to break the persist format or the index trait surface
+becomes a major bump and forces a synchronised upgrade across all
+consumers.
+
+**The trap.** Workspace-only `path` deps without a version constraint
+work locally, but the moment the supplier crate publishes a major
+version on crates.io and a downstream user pulls
+`ruvector-rabitq = "3"` the workspace is silently inconsistent. Add
+the version constraint at integration time.
+
+---
+
+## 5. Performance footprint on small targets
+
+**The numbers.** `ruvector-rabitq`'s `Cargo.toml` deps are `rand`,
+`rand_distr`, `rayon`, `serde`, `serde_json`, `thiserror` — small.
+But the rotation tables, the cos-LUT, and the binary code paths add
+~50 KB to a release WASM bundle (estimated; not yet measured for the
+ruvector-py wheel). The crate explicitly disables `unsafe` and pulls
+no BLAS, which keeps it portable.
+
+**The contract.** WASM, embedded, and `wasm32-*` consumers must
+feature-gate the rabitq dep. The `Cargo.toml` excludes list at
+`/home/ruvultra/projects/ruvector/Cargo.toml:1-8` already keeps things
+out of `cargo build --workspace` selectively; new WASM consumers
+should follow that pattern.
+
+**The trap.** Adding `ruvector-rabitq` as a default dep on a
+hypothetical `ruvector-edge-something` crate, then discovering the
+WASM build is 50 KB heavier and the embedded ESP32 build (cf. the
+`examples/ruvLLM/esp32-flash` excluded list) doesn't link.
+Feature-gate before integrating, not after.
+
+---
+
+## 6. Cross-language story
+
+**The state today.** `ruvector-py` is the only non-Rust consumer
+(M1 shipped, commit `e7f5a391f`). Wheel binding via PyO3 + maturin,
+ABI3 across Python 3.9..3.13 (`crates/ruvector-py/Cargo.toml:21`).
+
+**The contract for future bindings (Node, WASM, Java).**
+
+- Bindings expose **only the `AnnIndex` trait surface plus persist**.
+  Internal types (`BinaryCode`, `RandomRotation`) stay Rust-only —
+  exposing them widens the FFI surface beyond what the determinism
+  contract can survive across language runtimes.
+
+- Persist roundtrip is the cross-language compatibility test.
A `.rbpx` + written by Rust must load identically in Python; a `.rbpx` written + by Python must load identically in Rust. The + `persist::tests::serialize_roundtrip_preserves_search_results` test + is the in-Rust version; the cross-language version is a + cross-runtime test (the Python SDK already does the round-trip in + its test suite, just within Python). + +- WASM bindings inherit the §5 footprint constraint: no rabitq in the + default WASM bundle unless the consumer opts in. + +**The trap.** Each new binding tempted to expose more of the API. The +Python SDK got this right by exposing exactly one class +(`crates/ruvector-py/src/rabitq.rs:36`); future bindings should match. + +--- + +## 7. The `VectorKernel` story is asymmetrical + +**The state.** Trait shipped in `ruvector-rabitq` (`src/kernel.rs`). +One implementation (`CpuKernel`). **Zero callers** that wire dispatch +— only a doc comment at `crates/ruvector-rulake/src/lake.rs:595`. +That's a real gap. + +**The implication for new integrations.** A consumer that uses +Pattern 2 (§03) is **the first non-test caller of `VectorKernel`**. +That consumer must: + +- Implement the dispatch policy from ADR-157 §"Dispatch policy + normative" (preference order, batch-size + dim + determinism filter). +- Decide where to surface kernel identity in stats (the comment in + `src/kernel.rs:23-25` says "kernel identity is surfaced in caps + + stats, not in the witness" — caller's responsibility). +- Write the test that verifies determinism across two registered + kernels on Fresh/Frozen consistency. + +This is real engineering — Phase 2 of §05 explicitly budgets it. A +consumer that thinks it's getting "free GPU" by adopting the trait +is going to be disappointed unless someone has done this work first. + +**The graceful path.** `ruvector-rulake` should be that someone. 
It +already references the trait in the doc comment; making the dispatch +real in rulake first means every other Pattern-2 consumer inherits a +working pattern and a test harness. + +--- + +## 8. The witness chain is anchored on data, not on kernels + +**Restated from ADR-157 §"Determinism as a hard gate":** the +witness is computed over `(data_ref, dim, rotation_seed, +rerank_factor, generation)`. Kernel identity is **not** in the +witness — kernels are execution substrate. + +**The contract for new integrations.** A consumer that adds a new +kernel does *not* invalidate any existing witness. A consumer that +changes the rotation seed, the rerank factor, or the data does. New +integrations must not couple kernel selection to data identity — that +includes "use a different rotation seed for the GPU path because it +benchmarks better at that seed", which is a ruled-out direction. + +This is what makes Phase 2's GPU work safe: a CUDA kernel that ships +later does not break already-published bundles. diff --git a/docs/research/rabitq-integration/05-roadmap.md b/docs/research/rabitq-integration/05-roadmap.md new file mode 100644 index 000000000..b401754d8 --- /dev/null +++ b/docs/research/rabitq-integration/05-roadmap.md @@ -0,0 +1,238 @@ +# 05 — Roadmap + +Three phases. Each picks a coherent slice of the §02 candidate list, +specifies the files to touch, an acceptance test, and an LoC budget. +Each phase is sized to ~3–6 engineer-weeks. Phases are independent — +Phase 2 doesn't block on Phase 1 except where noted. + +--- + +## Phase 1 — Low-hanging integrations (3 candidates, 4–5 weeks) + +Pick the three Tier-A candidates from §02. They share three desirable +properties: + +- All use Pattern 1 (direct embed) — no new infrastructure required. +- All can pin the same major version of `ruvector-rabitq` (`^2.2`). +- All have the consumer code already structured around vectors, so + the integration is *adding a new index path*, not redesigning a + hot loop. 
+
+### P1.A — `ruvector-diskann` RaBitQ backend
+
+**Files to touch:**
+
+- `crates/ruvector-diskann/Cargo.toml` — add `ruvector-rabitq = {
+  path = "../ruvector-rabitq", version = "^2.2" }`.
+- `crates/ruvector-diskann/src/index.rs` — add a `Backend` enum
+  alongside the existing PQ path (`pq.rs:14`). New variant
+  `Backend::Rabitq { plus: RabitqPlusIndex, seed: u64 }`. Constructor
+  `DiskAnnIndex::new_rabitq(config, seed, rerank_factor)`.
+- `crates/ruvector-diskann/src/index.rs:169` `search` — branch on
+  backend; RaBitQ path calls `RabitqPlusIndex::search_with_rerank`.
+- `crates/ruvector-diskann/src/index.rs:219,297` `save`/`load` —
+  delegate to `ruvector_rabitq::persist::save_index/load_index`
+  on the rabitq path.
+
+**Acceptance test:** on the same dataset (Gaussian-clustered D=128
+n=100k, the one in `crates/ruvector-rabitq/src/main.rs`), the rabitq
+path achieves recall@10 ≥ 95% at QPS ≥ 2× the existing PQ path. New
+file `crates/ruvector-diskann/tests/rabitq_backend_smoke.rs`.
+
+**LoC budget:** ≤500 LoC source + ≤200 LoC tests.
+
+### P1.B — `ruvector-graph` `VectorPropertyIndex`
+
+**Files to touch:**
+
+- `crates/ruvector-graph/Cargo.toml` — add the rabitq dep.
+- `crates/ruvector-graph/src/index.rs` — new `VectorPropertyIndex`
+  struct alongside `LabelIndex` (`:15`), `PropertyIndex` (`:79`),
+  `EdgeTypeIndex` (`:180`), `AdjacencyIndex` (`:240`). Same
+  lifecycle methods (`new`, `add_node`, `remove_node`, plus a new
+  `knn(&self, property, query, k) -> Vec<NodeId>`).
+- `crates/ruvector-graph/src/node.rs` — extend `Node` to carry
+  optional vector-typed properties; or a side-table indexed by
+  `NodeId`.
+- A new regression test in `crates/ruvector-graph/src/index.rs`:
+  build a graph with 10k nodes carrying 128-dim embeddings, query
+  top-k, assert recall ≥ 90% vs brute-force cosine.
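Both the P1.B and P1.C acceptance tests compare against an in-test brute-force cosine baseline; that baseline and the recall computation are small enough to sketch in full (a self-contained reference, not any crate's API):

```rust
/// Exact top-k by cosine similarity: the in-test baseline the
/// acceptance criteria compare against. O(n·d) per query, which is
/// fine at 10k × 128-dim test scale.
pub fn brute_force_topk(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(v, query)))
        .collect();
    // Descending by score; assumes no zero vectors (cosine is NaN there).
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// recall@k: fraction of the true top-k ids the candidate list found.
pub fn recall_at_k(found: &[usize], truth: &[usize]) -> f32 {
    let hits = found.iter().filter(|&i| truth.contains(i)).count();
    hits as f32 / truth.len() as f32
}
```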
+
+**Acceptance test:** insert 10k node embeddings, run 100 queries,
+recall@10 ≥ 90% vs an in-test brute-force cosine baseline; round-trip
+the index to a `.rbpx` file via the new `save_property_index_rabitq`
+and reload bit-identically.
+
+**LoC budget:** ≤600 LoC source + ≤250 LoC tests.
+
+### P1.C — `ruvector-gnn` `differentiable_search`
+
+**Files to touch:**
+
+- `crates/ruvector-gnn/Cargo.toml` — add the rabitq dep behind a
+  default-on feature `rabitq` so the WASM build can opt out.
+- `crates/ruvector-gnn/src/search.rs:56` `differentiable_search` —
+  add a sibling
+  `differentiable_search_rabitq(query, &RabitqPlusIndex, top_k, temperature)`.
+  Top-k via `search_with_rerank`, softmax weights from the rerank f32
+  scores so gradients stay meaningful.
+- `crates/ruvector-gnn/src/search.rs:105` `hierarchical_forward` —
+  parameterise so callers can pass a per-layer
+  `&RabitqPlusIndex` instead of a `&[Vec<f32>]`.
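The "softmax weights from the rerank f32 scores" step can be sketched as a pure function. Assumptions: scores are similarity-like (higher is closer) and temperature is positive; the real signature lives in `ruvector-gnn`:

```rust
/// Temperature-scaled softmax over rerank scores: the piece of a
/// `differentiable_search_rabitq` that keeps gradients meaningful.
/// Max-subtraction for numerical stability; assumes temperature > 0.
pub fn softmax_weights(scores: &[f32], temperature: f32) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores
        .iter()
        .map(|s| ((s - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

Lowering the temperature sharpens the distribution toward the rerank winner, which is the knob the existing cosine path already exposes.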
+
+- **Week 5.** Documentation pass: a §"Choosing a pattern" page added
+  to each consumer's README citing §03; bench summary in
+  `crates/ruvector-rabitq/BENCHMARK.md` extended with the three new
+  call sites' numbers.
+
+---
+
+## Phase 2 — Make `VectorKernel` real (~4–6 weeks)
+
+The trait at `crates/ruvector-rabitq/src/kernel.rs` is shipped but has
+**zero non-test callers**. Phase 2 changes that by wiring two kernels
+to two consumers — exactly the minimum to prove the dispatch policy
+isn't paper.
+
+### P2.A — Wire `VectorKernel` dispatch into ruLake
+
+**Files to touch:**
+
+- `crates/ruvector-rulake/src/lake.rs:590-630` — replace the doc-only
+  `// plug-point` with a real
+  `register_kernel(Arc<dyn VectorKernel>)` method, a
+  `kernels: Vec<Arc<dyn VectorKernel>>` field, and the dispatch
+  policy from ADR-157. Used inside `search_batch` (already has the
+  right shape per its doc comment at `:595`).
+- `crates/ruvector-rulake/src/cache.rs:833` — re-route the batch scan
+  through the dispatcher.
+- `crates/ruvector-rulake/tests/` — new `kernel_dispatch.rs` testing:
+  (a) default kernel is `CpuKernel`; (b) a registered
+  determinism-false kernel is filtered on `Consistency::Frozen`;
+  (c) batch_size < caps().min_batch is filtered.
+
+**Acceptance test:** `RuLake::cache_stats()` exposes which kernel
+served the last query (or last batch). Witness output is
+bit-identical regardless of which deterministic kernel served.
+
+### P2.B — Ship a portable SIMD `CpuSimdKernel`
+
+**Files to touch:**
+
+- `crates/ruvector-rabitq/src/kernel.rs` — add `CpuSimdKernel` behind
+  feature flag `simd`. Implementation uses `std::simd` (when stable)
+  or a `target_feature(enable = "avx2,popcnt")` portable path
+  otherwise; falls back to scalar via the existing `CpuKernel` if
+  detection fails.
+- `crates/ruvector-rabitq/Cargo.toml` — add the `simd` feature.
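Whatever SIMD path ships, the scalar semantics it must reproduce bit-for-bit are XOR plus popcount, pure integer math, which is why ADR-157's hard gate is satisfiable on the scan phase at all. A sketch, assuming codes packed into `u64` words:

```rust
/// Scalar reference for the 1-bit scan phase: Hamming distance
/// between packed codes. Integer-only, so a SIMD (or GPU) kernel can
/// reproduce it exactly regardless of lane width.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    debug_assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```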
+
+**Acceptance test:** on the same Gaussian D=128 n=100k bench from
+`crates/ruvector-rabitq/src/main.rs`, the SIMD kernel achieves ≥ 1.5×
+QPS vs `CpuKernel` at bit-identical scan output (per the ADR-157 hard
+gate).
+
+### P2.C — Connect a second consumer
+
+Pick **one** of B1 (`ruvector-attention` KV cache) or C3
+(`ruvector-fpga-transformer`) for the second `VectorKernel` consumer.
+This is the smaller-effort half — the dispatch is already real in
+ruLake, the second consumer just adopts the same pattern.
+
+Likely B1, because the kernel surface there is the closest match to
+the existing rabitq hot path (asymmetric scan over a K-cache).
+
+**Acceptance test:** B1's KV cache, with `RabitqAsymIndex` behind
+`VectorKernel` dispatch, demonstrates the same bit-identical output on
+CPU vs SIMD, and ships a benchmark showing the ratio.
+
+### Phase 2 acceptance gate
+
+Two consumers using `VectorKernel`, two kernels available
+(`CpuKernel` + `CpuSimdKernel`). A first GPU kernel can land in
+**Phase 2.5** as a separate `ruvector-rabitq-cuda` crate that passes
+the ADR-157 acceptance gate (2× p95 OR 30% cost). Phase 2 itself does
+not commit to GPU.
+
+### Phase 2 LoC budget
+
+~600 LoC ruLake dispatch + 800 LoC SIMD kernel + 400 LoC
+second-consumer adoption + 500 LoC tests = ~2300 LoC across two
+crates and one new feature.
+
+---
+
+## Phase 3 — Cross-cutting story (1–2 ADRs, no code commitment)
+
+Phase 3 is a research-not-code phase. It takes up the question
+"should RaBitQ be the workspace's canonical vector compression
+substrate?" and produces an ADR that either says yes (and lists the
+consequences) or no (and lists the alternatives).
+
+### P3.A — Draft ADR-160 "RaBitQ as the workspace's canonical 1-bit compression"
+
+The ADR would say:
+
+- All workspace crates that ship 1-bit binary vector compression use
+  `ruvector-rabitq`. Re-implementations are PR-blocked (Anti-pattern
+  A from §03).
+- 4-bit, 8-bit, and PQ tiers are **not** subsumed — RaBitQ is the + canonical *1-bit* path; ADR-001's tiered scheme stays for higher + bitwidths. +- A migration plan for `ruvector-core::quantization::BinaryQuantized` + (the original 15–20% recall path called out in ADR-154 + §"Measured gap"): deprecate, then delete, then point to RaBitQ. +- Cross-cutting impact on `ruvector-graph`, `ruvector-gnn`, + `ruvector-attention`, `ruvector-temporal-tensor`, `ruvllm` — each + named, each with its preferred §03 pattern, each with effort + estimate. + +### P3.B — Optionally, ADR-161 "Memory-substrate consolidation around ruLake" + +Strictly downstream of ADR-156. If multiple new consumers (B2, B4, +plus future agent crates) end up sitting on `RuLake`, this ADR commits +that pattern: the agent-memory hierarchy, the ruvllm RAG cache, and +the rvAgent witness handoff are one substrate, not three. + +### Phase 3 acceptance + +ADR-160 in `docs/adr/` with status "Proposed", reviewed by maintainers +of the named consumer crates, and a one-page consequences section +rolled into the relevant crates' `Cargo.toml` comments. No code +changes — Phase 3 is the *decision* phase that drives Phases 4+ on the +quarterly roadmap. + +### Phase 3 effort + +~2 engineer-weeks split across writing + review. ADR-class work, not +implementation. + +--- + +## Total roadmap effort + +- **Phase 1:** 4–5 engineer-weeks, ~2000 LoC. +- **Phase 2:** 4–6 engineer-weeks, ~2300 LoC. +- **Phase 3:** ~2 engineer-weeks (docs). + +**Total: ~10–13 engineer-weeks** to land three new consumers, make +the kernel trait load-bearing, and lock the workspace position. This +fits inside one quarter for a single engineer or 6 weeks for two. 
diff --git a/docs/research/rabitq-integration/06-decision-record.md b/docs/research/rabitq-integration/06-decision-record.md new file mode 100644 index 000000000..b30b16048 --- /dev/null +++ b/docs/research/rabitq-integration/06-decision-record.md @@ -0,0 +1,107 @@ +# 06 — Decision Record + +## The sharpest insight from the research + +**The `VectorKernel` trait is shipped, the `CpuKernel` exists, and the +dispatch policy from ADR-157 is already specified — but no caller in +the workspace wires it up.** The only reference is a doc comment at +`crates/ruvector-rulake/src/lake.rs:595`. This means every consumer +that thinks it's getting "free pluggable acceleration" by adopting the +trait would actually be the **first non-test caller**, and would have +to implement the dispatch policy itself. + +The implication is non-obvious: Pattern 2 (§03) is currently more +expensive than Pattern 1 because there is no working dispatch +implementation to copy. The right fix is to wire dispatch into ruLake +*first* (Phase 2.A in §05), making it the canonical reference, then +let other Pattern-2 consumers inherit the pattern. Otherwise we'll +end up with two consumers each writing their own divergent dispatch +policies and quietly breaking the determinism gate from ADR-157 +§"Determinism as a hard gate". + +This finding shifts the recommendation: don't start a Pattern-2 +integration in any new crate until ruLake's `register_kernel` is real. +The §05 phase ordering is built around that. + +--- + +## Top 3 integrations to start now + +1. **`ruvector-diskann` RaBitQ backend** (§02 A1; §05 P1.A). ADR-154 + already named this as the next step; the consumer code is shaped + right; PQ replacement is a controlled scope; ≤500 LoC. Strategic + value: closes the "billion-scale on disk + DRAM" pitch. + +2. **`ruvector-graph` `VectorPropertyIndex`** (§02 A2; §05 P1.B). + Unblocks vector-keyed property lookup that the graph-transformer + and GNN consumers want; pairs naturally with #3; ≤600 LoC. 
+ +3. **`ruvector-gnn` `differentiable_search`** (§02 A3; §05 P1.C). The + smallest of the three by LoC, the highest QPS multiplier of the + three by §02's analysis, and complements #2 directly. ≤300 LoC. + +All three use Pattern 1 (direct embed); all three pin +`ruvector-rabitq = "^2.2"`; all three avoid the §03 anti-patterns. + +--- + +## One thing we should refuse + +**Don't build per-consumer 1-bit compression.** A PR that adds an +`ad-hoc binary code` module to any workspace crate — most likely +under the rationale "we just need a quick binary path before ruLake +is ready" — re-creates the original `BinaryQuantized` failure mode +that ADR-154 was specifically written to retire (15–20% recall vs +RaBitQ's 40.8% no-rerank / 98.9% rerank×5 on the same dataset, per +the §"Measured gap" comparison). + +The cost of refusing is real (some consumers will wait one quarter +for ADR-160 / Phase 3 before getting their dep wired). The cost of +allowing is permanent: a fragmented compression substrate where the +witness chain (ADR-155) and the kernel-dispatch determinism contract +(ADR-157) both stop holding across crate boundaries. + +If a consumer genuinely cannot wait, they get Pattern 1 (direct embed +of `ruvector-rabitq`) — not their own fork. + +--- + +## Open questions for stakeholders + +1. **Do we commit to Phase 2 (`VectorKernel` real in ruLake) before + Phase 1 (three new direct-embed consumers) finishes?** Phase 1 + produces no Pattern-2 consumers; Phase 2 has one (ruLake) plus + one other. Sequencing them concurrently is fine if there are two + engineers; sequentially Phase 1 first is the safer single-engineer + path because the §02 candidates with the highest §"Strategic + value" (A1, A2, A3) all happen to be Pattern 1. + +2. **Does ADR-160 (Phase 3.A) need to land before B2 (`ruvllm` → + ruLake)?** The ruvllm KV cache + RAG integration is the largest + single ROI in §02 but also the one most disrupted by getting + the substrate question wrong. 
If ADR-160 says "ruLake is the + canonical retrieval cache", B2 is straightforward; if it says "no + canonical cache, choose per consumer", B2 becomes a multi-week + design conversation. + +3. **Should `ruvector-rabitq` ship a portable SIMD kernel as part of + the default build, or behind a feature flag?** §05 P2.B sets it + behind `simd`. Default-on simplifies dispatch (every CPU caller + just gets the SIMD path) at the cost of the WASM/embedded + footprint (§04 §5). The WASM consumers don't yet exist, so + default-on is plausible — but reversing it later is a SemVer + minor bump. + +4. **Does Phase 2's second consumer choice between B1 + (`ruvector-attention`) and C3 (`ruvector-fpga-transformer`) matter + strategically?** B1 is the realistic near-term win; C3 is the + research-mode showcase. Recommendation: B1 in Phase 2; C3 lands + only if a customer asks for FPGA inference. + +5. **Is there a customer pressure to ship a Node.js / WASM binding + parallel to the Python SDK (M1)?** None of §02 surveys this + directly. ruvector-py shipping in PR #381 is a precedent that + establishes the binding pattern; replicating it for Node and + WASM is mostly mechanical *if* §04's cross-language contract is + followed. Estimate: ~2 engineer-weeks per binding once §05 Phase + 1 has landed. diff --git a/docs/research/rabitq-integration/INDEX.md b/docs/research/rabitq-integration/INDEX.md new file mode 100644 index 000000000..789e02cef --- /dev/null +++ b/docs/research/rabitq-integration/INDEX.md @@ -0,0 +1,51 @@ +# RaBitQ Integration — Research Index + +This directory surveys how `ruvector-rabitq` (the rotation-based 1-bit +quantizer published as crate `2.2.0`) is wired into the RuVector +workspace today, and where else it could plausibly land. Output is a +focused review, not a brain dump — read it in order. + +- [`01-current-integration.md`](01-current-integration.md) — every + call site that imports `ruvector_rabitq` today, with `crate:file:line` + references. 
Establishes the baseline of three real consumers
+  (`ruvector-rulake`, `ruvector-py`, the rabitq demo bin) and surfaces
+  what is shipped vs. scaffolded inside the rabitq crate itself
+  (notably `VectorKernel` exists; only `CpuKernel` implements it).
+
+- [`02-integration-opportunities.md`](02-integration-opportunities.md) —
+  candidate consumer crates, ranked by strategic value × engineering
+  effort. For each: what they store, where similarity matters in the
+  hot path, what 32× compression buys, the friction (typing,
+  determinism, witness propagation), and an honest tier
+  classification (now / mid-term / defer / kill).
+
+- [`03-architectural-patterns.md`](03-architectural-patterns.md) — the
+  three sane shapes for adding a new consumer: direct-embed, behind
+  the `VectorKernel` trait, or through ruLake. Maps each candidate
+  from §02 to its preferred pattern, and calls out the anti-patterns
+  (re-implementing rotation, ad-hoc compression, witness fragmentation)
+  that would silently break the existing ADRs.
+
+- [`04-cross-cutting-concerns.md`](04-cross-cutting-concerns.md) —
+  invariants every new integration must hold: determinism across
+  architectures, witness format compatibility, memory ownership,
+  API-version pinning, performance footprint on WASM/edge,
+  cross-language story. The `originals_flat`-encapsulation lesson
+  from the Python SDK PR is recorded as a load-bearing constraint.
+
+- [`05-roadmap.md`](05-roadmap.md) — three phases, each with scope,
+  files to touch, acceptance test, and LoC budget. Phase 1 picks the
+  three top-of-bucket integrations from §02. Phase 2 makes the
+  `VectorKernel` trait load-bearing for two consumers across two
+  hardware targets. Phase 3 is the optional ADR-class question of
+  whether RaBitQ should be the workspace's canonical compression.
+
+- [`06-decision-record.md`](06-decision-record.md) — one page.
The + single sharpest insight from this research, the three integrations + to start now, the one path we should refuse, and the open + questions for stakeholders. + +All references are to absolute paths under +`/home/ruvultra/projects/ruvector/`. Numbers cited (957 QPS, 32×, +1.02× tax, etc.) trace back to `crates/ruvector-rabitq/BENCHMARK.md` +and `crates/ruvector-rulake/BENCHMARK.md`.