feat(ruvector-graph): VectorPropertyIndex — RaBitQ-backed kNN over node properties (Phase 1 item #2)#387
Open
…de properties

Phase 1 item #2 from `docs/research/rabitq-integration/05-roadmap.md`. Adds a vector-keyed kNN index for graph nodes via direct-embed (Pattern 1) of `ruvector-rabitq`. Callers can now ask "find the N node ids whose vector property is closest to query" without standing up a separate index crate.

## Surface

```rust
let idx = VectorPropertyIndex::build(
    &graph,
    "embedding",
    VectorPropertyIndexConfig { seed: 42, rerank_factor: 20 },
)?;
let hits: Vec<(NodeId, f32)> = idx.knn(&query, k)?;
```

Behind the `rabitq` cargo feature (default-on; `--no-default-features` keeps the graph crate buildable without ruvector-rabitq).

## Property-table shape encountered

`NodeId = String`; `GraphDB` stores `DashMap<NodeId, Node>` where each `Node.properties: HashMap<String, PropertyValue>` and vector properties live as `PropertyValue::FloatArray(Vec<f32>)`, already a contiguous f32 slab. Added one new public accessor `GraphDB::node_ids() -> Vec<NodeId>` so the index can enumerate without becoming a friend of the DashMap.

## Important determinism finding

`DashMap` iteration order is **shard-dependent**: two builds in the same process can disagree on which `NodeId` lives at row 0. Without a fix this would silently break ADR-154's `(seed, graph) → bit-identical codes` guarantee across runs and across shard-count changes.

Fix: `VectorPropertyIndex::build` sorts `NodeId`s before encoding. The cost is one O(n log n) string sort per build; the benefit is that two `(seed, graph)` pairs always produce the same row→NodeId mapping. Verified by `byte_identical_query_results_for_same_seed`.

## Recall + memory at the test sizes

- n=1k, dim=128, rerank_factor=20:
  - recall@10 = **1.000** vs brute-force (floor: 0.85)
  - codes / originals ratio = 0.176 (rotation matrix dominates at small n; asymptotically codes ≤ originals/16 + dim²·4)

The 1/16 contract holds asymptotically; small-n is rotation-matrix-dominated, which is the published ADR-154 behavior.
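Returning to the determinism fix described in this commit message: the sort-before-encode idea can be sketched with the standard library alone. This is a minimal illustration, not the code in `src/vector_property_index.rs`. `assign_rows` is a hypothetical helper, and `std::collections::HashMap` (whose iteration order is likewise unspecified) stands in for `DashMap` so the example has no external dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical helper: fix the row order by sorting node ids before
/// any encoding step sees the vectors. Row i of the result is the
/// i-th id in lexicographic order, regardless of map iteration order.
fn assign_rows(nodes: &HashMap<String, Vec<f32>>) -> Vec<(String, Vec<f32>)> {
    let mut ids: Vec<&String> = nodes.keys().collect();
    ids.sort(); // one O(n log n) string sort per build
    ids.into_iter()
        .map(|id| (id.clone(), nodes[id].clone()))
        .collect()
}

fn main() {
    // Same logical graph, inserted in two different orders.
    let mut a: HashMap<String, Vec<f32>> = HashMap::new();
    let mut b: HashMap<String, Vec<f32>> = HashMap::new();
    for (id, v) in [("n3", 3.0f32), ("n1", 1.0), ("n2", 2.0)] {
        a.insert(id.to_string(), vec![v; 4]);
    }
    for (id, v) in [("n2", 2.0f32), ("n1", 1.0), ("n3", 3.0)] {
        b.insert(id.to_string(), vec![v; 4]);
    }

    let rows_a = assign_rows(&a);
    let rows_b = assign_rows(&b);

    // The row -> id mapping is identical for both builds.
    assert_eq!(rows_a, rows_b);
    assert_eq!(rows_a[0].0, "n1");
    println!("row 0 is {} in both builds", rows_a[0].0);
}
```

With a stable row order, a seeded encoder sees the same input sequence on every build, which is the precondition for the `(seed, graph) → bit-identical codes` guarantee cited above.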

## Acceptance test

The roadmap's M1 acceptance gate (100k × 768d, recall@10 ≥ 0.95, DRAM ≤ 1/16 of f32 baseline) is shipped as a criterion bench at `benches/vector_property_index.rs` defaulting to n=2k. Override with `VECTOR_PROPERTY_INDEX_N=100000 VECTOR_PROPERTY_INDEX_DIM=768 cargo bench -p ruvector-graph --features rabitq` for the full scale.

## No abstraction yet

The graph crate had no quantizer trait. Kept things concrete (`VectorPropertyIndex` wraps `RabitqPlusIndex` directly) rather than introducing one. Phase 1 has one quantizer; an abstraction layer is unjustified now and easy to add in Phase 2.

## Verification

    cargo build --workspace → clean
    cargo build -p ruvector-graph --no-default-features → clean
    cargo build -p ruvector-graph --features rabitq → clean
    cargo clippy --workspace --all-targets --no-deps -- -D warnings → clean
    cargo fmt --all --check → clean
    cargo test -p ruvector-graph --features rabitq --lib → 135 pass
    cargo test -p ruvector-graph --features rabitq → 142 pass total (135 lib + 7 new integration)

New tests in `tests/vector_property_index.rs`:

- `build_and_query_returns_self_at_distance_zero`
- `recall_at_10_meets_floor_vs_brute_force`
- `byte_identical_query_results_for_same_seed` (determinism)
- `build_skips_nodes_without_target_property`
- `build_rejects_dim_mismatch`
- `len_matches_indexed_node_count`
- `empty_graph_yields_empty_index`

## Files

- `src/vector_property_index.rs` (~210 LoC): new module
- `src/lib.rs` (+8): gated `pub mod` + re-exports
- `src/graph.rs` (+8): `node_ids()` accessor
- `src/error.rs` (+9): `RabitqIndex(String)` variant + gated `From<RabitqError>`
- `Cargo.toml` (+5): optional dep + `rabitq` feature, folded into `full`
- `tests/vector_property_index.rs` (+245)
- `benches/vector_property_index.rs` (+95): env-var-tunable

Refs: `docs/research/rabitq-integration/05-roadmap.md` Phase 1 item #2, ADR-154 (RaBitQ determinism).

Co-Authored-By: claude-flow <ruv@ruv.net>
ruvnet added a commit that referenced this pull request on Apr 26, 2026
Unblocks the 7 stacked PRs (#381-#387) and turns `main`'s CI green for the first time in days. Two issues fixed:

## Failure 1: Security audit (was: 8 vulnerabilities)

`cargo audit` is now exit 0. 4 of the 5 critical advisories were fixed by version bumps; only the unfixable one is ignored.

**Dep-bumped:**

- `rustls-webpki 0.101.7` + `0.103.10` → `0.103.13` via `cargo update -p rustls-webpki@0.103.10`. Patches:
  - RUSTSEC-2026-0098 (URI name constraints)
  - RUSTSEC-2026-0099 (wildcard name constraints)
  - RUSTSEC-2026-0104 (CRL parsing panic)
- `idna 0.5.0` → `1.1.0` via `validator 0.18 → 0.20` in `examples/scipix`. Patches RUSTSEC-2024-0421 (Punycode acceptance).
- Bonus: `reqwest 0.11 → 0.12` (in `ruvector-core` + `examples/benchmarks`) and `hf-hub 0.3 → 0.4` (in `ruvector-core` + `ruvllm` + `ruvllm-cli`). Removes the entire legacy `rustls 0.21` / `rustls-webpki 0.101.7` subtree from the lockfile.

**Ignored** (single advisory, with rationale):

- `RUSTSEC-2023-0071` (rsa Marvin timing sidechannel): no upstream fix available; we don't expose RSA decryption services. Documented in `.cargo/audit.toml`.

**Unmaintained warnings** (16 total: proc-macro-error, derivative, instant, paste, bincode 1, pqcrypto-{kyber,dilithium}, rustls-pemfile 1, rusttype, wee_alloc, number_prefix, rand_os, core2, lru, pprof, rand): each given a one-line justification in `.cargo/audit.toml` so CI stays green on them while the team decides whether to chase upstream replacements.

## Failure 2: Tests timeout (was: 30-min job timeout cancellation)

`.github/workflows/ci.yml` `test` job is now a `matrix` with `fail-fast: false` and `timeout-minutes: 45`.
Six parallel shards under `cargo nextest run` (installed via `taiki-e/install-action@v2`) plus a separate `cargo test --doc` step (nextest doesn't run doctests):

| Shard | Crates |
|---------------|--------|
| vector-index | rabitq, rulake, diskann, graph, gnn, cnn |
| rvagent | 10 rvagent-* crates |
| ruvix | 16 ruvix-* crates |
| ruqu-quantum | 5 ruqu* crates |
| ml-research | attention, mincut, scipix, fpga-transformer, sparse-inference, sparsifier, solver, graph-transformer, domain-expansion, robotics |
| core-and-rest | --workspace minus the above |

`Swatinem/rust-cache@v2` is keyed per shard. Audit job switched to `taiki-e/install-action` for `cargo-audit` (faster than `cargo install --locked`).

## Verification

    cargo audit → exit 0
    cargo build --workspace --exclude ruvector-postgres → clean
    cargo clippy --workspace --exclude ruvector-postgres --no-deps -- -D warnings → exit 0
    cargo fmt --all --check → exit 0

## Cargo.lock churn

166-line diff, net ~120 lines removed (more deletions than additions). Removed: `idna 0.5.0`, `rustls-webpki 0.101.7`, `validator 0.18`, `validator_derive 0.18`, `proc-macro-error 1.0.4`. Added: `rustls-webpki 0.103.13`, `validator 0.20`, `proc-macro-error2`, `hf-hub 0.4.3`, `reqwest 0.12.28`. No suspicious crates.

## Recommended merge order

1. **This PR first**: unblocks every other PR's CI.
2. After this lands and main is green, rebase the 7 open PRs (#381-#387) one at a time. The DiskANN stack (#383→#384→#385→#386) must merge in numeric order. #381 (Python SDK), #382 (research), #387 (graph property index) are independent and can merge in any order after their CI goes green on the rebase.

Co-Authored-By: claude-flow <ruv@ruv.net>
## Summary

Phase 1 item #2 from `docs/research/rabitq-integration/05-roadmap.md`. Adds a vector-keyed kNN index for graph nodes via direct-embed (Pattern 1) of `ruvector-rabitq`. Callers can now ask "find the N node ids whose vector property is closest to query" without standing up a separate index crate.

Behind the `rabitq` cargo feature, default-on. `--no-default-features` keeps the graph crate buildable without ruvector-rabitq.

## Important determinism finding

`DashMap` iteration order is shard-dependent. Two builds in the same process can disagree on which `NodeId` lives at row 0. Without a fix this would silently break ADR-154's `(seed, graph) → bit-identical codes` guarantee across runs and shard-count changes.

Fix: `VectorPropertyIndex::build` sorts `NodeId`s before encoding. One O(n log n) string sort per build; the row→NodeId mapping is now stable across runs. Verified by `byte_identical_query_results_for_same_seed`.

## Recall + memory at test sizes
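
The commit message for this PR reports a codes / originals ratio of 0.176 at n=1k, dim=128, and gives the bound codes ≤ originals/16 + dim²·4. The arithmetic behind that bound can be sketched as below. This assumes "originals" means n·dim·4 bytes of f32 data and that dim²·4 is the byte size of an f32 rotation matrix; both are my reading of the description, not confirmed sizes from the crate, and the function names are hypothetical.

```rust
/// Bytes needed to store n f32 vectors of dimension dim (assumed baseline).
fn original_bytes(n: usize, dim: usize) -> usize {
    n * dim * 4
}

/// Upper bound from the description: originals/16 + dim^2 * 4.
/// The second term is constant in n, so it dominates when n is small.
fn code_bytes_bound(n: usize, dim: usize) -> usize {
    original_bytes(n, dim) / 16 + dim * dim * 4
}

/// The bound expressed as a fraction of the original f32 footprint.
fn bound_ratio(n: usize, dim: usize) -> f64 {
    code_bytes_bound(n, dim) as f64 / original_bytes(n, dim) as f64
}

fn main() {
    // Test size from the commit message, then the roadmap gate size.
    for (n, dim) in [(1_000usize, 128usize), (100_000, 768)] {
        println!(
            "n={n} dim={dim}: bound {} bytes, ratio {:.4}",
            code_bytes_bound(n, dim),
            bound_ratio(n, dim)
        );
    }
}
```

For n=1k, dim=128 the bound works out to 97,536 bytes, about 0.19 of the 512,000-byte originals, so the reported 0.176 sits under it. The dim² term is fixed, so its share of the ratio shrinks as n grows and the ratio approaches 1/16 from above.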

## Acceptance test

The roadmap's gate (100k × 768d, recall@10 ≥ 0.95, DRAM ≤ 1/16 f32) is shipped as a criterion bench at `benches/vector_property_index.rs` defaulting to n=2k. Override with `VECTOR_PROPERTY_INDEX_N=100000 VECTOR_PROPERTY_INDEX_DIM=768 cargo bench -p ruvector-graph --features rabitq`.

## No abstraction yet (deliberate)

The graph crate had no quantizer trait. Kept `VectorPropertyIndex` concrete (wraps `RabitqPlusIndex` directly). Phase 1 has one quantizer; an abstraction is unjustified now and easy to add in Phase 2 if a second backend joins.

## Verification

- `cargo build --workspace` → clean
- `cargo build -p ruvector-graph --no-default-features` → clean
- `cargo build -p ruvector-graph --features rabitq` → clean
- `cargo clippy --workspace --all-targets --no-deps -- -D warnings` → clean
- `cargo fmt --all --check` → clean
- `cargo test -p ruvector-graph --features rabitq` → 142 pass (135 lib + 7 new integration)

7 new integration tests:

- `build_and_query_returns_self_at_distance_zero`
- `recall_at_10_meets_floor_vs_brute_force`
- `byte_identical_query_results_for_same_seed` (determinism)
- `build_skips_nodes_without_target_property`
- `build_rejects_dim_mismatch`
- `len_matches_indexed_node_count`
- `empty_graph_yields_empty_index`

## Independent of the DiskANN stack

Branched from `main` (PR #380's merge `7a599b7c`). No conflicts with the DiskANN PR chain (#383→#386). Different crate, different reviewer audience.

🤖 Generated with claude-flow