feat(rvf-runtime): expose public vector iterator on RvfStore #557

Open
thesoulpole wants to merge 1 commit into ruvnet:main from thesoulpole:feat/rvf-public-vector-iterator
@thesoulpole

Problem

RvfStore::query() returns only (id, distance) (SearchResult), and the underlying (id, vector) reader (VectorData, read_path::read_vec_seg_payload) is pub(crate). There is no public way to read vectors back out of an opened store.

This blocks anything that wants to use an .rvf as a source of vectors rather than just a query target. Concretely, ruLake's BackendAdapter needs to iterate every (id, vector) to prime its RaBitQ cache, and currently cannot. The blocker is documented upstream in ruvnet/RuLake at docs/research/rvf-backend-blocker.md (Path A).

Change

Two methods on RvfStore:

  • iter_vectors(&self) -> impl Iterator<Item = (u64, &[f32])> — lazy, zero-copy; borrows the in-memory vector store.
  • read_all_vectors(&self) -> Vec<(u64, Vec<f32>)> — owned convenience over the iterator.

Both skip deleted ids, matching query() visibility semantics. This is just exposing data already materialized in memory — it mirrors the existing ids().filter_map(...) walk inside query_with_envelope. No format change, no new IO path.
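For readers without the diff open, here is a minimal sketch of the shape of the two accessors. `SketchStore` and its fields are hypothetical stand-ins, not the real `RvfStore` internals (which this description does not show); only the two method signatures are taken from the PR.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for the store's in-memory state.
/// The real RvfStore layout is an assumption here.
#[derive(Default)]
pub struct SketchStore {
    vectors: BTreeMap<u64, Vec<f32>>,
    deleted: HashSet<u64>,
}

impl SketchStore {
    /// Illustrative helper: add or overwrite a vector.
    pub fn insert(&mut self, id: u64, v: Vec<f32>) {
        self.deleted.remove(&id);
        self.vectors.insert(id, v);
    }

    /// Illustrative helper: mark an id as deleted without dropping its data.
    pub fn delete(&mut self, id: u64) {
        self.deleted.insert(id);
    }

    /// Lazy, zero-copy walk: yields borrowed slices and skips deleted ids.
    pub fn iter_vectors(&self) -> impl Iterator<Item = (u64, &[f32])> {
        self.vectors
            .iter()
            .filter(move |(id, _)| !self.deleted.contains(*id))
            .map(|(id, v)| (*id, v.as_slice()))
    }

    /// Owned convenience built on the iterator: copies each visible vector.
    pub fn read_all_vectors(&self) -> Vec<(u64, Vec<f32>)> {
        self.iter_vectors().map(|(id, v)| (id, v.to_vec())).collect()
    }
}
```

Building the owned accessor on top of the borrowed one keeps the deleted-id filter in a single place, so the two cannot drift apart on visibility.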

Test

read_all_vectors_round_trips_and_excludes_deleted — ingests known (id, vector) pairs, asserts both accessors return them, and asserts a deleted id is excluded. Passes under cargo test -p rvf-runtime.

query() returns only (id, distance) (SearchResult), and the (id, vector)
reader (VectorData / read_vec_seg_payload) was pub(crate) — so there was no
public way to read vectors back out of an opened store.

Adds two methods on RvfStore:
  - iter_vectors() -> impl Iterator<Item = (u64, &[f32])>  (lazy, zero-copy)
  - read_all_vectors() -> Vec<(u64, Vec<f32>)>             (owned convenience)

Both skip deleted ids, matching query() visibility. No format change and no
new IO path — exposes what is already materialized in memory (mirrors the
existing walk in query_with_envelope). Unblocks external cache backends
(e.g. ruLake's BackendAdapter) priming a quantized index without re-encoding.
Test included.
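
As a rough illustration of the consumer side, the sketch below primes a cache from any iterator with the same item type as `iter_vectors()`. Everything in it is hypothetical: `prime_cache` and `sign_bits` are names invented for this example, and the one-bit sign quantizer is a toy placeholder, not RaBitQ and not ruLake's BackendAdapter.

```rust
use std::collections::HashMap;

/// Toy quantizer (an assumption for illustration, NOT RaBitQ):
/// packs one bit per component, set when the component is >= 0.
fn sign_bits(v: &[f32]) -> Vec<u8> {
    let mut out = vec![0u8; (v.len() + 7) / 8];
    for (i, x) in v.iter().enumerate() {
        if *x >= 0.0 {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

/// Hypothetical cache priming: consumes borrowed (id, vector) pairs and
/// stores only the quantized form, so no owned f32 copy is ever made.
pub fn prime_cache<'a>(
    vectors: impl Iterator<Item = (u64, &'a [f32])>,
) -> HashMap<u64, Vec<u8>> {
    vectors.map(|(id, v)| (id, sign_bits(v))).collect()
}
```

With the accessor from this PR, a call would presumably look like `prime_cache(store.iter_vectors())`. The borrowed iterator is the better fit for this case, since `read_all_vectors()` would allocate a full copy that is discarded right after quantization.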