Vector Similarity Search
We would like to add vector search support to Vortex.
Plan
At a minimum, we likely need to support (some of this already exists):
- a stable vector representation
- reusable normalization (either as an encoding or potentially a serialized scalar fn - issue potetially coming soon)
- exact and approximate execution paths (since quantization is lossy)
- vector-specific encodings (like TurboQuant or Spherical transforms)
- compressor/runtime optimizations for common query shapes
- persisted indexes (probably supported by a new layout)
- a clear user-facing model for lossy storage and approximate compute
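To make the exact-vs-approximate split concrete, here is a minimal sketch of a symmetric int8 scalar quantizer. This is a hypothetical illustration, not Vortex code and not how TurboQuant works; it only shows why a lossy encoding forces two execution paths: an exact path that keeps (or reconstructs) full-precision values, and an approximate path that scores the codes directly.

```rust
// Hypothetical sketch (not Vortex's TurboQuant): a symmetric int8 scalar
// quantizer, illustrating that a lossy encoding cannot round-trip exactly,
// which is why exact and approximate execution paths must both exist.

/// Quantize f32 values to i8 codes plus a single scale factor.
fn quantize(v: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = v.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let codes = v.iter().map(|x| (x / scale).round() as i8).collect();
    (codes, scale)
}

/// Reconstruct approximate f32 values from the codes.
fn dequantize(codes: &[i8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| c as f32 * scale).collect()
}

fn main() {
    let v = vec![0.12f32, -0.98, 0.55, 0.0];
    let (codes, scale) = quantize(&v);
    let approx = dequantize(&codes, scale);
    // The round-trip is close but not exact: the encoding is lossy, and the
    // reconstruction error is bounded by half the quantization step.
    for (orig, rec) in v.iter().zip(&approx) {
        assert!((orig - rec).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}, reconstruction = {approx:?}");
}
```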
Note that we still have not decided how to communicate or expose approximation/lossiness in a principled way.
Design
Details
`Vector` should be a first-class data model for fixed-length float embeddings, with clean integration into the dtype system, expression system, serialization, and compression pipeline.
- Similarity metrics should be modeled as first-class operations, not just ad hoc helper functions. At a minimum this includes `inner_product`, `cosine_similarity`, and `l2_norm`.
- Unit-norm factoring should be a reusable encoding or representation rather than something hardcoded inside TurboQuant. This gives us a common substrate for cosine similarity, normalized-dot-product rewrites, and future vector encodings.
- Vector encodings should be pluggable and composable in the same spirit as the rest of Vortex. TurboQuant is the first serious candidate, but it should not be the only path and should not have to own normalization, approximation semantics, and query execution by itself (especially since it is lossy).
- Similarity search needs dedicated execution paths for the common “constant query vector against a column of vectors” case. This shows up naturally as `ConstantArray` on one side, and we should not pay the same cost as the fully general pairwise case.
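The metrics and the unit-norm factoring above can be sketched in a few lines. These are illustrative scalar implementations, not Vortex's kernels; the function names follow the operations listed above, and `factor` is a hypothetical name for the normalization split. The point is the rewrite that a normalization encoding enables: cosine over raw vectors equals a plain dot product over pre-normalized vectors.

```rust
// Illustrative scalar implementations (not Vortex kernels) of the three
// metrics named above, plus the unit-norm factoring that lets cosine
// similarity be rewritten as an inner product over normalized vectors.

fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn l2_norm(a: &[f32]) -> f32 {
    inner_product(a, a).sqrt()
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    inner_product(a, b) / (l2_norm(a) * l2_norm(b))
}

/// Unit-norm factoring (hypothetical helper): split a vector into its norm
/// and its unit direction once, at write time.
fn factor(a: &[f32]) -> (f32, Vec<f32>) {
    let n = l2_norm(a);
    (n, a.iter().map(|x| x / n).collect())
}

fn main() {
    let a = [3.0f32, 4.0];
    let b = [4.0f32, 3.0];
    let (_, ua) = factor(&a);
    let (_, ub) = factor(&b);
    // Cosine on raw vectors == inner product on the stored unit vectors.
    assert!((cosine_similarity(&a, &b) - inner_product(&ua, &ub)).abs() < 1e-6);
    println!("cosine = {}", cosine_similarity(&a, &b));
}
```

Because the factoring is done once at write time, every cosine query against the column degrades into the cheaper inner product, which is one argument for making normalization a reusable encoding rather than something private to TurboQuant.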
There are also more high-level things that likely deserve their own issues:
- We should distinguish exact scan, encoded scan, and indexed search as separate but interoperable execution strategies. These are different tradeoffs and should be visible to the planner.
- Vector search likely wants a first-class top-k / nearest-neighbor operator rather than forcing users to manually compose “compute similarity for every row, then sort, then limit”.
- Persisted vector indexes should be part of the story. Even if we start with exact flat scan and encoded scan, the architecture should leave room for ANN indexes and row-id based retrieval.
- We need a principled API for approximation and lossiness. Right now `ApproxOptions` is a useful start, but it is not yet a full user-facing contract for what is allowed to be approximate, when approximation is chosen automatically, or how accuracy tradeoffs are surfaced.
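As a sketch of what a first-class top-k operator buys over “compute similarity for every row, then sort, then limit”: a bounded min-heap of size k scores the column in one pass without materializing and sorting every score. This is a hypothetical standalone example (`top_k`, `Scored`, and `dot` are illustrative names, not Vortex APIs), and it also exercises the constant-query shape, since any per-query work happens exactly once.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical sketch (not a Vortex operator): top-k nearest neighbors for
// the "constant query vector against a column of vectors" shape, using a
// bounded min-heap instead of a full sort followed by a limit.

/// Wrapper giving f32 scores a total order usable inside a BinaryHeap.
#[derive(PartialEq)]
struct Scored(f32, usize);
impl Eq for Scored {}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Scored {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.0.total_cmp(&other.0)
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Row ids of the k highest-scoring rows, best first. Because the query is
/// constant, per-query work (e.g. normalizing it) would happen exactly once.
fn top_k(query: &[f32], rows: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut heap: BinaryHeap<Reverse<Scored>> = BinaryHeap::with_capacity(k);
    for (id, row) in rows.iter().enumerate() {
        let score = dot(query, row);
        if heap.len() < k {
            heap.push(Reverse(Scored(score, id)));
        } else if score > heap.peek().map(|Reverse(Scored(s, _))| *s).unwrap() {
            // Evict the current worst of the top-k.
            heap.pop();
            heap.push(Reverse(Scored(score, id)));
        }
    }
    // into_sorted_vec on Reverse-wrapped entries yields descending scores.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse(Scored(_, id))| id)
        .collect()
}

fn main() {
    let query = [1.0f32, 0.0];
    let rows = vec![
        vec![0.0f32, 1.0], // orthogonal to the query, score 0.0
        vec![0.9f32, 0.1], // score 0.9
        vec![0.5f32, 0.5], // score 0.5
    ];
    assert_eq!(top_k(&query, &rows, 2), vec![1, 2]);
    println!("top-2 row ids: {:?}", top_k(&query, &rows, 2));
}
```

A planner-visible operator of this shape is also the natural seam where an ANN index could later substitute for the scan without changing the query surface.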
Steps / History
- `Vector` extension type: Vector Extension Type #6964
- `cosine_similarity`: Vortex Fixed-Shape Tensor #6812
- `l2_norm`: Vector Extension Type #6964
- `l2_denorm`: L2 Denorm expression #7329
- `inner_product`: TurboQuant encoding for Vectors #7269
- `sorf` or some "make random" reversible expression
- `ApproxOptions` model for tensor expressions: Approximate expressions for tensor types #7226
- `TurboQuant` metadata to be protobuf #7301
- `L2Denorm(norms, Sorf(matrix, Dict(centroids, codes)))`: `L2Denorm` from TurboQuant #7349
- `ScalarValue::Array` support for performant list scalars (in progress): `ScalarValue::Array` #6717
- `ConstantArray` with `Constant` children #7394

And then, in no particular order:
Unresolved Questions
- What is the right abstraction for unit normalization: an encoding, a wrapper type, or just a rewrite rule?
- Should cosine similarity be implemented primarily as normalized dot product, or should it remain a distinct primitive all the way through planning and execution?
- What should the first-class query surface look like for vector search?
- Should Vortex expose a dedicated nearest-neighbor operator, or should this remain expression + order-by + limit for now?
- What is the right boundary between exact compute and approximate compute in the public API?
- How do we meaningfully communicate “lossy” storage to users?
- How do we meaningfully communicate “approximate” compute to users?
- Should the compressor automatically choose a lossy encoding or approximate execution path, or must that always be opt-in (I feel like this should be opt-in)?
- How should approximate results interact with determinism, reproducibility, and explainability?
- Which index strategy should we build first?
- How should vector indexes compose with the existing layout / pruning / scan infrastructure?