
TurboQuant encoding for Vectors #7269

Merged
connortsui20 merged 13 commits into develop from ct/turboquant
Apr 4, 2026

Conversation

@connortsui20 Contributor

@connortsui20 connortsui20 commented Apr 2, 2026

Continuation of #7167, authored by @lwwmanning

Summary

Lossy quantization for vector data (e.g., embeddings) based on TurboQuant (https://arxiv.org/abs/2504.19874). Implements the MSE-only variant (Stage 1 of RFC 0033) at 1-8 bits per coordinate (0 for empty arrays), defaulting to 8-bit near-lossless compression. A small amount of error remains even at 8 bits because we use an SRHT instead of a random orthogonal rotation matrix, which does not exactly satisfy the Haar assumption behind the theoretical bound.

Key components:

  • TurboQuant array encoding with 4 slots: quantized codes, norms, centroids, and rotation signs. Note that we should probably abstract the codes and centroids as a dictionary encoded thing so we don't have to duplicate pushdown rules, and we might want to make a matrix multiplication ScalarFn expression for the rotations.
  • Structured Random Hadamard Transform (SRHT) for O(d log d) rotation, fully self-contained with no external linear algebra library. This is what Claude came up with; testing shows that while it is practical and more efficient, we lose some of the guarantees that a Haar-random orthogonal matrix gives us. I think this is something we can play around with, since it's abstracted into a discrete step of the algorithm.
  • Max-Lloyd centroid computation on the Beta distribution for the given dimension.
  • Approximate cosine similarity and dot product computed directly on quantized arrays without full decompression.
  • Pluggable TurboQuantScheme for the cascading compressor.
  • Minimum dimension of 128 (TurboQuant::MIN_DIMENSION) for SRHT quality guarantees.
  • Default 8-bit encoding (MSE ~4e-5, exact 4x compression on f32).
  • Adds vortex_tensor::initialize() for session registration of tensor types, encodings, and scalar functions.
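
The SRHT rotation step can be sketched in a few lines. This is a minimal, illustrative single round (`srht_round` is a hypothetical name, not the vortex-tensor API; the real implementation runs three rounds with independent sign vectors):

```rust
/// One SRHT round: flip signs with a fixed ±1 vector, then apply the
/// in-place iterative Walsh-Hadamard butterfly and normalize by 1/sqrt(n)
/// so the transform is orthonormal (L2 norms are preserved).
fn srht_round(x: &mut [f32], signs: &[i8]) {
    let n = x.len();
    assert!(n.is_power_of_two() && n == signs.len());
    for (v, &s) in x.iter_mut().zip(signs) {
        *v *= s as f32; // the real code does this branchlessly via XOR on the sign bit
    }
    // Iterative Walsh-Hadamard transform: log2(n) butterfly stages.
    let mut h = 1;
    while h < n {
        for i in (0..n).step_by(h * 2) {
            for j in i..i + h {
                let (a, b) = (x[j], x[j + h]);
                x[j] = a + b;
                x[j + h] = a - b;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}

fn main() {
    let mut x = vec![1.0f32, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.5];
    let signs = [1i8, -1, 1, 1, -1, 1, -1, 1];
    let before: f32 = x.iter().map(|v| v * v).sum::<f32>().sqrt();
    srht_round(&mut x, &signs);
    let after: f32 = x.iter().map(|v| v * v).sum::<f32>().sqrt();
    // The two norms agree up to f32 rounding: the rotation is orthonormal.
    println!("norm before = {before:.4}, after = {after:.4}");
}
```

Since the Hadamard part is deterministic, storing the per-round sign vectors (the rotation_signs slot) is all that is needed to reproduce or invert the rotation.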

API Changes

  • Adds TurboQuant encoding in vortex-tensor with turboquant_encode() and TurboQuantConfig, and new types TurboQuantData and TurboQuantArray.
  • Adds TurboQuantScheme for compressor integration.
  • Adds TurboQuant::MIN_DIMENSION (128) constant.
  • Adds float_from_f32<T: Float + FromPrimitive> shared helper for infallible f32-to-float conversion.

Testing (Claude-generated)

  • Roundtrip tests across bit widths (1-8) and dimensions (128, 256, 768, 1024).
  • MSE quality bounds verified against the theoretical bound from Theorem 1.
  • Edge cases: empty arrays, single-row arrays, all-zero vectors, dimension rejection below 128.
  • Float type coverage: f16, f32, f64 input encoding and roundtrip.
  • Nullable vector support: validity propagation through encode, decode, slice, take, L2 norm readthrough.
  • Quantized-domain cosine similarity and dot product accuracy tests.
  • Serde roundtrip (serialize/deserialize).
  • Compute pushdown tests: slice, take, scalar_at.
  • Compression ratio estimates for typical embedding dimensions.
  • Centroid correctness: count, sorted, symmetric, within bounds, caching, boundary rejection.
  • SRHT rotation: determinism, roundtrip, norm preservation, sign export/import roundtrip.

@connortsui20 connortsui20 added the changelog/feature A new feature label Apr 2, 2026
@connortsui20 connortsui20 force-pushed the ct/turboquant branch 12 times, most recently from fb6bbcf to 3d7dfed Compare April 3, 2026 22:02
@codspeed-hq

codspeed-hq bot commented Apr 3, 2026

Merging this PR will not alter performance

✅ 1122 untouched benchmarks
⏩ 1530 skipped benchmarks¹


Comparing ct/turboquant (5285a13) with develop (e3c7401)


Footnotes

  1. 1530 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@connortsui20 connortsui20 marked this pull request as ready for review April 3, 2026 23:03
@connortsui20 Contributor Author

connortsui20 commented Apr 3, 2026

I would say this is ready for review now, but only with respect to the structure. I haven't yet gone through the implementation to make sure everything makes sense, but the structure is sound, and we correctly handle the different floating-point input types as well as null vectors.

I think it would be good to get a review now, and maybe we should just merge this and iterate later.

Contributor

@gatesn gatesn left a comment


Let's do it

@connortsui20 Contributor Author

I'll rebase and fix the errors tomorrow

Contributor

@lwwmanning lwwmanning left a comment


Detailed Review — TurboQuant Encoding (RFC 0033 Stage 1)

This PR implements the TurboQuant lossy vector quantization algorithm (arXiv:2504.19874) as a new encoding in the vortex-tensor crate. It is intended to be Stage 1 of RFC 0033. Overall this is well-engineered work — the algorithm is correctly implemented, the code follows Vortex conventions, tests are thorough, and the documentation is excellent. Requesting changes for a few RFC compliance issues.

RFC 0033 Stage 1 Compliance

| RFC Requirement | Status | Details |
| --- | --- | --- |
| QJL support removed | ✅ | No QJL code present |
| 4 slots (codes, norms, centroids, rotation_signs) | ✅ | Exactly 4 slots |
| Scheme default: 5-bit MSE-only | ❌ | Default is 4-bit (compress.rs:44) |
| Norms dtype: same-or-wider (f64→f64, f32/f16→f32) | ✅ | Correct |
| Scheme minimum: dimension ≥ 128 | ❌ | TurboQuantScheme::matches() accepts dimension ≥ 3 |
| Metadata: protobuf for forward compat | ⚠️ | Raw single byte, not protobuf; should be switched now to avoid a migration path later |

Strengths

  • Clean architecture. Module structure follows established Vortex patterns (vtable macro, data struct with try_new/new_unchecked, named slot enum, separate compress/decompress). Consistent with BitPacked, RLE, Sparse, etc.
  • Correct SRHT implementation. The 3-round structured random Hadamard transform is correctly implemented with XOR-based branchless sign application (auto-vectorizes to vpxor/veor), iterative Walsh-Hadamard butterfly, and proper normalization factor 1/(n·√n). Forward/inverse symmetry is verified by tests.
  • Thorough validation. TurboQuantData::validate() checks codes dtype, norms dtype matching, centroids power-of-2 constraint, rotation signs length, and degenerate/empty invariants. Debug assertions in new_unchecked add an extra safety net.
  • Smart compute pushdowns. Slice/take operate on per-row children (codes, norms) and clone shared children (centroids, rotation_signs). Quantized cosine similarity and dot product avoid full decompression. L2 norm readthrough is O(1) from stored norms.
  • Excellent test coverage (911 lines): roundtrip, MSE quality bounds, edge cases, nullable vectors, serde roundtrip, compute pushdowns, L2 norm readthrough.
  • Good documentation. Module docs include theoretical MSE bounds, compression ratio tables, and a working example. Per-function docs explain algorithmic context.
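
To illustrate the quantized-domain compute pushdown, here is a minimal sketch of cosine similarity computed directly on codes (`quantized_cosine` is a hypothetical helper, not the actual kernel; it assumes both rows were encoded with the same rotation and centroid table, so angles are preserved):

```rust
/// Approximate cosine similarity straight from quantized codes: map each
/// code through the shared centroid table and accumulate dot product and
/// norms, with no full decompression of the original vectors.
fn quantized_cosine(codes_a: &[u8], codes_b: &[u8], centroids: &[f32]) -> f32 {
    assert_eq!(codes_a.len(), codes_b.len());
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (&ca, &cb) in codes_a.iter().zip(codes_b) {
        let (a, b) = (centroids[ca as usize], centroids[cb as usize]);
        dot += a * b;
        na += a * a;
        nb += b * b;
    }
    dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    // Toy 2-bit table (4 centroids); the real table comes from Max-Lloyd
    // on the Beta marginal and is a shared child of the array.
    let centroids = [-0.5f32, -0.1, 0.1, 0.5];
    let a = [0u8, 3, 2, 1];
    let b = [3u8, 0, 1, 2]; // code-wise negation of `a` under this symmetric table
    println!("cos(a, a) = {:.3}", quantized_cosine(&a, &a, &centroids)); // 1.000
    println!("cos(a, b) = {:.3}", quantized_cosine(&a, &b, &centroids)); // -1.000
}
```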

Key Issues (see inline comments)

  1. Default bit_width should be 5, not 4 — RFC specifies "5-bit MSE-only (32 centroids)"
  2. Scheme minimum should be dimension ≥ 128 — RFC specifies auto-selection only for d ≥ 128
  3. Metadata serialization should use protobuf — raw byte can't be extended backward-compatibly; FixedShapeTensor in the same crate uses prost
  4. Unresolved TODO in f32_to_t — should be resolved or documented before merge
  5. Cosine similarity doc claims "same rotation" — but doesn't validate this; should clarify assumptions

Algorithmic Note

The theoretical MSE bound (Theorem 1 in the paper) is proved for Haar-distributed random orthogonal matrices, not SORF/SRHT. The SRHT is a practical approximation. The RFC explicitly acknowledges this. The tests empirically validate the bound holds with SRHT, which is good — but worth noting in the module docs.
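
For intuition on the centroid computation, here is a generic sample-based Max-Lloyd (Lloyd-Max) sketch that alternates nearest-centroid assignment with cell-mean updates. `lloyd_max` is a hypothetical name; the actual code designs the quantizer for the Beta((d-1)/2, (d-1)/2) marginal directly rather than from samples:

```rust
/// Sample-based Lloyd-Max: repeatedly assign each sample to its nearest
/// centroid, then move each centroid to the mean of its cell. Converges to
/// a locally MSE-optimal scalar quantizer for the sample distribution.
fn lloyd_max(samples: &[f64], k: usize, iters: usize) -> Vec<f64> {
    // Initialize centroids evenly over the sample range.
    let (min, max) = samples
        .iter()
        .fold((f64::MAX, f64::MIN), |(lo, hi), &s| (lo.min(s), hi.max(s)));
    let mut centroids: Vec<f64> = (0..k)
        .map(|i| min + (i as f64 + 0.5) * (max - min) / k as f64)
        .collect();
    for _ in 0..iters {
        let mut sums = vec![0.0; k];
        let mut counts = vec![0usize; k];
        for &s in samples {
            // Linear scan for the nearest centroid (fine for small k).
            let mut best = 0;
            for j in 1..k {
                if (centroids[j] - s).abs() < (centroids[best] - s).abs() {
                    best = j;
                }
            }
            sums[best] += s;
            counts[best] += 1;
        }
        for j in 0..k {
            if counts[j] > 0 {
                centroids[j] = sums[j] / counts[j] as f64;
            }
        }
    }
    centroids
}

fn main() {
    // Deterministic stand-in samples on [-1, 1); the real input would be
    // draws from (or the density of) the Beta marginal.
    let samples: Vec<f64> = (0..1000).map(|i| i as f64 / 500.0 - 1.0).collect();
    println!("{:?}", lloyd_max(&samples, 4, 20));
}
```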

Minor Items

  • new_unchecked is pub — other encodings use pub(crate)
  • No f16 input roundtrip test
  • No quantized dot product test (only cosine similarity)
  • No TurboQuantScheme::compress() integration test
  • Global centroid cache is unbounded (fine in practice, worth documenting)

Generated by Claude Code

Contributor

@lwwmanning lwwmanning left a comment


Follow-up: after further consideration, the default bit_width should be 8 (near-lossless) rather than the RFC's 5. At 8 bits the normalized MSE is ~4e-5 — effectively transparent — while still achieving 3-4x compression on f32 data. This is a safer default for a general-purpose encoding; users who want more aggressive compression can explicitly configure lower bit widths.


Generated by Claude Code

Contributor

@lwwmanning lwwmanning left a comment


Claude diffed against RFC 33 and had a few minor comments that look valid/easy to fix before merging

@connortsui20 Contributor Author

So based on my reading of the review comments, the things we actually want to change are:

  • Default bitwidth changed to 8 (makes sense)
  • A few more tests for f16, f64, very large dimensions, and test helper functions themselves
  • Some docs need to be updated
  • Dimension must be >= 128 (is there theoretical backing for this number?)
  • Fix the f32 cast semantics (we have some safety guarantees that need to be documented)

Everything else is either wrong or something we can think about later.

lwwmanning and others added 6 commits April 4, 2026 13:03
Lossy quantization for vector data (e.g., embeddings) based on TurboQuant
(https://arxiv.org/abs/2504.19874). Supports both MSE-optimal and
inner-product-optimal (Prod with QJL correction) variants at 1-8 bits per
coordinate.

Key components:
- Single TurboQuant array encoding with optional QJL correction fields,
  storing quantized codes, norms, centroids, and rotation signs as children.
- Structured Random Hadamard Transform (SRHT) for O(d log d) rotation,
  fully self-contained with no external linear algebra library.
- Max-Lloyd centroid computation on Beta(d/2, d/2) distribution.
- Approximate cosine similarity and dot product compute directly on
  quantized arrays without full decompression.
- Pluggable TurboQuantScheme for BtrBlocks, exposed via
  WriteStrategyBuilder::with_vector_quantization().
- Benchmarks covering common embedding dimensions (128, 768, 1024, 1536).

Also refactors CompressingStrategy to a single constructor, and adds
vortex_tensor::initialize() for session registration of tensor types,
encodings, and scalar functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Will Manning <will@willmanning.io>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
We are going to implement this later as a separate encoding (if we
decide to implement it at all because word on the street is that the
MSE + QJL is not actually better than MSE on its own).

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
It doesn't really make a lot of sense for us to define this as an
encoding for `FixedSizeList`.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
- Use ExecutionCtx in TurboQuant compress path and import ExecutionCtx
- Extend dtype imports with Nullability and PType to support extension
  types
- Wire in extension utilities: extension_element_ptype and
  extension_list_size for vector extensions
- Remove dimension and bit_width from slice/take compute calls to rely
  on metadata
- Update TurboQuant mod docs to mention VortexSessionExecute
- Change scheme.compress to use the provided compressor argument (not
  _compressor)
- Add an extensive TurboQuant test suite (roundtrip, MSE bounds, edge
  cases, f64 input, serde roundtrip, and dtype checks)
- Align vtable imports to new metadata handling (remove unused
  DeserializeMetadata/SerializeMetadata references)

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 Contributor Author

connortsui20 commented Apr 4, 2026

Comments addressed.

There is a small issue related to #7269 (comment)

This is what my claude instance told me after seeing pretty high errors:

Details

The theoretical guarantee. Theorem 1's MSE bound ((sqrt(3) * pi / 2) / 4^b) is proved for Haar-distributed random orthogonal matrices — matrices drawn uniformly from the orthogonal group O(d). The proof depends on two properties that Haar matrices give you exactly and SRHT gives you approximately:

  1. Uniform distribution on the sphere. For a Haar matrix Π and any unit vector x, Πx is exactly uniformly distributed on S^{d-1}. This means each coordinate of the rotated vector follows exactly the Beta((d-1)/2, (d-1)/2) distribution on [-1, 1]. Our Max-Lloyd centroids are computed for this distribution, so they're exactly optimal.

With SRHT, the rotated coordinates are approximately Beta-distributed. The centroids are slightly suboptimal for the actual marginals.

  2. Near-independence of coordinates. The proof relies on coordinates of Πx being "nearly independent" in high dimensions (a deep result from high-dimensional probability, reference [55] in the paper — Vershynin). This ensures quantization errors across coordinates don't conspire to accumulate worse than expected.

With SRHT, coordinates are uncorrelated but have different higher-order dependence structure. The 3-round structure (H·D₃·H·D₂·H·D₁) gives good mixing but it's not equivalent to full randomness.

What we gain in exchange: O(d log d) computation and O(d) storage instead of O(d²) for both. The SRHT butterfly + XOR sign application auto-vectorizes into SIMD. For d=1024, that's a ~1000x speedup over a full matrix multiply.

In practice the gap is small. Our tests show:

  • At d=128, 4-bit: MSE is well within the 1x theoretical bound
  • At d=1024, 4-bit: MSE exceeds the bound by ~20% (0.0127 vs 0.0106)
  • Mixing quality improves with dimension (more butterfly stages: 3×log₂(d))

The RFC's fallback plan if SRHT ever proves insufficient is to use a full B×B random orthogonal matrix per block at Stage 2 block sizes (e.g., B=256 → 256KB storage per block).
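
For reference, the Theorem 1 bound quoted above is straightforward to evaluate; at 4 bits it gives the ~0.0106 figure used in the d=1024 comparison, and at 8 bits the ~4e-5 figure behind the near-lossless default:

```rust
/// Theorem 1's normalized MSE bound for Haar rotations: (sqrt(3) * pi / 2) / 4^b.
fn mse_bound(bits: u32) -> f64 {
    (3.0f64.sqrt() * std::f64::consts::PI / 2.0) / 4.0f64.powi(bits as i32)
}

fn main() {
    println!("4-bit bound: {:.4}", mse_bound(4)); // ≈ 0.0106
    println!("8-bit bound: {:.1e}", mse_bound(8)); // ≈ 4.2e-5
}
```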

We likely want to implement both (doesn't seem to be too hard) and then compare them later.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 requested a review from lwwmanning April 4, 2026 17:09
@connortsui20 connortsui20 enabled auto-merge (squash) April 4, 2026 17:24
@connortsui20 connortsui20 merged commit b2a5a70 into develop Apr 4, 2026
108 of 109 checks passed
@connortsui20 connortsui20 deleted the ct/turboquant branch April 4, 2026 17:25
