
TurboQuant encoding for Vectors #7269

Merged
connortsui20 merged 13 commits into develop from ct/turboquant
Apr 4, 2026

Conversation

@connortsui20 Contributor

@connortsui20 connortsui20 commented Apr 2, 2026

Continuation of #7167, authored by @lwwmanning

Summary

Lossy quantization for vector data (e.g., embeddings) based on TurboQuant (https://arxiv.org/abs/2504.19874). Implements the MSE-only variant (Stage 1 of RFC 0033) at 1-8 bits per coordinate (0 for empty arrays), defaulting to 8-bit near-lossless compression. A small amount of error remains even at 8 bits because we use an SRHT instead of a random orthogonal rotation matrix, which does not exactly satisfy the Haar assumption behind the theoretical bound.

Key components:

  • TurboQuant array encoding with 4 slots: quantized codes, norms, centroids, and rotation signs. Note that we should probably abstract the codes and centroids as a dictionary encoded thing so we don't have to duplicate pushdown rules, and we might want to make a matrix multiplication ScalarFn expression for the rotations.
  • Structured Random Hadamard Transform (SRHT) for O(d log d) rotation, fully self-contained with no external linear algebra library. This is what Claude came up with; testing shows that while it is practical and more efficient, we lose some of the guarantees that a Haar-random orthogonal matrix gives us. I think this is something we can play around with, since it's abstracted into a discrete step of the algorithm.
  • Max-Lloyd centroid computation on the Beta distribution for the given dimension.
  • Approximate cosine similarity and dot product computed directly on quantized arrays without full decompression.
  • Pluggable TurboQuantScheme for the cascading compressor.
  • Minimum dimension of 128 (TurboQuant::MIN_DIMENSION) for SRHT quality guarantees.
  • Default 8-bit encoding (MSE ~4e-5, exact 4x compression on f32).
  • Adds vortex_tensor::initialize() for session registration of tensor types, encodings, and scalar functions.
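
The SRHT rotation step can be sketched in a few lines. This is a minimal, illustrative single round (`srht_round` is a hypothetical name, not the vortex-tensor API; the real implementation runs three rounds with independent sign vectors):

```rust
/// One SRHT round: flip signs with a fixed ±1 vector, then apply the
/// in-place iterative Walsh-Hadamard butterfly and normalize by 1/sqrt(n)
/// so the transform is orthonormal (L2 norms are preserved).
fn srht_round(x: &mut [f32], signs: &[i8]) {
    let n = x.len();
    assert!(n.is_power_of_two() && n == signs.len());
    for (v, &s) in x.iter_mut().zip(signs) {
        *v *= s as f32; // the real code does this branchlessly via XOR on the sign bit
    }
    // Iterative Walsh-Hadamard transform: log2(n) butterfly stages.
    let mut h = 1;
    while h < n {
        for i in (0..n).step_by(h * 2) {
            for j in i..i + h {
                let (a, b) = (x[j], x[j + h]);
                x[j] = a + b;
                x[j + h] = a - b;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}

fn main() {
    let mut x = vec![1.0f32, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.5];
    let signs = [1i8, -1, 1, 1, -1, 1, -1, 1];
    let before: f32 = x.iter().map(|v| v * v).sum::<f32>().sqrt();
    srht_round(&mut x, &signs);
    let after: f32 = x.iter().map(|v| v * v).sum::<f32>().sqrt();
    // The two norms agree up to f32 rounding: the rotation is orthonormal.
    println!("norm before = {before:.4}, after = {after:.4}");
}
```

Since the Hadamard part is deterministic, storing the per-round sign vectors (the rotation_signs slot) is all that is needed to reproduce or invert the rotation.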

API Changes

  • Adds TurboQuant encoding in vortex-tensor with turboquant_encode() and TurboQuantConfig, and new types TurboQuantData and TurboQuantArray.
  • Adds TurboQuantScheme for compressor integration.
  • Adds TurboQuant::MIN_DIMENSION (128) constant.
  • Adds float_from_f32<T: Float + FromPrimitive> shared helper for infallible f32-to-float conversion.

Testing (Claude-generated)

  • Roundtrip tests across bit widths (1-8) and dimensions (128, 256, 768, 1024).
  • MSE quality bounds verified against the theoretical bound from Theorem 1.
  • Edge cases: empty arrays, single-row arrays, all-zero vectors, dimension rejection below 128.
  • Float type coverage: f16, f32, f64 input encoding and roundtrip.
  • Nullable vector support: validity propagation through encode, decode, slice, take, L2 norm readthrough.
  • Quantized-domain cosine similarity and dot product accuracy tests.
  • Serde roundtrip (serialize/deserialize).
  • Compute pushdown tests: slice, take, scalar_at.
  • Compression ratio estimates for typical embedding dimensions.
  • Centroid correctness: count, sorted, symmetric, within bounds, caching, boundary rejection.
  • SRHT rotation: determinism, roundtrip, norm preservation, sign export/import roundtrip.

@connortsui20 connortsui20 added the changelog/feature A new feature label Apr 2, 2026
@connortsui20 connortsui20 force-pushed the ct/turboquant branch 12 times, most recently from fb6bbcf to 3d7dfed Compare April 3, 2026 22:02
@codspeed-hq

codspeed-hq bot commented Apr 3, 2026

Merging this PR will not alter performance

✅ 1122 untouched benchmarks
⏩ 1530 skipped benchmarks¹


Comparing ct/turboquant (5285a13) with develop (e3c7401)


Footnotes

  1. 1530 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@connortsui20 connortsui20 marked this pull request as ready for review April 3, 2026 23:03
@connortsui20 Contributor Author

connortsui20 commented Apr 3, 2026

I would say this is ready for review now, but only with respect to the structure. I haven't yet gone through the implementation to make sure everything makes sense, but the structure is sound, and we correctly handle the different floating-point input types as well as null vectors.

I think it would be good to get a review now, and maybe we should just merge this and iterate later.

Contributor

@gatesn gatesn left a comment


Let's do it

@connortsui20 Contributor Author

I'll rebase and fix the errors tomorrow

Contributor

@lwwmanning lwwmanning left a comment


Detailed Review — TurboQuant Encoding (RFC 0033 Stage 1)

This PR implements the TurboQuant lossy vector quantization algorithm (arXiv:2504.19874) as a new encoding in the vortex-tensor crate. It is intended to be Stage 1 of RFC 0033. Overall this is well-engineered work — the algorithm is correctly implemented, the code follows Vortex conventions, tests are thorough, and the documentation is excellent. Requesting changes for a few RFC compliance issues.

RFC 0033 Stage 1 Compliance

| RFC Requirement | Status | Details |
| --- | --- | --- |
| QJL support removed | ✅ | No QJL code present |
| 4 slots (codes, norms, centroids, rotation_signs) | ✅ | Exactly 4 slots |
| Scheme default: 5-bit MSE-only | ❌ | Default is 4-bit (compress.rs:44) |
| Norms dtype: same-or-wider (f64→f64, f32/f16→f32) | ✅ | Correct |
| Scheme minimum: dimension ≥ 128 | ❌ | TurboQuantScheme::matches() accepts dimension ≥ 3 |
| Metadata: protobuf for forward compat | ⚠️ | Raw single byte, not protobuf; should be switched now to avoid a migration path later |

Strengths

  • Clean architecture. Module structure follows established Vortex patterns (vtable macro, data struct with try_new/new_unchecked, named slot enum, separate compress/decompress). Consistent with BitPacked, RLE, Sparse, etc.
  • Correct SRHT implementation. The 3-round structured random Hadamard transform is correctly implemented with XOR-based branchless sign application (auto-vectorizes to vpxor/veor), iterative Walsh-Hadamard butterfly, and proper normalization factor 1/(n·√n). Forward/inverse symmetry is verified by tests.
  • Thorough validation. TurboQuantData::validate() checks codes dtype, norms dtype matching, centroids power-of-2 constraint, rotation signs length, and degenerate/empty invariants. Debug assertions in new_unchecked add an extra safety net.
  • Smart compute pushdowns. Slice/take operate on per-row children (codes, norms) and clone shared children (centroids, rotation_signs). Quantized cosine similarity and dot product avoid full decompression. L2 norm readthrough is O(1) from stored norms.
  • Excellent test coverage (911 lines): roundtrip, MSE quality bounds, edge cases, nullable vectors, serde roundtrip, compute pushdowns, L2 norm readthrough.
  • Good documentation. Module docs include theoretical MSE bounds, compression ratio tables, and a working example. Per-function docs explain algorithmic context.
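
To illustrate the quantized-domain compute pushdown, here is a minimal sketch of cosine similarity computed directly on codes (`quantized_cosine` is a hypothetical helper, not the actual kernel; it assumes both rows were encoded with the same rotation and centroid table, so angles are preserved):

```rust
/// Approximate cosine similarity straight from quantized codes: map each
/// code through the shared centroid table and accumulate dot product and
/// norms, with no full decompression of the original vectors.
fn quantized_cosine(codes_a: &[u8], codes_b: &[u8], centroids: &[f32]) -> f32 {
    assert_eq!(codes_a.len(), codes_b.len());
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (&ca, &cb) in codes_a.iter().zip(codes_b) {
        let (a, b) = (centroids[ca as usize], centroids[cb as usize]);
        dot += a * b;
        na += a * a;
        nb += b * b;
    }
    dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    // Toy 2-bit table (4 centroids); the real table comes from Max-Lloyd
    // on the Beta marginal and is a shared child of the array.
    let centroids = [-0.5f32, -0.1, 0.1, 0.5];
    let a = [0u8, 3, 2, 1];
    let b = [3u8, 0, 1, 2]; // code-wise negation of `a` under this symmetric table
    println!("cos(a, a) = {:.3}", quantized_cosine(&a, &a, &centroids)); // 1.000
    println!("cos(a, b) = {:.3}", quantized_cosine(&a, &b, &centroids)); // -1.000
}
```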

Key Issues (see inline comments)

  1. Default bit_width should be 5, not 4 — RFC specifies "5-bit MSE-only (32 centroids)"
  2. Scheme minimum should be dimension ≥ 128 — RFC specifies auto-selection only for d ≥ 128
  3. Metadata serialization should use protobuf — raw byte can't be extended backward-compatibly; FixedShapeTensor in the same crate uses prost
  4. Unresolved TODO in f32_to_t — should be resolved or documented before merge
  5. Cosine similarity doc claims "same rotation" — but doesn't validate this; should clarify assumptions

Algorithmic Note

The theoretical MSE bound (Theorem 1 in the paper) is proved for Haar-distributed random orthogonal matrices, not SORF/SRHT. The SRHT is a practical approximation. The RFC explicitly acknowledges this. The tests empirically validate the bound holds with SRHT, which is good — but worth noting in the module docs.
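
For intuition on the centroid computation, here is a generic sample-based Max-Lloyd (Lloyd-Max) sketch that alternates nearest-centroid assignment with cell-mean updates. `lloyd_max` is a hypothetical name; the actual code designs the quantizer for the Beta((d-1)/2, (d-1)/2) marginal directly rather than from samples:

```rust
/// Sample-based Lloyd-Max: repeatedly assign each sample to its nearest
/// centroid, then move each centroid to the mean of its cell. Converges to
/// a locally MSE-optimal scalar quantizer for the sample distribution.
fn lloyd_max(samples: &[f64], k: usize, iters: usize) -> Vec<f64> {
    // Initialize centroids evenly over the sample range.
    let (min, max) = samples
        .iter()
        .fold((f64::MAX, f64::MIN), |(lo, hi), &s| (lo.min(s), hi.max(s)));
    let mut centroids: Vec<f64> = (0..k)
        .map(|i| min + (i as f64 + 0.5) * (max - min) / k as f64)
        .collect();
    for _ in 0..iters {
        let mut sums = vec![0.0; k];
        let mut counts = vec![0usize; k];
        for &s in samples {
            // Linear scan for the nearest centroid (fine for small k).
            let mut best = 0;
            for j in 1..k {
                if (centroids[j] - s).abs() < (centroids[best] - s).abs() {
                    best = j;
                }
            }
            sums[best] += s;
            counts[best] += 1;
        }
        for j in 0..k {
            if counts[j] > 0 {
                centroids[j] = sums[j] / counts[j] as f64;
            }
        }
    }
    centroids
}

fn main() {
    // Deterministic stand-in samples on [-1, 1); the real input would be
    // draws from (or the density of) the Beta marginal.
    let samples: Vec<f64> = (0..1000).map(|i| i as f64 / 500.0 - 1.0).collect();
    println!("{:?}", lloyd_max(&samples, 4, 20));
}
```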

Minor Items

  • new_unchecked is pub — other encodings use pub(crate)
  • No f16 input roundtrip test
  • No quantized dot product test (only cosine similarity)
  • No TurboQuantScheme::compress() integration test
  • Global centroid cache is unbounded (fine in practice, worth documenting)

Generated by Claude Code

Contributor

@lwwmanning lwwmanning left a comment


Follow-up: after further consideration, the default bit_width should be 8 (near-lossless) rather than the RFC's 5. At 8 bits the normalized MSE is ~4e-5 — effectively transparent — while still achieving 3-4x compression on f32 data. This is a safer default for a general-purpose encoding; users who want more aggressive compression can explicitly configure lower bit widths.


Generated by Claude Code

Contributor

@lwwmanning lwwmanning left a comment


Claude diffed against RFC 33 and had a few minor comments that look valid/easy to fix before merging

@connortsui20 Contributor Author

So based on my reading of the review comments, the things we actually want to change are:

  • Default bitwidth changed to 8 (makes sense)
  • A few more tests for f16, f64, very large dimensions, and test helper functions themselves
  • Some docs need to be updated
  • Dimension must be >= 128 (is there theoretical backing for this number?)
  • Fix the f32 cast semantics (we have some safety guarantees that need to be documented)

Everything else is either wrong or something we can think about later.

lwwmanning and others added 6 commits April 4, 2026 13:03
Lossy quantization for vector data (e.g., embeddings) based on TurboQuant
(https://arxiv.org/abs/2504.19874). Supports both MSE-optimal and
inner-product-optimal (Prod with QJL correction) variants at 1-8 bits per
coordinate.

Key components:
- Single TurboQuant array encoding with optional QJL correction fields,
  storing quantized codes, norms, centroids, and rotation signs as children.
- Structured Random Hadamard Transform (SRHT) for O(d log d) rotation,
  fully self-contained with no external linear algebra library.
- Max-Lloyd centroid computation on Beta(d/2, d/2) distribution.
- Approximate cosine similarity and dot product compute directly on
  quantized arrays without full decompression.
- Pluggable TurboQuantScheme for BtrBlocks, exposed via
  WriteStrategyBuilder::with_vector_quantization().
- Benchmarks covering common embedding dimensions (128, 768, 1024, 1536).

Also refactors CompressingStrategy to a single constructor, and adds
vortex_tensor::initialize() for session registration of tensor types,
encodings, and scalar functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Will Manning <will@willmanning.io>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
We are going to implement this later as a separate encoding (if we
decide to implement it at all because word on the street is that the
MSE + QJL is not actually better than MSE on its own).

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
It doesn't really make a lot of sense for us to define this as an
encoding for `FixedSizeList`.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
- Use ExecutionCtx in TurboQuant compress path and import ExecutionCtx
- Extend dtype imports with Nullability and PType to support extension
  types
- Wire in extension utilities: extension_element_ptype and
  extension_list_size for vector extensions
- Remove dimension and bit_width from slice/take compute calls to rely
  on metadata
- Update TurboQuant mod docs to mention VortexSessionExecute
- Change scheme.compress to use the provided compressor argument (not
  _compressor)
- Add an extensive TurboQuant test suite (roundtrip, MSE bounds, edge
  cases, f64 input, serde roundtrip, and dtype checks)
- Align vtable imports to new metadata handling (remove unused
  DeserializeMetadata/SerializeMetadata references)

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 Contributor Author

connortsui20 commented Apr 4, 2026

Comments addressed.

There is a small issue related to #7269 (comment)

This is what my claude instance told me after seeing pretty high errors:

Details

The theoretical guarantee. Theorem 1's MSE bound ((sqrt(3) * pi / 2) / 4^b) is proved for Haar-distributed random orthogonal matrices — matrices drawn uniformly from the orthogonal group O(d). The proof depends on two properties that Haar matrices give you exactly and SRHT gives you approximately:

  1. Uniform distribution on the sphere. For a Haar matrix Π and any unit vector x, Πx is exactly uniformly distributed on S^{d-1}. This means each coordinate of the rotated vector follows exactly the Beta((d-1)/2, (d-1)/2) distribution on [-1, 1]. Our Max-Lloyd centroids are computed for this distribution, so they're exactly optimal.

With SRHT, the rotated coordinates are approximately Beta-distributed. The centroids are slightly suboptimal for the actual marginals.

  2. Near-independence of coordinates. The proof relies on coordinates of Πx being "nearly independent" in high dimensions (a deep result from high-dimensional probability, reference [55] in the paper — Vershynin). This ensures quantization errors across coordinates don't conspire to accumulate worse than expected.

With SRHT, coordinates are uncorrelated but have different higher-order dependence structure. The 3-round structure (H·D₃·H·D₂·H·D₁) gives good mixing but it's not equivalent to full randomness.

What we gain in exchange: O(d log d) computation and O(d) storage instead of O(d²) for both. The SRHT butterfly + XOR sign application auto-vectorizes into SIMD. For d=1024, that's a ~1000x speedup over a full matrix multiply.

In practice the gap is small. Our tests show:

  • At d=128, 4-bit: MSE is well within the 1x theoretical bound
  • At d=1024, 4-bit: MSE exceeds the bound by ~20% (0.0127 vs 0.0106)
  • Mixing quality improves with dimension (more butterfly stages: 3×log₂(d))

The RFC's fallback plan if SRHT ever proves insufficient is to use a full B×B random orthogonal matrix per block at Stage 2 block sizes (e.g., B=256 → 256KB storage per block).
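
For reference, the Theorem 1 bound quoted above is straightforward to evaluate; at 4 bits it gives the ~0.0106 figure used in the d=1024 comparison, and at 8 bits the ~4e-5 figure behind the near-lossless default:

```rust
/// Theorem 1's normalized MSE bound for Haar rotations: (sqrt(3) * pi / 2) / 4^b.
fn mse_bound(bits: u32) -> f64 {
    (3.0f64.sqrt() * std::f64::consts::PI / 2.0) / 4.0f64.powi(bits as i32)
}

fn main() {
    println!("4-bit bound: {:.4}", mse_bound(4)); // ≈ 0.0106
    println!("8-bit bound: {:.1e}", mse_bound(8)); // ≈ 4.2e-5
}
```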

We likely want to implement both (doesn't seem to be too hard) and then compare them later.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 requested a review from lwwmanning April 4, 2026 17:09
@connortsui20 connortsui20 enabled auto-merge (squash) April 4, 2026 17:24
@connortsui20 connortsui20 merged commit b2a5a70 into develop Apr 4, 2026
108 of 109 checks passed
@connortsui20 connortsui20 deleted the ct/turboquant branch April 4, 2026 17:25
