Hi! Filing this from the ArcadeDB project, which uses JVector as the HNSW backend for its `LSM_VECTOR` index.
## Background
We're adding pre-quantized int8 ingest to ArcadeDB (ArcadeData/arcadedb#4132) so callers using providers that emit int8 directly (Cohere `embed-english-v3.0`, OpenAI `text-embedding-3-large` reduced precision, Sentence Transformers with int8 quantization) can skip a precision-losing client-side `int8 → float32` round-trip.
We dug into JVector 4.0.0-rc.8 to wire the path through and found the HNSW graph API operates on `VectorFloat<?>` end-to-end:

- `RandomAccessVectorValues.getVector(int ordinal)` returns `VectorFloat<?>`.
- `GraphIndexBuilder` constructors take `RandomAccessVectorValues` (float-only).
- `VectorSimilarityFunction.compare(VectorFloat<?>, VectorFloat<?>)` is the abstract method signature.

So a caller with int8 input must dequantize to float32 for graph build and for every query, even when the application's semantics are int8 throughout. `ByteSequence<?>` exists in the type system, but it is used only for PQ/BQ codes (a sidecar against the float-vector graph), not as a primary HNSW vector type.
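To make the forced round-trip concrete, here is a minimal sketch of the affine dequantization a caller has to run today before handing vectors to the float-only API. The `scale`/`offset` names are assumptions standing in for whatever calibration the embedding provider emits; this is illustrative, not JVector API:

```java
// Illustrative only: the int8 -> float32 round-trip a caller is forced into
// today before the float-only graph API can consume the vector.
static float[] dequantize(byte[] int8, float scale, float offset) {
    float[] out = new float[int8.length];
    for (int i = 0; i < int8.length; i++) {
        // Per-vector affine dequantization; the provider defines scale/offset.
        out[i] = int8[i] * scale + offset;
    }
    return out;
}
```

This transient `float[]` is exactly the allocation a native byte path would eliminate, on every insert and every query.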
## Ask
A native byte (int8) vector path:

- A `RandomAccessByteVectorValues` (or a generalized `RandomAccessVectorValues<T>`).
- `VectorSimilarityFunction` overloads / variants for `byte[]` (cosine and dot product on byte vectors with per-block min/max calibration; Euclidean on bytes is also straightforward).
- `GraphIndexBuilder` constructor(s) that accept the byte-vector RAVV plus the byte similarity function.
- A search-side equivalent in `GraphSearcher`.
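For the similarity variants, a purely illustrative scalar sketch of what a `byte[]` dot product could look like; an int accumulator avoids overflow for realistic dimensions (|a[i] * b[i]| ≤ 16384, so dimensions into the tens of thousands are safe). None of this is existing JVector API:

```java
// Hypothetical shape of a byte-vector similarity kernel, not JVector API.
// Accumulating in int keeps the sum exact for realistic dimensions; a SIMD
// implementation would follow the same contract.
static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
        throw new IllegalArgumentException("dimension mismatch");
    }
    int acc = 0;
    for (int i = 0; i < a.length; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Cosine on bytes reduces to the same kernel plus two norms, which is where the per-block min/max calibration mentioned above would plug in.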
## Why it matters
Modern embedding providers default to int8/binary outputs at scale: Cohere binary embeddings are 1/32× the size of float32, and Cohere int8 is 1/4×. Forcing dequantize-on-build/search means:

- Build cost: O(N * dim * 4) bytes of transient float32 even though the source is bytes.
- Search cost: every query vector dequantizes before comparison, so JVector's SIMD intrinsics for the byte-similarity case never get exercised.
- Storage cost: applications keep bytes in their primary store, but JVector wants floats, so RAM and on-disk size grow 4× beyond what the application needs.
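To put numbers on the build and storage bullets, a back-of-envelope sketch (the corpus size and dimension below are illustrative, not measured from any deployment):

```java
// Illustrative arithmetic behind the O(N * dim * 4) and 4x claims.
static long transientFloatBytes(long n, long dim) {
    return n * dim * 4;  // float32 copy the float-only API requires
}

static long nativeInt8Bytes(long n, long dim) {
    return n * dim;      // what the application actually stores
}
```

For example, 1,000,000 vectors at dim 1024 is roughly 4.1 GB of transient float32 against roughly 1.0 GB of int8: the 4× gap described above, paid again at build time even though the bytes already exist.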
Lucene 9.x added `VectorEncoding.BYTE` for similar reasons; we'd love the same in JVector to close the precision/size loop.
## ArcadeDB context
For reference, our comparison matrix against Qdrant / Milvus 2.5 (`docs/arcadedb-vs-leading-vector-dbms.md`) flags pre-quantized ingest as a P2 gap. We can ship an MVP that dequantizes `int8 → float32` server-side (covered in ArcadeData/arcadedb#4132), and that closes the API ergonomics gap, but the full "end-to-end int8 with no float32 transient" win requires the JVector-side support.
Happy to contribute if there's a design direction the maintainers are considering. Otherwise, this is a tracking request.
Thanks!