
Native INT8 (byte vector) HNSW build + search API #665

@lvca

Description

Hi! Filing this from the ArcadeDB project, which uses JVector as the HNSW backend for its LSM_VECTOR index.

Background

We're adding pre-quantized int8 ingest to ArcadeDB (ArcadeData/arcadedb#4132) so callers using providers that emit int8 directly (Cohere embed-english-v3.0, OpenAI text-embedding-3-large reduced precision, Sentence Transformers with int8 quantization) can skip a precision-losing client-side int8 → float32 round-trip.

We dug into JVector 4.0.0-rc.8 to wire the path through and found that the HNSW graph API operates on VectorFloat<?> end-to-end:

  • RandomAccessVectorValues.getVector(int ordinal) returns VectorFloat<?>.
  • GraphIndexBuilder constructors take RandomAccessVectorValues (float-only).
  • VectorSimilarityFunction's abstract method signature is compare(VectorFloat<?>, VectorFloat<?>).

So a caller with int8 input must dequantize to float32 for graph build and for every query, even when the application semantics are int8 throughout. ByteSequence<?> exists in the type system but is used only for PQ/BQ codes (a sidecar to the float-vector graph), not as a primary HNSW vector type.
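For concreteness, this is roughly the shim we have to interpose today: a RandomAccessVectorValues that dequantizes on every getVector() call. It's a sketch against our reading of rc.8; the class name, byte[][] layout, and single symmetric scale factor are our assumptions, not JVector API.

```java
import io.github.jbellis.jvector.graph.RandomAccessVectorValues;
import io.github.jbellis.jvector.vector.VectorizationProvider;
import io.github.jbellis.jvector.vector.types.VectorFloat;
import io.github.jbellis.jvector.vector.types.VectorTypeSupport;

/** Sketch: wraps int8 source vectors, dequantizing to float32 on every read. */
public class DequantizingVectorValues implements RandomAccessVectorValues {
    private static final VectorTypeSupport VTS =
            VectorizationProvider.getInstance().getVectorTypeSupport();

    private final byte[][] int8Vectors; // one int8 vector per ordinal (our layout)
    private final int dimension;
    private final float scale;          // per-index calibration constant (assumed symmetric)

    public DequantizingVectorValues(byte[][] int8Vectors, int dimension, float scale) {
        this.int8Vectors = int8Vectors;
        this.dimension = dimension;
        this.scale = scale;
    }

    @Override
    public int size() { return int8Vectors.length; }

    @Override
    public int dimension() { return dimension; }

    @Override
    public VectorFloat<?> getVector(int ordinal) {
        // The transient float32 allocation this issue is about:
        VectorFloat<?> v = VTS.createFloatVector(dimension);
        byte[] src = int8Vectors[ordinal];
        for (int i = 0; i < dimension; i++) {
            v.set(i, src[i] * scale);
        }
        return v;
    }

    @Override
    public boolean isValueShared() { return false; }

    @Override
    public RandomAccessVectorValues copy() { return this; }
}
```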

Ask

A native byte (int8) vector path (a strawman sketch follows the list):

  • A RandomAccessByteVectorValues (or a generalised RandomAccessVectorValues<T>).
  • VectorSimilarityFunction overloads / variants for byte[] (cosine + dot product on byte vectors with per-block min/max calibration; euclidean on bytes is also straightforward).
  • GraphIndexBuilder constructor(s) that accept the byte-vector RAVV + byte similarity function.
  • Search-side equivalent in GraphSearcher.
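To make the ask precise, the shapes could look something like the following. Every name here is hypothetical, mirroring the existing float API; it's a strawman, not a design proposal.

```java
import io.github.jbellis.jvector.vector.types.ByteSequence;

// Strawman only - all names below are hypothetical.
interface RandomAccessByteVectorValues {
    int size();
    int dimension();
    ByteSequence<?> getVector(int ordinal); // reuses the existing ByteSequence type
}

interface ByteVectorSimilarityFunction {
    float compare(ByteSequence<?> a, ByteSequence<?> b);
}

// GraphIndexBuilder would gain a mirror constructor, e.g.:
//   new GraphIndexBuilder(byteRavv, byteSimilarity, M, beamWidth, neighborOverflow, alpha)
// and GraphSearcher a search(...) overload taking a ByteSequence<?> query.
```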

Why it matters

Modern embedding providers default to int8/binary outputs at scale: Cohere binary embeddings are 1/32 the size of float32, and Cohere int8 is 1/4. Forcing dequantize-on-build/search means:

  • Build cost: O(N * dim * 4) bytes of transient float32 even though the source is bytes.
  • Search cost: every query vector is dequantized before comparison, so JVector's SIMD intrinsics for the byte-similarity case never get exercised (scalar reference sketched after this list).
  • Storage cost: applications keep bytes in their primary store but JVector wants floats, so RAM and on-disk size grow 4× beyond what the application needs (e.g., 10M 1024-dim vectors are ~10 GB as int8 but ~40 GB as float32).
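For reference, the scalar form of the int8 dot-product kernel that SIMD would specialize is tiny. This is a sketch: the accumulator choice is ours, and mapping the raw dot product onto a bounded similarity score is a separate policy decision.

```java
/** Scalar sketch of an int8 dot-product similarity kernel. */
static float dotProductInt8(byte[] a, byte[] b) {
    int acc = 0; // |a[i] * b[i]| <= 128 * 128 = 16384, so an int accumulator is safe up to ~131k dims
    for (int i = 0; i < a.length; i++) {
        acc += a[i] * b[i];
    }
    // Normalizing to a bounded score (e.g., by the dimension-dependent maximum,
    // as Lucene does for its byte dot product) is left as a policy choice.
    return acc;
}
```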

Lucene 9.x added VectorEncoding.BYTE for similar reasons; we'd love the same in JVector to close the precision/size loop.
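For comparison, the Lucene byte path looks roughly like this (class names as of Lucene 9.6; the field name and k are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnByteVectorField;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.search.KnnByteVectorQuery;

class LuceneByteVectorExample {
    // Indexing side: the byte vector is stored natively, no float32 transient.
    static Document indexDoc(byte[] int8Vector) {
        Document doc = new Document();
        doc.add(new KnnByteVectorField("embedding", int8Vector,
                VectorSimilarityFunction.DOT_PRODUCT));
        return doc;
    }

    // Query side: the query vector is also bytes, end-to-end.
    static KnnByteVectorQuery query(byte[] int8Query) {
        return new KnnByteVectorQuery("embedding", int8Query, 10);
    }
}
```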

ArcadeDB context

For reference, our comparison matrix against Qdrant and Milvus 2.5 (docs/arcadedb-vs-leading-vector-dbms.md) flags pre-quantized ingest as a P2 gap. We can ship an MVP that dequantizes int8 → float32 server-side (covered in ArcadeData/arcadedb#4132), which closes the API ergonomics gap, but the full "end-to-end int8 with no float32 transient" win requires JVector-side support.

Happy to contribute if there's a design direction the maintainers are considering. Otherwise, this is a tracking request.

Thanks!
