From d96ef543c495face6ca3592b4a8fed63dca90cdc Mon Sep 17 00:00:00 2001
From: Raghuveer Devulapalli
Date: Fri, 19 Jun 2026 04:23:19 +0000
Subject: [PATCH 1/2] Add release notes for PR #681

---
 docs/release notes/4.1.0/681.performance.md | 28 +++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 docs/release notes/4.1.0/681.performance.md

diff --git a/docs/release notes/4.1.0/681.performance.md b/docs/release notes/4.1.0/681.performance.md
new file mode 100644
index 000000000..578ae76c6
--- /dev/null
+++ b/docs/release notes/4.1.0/681.performance.md
@@ -0,0 +1,28 @@
+## Performance Improvements: Vector Serialization
+
+### Summary
+Optimized vector serialization in `MemorySegmentVectorProvider` by replacing
+per-element scalar writes with bulk operations, significantly reducing
+serialization overhead.
+
+### Key Changes
+- **`writeFloatVector`**
+  - Uses backing `float[]` with `writeFloats()` instead of per-element
+    `writeFloat()`
+
+- **`writeByteSequence`**
+  - Uses backing `byte[]` with bulk `write()` instead of per-element
+    `writeByte()`
+
+### Impact
+- Removes a major scalar-write bottleneck in the native SIMD backend
+- Improves index construction performance on large Intel hosts (e.g., AWS
+  `x8i.24xlarge`)
+
+### Index Build Improvements
+- **openai-1536-1m:** ~57% faster (105.59s → 45.22s)
+- **openai-3072-1m:** ~38% faster (164.63s → 102.75s)
+
+### Microbenchmark Summary
+- Bulk serialization improves throughput by **~1.3× to 1.9×** across tested
+  vector sizes

From 83492335d8e865e016dcba39932c8714dfdeeb74 Mon Sep 17 00:00:00 2001
From: Raghuveer Devulapalli
Date: Thu, 25 Jun 2026 04:03:05 +0000
Subject: [PATCH 2/2] Format release notes based on guidelines

---
 docs/release notes/4.1.0/681.performance.md | 41 +++++++++------------
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/docs/release notes/4.1.0/681.performance.md b/docs/release notes/4.1.0/681.performance.md
index 578ae76c6..4e9b13d68 100644
--- a/docs/release notes/4.1.0/681.performance.md
+++ b/docs/release notes/4.1.0/681.performance.md
@@ -1,28 +1,23 @@
-## Performance Improvements: Vector Serialization
+### Faster Index Construction via Bulk Vector Serialization
 
-### Summary
+**Description**
 Optimized vector serialization in `MemorySegmentVectorProvider` by replacing
-per-element scalar writes with bulk operations, significantly reducing
-serialization overhead.
+per-element scalar writes with bulk operations. At high core counts,
+per-element scalar writes become a serialization bottleneck: multiple threads
+contend on a single lock, stalling index construction and negating the benefit
+of additional cores. Bulk writes eliminate that contention, restoring scaling
+at large core counts.
 
-### Key Changes
-- **`writeFloatVector`**
-  - Uses backing `float[]` with `writeFloats()` instead of per-element
-    `writeFloat()`
+- **`writeFloatVector`** — uses backing `float[]` with `writeFloats()` instead
+  of per-element `writeFloat()`
+- **`writeByteSequence`** — uses backing `byte[]` with bulk `write()` instead
+  of per-element `writeByte()`
 
-- **`writeByteSequence`**
-  - Uses backing `byte[]` with bulk `write()` instead of per-element
-    `writeByte()`
+**How to enable**
+No configuration required. The improvement is applied automatically when using
+the native SIMD backend (`jvector-native`).
 
-### Impact
-- Removes a major scalar-write bottleneck in the native SIMD backend
-- Improves index construction performance on large Intel hosts (e.g., AWS
-  `x8i.24xlarge`)
-
-### Index Build Improvements
-- **openai-1536-1m:** ~57% faster (105.59s → 45.22s)
-- **openai-3072-1m:** ~38% faster (164.63s → 102.75s)
-
-### Microbenchmark Summary
-- Bulk serialization improves throughput by **~1.3× to 1.9×** across tested
-  vector sizes
+**Performance (AWS `x8i.24xlarge`)**
+- Bulk serialization throughput: **~1.3× to 1.9×** improvement across tested vector sizes
+- **openai-1536-1m** index build: ~57% faster (105.59s → 45.22s)
+- **openai-3072-1m** index build: ~38% faster (164.63s → 102.75s)
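The `writeFloatVector` and `writeByteSequence` bullets contrast one write call per element (`writeFloat()`, `writeByte()`) with handing the whole backing array to a single bulk call. The following is a minimal sketch of that difference, assuming plain `java.io.DataOutputStream` as a stand-in writer. It is not jvector's `MemorySegmentVectorProvider` code, it does not use jvector's `writeFloats()` (whose signature is not shown in the patch), and the names `BulkWriteSketch`, `floatsPerElement`, `floatsBulk`, `bytesPerElement` and `bytesBulk` are hypothetical. It also does not model the lock contention the note describes. It only shows that the bulk path issues one write per vector while emitting the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration: DataOutputStream stands in for jvector's writer.
// This is NOT MemorySegmentVectorProvider's implementation.
public class BulkWriteSketch {

    // A write action that may throw IOException, so it can be passed as a lambda.
    @FunctionalInterface
    interface IoWrite {
        void run(DataOutputStream out) throws IOException;
    }

    // Runs a write action against an in-memory stream and returns the bytes produced.
    static byte[] capture(IoWrite action) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            action.run(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Per-element float path: one writeFloat() call per vector component.
    static byte[] floatsPerElement(float[] v) {
        return capture(out -> {
            for (float f : v) {
                out.writeFloat(f);
            }
        });
    }

    // Bulk float path: copy the backing array into a heap ByteBuffer, then make one write() call.
    // ByteBuffer defaults to big-endian, matching DataOutputStream.writeFloat(), so the bytes
    // are identical for non-NaN values (NaN bit patterns may be canonicalised differently).
    static byte[] floatsBulk(float[] v) {
        ByteBuffer buf = ByteBuffer.allocate(v.length * Float.BYTES);
        buf.asFloatBuffer().put(v);
        return capture(out -> out.write(buf.array(), 0, buf.capacity()));
    }

    // Per-element byte path: one writeByte() call per element.
    static byte[] bytesPerElement(byte[] v) {
        return capture(out -> {
            for (byte b : v) {
                out.writeByte(b);
            }
        });
    }

    // Bulk byte path: pass the backing array to a single write() call.
    static byte[] bytesBulk(byte[] v) {
        return capture(out -> out.write(v, 0, v.length));
    }

    public static void main(String[] args) {
        float[] fv = {1.0f, -2.5f, 3.25f, 0.0f};
        byte[] bv = {1, -2, 3, 127};
        System.out.println(Arrays.equals(floatsPerElement(fv), floatsBulk(fv))); // true
        System.out.println(Arrays.equals(bytesPerElement(bv), bytesBulk(bv)));   // true
    }
}
```

In this stand-in, both paths produce byte-identical output, which is the property a bulk rewrite needs to preserve so the serialized format does not change. Whether jvector's own writer uses the same byte order is an assumption here, not something the release note states.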