From d96ef543c495face6ca3592b4a8fed63dca90cdc Mon Sep 17 00:00:00 2001
From: Raghuveer Devulapalli <raghuveer.devulapalli@ibm.com>
Date: Fri, 19 Jun 2026 04:23:19 +0000
Subject: [PATCH 1/2] Add release notes for PR #681

---
 docs/release notes/4.1.0/681.performance.md | 28 +++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 docs/release notes/4.1.0/681.performance.md

diff --git a/docs/release notes/4.1.0/681.performance.md b/docs/release notes/4.1.0/681.performance.md
new file mode 100644
index 000000000..578ae76c6
--- /dev/null
+++ b/docs/release notes/4.1.0/681.performance.md	
@@ -0,0 +1,28 @@
+## Performance Improvements: Vector Serialization
+
+### Summary
+Optimized vector serialization in `MemorySegmentVectorProvider` by replacing
+per-element scalar writes with bulk operations, significantly reducing
+serialization overhead.
+
+### Key Changes
+- **`writeFloatVector`**
+  - Uses backing `float[]` with `writeFloats()` instead of per-element
+    `writeFloat()`
+
+- **`writeByteSequence`**
+  - Uses backing `byte[]` with bulk `write()` instead of per-element
+    `writeByte()`
+
+### Impact
+- Removes a major scalar-write bottleneck in the native SIMD backend
+- Improves index construction performance on large Intel hosts (e.g., AWS
+  `x8i.24xlarge`)
+
+### Index Build Improvements
+- **openai-1536-1m:** ~57% faster (105.59s → 45.22s)
+- **openai-3072-1m:** ~38% faster (164.63s → 102.75s)
+
+### Microbenchmark Summary
+- Bulk serialization improves throughput by **~1.3× to 1.9×** across tested
+  vector sizes

From 83492335d8e865e016dcba39932c8714dfdeeb74 Mon Sep 17 00:00:00 2001
From: Raghuveer Devulapalli <raghuveer.devulapalli@ibm.com>
Date: Thu, 25 Jun 2026 04:03:05 +0000
Subject: [PATCH 2/2] Format release notes based on guidelines

---
 docs/release notes/4.1.0/681.performance.md | 41 +++++++++------------
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/docs/release notes/4.1.0/681.performance.md b/docs/release notes/4.1.0/681.performance.md
index 578ae76c6..4e9b13d68 100644
--- a/docs/release notes/4.1.0/681.performance.md	
+++ b/docs/release notes/4.1.0/681.performance.md	
@@ -1,28 +1,23 @@
-## Performance Improvements: Vector Serialization
+### Faster Index Construction via Bulk Vector Serialization
 
-### Summary
+**Description**
 Optimized vector serialization in `MemorySegmentVectorProvider` by replacing
-per-element scalar writes with bulk operations, significantly reducing
-serialization overhead.
+per-element scalar writes with bulk operations. At high core counts, the
+per-element writes become a bottleneck: multiple threads contend on a single
+lock, which stalls index construction and negates the benefit of additional
+cores. Bulk writes eliminate that contention, so index construction scales
+again on machines with many cores.
 
-### Key Changes
-- **`writeFloatVector`**
-  - Uses backing `float[]` with `writeFloats()` instead of per-element
-    `writeFloat()`
+- **`writeFloatVector`**: writes the backing `float[]` with a bulk
+  `writeFloats()` call instead of per-element `writeFloat()` calls
+- **`writeByteSequence`**: writes the backing `byte[]` with a bulk `write()`
+  call instead of per-element `writeByte()` calls
 
-- **`writeByteSequence`**
-  - Uses backing `byte[]` with bulk `write()` instead of per-element
-    `writeByte()`
+**How to enable**
+No configuration required. The improvement is applied automatically when using
+the native SIMD backend (`jvector-native`).
 
-### Impact
-- Removes a major scalar-write bottleneck in the native SIMD backend
-- Improves index construction performance on large Intel hosts (e.g., AWS
-  `x8i.24xlarge`)
-
-### Index Build Improvements
-- **openai-1536-1m:** ~57% faster (105.59s → 45.22s)
-- **openai-3072-1m:** ~38% faster (164.63s → 102.75s)
-
-### Microbenchmark Summary
-- Bulk serialization improves throughput by **~1.3× to 1.9×** across tested
-  vector sizes
+**Performance (AWS `x8i.24xlarge`)**
+- Bulk serialization throughput (microbenchmark): **~1.3× to 1.9×** higher across tested vector sizes
+- **openai-1536-1m** index build: ~57% less time (105.59s → 45.22s, ~2.3× speedup)
+- **openai-3072-1m** index build: ~38% less time (164.63s → 102.75s, ~1.6× speedup)
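
The per-element versus bulk write distinction described in the release note can be illustrated with a minimal sketch. It uses only JDK classes (`DataOutputStream`, `ByteBuffer`); the class and method names are hypothetical and this is not the `MemorySegmentVectorProvider` code from the PR. It only shows that both paths produce the same bytes while the bulk path issues one copy and one write per vector instead of one call per element. It does not model the lock contention mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative only: hypothetical names, not jvector's actual implementation.
final class BulkWriteSketch {

    // Per-element path: one writeFloat() call per vector component.
    static byte[] writePerElement(float[] vector) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(vector.length * Float.BYTES);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (float v : vector) {
                out.writeFloat(v); // big-endian, 4 bytes per call
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Bulk path: copy the whole float[] with one put(), then one write().
    static byte[] writeBulk(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.BIG_ENDIAN);   // match DataOutputStream's byte order
        buffer.asFloatBuffer().put(vector);     // single bulk copy of all elements
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(buffer.capacity());
        bytes.write(buffer.array(), 0, buffer.capacity()); // single bulk write
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        float[] vector = {1.0f, -2.5f, 3.25f, 0.0f};
        byte[] perElement = writePerElement(vector);
        byte[] bulk = writeBulk(vector);
        // prints "same bytes: true"
        System.out.println("same bytes: " + Arrays.equals(perElement, bulk));
    }
}
```

The two paths agree byte for byte on ordinary float values; the difference is only in how many calls reach the output, which is where the release note locates the overhead.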