diff --git a/docs/release notes/4.1.0/681.performance.md b/docs/release notes/4.1.0/681.performance.md
new file mode 100644
index 000000000..4e9b13d68
--- /dev/null
+++ b/docs/release notes/4.1.0/681.performance.md	
@@ -0,0 +1,23 @@
+### Faster Index Construction via Bulk Vector Serialization
+
+**Description**
+Optimized vector serialization in `MemorySegmentVectorProvider` by replacing
+per-element scalar writes with bulk operations. At high core counts, the
+per-element writes become a bottleneck: multiple threads contend on a single
+lock, stalling index construction and negating the benefit of additional
+cores. Bulk writes eliminate that contention, so index construction scales
+with core count again.
+
+- **`writeFloatVector`** — uses backing `float[]` with `writeFloats()` instead
+  of per-element `writeFloat()`
+- **`writeByteSequence`** — uses backing `byte[]` with bulk `write()` instead
+  of per-element `writeByte()`
+
+**How to enable**
+No configuration required. The improvement is applied automatically when using
+the native SIMD backend (`jvector-native`).
+
+**Performance (AWS `x8i.24xlarge`)**
+- Bulk serialization throughput: **~1.3× to 1.9×** improvement across tested vector sizes
+- **openai-1536-1m** index build: ~57% less build time (105.59s → 45.22s, ~2.3× speedup)
+- **openai-3072-1m** index build: ~38% less build time (164.63s → 102.75s, ~1.6× speedup)