[SPARK-55652][SQL] Optimize VectorizedPlainValuesReader.readShorts() with direct array access for heap buffers #54441
LuciferYang wants to merge 4 commits into apache:master

Conversation
The benchmark result of JDK 17

BTW, which CPU are you using for benchmarking?
Off-heap is not the optimization objective; the change only needs to ensure there is no significant performance regression compared to the previous state.
@pan3793 I hardcoded the test commands and updated the test results using a GitHub Actions hosted runner. It seems that on Java 21 the old code performs worse than on Java 17, while the new code gains some unintended additional benefit.
Merged into master. Thanks @HyukjinKwon @pan3793 |
Oh, nice improvement. Thank you, @LuciferYang !
### What changes were proposed in this pull request?
This PR optimizes `VectorizedPlainValuesReader.readShorts` by introducing a new batch write method, `putShortsFromIntsLittleEndian`, in `WritableColumnVector`, `OnHeapColumnVector`, and `OffHeapColumnVector`.

In Parquet, `SHORT` values are stored as 4-byte little-endian integers. The previous implementation read each value individually via `ByteBuffer.getInt()` and called `putShort()` per element, incurring a virtual method dispatch per value and preventing JIT vectorization.

The new approach:

- Adds `putShortsFromIntsLittleEndian(int rowId, int count, byte[] src, int srcIndex)` as an abstract method in `WritableColumnVector`, with implementations in both `OnHeapColumnVector` and `OffHeapColumnVector`.
- Uses `Platform.getInt` to read directly from the underlying `byte[]`, handles big-endian platforms by reversing bytes outside the loop, and writes directly to `shortData[]` (OnHeap) or to off-heap memory via `Platform.putShort` (OffHeap).
- Makes `readShorts` in `VectorizedPlainValuesReader` delegate to `putShortsFromIntsLittleEndian` when `buffer.hasArray()` is true, matching the pattern already established by `readIntegers`, `readLongs`, `readFloats`, and `readDoubles`.

### Why are the changes needed?
The previous implementation of `readShorts` did not take advantage of the `hasArray()` fast path that other fixed-width type readers (`readIntegers`, `readLongs`, etc.) already use. This caused unnecessary overhead from:

- a virtual method call to `putShort()` for every element;
- `ByteBuffer.getInt()` overhead, including internal bounds checking and byte-order branching on every call.

By pushing the batch operation into `WritableColumnVector` and operating directly on the underlying array, the JIT compiler can more effectively inline and vectorize the tight loop, eliminating these overheads for the common heap-buffer case.

### Does this PR introduce any user-facing change?
No.
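To illustrate the heap fast path described above, here is a minimal, self-contained sketch of the new decode loop. It uses plain shift arithmetic in place of Spark's internal `Platform.getInt`, and the class and parameter names are stand-ins, not the exact Spark implementation:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PutShortsSketch {
  // Decode `count` SHORT values that Parquet stores as 4-byte little-endian
  // ints, writing them into dst starting at rowId. Spark's version reads with
  // Platform.getInt and reverses bytes on big-endian hosts; plain shifts are
  // used here so the sketch is independent of host endianness.
  static void putShortsFromIntsLittleEndian(short[] dst, int rowId, int count,
                                            byte[] src, int srcIndex) {
    for (int i = 0; i < count; i++) {
      int off = srcIndex + 4 * i;
      int v = (src[off] & 0xFF)
            | (src[off + 1] & 0xFF) << 8
            | (src[off + 2] & 0xFF) << 16
            | (src[off + 3] & 0xFF) << 24;
      dst[rowId + i] = (short) v;  // the low 16 bits carry the SHORT value
    }
  }

  public static void main(String[] args) {
    // Build a heap buffer laid out the way Parquet stores three SHORT values.
    ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(1).putInt(-2).putInt(32767);
    short[] out = new short[3];
    putShortsFromIntsLittleEndian(out, 0, 3, buf.array(), 0);
    System.out.println(out[0] + "," + out[1] + "," + out[2]);  // 1,-2,32767
  }
}
```

Because the loop body is branch-free and writes straight into a `short[]`, the JIT can unroll and auto-vectorize it, which is the effect the PR relies on for the heap-buffer case.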
### How was this patch tested?
- Added a new test to `ColumnarBatchSuite` covering `WritableColumnVector#putShortsFromIntsLittleEndian`.
- Retained the old implementation as `OldVectorizedPlainValuesReader`, and compared the latency of the old and new `readShorts` methods using JMH:

Benchmark Code (click to expand)
Run `build/sbt "sql/Test/runMain org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReaderJMHBenchmark"` to conduct the test.

Benchmark results:
The test results show that the optimized OnHeap path achieves roughly a 50% or greater performance improvement, while the OffHeap path shows no significant regression.
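The PR's latency comparison used JMH; as a complementary illustration only (this is a hypothetical sketch, not code from the PR), the following cross-check verifies that the per-element `ByteBuffer.getInt()` path and the direct-array path decode identical values:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Random;

public class ReadShortsCrossCheck {
  // Returns true when the old per-element decode and the new direct-array
  // decode agree on n randomly generated SHORT values.
  static boolean pathsAgree(int n, long seed) {
    Random rnd = new Random(seed);
    ByteBuffer buf = ByteBuffer.allocate(4 * n).order(ByteOrder.LITTLE_ENDIAN);
    short[] expected = new short[n];
    for (int i = 0; i < n; i++) {
      expected[i] = (short) rnd.nextInt();
      buf.putInt(expected[i]);          // Parquet stores SHORT as a 4-byte int
    }

    // Old path: one bounds-checked ByteBuffer.getInt() call per element.
    buf.flip();
    short[] oldOut = new short[n];
    for (int i = 0; i < n; i++) oldOut[i] = (short) buf.getInt();

    // New path: read the backing byte[] directly; only the low two bytes of
    // each little-endian int carry the SHORT value.
    byte[] src = buf.array();
    short[] newOut = new short[n];
    for (int i = 0; i < n; i++) {
      int off = 4 * i;
      newOut[i] = (short) ((src[off] & 0xFF) | (src[off + 1] & 0xFF) << 8);
    }
    return Arrays.equals(expected, oldOut) && Arrays.equals(expected, newOut);
  }

  public static void main(String[] args) {
    System.out.println(pathsAgree(1024, 42L) ? "match" : "mismatch");
  }
}
```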
### Was this patch authored or co-authored using generative AI tooling?
The benchmark code used for performance testing was generated by GitHub Copilot.