Describe the enhancement requested
ColumnChunkPageWriter.writePage() and ColumnChunkPageWriter.writePageV2() call BytesInput.toByteArray() to feed compressed page data into CRC32.update(byte[]). When the writer uses a direct ByteBufferAllocator, this forces a full heap copy of every compressed page solely for checksumming.
java.util.zip.CRC32 has offered update(ByteBuffer) since Java 8 (the Checksum interface gained the same method in Java 9), and for a direct buffer it reads the buffer's memory without copying. Replacing crc.update(compressedBytes.toByteArray()) with crc.update(compressedBytes.toByteBuffer(releaser)) eliminates one heap byte[] allocation per page. The releaser is already a field on ColumnChunkPageWriter.
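A minimal standalone sketch of the two checksum paths, using only the JDK. The class and method names are hypothetical and for illustration only; it does not use Parquet's BytesInput or ColumnChunkPageWriter, and a plain direct ByteBuffer stands in for a compressed page.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration class, not part of Parquet.
public class CrcDirectBufferSketch {

    // Current pattern: copy the page into a heap byte[] just to checksum it.
    static long crcViaHeapCopy(ByteBuffer page) {
        byte[] copy = new byte[page.remaining()];
        page.duplicate().get(copy); // duplicate() keeps the caller's position intact
        CRC32 crc = new CRC32();
        crc.update(copy);
        return crc.getValue();
    }

    // Proposed pattern: hand the buffer to CRC32 directly, no heap copy.
    static long crcViaByteBuffer(ByteBuffer page) {
        CRC32 crc = new CRC32();
        // update(ByteBuffer) advances the buffer's position to its limit,
        // so checksum a duplicate if the buffer is still needed afterwards.
        crc.update(page.duplicate());
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "123456789".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer direct = ByteBuffer.allocateDirect(payload.length);
        direct.put(payload);
        direct.flip();

        // Both paths must agree; 0xCBF43926 is the standard CRC-32 check value.
        System.out.println(Long.toHexString(crcViaHeapCopy(direct)));   // cbf43926
        System.out.println(Long.toHexString(crcViaByteBuffer(direct))); // cbf43926
    }
}
```

Because update(ByteBuffer) consumes the buffer it is given, any real change would need to make sure the page buffer is not left at its limit before it is written out; the duplicate() call above is one way to do that, shown here as an assumption rather than as what the writer does today.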
In a local benchmark (~350M records, SNAPPY compression, DirectCodecFactory with off-heap ByteBufferAllocator), this change reduced sampled byte[] allocation by 23% (8,565 MB to 6,604 MB) and GC collections by 13% (103 to 90) compared to DirectCodecFactory alone.
Component(s)
Core