Skip to content

[VL]:shuffle: extend TypeAwareCompress to INT128 (DECIMAL HUGEINT)#12299

Open
hhr293 wants to merge 3 commits into
apache:mainfrom
hhr293:hhr/tac-int128
Open

[VL]:shuffle: extend TypeAwareCompress to INT128 (DECIMAL HUGEINT)#12299
hhr293 wants to merge 3 commits into
apache:mainfrom
hhr293:hhr/tac-int128

Conversation

@hhr293

@hhr293 hhr293 commented Jun 15, 2026

Copy link
Copy Markdown

What changes are proposed in this pull request?

[VL] Extend TypeAwareCompress (TAC) to INT128 (DECIMAL HUGEINT).

The TAC codec from #11894 only covered INT64. DECIMAL(p>18) is stored as
128-bit HugeInt and previously fell through to LZ4, missing a structural
compression opportunity: in OLAP workloads (TPC-H prices, taxes, etc.)
DECIMAL values usually fit in INT64, so the high 64 bits are 0 and the
low 64 bits are narrow — exactly the pattern FFOR exploits.

Implementation: Split each 16B value into lo/hi uint64 sub-streams via
stride-2 gather into stack-allocated scratch buffers, then run the existing
64-bit FFOR encoder on each. Wire format reuses the 64-bit per-block
(bw, count, base) header twice — the stream is self-describing, no hi/lo
length prefix needed. hi sub-streams that are all equal to base degenerate
to just the 16B header (bw=0). Velox HUGEINT is mapped to tac::kUInt128;
shuffle writer and frame format unchanged.

How was this patch tested?

  • New unit tests in FForCodecTest.cc: 128-bit compress/decompress round-trips,
    alignment combinations, corrupted-input rejection, and tail handling.
  • Existing 64-bit tests continue to pass.
  • End-to-end validation on TPC-H SF=6000 (no regression on other queries).

Performance

Results vs LZ4 on the same int128 input:

  • Compression ratio: 0.116 (TAC) vs 0.193 (LZ4) — 40% smaller

End-to-end on TPC-H SF=6000, the wins concentrate on queries with heavy
decimal-keyed shuffles:

  • q15: shuffle size −15%, latency −8%
  • q17: shuffle size −8%, latency −3%
  • q18: shuffle size −8%, latency −3%

Was this patch authored or co-authored using generative AI tooling?

Reviewed-by: Claude claude-opus-4-7

Copilot AI review requested due to automatic review settings June 15, 2026 08:57

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR extends the type-aware compression (TAC) pipeline to support Velox HUGEINT / DECIMAL128 by introducing a 128-bit FFOR-based codec (implemented as two 64-bit sub-streams).

Changes:

  • Add TAC type mapping and codec support for 128-bit values (kUInt128 / Velox HUGEINT).
  • Refactor 64-bit FFOR block encode/decode into shared helpers and implement a new 128-bit codec on top.
  • Add unit tests covering 128-bit round-trips, tails, misalignment, and truncated-input handling.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
cpp/velox/shuffle/VeloxTypeAwareCompress.h Map Velox HUGEINT to TAC kUInt128.
cpp/core/utils/tac/ffor.hpp Add shared encode/decode helpers and implement 128-bit FFOR codec.
cpp/core/utils/tac/TypeAwareCompressCodec.h Add kUInt128 enum and document 128-bit support.
cpp/core/utils/tac/TypeAwareCompressCodec.cc Extend TAC codec dispatch for kUInt128 in compress/decompress and maxCompressedLen.
cpp/core/utils/tac/FForCodec.h Add 128-bit compress/decompress API surface.
cpp/core/utils/tac/FForCodec.cc Implement 128-bit compress/decompress wrappers and bounds.
cpp/core/tests/FForCodecTest.cc Add tests for TAC kUInt128 behavior and edge cases.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cpp/core/utils/tac/ffor.hpp
auto* in = input;
auto codecId = static_cast<CodecId>(*in++);
[[maybe_unused]] auto tacType = *in++;
auto tacType = (*in++);
Comment thread cpp/core/utils/tac/ffor.hpp
Comment on lines +79 to +81
if (outputSize < maxLen) {
return arrow::Status::Invalid("FForCodec: output buffer too small for 128-bit compression.");
}
@github-actions github-actions Bot added the VELOX label Jun 15, 2026
@hhr293 hhr293 requested a review from Copilot June 15, 2026 09:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines +70 to +88
FForCodec::compress128(const uint8_t* input, int64_t inputSize, uint8_t* output, int64_t outputSize) {
if (inputSize == 0) {
return 0;
}
if (inputSize % sizeof(__int128_t) != 0) {
return arrow::Status::Invalid("FForCodec: input size ", inputSize, " is not a multiple of ", sizeof(__int128_t), ".");
}

size_t numValues = inputSize / sizeof(__int128_t);
auto maxLen = static_cast<int64_t>(ffor::compress128Bound(numValues));
if (outputSize < maxLen) {
return arrow::Status::Invalid(
"FForCodec: output buffer too small for 128-bit compression (need ",
maxLen, " bytes, got ", outputSize, ").");
}

auto written = ffor::compress128(input, numValues, output);
return static_cast<int64_t>(written);
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inputSize == 0 early return is correct. When inputSize is 0,
TypeAwareCompressCodec::compress() never calls FForCodec::compress128() in the
first place — the caller checks inputLen before invoking the codec. This guard
is purely defensive (same pattern as the 64-bit compress()).

Even if it were reached, returning 0 bytes written from the codec layer signals
"nothing compressed" to the caller, which is the correct semantic for empty
input. No malformed stream is produced because no stream is produced at all.

Comment on lines +91 to +101
FForCodec::decompress128(const uint8_t* input, int64_t inputSize, uint8_t* output, int64_t outputSize) {
if (outputSize == 0) {
return 0;
}
if (outputSize % sizeof(__int128_t) != 0) {
return arrow::Status::Invalid("FForCodec: output size ", outputSize, " is not a multiple of ", sizeof(__int128_t), ".");
}

auto nDecoded = ffor::decompress128(input, inputSize, output, static_cast<size_t>(outputSize));
return static_cast<int64_t>(nDecoded);
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When outputSize == 0, the caller is saying "I expect zero values." Returning 0
decoded values is the only correct answer regardless of input content.
Validating the input stream format in this case would be wasted work — even if
the stream is corrupt, producing 0 values into a 0-sized buffer is harmless and
matches caller expectations.

The caller already knows the expected output size from shuffle metadata. If the
stream is actually corrupt, the mismatch will be caught at a higher level when
the actual data (non-zero) fails to decode correctly in subsequent calls.

Comment thread cpp/core/utils/tac/TypeAwareCompressCodec.cc
Comment thread cpp/core/utils/tac/ffor.hpp
@hhr293

hhr293 commented Jun 16, 2026

Copy link
Copy Markdown
Author

@marin-ma @FelixYBW Could you please help review this patch? This is a follow-up evolution to PR . While the previous PR successfully introduced TypeAwareCompress (TAC) support for int64, this patch extends the implementation to fully support int128.
Please let me know if there are any concerns or suggestions. Thanks!

ARROW_ASSIGN_OR_RAISE(auto nDecoded, FForCodec::decompress(in, dataLen, output, outputLen));
(void)nDecoded;
return inputLen;
if (codecId != CodecId::kFFor) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why narrowing this condition? Is kFFor the only tac type that will be supported?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reverted to a switch on codecId so the dispatch surface stays explicit and adding a future codec is a one-case addition rather than reshaping the control flow.

hhr293 and others added 3 commits June 17, 2026 10:37
The TAC codec from apache#11894 only covered INT64.  DECIMAL(p>18) is stored as
128-bit HugeInt and previously fell through to LZ4, missing a structural
compression opportunity: in OLAP workloads (TPC-H prices, taxes, ...)
DECIMAL values usually fit in INT64, so the high 64 bits are 0 and the
low 64 bits are narrow -- exactly the pattern FFOR exploits.

Implementation: split each 16B value into lo/hi uint64 sub-streams via
stride-2 gather into stack-allocated scratch buffers, then run the
existing 64-bit FFOR encoder on each.  Wire format reuses the 64-bit
per-block (bw, count, base) header twice -- the stream is self-
describing, no hi/lo length prefix needed.  hi sub-streams that are all
equal to base degenerate to just the 16B header (bw=0).  Velox HUGEINT
is mapped to tac::kUInt128; shuffle writer and frame format unchanged.

Results vs LZ4 on the same int128 input:
  compression ratio  0.116 (TAC)  vs  0.193 (LZ4)  -- 40% smaller

End-to-end on TPC-H SF=6000, the wins concentrate on the queries with
heavy decimal-keyed shuffles:
  q15: shuffle size -15%, latency -8%
  q17: shuffle size -8%,  latency -3%
  q18: shuffle size -8%,  latency -3%
Generated-by: Claude claude-opus-4-7

Co-authored-by: Guo Wangyang <wangyang.guo@intel.com>
Co-authored-by: Lipeng Zhu <lipeng.zhu@intel.com>
Signed-off-by: huhengrui <hengrui.hu@intel.com>
Copilot AI review requested due to automatic review settings June 18, 2026 01:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment on lines 27 to 35
int64_t TypeAwareCompressCodec::maxCompressedLen(int64_t inputLen, int8_t tacType) {
if (!support(tacType)) {
return 0;
switch (tacType) {
case tac::kUInt64:
return kPayloadHeaderSize + FForCodec::maxCompressedLength(inputLen);
case tac::kUInt128:
return kPayloadHeaderSize + FForCodec::maxCompressedLength128(inputLen);
default:
return 0;
}

@hhr293 hhr293 Jun 18, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Non-zero bound for empty input is standard. maxCompressedLen is a safe upper bound, not a prediction of what compress() writes. The early-return in compress() is an implementation detail; coupling the sizing function to it just makes the contract more fragile. And kPayloadHeaderSize bytes for the empty case is not meaningful over-allocation.
  2. Length validation belongs in compress(), not the sizing function. If non-8-byte-multiple inputs are rejected by compress(), the bound is never consumed and the "inconsistency" has no effect. If they aren't rejected, the bug is in FForCodec::maxCompressedLength() itself — fix it there, not by
    adding a guard one layer up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants