arrow-select: optimise coalesced takes for primitive and view arrays #9758
ClSlaid wants to merge 3 commits into apache:main from
## What

- add a direct `push_batch_with_indices` path in `BatchCoalescer` for primitive, `Utf8View`, and `BinaryView` columns when the index array is integer typed and non-null
- teach the coalescer internals to copy indexed primitive and view values directly into the in-progress output buffers instead of materialising an intermediate taken `RecordBatch`
- add dedicated `take:` coalesce benchmarks and new indexed coalescing tests for primitive, `Utf8View`, and `BinaryView` inputs

## How

- route supported batches through a direct indices path that splits the input indices at coalesced output batch boundaries and reuses the existing in-progress array builders
- specialise `InProgressPrimitiveArray::copy_indices` to gather values and build the taken null mask directly from the source array
- specialise `InProgressByteViewArray::copy_indices` to gather the selected views and nulls directly, base buffer compaction on the selected views, and compute whole-array buffer usage lazily, only when the row-copy path needs it
- keep unsupported types on the existing `take_record_batch` fallback so the optimisation applies only where the benchmark data shows it is profitable

## Why It Works

- the previous `push_batch_with_indices` implementation always allocated and populated a temporary taken `RecordBatch` before coalescing
- for primitive and view arrays, the coalescer can write the selected rows straight into its output builders, avoiding that extra batch materialisation and the extra copy it implies
- the view-array path remains safe because it preserves the existing reuse-vs-compact behaviour; it only bases sparse compaction decisions on the views actually selected rather than on the whole source batch

## Tests And Validation

- added indexed coalescing tests for mixed primitive, mixed `Utf8View`, mixed `BinaryView`, and `Utf8` fallback behaviour in `arrow-select/src/coalesce.rs`
- added `take:` coalesce benchmarks in `arrow/benches/coalesce_kernels.rs` covering primitive, `Utf8View`, `BinaryView`, `Utf8`, and dictionary-backed schemas
- validated with:
  - `cargo test -p arrow-select coalesce --lib`
  - `cargo clippy -p arrow-select --lib --tests -- -D warnings`
  - `cargo clippy -p arrow --bench coalesce_kernels --features test_utils -- -D warnings`

## Benchmark Summary

Times are shown as before -> after.

- `take: primitive, 8192, nulls: 0, selectivity: 0.01`: `3.5194-3.5796 ms` -> `1.8780-1.9136 ms`
- `take: primitive, 8192, nulls: 0.1, selectivity: 0.01`: `5.5208-5.5708 ms` -> `4.0016-4.1647 ms`
- `take: primitive, 8192, nulls: 0, selectivity: 0.001`: `23.684-23.813 ms` -> `5.9713-6.0137 ms`
- `take: single_utf8view, 8192, nulls: 0, selectivity: 0.01`: `3.0301-3.0830 ms` -> `2.4513-2.4854 ms`
- `take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.8643-1.8823 ms` -> `1.2706-1.2856 ms`
- `take: single_binaryview, 8192, nulls: 0, selectivity: 0.01`: `3.1346-3.2991 ms` -> `2.7578-2.8539 ms`
- `take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.9634-2.0215 ms` -> `1.4117-1.4383 ms`

Signed-off-by: cl <cailue@apache.org>
Signed-off-by: 蔡略 <cailue@apache.org>
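The steps under "How" can be illustrated outside arrow-rs. This is a minimal sketch under stated assumptions, not the PR's code: `InProgressPrimitive`, `push_with_indices`, `flush`, `make_view`, and `take_views` are hypothetical names; columns are plain `Vec`s with a `Vec<bool>` validity mask instead of Arrow buffers; indices are assumed to be non-null, in-bounds `u32` values; and the rule "compact only when the source buffers exceed twice the bytes the selected views reference" is an assumed heuristic, not necessarily the threshold `InProgressByteViewArray` uses. The view encoding (length in bits 0..32, buffer index in bits 64..96, offset in bits 96..128, values of at most 12 bytes stored inline) follows the Arrow view layout.

```rust
// ---- Primitive columns: gather straight into the in-progress buffers ----

/// Hypothetical in-progress primitive column (not the arrow-rs type):
/// gathered values plus a validity mask, completed every `batch_size` rows.
struct InProgressPrimitive {
    batch_size: usize,
    values: Vec<i64>,
    validity: Vec<bool>,
    completed: Vec<(Vec<i64>, Vec<bool>)>,
}

impl InProgressPrimitive {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        Self { batch_size, values: Vec::new(), validity: Vec::new(), completed: Vec::new() }
    }

    /// Copy `src[i]` for each `i` in `indices`, splitting the indices at
    /// output batch boundaries so no temporary taken batch is built.
    fn push_with_indices(&mut self, src: &[i64], src_valid: Option<&[bool]>, indices: &[u32]) {
        let mut remaining = indices;
        while !remaining.is_empty() {
            // Rows left before the current output batch is full (never 0 here).
            let space = self.batch_size - self.values.len();
            let (chunk, rest) = remaining.split_at(space.min(remaining.len()));
            for &idx in chunk {
                let i = idx as usize;
                self.values.push(src[i]);
                // The taken null mask comes directly from the source mask.
                self.validity.push(src_valid.map_or(true, |v| v[i]));
            }
            if self.values.len() == self.batch_size {
                self.flush();
            }
            remaining = rest;
        }
    }

    /// Move the rows currently in progress (if any) into `completed`.
    fn flush(&mut self) {
        if !self.values.is_empty() {
            let values = std::mem::take(&mut self.values);
            let validity = std::mem::take(&mut self.validity);
            self.completed.push((values, validity));
        }
    }
}

// ---- View columns: decide reuse vs compact from the selected views ----

const MAX_INLINE_LEN: u32 = 12; // views of <= 12 bytes keep their data inline

/// Build a non-inline view (prefix bits left zero for brevity).
fn make_view(len: u32, buffer_index: u32, offset: u32) -> u128 {
    (len as u128) | ((buffer_index as u128) << 64) | ((offset as u128) << 96)
}

/// Gather the views at `indices`. Returns (views, buffers, compacted).
fn take_views(views: &[u128], buffers: &[Vec<u8>], indices: &[u32]) -> (Vec<u128>, Vec<Vec<u8>>, bool) {
    let selected: Vec<u128> = indices.iter().map(|&i| views[i as usize]).collect();
    // Bytes the *selected* views need from the data buffers.
    let ideal: usize = selected
        .iter()
        .map(|&v| v as u32)
        .filter(|&len| len > MAX_INLINE_LEN)
        .map(|len| len as usize)
        .sum();
    let actual: usize = buffers.iter().map(Vec::len).sum();
    // Assumed heuristic: only compact when the buffers are over 2x the need.
    if actual <= ideal * 2 {
        // Reuse: keep views unchanged (real code would share, not clone, buffers).
        return (selected, buffers.to_vec(), false);
    }
    let mut out = Vec::with_capacity(ideal);
    let compacted: Vec<u128> = selected
        .iter()
        .map(|&v| {
            let len = v as u32;
            if len <= MAX_INLINE_LEN {
                return v; // inline data needs no buffer
            }
            let (buf, off) = ((v >> 64) as u32 as usize, (v >> 96) as u32 as usize);
            let new_off = out.len() as u32;
            out.extend_from_slice(&buffers[buf][off..off + len as usize]);
            // Keep length + prefix; point at buffer 0 at the new offset.
            (v & u64::MAX as u128) | ((new_off as u128) << 96)
        })
        .collect();
    (compacted, vec![out], true)
}
```

For example, with `batch_size = 3`, taking indices `[4, 1, 0, 2, 2]` from a five-row column completes one three-row batch and leaves the last two rows in progress until `flush` is called. On the view side, selecting one 16-byte string out of a 48-byte buffer triggers compaction, while selecting two of the three strings keeps the source buffer.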
Branch updated from `0a6096a` to `0bdc485`.
run benchmark coalesce_kernels
🤖 Arrow criterion benchmark running (GKE). Comparing optimize-pr-8991-followup (67ca60c) to 51b02f1 (merge-base).
🤖 Arrow criterion benchmark completed (GKE).
@ClSlaid could you add the benchmark extensions in another PR?
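## Summary

- add a direct `BatchCoalescer::push_batch_with_indices` path for primitive, `Utf8View`, and `BinaryView` columns when the indices are integer typed and non-null
- write the selected rows directly into the in-progress output arrays instead of materialising an intermediate taken `RecordBatch`
- keep all other types on the existing `take_record_batch` fallback; benchmark work on this branch showed that widening the direct path beyond primitive and view arrays regressed `Utf8` and dictionary-backed cases

## Testing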
- `cargo test -p arrow-select coalesce --lib`
- `cargo clippy -p arrow-select --lib --tests -- -D warnings`
- `cargo clippy -p arrow --bench coalesce_kernels --features test_utils -- -D warnings`
- `cargo clippy --workspace --all-targets -- -D warnings`

## Benchmarks
- `take: primitive, 8192, nulls: 0, selectivity: 0.01`: `3.5194-3.5796 ms` -> `1.8780-1.9136 ms`
- `take: primitive, 8192, nulls: 0.1, selectivity: 0.01`: `5.5208-5.5708 ms` -> `4.0016-4.1647 ms`
- `take: primitive, 8192, nulls: 0, selectivity: 0.001`: `23.684-23.813 ms` -> `5.9713-6.0137 ms`
- `take: single_utf8view, 8192, nulls: 0, selectivity: 0.01`: `3.0301-3.0830 ms` -> `2.4513-2.4854 ms`
- `take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.8643-1.8823 ms` -> `1.2706-1.2856 ms`
- `take: single_binaryview, 8192, nulls: 0, selectivity: 0.01`: `3.1346-3.2991 ms` -> `2.7578-2.8539 ms`
- `take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.9634-2.0215 ms` -> `1.4117-1.4383 ms`
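To read the before -> after ranges above as speedups, divide the before time by the after time. The sketch below does this for each reported case; `midpoint_speedup` and `conservative_speedup` are hypothetical helpers, and treating the midpoint of each reported range as a point estimate is an approximation rather than the estimate criterion itself reports.

```rust
/// Reported ranges in ms: (name, before_low, before_high, after_low, after_high).
const RESULTS: [(&str, f64, f64, f64, f64); 7] = [
    ("take: primitive, 8192, nulls: 0, selectivity: 0.01", 3.5194, 3.5796, 1.8780, 1.9136),
    ("take: primitive, 8192, nulls: 0.1, selectivity: 0.01", 5.5208, 5.5708, 4.0016, 4.1647),
    ("take: primitive, 8192, nulls: 0, selectivity: 0.001", 23.684, 23.813, 5.9713, 6.0137),
    ("take: single_utf8view, 8192, nulls: 0, selectivity: 0.01", 3.0301, 3.0830, 2.4513, 2.4854),
    ("take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01", 1.8643, 1.8823, 1.2706, 1.2856),
    ("take: single_binaryview, 8192, nulls: 0, selectivity: 0.01", 3.1346, 3.2991, 2.7578, 2.8539),
    ("take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01", 1.9634, 2.0215, 1.4117, 1.4383),
];

/// Ratio of range midpoints: ((b_lo + b_hi) / 2) / ((a_lo + a_hi) / 2).
fn midpoint_speedup(b_lo: f64, b_hi: f64, a_lo: f64, a_hi: f64) -> f64 {
    (b_lo + b_hi) / (a_lo + a_hi)
}

/// Conservative bound: fastest "before" time over slowest "after" time.
fn conservative_speedup(b_lo: f64, a_hi: f64) -> f64 {
    b_lo / a_hi
}
```

Applied to these numbers, the midpoint ratios come to about 1.15x to 1.87x for the `selectivity: 0.01` cases and about 3.96x for `selectivity: 0.001`, and the conservative bound stays above 1x in every case.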