arrow-select: optimise coalesced takes for primitive and view arrays by ClSlaid · Pull Request #9758 · apache/arrow-rs

ClSlaid · 2026-04-18T08:44:43Z

## Summary

- add a direct `BatchCoalescer::push_batch_with_indices` path for primitive, `Utf8View`, and `BinaryView` columns when the indices are integer typed and non-null
- specialise indexed copying for primitive and byte-view in-progress arrays so supported schemas can coalesce rows directly without materialising an intermediate taken `RecordBatch`
- keep other data types on the existing `take_record_batch` fallback; benchmark work on this branch showed that widening the direct path beyond primitive and view arrays regressed `Utf8` and dictionary-backed cases
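As a rough illustration of the routing rule described above, the sketch below picks between the direct path and the `take_record_batch` fallback. It is a minimal model, not the `arrow-select` code: the `DataType` enum is a hypothetical cut-down stand-in for Arrow's (only a few primitive types are modelled), `TakePath` and `choose_path` are invented names, and it assumes the check is made for the whole schema, as "supported schemas" suggests.

```rust
/// Hypothetical, cut-down stand-in for Arrow's `DataType`; only enough
/// variants to illustrate the routing decision (not the arrow-rs enum).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    UInt32,
    Float64,
    Utf8,
    Utf8View,
    BinaryView,
    Dictionary,
}

/// Which path a batch takes through the (hypothetical) coalescer.
#[derive(Debug, PartialEq)]
enum TakePath {
    /// Copy selected rows straight into the in-progress output builders.
    Direct,
    /// Materialise an intermediate taken batch first, then coalesce it.
    Fallback,
}

/// Index arrays must be integer typed for the direct path.
fn is_integer(dt: &DataType) -> bool {
    matches!(dt, DataType::Int32 | DataType::Int64 | DataType::UInt32)
}

/// Column types this sketch treats as eligible: primitives and view arrays.
fn supports_direct_copy(dt: &DataType) -> bool {
    matches!(
        dt,
        DataType::Int32
            | DataType::Int64
            | DataType::UInt32
            | DataType::Float64
            | DataType::Utf8View
            | DataType::BinaryView
    )
}

/// Assumption: every column in the schema must be eligible, and the
/// indices must be integer typed with no nulls.
fn choose_path(columns: &[DataType], index_type: &DataType, index_null_count: usize) -> TakePath {
    let indices_ok = is_integer(index_type) && index_null_count == 0;
    if indices_ok && columns.iter().all(supports_direct_copy) {
        TakePath::Direct
    } else {
        TakePath::Fallback
    }
}
```

Under this sketch a single `Utf8` or dictionary column is enough to send the whole batch down the fallback path, matching the "keep other data types on the fallback" rule.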

## Testing

- `cargo test -p arrow-select coalesce --lib`
- `cargo clippy -p arrow-select --lib --tests -- -D warnings`
- `cargo clippy -p arrow --bench coalesce_kernels --features test_utils -- -D warnings`
- `cargo clippy --workspace --all-targets -- -D warnings`

## Benchmarks

- `take: primitive, 8192, nulls: 0, selectivity: 0.01`: `3.5194-3.5796 ms` -> `1.8780-1.9136 ms`
- `take: primitive, 8192, nulls: 0.1, selectivity: 0.01`: `5.5208-5.5708 ms` -> `4.0016-4.1647 ms`
- `take: primitive, 8192, nulls: 0, selectivity: 0.001`: `23.684-23.813 ms` -> `5.9713-6.0137 ms`
- `take: single_utf8view, 8192, nulls: 0, selectivity: 0.01`: `3.0301-3.0830 ms` -> `2.4513-2.4854 ms`
- `take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.8643-1.8823 ms` -> `1.2706-1.2856 ms`
- `take: single_binaryview, 8192, nulls: 0, selectivity: 0.01`: `3.1346-3.2991 ms` -> `2.7578-2.8539 ms`
- `take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.9634-2.0215 ms` -> `1.4117-1.4383 ms`
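The ranges above read as criterion's lower and upper time estimates before and after the change. One rough way to collapse each pair into a single speedup figure is to divide the midpoints. The `speedup` helper below is a hypothetical convenience for that, not part of the benchmark harness, and a midpoint only approximates criterion's own point estimate.

```rust
/// Hypothetical helper: ratio of the midpoints of two (low, high) timing
/// ranges, i.e. roughly how many times faster "after" is than "before".
fn speedup(before: (f64, f64), after: (f64, f64)) -> f64 {
    // Midpoint of a (low, high) range; an approximation, not criterion's estimate.
    let mid = |(lo, hi): (f64, f64)| (lo + hi) / 2.0;
    mid(before) / mid(after)
}
```

Applied to the list, `take: primitive` at selectivity 0.001 comes out at about 3.96x and at selectivity 0.01 at about 1.87x, primitive with 10% nulls at about 1.36x, and the view-array cases between roughly 1.15x and 1.47x.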

## What

- add a direct `push_batch_with_indices` path in `BatchCoalescer` for primitive, `Utf8View`, and `BinaryView` columns when the index array is integer typed and non-null
- teach the coalescer internals to copy indexed primitive and view values directly into in-progress output buffers instead of materialising an intermediate taken `RecordBatch`
- add dedicated `take:` coalesce benchmarks and new indexed coalescing tests for primitive, `Utf8View`, and `BinaryView` inputs

## How

- route supported batches through a direct indices path that chunks the input indices across coalesced output batch boundaries and reuses the existing in-progress array builders
- specialise `InProgressPrimitiveArray::copy_indices` to gather values and build the taken null mask directly from the source array
- specialise `InProgressByteViewArray::copy_indices` to gather selected views and nulls directly, compute buffer compaction from the selected views, and lazily compute whole-array buffer usage only when the row-copy path needs it
- keep unsupported types on the existing `take_record_batch` fallback so the optimisation only applies where the benchmark data shows it is profitable

## Why It Works

- the previous `push_batch_with_indices` implementation always paid to allocate and populate a temporary taken `RecordBatch` before coalescing
- for primitive and view arrays, the coalescer can write the selected rows straight into its output builders, avoiding that extra batch materialisation and the extra copy it implies
- the view-array path remains safe because it preserves the existing reuse-vs-compact behaviour, while basing sparse compaction decisions on the views actually selected rather than on the whole source batch

## Tests And Validation

- added indexed coalescing tests for mixed primitive, mixed `Utf8View`, mixed `BinaryView`, and `Utf8` fallback behaviour in `arrow-select/src/coalesce.rs`
- added `take:` coalesce benchmarks in `arrow/benches/coalesce_kernels.rs` covering primitive, `Utf8View`, `BinaryView`, `Utf8`, and dictionary-backed schemas
- validated with:
  - `cargo test -p arrow-select coalesce --lib`
  - `cargo clippy -p arrow-select --lib --tests -- -D warnings`
  - `cargo clippy -p arrow --bench coalesce_kernels --features test_utils -- -D warnings`

## Benchmark Summary

- `take: primitive, 8192, nulls: 0, selectivity: 0.01`: `3.5194-3.5796 ms` -> `1.8780-1.9136 ms`
- `take: primitive, 8192, nulls: 0.1, selectivity: 0.01`: `5.5208-5.5708 ms` -> `4.0016-4.1647 ms`
- `take: primitive, 8192, nulls: 0, selectivity: 0.001`: `23.684-23.813 ms` -> `5.9713-6.0137 ms`
- `take: single_utf8view, 8192, nulls: 0, selectivity: 0.01`: `3.0301-3.0830 ms` -> `2.4513-2.4854 ms`
- `take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.8643-1.8823 ms` -> `1.2706-1.2856 ms`
- `take: single_binaryview, 8192, nulls: 0, selectivity: 0.01`: `3.1346-3.2991 ms` -> `2.7578-2.8539 ms`
- `take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.9634-2.0215 ms` -> `1.4117-1.4383 ms`

Signed-off-by: cl <cailue@apache.org>
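The "How" section chunks the input indices at output-batch boundaries and gathers values plus their null mask straight into the in-progress builder. Below is a minimal sketch of that idea, using plain `Vec<i64>` and `Vec<bool>` in place of Arrow buffers and bitmaps. `PrimitiveColumn`, `InProgress`, `Coalescer`, and `push_with_indices` are hypothetical names for illustration only; this is not the `BatchCoalescer` or `InProgressPrimitiveArray::copy_indices` code, and it leaves out the byte-view buffer handling entirely.

```rust
/// Hypothetical stand-in for a primitive array: a value buffer plus an
/// optional validity mask (`true` = valid). Not the arrow-rs type.
struct PrimitiveColumn {
    values: Vec<i64>,
    validity: Option<Vec<bool>>,
}

/// Hypothetical in-progress output column that selected rows are copied into.
#[derive(Debug, Default)]
struct InProgress {
    values: Vec<i64>,
    validity: Vec<bool>,
}

impl InProgress {
    /// Gather `source[i]` for each `i` in `indices` directly into this builder,
    /// carrying the validity bit along, instead of first building a temporary
    /// "taken" column and then copying that into the output.
    fn copy_indices(&mut self, source: &PrimitiveColumn, indices: &[u32]) {
        self.values.reserve(indices.len());
        self.validity.reserve(indices.len());
        for &i in indices {
            let i = i as usize;
            // The value slot under a null is copied as-is; only the mask marks it null.
            self.values.push(source.values[i]);
            let valid = source.validity.as_ref().map_or(true, |v| v[i]);
            self.validity.push(valid);
        }
    }
}

/// Hypothetical coalescer that completes an output batch every `target` rows.
struct Coalescer {
    target: usize,
    in_progress: InProgress,
    completed: Vec<InProgress>,
}

impl Coalescer {
    fn new(target: usize) -> Self {
        assert!(target > 0, "target batch size must be non-zero");
        Self { target, in_progress: InProgress::default(), completed: Vec::new() }
    }

    /// Split `indices` at output-batch boundaries: each chunk is at most the
    /// room left in the current in-progress batch, which is flushed once full.
    fn push_with_indices(&mut self, source: &PrimitiveColumn, mut indices: &[u32]) {
        while !indices.is_empty() {
            let room = self.target - self.in_progress.values.len();
            let (chunk, rest) = indices.split_at(room.min(indices.len()));
            self.in_progress.copy_indices(source, chunk);
            indices = rest;
            if self.in_progress.values.len() == self.target {
                self.completed.push(std::mem::take(&mut self.in_progress));
            }
        }
    }
}
```

In this model each selected row is written exactly once, into the output builder; that single write is what replaces the "materialise a taken batch, then copy it again" sequence described under "Why It Works".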

ClSlaid · 2026-04-18T09:02:06Z

/cc @alamb please have a look; this is a follow-up to #8991.

Signed-off-by: 蔡略 <cailue@apache.org>

Dandandan · 2026-04-23T05:40:12Z

run benchmark coalesce_kernels

adriangbot · 2026-04-23T05:43:53Z

🤖 Arrow criterion benchmark running (GKE)
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4301979475-1773-wq8lp 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing optimize-pr-8991-followup (67ca60c) to 51b02f1 (merge-base)
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench coalesce_kernels
BENCH_FILTER=
Results will be posted here when complete


adriangbot · 2026-04-23T06:16:21Z

🤖 Arrow criterion benchmark completed (GKE)

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                                              main                                   optimize-pr-8991-followup
-----                                                                                              ----                                   -------------------------
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                                             1.00    119.1±6.21ms        ? ?/sec    1.09    129.5±6.28ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                              1.01      5.6±0.03ms        ? ?/sec    1.00      5.5±0.03ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                               1.00      2.8±0.01ms        ? ?/sec    1.03      2.8±0.01ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                               1.01      2.3±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                                           1.00    164.9±2.32ms        ? ?/sec    1.06    175.1±6.06ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                                            1.00      6.4±0.12ms        ? ?/sec    1.01      6.5±0.06ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                                             1.00      3.3±0.10ms        ? ?/sec    1.00      3.3±0.05ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                                             1.01      2.7±0.02ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                                             1.00     43.4±0.36ms        ? ?/sec    1.00     43.3±0.16ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                              1.00      7.7±0.03ms        ? ?/sec    1.01      7.8±0.03ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                               1.00      4.1±0.01ms        ? ?/sec    1.04      4.2±0.01ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                               1.00      2.7±0.02ms        ? ?/sec    1.01      2.7±0.01ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                                           1.00     54.9±0.26ms        ? ?/sec    1.00     55.0±0.27ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                                            1.00      9.4±0.11ms        ? ?/sec    1.01      9.5±0.12ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                                             1.00      5.0±0.08ms        ? ?/sec    1.03      5.1±0.10ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                                             1.00      3.3±0.02ms        ? ?/sec    1.02      3.4±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001                    1.00     37.7±0.34ms        ? ?/sec    1.00     37.6±0.23ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01                     1.00      4.3±0.02ms        ? ?/sec    1.00      4.3±0.01ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1                      1.00      2.1±0.01ms        ? ?/sec    1.03      2.2±0.00ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8                      1.00   1422.1±5.99µs        ? ?/sec    1.02   1455.2±5.77µs        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001                  1.00     47.6±0.22ms        ? ?/sec    1.01     48.0±0.16ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01                   1.00      6.3±0.06ms        ? ?/sec    1.00      6.3±0.08ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1                    1.00      3.2±0.06ms        ? ?/sec    1.02      3.3±0.06ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8                    1.00  1430.3±11.06µs        ? ?/sec    1.00  1432.1±10.41µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001                     1.00     33.9±0.30ms        ? ?/sec    1.00     33.9±0.14ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01                      1.00      3.4±0.02ms        ? ?/sec    1.02      3.4±0.01ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1                       1.00   1267.7±5.39µs        ? ?/sec    1.02   1291.0±2.63µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8                       1.00    615.2±2.31µs        ? ?/sec    1.00    613.5±1.76µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001                   1.00     43.9±0.24ms        ? ?/sec    1.00     44.0±0.12ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01                    1.00      5.6±0.06ms        ? ?/sec    1.00      5.6±0.08ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1                     1.00      2.5±0.06ms        ? ?/sec    1.02      2.5±0.06ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8                     1.00  1427.9±11.19µs        ? ?/sec    1.01  1437.0±10.29µs        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                              1.00     74.2±0.78ms        ? ?/sec    1.02     75.9±0.66ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                               1.00      6.9±0.06ms        ? ?/sec    1.00      6.9±0.04ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                                1.00  1842.8±12.64µs        ? ?/sec    1.03   1896.7±6.92µs        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                                1.00   1130.5±5.83µs        ? ?/sec    1.02   1148.7±5.10µs        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                                            1.00    103.0±0.88ms        ? ?/sec    1.00    103.3±0.77ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                                             1.00     12.4±0.28ms        ? ?/sec    1.00     12.5±0.33ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                              1.00      4.9±0.21ms        ? ?/sec    1.02      4.9±0.26ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                              1.00      3.2±0.03ms        ? ?/sec    1.01      3.2±0.03ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                                        1.01     53.8±0.80ms        ? ?/sec    1.00     53.4±0.34ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                                         1.01      5.9±0.05ms        ? ?/sec    1.00      5.9±0.01ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                                          1.00      2.1±0.01ms        ? ?/sec    1.01      2.1±0.01ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                                          1.00    943.3±2.18µs        ? ?/sec    1.00    941.8±3.29µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                                      1.00     69.8±0.53ms        ? ?/sec    1.00     69.6±0.21ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                                       1.01      8.4±0.07ms        ? ?/sec    1.00      8.3±0.06ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                                        1.00      3.0±0.02ms        ? ?/sec    1.01      3.0±0.02ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                                        1.00   1833.1±4.22µs        ? ?/sec    1.00   1828.7±3.65µs        ? ?/sec
take: mixed_binaryview (max_string_len=128), 8192, nulls: 0, selectivity: 0.001                                                           1.00      5.8±0.02ms        ? ?/sec
take: mixed_binaryview (max_string_len=128), 8192, nulls: 0, selectivity: 0.01                                                            1.00      2.4±0.00ms        ? ?/sec
take: mixed_binaryview (max_string_len=128), 8192, nulls: 0, selectivity: 0.1                                                             1.00      2.6±0.00ms        ? ?/sec
take: mixed_binaryview (max_string_len=128), 8192, nulls: 0, selectivity: 0.8                                                             1.00   1948.7±2.39µs        ? ?/sec
take: mixed_binaryview (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001                                                         1.00     11.5±0.04ms        ? ?/sec
take: mixed_binaryview (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01                                                          1.00      3.4±0.00ms        ? ?/sec
take: mixed_binaryview (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1                                                           1.00      3.1±0.01ms        ? ?/sec
take: mixed_binaryview (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8                                                           1.00   1528.9±3.21µs        ? ?/sec
take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.001                                                            1.00      5.2±0.01ms        ? ?/sec
take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01                                                             1.00   1448.8±4.61µs        ? ?/sec
take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.1                                                              1.00   1504.6±3.24µs        ? ?/sec
take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.8                                                              1.00   1111.0±4.23µs        ? ?/sec
take: mixed_binaryview (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001                                                          1.00     10.8±0.03ms        ? ?/sec
take: mixed_binaryview (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01                                                           1.00      2.5±0.02ms        ? ?/sec
take: mixed_binaryview (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1                                                            1.00      2.1±0.00ms        ? ?/sec
take: mixed_binaryview (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8                                                            1.00   1265.9±4.05µs        ? ?/sec
take: mixed_dict, 8192, nulls: 0, selectivity: 0.001                                                                                      1.00    109.0±1.80ms        ? ?/sec
take: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                                                                       1.00      5.6±0.03ms        ? ?/sec
take: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                                                                        1.00      3.8±0.02ms        ? ?/sec
take: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                                                                        1.00      2.0±0.01ms        ? ?/sec
take: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                                                                                    1.00    104.3±1.76ms        ? ?/sec
take: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                                                                                     1.00      6.0±0.01ms        ? ?/sec
take: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                                                                                      1.00      3.7±0.02ms        ? ?/sec
take: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                                                                                      1.00      2.1±0.01ms        ? ?/sec
take: mixed_utf8 extra_large_repeat, input: 8192, output: 131072, nulls: 0, biggest: Some(1024)                                           1.00     47.3±1.52ms        ? ?/sec
take: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                                                                                      1.00     25.0±0.07ms        ? ?/sec
take: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                                                                       1.00      7.0±0.04ms        ? ?/sec
take: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                                                                        1.00      5.4±0.03ms        ? ?/sec
take: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                                                                        1.00      4.6±0.04ms        ? ?/sec
take: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                                                                                    1.00     33.4±0.09ms        ? ?/sec
take: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                                                                                     1.00      8.4±0.03ms        ? ?/sec
take: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                                                                                      1.00      5.8±0.03ms        ? ?/sec
take: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                                                                                      1.00      4.8±0.03ms        ? ?/sec
take: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001                                                             1.00      5.7±0.03ms        ? ?/sec
take: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01                                                              1.00      2.4±0.00ms        ? ?/sec
take: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1                                                               1.00      2.4±0.00ms        ? ?/sec
take: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8                                                               1.00   1926.2±2.56µs        ? ?/sec
take: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001                                                           1.00     11.4±0.04ms        ? ?/sec
take: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01                                                            1.00      3.4±0.01ms        ? ?/sec
take: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1                                                             1.00      2.9±0.01ms        ? ?/sec
take: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8                                                             1.00   1520.7±3.85µs        ? ?/sec
take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001                                                              1.00      5.1±0.03ms        ? ?/sec
take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01                                                               1.00   1423.2±7.08µs        ? ?/sec
take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1                                                                1.00   1223.1±1.60µs        ? ?/sec
take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8                                                                1.00    703.3±1.97µs        ? ?/sec
take: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001                                                            1.00     10.7±0.04ms        ? ?/sec
take: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01                                                             1.00      2.5±0.02ms        ? ?/sec
take: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1                                                              1.00   1807.2±4.43µs        ? ?/sec
take: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8                                                              1.00   1305.7±3.14µs        ? ?/sec
take: primitive extra_large_repeat, input: 8192, output: 131072, nulls: 0, biggest: Some(1024)                                            1.00   1562.3±2.51µs        ? ?/sec
take: primitive, 8192, nulls: 0, selectivity: 0.001                                                                                       1.00      9.9±0.03ms        ? ?/sec
take: primitive, 8192, nulls: 0, selectivity: 0.01                                                                                        1.00      2.7±0.01ms        ? ?/sec
take: primitive, 8192, nulls: 0, selectivity: 0.1                                                                                         1.00   1639.0±2.17µs        ? ?/sec
take: primitive, 8192, nulls: 0, selectivity: 0.8                                                                                         1.00   1039.6±6.36µs        ? ?/sec
take: primitive, 8192, nulls: 0.1, selectivity: 0.001                                                                                     1.00     24.2±0.21ms        ? ?/sec
take: primitive, 8192, nulls: 0.1, selectivity: 0.01                                                                                      1.00      5.4±0.01ms        ? ?/sec
take: primitive, 8192, nulls: 0.1, selectivity: 0.1                                                                                       1.00      3.2±0.01ms        ? ?/sec
take: primitive, 8192, nulls: 0.1, selectivity: 0.8                                                                                       1.00      2.4±0.01ms        ? ?/sec
take: single_binaryview, 8192, nulls: 0, selectivity: 0.001                                                                               1.00      9.5±0.02ms        ? ?/sec
take: single_binaryview, 8192, nulls: 0, selectivity: 0.01                                                                                1.00      2.4±0.00ms        ? ?/sec
take: single_binaryview, 8192, nulls: 0, selectivity: 0.1                                                                                 1.00      3.0±0.00ms        ? ?/sec
take: single_binaryview, 8192, nulls: 0, selectivity: 0.8                                                                                 1.00   1343.3±2.37µs        ? ?/sec
take: single_binaryview, 8192, nulls: 0.1, selectivity: 0.001                                                                             1.00     14.6±0.03ms        ? ?/sec
take: single_binaryview, 8192, nulls: 0.1, selectivity: 0.01                                                                              1.00      3.2±0.00ms        ? ?/sec
take: single_binaryview, 8192, nulls: 0.1, selectivity: 0.1                                                                               1.00      3.4±0.01ms        ? ?/sec
take: single_binaryview, 8192, nulls: 0.1, selectivity: 0.8                                                                               1.00      2.3±0.00ms        ? ?/sec
take: single_utf8view, 8192, nulls: 0, selectivity: 0.001                                                                                 1.00     10.3±0.06ms        ? ?/sec
take: single_utf8view, 8192, nulls: 0, selectivity: 0.01                                                                                  1.00      2.4±0.00ms        ? ?/sec
take: single_utf8view, 8192, nulls: 0, selectivity: 0.1                                                                                   1.00      2.5±0.00ms        ? ?/sec
take: single_utf8view, 8192, nulls: 0, selectivity: 0.8                                                                                   1.00   1332.1±1.86µs        ? ?/sec
take: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                                                                               1.00     14.3±0.03ms        ? ?/sec
take: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                                                                                1.00      3.2±0.01ms        ? ?/sec
take: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                                                                                 1.00      2.8±0.00ms        ? ?/sec
take: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                                                                                 1.00      2.2±0.01ms        ? ?/sec

Resource Usage

| Metric | base (merge-base) | branch |
| --- | --- | --- |
| Wall time | 540.1s | 1345.3s |
| Peak memory | 3.1 GiB | 3.1 GiB |
| Avg memory | 3.0 GiB | 3.0 GiB |
| CPU user | 511.6s | 1285.6s |
| CPU sys | 25.6s | 58.5s |
| Peak spill | 0 B | 0 B |


Dandandan · 2026-04-23T06:54:10Z

@ClSlaid could you add the benchmark extensions in a separate PR?
Once that is merged, we can run the benchmark runner on this PR.

github-actions bot added the arrow label (Changes to the arrow crate) Apr 18, 2026

Dandandan reviewed Apr 18, 2026

Comment thread on arrow-select/src/coalesce.rs (outdated)

Dandandan reviewed Apr 18, 2026

Comment thread on arrow-select/src/coalesce.rs (outdated)

arrow-select: chunk oversized materialized takes

0bdc485

Signed-off-by: 蔡略 <cailue@apache.org>

ClSlaid force-pushed the optimize-pr-8991-followup branch from 0a6096a to 0bdc485 on April 18, 2026 17:21

ClSlaid mentioned this pull request Apr 18, 2026

Add deferred materialization for oversized coalesced takes in BatchCoalescer #9760 (open)

arrow-select: shorten coalesce take test names

67ca60c

alamb added the performance label Apr 21, 2026

ClSlaid wants to merge 3 commits into apache:main from ClSlaid:optimize-pr-8991-followup


4 participants
