
feat(parquet): fuse level encoding with counting and histogram updates #9795

Open
HippoBaro wants to merge 1 commit into apache:main from HippoBaro:fuse_lvl_encoding_hist_counting

@HippoBaro
Contributor

Which issue does this PR close?

Rationale for this change

See #9731

What changes are included in this PR?

Add put_with_observer() to LevelEncoder that calls an FnMut(i16, usize) observer for each value during encoding. This allows callers to piggyback counting and histogram updates into the encoding pass without extra iterations over the level buffer.

Previously, write_mini_batch() made 3 separate passes over each level array: one to count non-null values or row boundaries, one to update the level histogram, and one to RLE-encode. Now all three operations happen in a single pass via the observer closure.
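A minimal sketch of the fused pass, not the actual `LevelEncoder` code. It assumes the observer receives a level and a repeat count for a run of identical levels. `for_each_run` and `fused_pass` are hypothetical names used only for illustration, and the byte-emitting part of the encoder is left out.

```rust
/// Hypothetical stand-in for a level encoder: it only walks runs of equal
/// levels and reports each run to the observer. A real encoder would also
/// emit RLE/bit-packed bytes at the same point.
fn for_each_run(levels: &[i16], mut observer: impl FnMut(i16, usize)) {
    let mut i = 0;
    while i < levels.len() {
        let level = levels[i];
        let mut run = 1;
        while i + run < levels.len() && levels[i + run] == level {
            run += 1;
        }
        observer(level, run);
        i += run;
    }
}

/// Counts values at `max_def` and fills a per-level histogram in one pass.
/// Assumes every level is in `0..=max_def`.
fn fused_pass(levels: &[i16], max_def: i16) -> (usize, Vec<i64>) {
    let mut values_to_write = 0usize;
    let mut histogram = vec![0i64; max_def as usize + 1];
    for_each_run(levels, |level, count| {
        // Branchless count: the comparison contributes 0 or 1.
        values_to_write += count * (level == max_def) as usize;
        histogram[level as usize] += count as i64;
    });
    (values_to_write, histogram)
}
```

With long runs, as in mostly-null columns, the closure runs once per run rather than once per value, so the bookkeeping cost shrinks along with the encoding cost.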

Replace LevelHistogram::update_from_levels() with a new LevelHistogram::increment_by() that accepts a count, and remove the now-unnecessary update_definition_level_histogram() and update_repetition_level_histogram() methods from PageMetrics.
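To illustrate the difference between the two update styles, here is a rough sketch using a hypothetical `Histogram` type. The real `LevelHistogram` may store and validate its counts differently; only the method names are taken from this description.

```rust
/// Hypothetical histogram type; the real `LevelHistogram` may differ.
struct Histogram {
    counts: Vec<i64>,
}

impl Histogram {
    fn new(max_level: i16) -> Self {
        Self { counts: vec![0; max_level as usize + 1] }
    }

    /// Per-value update: one increment for every level in the slice.
    fn update_from_levels(&mut self, levels: &[i16]) {
        for &level in levels {
            self.counts[level as usize] += 1;
        }
    }

    /// Counted update: one addition for a whole run of identical levels.
    fn increment_by(&mut self, level: i16, count: i64) {
        self.counts[level as usize] += count;
    }
}
```

Both methods yield the same counts; the counted form simply lets a caller that already knows the run length skip the per-value loop.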

Are these changes tested?

All tests passing; existing tests give 100% coverage.

Are there any user-facing changes?

None

@github-actions github-actions Bot added the parquet Changes to the parquet crate label Apr 23, 2026
@HippoBaro
Contributor Author

HippoBaro commented Apr 23, 2026

@alamb Thank you for your continued reviews and feedback! This is the next PR in my series from #9653

Contributor

@etseidl etseidl left a comment

Looks nice, thanks @HippoBaro.

Comment thread parquet/src/file/metadata/mod.rs
Comment thread parquet/src/encodings/levels.rs Outdated
@etseidl
Contributor

etseidl commented Apr 23, 2026

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4301566533-1769-mccwf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing fuse_lvl_encoding_hist_counting (849685c) to b93240a (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              fuse_lvl_encoding_hist_counting        main
-----                                              -------------------------------        ----
bool/bloom_filter                                  1.00     12.8±0.10ms    19.6 MB/sec    1.09     13.9±0.07ms    18.0 MB/sec
bool/cdc                                           1.00     15.3±0.12ms    16.3 MB/sec    1.08     16.5±0.07ms    15.2 MB/sec
bool/default                                       1.00     10.6±0.08ms    23.6 MB/sec    1.11     11.8±0.05ms    21.2 MB/sec
bool/parquet_2                                     1.00     14.3±0.10ms    17.5 MB/sec    1.08     15.4±0.05ms    16.3 MB/sec
bool/zstd                                          1.00     11.1±0.09ms    22.5 MB/sec    1.11     12.3±0.07ms    20.4 MB/sec
bool/zstd_parquet_2                                1.00     14.7±0.10ms    17.0 MB/sec    1.08     15.8±0.06ms    15.8 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.39ms    17.8 MB/sec    1.00      7.1±0.04ms    17.7 MB/sec
bool_non_null/cdc                                  1.01      6.9±0.05ms    18.2 MB/sec    1.00      6.8±0.02ms    18.4 MB/sec
bool_non_null/default                              1.00      4.2±0.02ms    29.9 MB/sec    1.02      4.3±0.03ms    29.2 MB/sec
bool_non_null/parquet_2                            1.00      8.9±0.03ms    14.0 MB/sec    1.01      9.0±0.03ms    13.9 MB/sec
bool_non_null/zstd                                 1.00      4.5±0.03ms    27.6 MB/sec    1.02      4.6±0.03ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.03ms    13.3 MB/sec    1.00      9.4±0.04ms    13.3 MB/sec
float_with_nans/bloom_filter                       1.00     94.1±1.16ms   148.9 MB/sec    1.00     93.7±1.65ms   149.4 MB/sec
float_with_nans/cdc                                1.00     82.2±1.07ms   170.3 MB/sec    1.00     82.1±0.94ms   170.5 MB/sec
float_with_nans/default                            1.01     74.6±0.67ms   187.8 MB/sec    1.00     73.9±0.22ms   189.4 MB/sec
float_with_nans/parquet_2                          1.02     96.0±1.34ms   145.9 MB/sec    1.00     94.2±0.28ms   148.7 MB/sec
float_with_nans/zstd                               1.00    113.3±0.62ms   123.6 MB/sec    1.00    113.0±0.90ms   123.9 MB/sec
float_with_nans/zstd_parquet_2                     1.01    132.9±1.42ms   105.4 MB/sec    1.00    131.0±0.26ms   106.9 MB/sec
list_primitive/bloom_filter                        1.05    338.7±8.69ms  1610.2 MB/sec    1.00    322.9±8.91ms  1689.1 MB/sec
list_primitive/cdc                                 1.00    347.8±5.04ms  1567.9 MB/sec    1.00    346.7±3.69ms  1573.2 MB/sec
list_primitive/default                             1.07    256.4±7.19ms     2.1 GB/sec    1.00    238.5±3.82ms     2.2 GB/sec
list_primitive/parquet_2                           1.03    269.7±8.02ms  2022.1 MB/sec    1.00    260.7±1.53ms     2.0 GB/sec
list_primitive/zstd                                1.01    492.8±7.53ms  1106.7 MB/sec    1.00    486.8±4.28ms  1120.3 MB/sec
list_primitive/zstd_parquet_2                      1.00    482.1±5.15ms  1131.2 MB/sec    1.00    483.0±1.52ms  1129.1 MB/sec
list_primitive_non_null/bloom_filter               1.00   422.0±11.73ms  1289.8 MB/sec    1.05   441.4±19.15ms  1233.0 MB/sec
list_primitive_non_null/cdc                        1.00    425.8±8.27ms  1278.1 MB/sec    1.05    446.7±8.81ms  1218.2 MB/sec
list_primitive_non_null/default                    1.00    289.4±8.90ms  1880.3 MB/sec    1.04    299.8±6.75ms  1815.3 MB/sec
list_primitive_non_null/parquet_2                  1.00   292.9±12.55ms  1858.2 MB/sec    1.15    336.8±3.37ms  1615.7 MB/sec
list_primitive_non_null/zstd                       1.00   705.5±12.46ms   771.4 MB/sec    1.02   719.9±16.02ms   756.0 MB/sec
list_primitive_non_null/zstd_parquet_2             1.02    691.2±9.00ms   787.4 MB/sec    1.00    680.9±8.01ms   799.3 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     19.7±0.19ms  1898.1 MB/sec    1.72     33.9±0.16ms  1102.2 MB/sec
list_primitive_sparse_99pct_null/cdc               1.00     30.1±0.08ms  1239.7 MB/sec    1.48     44.6±0.17ms   838.7 MB/sec
list_primitive_sparse_99pct_null/default           1.00     19.6±0.16ms  1907.5 MB/sec    1.71     33.4±0.17ms  1117.2 MB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     19.3±0.08ms  1940.7 MB/sec    1.74     33.5±0.15ms  1113.9 MB/sec
list_primitive_sparse_99pct_null/zstd              1.00     21.2±0.16ms  1762.5 MB/sec    1.69     35.8±0.16ms  1043.2 MB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     19.6±0.26ms  1904.6 MB/sec    1.71     33.6±0.06ms  1111.7 MB/sec
primitive/bloom_filter                             1.00    150.8±2.05ms   297.7 MB/sec    1.03    154.9±2.13ms   289.8 MB/sec
primitive/cdc                                      1.00    156.2±1.45ms   287.4 MB/sec    1.05    163.3±0.87ms   274.8 MB/sec
primitive/default                                  1.00    117.9±1.39ms   380.7 MB/sec    1.06    125.1±5.02ms   358.7 MB/sec
primitive/parquet_2                                1.00    133.0±1.10ms   337.4 MB/sec    1.04    138.0±0.99ms   325.1 MB/sec
primitive/zstd                                     1.00    147.5±1.30ms   304.2 MB/sec    1.04    154.2±1.33ms   291.1 MB/sec
primitive/zstd_parquet_2                           1.00    166.3±1.22ms   269.8 MB/sec    1.04    172.7±1.07ms   259.9 MB/sec
primitive_all_null/bloom_filter                    1.00     22.8±0.04ms  1971.7 MB/sec    1.72     39.2±0.10ms  1143.6 MB/sec
primitive_all_null/cdc                             1.00     39.3±0.14ms  1142.3 MB/sec    1.41     55.4±0.15ms   810.2 MB/sec
primitive_all_null/default                         1.00     22.2±0.05ms  2025.5 MB/sec    1.73     38.3±0.04ms  1170.7 MB/sec
primitive_all_null/parquet_2                       1.00     22.2±0.05ms  2024.3 MB/sec    1.73     38.3±0.03ms  1171.4 MB/sec
primitive_all_null/zstd                            1.00     22.3±0.06ms  2013.1 MB/sec    1.73     38.5±0.06ms  1166.3 MB/sec
primitive_all_null/zstd_parquet_2                  1.00     22.3±0.05ms  2016.2 MB/sec    1.73     38.5±0.07ms  1167.1 MB/sec
primitive_non_null/bloom_filter                    1.02    116.9±2.31ms   376.4 MB/sec    1.00    115.1±2.90ms   382.2 MB/sec
primitive_non_null/cdc                             1.01     90.9±1.11ms   484.3 MB/sec    1.00     90.4±0.32ms   486.9 MB/sec
primitive_non_null/default                         1.00     68.2±0.63ms   645.1 MB/sec    1.02     69.7±1.45ms   631.1 MB/sec
primitive_non_null/parquet_2                       1.00     90.2±0.69ms   487.7 MB/sec    1.01     91.2±1.03ms   482.2 MB/sec
primitive_non_null/zstd                            1.01    106.9±1.11ms   411.8 MB/sec    1.00    106.0±0.40ms   415.2 MB/sec
primitive_non_null/zstd_parquet_2                  1.00    130.8±2.26ms   336.3 MB/sec    1.01    131.6±2.34ms   334.3 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     28.3±0.41ms  1585.4 MB/sec    1.62     45.8±0.29ms   979.9 MB/sec
primitive_sparse_99pct_null/cdc                    1.00     43.7±0.27ms  1026.3 MB/sec    1.41     61.5±0.23ms   729.7 MB/sec
primitive_sparse_99pct_null/default                1.00     26.3±0.26ms  1705.8 MB/sec    1.66     43.8±0.17ms  1025.4 MB/sec
primitive_sparse_99pct_null/parquet_2              1.00     26.1±0.10ms  1717.6 MB/sec    1.69     44.2±0.86ms  1016.0 MB/sec
primitive_sparse_99pct_null/zstd                   1.00     29.3±0.18ms  1533.2 MB/sec    1.61     47.2±0.22ms   950.6 MB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     28.2±0.30ms  1593.4 MB/sec    1.63     45.8±0.24ms   980.4 MB/sec
string/bloom_filter                                1.00   224.1±17.37ms     2.3 GB/sec    1.03   230.9±26.78ms     2.2 GB/sec
string/cdc                                         1.02    225.4±6.50ms     2.3 GB/sec    1.00    221.2±5.86ms     2.3 GB/sec
string/default                                     1.00   131.4±18.67ms     3.9 GB/sec    1.04   136.6±22.40ms     3.7 GB/sec
string/parquet_2                                   1.00    115.8±8.45ms     4.4 GB/sec    1.01    117.5±6.84ms     4.4 GB/sec
string/zstd                                        1.01    425.1±4.51ms  1233.2 MB/sec    1.00    420.4±4.65ms  1246.9 MB/sec
string/zstd_parquet_2                              1.02    405.7±9.20ms  1292.1 MB/sec    1.00    396.4±1.62ms  1322.6 MB/sec
string_and_binary_view/bloom_filter                1.04     68.0±1.93ms   474.6 MB/sec    1.00     65.6±0.85ms   491.4 MB/sec
string_and_binary_view/cdc                         1.00     58.8±0.32ms   548.6 MB/sec    1.03     60.4±0.78ms   534.2 MB/sec
string_and_binary_view/default                     1.00     48.7±0.19ms   662.7 MB/sec    1.03     50.0±0.78ms   645.2 MB/sec
string_and_binary_view/parquet_2                   1.00     59.5±0.31ms   542.2 MB/sec    1.01     60.2±0.21ms   535.4 MB/sec
string_and_binary_view/zstd                        1.00     86.0±0.77ms   375.1 MB/sec    1.01     86.9±1.30ms   371.1 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     74.5±0.92ms   432.9 MB/sec    1.00     74.3±0.31ms   434.1 MB/sec
string_dictionary/bloom_filter                     1.00     92.8±2.05ms     2.8 GB/sec    1.00     92.9±0.69ms     2.8 GB/sec
string_dictionary/cdc                              1.00     53.3±0.68ms     4.8 GB/sec    1.03     54.9±1.98ms     4.7 GB/sec
string_dictionary/default                          1.00     51.7±1.71ms     5.0 GB/sec    1.01     52.0±3.83ms     5.0 GB/sec
string_dictionary/parquet_2                        1.00     54.9±0.59ms     4.7 GB/sec    1.00     55.2±0.56ms     4.7 GB/sec
string_dictionary/zstd                             1.00    213.2±2.93ms  1238.8 MB/sec    1.00    212.7±2.80ms  1241.6 MB/sec
string_dictionary/zstd_parquet_2                   1.00    200.2±1.05ms  1319.5 MB/sec    1.00    199.6±0.90ms  1323.2 MB/sec
string_non_null/bloom_filter                       1.05   266.4±17.06ms  1967.3 MB/sec    1.00   253.3±17.60ms     2.0 GB/sec
string_non_null/cdc                                1.01    275.4±8.09ms  1902.7 MB/sec    1.00   271.8±10.18ms  1927.7 MB/sec
string_non_null/default                            1.00   139.6±16.41ms     3.7 GB/sec    1.00   139.4±14.48ms     3.7 GB/sec
string_non_null/parquet_2                          1.00    139.1±9.60ms     3.7 GB/sec    1.09    151.1±2.23ms     3.4 GB/sec
string_non_null/zstd                               1.00    539.6±7.30ms   971.1 MB/sec    1.05    568.3±8.71ms   922.1 MB/sec
string_non_null/zstd_parquet_2                     1.00    521.5±8.12ms  1004.8 MB/sec    1.00    521.1±5.43ms  1005.6 MB/sec
struct_all_null/bloom_filter                       1.00      9.1±0.06ms  1776.2 MB/sec    1.76     16.0±0.03ms  1009.0 MB/sec
struct_all_null/cdc                                1.00     15.4±0.04ms  1044.2 MB/sec    1.46     22.5±0.12ms   715.8 MB/sec
struct_all_null/default                            1.00      8.7±0.02ms  1848.7 MB/sec    1.80     15.7±0.04ms  1028.9 MB/sec
struct_all_null/parquet_2                          1.00      8.7±0.01ms  1855.7 MB/sec    1.80     15.7±0.04ms  1030.3 MB/sec
struct_all_null/zstd                               1.00      8.8±0.02ms  1834.7 MB/sec    1.79     15.7±0.05ms  1025.2 MB/sec
struct_all_null/zstd_parquet_2                     1.00      8.7±0.02ms  1846.8 MB/sec    1.79     15.7±0.03ms  1029.7 MB/sec
struct_non_null/bloom_filter                       1.00     55.3±1.04ms   289.4 MB/sec    1.12     62.1±0.70ms   257.8 MB/sec
struct_non_null/cdc                                1.00     51.2±0.16ms   312.7 MB/sec    1.14     58.3±0.13ms   274.3 MB/sec
struct_non_null/default                            1.00     38.9±0.13ms   411.3 MB/sec    1.20     46.6±0.46ms   343.0 MB/sec
struct_non_null/parquet_2                          1.00     47.9±0.55ms   334.2 MB/sec    1.14     54.5±0.13ms   293.4 MB/sec
struct_non_null/zstd                               1.00     48.3±0.52ms   331.4 MB/sec    1.14     55.3±0.57ms   289.5 MB/sec
struct_non_null/zstd_parquet_2                     1.00     61.8±0.54ms   258.8 MB/sec    1.12     68.9±0.47ms   232.1 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00     11.6±0.31ms  1391.0 MB/sec    1.66     19.2±0.12ms   839.9 MB/sec
struct_sparse_99pct_null/cdc                       1.00     17.7±0.08ms   908.6 MB/sec    1.44     25.5±0.21ms   631.6 MB/sec
struct_sparse_99pct_null/default                   1.00     11.0±0.21ms  1465.9 MB/sec    1.69     18.6±0.09ms   865.8 MB/sec
struct_sparse_99pct_null/parquet_2                 1.00     10.8±0.04ms  1491.3 MB/sec    1.71     18.5±0.08ms   872.3 MB/sec
struct_sparse_99pct_null/zstd                      1.00     12.4±0.17ms  1305.5 MB/sec    1.61     19.9±0.16ms   810.7 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00     11.6±0.14ms  1392.3 MB/sec    1.65     19.1±0.08ms   843.3 MB/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 1975.4s
Peak memory 6.6 GiB
Avg memory 6.4 GiB
CPU user 1918.3s
CPU sys 55.3s
Peak spill 0 B

branch

Metric Value
Wall time 1940.4s
Peak memory 6.6 GiB
Avg memory 6.4 GiB
CPU user 1881.7s
CPU sys 55.7s
Peak spill 0 B


Contributor

@alamb alamb left a comment

Thanks @HippoBaro -- I think this looks really close. Thank you @etseidl for the review

Comment thread parquet/src/column/writer/mod.rs Outdated
self.rep_levels_encoder
.put_with_observer(levels, |level, count| {
new_rows += (count as u32) * (level == 0) as u32;
if let Some(ref mut h) = self.page_metrics.repetition_level_histogram {
Contributor

You might be able to move this check out of the loop (so call put if self.page_metrics.repetition_level_histogram is None, and call put_with_observer if it is Some).

Maybe that could improve the inner loop even more.

Contributor

I did a quick test of that idea earlier and it didn't seem worth the added complexity. But it probably deserves a second look on better hardware 😄.

Contributor Author

@HippoBaro HippoBaro Apr 24, 2026

Just did the experiment locally with this:

match self.page_metrics.definition_level_histogram.as_mut() {
    Some(histogram) => encoder.put_with_observer(levels, |level, count| {
        values_to_write += count * (level == max_def) as usize;
        histogram.increment_by(level, count as i64);
    }),
    None => encoder.put_with_observer(levels, |level, count| {
        values_to_write += count * (level == max_def) as usize;
    }),
};

Benchmarks show a 2-3% improvement on average for list_primitive.
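For anyone who wants to try the hoisting outside the writer, a self-contained sketch follows. `drive` and `count_hoisted` are hypothetical names: `drive` stands in for `put_with_observer` and is assumed to pass a level and a run length, and a plain `Vec<i64>` stands in for the histogram.

```rust
/// Hypothetical driver: feeds precomputed (level, run length) pairs to the
/// observer, standing in for the encoder's inner loop.
fn drive(runs: &[(i16, usize)], mut observer: impl FnMut(i16, usize)) {
    for &(level, count) in runs {
        observer(level, count);
    }
}

/// Checks the optional histogram once, then picks one of two closures, so the
/// inner loop never tests the `Option`.
fn count_hoisted(
    runs: &[(i16, usize)],
    max_def: i16,
    histogram: Option<&mut Vec<i64>>,
) -> usize {
    let mut values_to_write = 0usize;
    match histogram {
        Some(h) => drive(runs, |level, count| {
            values_to_write += count * (level == max_def) as usize;
            h[level as usize] += count as i64;
        }),
        None => drive(runs, |level, count| {
            values_to_write += count * (level == max_def) as usize;
        }),
    }
    values_to_write
}
```

Because `drive` is generic over the closure type, each arm gets its own monomorphized loop, which is what keeps the `Option` test out of the hot path.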

Contributor Author

I've pushed the above change!

@HippoBaro HippoBaro force-pushed the fuse_lvl_encoding_hist_counting branch from 849685c to 8c4bd85 Compare April 24, 2026 00:44
Add `put_with_observer()` to `LevelEncoder` so callers can piggyback
row/null counting and histogram updates onto level encoding without
extra passes over the level buffers.

Update `write_mini_batch()` to encode definition and repetition levels
while collecting the associated metrics in the same pass, and hoist the
histogram-enabled branch out of the inner loop.

Add `LevelHistogram::increment_by()` for counted updates, keep
`update_from_levels()` as a deprecated compatibility wrapper, and
remove the now-unnecessary PageMetrics histogram update helpers.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
@HippoBaro HippoBaro force-pushed the fuse_lvl_encoding_hist_counting branch from 8c4bd85 to 28f25c0 Compare April 24, 2026 00:47
@HippoBaro
Contributor Author

@alamb @etseidl Thank you both! The branch is updated with your feedback 🙇

@HippoBaro
Contributor Author

run benchmark arrow_writer

@adriangbot

@HippoBaro
Contributor Author

Hi @HippoBaro, thanks for the request (#9795 (comment)). Only whitelisted users can trigger benchmarks.

I thought so 😅 Was worth a try!

@etseidl
Contributor

etseidl commented Apr 24, 2026

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4309702603-1798-fx7gw 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing fuse_lvl_encoding_hist_counting (28f25c0) to b93240a (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              fuse_lvl_encoding_hist_counting        main
-----                                              -------------------------------        ----
bool/bloom_filter                                  1.00     12.5±0.06ms    20.0 MB/sec    1.11     13.9±0.07ms    18.0 MB/sec
bool/cdc                                           1.00     15.1±0.12ms    16.5 MB/sec    1.09     16.6±0.08ms    15.1 MB/sec
bool/default                                       1.00     10.4±0.04ms    24.1 MB/sec    1.13     11.7±0.07ms    21.3 MB/sec
bool/parquet_2                                     1.00     14.1±0.05ms    17.7 MB/sec    1.09     15.4±0.06ms    16.3 MB/sec
bool/zstd                                          1.00     10.9±0.05ms    23.0 MB/sec    1.12     12.2±0.06ms    20.5 MB/sec
bool/zstd_parquet_2                                1.00     14.5±0.07ms    17.3 MB/sec    1.09     15.8±0.06ms    15.9 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.04ms    17.9 MB/sec    1.04      7.2±0.03ms    17.3 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.10ms    18.5 MB/sec    1.03      7.0±0.03ms    17.9 MB/sec
bool_non_null/default                              1.00      4.2±0.03ms    29.8 MB/sec    1.02      4.3±0.02ms    29.3 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.04ms    14.0 MB/sec    1.00      9.0±0.03ms    13.9 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.2 MB/sec    1.01      4.6±0.03ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.05ms    13.3 MB/sec    1.00      9.4±0.05ms    13.3 MB/sec
float_with_nans/bloom_filter                       1.01     95.1±2.30ms   147.2 MB/sec    1.00     94.4±1.81ms   148.4 MB/sec
float_with_nans/cdc                                1.00     83.4±1.63ms   167.9 MB/sec    1.00     83.1±1.27ms   168.5 MB/sec
float_with_nans/default                            1.01     75.8±1.32ms   184.7 MB/sec    1.00     75.3±1.33ms   185.8 MB/sec
float_with_nans/parquet_2                          1.00     96.7±2.06ms   144.8 MB/sec    1.01     97.3±1.50ms   143.9 MB/sec
float_with_nans/zstd                               1.00    113.4±1.11ms   123.5 MB/sec    1.00    113.2±1.21ms   123.6 MB/sec
float_with_nans/zstd_parquet_2                     1.01    135.4±2.16ms   103.4 MB/sec    1.00    133.5±2.21ms   104.9 MB/sec
list_primitive/bloom_filter                        1.00   360.5±12.85ms  1512.7 MB/sec    1.03   371.3±13.70ms  1468.8 MB/sec
list_primitive/cdc                                 1.00    359.0±8.40ms  1519.2 MB/sec    1.01   363.7±12.49ms  1499.5 MB/sec
list_primitive/default                             1.00    273.1±5.63ms  1997.3 MB/sec    1.02    279.8±5.03ms  1948.8 MB/sec
list_primitive/parquet_2                           1.01    289.3±2.97ms  1885.0 MB/sec    1.00    286.7±5.31ms  1902.5 MB/sec
list_primitive/zstd                                1.00    515.5±6.44ms  1058.0 MB/sec    1.01    519.7±6.35ms  1049.4 MB/sec
list_primitive/zstd_parquet_2                      1.00    482.7±6.72ms  1129.7 MB/sec    1.02    493.3±7.60ms  1105.5 MB/sec
list_primitive_non_null/bloom_filter               1.00   468.2±24.92ms  1162.3 MB/sec    1.01   473.3±21.40ms  1149.8 MB/sec
list_primitive_non_null/cdc                        1.00   432.0±11.38ms  1259.7 MB/sec    1.05   453.5±10.18ms  1200.2 MB/sec
list_primitive_non_null/default                    1.00   320.3±16.87ms  1699.4 MB/sec    1.00   319.7±12.02ms  1702.5 MB/sec
list_primitive_non_null/parquet_2                  1.00    306.1±7.74ms  1778.0 MB/sec    1.06   323.6±12.83ms  1681.8 MB/sec
list_primitive_non_null/zstd                       1.00   709.9±10.28ms   766.7 MB/sec    1.04   735.9±19.50ms   739.5 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    667.2±5.10ms   815.8 MB/sec    1.09    729.3±3.75ms   746.3 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     20.1±0.26ms  1862.2 MB/sec    1.68     33.7±0.34ms  1110.0 MB/sec
list_primitive_sparse_99pct_null/cdc               1.00     30.6±0.31ms  1220.7 MB/sec    1.45     44.4±0.35ms   842.0 MB/sec
list_primitive_sparse_99pct_null/default           1.00     19.6±0.20ms  1911.1 MB/sec    1.71     33.5±0.22ms  1116.3 MB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     19.7±0.21ms  1900.8 MB/sec    1.69     33.3±0.29ms  1122.7 MB/sec
list_primitive_sparse_99pct_null/zstd              1.00     21.6±0.19ms  1730.6 MB/sec    1.65     35.5±0.19ms  1051.3 MB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     20.1±0.07ms  1861.4 MB/sec    1.67     33.6±0.22ms  1112.6 MB/sec
primitive/bloom_filter                             1.00    149.0±3.52ms   301.2 MB/sec    1.06    158.5±3.22ms   283.1 MB/sec
primitive/cdc                                      1.00    155.9±1.78ms   287.9 MB/sec    1.05    163.1±1.81ms   275.1 MB/sec
primitive/default                                  1.00    116.6±1.76ms   384.9 MB/sec    1.06    124.0±1.55ms   362.0 MB/sec
primitive/parquet_2                                1.00    132.1±1.59ms   339.8 MB/sec    1.06    139.5±1.69ms   321.8 MB/sec
primitive/zstd                                     1.00    146.3±1.73ms   306.8 MB/sec    1.05    154.0±1.93ms   291.3 MB/sec
primitive/zstd_parquet_2                           1.00    165.0±2.42ms   272.0 MB/sec    1.05    173.7±1.60ms   258.4 MB/sec
primitive_all_null/bloom_filter                    1.00     23.0±0.13ms  1955.3 MB/sec    1.70     39.1±0.15ms  1147.5 MB/sec
primitive_all_null/cdc                             1.00     39.2±0.14ms  1143.8 MB/sec    1.42     55.5±0.24ms   808.0 MB/sec
primitive_all_null/default                         1.00     22.2±0.04ms  2021.9 MB/sec    1.73     38.3±0.10ms  1170.7 MB/sec
primitive_all_null/parquet_2                       1.00     22.2±0.05ms  2022.9 MB/sec    1.73     38.4±0.07ms  1168.1 MB/sec
primitive_all_null/zstd                            1.00     22.4±0.02ms  2004.1 MB/sec    1.72     38.5±0.11ms  1165.6 MB/sec
primitive_all_null/zstd_parquet_2                  1.00     22.3±0.05ms  2015.9 MB/sec    1.73     38.5±0.09ms  1166.8 MB/sec
primitive_non_null/bloom_filter                    1.00    117.6±3.16ms   374.1 MB/sec    1.00    118.1±3.55ms   372.6 MB/sec
primitive_non_null/cdc                             1.00     90.6±1.75ms   485.7 MB/sec    1.02     92.5±1.35ms   475.5 MB/sec
primitive_non_null/default                         1.01     69.1±1.41ms   636.8 MB/sec    1.00     68.6±1.53ms   641.1 MB/sec
primitive_non_null/parquet_2                       1.00     90.8±1.37ms   484.6 MB/sec    1.00     90.8±1.43ms   484.7 MB/sec
primitive_non_null/zstd                            1.01    108.2±1.68ms   406.5 MB/sec    1.00    107.7±1.90ms   408.7 MB/sec
primitive_non_null/zstd_parquet_2                  1.01    132.6±2.68ms   331.8 MB/sec    1.00    131.9±2.66ms   333.7 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     28.7±0.43ms  1563.9 MB/sec    1.60     45.8±0.54ms   978.9 MB/sec
primitive_sparse_99pct_null/cdc                    1.00     45.0±0.31ms   998.3 MB/sec    1.38     61.8±0.50ms   725.8 MB/sec
primitive_sparse_99pct_null/default                1.00     26.5±0.29ms  1690.8 MB/sec    1.66     44.1±0.21ms  1017.3 MB/sec
primitive_sparse_99pct_null/parquet_2              1.00     26.4±0.31ms  1696.8 MB/sec    1.66     43.9±0.31ms  1022.4 MB/sec
primitive_sparse_99pct_null/zstd                   1.00     29.9±0.29ms  1499.7 MB/sec    1.57     47.1±0.27ms   952.4 MB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     28.7±0.27ms  1565.9 MB/sec    1.61     46.1±0.29ms   974.0 MB/sec
string/bloom_filter                                1.02   232.5±24.42ms     2.2 GB/sec    1.00   228.0±23.52ms     2.2 GB/sec
string/cdc                                         1.01    227.3±9.43ms     2.3 GB/sec    1.00    224.9±6.06ms     2.3 GB/sec
string/default                                     1.00   136.7±23.17ms     3.7 GB/sec    1.06   145.3±24.08ms     3.5 GB/sec
string/parquet_2                                   1.00    110.7±5.98ms     4.6 GB/sec    1.09    121.0±2.15ms     4.2 GB/sec
string/zstd                                        1.00    437.6±6.12ms  1197.9 MB/sec    1.02   447.6±18.22ms  1171.4 MB/sec
string/zstd_parquet_2                              1.02   408.1±10.71ms  1284.6 MB/sec    1.00    399.5±2.83ms  1312.3 MB/sec
string_and_binary_view/bloom_filter                1.00     66.8±2.41ms   482.7 MB/sec    1.03     68.6±2.32ms   470.2 MB/sec
string_and_binary_view/cdc                         1.00     58.3±1.50ms   553.0 MB/sec    1.04     60.5±1.53ms   532.6 MB/sec
string_and_binary_view/default                     1.00     48.0±1.19ms   671.6 MB/sec    1.05     50.4±1.32ms   640.0 MB/sec
string_and_binary_view/parquet_2                   1.00     60.5±0.56ms   532.8 MB/sec    1.01     61.0±1.33ms   528.6 MB/sec
string_and_binary_view/zstd                        1.00     84.7±1.36ms   380.9 MB/sec    1.02     86.8±1.39ms   371.7 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.0±1.48ms   441.9 MB/sec    1.05     76.5±1.01ms   421.5 MB/sec
string_dictionary/bloom_filter                     1.27    120.6±8.46ms     2.1 GB/sec    1.00     94.9±7.24ms     2.7 GB/sec
string_dictionary/cdc                              1.00     67.7±4.24ms     3.8 GB/sec    1.45     98.4±4.29ms     2.6 GB/sec
string_dictionary/default                          1.35     68.5±2.68ms     3.8 GB/sec    1.00     50.7±2.32ms     5.1 GB/sec
string_dictionary/parquet_2                        1.01     56.6±4.88ms     4.6 GB/sec    1.00     55.8±1.63ms     4.6 GB/sec
string_dictionary/zstd                             1.02    220.1±4.41ms  1199.9 MB/sec    1.00    215.0±3.24ms  1228.4 MB/sec
string_dictionary/zstd_parquet_2                   1.00    200.1±1.51ms  1319.8 MB/sec    1.00    200.2±1.47ms  1319.6 MB/sec
string_non_null/bloom_filter                       1.02   272.4±21.37ms  1923.9 MB/sec    1.00   268.2±22.67ms  1953.6 MB/sec
string_non_null/cdc                                1.00   276.1±10.37ms  1897.6 MB/sec    1.01   279.9±11.35ms  1872.1 MB/sec
string_non_null/default                            1.00   142.4±11.79ms     3.6 GB/sec    1.07   153.1±15.37ms     3.3 GB/sec
string_non_null/parquet_2                          1.00    141.6±5.32ms     3.6 GB/sec    1.08    153.4±3.52ms     3.3 GB/sec
string_non_null/zstd                               1.00    564.5±8.88ms   928.3 MB/sec    1.03   578.6±16.42ms   905.6 MB/sec
string_non_null/zstd_parquet_2                     1.00    522.2±9.31ms  1003.5 MB/sec    1.00    521.9±8.51ms  1004.0 MB/sec
struct_all_null/bloom_filter                       1.00      9.1±0.07ms  1781.4 MB/sec    1.78     16.1±0.07ms  1002.8 MB/sec
struct_all_null/cdc                                1.00     15.6±0.15ms  1032.0 MB/sec    1.44     22.6±0.16ms   714.2 MB/sec
struct_all_null/default                            1.00      8.7±0.03ms  1851.1 MB/sec    1.80     15.7±0.04ms  1028.8 MB/sec
struct_all_null/parquet_2                          1.00      8.7±0.02ms  1847.3 MB/sec    1.79     15.6±0.05ms  1030.7 MB/sec
struct_all_null/zstd                               1.00      8.8±0.04ms  1837.8 MB/sec    1.80     15.8±0.06ms  1023.7 MB/sec
struct_all_null/zstd_parquet_2                     1.00      8.8±0.03ms  1841.5 MB/sec    1.79     15.7±0.06ms  1027.0 MB/sec
struct_non_null/bloom_filter                       1.00     56.8±1.51ms   281.7 MB/sec    1.16     65.8±1.08ms   243.2 MB/sec
struct_non_null/cdc                                1.00     52.2±0.84ms   306.4 MB/sec    1.14     59.3±0.93ms   269.9 MB/sec
struct_non_null/default                            1.00     39.8±0.79ms   402.2 MB/sec    1.18     46.9±0.70ms   341.3 MB/sec
struct_non_null/parquet_2                          1.00     49.0±0.51ms   326.7 MB/sec    1.13     55.2±0.76ms   289.6 MB/sec
struct_non_null/zstd                               1.00     48.5±0.83ms   329.7 MB/sec    1.15     56.0±0.77ms   285.8 MB/sec
struct_non_null/zstd_parquet_2                     1.00     62.1±0.90ms   257.6 MB/sec    1.11     68.7±0.88ms   232.9 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00     11.8±0.28ms  1361.0 MB/sec    1.65     19.6±0.05ms   822.5 MB/sec
struct_sparse_99pct_null/cdc                       1.00     18.2±0.35ms   884.3 MB/sec    1.41     25.7±0.28ms   628.5 MB/sec
struct_sparse_99pct_null/default                   1.00     11.1±0.20ms  1450.6 MB/sec    1.67     18.6±0.14ms   866.4 MB/sec
struct_sparse_99pct_null/parquet_2                 1.00     11.3±0.16ms  1425.1 MB/sec    1.64     18.5±0.18ms   870.0 MB/sec
struct_sparse_99pct_null/zstd                      1.00     12.5±0.21ms  1292.3 MB/sec    1.61     20.0±0.15ms   805.0 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00     12.0±0.22ms  1343.7 MB/sec    1.61     19.3±0.20ms   833.7 MB/sec

Resource Usage

base (merge-base)

Metric         Value
Wall time      2015.4s
Peak memory    6.6 GiB
Avg memory     6.4 GiB
CPU user       1921.3s
CPU sys        91.6s
Peak spill     0 B

branch

Metric         Value
Wall time      1965.4s
Peak memory    6.6 GiB
Avg memory     6.4 GiB
CPU user       1882.2s
CPU sys        81.6s
Peak spill     0 B

Contributor

@alamb alamb left a comment

I think it looks very nice personally. The only remaining question in my mind is whether we should be super safe and restore put with a deprecation note.

I think if we want to include it in the next arrow release (58.2.0; I am hoping to cut the release in the next day or two), we should include the deprecation.

If we are OK with waiting until arrow 59 (in a month or so), we can probably not worry about put.

/// Put/encode levels vector into this level encoder.
/// Returns number of encoded values that are less than or equal to length of the
/// input buffer.
/// Put/encode levels vector into this level encoder and call
Contributor

Shall we add a deprecated put here that just calls through to put_with_observer?
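
The suggestion above could look roughly like the following. This is only a sketch under stated assumptions: ToyLevelEncoder, its runs field, and encode_with_histogram are hypothetical stand-ins invented for illustration, not the parquet crate's LevelEncoder, and the observer is assumed to receive a (level, count) pair per run, matching the FnMut(i16, usize) shape from the PR description.

```rust
/// Toy stand-in for the encoder discussed above. It is NOT the parquet
/// crate's `LevelEncoder`: it only records (level, run_length) pairs.
pub struct ToyLevelEncoder {
    pub runs: Vec<(i16, usize)>,
}

impl ToyLevelEncoder {
    pub fn new() -> Self {
        Self { runs: Vec::new() }
    }

    /// Encodes `levels` as runs and reports each run to `observer` as
    /// `(level, count)` (an assumed contract). Returns the number of
    /// levels consumed.
    pub fn put_with_observer<F>(&mut self, levels: &[i16], mut observer: F) -> usize
    where
        F: FnMut(i16, usize),
    {
        let mut start = 0;
        while start < levels.len() {
            let level = levels[start];
            // Extend the run while the level value repeats.
            let mut end = start + 1;
            while end < levels.len() && levels[end] == level {
                end += 1;
            }
            self.runs.push((level, end - start));
            observer(level, end - start);
            start = end;
        }
        levels.len()
    }

    /// Deprecated shim: forwards to `put_with_observer` with a no-op observer.
    #[deprecated(note = "use `put_with_observer` instead")]
    pub fn put(&mut self, levels: &[i16]) -> usize {
        self.put_with_observer(levels, |_, _| {})
    }
}

/// Hypothetical helper: builds a level histogram in the same pass as encoding.
pub fn encode_with_histogram(
    enc: &mut ToyLevelEncoder,
    levels: &[i16],
    max_level: usize,
) -> Vec<usize> {
    let mut hist = vec![0usize; max_level + 1];
    enc.put_with_observer(levels, |level, count| hist[level as usize] += count);
    hist
}
```

With a shim like this, existing callers of put would keep compiling (with a deprecation warning), while new callers pass a closure to fold counting or histogram updates into the encoding pass.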

Contributor Author

@HippoBaro HippoBaro Apr 24, 2026

Not sure how you manage the release process. I’m happy to go either way as long as it doesn’t delay merging to main. If both options allow that, I don’t have a preference. Otherwise, I’d rather formally deprecate put than wait to remove it.

@alamb
Contributor

alamb commented Apr 24, 2026

Thanks @etseidl and @HippoBaro

Labels

parquet Changes to the parquet crate

4 participants