chore: Update to arrow/parquet 59.0.0 #22744
  .bloom_filter_properties(&ColumnPath::from(""))
  .expect("expected bloom properties!")
- .fpp,
+ .fpp(),
These fields are made private in
  };
  if let Some(bloom_filter_ndv) = bloom_filter_ndv {
-     builder = builder.set_bloom_filter_ndv(*bloom_filter_ndv);
+     builder = builder.set_bloom_filter_max_ndv(*bloom_filter_ndv);
Due to feat(parquet): add BloomFilterPropertiesBuilder (arrow-rs#9877), which deprecated set_bloom_filter_ndv
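For context on why the NDV setting matters: the number of bits a bloom filter needs grows with the expected number of distinct values (NDV) and shrinks as the tolerated false-positive probability (fpp) rises. Below is a minimal sketch, assuming the commonly cited split-block approximation with 8 bits set per value. The function name `approx_bloom_bits` is hypothetical, and the formula is an assumption here, not something confirmed in this thread or copied from the parquet crate.

```rust
/// Hypothetical helper (not a parquet crate API): approximate number of bits
/// for a split-block bloom filter, assuming the approximation
/// bits = -8 * ndv / ln(1 - fpp^(1/8)).
fn approx_bloom_bits(ndv: u64, fpp: f64) -> f64 {
    // The formula is only meaningful for a probability strictly between 0 and 1.
    assert!(fpp > 0.0 && fpp < 1.0, "fpp must be in (0, 1)");
    -8.0 * ndv as f64 / (1.0 - fpp.powf(1.0 / 8.0)).ln()
}
```

Under this assumption the estimate scales linearly with NDV, so whatever NDV value is configured has a direct effect on the filter's size.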
| ndv: DEFAULT_BLOOM_FILTER_NDV | ||
| }), | ||
| Some( | ||
| &BloomFilterProperties::builder() |
Properties are now built with builder:
| &[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], | ||
| &[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 5, 6], | ||
| ]))], | ||
| vec![Arc::new( |
The From impl was removed because it could panic
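As a general illustration of why a panicking `From` tends to be replaced by a fallible conversion, here is a minimal sketch. The thread does not say which impl was removed, so the types below (`FixedWidthRows`, `WidthMismatch`) are hypothetical stand-ins and not arrow or parquet APIs. The idea is that `From` must always succeed, so input that can be invalid (here, rows of differing widths) is better served by `TryFrom`, which returns an error instead.

```rust
use std::convert::TryFrom;

/// Hypothetical fixed-width byte container, for illustration only.
#[derive(Debug, PartialEq)]
struct FixedWidthRows {
    width: usize,
    data: Vec<u8>,
}

/// Hypothetical error describing the first row whose width differs.
#[derive(Debug, PartialEq)]
struct WidthMismatch {
    row: usize,
    expected: usize,
    found: usize,
}

impl<'a> TryFrom<Vec<&'a [u8]>> for FixedWidthRows {
    type Error = WidthMismatch;

    fn try_from(rows: Vec<&'a [u8]>) -> Result<Self, Self::Error> {
        // The first row fixes the width; an empty input has width 0.
        let width = rows.first().map_or(0, |r| r.len());
        let mut data = Vec::with_capacity(width * rows.len());
        for (row, bytes) in rows.iter().enumerate() {
            if bytes.len() != width {
                // A `From` impl would have to panic here; `TryFrom` reports it.
                return Err(WidthMismatch { row, expected: width, found: bytes.len() });
            }
            data.extend_from_slice(bytes);
        }
        Ok(FixedWidthRows { width, data })
    }
}
```

Callers then decide how to handle bad input, for example `FixedWidthRows::try_from(rows)?` in library code or `.unwrap()` in a test where the input is known to be valid.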
| scale: 2, | ||
| precision: 9, | ||
| }) | ||
| .with_logical_type(LogicalType::decimal(2, 9)) |
have to use new helpers added in
| ] | ||
|
|
||
| [[package]] | ||
| name = "integer-encoding" |
yay for removing older deps
| "cfg-if", | ||
| ] | ||
|
|
||
| [[package]] |
no more thrift! We now use the entirely new thrift encoder and not the thrift generator
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
55c7eb3 to e616241
| | alltypes_plain.parquet | 1851 | 8882 | 2 | page_index=false | | ||
| | alltypes_tiny_pages.parquet | 454233 | 269074 | 2 | page_index=true | | ||
| | lz4_raw_compressed_larger.parquet | 380836 | 1339 | 2 | page_index=false | | ||
| | alltypes_plain.parquet | 1851 | 8794 | 2 | page_index=false | |
I think this changed (smaller in-memory size) because of the representation change of CompressionCodec in this PR.
Before, the type was Compression, which also carries the compression level: ZSTD(ZstdLevel), GZIP(GzipLevel), BROTLI(BrotliLevel), where ZstdLevel(i32), GzipLevel(u32), and BrotliLevel(u32) are 4-byte wrappers. So Compression = 4-byte discriminant + 4-byte level = 8 bytes.
Now it is a fieldless enum, CompressionCodec, which takes 1 byte.
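The size difference described in this comment can be checked with `std::mem::size_of`. This is a minimal sketch using hypothetical stand-in types that only mirror the shapes mentioned above (`CompressionWithLevel` and `CodecOnly` are made-up names, and the variant lists are illustrative, not the parquet crate's definitions). Rust does not guarantee the layout of default-repr enums, so the sizes are what current compilers typically produce.

```rust
use std::mem::size_of;

// Hypothetical 4-byte level wrappers, mirroring the shapes named in the comment.
#[allow(dead_code)]
struct GzipLevel(u32);
#[allow(dead_code)]
struct BrotliLevel(u32);
#[allow(dead_code)]
struct ZstdLevel(i32);

// Hypothetical stand-in for an enum whose variants carry a level payload.
#[allow(dead_code)]
enum CompressionWithLevel {
    Uncompressed,
    Snappy,
    Gzip(GzipLevel),
    Brotli(BrotliLevel),
    Zstd(ZstdLevel),
}

// Hypothetical stand-in for a fieldless enum that names only the codec.
#[allow(dead_code)]
enum CodecOnly {
    Uncompressed,
    Snappy,
    Gzip,
    Brotli,
    Zstd,
}

/// Returns (size of the payload-carrying enum, size of the fieldless enum) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<CompressionWithLevel>(), size_of::<CodecOnly>())
}
```

The payload-carrying enum needs room for a tag plus a 4-byte-aligned level, while the fieldless one needs only the tag, which is consistent with the direction of the drop in the reported memory size.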
| Total Requests: 2 | ||
| - GET (opts) path=parquet_table.parquet head=true | ||
| - GET (ranges) path=parquet_table.parquet ranges=1064-1481,1481-1594,1594-2011,2011-2124 | ||
| - GET (ranges) path=parquet_table.parquet ranges=1064-1594,1594-2124 |
this seems like an improvement -- contiguous ranges are coalesced into fewer ranges. I tracked it down to this PR from @HippoBaro
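To illustrate what coalescing contiguous ranges means, here is a minimal sketch. The function `coalesce` is hypothetical and is not the upstream implementation. Note that the log shows each adjacent pair merged (four ranges became two) rather than all four collapsing into one, so the sketch assumes merging is applied per group of ranges. How the upstream change actually groups them is not stated in this thread.

```rust
use std::ops::Range;

/// Hypothetical helper: merge byte ranges that touch or overlap.
/// Ranges are treated as half-open (`start..end`); input need not be sorted.
fn coalesce(mut ranges: Vec<Range<u64>>) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            // `r` starts at or before the end of the previous range: extend it.
            if r.start <= last.end {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        // Otherwise there is a gap, so start a new output range.
        merged.push(r);
    }
    merged
}
```

Applied to one assumed group, `coalesce(vec![1064..1481, 1481..1594])` yields the single range `1064..1594`, matching the first range in the new log line.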
e616241 to 4d274ef
run benchmarks
🤖 Benchmark running (GKE): comparing alamb/update_arrow_59 (4d274ef) to 18e7c8e (merge-base) using tpch
🤖 Benchmark running (GKE): comparing alamb/update_arrow_59 (4d274ef) to 18e7c8e (merge-base) using tpcds
🤖 Benchmark running (GKE): comparing alamb/update_arrow_59 (4d274ef) to 18e7c8e (merge-base) using clickbench_partitioned
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
run benchmark clickbench_partitioned
🤖 Benchmark running (GKE): comparing alamb/update_arrow_59 (4d274ef) to 18e7c8e (merge-base) using clickbench_partitioned
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Performance looks about the same
Which issue does this PR close?
59.0.0 (May 2026) arrow-rs#9110
WIP: I am using this PR to test the arrow release
Rationale for this change
Update to latest version of arrow/parquet
What changes are included in this PR?
Are these changes tested?
By CI
Are there any user-facing changes?
New dependency