parquet: fix panic in DeltaByteArrayDecoder on invalid prefix lengths #9797
pchintar wants to merge 3 commits into apache:main
Conversation
Force-pushed from c463a0a to e0201f5
```rust
let prefix_len_i32 = self.prefix_lengths[self.current_idx];
if prefix_len_i32 < 0 {
    return Err(general_err!(
        "Invalid DELTA_BYTE_ARRAY prefix length {}",
        prefix_len_i32
    ));
}
let prefix_len = prefix_len_i32 as usize;
```
Suggested change:

```diff
-let prefix_len_i32 = self.prefix_lengths[self.current_idx];
-if prefix_len_i32 < 0 {
-    return Err(general_err!(
-        "Invalid DELTA_BYTE_ARRAY prefix length {}",
-        prefix_len_i32
-    ));
-}
-let prefix_len = prefix_len_i32 as usize;
+let prefix_len = usize::try_from(self.prefix_lengths[self.current_idx])?;
```
A little less verbose. 😄
I notice the added test doesn't cover this possibility.
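For context on why the single-line suggestion is equivalent: `usize::try_from` on a negative `i32` returns `Err(TryFromIntError)`, so negative prefix lengths are rejected without an explicit `< 0` check. A standalone sketch:

```rust
fn main() {
    // Negative values cannot be represented as usize, so the conversion fails.
    assert!(usize::try_from(-1i32).is_err());
    // Non-negative values convert cleanly.
    assert_eq!(usize::try_from(7i32), Ok(7usize));
    println!("ok");
}
```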
Thanks @etseidl for the suggestion — updated to use usize::try_from(...) with proper error mapping.
Also added an additional test test_delta_byte_array_negative_prefix_len_returns_error to cover the negative prefix length case so that path is now exercised as well.
run benchmark parquet_round_trip
Hi @etseidl, your benchmark configuration could not be parsed (#9797 (comment)). Error:

Supported benchmarks:
Usage: Per-side configuration:

```
env:
  SHARED_SETTING: enabled
baseline:
  ref: v45.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  ref: v46.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G
```

File an issue against this benchmark runner
run benchmark parquet_round_trip env:
BENCH_FILTER:read.*delta_byte
Hi @etseidl, your benchmark configuration could not be parsed (#9797 (comment)). Error: same supported-benchmarks usage message as above.
run benchmark parquet_round_trip env:
BENCH_FILTER: read.*delta_byte
🤖 Arrow criterion benchmark running (GKE). Comparing delta_byte_array (542824b) to b93240a (merge-base).
🤖 Arrow criterion benchmark completed (GKE).
Thank you @etseidl
Force-pushed from 542824b to 69fcdef
Which issue does this PR close?
Rationale for this change
Currently, `DeltaByteArrayDecoder::get` assumes prefix lengths are always valid and directly slices `previous_value`. Invalid prefix lengths (negative, or exceeding the previous value's length) can cause a panic instead of returning an error.

What changes are included in this PR?
Add validation for decoded prefix lengths:
- Reject negative prefix lengths and prefix lengths exceeding `previous_value.len()`
- Return `Err` instead of panicking on invalid input
- Add a regression test using corrupted encoded data
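The validation described above can be sketched as follows. This is a hypothetical, simplified helper, not the exact arrow-rs code; the names `prefix_lengths` and `previous_value` follow the diff shown earlier in the conversation:

```rust
// Simplified sketch of the prefix-length validation this PR adds.
// Hypothetical helper; the real logic lives inside DeltaByteArrayDecoder::get.
fn validated_prefix_len(
    prefix_len_i32: i32,
    previous_value: &[u8],
) -> Result<usize, String> {
    // Negative lengths cannot convert to usize, so try_from rejects them.
    let prefix_len = usize::try_from(prefix_len_i32)
        .map_err(|_| format!("Invalid DELTA_BYTE_ARRAY prefix length {prefix_len_i32}"))?;
    // A prefix longer than the previous value would panic when slicing.
    if prefix_len > previous_value.len() {
        return Err(format!(
            "DELTA_BYTE_ARRAY prefix length {prefix_len} exceeds previous value length {}",
            previous_value.len()
        ));
    }
    Ok(prefix_len)
}

fn main() {
    let prev = b"hello";
    assert_eq!(validated_prefix_len(3, prev), Ok(3));
    assert!(validated_prefix_len(-1, prev).is_err()); // negative: Err, not panic
    assert!(validated_prefix_len(6, prev).is_err()); // too long: Err, not panic
    println!("ok");
}
```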
Are these changes tested?
Yes.
- Added `test_delta_byte_array_invalid_prefix_len_returns_error`
- The test verifies decoding returns `Err` (it previously panicked)
- All the other existing tests pass
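A minimal sketch of what such a regression test asserts, using a hypothetical stand-in for the decoder's prefix handling (the actual test in this PR feeds corrupted encoded page data to the real decoder):

```rust
// Hypothetical stand-in for how the decoder applies a prefix length to the
// previous value; `get(..len)` slices safely, returning None out of range.
fn apply_prefix(previous: &[u8], prefix_len: i32) -> Result<Vec<u8>, String> {
    let len = usize::try_from(prefix_len)
        .map_err(|_| format!("Invalid DELTA_BYTE_ARRAY prefix length {prefix_len}"))?;
    previous
        .get(..len)
        .map(|p| p.to_vec())
        .ok_or_else(|| format!("prefix length {len} exceeds previous value length"))
}

fn main() {
    // Valid prefix: a slice of the previous value.
    assert_eq!(apply_prefix(b"apple", 3), Ok(b"app".to_vec()));
    // Negative prefix length: Err, not a panic.
    assert!(apply_prefix(b"apple", -4).is_err());
    // Prefix longer than the previous value: Err, not a panic.
    assert!(apply_prefix(b"apple", 10).is_err());
    println!("ok");
}
```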
Are there any user-facing changes?
No.