perf: Implement groups accumulator count distinct primitive types#21561
perf: Implement groups accumulator count distinct primitive types#21561coderfender wants to merge 20 commits intoapache:mainfrom
Conversation
|
Requesting to run benchmarks TPCH , TPCDS , clickbench etc to see how groups accumulator impl performs in count(distinct) use case |
|
run benchmark tpch tpcds clickbench_partitiomed clickbemch_extended |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (f0b2a4a) to 16e578d (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (f0b2a4a) to 16e578d (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (f0b2a4a) to 16e578d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (f0b2a4a) to 16e578d (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
1 similar comment
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
5692893 to
ee0c865
Compare
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
@Dandandan could you retrigger the benchmarks again please? |
|
run benchmark tpch tpcds clickbench_partitioned clickbench_extended |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (ee0c865) to 16e578d (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (ee0c865) to 16e578d (merge-base) diff using: clickbench_extended File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (ee0c865) to 16e578d (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (ee0c865) to 16e578d (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_extended — base (merge-base)
clickbench_extended — branch
File an issue against this benchmark runner |
5b7f593 to
067a6b6
Compare
|
Comparison with main |
2aa6549 to
cbfee35
Compare
cbfee35 to
c2bc9d3
Compare
|
@Dandandan , could you re run the benchmarks please ? I do see a consistent speedup with Q9 and seems like this PR is ready for review |
|
run benchmark clickbench_partitioned |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (47110a8) to 882be98 (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
Thank you for the approval @Dandandan . but it doesn't seem like the benchmarks ran correctly. Could you please trigger it again and perhaps even toch tpcds and other relevant queries please? |
| offsets.push(all_values.len() as i32); | ||
| } | ||
|
|
||
| let values_array = Arc::new(PrimitiveArray::<T>::from_iter_values(all_values)); |
There was a problem hiding this comment.
| let values_array = Arc::new(PrimitiveArray::<T>::from_iter_values(all_values)); | |
| let values_array = Arc::new(PrimitiveArray::<T>::new(ScalarBuffer::from(all_values), None)); |
There was a problem hiding this comment.
This is faster as well @coderfender (avoids a copy).
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (066504d) to 5a427cb (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (066504d) to 5a427cb (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (066504d) to 5a427cb (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
@Dandandan , pushed both optimizations. |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (617e790) to 5a427cb (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (617e790) to 5a427cb (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (617e790) to 5a427cb (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
Results seem to be faster now |
Which issue does this PR close?
Evaluate perf with group accumulators for count distinct
Related : #21575 benchmark PR
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?