perf: Optimize approx count distinct using bitmaps instead of HLL for smaller int datatypes#21453
perf: Optimize approx count distinct using bitmaps instead of HLL for smaller int datatypes#21453coderfender wants to merge 18 commits intoapache:mainfrom
Conversation
|
|
Removed bitmap implementation for u8 / i8 but held on to it for u16 / i16 |
|
Requesting review : @martin-g |
|
u16 / i16 are cc : @neilconway , @Dandandan |
|
It seems like my bitmap setup was suboptimal for (Adding |
…smaller datatypes (#21456) ## Which issue does this PR close? Remove hashset based accumulators for smaller int data types and use bitmaps. Follow up of : #21453 <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #21488 ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Jefffrey
left a comment
There was a problem hiding this comment.
With #21456 landed, should we be rewriting the expression itself from approx_count_distinct -> count_distinct for these cases to reuse the code? Unless there's some differences I haven't spotted between this PR and the other?
|
Sure. I think we're shortly be able to extract same part gains hut changing wiring rather than implementing similar bit map code |
|
Seems like state is binary array while count distinct's accumulator uses vec of ints . Let me patch that and see if the tests pass |
|
I was thinking along the lines of utilizing fn simplify(&self) -> Option<AggregateFunctionSimplification> {
Some(Box::new(|aggregate_function, info| {
let input_type = info.get_data_type(&aggregate_function.params.args[0])?;
match input_type {
DataType::UInt8 | DataType::Int8 | DataType::UInt16 | DataType::Int16 => {
let rewritten = Expr::AggregateFunction(AggregateFunction::new_udf(
count_udaf(),
aggregate_function.params.args,
aggregate_function.params.distinct,
aggregate_function.params.filter,
aggregate_function.params.order_by,
aggregate_function.params.null_treatment,
));
Ok(rewritten)
}
_ => Ok(Expr::AggregateFunction(aggregate_function)),
}
}))
}Which can avoid the need of a wrapper. Also for the boolean accumulator we could make regular count distinct use that too and include boolean type as part of the simplify call 🤔 |
|
Ahh I see . I was trying compare benchmarks FYI but let me try and switch to using count distinct then |
|
@Jefffrey , the catch between |
|
That's a good point; I believe the simplify API for UDAFs don't allow introducing a cast to the final result, so I guess it rules out that approach. It does make me wonder why |
|
run benchmark approx_distinct |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing bitmap_instead_hll_smaller_int_types (e0f2aa9) to afc0784 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
Jefffrey
left a comment
There was a problem hiding this comment.
The changes for u8/i8/u16/i16 look good, benchmarks look good too. Just some remarks on the boolean related code here.
|
run benchmark approx_distinct |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing bitmap_instead_hll_smaller_int_types (d2b57e5) to 1fbbba5 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
Which issue does this PR close?
approx_distinctshould be leveraging bitmap for counting u8/16 and i8/16 #1109Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?