perf: Optimize approx count distinct using bitmaps instead of HLL for smaller int datatypes #21453

Open
coderfender wants to merge 18 commits into apache:main from
coderfender:bitmap_instead_hll_smaller_int_types

Conversation

@coderfender
Contributor

@coderfender coderfender commented Apr 8, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@coderfender
Contributor Author

  │ Type │   Change    │   Verdict    │
  ├──────┼─────────────┼──────────────┤
  │ u8   │ +20% slower │    Regressed │
  ├──────┼─────────────┼──────────────┤
  │ i8   │ +14% slower │    Regressed │
  ├──────┼─────────────┼──────────────┤
  │ u16  │ -40% faster │    Improved! │
  ├──────┼─────────────┼──────────────┤
  │ i16  │ -37% faster │    Improved! │

@github-actions github-actions bot added the functions Changes to functions implementation label Apr 8, 2026
@coderfender
Contributor Author

Removed the bitmap implementation for u8/i8 but kept it for u16/i16.
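For readers unfamiliar with the technique, here is a minimal sketch of an exact distinct counter for 16-bit integers backed by a bitmap. It is not the PR's code: the type name `Bitmap16` and its methods are hypothetical, and the layout (1,024 `u64` words, i.e. 8 KiB) is an assumption chosen for illustration.

```rust
/// Hypothetical exact distinct counter for 16-bit integers.
/// One bit per possible value: 65,536 bits = 1,024 u64 words = 8 KiB.
pub struct Bitmap16 {
    words: Box<[u64; 1024]>,
}

impl Bitmap16 {
    pub fn new() -> Self {
        Self { words: Box::new([0u64; 1024]) }
    }

    /// Mark a u16 value as seen: pick the word (idx / 64) and set the bit (idx % 64).
    pub fn insert_u16(&mut self, v: u16) {
        let idx = v as usize;
        self.words[idx >> 6] |= 1u64 << (idx & 63);
    }

    /// i16 values share the same bitmap by reinterpreting their bits as u16.
    pub fn insert_i16(&mut self, v: i16) {
        self.insert_u16(v as u16);
    }

    /// Union with another partial state (bitwise OR, word by word).
    pub fn merge(&mut self, other: &Bitmap16) {
        for (a, b) in self.words.iter_mut().zip(other.words.iter()) {
            *a |= *b;
        }
    }

    /// Number of distinct values seen so far (population count over all words).
    pub fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }
}
```

Because every possible value has its own bit, the result is exact rather than approximate, and merging partial states is a plain bitwise OR.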

@coderfender
Contributor Author

Requesting review: @martin-g

@coderfender
Contributor Author

critcmp results:

u16/i16 are about 2x faster :)

critcmp bitmap main
group                                          bitmap                                 main
-----                                          ------                                 ----
approx_distinct i16 bitmap                     1.00      3.0±0.01µs        ? ?/sec    2.00      5.9±0.08µs        ? ?/sec
approx_distinct i64 80% distinct               1.00      5.6±0.15µs        ? ?/sec    1.03      5.8±0.14µs        ? ?/sec
approx_distinct i64 99% distinct               1.00      5.8±0.12µs        ? ?/sec    1.01      5.8±0.17µs        ? ?/sec
approx_distinct i8 bitmap                      1.00      6.0±0.28µs        ? ?/sec    1.00      5.9±0.06µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      3.0±0.24µs        ? ?/sec    1.93      5.8±0.23µs        ? ?/sec
approx_distinct u8 bitmap                      1.00      5.8±0.08µs        ? ?/sec    1.01      5.8±0.34µs        ? ?/sec
approx_distinct utf8 long 80% distinct         1.00     16.3±0.08µs        ? ?/sec    1.00     16.2±0.49µs        ? ?/sec
approx_distinct utf8 long 99% distinct         1.01     16.3±0.05µs        ? ?/sec    1.00     16.2±0.23µs        ? ?/sec
approx_distinct utf8 short 80% distinct        1.02     11.1±0.14µs        ? ?/sec    1.00     11.0±0.08µs        ? ?/sec
approx_distinct utf8 short 99% distinct        1.01     11.2±0.67µs        ? ?/sec    1.00     11.0±0.48µs        ? ?/sec
approx_distinct utf8view long 80% distinct     1.04     19.7±0.58µs        ? ?/sec    1.00     19.0±0.49µs        ? ?/sec
approx_distinct utf8view long 99% distinct     1.04     19.7±0.35µs        ? ?/sec    1.00     19.0±0.19µs        ? ?/sec
approx_distinct utf8view short 80% distinct    1.00      6.3±0.06µs        ? ?/sec    1.00      6.3±0.40µs        ? ?/sec
approx_distinct utf8view short 99% distinct    1.02      6.4±0.17µs        ? ?/sec    1.00      6.2±0.22µs        ? ?/sec

cc: @neilconway, @Dandandan

@coderfender
Contributor Author

coderfender commented Apr 8, 2026

It seems my bitmap setup was suboptimal for u8/i8. Instead of using [u8;4], I stopped bothering with the dense packing, which might cause cache misses, and went with [bool; 256]. This sped up the operation significantly, and we are now roughly 2x to 2.9x faster than HLL for the smaller integer data types.
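The two layouts being contrasted can be sketched as follows. This is an illustration under assumptions, not the PR's code: `BoolSet8` and `PackedSet8` are hypothetical names, and the packed variant uses `[u64; 4]` (256 bits) because the exact packed layout used in the PR is not shown in this thread.

```rust
/// Byte-per-value variant: 256 bytes; each insert is a single indexed store.
pub struct BoolSet8 {
    seen: [bool; 256],
}

impl BoolSet8 {
    pub fn new() -> Self {
        Self { seen: [false; 256] }
    }

    pub fn insert(&mut self, v: u8) {
        self.seen[v as usize] = true;
    }

    /// i8 values map onto the same 256 slots via a bit-preserving cast.
    pub fn insert_i8(&mut self, v: i8) {
        self.insert(v as u8);
    }

    pub fn count(&self) -> u64 {
        self.seen.iter().filter(|&&s| s).count() as u64
    }
}

/// Bit-packed variant: 32 bytes; each insert is shift + mask + read-modify-write.
pub struct PackedSet8 {
    words: [u64; 4],
}

impl PackedSet8 {
    pub fn new() -> Self {
        Self { words: [0; 4] }
    }

    pub fn insert(&mut self, v: u8) {
        let i = v as usize;
        self.words[i >> 6] |= 1u64 << (i & 63);
    }

    pub fn insert_i8(&mut self, v: i8) {
        self.insert(v as u8);
    }

    pub fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }
}
```

Both variants return the same exact count. They differ only in memory footprint and in the work done per insert, which is the trade-off the benchmark numbers in this thread are measuring.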

(Adding critcmp results for only the relevant data types.)

group                                            branch                               main
-----                                            -------                              ----
approx_distinct i16 bitmap                     1.00      3.1±0.23µs        ? ?/sec    1.94      5.9±0.08µs        ? ?/sec
approx_distinct i8 bitmap                      1.00      2.1±0.17µs        ? ?/sec    2.87      5.9±0.06µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      3.0±0.05µs        ? ?/sec    1.95      5.8±0.23µs        ? ?/sec
approx_distinct u8 bitmap                      1.00      2.2±0.18µs        ? ?/sec    2.69      5.8±0.34µs        ? ?/sec

@coderfender coderfender changed the title perf: Bitmap instead hll smaller int types perf: Count distinct bitmap instead hll smaller int types Apr 8, 2026
@coderfender coderfender changed the title perf: Count distinct bitmap instead hll smaller int types perf: approx_distinct bitmap instead hll smaller int types Apr 8, 2026
@coderfender coderfender changed the title perf: approx_distinct bitmap instead hll smaller int types perf: approx_distinct bitmap instead of hll smaller int types Apr 9, 2026
@coderfender coderfender changed the title perf: approx_distinct bitmap instead of hll smaller int types perf: Optimize approx count distinct using bitmaps instead of hashsets for smaller datatypes Apr 14, 2026
github-merge-queue bot pushed a commit that referenced this pull request Apr 16, 2026
…smaller datatypes (#21456)

## Which issue does this PR close?

Remove hashset-based accumulators for smaller int data types and use
bitmaps. Follow-up of #21453.

- Closes #21488

## Rationale for this change


## What changes are included in this PR?


## Are these changes tested?


## Are there any user-facing changes?


---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Contributor

@Jefffrey Jefffrey left a comment

With #21456 landed, should we be rewriting the expression itself from approx_count_distinct -> count_distinct for these cases to reuse the code? Unless there are some differences I haven't spotted between this PR and the other?

@coderfender
Contributor Author

Sure. I think we should shortly be able to get the same perf gains by changing the wiring rather than implementing similar bitmap code.

@coderfender
Contributor Author

It seems the state here is a binary array, while count distinct's accumulator uses a vec of ints. Let me patch that and see if the tests pass.
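To illustrate the mismatch, here is a rough sketch of the two intermediate-state encodings for a 256-bit set: an opaque byte blob versus an explicit list of the values seen. All function names are hypothetical, and the encodings are assumptions made for illustration; neither is taken from the DataFusion code.

```rust
/// Encode a 256-bit bitmap as 32 little-endian bytes (a "binary" state).
pub fn to_binary_state(words: &[u64; 4]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Decode the binary state; returns None if the length is not exactly 32 bytes.
pub fn from_binary_state(bytes: &[u8]) -> Option<[u64; 4]> {
    if bytes.len() != 32 {
        return None;
    }
    let mut words = [0u64; 4];
    for (w, chunk) in words.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        *w = u64::from_le_bytes(buf);
    }
    Some(words)
}

/// Encode the same set as an ascending list of the values that were seen.
pub fn to_value_list(words: &[u64; 4]) -> Vec<u8> {
    (0..=255u8)
        .filter(|&v| {
            let i = v as usize;
            words[i >> 6] & (1u64 << (i & 63)) != 0
        })
        .collect()
}

/// Rebuild the bitmap from a list of values; duplicates are harmless.
pub fn from_value_list(values: &[u8]) -> [u64; 4] {
    let mut words = [0u64; 4];
    for &v in values {
        let i = v as usize;
        words[i >> 6] |= 1u64 << (i & 63);
    }
    words
}
```

Both encodings describe the same set, so two accumulators can only exchange partial states if they agree on which one is used.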

@Jefffrey
Contributor

I was thinking along the lines of utilizing simplify on approx_distinct, like:

    fn simplify(&self) -> Option<AggregateFunctionSimplification> {
        Some(Box::new(|aggregate_function, info| {
            let input_type = info.get_data_type(&aggregate_function.params.args[0])?;
            match input_type {
                DataType::UInt8 | DataType::Int8 | DataType::UInt16 | DataType::Int16 => {
                    let rewritten = Expr::AggregateFunction(AggregateFunction::new_udf(
                        count_udaf(),
                        aggregate_function.params.args,
                        aggregate_function.params.distinct,
                        aggregate_function.params.filter,
                        aggregate_function.params.order_by,
                        aggregate_function.params.null_treatment,
                    ));
                    Ok(rewritten)
                }
                _ => Ok(Expr::AggregateFunction(aggregate_function)),
            }
        }))
    }

This would avoid the need for a wrapper. Also, for the boolean accumulator, we could make regular count distinct use it too and include the boolean type in the simplify call 🤔
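As a sketch of why a boolean input needs almost no state, a distinct counter over booleans only has to remember whether `true` and `false` have each been seen. The type `BoolDistinct` below is hypothetical and is not the accumulator from the PR. NULLs are modelled as `None` and ignored, as in SQL's `COUNT(DISTINCT ...)`.

```rust
/// Hypothetical distinct counter for boolean inputs: two flags are enough.
#[derive(Default, Clone, Copy, Debug, PartialEq)]
pub struct BoolDistinct {
    seen_false: bool,
    seen_true: bool,
}

impl BoolDistinct {
    /// Record one input value; None (NULL) does not contribute.
    pub fn update(&mut self, v: Option<bool>) {
        match v {
            Some(true) => self.seen_true = true,
            Some(false) => self.seen_false = true,
            None => {}
        }
    }

    /// Combine with a partial state computed elsewhere.
    pub fn merge(&mut self, other: BoolDistinct) {
        self.seen_false |= other.seen_false;
        self.seen_true |= other.seen_true;
    }

    /// The result is always 0, 1, or 2.
    pub fn count(&self) -> u64 {
        u64::from(self.seen_false) + u64::from(self.seen_true)
    }
}
```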

@coderfender
Contributor Author

Ahh, I see. I was trying to compare benchmarks, FYI, but let me try switching to count distinct then.

group                                          after_cold                              before
-----                                          ----------                              ------
approx_distinct i16 bitmap                     1.00      3.0±0.11µs        ? ?/sec     1.92      5.7±0.17µs        ? ?/sec
approx_distinct i64 80% distinct               1.02      5.7±0.04µs        ? ?/sec     1.00      5.6±0.08µs        ? ?/sec
approx_distinct i64 99% distinct               1.01      5.8±0.06µs        ? ?/sec     1.00      5.7±0.21µs        ? ?/sec
approx_distinct i8 bitmap                      1.00   1103.7±5.90ns        ? ?/sec     5.35      5.9±0.24µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      3.9±0.23µs        ? ?/sec     1.51      5.9±0.15µs        ? ?/sec
approx_distinct u8 bitmap                      1.00  1142.4±117.79ns        ? ?/sec    5.00      5.7±0.08µs        ? ?/sec
approx_distinct utf8 long 80% distinct         1.01     16.1±0.32µs        ? ?/sec     1.00     16.0±0.08µs        ? ?/sec
approx_distinct utf8 long 99% distinct         1.00     16.1±0.41µs        ? ?/sec     1.01     16.3±0.51µs        ? ?/sec
approx_distinct utf8 short 80% distinct        1.00     10.9±0.21µs        ? ?/sec     1.01     11.0±0.81µs        ? ?/sec
approx_distinct utf8 short 99% distinct        1.01     11.0±0.06µs        ? ?/sec     1.00     10.9±0.43µs        ? ?/sec
approx_distinct utf8view long 80% distinct     1.00     18.4±0.32µs        ? ?/sec     1.01     18.7±0.51µs        ? ?/sec
approx_distinct utf8view long 99% distinct     1.00     18.5±0.14µs        ? ?/sec     1.01     18.6±0.33µs        ? ?/sec
approx_distinct utf8view short 80% distinct    1.02      6.2±0.10µs        ? ?/sec     1.00      6.0±0.39µs        ? ?/sec
approx_distinct utf8view short 99% distinct    1.07      6.4±0.25µs        ? ?/sec     1.00      6.0±0.15µs        ? ?/sec

@coderfender
Contributor Author

@Jefffrey, the catch between approx_distinct and count_distinct is that the former returns an unsigned int while the latter returns a signed one (hence the need for my wrapper). Although this makes little practical difference, I wanted to tackle that problem in a separate PR and go ahead with this wrapper approach. Using simplify means that I would have to add a cast between the signed and unsigned types, which shouldn't add too much overhead though. Let me push a commit and bench it.
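The wrapper idea can be sketched roughly as below: an inner accumulator reports a signed count, and a thin adapter delegates to it and converts the result to an unsigned value. Everything here is hypothetical and simplified. The `SignedCount` trait, `ExactU8`, and `UnsignedWrapper` are stand-ins for illustration and do not mirror DataFusion's `Accumulator` trait.

```rust
/// Hypothetical, simplified stand-in for an accumulator that reports a signed count.
pub trait SignedCount {
    fn update(&mut self, v: u8);
    fn evaluate(&self) -> i64;
}

/// Exact distinct counter over u8 values that reports its result as i64.
pub struct ExactU8 {
    seen: [bool; 256],
}

impl ExactU8 {
    pub fn new() -> Self {
        Self { seen: [false; 256] }
    }
}

impl SignedCount for ExactU8 {
    fn update(&mut self, v: u8) {
        self.seen[v as usize] = true;
    }

    fn evaluate(&self) -> i64 {
        self.seen.iter().filter(|&&s| s).count() as i64
    }
}

/// Adapter that reuses the inner accumulator but exposes an unsigned result.
pub struct UnsignedWrapper<A: SignedCount> {
    inner: A,
}

impl<A: SignedCount> UnsignedWrapper<A> {
    pub fn new(inner: A) -> Self {
        Self { inner }
    }

    pub fn update(&mut self, v: u8) {
        self.inner.update(v);
    }

    /// A distinct count is never negative; clamp at zero defensively before converting.
    pub fn evaluate(&self) -> u64 {
        self.inner.evaluate().max(0) as u64
    }
}
```

The adapter keeps the counting logic in one place and confines the signed-to-unsigned conversion to the final evaluation step.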

@Jefffrey
Contributor

That's a good point; I believe the simplify API for UDAFs doesn't allow introducing a cast on the final result, so I guess that rules out that approach. It does make me wonder why approx_distinct and count_distinct return different types, but I suppose that is a separate issue.

@Jefffrey
Contributor

run benchmark approx_distinct

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4273217532-1454-n59m9 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing bitmap_instead_hll_smaller_int_types (e0f2aa9) to afc0784 (merge-base) diff
BENCH_NAME=approx_distinct
BENCH_COMMAND=cargo bench --features=parquet --bench approx_distinct
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                          bitmap_instead_hll_smaller_int_types    main
-----                                          ------------------------------------    ----
approx_distinct i16 bitmap                     1.00      5.9±0.00µs        ? ?/sec     2.16     12.8±0.01µs        ? ?/sec
approx_distinct i64 80% distinct               1.00     12.8±0.01µs        ? ?/sec     1.00     12.8±0.01µs        ? ?/sec
approx_distinct i64 99% distinct               1.00     12.8±0.01µs        ? ?/sec     1.00     12.8±0.01µs        ? ?/sec
approx_distinct i8 bitmap                      1.00      4.3±0.02µs        ? ?/sec     2.96     12.8±0.01µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      5.9±0.00µs        ? ?/sec     2.17     12.8±0.06µs        ? ?/sec
approx_distinct u8 bitmap                      1.00      4.3±0.01µs        ? ?/sec     2.96     12.8±0.01µs        ? ?/sec
approx_distinct utf8 long 80% distinct         1.00     39.4±0.03µs        ? ?/sec     1.00     39.4±0.05µs        ? ?/sec
approx_distinct utf8 long 99% distinct         1.01     39.0±0.02µs        ? ?/sec     1.00     38.7±0.03µs        ? ?/sec
approx_distinct utf8 short 80% distinct        1.00     25.6±0.03µs        ? ?/sec     1.00     25.5±0.04µs        ? ?/sec
approx_distinct utf8 short 99% distinct        1.00     25.7±0.02µs        ? ?/sec     1.00     25.7±0.04µs        ? ?/sec
approx_distinct utf8view long 80% distinct     1.00     46.6±0.02µs        ? ?/sec     1.01     46.9±0.02µs        ? ?/sec
approx_distinct utf8view long 99% distinct     1.00     46.7±0.02µs        ? ?/sec     1.01     46.9±0.07µs        ? ?/sec
approx_distinct utf8view short 80% distinct    1.00     14.0±0.01µs        ? ?/sec     1.00     14.0±0.01µs        ? ?/sec
approx_distinct utf8view short 99% distinct    1.00     14.0±0.01µs        ? ?/sec     1.00     14.0±0.01µs        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 127.0s
Peak memory 3.5 GiB
Avg memory 3.5 GiB
CPU user 162.4s
CPU sys 0.9s
Peak spill 0 B

branch

Metric Value
Wall time 127.5s
Peak memory 3.5 GiB
Avg memory 3.5 GiB
CPU user 163.4s
CPU sys 0.3s
Peak spill 0 B

Contributor

@Jefffrey Jefffrey left a comment

The changes for u8/i8/u16/i16 look good, benchmarks look good too. Just some remarks on the boolean related code here.

Comment thread datafusion/functions-aggregate/src/approx_distinct.rs Outdated
Comment thread datafusion/functions-aggregate/src/approx_distinct.rs Outdated
Comment thread datafusion/functions-aggregate/src/approx_distinct.rs
@Jefffrey
Contributor

run benchmark approx_distinct

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4274941352-1488-9p9lx 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing bitmap_instead_hll_smaller_int_types (d2b57e5) to 1fbbba5 (merge-base) diff
BENCH_NAME=approx_distinct
BENCH_COMMAND=cargo bench --features=parquet --bench approx_distinct
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                          bitmap_instead_hll_smaller_int_types    main
-----                                          ------------------------------------    ----
approx_distinct i16 bitmap                     1.00      5.9±0.01µs        ? ?/sec     2.16     12.8±0.01µs        ? ?/sec
approx_distinct i64 80% distinct               1.00     12.8±0.01µs        ? ?/sec     1.00     12.8±0.01µs        ? ?/sec
approx_distinct i64 99% distinct               1.00     12.8±0.01µs        ? ?/sec     1.00     12.8±0.03µs        ? ?/sec
approx_distinct i8 bitmap                      1.00      4.3±0.03µs        ? ?/sec     2.97     12.8±0.00µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      5.9±0.00µs        ? ?/sec     2.18     12.9±0.06µs        ? ?/sec
approx_distinct u8 bitmap                      1.00      4.3±0.00µs        ? ?/sec     2.97     12.8±0.01µs        ? ?/sec
approx_distinct utf8 long 80% distinct         1.01     39.1±0.03µs        ? ?/sec     1.00     38.6±0.01µs        ? ?/sec
approx_distinct utf8 long 99% distinct         1.00     39.1±0.02µs        ? ?/sec     1.00     39.1±0.02µs        ? ?/sec
approx_distinct utf8 short 80% distinct        1.00     25.7±0.02µs        ? ?/sec     1.00     25.7±0.03µs        ? ?/sec
approx_distinct utf8 short 99% distinct        1.00     25.5±0.02µs        ? ?/sec     1.00     25.6±0.05µs        ? ?/sec
approx_distinct utf8view long 80% distinct     1.01     47.0±0.04µs        ? ?/sec     1.00     46.7±0.05µs        ? ?/sec
approx_distinct utf8view long 99% distinct     1.00     47.0±0.05µs        ? ?/sec     1.00     46.9±0.07µs        ? ?/sec
approx_distinct utf8view short 80% distinct    1.00     14.0±0.01µs        ? ?/sec     1.00     14.0±0.01µs        ? ?/sec
approx_distinct utf8view short 99% distinct    1.00     14.0±0.01µs        ? ?/sec     1.01     14.2±0.01µs        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 126.8s
Peak memory 3.5 GiB
Avg memory 3.5 GiB
CPU user 162.3s
CPU sys 0.9s
Peak spill 0 B

branch

Metric Value
Wall time 127.3s
Peak memory 3.5 GiB
Avg memory 3.5 GiB
CPU user 163.4s
CPU sys 0.3s
Peak spill 0 B

@coderfender coderfender changed the title perf: Optimize approx count distinct using bitmaps instead of hashsets for smaller datatypes perf: Optimize approx count distinct using bitmaps instead of HLL for smaller datatypes Apr 19, 2026
@coderfender coderfender changed the title perf: Optimize approx count distinct using bitmaps instead of HLL for smaller datatypes perf: Optimize approx count distinct using bitmaps instead of HLL for smaller int datatypes Apr 19, 2026
Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

approx_distinct should be leveraging bitmap for counting u8/16 and i8/16

3 participants