perf: Optimize approx count distinct using bitmaps instead of HLL for smaller int datatypes by coderfender · Pull Request #21453 · apache/datafusion

coderfender · 2026-04-08T06:37:38Z

Which issue does this PR close?

Closes approx_distinct should be leveraging bitmap for counting u8/16 and i8/16 #1109

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

coderfender · 2026-04-08T06:42:33Z

  │ Type │   Change   │  Verdict  │
  ├──────┼────────────┼───────────┤
  │ u8   │ 20% slower │ Regressed │
  ├──────┼────────────┼───────────┤
  │ i8   │ 14% slower │ Regressed │
  ├──────┼────────────┼───────────┤
  │ u16  │ 40% faster │ Improved  │
  ├──────┼────────────┼───────────┤
  │ i16  │ 37% faster │ Improved  │

coderfender · 2026-04-08T07:01:36Z

Removed bitmap implementation for u8 / i8 but held on to it for u16 / i16

coderfender · 2026-04-08T07:15:49Z

Requesting review : @martin-g

coderfender · 2026-04-08T07:27:34Z

critcmp results :

u16 / i16 are 2X faster :)

critcmp bitmap main
group                                          bitmap                                 main
-----                                          ------                                 ----
approx_distinct i16 bitmap                     1.00      3.0±0.01µs        ? ?/sec    2.00      5.9±0.08µs        ? ?/sec
approx_distinct i64 80% distinct               1.00      5.6±0.15µs        ? ?/sec    1.03      5.8±0.14µs        ? ?/sec
approx_distinct i64 99% distinct               1.00      5.8±0.12µs        ? ?/sec    1.01      5.8±0.17µs        ? ?/sec
approx_distinct i8 bitmap                      1.00      6.0±0.28µs        ? ?/sec    1.00      5.9±0.06µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      3.0±0.24µs        ? ?/sec    1.93      5.8±0.23µs        ? ?/sec
approx_distinct u8 bitmap                      1.00      5.8±0.08µs        ? ?/sec    1.01      5.8±0.34µs        ? ?/sec
approx_distinct utf8 long 80% distinct         1.00     16.3±0.08µs        ? ?/sec    1.00     16.2±0.49µs        ? ?/sec
approx_distinct utf8 long 99% distinct         1.01     16.3±0.05µs        ? ?/sec    1.00     16.2±0.23µs        ? ?/sec
approx_distinct utf8 short 80% distinct        1.02     11.1±0.14µs        ? ?/sec    1.00     11.0±0.08µs        ? ?/sec
approx_distinct utf8 short 99% distinct        1.01     11.2±0.67µs        ? ?/sec    1.00     11.0±0.48µs        ? ?/sec
approx_distinct utf8view long 80% distinct     1.04     19.7±0.58µs        ? ?/sec    1.00     19.0±0.49µs        ? ?/sec
approx_distinct utf8view long 99% distinct     1.04     19.7±0.35µs        ? ?/sec    1.00     19.0±0.19µs        ? ?/sec
approx_distinct utf8view short 80% distinct    1.00      6.3±0.06µs        ? ?/sec    1.00      6.3±0.40µs        ? ?/sec
approx_distinct utf8view short 99% distinct    1.02      6.4±0.17µs        ? ?/sec    1.00      6.2±0.22µs        ? ?/sec

cc : @neilconway , @Dandandan

coderfender · 2026-04-08T07:51:52Z

It seems my bitmap setup was suboptimal for u8/i8. Instead of using [u8;4], I skipped the dense packing (which might cause cache misses) and went with [bool; 256]. This significantly sped up the operation, and we are now roughly 2x faster or better than HLL for the smaller integer data types.
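For readers following along, here is a minimal sketch of the two layouts being compared for 8-bit inputs. It is not the code from this PR: the type names `PackedBitmap` and `BoolTable`, the helper `i8_slot`, and the choice of `[u64; 4]` as the packed word type are all assumptions made for illustration. It only shows why the unpacked table does less work per insert; it does not reproduce the benchmark numbers.

```rust
/// Dense layout: 256 bits packed into four 64-bit words (32 bytes).
/// The word type is an assumption made for this sketch.
struct PackedBitmap {
    words: [u64; 4],
}

impl PackedBitmap {
    fn new() -> Self {
        Self { words: [0; 4] }
    }

    /// Setting a bit needs a shift, a mask, and a read-modify-write of one word.
    fn insert(&mut self, v: u8) {
        self.words[(v >> 6) as usize] |= 1u64 << (v & 63);
    }

    /// The distinct count is the number of set bits across all words.
    fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }
}

/// Unpacked layout: one `bool` per possible value (256 bytes).
struct BoolTable {
    seen: [bool; 256],
}

impl BoolTable {
    fn new() -> Self {
        Self { seen: [false; 256] }
    }

    /// Setting a slot is a single byte store that does not read the old value.
    fn insert(&mut self, v: u8) {
        self.seen[v as usize] = true;
    }

    /// The distinct count is the number of slots that were ever set.
    fn count(&self) -> u64 {
        self.seen.iter().filter(|&&s| s).count() as u64
    }
}

/// Signed 8-bit inputs can reuse either layout by reinterpreting the bits.
fn i8_slot(v: i8) -> u8 {
    v as u8
}
```

Both layouts return the exact distinct count, so for these types the "approximate" aggregate becomes exact. The trade-off in this sketch is 32 bytes versus 256 bytes of state per accumulator.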

(Adding critcmp for only relevant data types)

group                                            branch                               main
-----                                            -------                              ----
approx_distinct i16 bitmap                     1.00      3.1±0.23µs        ? ?/sec    1.94      5.9±0.08µs        ? ?/sec
approx_distinct i8 bitmap                      1.00      2.1±0.17µs        ? ?/sec    2.87      5.9±0.06µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      3.0±0.05µs        ? ?/sec    1.95      5.8±0.23µs        ? ?/sec
approx_distinct u8 bitmap                      1.00      2.2±0.18µs        ? ?/sec    2.69      5.8±0.34µs        ? ?/sec

…smaller datatypes (#21456)

## Which issue does this PR close?

Remove hashset based accumulators for smaller int data types and use bitmaps. Follow up of: #21453

- Closes #21488

## Rationale for this change

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Jefffrey

With #21456 landed, should we be rewriting the expression itself from approx_count_distinct -> count_distinct for these cases to reuse the code? Unless there's some differences I haven't spotted between this PR and the other?

coderfender · 2026-04-17T14:00:07Z

Sure. I think we should shortly be able to get the same gains by changing the wiring rather than implementing similar bitmap code.

coderfender · 2026-04-18T01:36:59Z

It seems the state is a binary array while count distinct's accumulator uses a vec of ints. Let me patch that and see if the tests pass.
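To make the mismatch concrete, here is a rough sketch of the two state shapes for a 16-bit bitmap: one binary blob versus a list of the values seen. This is plain Rust, not DataFusion code, and `U16Bitmap` with all of its methods is a hypothetical name invented for this illustration.

```rust
/// Exact distinct tracker for `u16`: 65,536 bits in 1,024 words (8 KiB).
struct U16Bitmap {
    words: Vec<u64>,
}

impl U16Bitmap {
    fn new() -> Self {
        Self { words: vec![0; 1024] }
    }

    fn insert(&mut self, v: u16) {
        self.words[(v >> 6) as usize] |= 1u64 << (v & 63);
    }

    fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }

    /// State shape A: the raw bitmap as one binary blob (always 8 KiB).
    fn to_bytes(&self) -> Vec<u8> {
        self.words.iter().flat_map(|w| w.to_le_bytes()).collect()
    }

    /// Merge a peer's binary state by OR-ing it in word by word.
    fn merge_bytes(&mut self, bytes: &[u8]) {
        for (word, chunk) in self.words.iter_mut().zip(bytes.chunks_exact(8)) {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            *word |= u64::from_le_bytes(buf);
        }
    }

    /// State shape B: the list of values seen (size grows with cardinality).
    fn to_values(&self) -> Vec<u16> {
        (0..=u16::MAX)
            .filter(|&v| self.words[(v >> 6) as usize] & (1u64 << (v & 63)) != 0)
            .collect()
    }

    /// Merge a peer's value list by re-inserting each value.
    fn merge_values(&mut self, values: &[u16]) {
        for &v in values {
            self.insert(v);
        }
    }
}
```

Either shape merges to the same count. The two cannot be mixed, though, so both sides of a merge have to agree on one state format.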

Jefffrey · 2026-04-18T06:02:26Z

I was thinking along the lines of utilizing simplify on approx_distinct, like:

    fn simplify(&self) -> Option<AggregateFunctionSimplification> {
        Some(Box::new(|aggregate_function, info| {
            // Look up the type of the single argument to decide whether to rewrite.
            let input_type = info.get_data_type(&aggregate_function.params.args[0])?;
            match input_type {
                // Small integer domains: rewrite to an exact COUNT(DISTINCT ...).
                DataType::UInt8 | DataType::Int8 | DataType::UInt16 | DataType::Int16 => {
                    let rewritten = Expr::AggregateFunction(AggregateFunction::new_udf(
                        count_udaf(),
                        aggregate_function.params.args,
                        // Force DISTINCT: a plain approx_distinct(x) call does not set
                        // this flag, so passing it through would yield COUNT(x).
                        true,
                        aggregate_function.params.filter,
                        aggregate_function.params.order_by,
                        aggregate_function.params.null_treatment,
                    ));
                    Ok(rewritten)
                }
                // All other types keep the original approx_distinct call.
                _ => Ok(Expr::AggregateFunction(aggregate_function)),
            }
        }))
    }

This would avoid the need for a wrapper. Also, for the boolean accumulator, we could make regular count distinct use it too and include the boolean type in the simplify call 🤔
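As a rough illustration of why a dedicated boolean path is attractive, a distinct counter over booleans only needs two flags. The sketch below is plain Rust with hypothetical names (`BoolDistinct` and its methods); it is not the accumulator from this PR or from DataFusion.

```rust
/// Exact distinct counter for a nullable boolean column.
#[derive(Default)]
struct BoolDistinct {
    seen_true: bool,
    seen_false: bool,
}

impl BoolDistinct {
    /// `None` models a NULL input, which a distinct count ignores.
    fn update(&mut self, values: &[Option<bool>]) {
        for v in values.iter().flatten() {
            if *v {
                self.seen_true = true;
            } else {
                self.seen_false = true;
            }
        }
    }

    /// Merging two partial states is a logical OR of the flags.
    fn merge(&mut self, other: &BoolDistinct) {
        self.seen_true |= other.seen_true;
        self.seen_false |= other.seen_false;
    }

    /// The result can only be 0, 1, or 2.
    fn count(&self) -> u64 {
        u64::from(self.seen_true) + u64::from(self.seen_false)
    }
}
```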

coderfender · 2026-04-18T06:16:52Z

Ahh, I see. I was trying to compare benchmarks, FYI, but let me try switching to count distinct then.

group                                          after_cold                              before
-----                                          ----------                              ------
approx_distinct i16 bitmap                     1.00      3.0±0.11µs        ? ?/sec     1.92      5.7±0.17µs        ? ?/sec
approx_distinct i64 80% distinct               1.02      5.7±0.04µs        ? ?/sec     1.00      5.6±0.08µs        ? ?/sec
approx_distinct i64 99% distinct               1.01      5.8±0.06µs        ? ?/sec     1.00      5.7±0.21µs        ? ?/sec
approx_distinct i8 bitmap                      1.00   1103.7±5.90ns        ? ?/sec     5.35      5.9±0.24µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      3.9±0.23µs        ? ?/sec     1.51      5.9±0.15µs        ? ?/sec
approx_distinct u8 bitmap                      1.00  1142.4±117.79ns        ? ?/sec    5.00      5.7±0.08µs        ? ?/sec
approx_distinct utf8 long 80% distinct         1.01     16.1±0.32µs        ? ?/sec     1.00     16.0±0.08µs        ? ?/sec
approx_distinct utf8 long 99% distinct         1.00     16.1±0.41µs        ? ?/sec     1.01     16.3±0.51µs        ? ?/sec
approx_distinct utf8 short 80% distinct        1.00     10.9±0.21µs        ? ?/sec     1.01     11.0±0.81µs        ? ?/sec
approx_distinct utf8 short 99% distinct        1.01     11.0±0.06µs        ? ?/sec     1.00     10.9±0.43µs        ? ?/sec
approx_distinct utf8view long 80% distinct     1.00     18.4±0.32µs        ? ?/sec     1.01     18.7±0.51µs        ? ?/sec
approx_distinct utf8view long 99% distinct     1.00     18.5±0.14µs        ? ?/sec     1.01     18.6±0.33µs        ? ?/sec
approx_distinct utf8view short 80% distinct    1.02      6.2±0.10µs        ? ?/sec     1.00      6.0±0.39µs        ? ?/sec
approx_distinct utf8view short 99% distinct    1.07      6.4±0.25µs        ? ?/sec     1.00      6.0±0.15µs        ? ?/sec

coderfender · 2026-04-18T06:21:24Z

@Jefffrey, the catch between approx_distinct and count_distinct is that the former returns an unsigned int while the latter returns a signed one (hence the need for my wrapper). Although this makes little practical difference, I wanted to tackle that problem in a separate PR and go ahead with the wrapper approach here. Using simplify means I would have to cast between the signed and unsigned ints, which shouldn't add too much overhead though. Let me push a commit and bench it.
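The wrapper idea can be sketched roughly as follows: delegate all the work to the exact counter and change only the output type. Everything here is hypothetical and written for illustration (the `SignedCount` trait, `ExactU16`, and `UnsignedWrapper` are not DataFusion types); it assumes only that the inner counter reports a signed 64-bit result.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for an exact count-distinct accumulator whose
/// result type is signed.
trait SignedCount {
    fn update(&mut self, values: &[u16]);
    fn evaluate(&self) -> i64;
}

/// Simple exact counter over `u16` with one flag per possible value.
struct ExactU16 {
    seen: Vec<bool>,
}

impl ExactU16 {
    fn new() -> Self {
        Self { seen: vec![false; 65536] }
    }
}

impl SignedCount for ExactU16 {
    fn update(&mut self, values: &[u16]) {
        for &v in values {
            self.seen[v as usize] = true;
        }
    }

    fn evaluate(&self) -> i64 {
        self.seen.iter().filter(|&&s| s).count() as i64
    }
}

/// Wrapper that reuses the inner accumulator and only converts the output.
struct UnsignedWrapper<A: SignedCount> {
    inner: A,
}

impl<A: SignedCount> UnsignedWrapper<A> {
    fn update(&mut self, values: &[u16]) {
        self.inner.update(values);
    }

    fn evaluate(&self) -> u64 {
        // A count is never negative, so this conversion is not expected to fail.
        u64::try_from(self.inner.evaluate()).expect("count is non-negative")
    }
}
```

The conversion happens once per result, not once per row, which is why the overhead of this approach is expected to be small.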

Jefffrey · 2026-04-18T08:17:30Z

That's a good point; I believe the simplify API for UDAFs doesn't allow introducing a cast on the final result, so I guess that rules out this approach. It does make me wonder why approx_distinct and count_distinct return different types, but I suppose that is a separate issue.

Jefffrey · 2026-04-18T08:18:20Z

run benchmark approx_distinct

adriangbot · 2026-04-18T08:20:53Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4273217532-1454-n59m9 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing bitmap_instead_hll_smaller_int_types (e0f2aa9) to afc0784 (merge-base) diff
BENCH_NAME=approx_distinct
BENCH_COMMAND=cargo bench --features=parquet --bench approx_distinct
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-04-18T08:29:10Z

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                          bitmap_instead_hll_smaller_int_types    main
-----                                          ------------------------------------    ----
approx_distinct i16 bitmap                     1.00      5.9±0.00µs        ? ?/sec     2.16     12.8±0.01µs        ? ?/sec
approx_distinct i64 80% distinct               1.00     12.8±0.01µs        ? ?/sec     1.00     12.8±0.01µs        ? ?/sec
approx_distinct i64 99% distinct               1.00     12.8±0.01µs        ? ?/sec     1.00     12.8±0.01µs        ? ?/sec
approx_distinct i8 bitmap                      1.00      4.3±0.02µs        ? ?/sec     2.96     12.8±0.01µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      5.9±0.00µs        ? ?/sec     2.17     12.8±0.06µs        ? ?/sec
approx_distinct u8 bitmap                      1.00      4.3±0.01µs        ? ?/sec     2.96     12.8±0.01µs        ? ?/sec
approx_distinct utf8 long 80% distinct         1.00     39.4±0.03µs        ? ?/sec     1.00     39.4±0.05µs        ? ?/sec
approx_distinct utf8 long 99% distinct         1.01     39.0±0.02µs        ? ?/sec     1.00     38.7±0.03µs        ? ?/sec
approx_distinct utf8 short 80% distinct        1.00     25.6±0.03µs        ? ?/sec     1.00     25.5±0.04µs        ? ?/sec
approx_distinct utf8 short 99% distinct        1.00     25.7±0.02µs        ? ?/sec     1.00     25.7±0.04µs        ? ?/sec
approx_distinct utf8view long 80% distinct     1.00     46.6±0.02µs        ? ?/sec     1.01     46.9±0.02µs        ? ?/sec
approx_distinct utf8view long 99% distinct     1.00     46.7±0.02µs        ? ?/sec     1.01     46.9±0.07µs        ? ?/sec
approx_distinct utf8view short 80% distinct    1.00     14.0±0.01µs        ? ?/sec     1.00     14.0±0.01µs        ? ?/sec
approx_distinct utf8view short 99% distinct    1.00     14.0±0.01µs        ? ?/sec     1.00     14.0±0.01µs        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	127.0s
Peak memory	3.5 GiB
Avg memory	3.5 GiB
CPU user	162.4s
CPU sys	0.9s
Peak spill	0 B

branch

Metric	Value
Wall time	127.5s
Peak memory	3.5 GiB
Avg memory	3.5 GiB
CPU user	163.4s
CPU sys	0.3s
Peak spill	0 B

Jefffrey

The changes for u8/i8/u16/i16 look good, benchmarks look good too. Just some remarks on the boolean related code here.

Jefffrey · 2026-04-19T01:36:57Z

run benchmark approx_distinct

adriangbot · 2026-04-19T01:38:55Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4274941352-1488-9p9lx 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing bitmap_instead_hll_smaller_int_types (d2b57e5) to 1fbbba5 (merge-base) diff
BENCH_NAME=approx_distinct
BENCH_COMMAND=cargo bench --features=parquet --bench approx_distinct
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-04-19T01:45:31Z

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                          bitmap_instead_hll_smaller_int_types    main
-----                                          ------------------------------------    ----
approx_distinct i16 bitmap                     1.00      5.9±0.01µs        ? ?/sec     2.16     12.8±0.01µs        ? ?/sec
approx_distinct i64 80% distinct               1.00     12.8±0.01µs        ? ?/sec     1.00     12.8±0.01µs        ? ?/sec
approx_distinct i64 99% distinct               1.00     12.8±0.01µs        ? ?/sec     1.00     12.8±0.03µs        ? ?/sec
approx_distinct i8 bitmap                      1.00      4.3±0.03µs        ? ?/sec     2.97     12.8±0.00µs        ? ?/sec
approx_distinct u16 bitmap                     1.00      5.9±0.00µs        ? ?/sec     2.18     12.9±0.06µs        ? ?/sec
approx_distinct u8 bitmap                      1.00      4.3±0.00µs        ? ?/sec     2.97     12.8±0.01µs        ? ?/sec
approx_distinct utf8 long 80% distinct         1.01     39.1±0.03µs        ? ?/sec     1.00     38.6±0.01µs        ? ?/sec
approx_distinct utf8 long 99% distinct         1.00     39.1±0.02µs        ? ?/sec     1.00     39.1±0.02µs        ? ?/sec
approx_distinct utf8 short 80% distinct        1.00     25.7±0.02µs        ? ?/sec     1.00     25.7±0.03µs        ? ?/sec
approx_distinct utf8 short 99% distinct        1.00     25.5±0.02µs        ? ?/sec     1.00     25.6±0.05µs        ? ?/sec
approx_distinct utf8view long 80% distinct     1.01     47.0±0.04µs        ? ?/sec     1.00     46.7±0.05µs        ? ?/sec
approx_distinct utf8view long 99% distinct     1.00     47.0±0.05µs        ? ?/sec     1.00     46.9±0.07µs        ? ?/sec
approx_distinct utf8view short 80% distinct    1.00     14.0±0.01µs        ? ?/sec     1.00     14.0±0.01µs        ? ?/sec
approx_distinct utf8view short 99% distinct    1.00     14.0±0.01µs        ? ?/sec     1.01     14.2±0.01µs        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	126.8s
Peak memory	3.5 GiB
Avg memory	3.5 GiB
CPU user	162.3s
CPU sys	0.9s
Peak spill	0 B

branch

Metric	Value
Wall time	127.3s
Peak memory	3.5 GiB
Avg memory	3.5 GiB
CPU user	163.4s
CPU sys	0.3s
Peak spill	0 B

coderfender added 3 commits April 7, 2026 22:28

bitmap_smaller_datatypes

7601623

bitmap_smaller_datatypes

b320233

bitmap_instead_of_hll_smaller_datatypes

ecee8eb

bitmap_instead_of_hll_smaller_datatypes

47bb82e

github-actions bot added the functions Changes to functions implementation label Apr 8, 2026

bitmap_instead_of_hll_smaller_datatypes

2291af5

coderfender mentioned this pull request Apr 8, 2026

perf : Optimize count distinct using bitmaps instead of hashsets for smaller datatypes #21456

Merged

Merge branch 'main' into bitmap_instead_hll_smaller_int_types

d987292

coderfender changed the title ~~perf: Bitmap instead hll smaller int types~~ perf: Count distinct bitmap instead hll smaller int types Apr 8, 2026

coderfender changed the title ~~perf: Count distinct bitmap instead hll smaller int types~~ perf: approx_distinct bitmap instead hll smaller int types Apr 8, 2026

coderfender and others added 2 commits April 8, 2026 13:11

init_fmt_check

b03262b

Merge branch 'main' into bitmap_instead_hll_smaller_int_types

953a783

coderfender changed the title ~~perf: approx_distinct bitmap instead hll smaller int types~~ perf: approx_distinct bitmap instead of hll smaller int types Apr 9, 2026

coderfender added 2 commits April 10, 2026 09:44

Merge branch 'main' into bitmap_instead_hll_smaller_int_types

e85ee77

Merge branch 'main' into bitmap_instead_hll_smaller_int_types

ea30877

coderfender changed the title ~~perf: approx_distinct bitmap instead of hll smaller int types~~ perf: Optimize approx count distinct using bitmaps instead of hashsets for smaller datatypes Apr 14, 2026

coderfender added 2 commits April 14, 2026 13:18

Merge branch 'main' into bitmap_instead_hll_smaller_int_types

5e6d5e3

Merge branch 'main' into bitmap_instead_hll_smaller_int_types

87aa8ed

Jefffrey reviewed Apr 17, 2026

coderfender and others added 2 commits April 17, 2026 14:32

Merge branch 'main' into bitmap_instead_hll_smaller_int_types

caed00a

use_count_distinct_bitmap_instead_of_dupe_code

8a95f3f

update_state_fields

6345248

update_state_fields

e0f2aa9

Jefffrey reviewed Apr 18, 2026

Comment thread datafusion/functions-aggregate/src/approx_distinct.rs Outdated

Comment thread datafusion/functions-aggregate/src/approx_distinct.rs Outdated

coderfender and others added 2 commits April 18, 2026 08:46

remove_bool_accumulator

c16592d

Merge branch 'main' into bitmap_instead_hll_smaller_int_types

d2b57e5

Jefffrey approved these changes Apr 19, 2026

Comment thread datafusion/functions-aggregate/src/approx_distinct.rs

coderfender changed the title ~~perf: Optimize approx count distinct using bitmaps instead of hashsets for smaller datatypes~~ perf: Optimize approx count distinct using bitmaps instead of HLL for smaller datatypes Apr 19, 2026

coderfender changed the title ~~perf: Optimize approx count distinct using bitmaps instead of HLL for smaller datatypes~~ perf: Optimize approx count distinct using bitmaps instead of HLL for smaller int datatypes Apr 19, 2026
