refactor: Split hash aggregation logic into separated streams #22729

Open
2010YOUY01 wants to merge 5 commits into apache:main from 2010YOUY01:split-aggr-refactor-only

@2010YOUY01
Contributor

@2010YOUY01 2010YOUY01 commented Jun 3, 2026

Which issue does this PR close?

Rationale for this change

See issues.

This PR splits the partial and final aggregation streams out of GroupedHashAggregateStream.

To fully migrate hash aggregation, we have to

Todo in this PR:

  • Add a temporary configuration enable_migration_aggregate to turn off this path

Since there could be regressions if the features above are not added, the flag also helps prevent potential regressions when other aggregate streams are migrated.

What changes are included in this PR?

Split the following streams out of GroupedHashAggregateStream:

  1. Partial stage of hash aggregation
  2. Final stage of hash aggregation

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label Jun 3, 2026
impl Stream for PartialFinalHashAggregateStream {
    type Item = Result<RecordBatch>;

    fn poll_next(
Contributor Author

The state machines are identical for now, but in follow-up work, such as skipping partial aggregation for high-cardinality inputs, their control flows will diverge. I think separating them improves clarity, as discussed in #22710.

Some duplication is inevitable, but that is the trade-off.
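To make the expected divergence concrete, here is a toy model of a partial stage that can stop aggregating on high-cardinality input. It is a hypothetical sketch, not code from this PR: `PartialState`, `PartialCount`, `probe_rows`, and `skip_ratio` are illustrative names, and the rule used (after a probe window, skip once the groups-to-rows ratio exceeds a threshold) is only one plausible heuristic. A final stage has no equivalent state, because it must fully merge every partial state it receives.

```rust
use std::collections::HashMap;

/// Hypothetical states of a partial-aggregation stream that can give up on
/// aggregating when the input turns out to have very high cardinality.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PartialState {
    /// Accumulating rows into the hash table.
    Aggregating,
    /// Too many distinct groups: forward each row as its own partial state.
    SkippingAggregation,
    /// Input exhausted.
    Done,
}

/// Toy partial COUNT(*) ... GROUP BY key, where `u64` keys stand in for group values.
struct PartialCount {
    state: PartialState,
    counts: HashMap<u64, u64>,
    rows_seen: usize,
    /// Hypothetical knob: rows to observe before deciding whether to skip.
    probe_rows: usize,
    /// Hypothetical knob: skip when distinct groups / rows exceeds this ratio.
    skip_ratio: f64,
    /// Emitted (key, partial_count) pairs.
    output: Vec<(u64, u64)>,
}

impl PartialCount {
    fn new(probe_rows: usize, skip_ratio: f64) -> Self {
        Self {
            state: PartialState::Aggregating,
            counts: HashMap::new(),
            rows_seen: 0,
            probe_rows,
            skip_ratio,
            output: Vec::new(),
        }
    }

    fn push_batch(&mut self, keys: &[u64]) {
        match self.state {
            PartialState::Aggregating => {
                for &k in keys {
                    *self.counts.entry(k).or_insert(0) += 1;
                }
                self.rows_seen += keys.len();
                // Once the probe window is reached, check the ratio after every batch.
                if self.rows_seen >= self.probe_rows {
                    let ratio = self.counts.len() as f64 / self.rows_seen as f64;
                    if ratio > self.skip_ratio {
                        // Flush what has been aggregated so far, then switch modes.
                        self.output.extend(self.counts.drain());
                        self.state = PartialState::SkippingAggregation;
                    }
                }
            }
            PartialState::SkippingAggregation => {
                // Each row becomes a partial state with count 1; a final stage merges them.
                self.output.extend(keys.iter().map(|&k| (k, 1)));
            }
            PartialState::Done => {}
        }
    }

    fn finish(&mut self) -> Vec<(u64, u64)> {
        self.output.extend(self.counts.drain());
        self.state = PartialState::Done;
        std::mem::take(&mut self.output)
    }
}
```

The `SkippingAggregation` arm is control flow that only a partial stream needs, which is the argument for giving it its own state machine instead of adding another branch to a shared one.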

Contributor

Given these are two separate structs, what is the rationale for keeping them in the same module?

datafusion/physical-plan/src/aggregates/hash_aggregate.rs

Perhaps it would be clearer if we put them in their own modules (could be a follow-on PR),

like

datafusion/physical-plan/src/aggregates/streams/partial_final.rs
datafusion/physical-plan/src/aggregates/streams/initial_partial.rs

@2010YOUY01
Contributor Author

cc @Dandandan @ariel-miculas @alamb, who have expressed interest before.

@ariel-miculas
Contributor

I'm curious about the high-level vision: is the plan to close #15591 in favor of this new approach?

I would like the redesign of hash aggregation to take into account the memory constraints imposed by the finite memory pool, i.e., how the implementation performs under OOM conditions.

Otherwise we'll end up with the same issues that exist now. E.g., EmitTo::First(n) wasn't designed for emitting a large portion of the existing groups, so it over-allocated when used to emit early in the partial aggregation OOM case.

@2010YOUY01
Contributor Author

I'm curious about the high-level vision: is the plan to close #15591 in favor of this new approach?

Yes, the goal is to support blocked state management.

The existing challenge is that the current implementation is hard to extend and review. I want to clean things up through this refactor first, and then apply the actual change.

I would like the redesign of hash aggregation to take into account the memory constraints imposed by the finite memory pool, i.e., how the implementation performs under OOM conditions.

Otherwise we'll end up with the same issues that exist now. E.g., EmitTo::First(n) wasn't designed for emitting a large portion of the existing groups, so it over-allocated when used to emit early in the partial aggregation OOM case.

All of these issues are symptoms of managing state in a large contiguous Vec. Blocked memory allocation should address them naturally.
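A simplified model of the difference between the two layouts, assuming group state is a plain `Vec<u64>`. This is not DataFusion's `EmitTo` implementation; it only illustrates the allocation behavior. `Vec::split_off` is documented to leave the original vector with its previous capacity, so emitting a large prefix this way returns the big allocation while also allocating a fresh vector for the remainder. With fixed-size blocks (`BlockedValues` is a hypothetical type), emitting hands over whole blocks without touching the rest.

```rust
use std::collections::VecDeque;

/// Contiguous state: emit the first `n` groups via `Vec::split_off`.
/// `split_off` allocates a new Vec for the tail, while the emitted prefix keeps
/// the original (possibly huge) capacity, so memory is briefly held twice.
fn emit_first_contiguous(values: &mut Vec<u64>, n: usize) -> Vec<u64> {
    let tail = values.split_off(n);
    std::mem::replace(values, tail)
}

/// Blocked state (hypothetical): values live in fixed-size blocks, so emitting
/// hands over whole blocks without copying or reallocating the remainder.
struct BlockedValues {
    block_size: usize,
    blocks: VecDeque<Vec<u64>>,
}

impl BlockedValues {
    fn new(block_size: usize) -> Self {
        Self { block_size, blocks: VecDeque::new() }
    }

    fn push(&mut self, v: u64) {
        // Start a new block when there is none yet or the last one is full.
        let need_new = self
            .blocks
            .back()
            .map_or(true, |b| b.len() == self.block_size);
        if need_new {
            self.blocks.push_back(Vec::with_capacity(self.block_size));
        }
        self.blocks.back_mut().unwrap().push(v);
    }

    /// Emit the oldest block, if any; no allocation happens here.
    fn emit_first_block(&mut self) -> Option<Vec<u64>> {
        self.blocks.pop_front()
    }

    /// Total number of stored values across all blocks.
    fn len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }
}
```

In the blocked version, emitting early under memory pressure performs no allocation or copying, which is the property the OOM discussion above is concerned with.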

@Rachelint
Contributor

Rachelint commented Jun 4, 2026

The existing challenge is that the current implementation is hard to extend and review. I want to clean things up through this refactor first, and then apply the actual change.

The current PR seems to be a refactor of the original row_hash.rs?

But in #15591, the changes to row_hash.rs are actually small (154 lines); most of the changes are the blocked implementations of accumulators and group values, plus tests (3000+ lines).

I totally agree that we should refactor the code in row_hash.rs. As I have complained before, it is very messy and hard to maintain because the partial and final logic are mixed in a single stream.
But it seems this may not help much to reduce the code needed to support blocked memory management?

I think #15591 and this refactor can be pushed forward in parallel; they don't have much in common.
As said above, only a small amount of code supports blocked mode in the aggregation stream; most of the code is for the blocked versions of accumulators and group values.

@2010YOUY01
Contributor Author

But it seems this may not help much to reduce the code needed to support blocked memory management?

That's true; the actual code change is about the same size. What mainly gets reduced is the cognitive overhead of maintaining this feature. Let me try to explain better:

I think the real pain point is:

  1. To review and merge #15591 (Intermediate result blocked approach to aggregation memory management), reviewers need to understand how the actual blocked state changes interact with row_hash.rs.
  2. row_hash.rs is way too complex, so it is hard for reviewers to see the boundary between the actual blocked state-management changes and the existing hash-aggregation state machine.

The purpose of this refactor is to make that boundary explicit. With this split, the key blocked state-management changes (Accumulator + GroupValues) should only interact with the new dedicated stream, rather than being wired into the complex state machine in row_hash.rs.

I think #15591 and this refactor can be pushed forward in parallel; they don't have much in common.

Yes, I agree. I was thinking #15591 could be updated after this refactor: if we can ensure that the blocked Accumulator and GroupValues are only planned into the new aggregate streams from this refactor, and that the old row_hash.rs is unaffected, that change would be easier to review.

Contributor

@alamb alamb left a comment

Thank you @2010YOUY01 -- this looks like a good idea, but I am slightly confused

  1. The PR says it "splits" logic but I actually see two entirely new streams (and no reduction / refactoring of the existing streams)
  2. I am not sure how the two new streams are related to the existing stream

I am also worried about so much new code (I realize that some duplication is inevitable, but this seems to add two more copies).

Also, the PR description says

Todo in this PR:

Add a temporary configuration enable_migration_aggregate to turn off this path

But it is unchecked -- do you still plan to do that?

/// is for the final stage.
///
/// See [`InitialPartialHashAggregateStream`] for details.
pub(crate) struct PartialFinalHashAggregateStream {
Contributor

A minor nit: I found this naming confusing

  • PartialFinalHashAggregateStream -- the Partial is before Final
  • InitialPartialHashAggregateStream -- the Partial is after Initial

Would it be more consistent to call it PartialInitialHashAggregateStream?

Contributor Author

I think it might be clearer to call them PartialHashAggregateStream and FinalHashAggregateStream 🤔. I'll update the names.

Comment on lines 509 to 511
InitialPartialHash(InitialPartialHashAggregateStream),
PartialFinalHash(PartialFinalHashAggregateStream),
GroupedHash(GroupedHashAggregateStream),
Contributor

I am a little unclear on the difference between InitialPartialHash, PartialFinalHash, and GroupedHash.

Is it possible to leave some comments?

Contributor

Also, I am not clear on the relationship between these different stream types.

For example is GroupedHash the same as InitialPartialHash?

Contributor

Also, why is GroupedHashAggregateStream not changed? 🤔

Contributor Author

Added comments in 3184e00 covering (1) the input/output schema of each stream and (2) the incremental migration strategy, which explains why GroupedHashAggregateStream hasn't changed.

For example is GroupedHash the same as InitialPartialHash?

GroupedHash is a shared path reused for InitialPartial, PartialFinal, and many other cases; this might explain why changing anything inside GroupedHash is so hard.

//!
//! `AggregateExec` keeps finite-memory, ordered, limit, grouping-set,
//! `partial state -> partial state`, and single-stage aggregation on
//! `GroupedHashAggregateStream` for now.
Contributor

What does "for now" mean? That seems like a comment about the PR rather than about the state of the code (and thus may be confusing once this PR is merged).

@alamb
Contributor

alamb commented Jun 4, 2026

I love the idea of refactoring to reduce the cognitive load, BTW.

@2010YOUY01
Contributor Author

2010YOUY01 commented Jun 4, 2026

Thank you @2010YOUY01 -- this looks like a good idea, but I am slightly confused

  1. The PR says it "splits" logic but I actually see two entirely new streams (and no reduction / refactoring of the existing streams)
  2. I am not sure how the two new streams are related to the existing stream

@alamb Thank you for the review!

The idea is incremental migration: we have to keep some duplication during the migration, but eventually the old GroupedHashAggregateStream can be deleted entirely.

// Before
AggregateExec::execute() {
    // One multiplexed stream handles all paths.
    GroupedHashAggregateStream::new(...)
}

// Incremental migration process
AggregateExec::execute() {
    match self.plan_stream() {
        // Migrated path 1/5
        StreamPlan::Partial => PartialHashAggregateStream::new(...),

        // Migrated path 2/5
        StreamPlan::Final => FinalHashAggregateStream::new(...),

        // Paths not migrated yet still use the old implementation.
        _ => GroupedHashAggregateStream::new(...),
    }
}

// After all paths are migrated
AggregateExec::execute() {
    match self.plan_stream() {
        StreamPlan::Partial => PartialHashAggregateStream::new(...),
        StreamPlan::Final => FinalHashAggregateStream::new(...),

        // Migrated path 3/5
        // ...

        // No fallback remains; GroupedHashAggregateStream can be deleted.
    }
}
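The dispatch in the pseudocode above can be written as a small runnable planning function. This is a sketch under stated assumptions: `StreamPlan`, `Input`, `Output`, `AggregateFeatures`, and `plan_stream` are hypothetical names; the list of cases kept on the old stream is taken from the module comment in this PR (finite-memory, ordered, limit, grouping-set, `partial state -> partial state`, and single-stage aggregation stay on `GroupedHashAggregateStream`); and `enable_migration_aggregate = false` is assumed to force the old path.

```rust
/// What the stream consumes (hypothetical simplification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Input {
    RawRows,
    PartialState,
}

/// What the stream produces (hypothetical simplification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Output {
    PartialState,
    FinalValue,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StreamPlan {
    /// Migrated: raw rows -> partial state.
    Partial,
    /// Migrated: partial state -> final values.
    Final,
    /// Not migrated yet: handled by the existing GroupedHashAggregateStream.
    Fallback,
}

/// Hypothetical summary of the properties that decide which stream to use.
struct AggregateFeatures {
    input: Input,
    output: Output,
    memory_limited: bool,
    ordered: bool,
    has_limit: bool,
    grouping_sets: bool,
}

fn plan_stream(f: &AggregateFeatures, enable_migration_aggregate: bool) -> StreamPlan {
    // The temporary switch turns every migrated path off at once.
    if !enable_migration_aggregate {
        return StreamPlan::Fallback;
    }
    // Features the new streams do not handle yet stay on the old stream.
    if f.memory_limited || f.ordered || f.has_limit || f.grouping_sets {
        return StreamPlan::Fallback;
    }
    match (f.input, f.output) {
        (Input::RawRows, Output::PartialState) => StreamPlan::Partial,
        (Input::PartialState, Output::FinalValue) => StreamPlan::Final,
        // partial -> partial and single-stage (raw -> final) are not migrated.
        _ => StreamPlan::Fallback,
    }
}
```

Keeping every not-yet-migrated case in the single `Fallback` arm means each later migration step only moves a condition out of that arm, and deleting the old stream corresponds to that arm becoming unreachable.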

The reasons for this approach are:

  • Ideally, we would remove the migrated paths from GroupedHashAggregateStream as we go. However, I have found that code particularly difficult to modify safely.
  • This incremental migration minimizes risk by keeping the existing implementation as a fallback while new paths are introduced. Once all paths have been migrated, we can delete the old implementation and validate the result with the existing test suite. The main challenge is that it is difficult to test each migrated path in isolation, and I can't think of a good way to address that, so I expect the final deletion step to require careful review and validation.

Also, the PR description says

Todo in this PR:
Add a temporary configuration enable_migration_aggregate to turn off this path

But it is unchecked -- do you still plan to do that?

Added in 4a2b907

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Jun 4, 2026
@alamb
Contributor

alamb commented Jun 4, 2026

The idea is incremental migration: we have to keep some duplication during the migration, but eventually the old GroupedHashAggregateStream can be deleted entirely.

I see -- the plan for eventually removing GroupedHashAggregateStream sounds good and makes sense

@alamb
Contributor

alamb commented Jun 4, 2026

run benchmarks

@github-actions

github-actions Bot commented Jun 4, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-common v53.1.0 (current)
       Built [  36.536s] (current)
     Parsing datafusion-common v53.1.0 (current)
      Parsed [   0.063s] (current)
    Building datafusion-common v53.1.0 (baseline)
       Built [  32.942s] (baseline)
     Parsing datafusion-common v53.1.0 (baseline)
      Parsed [   0.063s] (baseline)
    Checking datafusion-common v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   1.017s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field ExecutionOptions.enable_migration_aggregate in /home/runner/work/datafusion/datafusion/datafusion/common/src/config.rs:516

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  72.486s] datafusion-common
    Building datafusion-physical-plan v53.1.0 (current)
       Built [  36.036s] (current)
     Parsing datafusion-physical-plan v53.1.0 (current)
      Parsed [   0.141s] (current)
    Building datafusion-physical-plan v53.1.0 (baseline)
       Built [  35.711s] (baseline)
     Parsing datafusion-physical-plan v53.1.0 (baseline)
      Parsed [   0.135s] (baseline)
    Checking datafusion-physical-plan v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.865s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  74.638s] datafusion-physical-plan
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 170.749s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.025s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 168.436s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.026s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.132s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 342.067s] datafusion-sqllogictest
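The failure above concerns struct literals in downstream crates. A minimal illustration with a simplified, hypothetical stand-in for `ExecutionOptions` (the real struct has many more fields and its own defaults; the derived `Default` here is an assumption of the sketch): a literal that names every field stops compiling once a public field is added, while a literal that fills the rest from `Default` keeps compiling.

```rust
/// Simplified, hypothetical stand-in for a public config struct such as
/// `ExecutionOptions`; only two fields are modeled here.
#[derive(Debug, Clone, Default)]
pub struct ExecutionOptions {
    pub batch_size: usize,
    /// The newly added public field that triggered the lint.
    pub enable_migration_aggregate: bool,
}

// Downstream code written before the field existed, such as
//     ExecutionOptions { batch_size: 8192 }
// now fails with E0063 (missing field in initializer).
// Filling the remaining fields from `Default` survives field additions.
pub fn downstream_options() -> ExecutionOptions {
    ExecutionOptions {
        batch_size: 8192,
        ..Default::default()
    }
}
```

Marking a struct `#[non_exhaustive]` is the other common option; it forbids struct literals outside the defining crate altogether (including the `..Default::default()` form), so introducing it is itself a breaking change.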

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 4, 2026
Contributor

@alamb alamb left a comment

Thank you @2010YOUY01 -- this looks good to me as long as the benchmarks show no performance regression

My only concern is that we will get halfway through the migration and leave the code in an even worse (more complicated) state, but I think your comments have made it about as clear as possible.

Thank you for pushing this along

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4622443332-435-vtqjg 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing split-aggr-refactor-only (09462b9) to 48adae4 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4622443332-436-rmmwl 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing split-aggr-refactor-only (09462b9) to 48adae4 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4622443332-437-lhw7s 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing split-aggr-refactor-only (09462b9) to 48adae4 (merge-base) diff using: tpch
Results will be posted here when complete


@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

   Compiling datafusion-optimizer v53.1.0 (/workspace/datafusion-branch/datafusion/optimizer)
   Compiling datafusion-physical-plan v53.1.0 (/workspace/datafusion-branch/datafusion/physical-plan)
   Compiling datafusion-physical-expr-adapter v53.1.0 (/workspace/datafusion-branch/datafusion/physical-expr-adapter)
warning: ignoring -C extra-filename flag due to -o flag

error[E0428]: the name `Partial` is defined multiple times
  --> datafusion/physical-plan/src/aggregates/hash_table.rs:42:1
   |
40 | pub(super) struct Partial;
   | -------------------------- previous definition of the type `Partial` here
41 | /// Marker for partial state -> final value aggregation.
42 | pub(super) struct Partial;
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^ `Partial` redefined here
   |
   = note: `Partial` must be defined only once in the type namespace of this module

For more information about this error, try `rustc --explain E0428`.
warning: `datafusion-physical-plan` (lib) generated 1 warning
error: could not compile `datafusion-physical-plan` (lib) due to 1 previous error; 1 warning emitted
warning: build failed, waiting for other jobs to finish...
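E0428 means the same name was declared twice in one module's type namespace. The doc comment on the second declaration says "partial state -> final value aggregation", so it presumably wanted its own name; `Final` below is a guess, as are `AggregationMode` and `HashTable<M>`. The sketch shows one common use of such zero-sized markers: a type parameter that selects behavior at compile time while costing nothing at runtime.

```rust
use std::marker::PhantomData;

/// Marker for raw input -> partial state aggregation (hypothetical).
pub struct Partial;
/// Marker for partial state -> final value aggregation (hypothetical name).
pub struct Final;

/// Hypothetical trait tying each marker to its behavior at compile time.
pub trait AggregationMode {
    const NAME: &'static str;
    /// True if the output is intermediate state rather than final values.
    const EMITS_STATE: bool;
}

impl AggregationMode for Partial {
    const NAME: &'static str = "partial";
    const EMITS_STATE: bool = true;
}

impl AggregationMode for Final {
    const NAME: &'static str = "final";
    const EMITS_STATE: bool = false;
}

/// A table parameterized by a zero-sized mode marker; `PhantomData`
/// records the type without storing anything for it at runtime.
pub struct HashTable<M: AggregationMode> {
    groups: Vec<u64>,
    _mode: PhantomData<M>,
}

impl<M: AggregationMode> HashTable<M> {
    pub fn new() -> Self {
        Self { groups: Vec::new(), _mode: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.groups.len()
    }

    pub fn mode_name(&self) -> &'static str {
        M::NAME
    }

    pub fn emits_state(&self) -> bool {
        M::EMITS_STATE
    }
}
```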

@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

   Compiling datafusion-functions-window v53.1.0 (/workspace/datafusion-branch/datafusion/functions-window)
   Compiling datafusion-physical-plan v53.1.0 (/workspace/datafusion-branch/datafusion/physical-plan)
   Compiling datafusion-physical-expr-adapter v53.1.0 (/workspace/datafusion-branch/datafusion/physical-expr-adapter)
warning: ignoring -C extra-filename flag due to -o flag

error[E0428]: the name `Partial` is defined multiple times
  --> datafusion/physical-plan/src/aggregates/hash_table.rs:42:1
   |
40 | pub(super) struct Partial;
   | -------------------------- previous definition of the type `Partial` here
41 | /// Marker for partial state -> final value aggregation.
42 | pub(super) struct Partial;
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^ `Partial` redefined here
   |
   = note: `Partial` must be defined only once in the type namespace of this module

For more information about this error, try `rustc --explain E0428`.
warning: `datafusion-physical-plan` (lib) generated 1 warning
error: could not compile `datafusion-physical-plan` (lib) due to 1 previous error; 1 warning emitted
warning: build failed, waiting for other jobs to finish...

@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

   Compiling datafusion-optimizer v53.1.0 (/workspace/datafusion-branch/datafusion/optimizer)
   Compiling datafusion-physical-plan v53.1.0 (/workspace/datafusion-branch/datafusion/physical-plan)
   Compiling datafusion-physical-expr-adapter v53.1.0 (/workspace/datafusion-branch/datafusion/physical-expr-adapter)
warning: ignoring -C extra-filename flag due to -o flag

error[E0428]: the name `Partial` is defined multiple times
  --> datafusion/physical-plan/src/aggregates/hash_table.rs:42:1
   |
40 | pub(super) struct Partial;
   | -------------------------- previous definition of the type `Partial` here
41 | /// Marker for partial state -> final value aggregation.
42 | pub(super) struct Partial;
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^ `Partial` redefined here
   |
   = note: `Partial` must be defined only once in the type namespace of this module

For more information about this error, try `rustc --explain E0428`.
warning: `datafusion-physical-plan` (lib) generated 1 warning
error: could not compile `datafusion-physical-plan` (lib) due to 1 previous error; 1 warning emitted
warning: build failed, waiting for other jobs to finish...

@alamb
Contributor

alamb commented Jun 4, 2026

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4622519035-438-xhr4s 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing split-aggr-refactor-only (bd75229) to 48adae4 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4622519035-439-rf7jf 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing split-aggr-refactor-only (bd75229) to 48adae4 (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4622519035-440-24rrh 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing split-aggr-refactor-only (bd75229) to 48adae4 (merge-base) diff using: tpch
Results will be posted here when complete



@2010YOUY01
Contributor Author

I expect very high-cardinality aggregations to be slower for now; I plan to address that in a follow-up PR:

I will change the default of the configuration added in 4a2b907 to false, and re-enable it once that feature is restored.
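A minimal sketch of what gating the split streams behind a default-off flag could look like. The flag name `enable_migration_aggregate` comes from the todo list in the PR description and is assumed here to be the configuration referred to above; `Stage`, `StreamKind`, `AggregateOptions`, and `choose_stream` are hypothetical stand-ins, not DataFusion APIs.

```rust
/// Hypothetical aggregation stage (not DataFusion's `AggregateMode`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Partial,
    Final,
}

/// Which stream implementation the planner would build (illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum StreamKind {
    /// The existing combined state machine (`GroupsHashAggregateStream`).
    LegacyGroupsHash,
    /// New dedicated stream for the partial stage.
    PartialHash,
    /// New dedicated stream for the final stage.
    FinalHash,
}

/// Stand-in for the session options; `bool` defaults to `false`,
/// so the migrated path is off unless explicitly enabled.
#[derive(Debug, Default)]
struct AggregateOptions {
    enable_migration_aggregate: bool,
}

fn choose_stream(opts: &AggregateOptions, stage: Stage) -> StreamKind {
    if !opts.enable_migration_aggregate {
        // Flag off: both stages keep using the legacy stream, so
        // high-cardinality queries do not regress in the meantime.
        return StreamKind::LegacyGroupsHash;
    }
    match stage {
        Stage::Partial => StreamKind::PartialHash,
        Stage::Final => StreamKind::FinalHash,
    }
}
```

Keeping the decision in one place means the flag can later be flipped back to `true` (or removed) without touching either stream's state machine.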

@Rachelint
Contributor

The purpose of this refactor is to make that boundary explicit. With this split, the key blocked state-management changes (Accumulator + GroupValues) should only interact with the new dedicated stream, rather than being wired into the complex state machine in row_hash.rs.

Makes sense. Currently, it is very hard to understand the state machine in row_hash.rs...

Yes, I agree. I was thinking #15591 could be updated after this refactor: if we can ensure that the blocked Accumulator and GroupValues are only planned into the new aggregate streams, and the old row_hash.rs is unaffected, that change would be easier to review.

Sounds good. #15591 still doesn't apply the blocked logic to the spilling path and other paths, and I think that would be really hard to do in the old logic...
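For context on the "blocked" state management discussed above, a rough sketch of the idea under simplifying assumptions: per-group state (here a single `i64` sum) lives in fixed-size blocks, so finished blocks can be emitted whole instead of slicing and shifting one large buffer. `BlockedSums` and its methods are hypothetical and are not the design in #15591, which covers both `GroupValues` and accumulators.

```rust
use std::collections::VecDeque;

/// Hypothetical blocked storage for one SUM(i64) accumulator.
/// Group `g` lives in block `g / block_size` at offset `g % block_size`.
struct BlockedSums {
    block_size: usize,
    /// Number of groups that have not been emitted yet.
    num_groups: usize,
    blocks: VecDeque<Vec<i64>>,
}

impl BlockedSums {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, num_groups: 0, blocks: VecDeque::new() }
    }

    /// Grows storage to hold `total_groups` groups by appending zeroed
    /// blocks; the contents of existing blocks are never copied.
    fn resize(&mut self, total_groups: usize) {
        let needed = (total_groups + self.block_size - 1) / self.block_size;
        while self.blocks.len() < needed {
            self.blocks.push_back(vec![0; self.block_size]);
        }
        self.num_groups = self.num_groups.max(total_groups);
    }

    /// Adds `value` to the running sum of `group`.
    fn update(&mut self, group: usize, value: i64) {
        let (block, offset) = (group / self.block_size, group % self.block_size);
        self.blocks[block][offset] += value;
    }

    /// Pops the oldest block as one output chunk, trimming unused slots in
    /// the last block. Meant for the final emit phase, after all updates.
    fn emit_block(&mut self) -> Option<Vec<i64>> {
        let mut block = self.blocks.pop_front()?;
        let len = self.num_groups.min(self.block_size);
        block.truncate(len);
        self.num_groups -= len;
        Some(block)
    }
}
```

With `block_size = 2` and three groups, updates to groups 0 and 2 land in different blocks, and emission yields `[6, 0]` followed by `[7]` for sums of 6 and 7. Paths such as spilling would need to understand this layout too, which is the difficulty raised above for the old logic.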

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and split-aggr-refactor-only
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃       split-aggr-refactor-only ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 38.26 / 39.66 ±1.63 / 42.79 ms │ 38.24 / 38.80 ±1.05 / 40.89 ms │    no change │
│ QQuery 2  │ 18.55 / 18.97 ±0.34 / 19.38 ms │ 18.27 / 18.63 ±0.24 / 18.97 ms │    no change │
│ QQuery 3  │ 33.45 / 33.65 ±0.22 / 34.06 ms │ 31.05 / 32.26 ±1.16 / 33.64 ms │    no change │
│ QQuery 4  │ 17.11 / 17.59 ±0.80 / 19.19 ms │ 17.08 / 17.46 ±0.50 / 18.45 ms │    no change │
│ QQuery 5  │ 37.39 / 40.25 ±1.72 / 42.68 ms │ 39.33 / 40.00 ±0.45 / 40.74 ms │    no change │
│ QQuery 6  │ 15.90 / 16.56 ±0.91 / 18.36 ms │ 15.98 / 16.36 ±0.39 / 17.11 ms │    no change │
│ QQuery 7  │ 42.96 / 44.09 ±0.66 / 44.75 ms │ 42.92 / 46.15 ±2.61 / 49.12 ms │    no change │
│ QQuery 8  │ 42.62 / 42.90 ±0.32 / 43.50 ms │ 42.70 / 44.08 ±1.25 / 46.03 ms │    no change │
│ QQuery 9  │ 48.43 / 48.97 ±0.68 / 50.22 ms │ 48.56 / 49.10 ±0.75 / 50.54 ms │    no change │
│ QQuery 10 │ 41.65 / 41.90 ±0.20 / 42.19 ms │ 41.89 / 44.12 ±1.82 / 47.04 ms │ 1.05x slower │
│ QQuery 11 │ 12.86 / 13.23 ±0.43 / 13.92 ms │ 12.69 / 12.93 ±0.20 / 13.18 ms │    no change │
│ QQuery 12 │ 23.82 / 24.31 ±0.33 / 24.85 ms │ 23.54 / 24.32 ±0.54 / 24.81 ms │    no change │
│ QQuery 13 │ 32.00 / 33.62 ±1.20 / 35.22 ms │ 33.10 / 34.46 ±0.98 / 35.62 ms │    no change │
│ QQuery 14 │ 23.42 / 24.10 ±0.94 / 25.97 ms │ 23.52 / 23.66 ±0.13 / 23.84 ms │    no change │
│ QQuery 15 │ 30.94 / 31.38 ±0.70 / 32.77 ms │ 30.86 / 31.04 ±0.13 / 31.17 ms │    no change │
│ QQuery 16 │ 14.07 / 14.18 ±0.07 / 14.26 ms │ 13.83 / 14.13 ±0.21 / 14.38 ms │    no change │
│ QQuery 17 │ 72.42 / 72.64 ±0.14 / 72.85 ms │ 72.52 / 73.74 ±1.31 / 75.78 ms │    no change │
│ QQuery 18 │ 57.49 / 58.55 ±0.66 / 59.43 ms │ 57.86 / 59.77 ±2.25 / 64.15 ms │    no change │
│ QQuery 19 │ 32.77 / 32.97 ±0.30 / 33.55 ms │ 32.45 / 32.72 ±0.23 / 33.12 ms │    no change │
│ QQuery 20 │ 31.77 / 32.07 ±0.21 / 32.32 ms │ 31.80 / 32.30 ±0.63 / 33.53 ms │    no change │
│ QQuery 21 │ 55.26 / 56.55 ±1.48 / 59.42 ms │ 55.20 / 57.39 ±1.43 / 59.19 ms │    no change │
│ QQuery 22 │ 13.59 / 13.77 ±0.12 / 13.97 ms │ 13.45 / 13.85 ±0.24 / 14.18 ms │    no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                       ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 751.90ms │
│ Total Time (split-aggr-refactor-only)   │ 757.24ms │
│ Average Time (HEAD)                     │  34.18ms │
│ Average Time (split-aggr-refactor-only) │  34.42ms │
│ Queries Faster                          │        0 │
│ Queries Slower                          │        1 │
│ Queries with No Change                  │       21 │
│ Queries with Failure                    │        0 │
└─────────────────────────────────────────┴──────────┘
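How to read the "Change" column: each cell is `min / mean ±stddev / max`, and the summary's Total Time is the sum of the middle values (751.90 ms over 22 queries, a 34.18 ms average). The flags above are consistent with a 5% noise cutoff on those middle values: QQuery 10 (44.12 / 41.90 ≈ 1.053) is marked slower, while QQuery 7 (46.15 / 44.09 ≈ 1.047) is not. The sketch below encodes that reading; the threshold and the `Change`, `classify`, and `label` names are inferences, not the benchmark runner's documented logic.

```rust
/// Outcome of comparing one query's timings (hypothetical type).
#[derive(Debug, PartialEq)]
enum Change {
    NoChange,
    Faster(f64),
    Slower(f64),
}

/// Compares mean times in milliseconds (both assumed > 0). A ratio below
/// `1.0 + threshold` in either direction is treated as noise.
fn classify(head_ms: f64, branch_ms: f64, threshold: f64) -> Change {
    if branch_ms > head_ms {
        let ratio = branch_ms / head_ms;
        if ratio >= 1.0 + threshold { Change::Slower(ratio) } else { Change::NoChange }
    } else {
        let ratio = head_ms / branch_ms;
        if ratio >= 1.0 + threshold { Change::Faster(ratio) } else { Change::NoChange }
    }
}

/// Formats a result in the same style as the "Change" column.
fn label(change: &Change) -> String {
    match change {
        Change::NoChange => "no change".to_string(),
        Change::Faster(r) => format!("+{:.2}x faster", r),
        Change::Slower(r) => format!("{:.2}x slower", r),
    }
}
```

For example, `label(&classify(41.90, 44.12, 0.05))` gives `"1.05x slower"`, matching QQuery 10. Ratios computed from the rounded values shown can differ in the last digit from the runner's output (TPC-DS QQuery 83 computes to 1.10 from the displayed means but is reported as +1.11x).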

Resource Usage

tpch — base (merge-base)

Metric        Value
Wall time     5.0s
Peak memory   5.7 GiB
Avg memory    4.9 GiB
CPU user      29.6s
CPU sys       2.1s
Peak spill    0 B

tpch — branch

Metric        Value
Wall time     5.0s
Peak memory   5.7 GiB
Avg memory    4.9 GiB
CPU user      29.6s
CPU sys       2.2s
Peak spill    0 B


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and split-aggr-refactor-only
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃              split-aggr-refactor-only ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           5.91 / 6.34 ±0.79 / 7.91 ms │           5.79 / 6.27 ±0.75 / 7.76 ms │     no change │
│ QQuery 2  │        80.06 / 80.59 ±0.29 / 80.88 ms │        80.81 / 80.98 ±0.09 / 81.07 ms │     no change │
│ QQuery 3  │        28.46 / 29.01 ±0.44 / 29.62 ms │        28.65 / 28.89 ±0.16 / 29.15 ms │     no change │
│ QQuery 4  │     487.89 / 492.39 ±3.63 / 498.80 ms │     483.77 / 491.47 ±3.95 / 494.87 ms │     no change │
│ QQuery 5  │        51.44 / 52.35 ±1.28 / 54.83 ms │        50.96 / 51.63 ±0.35 / 51.90 ms │     no change │
│ QQuery 6  │        36.14 / 36.73 ±0.46 / 37.24 ms │        36.61 / 37.00 ±0.39 / 37.70 ms │     no change │
│ QQuery 7  │        95.49 / 96.10 ±0.45 / 96.72 ms │        95.78 / 96.18 ±0.38 / 96.84 ms │     no change │
│ QQuery 8  │        36.63 / 38.66 ±3.11 / 44.84 ms │        36.88 / 38.81 ±2.89 / 44.52 ms │     no change │
│ QQuery 9  │        52.84 / 54.75 ±1.40 / 56.73 ms │        51.48 / 53.21 ±1.00 / 54.23 ms │     no change │
│ QQuery 10 │        68.67 / 68.96 ±0.23 / 69.33 ms │        68.74 / 68.91 ±0.13 / 69.05 ms │     no change │
│ QQuery 11 │     298.61 / 304.39 ±3.96 / 311.02 ms │     299.76 / 303.28 ±1.84 / 304.79 ms │     no change │
│ QQuery 12 │        28.56 / 29.02 ±0.43 / 29.66 ms │        28.58 / 29.13 ±0.53 / 30.09 ms │     no change │
│ QQuery 13 │     119.14 / 120.60 ±2.27 / 125.10 ms │     119.96 / 121.08 ±0.92 / 122.65 ms │     no change │
│ QQuery 14 │     507.55 / 509.52 ±2.83 / 515.12 ms │     504.95 / 507.66 ±1.78 / 510.07 ms │     no change │
│ QQuery 15 │        58.97 / 59.68 ±0.70 / 61.02 ms │        57.95 / 60.65 ±2.53 / 65.27 ms │     no change │
│ QQuery 16 │           6.66 / 6.82 ±0.21 / 7.23 ms │           6.67 / 6.83 ±0.16 / 7.10 ms │     no change │
│ QQuery 17 │        81.05 / 82.00 ±1.36 / 84.69 ms │        81.05 / 81.76 ±0.61 / 82.63 ms │     no change │
│ QQuery 18 │     125.36 / 126.05 ±0.58 / 126.77 ms │     126.13 / 126.84 ±0.60 / 127.88 ms │     no change │
│ QQuery 19 │        42.31 / 43.29 ±1.26 / 45.72 ms │        42.42 / 43.48 ±1.24 / 45.87 ms │     no change │
│ QQuery 20 │        35.66 / 36.22 ±0.49 / 36.79 ms │        35.73 / 36.56 ±0.56 / 37.47 ms │     no change │
│ QQuery 21 │        17.10 / 17.34 ±0.25 / 17.75 ms │        16.83 / 17.20 ±0.43 / 17.95 ms │     no change │
│ QQuery 22 │        62.04 / 64.67 ±2.39 / 68.84 ms │        62.32 / 63.16 ±0.78 / 64.56 ms │     no change │
│ QQuery 23 │     342.96 / 345.36 ±2.55 / 348.86 ms │     394.51 / 398.01 ±2.79 / 402.13 ms │  1.15x slower │
│ QQuery 24 │     225.47 / 227.00 ±1.62 / 230.02 ms │     225.47 / 229.96 ±4.68 / 238.81 ms │     no change │
│ QQuery 25 │     112.78 / 114.74 ±1.44 / 116.38 ms │     114.98 / 117.37 ±2.20 / 121.25 ms │     no change │
│ QQuery 26 │        58.18 / 58.57 ±0.31 / 58.96 ms │        58.30 / 58.97 ±0.52 / 59.65 ms │     no change │
│ QQuery 27 │           6.65 / 6.78 ±0.19 / 7.15 ms │           6.65 / 6.77 ±0.17 / 7.09 ms │     no change │
│ QQuery 28 │        60.97 / 61.33 ±0.40 / 62.05 ms │        57.59 / 63.94 ±4.55 / 70.74 ms │     no change │
│ QQuery 29 │      98.46 / 102.64 ±2.99 / 105.77 ms │      98.75 / 100.13 ±1.67 / 102.47 ms │     no change │
│ QQuery 30 │        32.02 / 32.69 ±0.70 / 33.94 ms │        32.49 / 32.72 ±0.19 / 33.05 ms │     no change │
│ QQuery 31 │     111.43 / 112.44 ±1.21 / 114.69 ms │     111.70 / 113.86 ±1.61 / 116.50 ms │     no change │
│ QQuery 32 │        20.00 / 20.38 ±0.34 / 20.98 ms │        19.93 / 20.17 ±0.38 / 20.93 ms │     no change │
│ QQuery 33 │        38.97 / 39.51 ±0.35 / 39.98 ms │        38.65 / 39.03 ±0.37 / 39.69 ms │     no change │
│ QQuery 34 │          9.57 / 9.82 ±0.24 / 10.18 ms │         9.37 / 10.32 ±0.85 / 11.91 ms │  1.05x slower │
│ QQuery 35 │        76.88 / 77.40 ±0.38 / 77.99 ms │        77.24 / 80.43 ±3.26 / 85.90 ms │     no change │
│ QQuery 36 │           5.84 / 5.95 ±0.21 / 6.36 ms │           5.84 / 5.98 ±0.20 / 6.38 ms │     no change │
│ QQuery 37 │           6.85 / 6.89 ±0.04 / 6.94 ms │           6.90 / 7.11 ±0.17 / 7.41 ms │     no change │
│ QQuery 38 │        63.21 / 64.13 ±0.89 / 65.57 ms │        63.05 / 64.27 ±0.81 / 65.06 ms │     no change │
│ QQuery 39 │     448.98 / 452.82 ±3.68 / 459.53 ms │     454.03 / 460.59 ±5.66 / 470.56 ms │     no change │
│ QQuery 40 │        22.94 / 23.15 ±0.17 / 23.44 ms │        23.59 / 25.38 ±3.21 / 31.79 ms │  1.10x slower │
│ QQuery 41 │        11.33 / 11.46 ±0.15 / 11.76 ms │        11.16 / 11.32 ±0.16 / 11.64 ms │     no change │
│ QQuery 42 │        23.63 / 23.98 ±0.52 / 25.00 ms │        24.26 / 24.76 ±0.64 / 26.00 ms │     no change │
│ QQuery 43 │           4.94 / 4.99 ±0.08 / 5.15 ms │           4.60 / 4.71 ±0.16 / 5.04 ms │ +1.06x faster │
│ QQuery 44 │        10.78 / 10.84 ±0.06 / 10.94 ms │        10.47 / 10.74 ±0.17 / 10.88 ms │     no change │
│ QQuery 45 │        38.08 / 39.64 ±2.10 / 43.81 ms │        37.97 / 38.76 ±0.77 / 39.76 ms │     no change │
│ QQuery 46 │        11.95 / 13.23 ±2.21 / 17.63 ms │        11.81 / 12.20 ±0.28 / 12.65 ms │ +1.08x faster │
│ QQuery 47 │     230.20 / 235.44 ±3.58 / 239.49 ms │     232.00 / 237.65 ±5.51 / 246.01 ms │     no change │
│ QQuery 48 │        95.99 / 96.58 ±0.93 / 98.43 ms │      97.04 / 100.39 ±3.97 / 108.09 ms │     no change │
│ QQuery 49 │        76.73 / 79.47 ±2.26 / 83.33 ms │        77.70 / 79.13 ±1.19 / 80.32 ms │     no change │
│ QQuery 50 │        59.62 / 60.17 ±0.29 / 60.44 ms │        59.37 / 59.76 ±0.28 / 60.19 ms │     no change │
│ QQuery 51 │        90.30 / 92.85 ±2.15 / 96.33 ms │      98.48 / 101.86 ±3.07 / 107.00 ms │  1.10x slower │
│ QQuery 52 │        23.94 / 24.19 ±0.17 / 24.34 ms │        24.13 / 24.40 ±0.22 / 24.77 ms │     no change │
│ QQuery 53 │        29.89 / 31.65 ±3.17 / 37.99 ms │        29.75 / 29.97 ±0.18 / 30.24 ms │ +1.06x faster │
│ QQuery 54 │        55.06 / 55.70 ±0.46 / 56.27 ms │        55.22 / 55.71 ±0.50 / 56.66 ms │     no change │
│ QQuery 55 │        23.45 / 23.66 ±0.16 / 23.86 ms │        23.62 / 24.19 ±0.43 / 24.81 ms │     no change │
│ QQuery 56 │        38.74 / 39.11 ±0.24 / 39.40 ms │        38.86 / 39.43 ±0.53 / 40.28 ms │     no change │
│ QQuery 57 │     176.50 / 178.71 ±3.49 / 185.62 ms │     176.73 / 178.18 ±1.25 / 180.03 ms │     no change │
│ QQuery 58 │     116.72 / 118.60 ±1.90 / 122.22 ms │     117.01 / 117.96 ±0.80 / 119.05 ms │     no change │
│ QQuery 59 │     117.48 / 117.71 ±0.18 / 118.01 ms │     118.18 / 120.05 ±1.76 / 122.61 ms │     no change │
│ QQuery 60 │        39.22 / 39.77 ±0.42 / 40.42 ms │        39.69 / 40.24 ±0.43 / 40.80 ms │     no change │
│ QQuery 61 │        13.02 / 14.59 ±2.81 / 20.20 ms │        13.05 / 13.24 ±0.23 / 13.68 ms │ +1.10x faster │
│ QQuery 62 │        46.54 / 46.76 ±0.13 / 46.90 ms │        46.13 / 46.40 ±0.23 / 46.79 ms │     no change │
│ QQuery 63 │        29.75 / 30.10 ±0.27 / 30.44 ms │        29.51 / 31.50 ±3.11 / 37.70 ms │     no change │
│ QQuery 64 │     398.83 / 404.82 ±5.61 / 413.43 ms │     399.25 / 403.82 ±5.13 / 413.71 ms │     no change │
│ QQuery 65 │     141.20 / 148.10 ±4.14 / 153.00 ms │     142.74 / 147.69 ±2.60 / 150.09 ms │     no change │
│ QQuery 66 │        79.16 / 80.34 ±0.86 / 81.78 ms │        79.50 / 82.74 ±3.39 / 87.19 ms │     no change │
│ QQuery 67 │     248.82 / 253.71 ±5.54 / 263.37 ms │     245.91 / 256.59 ±6.18 / 264.79 ms │     no change │
│ QQuery 68 │        11.65 / 11.83 ±0.17 / 12.13 ms │        11.69 / 11.87 ±0.18 / 12.22 ms │     no change │
│ QQuery 69 │        63.01 / 68.09 ±9.60 / 87.28 ms │        62.35 / 62.93 ±0.50 / 63.62 ms │ +1.08x faster │
│ QQuery 70 │     107.21 / 110.95 ±6.47 / 123.87 ms │     107.26 / 112.54 ±3.18 / 116.32 ms │     no change │
│ QQuery 71 │        35.59 / 36.00 ±0.24 / 36.26 ms │        35.94 / 36.75 ±0.46 / 37.33 ms │     no change │
│ QQuery 72 │ 2090.99 / 2137.06 ±28.65 / 2167.54 ms │ 2109.95 / 2178.18 ±46.13 / 2232.07 ms │     no change │
│ QQuery 73 │           9.23 / 9.51 ±0.26 / 9.89 ms │          9.30 / 9.56 ±0.26 / 10.02 ms │     no change │
│ QQuery 74 │     170.29 / 174.18 ±5.33 / 184.53 ms │     173.18 / 176.81 ±3.57 / 183.37 ms │     no change │
│ QQuery 75 │     146.65 / 154.77 ±6.39 / 163.74 ms │     146.84 / 150.73 ±4.46 / 159.24 ms │     no change │
│ QQuery 76 │        35.03 / 35.69 ±0.40 / 36.14 ms │        35.77 / 36.29 ±0.38 / 36.95 ms │     no change │
│ QQuery 77 │        61.04 / 61.67 ±0.41 / 62.14 ms │        60.87 / 61.41 ±0.48 / 62.19 ms │     no change │
│ QQuery 78 │     189.56 / 191.74 ±2.14 / 194.45 ms │     189.43 / 195.15 ±8.32 / 211.41 ms │     no change │
│ QQuery 79 │        67.34 / 72.71 ±8.61 / 89.88 ms │        67.09 / 68.17 ±1.01 / 69.92 ms │ +1.07x faster │
│ QQuery 80 │     102.11 / 104.00 ±1.43 / 106.23 ms │     103.80 / 107.12 ±3.01 / 111.61 ms │     no change │
│ QQuery 81 │        25.33 / 25.80 ±0.25 / 26.02 ms │        25.77 / 26.20 ±0.26 / 26.54 ms │     no change │
│ QQuery 82 │        16.32 / 16.46 ±0.11 / 16.64 ms │        16.31 / 16.46 ±0.11 / 16.64 ms │     no change │
│ QQuery 83 │        39.99 / 44.31 ±5.20 / 51.17 ms │        39.77 / 40.10 ±0.20 / 40.38 ms │ +1.11x faster │
│ QQuery 84 │        34.95 / 35.75 ±0.59 / 36.66 ms │        34.13 / 34.58 ±0.38 / 35.27 ms │     no change │
│ QQuery 85 │     106.67 / 108.54 ±1.04 / 109.88 ms │     108.83 / 110.17 ±1.86 / 113.81 ms │     no change │
│ QQuery 86 │        24.89 / 25.25 ±0.31 / 25.68 ms │        24.64 / 25.17 ±0.29 / 25.46 ms │     no change │
│ QQuery 87 │        63.81 / 66.69 ±4.34 / 75.05 ms │        63.09 / 65.04 ±2.23 / 69.24 ms │     no change │
│ QQuery 88 │        62.13 / 62.49 ±0.36 / 63.19 ms │        62.43 / 63.21 ±1.17 / 65.55 ms │     no change │
│ QQuery 89 │        35.55 / 35.82 ±0.36 / 36.52 ms │        36.12 / 36.59 ±0.25 / 36.81 ms │     no change │
│ QQuery 90 │        16.66 / 16.90 ±0.15 / 17.14 ms │        16.66 / 16.90 ±0.21 / 17.23 ms │     no change │
│ QQuery 91 │        44.54 / 46.81 ±2.61 / 50.05 ms │        44.88 / 45.47 ±0.47 / 46.16 ms │     no change │
│ QQuery 92 │        29.86 / 30.01 ±0.17 / 30.32 ms │        29.16 / 29.69 ±0.61 / 30.68 ms │     no change │
│ QQuery 93 │        51.90 / 52.62 ±0.52 / 53.33 ms │        52.38 / 54.74 ±2.45 / 58.30 ms │     no change │
│ QQuery 94 │        38.16 / 38.84 ±0.40 / 39.29 ms │        38.58 / 38.90 ±0.20 / 39.11 ms │     no change │
│ QQuery 95 │        83.61 / 85.62 ±1.56 / 88.04 ms │        83.99 / 84.49 ±0.34 / 85.07 ms │     no change │
│ QQuery 96 │        23.90 / 24.28 ±0.28 / 24.72 ms │        23.84 / 24.15 ±0.28 / 24.62 ms │     no change │
│ QQuery 97 │        45.70 / 46.57 ±0.64 / 47.62 ms │        53.49 / 54.96 ±1.72 / 58.27 ms │  1.18x slower │
│ QQuery 98 │        42.72 / 43.95 ±1.41 / 46.46 ms │        43.32 / 43.89 ±0.48 / 44.69 ms │     no change │
│ QQuery 99 │        70.51 / 71.92 ±2.55 / 77.02 ms │        70.35 / 70.77 ±0.34 / 71.35 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 10440.05ms │
│ Total Time (split-aggr-refactor-only)   │ 10568.39ms │
│ Average Time (HEAD)                     │   105.46ms │
│ Average Time (split-aggr-refactor-only) │   106.75ms │
│ Queries Faster                          │          7 │
│ Queries Slower                          │          5 │
│ Queries with No Change                  │         87 │
│ Queries with Failure                    │          0 │
└─────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric        Value
Wall time     55.0s
Peak memory   6.9 GiB
Avg memory    6.2 GiB
CPU user      237.2s
CPU sys       6.6s
Peak spill    0 B

tpcds — branch

Metric        Value
Wall time     55.0s
Peak memory   6.9 GiB
Avg memory    6.4 GiB
CPU user      240.0s
CPU sys       6.9s
Peak spill    0 B


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and split-aggr-refactor-only
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃              split-aggr-refactor-only ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.16 / 3.39 ±4.31 / 12.02 ms │          1.16 / 3.39 ±4.31 / 12.00 ms │     no change │
│ QQuery 1  │        12.42 / 12.90 ±0.25 / 13.14 ms │        12.29 / 12.95 ±0.36 / 13.24 ms │     no change │
│ QQuery 2  │        35.64 / 36.03 ±0.31 / 36.51 ms │        36.04 / 36.40 ±0.29 / 36.76 ms │     no change │
│ QQuery 3  │        30.70 / 31.85 ±0.70 / 32.90 ms │        31.94 / 32.11 ±0.11 / 32.27 ms │     no change │
│ QQuery 4  │     236.05 / 238.98 ±2.67 / 243.54 ms │     241.42 / 248.12 ±5.92 / 255.84 ms │     no change │
│ QQuery 5  │     277.09 / 281.52 ±2.57 / 284.74 ms │     275.29 / 281.55 ±4.56 / 287.78 ms │     no change │
│ QQuery 6  │           1.21 / 1.37 ±0.24 / 1.84 ms │           1.21 / 1.38 ±0.24 / 1.84 ms │     no change │
│ QQuery 7  │        13.85 / 13.93 ±0.05 / 14.01 ms │        14.19 / 16.51 ±4.18 / 24.87 ms │  1.19x slower │
│ QQuery 8  │     335.81 / 346.55 ±6.17 / 352.53 ms │     336.21 / 341.13 ±3.24 / 345.70 ms │     no change │
│ QQuery 9  │     479.27 / 491.95 ±7.18 / 499.60 ms │    471.71 / 484.51 ±11.85 / 505.01 ms │     no change │
│ QQuery 10 │        72.50 / 73.60 ±0.87 / 74.64 ms │        71.31 / 73.31 ±1.54 / 75.87 ms │     no change │
│ QQuery 11 │        84.16 / 85.34 ±0.76 / 86.34 ms │        83.36 / 83.84 ±0.46 / 84.71 ms │     no change │
│ QQuery 12 │     275.38 / 283.85 ±5.98 / 293.46 ms │     278.29 / 282.77 ±4.23 / 289.44 ms │     no change │
│ QQuery 13 │     401.86 / 406.17 ±6.61 / 419.31 ms │    471.01 / 497.73 ±26.43 / 546.79 ms │  1.23x slower │
│ QQuery 14 │     289.82 / 298.87 ±5.98 / 305.35 ms │     287.89 / 295.81 ±4.99 / 301.70 ms │     no change │
│ QQuery 15 │     286.39 / 289.62 ±2.37 / 293.03 ms │     277.84 / 289.60 ±7.48 / 298.88 ms │     no change │
│ QQuery 16 │     635.92 / 649.20 ±7.04 / 655.51 ms │     630.17 / 640.83 ±7.61 / 650.28 ms │     no change │
│ QQuery 17 │    628.78 / 653.98 ±15.77 / 676.20 ms │    638.44 / 655.19 ±18.06 / 682.33 ms │     no change │
│ QQuery 18 │ 1335.41 / 1357.71 ±17.75 / 1382.16 ms │ 1304.71 / 1338.16 ±27.19 / 1375.36 ms │     no change │
│ QQuery 19 │        27.71 / 28.04 ±0.18 / 28.23 ms │        28.39 / 31.31 ±4.77 / 40.80 ms │  1.12x slower │
│ QQuery 20 │    525.01 / 535.77 ±11.90 / 558.17 ms │     520.49 / 532.89 ±8.13 / 543.50 ms │     no change │
│ QQuery 21 │     604.75 / 607.93 ±2.57 / 612.34 ms │     609.63 / 619.10 ±7.44 / 630.81 ms │     no change │
│ QQuery 22 │  1082.63 / 1089.41 ±4.27 / 1094.27 ms │ 1071.98 / 1094.74 ±13.06 / 1109.99 ms │     no change │
│ QQuery 23 │ 3326.57 / 3362.87 ±42.03 / 3430.61 ms │ 3243.97 / 3289.17 ±30.34 / 3329.45 ms │     no change │
│ QQuery 24 │        41.67 / 42.97 ±1.29 / 45.30 ms │        41.57 / 42.32 ±0.87 / 43.97 ms │     no change │
│ QQuery 25 │     114.04 / 116.36 ±2.52 / 121.21 ms │     112.95 / 115.26 ±2.00 / 118.63 ms │     no change │
│ QQuery 26 │        42.70 / 44.36 ±1.40 / 46.81 ms │        42.41 / 47.11 ±8.76 / 64.61 ms │  1.06x slower │
│ QQuery 27 │     677.42 / 686.60 ±5.85 / 694.86 ms │     678.46 / 683.48 ±4.84 / 692.68 ms │     no change │
│ QQuery 28 │ 3084.72 / 3127.28 ±36.96 / 3189.50 ms │ 3083.47 / 3104.88 ±17.69 / 3135.32 ms │     no change │
│ QQuery 29 │        40.53 / 42.38 ±2.81 / 47.97 ms │       40.56 / 54.60 ±19.67 / 91.78 ms │  1.29x slower │
│ QQuery 30 │     314.91 / 318.78 ±3.24 / 324.22 ms │     307.16 / 315.40 ±6.41 / 325.44 ms │     no change │
│ QQuery 31 │     293.31 / 300.55 ±7.77 / 315.18 ms │     350.77 / 358.28 ±7.19 / 367.42 ms │  1.19x slower │
│ QQuery 32 │ 1009.72 / 1038.56 ±19.96 / 1070.19 ms │ 1646.84 / 1748.45 ±52.59 / 1798.12 ms │  1.68x slower │
│ QQuery 33 │ 1530.63 / 1564.53 ±29.09 / 1612.73 ms │ 1496.35 / 1536.14 ±26.47 / 1570.37 ms │     no change │
│ QQuery 34 │ 1549.38 / 1592.19 ±33.91 / 1650.26 ms │ 1521.74 / 1554.32 ±26.48 / 1595.89 ms │     no change │
│ QQuery 35 │    313.73 / 330.30 ±22.47 / 374.09 ms │    288.14 / 320.48 ±38.78 / 388.39 ms │     no change │
│ QQuery 36 │        72.61 / 77.57 ±4.14 / 82.40 ms │       68.39 / 79.48 ±10.82 / 98.66 ms │     no change │
│ QQuery 37 │        36.20 / 38.40 ±3.02 / 44.23 ms │        35.82 / 38.34 ±3.08 / 44.19 ms │     no change │
│ QQuery 38 │        41.57 / 52.54 ±7.62 / 61.42 ms │        40.51 / 44.58 ±3.42 / 50.23 ms │ +1.18x faster │
│ QQuery 39 │     149.20 / 155.30 ±5.80 / 162.64 ms │     134.70 / 152.30 ±9.62 / 163.92 ms │     no change │
│ QQuery 40 │        14.64 / 19.45 ±9.38 / 38.21 ms │        14.42 / 15.26 ±0.94 / 17.01 ms │ +1.27x faster │
│ QQuery 41 │        13.76 / 14.24 ±0.27 / 14.56 ms │        13.87 / 14.07 ±0.13 / 14.25 ms │     no change │
│ QQuery 42 │        13.71 / 15.15 ±2.42 / 19.97 ms │        12.79 / 14.27 ±2.16 / 18.55 ms │ +1.06x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
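The Change column appears to compare the two average times per query. The rule below is a minimal sketch that assumes a ±5% noise threshold. That threshold is inferred from the rows above: every flagged query differs by more than 6%, and every "no change" query differs by less than 4%. It is not taken from the benchmark runner's source. The names `classify`, `label`, and `THRESHOLD` are hypothetical.

```rust
/// Outcome for one query, mirroring the labels in the Change column.
#[derive(Debug, PartialEq)]
enum Change {
    NoChange,
    Slower(f64), // branch avg / HEAD avg
    Faster(f64), // HEAD avg / branch avg
}

/// Hypothetical noise threshold (assumption inferred from the table, not the runner's code).
const THRESHOLD: f64 = 0.05;

/// Compare the average times of HEAD and the branch for a single query.
fn classify(head_avg_ms: f64, branch_avg_ms: f64) -> Change {
    let ratio = branch_avg_ms / head_avg_ms;
    if ratio > 1.0 + THRESHOLD {
        Change::Slower(ratio)
    } else if ratio < 1.0 - THRESHOLD {
        Change::Faster(1.0 / ratio)
    } else {
        Change::NoChange
    }
}

/// Render the outcome in the same style as the table ("1.68x slower", "+1.27x faster").
fn label(change: &Change) -> String {
    match change {
        Change::NoChange => "no change".to_string(),
        Change::Slower(r) => format!("{:.2}x slower", r),
        Change::Faster(r) => format!("+{:.2}x faster", r),
    }
}

fn main() {
    // (query, HEAD avg ms, branch avg ms) copied from the table above.
    let rows = [
        ("QQuery 4", 238.98, 248.12),
        ("QQuery 32", 1038.56, 1748.45),
        ("QQuery 40", 19.45, 15.26),
    ];
    for (query, head, branch) in rows {
        println!("{query}: {}", label(&classify(head, branch)));
    }
}
```

Under this rule QQuery 4, about 3.8% slower on average, stays "no change". QQuery 32 is reported as 1.68x slower.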
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 20758.32ms │
│ Total Time (split-aggr-refactor-only)   │ 21421.51ms │
│ Average Time (HEAD)                     │   482.75ms │
│ Average Time (split-aggr-refactor-only) │   498.17ms │
│ Queries Faster                          │          3 │
│ Queries Slower                          │          7 │
│ Queries with No Change                  │         33 │
│ Queries with Failure                    │          0 │
└─────────────────────────────────────────┴────────────┘
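"Total Time" in the summary appears to be the sum of the 43 per-query average times, not end-to-end wall time. Compare it with the 105.0s wall time under Resource Usage. "Average Time" appears to be that sum divided by 43. The sketch below assumes that definition; `summarize` and `HEAD_AVG_MS` are hypothetical names. Recomputed from the displayed HEAD averages, it lands within rounding of the reported 20758.32ms and 482.75ms.

```rust
/// HEAD "avg" values (ms) for QQuery 0..=42, copied from the table above.
const HEAD_AVG_MS: [f64; 43] = [
    3.39, 12.90, 36.03, 31.85, 238.98, 281.52, 1.37, 13.93, 346.55, 491.95, // 0-9
    73.60, 85.34, 283.85, 406.17, 298.87, 289.62, 649.20, 653.98, 1357.71, 28.04, // 10-19
    535.77, 607.93, 1089.41, 3362.87, 42.97, 116.36, 44.36, 686.60, 3127.28, 42.38, // 20-29
    318.78, 300.55, 1038.56, 1564.53, 1592.19, 330.30, 77.57, 38.40, 52.54, 155.30, // 30-39
    19.45, 14.24, 15.15, // 40-42
];

/// Assumed definition: total = sum of per-query averages, average = total / query count.
fn summarize(avgs_ms: &[f64]) -> (f64, f64) {
    let total: f64 = avgs_ms.iter().sum();
    let average = total / avgs_ms.len() as f64;
    (total, average)
}

fn main() {
    let (total, average) = summarize(&HEAD_AVG_MS);
    // Differs from the reported total only by rounding of the displayed per-query averages.
    println!("Total Time (HEAD): {total:.2}ms, Average Time (HEAD): {average:.2}ms");
}
```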

Resource Usage

clickbench_partitioned — base (merge-base)

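┏━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric      ┃    Value ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Wall time   │   105.0s │
│ Peak memory │ 30.0 GiB │
│ Avg memory  │ 22.9 GiB │
│ CPU user    │  1060.8s │
│ CPU sys     │    81.9s │
│ Peak spill  │      0 B │
└─────────────┴──────────┘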

clickbench_partitioned — branch

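┏━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Metric      ┃    Value ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Wall time   │   110.0s │
│ Peak memory │ 35.8 GiB │
│ Avg memory  │ 23.5 GiB │
│ CPU user    │  1098.6s │
│ CPU sys     │    84.7s │
│ Peak spill  │      0 B │
└─────────────┴──────────┘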

Labels: auto detected api change; common (Related to common crate); physical-plan (Changes to the physical-plan crate); sqllogictest (SQL Logic Tests, .slt)