perf: add prefetching for aggregate multi group by [WIP] #19520

rluvaton · 2025-12-28T11:54:45Z

Which issue does this PR close?

N/A

Rationale for this change

Grouping on a large amount of data is very slow.

What changes are included in this PR?

Uses my fork of hashbrown, which adds prefetching support, and prefetches when the map is large.

Are these changes tested?

No

Are there any user-facing changes?

No

Related hashbrown issue for allowing prefetching:

Allow to prefetch buckets rust-lang/hashbrown#677

My hashbrown changes:

rust-lang/hashbrown@master...rluvaton:hashbrown:prefetch

My hashbrown changes are not very safe and probably have bugs; they are a quick and dirty way to see the benefit.

Command:

The optimizer can optimize the group by away in this case, but this is just a cheap way to get a table and run a multi-column aggregate on it.

datafusion-cli --command "select value from range(0,100000000) group by value, value + 1;"

When the hash map is small, prefetching is unnecessary because the map should already fit in the cache.
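To get a feel for when the map stops fitting in the cache, here is a rough back-of-the-envelope sketch. It is not taken from the PR: the 7/8 maximum load factor, the power-of-two bucket count, the one control byte per bucket, and the 16-byte entry size used in the note below are all assumptions about a Swiss-table style map, and `estimated_table_bytes` and `fits_in_l3` are hypothetical helper names. The L3 size is the per-socket value reported by cpufetch in the Env section.

```rust
/// Rough estimate of the memory a Swiss-table style hash map needs for
/// `groups` entries.
///
/// Assumptions (not taken from the PR): the table keeps at most 7/8 of its
/// buckets full, rounds the bucket count up to a power of two, and stores
/// one control byte per bucket next to each `entry_bytes`-sized entry.
fn estimated_table_bytes(groups: u64, entry_bytes: u64) -> u64 {
    // Smallest bucket count that keeps the load factor at or below 7/8.
    let min_buckets = (groups * 8 + 6) / 7;
    let buckets = min_buckets.next_power_of_two();
    buckets * (entry_bytes + 1)
}

/// L3 size per socket reported by cpufetch: 35.75 MB.
const L3_BYTES: u64 = 35 * 1024 * 1024 + 768 * 1024;

/// True when the estimated table would fit in one socket's L3 cache.
fn fits_in_l3(groups: u64, entry_bytes: u64) -> bool {
    estimated_table_bytes(groups, entry_bytes) <= L3_BYTES
}
```

Under these assumptions, `estimated_table_bytes(100_000_000, 16)` comes to roughly 2.1 GiB, far beyond the L3 size, while `fits_in_l3(100_000, 16)` is true. That is consistent with prefetching only paying off once the number of groups is large.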

The command was run with the following config. These settings make execution almost single-threaded, so that one thread can be pushed to its limit:

set datafusion.execution.coalesce_batches=false;
set datafusion.optimizer.repartition_aggregations=false;
set datafusion.optimizer.enable_round_robin_repartition=false;

-- Added by this PR
set datafusion.execution.agg_prefetch_elements=0; -- 0 disables prefetching
set datafusion.execution.agg_prefetch_locality=3; -- no effect if agg_prefetch_elements is 0
set datafusion.execution.agg_prefetch_read=false; -- no effect if agg_prefetch_elements is 0

Without prefetch: 16.002 seconds
With prefetch: 10.176 seconds

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| LLC-load-misses | 91.9M | 25.7M | 72% reduction |
| LLC-loads | 202M | 101.7M | 50% reduction |
| Cycles | 62.4B | 39.8B | 36% reduction |
| IPC | 0.49 | 0.80 | 63% improvement |
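For anyone who wants to re-derive the table, here is a minimal sketch of the arithmetic, using the raw counter values from the two `perf stat` runs in the Results section. The struct and function names are hypothetical and only serve the illustration.

```rust
/// Raw counter values copied from the two `perf stat` runs.
struct PerfCounters {
    cycles: u64,
    instructions: u64,
    llc_loads: u64,
    llc_load_misses: u64,
}

const WITHOUT_PREFETCH: PerfCounters = PerfCounters {
    cycles: 62_399_336_707,
    instructions: 30_518_679_380,
    llc_loads: 201_998_979,
    llc_load_misses: 91_945_353,
};

const WITH_PREFETCH: PerfCounters = PerfCounters {
    cycles: 39_771_220_883,
    instructions: 31_973_887_904,
    llc_loads: 101_659_922,
    llc_load_misses: 25_681_158,
};

impl PerfCounters {
    /// Instructions per cycle.
    fn ipc(&self) -> f64 {
        self.instructions as f64 / self.cycles as f64
    }
}

/// Percentage by which `after` is lower than `before`.
fn percent_reduction(before: u64, after: u64) -> f64 {
    (1.0 - after as f64 / before as f64) * 100.0
}

/// Percentage by which `after` is higher than `before`.
fn percent_increase(before: f64, after: f64) -> f64 {
    (after / before - 1.0) * 100.0
}
```

Computed from the raw counters, the IPC gain comes out at roughly 64%; the 63% in the table follows from the rounded values 0.49 and 0.80.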

Env

Machine: c5.metal

$ ./neofetch
             `-/oydNNdyo:.`                ec2-user@
      `.:+shmMMMMMMMMMMMMMMmhs+:.`         ------------------------------------------------------
    -+hNNMMMMMMMMMMMMMMMMMMMMMMNNho-       OS: Amazon Linux 2023.9.20251208 x86_64
.``      -/+shmNNMMMMMMNNmhs+/-      ``.   Host: Amazon EC2
dNmhs+:.       `.:/oo/:.`       .:+shmNd   Kernel: 6.1.158-180.294.amzn2023.x86_64
dMMMMMMMNdhs+:..        ..:+shdNMMMMMMMd   Uptime: 51 mins
dMMMMMMMMMMMMMMNds    odNMMMMMMMMMMMMMMd   Packages: 800 (rpm)
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd   Shell: bash 5.2.15
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd   Terminal: /dev/pts/1
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd   CPU: Intel Xeon Platinum 8275CL (96) @ 3.900GHz
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd   Memory: 2514MiB / 193043MiB
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd
dMMMMMMMMMMMMMMMMh    yMMMMMMMMMMMMMMMMd
.:+ydNMMMMMMMMMMMh    yMMMMMMMMMMMNdy+:.
     `.:+shNMMMMMh    yMMMMMNhs+:``
            `-+shy    shs+:`

$ ./cpufetch --verbose
Name:                Intel Xeon Platinum 8275CL
Microarchitecture:   Cascade Lake
Technology:          14nm
Max Frequency:       3.900 GHz
Sockets:             2
Cores:               24 cores (48 threads)
Cores (Total):       48 cores (96 threads)
AVX:                 AVX,AVX2,AVX512
FMA:                 FMA3
L1i Size:            32KB (1.5MB Total)
L1d Size:            32KB (1.5MB Total)
L2 Size:             1MB (48MB Total)
L3 Size:             35.75MB (71.5MB Total)
Peak Performance:    11.98 TFLOP/s

Results

Without prefetch

perf stat -e cycles,instructions,cache-references,cache-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses datafusion-cli --command "
set datafusion.execution.coalesce_batches=false;
set datafusion.optimizer.repartition_aggregations=false;
set datafusion.optimizer.enable_round_robin_repartition=false;
set datafusion.execution.agg_prefetch_elements=0;
set datafusion.execution.agg_prefetch_locality=3;
set datafusion.execution.agg_prefetch_read=false;
select value from range(0,100000000) group by value, value + 1;
"
DataFusion CLI v51.0.0
0 row(s) fetched.
Elapsed 0.001 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

+-------+
| value |
+-------+
| 0     |
| 1     |
| 2     |
| 3     |
| 4     |
| 5     |
| 6     |
| 7     |
| 8     |
| 9     |
| 10    |
| 11    |
| 12    |
| 13    |
| 14    |
| 15    |
| 16    |
| 17    |
| 18    |
| 19    |
| 20    |
| 21    |
| 22    |
| 23    |
| 24    |
| 25    |
| 26    |
| 27    |
| 28    |
| 29    |
| 30    |
| 31    |
| 32    |
| 33    |
| 34    |
| 35    |
| 36    |
| 37    |
| 38    |
| 39    |
| .     |
| .     |
| .     |
+-------+
100000000 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 16.002 seconds.


 Performance counter stats for 'datafusion-cli --command
set datafusion.execution.coalesce_batches=false;
set datafusion.optimizer.repartition_aggregations=false;
set datafusion.optimizer.enable_round_robin_repartition=false;
set datafusion.execution.agg_prefetch_elements=0;
set datafusion.execution.agg_prefetch_locality=3;
set datafusion.execution.agg_prefetch_read=false;
select value from range(0,100000000) group by value, value + 1;
':

       62399336707      cycles                                                               (62.64%)
       30518679380      instructions                     #    0.49  insn per cycle           (75.11%)
         589924410      cache-references                                                     (75.11%)
         327996655      cache-misses                     #   55.600 % of all cache refs      (75.11%)
        6616784604      L1-dcache-loads                                                      (75.11%)
         880862425      L1-dcache-load-misses            #   13.31% of all L1-dcache accesses  (75.11%)
         201998979      LLC-loads                                                            (49.81%)
          91945353      LLC-load-misses                  #   45.52% of all LL-cache accesses  (50.00%)

      16.036173799 seconds time elapsed

      14.416051000 seconds user
       1.664161000 seconds sys

With prefetch

$ perf stat -e cycles,instructions,cache-references,cache-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses datafusion-cli --command "
set datafusion.execution.coalesce_batches=false;
set datafusion.optimizer.repartition_aggregations=false;
set datafusion.optimizer.enable_round_robin_repartition=false;
set datafusion.execution.agg_prefetch_elements=1;
set datafusion.execution.agg_prefetch_locality=3;
set datafusion.execution.agg_prefetch_read=false;
select value from range(0,100000000) group by value, value + 1;
"
DataFusion CLI v51.0.0
0 row(s) fetched.
Elapsed 0.001 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

0 row(s) fetched.
Elapsed 0.000 seconds.

+-------+
| value |
+-------+
| 0     |
| 1     |
| 2     |
| 3     |
| 4     |
| 5     |
| 6     |
| 7     |
| 8     |
| 9     |
| 10    |
| 11    |
| 12    |
| 13    |
| 14    |
| 15    |
| 16    |
| 17    |
| 18    |
| 19    |
| 20    |
| 21    |
| 22    |
| 23    |
| 24    |
| 25    |
| 26    |
| 27    |
| 28    |
| 29    |
| 30    |
| 31    |
| 32    |
| 33    |
| 34    |
| 35    |
| 36    |
| 37    |
| 38    |
| 39    |
| .     |
| .     |
| .     |
+-------+
100000000 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 10.176 seconds.


 Performance counter stats for 'datafusion-cli --command
set datafusion.execution.coalesce_batches=false;
set datafusion.optimizer.repartition_aggregations=false;
set datafusion.optimizer.enable_round_robin_repartition=false;
set datafusion.execution.agg_prefetch_elements=1;
set datafusion.execution.agg_prefetch_locality=3;
set datafusion.execution.agg_prefetch_read=false;
select value from range(0,100000000) group by value, value + 1;
':

       39771220883      cycles                                                               (62.54%)
       31973887904      instructions                     #    0.80  insn per cycle           (75.06%)
         633535114      cache-references                                                     (75.18%)
         374760871      cache-misses                     #   59.154 % of all cache refs      (75.18%)
        6598313098      L1-dcache-loads                                                      (75.18%)
         920200907      L1-dcache-load-misses            #   13.95% of all L1-dcache accesses  (75.18%)
         101659922      LLC-loads                                                            (49.66%)
          25681158      LLC-load-misses                  #   25.26% of all LL-cache accesses  (49.84%)

      10.204965327 seconds time elapsed

       8.432703000 seconds user
       1.806306000 seconds sys

rluvaton · 2025-12-28T11:56:05Z

@alamb, @Dandandan you might be interested in this (not ready for review, just the general idea)

rluvaton · 2025-12-28T11:56:41Z

datafusion/common/src/config.rs

+        pub agg_prefetch_elements: usize, default = 1
+        pub agg_prefetch_locality: usize, default = 3
+        pub agg_prefetch_read: bool, default = false


Added config to allow for easy tinkering while prototyping

rluvaton · 2025-12-28T11:58:04Z

datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs

+        match self.agg_prefetch_elements {
+            0 => self.collect_vectorized_process_context_with_prefetch::<0>(batch_hashes, groups),
+            1 => self.collect_vectorized_process_context_with_prefetch::<1>(batch_hashes, groups),
+            2 => self.collect_vectorized_process_context_with_prefetch::<2>(batch_hashes, groups),
+            3 => self.collect_vectorized_process_context_with_prefetch::<3>(batch_hashes, groups),
+            4 => self.collect_vectorized_process_context_with_prefetch::<4>(batch_hashes, groups),
+            5 => self.collect_vectorized_process_context_with_prefetch::<5>(batch_hashes, groups),
+            6 => self.collect_vectorized_process_context_with_prefetch::<6>(batch_hashes, groups),
+            7 => self.collect_vectorized_process_context_with_prefetch::<7>(batch_hashes, groups),
+            8 => self.collect_vectorized_process_context_with_prefetch::<8>(batch_hashes, groups),
+            _ => self.collect_vectorized_process_context_with_prefetch::<8>(batch_hashes, groups),
+        }
+    }


I added all of these arms so that, when I test a specific config value, the measurement is not affected by a runtime condition on that value. This will not stay in the final version.
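The idea behind the match can be shown in isolation. The following is a simplified sketch, not the PR's code: `count_prefetches` is a hypothetical stand-in for `collect_vectorized_process_context_with_prefetch`, and the match is trimmed to five arms for brevity. Each arm selects a separately compiled copy of the function in which `PREFETCH` is a constant, so the hot loop carries no runtime check on the config value.

```rust
/// Stand-in for the real collection routine: counts how many look-ahead
/// prefetches would be issued for a batch of hashes.
fn count_prefetches<const PREFETCH: usize>(batch_hashes: &[u64]) -> usize {
    let mut issued = 0;
    for row in 0..batch_hashes.len() {
        // PREFETCH is a compile-time constant in each instantiation, so for
        // PREFETCH == 0 the compiler can drop this branch entirely.
        if PREFETCH > 0 && row + PREFETCH < batch_hashes.len() {
            issued += 1; // a real implementation would prefetch here
        }
    }
    issued
}

/// Maps the runtime config value onto one monomorphized instantiation.
/// Values above the largest arm fall back to that arm.
fn dispatch(agg_prefetch_elements: usize, batch_hashes: &[u64]) -> usize {
    match agg_prefetch_elements {
        0 => count_prefetches::<0>(batch_hashes),
        1 => count_prefetches::<1>(batch_hashes),
        2 => count_prefetches::<2>(batch_hashes),
        3 => count_prefetches::<3>(batch_hashes),
        _ => count_prefetches::<4>(batch_hashes),
    }
}
```

The single `match` runs once per batch, outside the per-row loop, which is why it should not disturb the measurement of any one setting.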

rluvaton · 2025-12-28T11:58:42Z

datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs

        }
    }

+    #[inline(always)]


`#[inline(always)]` is here to make sure that extracting this code into a function does not affect the previous (no-prefetch) case.

rluvaton · 2025-12-28T11:59:46Z

datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs

+                for i in 1..=PREFETCH {
+                    if READ {
+                        self.map.prefetch_read::<LOCALITY>(batch_hashes[row + i]);
+                    } else {
+                        self.map.prefetch_write::<LOCALITY>(batch_hashes[row + i]);
+                    }
+                }


This prefetches the same item multiple times. Instead, I should prefetch the initial rows once before the loop and then, inside the loop, prefetch only `row + PREFETCH`.
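A minimal sketch of that restructuring, under the assumption that each row only needs to be requested once. The function name and the closure parameters are hypothetical: in the PR the prefetch would go through `self.map.prefetch_read` or `self.map.prefetch_write` from the hashbrown fork, which is not reproduced here. A closure stands in for it so the access pattern can be checked.

```rust
/// Walks `batch_hashes`, requesting each upcoming row exactly once.
///
/// `prefetch` stands in for the hash map's prefetch call and `process`
/// for the per-row work; both are placeholders for illustration.
fn process_with_lookahead<const PREFETCH: usize, F, G>(
    batch_hashes: &[u64],
    mut prefetch: F,
    mut process: G,
) where
    F: FnMut(u64),
    G: FnMut(u64),
{
    let len = batch_hashes.len();

    // Warm-up: rows 1..PREFETCH are requested once, before the loop starts.
    // Row 0 is skipped because it is processed immediately.
    for &hash in batch_hashes.iter().take(PREFETCH).skip(1) {
        prefetch(hash);
    }

    for row in 0..len {
        // Steady state: only the row exactly PREFETCH ahead is new.
        if PREFETCH > 0 && row + PREFETCH < len {
            prefetch(batch_hashes[row + PREFETCH]);
        }
        process(batch_hashes[row]);
    }
}
```

With `PREFETCH = 3` and ten rows, the warm-up requests rows 1 and 2, and the loop then requests rows 3 through 9 one at a time, so no row is requested twice and nothing is read past the end of the batch.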

rluvaton · 2025-12-28T12:06:16Z

run benchmarks

alamb-ghbot · 2025-12-28T12:06:23Z

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add-prefetching (b2c9125) to 6ce2374 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

rluvaton · 2025-12-28T12:11:29Z

I am not sure how much improvement the existing benchmarks will show, as they do not exercise many unique values.

alamb-ghbot · 2025-12-28T12:44:32Z

🤖: Benchmark completed

Details

Comparing HEAD and add-prefetching
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ add-prefetching ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2597.16 ms │      2596.57 ms │ no change │
│ QQuery 1     │  1050.22 ms │      1079.37 ms │ no change │
│ QQuery 2     │  2064.00 ms │      2054.74 ms │ no change │
│ QQuery 3     │  1164.05 ms │      1153.58 ms │ no change │
│ QQuery 4     │  2245.53 ms │      2233.96 ms │ no change │
│ QQuery 5     │ 28134.78 ms │     28343.58 ms │ no change │
│ QQuery 6     │  3910.45 ms │      3842.06 ms │ no change │
│ QQuery 7     │  3673.88 ms │      3548.20 ms │ no change │
└──────────────┴─────────────┴─────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary              ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)              │ 44840.07ms │
│ Total Time (add-prefetching)   │ 44852.06ms │
│ Average Time (HEAD)            │  5605.01ms │
│ Average Time (add-prefetching) │  5606.51ms │
│ Queries Faster                 │          0 │
│ Queries Slower                 │          0 │
│ Queries with No Change         │          8 │
│ Queries with Failure           │          0 │
└────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ add-prefetching ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.05 ms │         2.42 ms │  1.18x slower │
│ QQuery 1     │    51.44 ms │        50.51 ms │     no change │
│ QQuery 2     │   133.98 ms │       135.72 ms │     no change │
│ QQuery 3     │   152.62 ms │       148.08 ms │     no change │
│ QQuery 4     │  1067.74 ms │      1083.09 ms │     no change │
│ QQuery 5     │  1448.88 ms │      1555.15 ms │  1.07x slower │
│ QQuery 6     │     2.04 ms │         2.07 ms │     no change │
│ QQuery 7     │    53.63 ms │        60.52 ms │  1.13x slower │
│ QQuery 8     │  1436.39 ms │      1341.37 ms │ +1.07x faster │
│ QQuery 9     │  1839.98 ms │      1804.70 ms │     no change │
│ QQuery 10    │   343.10 ms │       343.72 ms │     no change │
│ QQuery 11    │   396.00 ms │       395.75 ms │     no change │
│ QQuery 12    │  1339.51 ms │      1318.11 ms │     no change │
│ QQuery 13    │  1969.60 ms │      1948.27 ms │     no change │
│ QQuery 14    │  1241.02 ms │      1187.87 ms │     no change │
│ QQuery 15    │  1215.91 ms │      1197.05 ms │     no change │
│ QQuery 16    │  2530.44 ms │      2419.33 ms │     no change │
│ QQuery 17    │  2493.16 ms │      2383.77 ms │     no change │
│ QQuery 18    │  5085.46 ms │      4446.03 ms │ +1.14x faster │
│ QQuery 19    │   120.72 ms │       121.58 ms │     no change │
│ QQuery 20    │  1922.37 ms │      1876.68 ms │     no change │
│ QQuery 21    │  2217.08 ms │      2184.14 ms │     no change │
│ QQuery 22    │  3772.43 ms │      3736.04 ms │     no change │
│ QQuery 23    │ 12214.15 ms │     12101.57 ms │     no change │
│ QQuery 24    │   214.88 ms │       212.32 ms │     no change │
│ QQuery 25    │   467.60 ms │       445.38 ms │     no change │
│ QQuery 26    │   215.69 ms │       204.74 ms │ +1.05x faster │
│ QQuery 27    │  2723.32 ms │      2683.31 ms │     no change │
│ QQuery 28    │ 21687.73 ms │     21800.58 ms │     no change │
│ QQuery 29    │   950.17 ms │       964.19 ms │     no change │
│ QQuery 30    │  1324.81 ms │      1291.80 ms │     no change │
│ QQuery 31    │  1346.07 ms │      1340.11 ms │     no change │
│ QQuery 32    │  5013.15 ms │      4309.69 ms │ +1.16x faster │
│ QQuery 33    │  5852.38 ms │      5541.85 ms │ +1.06x faster │
│ QQuery 34    │  5982.69 ms │      5631.92 ms │ +1.06x faster │
│ QQuery 35    │  1897.79 ms │      1803.74 ms │     no change │
│ QQuery 36    │    67.22 ms │        64.60 ms │     no change │
│ QQuery 37    │    45.20 ms │        44.23 ms │     no change │
│ QQuery 38    │    66.21 ms │        66.11 ms │     no change │
│ QQuery 39    │   102.91 ms │       102.70 ms │     no change │
│ QQuery 40    │    26.75 ms │        26.33 ms │     no change │
│ QQuery 41    │    23.86 ms │        23.56 ms │     no change │
│ QQuery 42    │    20.13 ms │        19.24 ms │     no change │
└──────────────┴─────────────┴─────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary              ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)              │ 91078.26ms │
│ Total Time (add-prefetching)   │ 88419.97ms │
│ Average Time (HEAD)            │  2118.10ms │
│ Average Time (add-prefetching) │  2056.28ms │
│ Queries Faster                 │          6 │
│ Queries Slower                 │          3 │
│ Queries with No Change         │         34 │
│ Queries with Failure           │          0 │
└────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ add-prefetching ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 119.60 ms │       112.35 ms │ +1.06x faster │
│ QQuery 2     │  28.30 ms │        28.64 ms │     no change │
│ QQuery 3     │  37.32 ms │        36.11 ms │     no change │
│ QQuery 4     │  28.46 ms │        28.60 ms │     no change │
│ QQuery 5     │  85.59 ms │        87.68 ms │     no change │
│ QQuery 6     │  19.77 ms │        19.80 ms │     no change │
│ QQuery 7     │ 230.49 ms │       228.34 ms │     no change │
│ QQuery 8     │  38.10 ms │        35.78 ms │ +1.06x faster │
│ QQuery 9     │ 109.67 ms │       105.37 ms │     no change │
│ QQuery 10    │  61.56 ms │        62.73 ms │     no change │
│ QQuery 11    │  18.42 ms │        18.51 ms │     no change │
│ QQuery 12    │  50.05 ms │        51.47 ms │     no change │
│ QQuery 13    │  47.45 ms │        47.16 ms │     no change │
│ QQuery 14    │  13.64 ms │        13.52 ms │     no change │
│ QQuery 15    │  24.51 ms │        24.17 ms │     no change │
│ QQuery 16    │  24.21 ms │        24.98 ms │     no change │
│ QQuery 17    │ 150.45 ms │       146.03 ms │     no change │
│ QQuery 18    │ 276.08 ms │       278.31 ms │     no change │
│ QQuery 19    │  37.83 ms │        37.81 ms │     no change │
│ QQuery 20    │  49.01 ms │        51.00 ms │     no change │
│ QQuery 21    │ 309.27 ms │       308.25 ms │     no change │
│ QQuery 22    │  17.65 ms │        16.99 ms │     no change │
└──────────────┴───────────┴─────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary              ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)              │ 1777.43ms │
│ Total Time (add-prefetching)   │ 1763.63ms │
│ Average Time (HEAD)            │   80.79ms │
│ Average Time (add-prefetching) │   80.16ms │
│ Queries Faster                 │         2 │
│ Queries Slower                 │         0 │
│ Queries with No Change         │        20 │
│ Queries with Failure           │         0 │
└────────────────────────────────┴───────────┘

rluvaton added 4 commits December 25, 2025 21:58

add benchmarks for large schema

5516fb1

add prefetching for hash aggregate for large maps

e02d5f6

change defaults

3c64610

revert unrelated

b2c9125

rluvaton marked this pull request as draft December 28, 2025 11:54

github-actions bot added the common and physical-plan labels Dec 28, 2025

rluvaton added the performance label Dec 28, 2025
