perf: add prefetching for aggregate multi group by [WIP] #19520
Conversation
@alamb, @Dandandan you might be interested in this (not ready for review, just the general idea)
pub agg_prefetch_elements: usize, default = 1
pub agg_prefetch_locality: usize, default = 3
pub agg_prefetch_read: bool, default = false
Added config to allow for easy tinkering while prototyping
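A minimal sketch of what the three knobs control (the struct and field names below are illustrative, not the PR's code; the defaults mirror the diff above):

```rust
// Hypothetical grouping of the three options; names are made up for
// illustration, defaults are taken from the diff.
struct AggPrefetchOptions {
    /// How many rows ahead of the current row to prefetch.
    elements: usize,
    /// Temporal-locality hint forwarded to the prefetch call (0..=3).
    locality: usize,
    /// Issue read prefetches instead of write prefetches.
    read: bool,
}

impl Default for AggPrefetchOptions {
    fn default() -> Self {
        Self { elements: 1, locality: 3, read: false }
    }
}
```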
    match self.agg_prefetch_elements {
        0 => self.collect_vectorized_process_context_with_prefetch::<0>(batch_hashes, groups),
        1 => self.collect_vectorized_process_context_with_prefetch::<1>(batch_hashes, groups),
        2 => self.collect_vectorized_process_context_with_prefetch::<2>(batch_hashes, groups),
        3 => self.collect_vectorized_process_context_with_prefetch::<3>(batch_hashes, groups),
        4 => self.collect_vectorized_process_context_with_prefetch::<4>(batch_hashes, groups),
        5 => self.collect_vectorized_process_context_with_prefetch::<5>(batch_hashes, groups),
        6 => self.collect_vectorized_process_context_with_prefetch::<6>(batch_hashes, groups),
        7 => self.collect_vectorized_process_context_with_prefetch::<7>(batch_hashes, groups),
        8 => self.collect_vectorized_process_context_with_prefetch::<8>(batch_hashes, groups),
        _ => self.collect_vectorized_process_context_with_prefetch::<8>(batch_hashes, groups),
    }
}
Added all of these arms so that when I test a specific config I'm not affected by the runtime condition; this will not stay in the final version.
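For reference, a standalone sketch of this dispatch pattern, with made-up function names rather than the PR's: a runtime config value is mapped onto a const generic so the hot loop is monomorphized per prefetch distance and the decision is hoisted out of the per-row path.

```rust
fn process(prefetch_elements: usize, hashes: &[u64]) {
    match prefetch_elements {
        0 => process_inner::<0>(hashes),
        1 => process_inner::<1>(hashes),
        2 => process_inner::<2>(hashes),
        // ... one arm per supported distance, clamping anything larger ...
        _ => process_inner::<8>(hashes),
    }
}

fn process_inner<const PREFETCH: usize>(hashes: &[u64]) {
    for row in 0..hashes.len() {
        // PREFETCH is a compile-time constant here, so the compiler can
        // unroll or entirely remove this block per instantiation.
        if PREFETCH > 0 && row + PREFETCH < hashes.len() {
            // A real implementation would prefetch the map slot for
            // hashes[row + PREFETCH]; black_box only stands in for that.
            std::hint::black_box(hashes[row + PREFETCH]);
        }
        // ... probe/insert hashes[row] into the group map ...
        std::hint::black_box(hashes[row]);
    }
}
```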
    }
}

#[inline(always)]
Marked this #[inline(always)] to make sure that extracting the logic into a function does not affect the previous (non-prefetching) case.
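A tiny illustration of the idea, with a made-up helper rather than the PR's actual function:

```rust
// The extracted per-row helper carries #[inline(always)] so that factoring it
// out of the original loop body cannot, by itself, change codegen for the
// existing non-prefetching path.
#[inline(always)]
fn record_group(groups: &mut Vec<usize>, group_idx: usize) {
    groups.push(group_idx);
}
```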
for i in 1..=PREFETCH {
    if READ {
        self.map.prefetch_read::<LOCALITY>(batch_hashes[row + i]);
    } else {
        self.map.prefetch_write::<LOCALITY>(batch_hashes[row + i]);
    }
}
This prefetches the same item multiple times; instead I should prefetch the first PREFETCH items before the loop and then, inside the loop, only prefetch row + PREFETCH.
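A sketch of that corrected pattern, with illustrative names and closures standing in for the map probe and prefetch calls (not the PR's code):

```rust
fn probe_all<const PREFETCH: usize>(
    hashes: &[u64],
    mut prefetch: impl FnMut(u64),
    mut probe: impl FnMut(u64),
) {
    // Warm-up: prefetch the first PREFETCH hashes once, before the loop.
    for &h in hashes.iter().take(PREFETCH) {
        prefetch(h);
    }
    for (row, &h) in hashes.iter().enumerate() {
        // Only the hash sliding into the window is prefetched here, so each
        // hash is prefetched exactly once instead of PREFETCH times.
        if PREFETCH > 0 {
            if let Some(&ahead) = hashes.get(row + PREFETCH) {
                prefetch(ahead);
            }
        }
        probe(h);
    }
}
```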
run benchmarks

🤖
Not sure how much improvement the existing benchmarks will show, as they do not exercise a lot of unique values.
🤖: Benchmark completed
Which issue does this PR close?
N/A
Rationale for this change
Grouping on a large amount of data is very slow.
What changes are included in this PR?
Used my fork of hashbrown with prefetching, and added prefetching when the map is large.
Are these changes tested?
No
Are there any user-facing changes?
No
Related hashbrown issue for allowing prefetching:
My hashbrown changes:
My hashbrown changes are not very safe and probably have bugs; they are quick and dirty, just to see the benefit.
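For context, a rough sketch of what a hash-based read prefetch can look like on x86_64 with the standard intrinsic (this is not the fork's actual code, and how a hash maps to a bucket address is table-specific and elided here):

```rust
#[cfg(target_arch = "x86_64")]
#[inline(always)]
unsafe fn prefetch_read_line(addr: *const u8) {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // _MM_HINT_T0 pulls the line into all cache levels; the other hints
    // (T1, T2, NTA) trade that off, which is roughly what the
    // agg_prefetch_locality option would select between.
    _mm_prefetch::<_MM_HINT_T0>(addr as *const i8);
}
```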
Command:
datafusion-cli --command "select value from range(0,100000000) group by value, value + 1;"

When the hashmap is smaller, the prefetching is unneeded, as it should already fit in the cache.
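A minimal sketch of such size-based gating, assuming a made-up threshold and function name (not the PR's actual condition):

```rust
// Only prefetch once the group map has grown past some threshold, on the
// theory that a small map already fits in cache and prefetching only adds
// overhead. The threshold value here is an assumption.
const PREFETCH_MIN_GROUPS: usize = 1 << 20;

fn should_prefetch(num_groups: usize) -> bool {
    num_groups >= PREFETCH_MIN_GROUPS
}
```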
With the following config:
Without prefetch: 16.002 seconds
With prefetch: 10.176 seconds
Env
Machine: c5.metal

Results
Without prefetch
With prefetch