-
Notifications
You must be signed in to change notification settings - Fork 30
Allow any multiple of 64 during chunked prefill #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
…atch Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
|
This PR will replace #160 |
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
|
bot:test |
| # add the valid prompt size to the end since it will already exist in the above enforce_sizes | ||
| possible_seq_lengths = possible_seq_lengths + [ | ||
| valid_prompt_shape[1] | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we adding this but we weren't before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically it is not needed, but if we cycle through the sequence lengths under a certain program, there is no reason why we are skipping the largest size. This is an artifact of the prior PR which had all sequences of prefill_chunk size. I think this can be removed in this PR, though it makes sense to add it in general
| if chunk_j == 0: | ||
| chunk_start = 0 | ||
| chunk_end = prefill_chunk_size - required_extra_pads | ||
| else: | ||
| required_extra_pads = 0 | ||
| chunk_start = chunk_end | ||
| chunk_end += prefill_chunk_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what chunk_start and chunk_end mean here, but I don't think they're the best names for these variables, as they are more of a mapping between the original sequence and its chunk partition. I don't know what would be a better name, maybe just a comment explaining what they are?
aiu_fms_testing_utils/utils/paged.py
Outdated
| position_ids_seq_chunk = kwargs["position_ids"][seq_i][ | ||
| chunk_start:chunk_end | ||
| ] | ||
| if required_extra_pads > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 50-50 on whether it's cleaner to centralize all the "if required_extra_pads > 0" into a single one or keep it as is. I was thinking maybe create all the {property}_seq_chunk first, then do all the padding, and finally turn them into unsqueezed tensors might make the code cleaner, but idk
ani300
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some comments on code clarity for future us, but the logic looks good if test_scripts passes
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
|
bot:test |
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
|
bot:test |
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
|
bot:test |
This PR will allow prompts with any multiple of 64 during chunked prefill while ensuring prefill size chunks