@JRosenkranz
Contributor

This PR will allow prompts of any length that is a multiple of 64 to be used with chunked prefill, while still ensuring that every chunk is of the prefill chunk size.
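A minimal sketch of the padding arithmetic this implies, assuming the extra pad tokens are only there to round the prompt length up to the next multiple of the prefill chunk size. `required_extra_pads` and `num_chunks` are hypothetical helpers for illustration, not the PR's actual functions.

```python
def required_extra_pads(prompt_len: int, prefill_chunk_size: int) -> int:
    """Hypothetical helper: pad tokens needed so the prompt splits into full chunks."""
    # The PR only accepts prompt lengths that are multiples of 64.
    if prompt_len % 64 != 0:
        raise ValueError("prompt length must be a multiple of 64")
    # Round up to the next multiple of prefill_chunk_size (0 if already aligned).
    return (-prompt_len) % prefill_chunk_size


def num_chunks(prompt_len: int, prefill_chunk_size: int) -> int:
    """Hypothetical helper: number of prefill chunks after padding."""
    padded_len = prompt_len + required_extra_pads(prompt_len, prefill_chunk_size)
    return padded_len // prefill_chunk_size


# A 320-token prompt with 128-token chunks needs 64 pads and runs as 3 chunks.
print(required_extra_pads(320, 128), num_chunks(320, 128))
```

With these assumptions, a 320-token prompt and a chunk size of 128 needs 64 pad tokens and is processed as 3 full chunks, while a 256-token prompt needs no padding.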

ani300 and others added 11 commits October 30, 2025 01:27
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
…atch

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz JRosenkranz marked this pull request as ready for review November 13, 2025 13:55
@JRosenkranz JRosenkranz requested a review from ani300 November 13, 2025 13:55
@JRosenkranz
Contributor Author

This PR will replace #160

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz
Contributor Author

bot:test
TEST_FILE=test_scripts.py

Comment on lines 590 to 593
# add the valid prompt size to the end since it will already exist in the above enforce_sizes
possible_seq_lengths = possible_seq_lengths + [
valid_prompt_shape[1]
]
Contributor

why are we adding this but we weren't before?

Contributor Author

Technically it is not needed, but if we cycle through the sequence lengths under a certain program, there is no reason to skip the largest size. This is an artifact of the prior PR, in which all sequences were of prefill_chunk size. I think this can be removed in this PR, though it makes sense to add it in general.
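A minimal sketch of why the largest size would otherwise be skipped, assuming (hypothetically) that the candidate lengths come from a half-open range of multiples of 64. The real list is built from `enforce_sizes`, which this thread does not show, and `candidate_seq_lengths` is an illustrative name only.

```python
from typing import List


def candidate_seq_lengths(valid_prompt_len: int, step: int = 64) -> List[int]:
    """Hypothetical helper: sequence lengths to cycle through for one program."""
    # range() excludes its stop value, so valid_prompt_len itself never appears here.
    lengths = list(range(step, valid_prompt_len, step))
    # Append the largest size explicitly so it is exercised as well.
    return lengths + [valid_prompt_len]


print(candidate_seq_lengths(256))
```

Without the explicit append, the loop would only ever see `[64, 128, 192]` for a 256-token program, which is the gap the comment describes.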

Comment on lines +321 to +327
if chunk_j == 0:
    chunk_start = 0
    chunk_end = prefill_chunk_size - required_extra_pads
else:
    required_extra_pads = 0
    chunk_start = chunk_end
    chunk_end += prefill_chunk_size
Contributor

I know what chunk_start and chunk_end mean here, but I don't think they're the best names for these variables, as they are more of a mapping between the original sequence and its chunk partition. I don't know what a better name would be; maybe just add a comment explaining what they are?
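One way to make that mapping explicit is to pull it out into a small function with a docstring. The sketch below re-expresses the `chunk_j` loop as a standalone function (`chunk_spans` is a hypothetical name) that returns, per chunk, the half-open span of the original, unpadded sequence plus the number of pad tokens. It assumes, as in the `chunk_j == 0` branch, that all padding goes into the first chunk.

```python
from typing import List, Tuple


def chunk_spans(prompt_len: int, prefill_chunk_size: int) -> List[Tuple[int, int, int]]:
    """Map each prefill chunk to (start, end, pads).

    start/end index into the *original* unpadded prompt (half-open), and pads is the
    number of pad tokens added so that pads + (end - start) == prefill_chunk_size.
    """
    required_extra_pads = (-prompt_len) % prefill_chunk_size
    n_chunks = (prompt_len + required_extra_pads) // prefill_chunk_size
    spans = []
    chunk_end = 0
    for chunk_j in range(n_chunks):
        if chunk_j == 0:
            # The first chunk carries all the padding, so it holds fewer real tokens.
            chunk_start = 0
            chunk_end = prefill_chunk_size - required_extra_pads
            pads = required_extra_pads
        else:
            # Later chunks continue where the previous one stopped, with no padding.
            chunk_start = chunk_end
            chunk_end += prefill_chunk_size
            pads = 0
        spans.append((chunk_start, chunk_end, pads))
    return spans


print(chunk_spans(320, 128))
```

For a 320-token prompt with 128-token chunks this yields `(0, 64, 64)`, `(64, 192, 0)`, `(192, 320, 0)`: the first chunk holds 64 real tokens plus 64 pads, and the last span ends exactly at the prompt length.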

position_ids_seq_chunk = kwargs["position_ids"][seq_i][
    chunk_start:chunk_end
]
if required_extra_pads > 0:
Contributor

I'm 50-50 on whether it's cleaner to centralize all the "if required_extra_pads > 0" checks into a single one or keep it as is. I was thinking that creating all the {property}_seq_chunk first, then doing all the padding, and finally turning them into unsqueezed tensors might make the code cleaner, but idk
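A rough sketch of that "slice first, pad once, convert last" shape, written in plain Python so it runs without torch. The function name, the property names beyond `position_ids`, the pad values, and left padding are all assumptions for illustration; every property is treated as a flat 1-D list, which is a simplification, and the final tensor conversion is left as a comment.

```python
from typing import Dict, List, Sequence


def slice_then_pad(
    seq_props: Dict[str, Sequence[int]],
    chunk_start: int,
    chunk_end: int,
    required_extra_pads: int,
    pad_values: Dict[str, int],
) -> Dict[str, List[int]]:
    """Hypothetical helper: build every {property}_seq_chunk, then pad them in one place."""
    # Step 1: slice every property for this chunk.
    chunks = {name: list(values[chunk_start:chunk_end]) for name, values in seq_props.items()}
    # Step 2: a single padding branch instead of one per property (left padding assumed).
    if required_extra_pads > 0:
        for name in chunks:
            chunks[name] = [pad_values[name]] * required_extra_pads + chunks[name]
    # Step 3 (not done here): turn each list into an unsqueezed tensor,
    # e.g. torch.tensor(chunks[name]).unsqueeze(0).
    return chunks


props = {"input_ids": [11, 12, 13, 14], "position_ids": [0, 1, 2, 3]}
print(slice_then_pad(props, 0, 2, 2, {"input_ids": 0, "position_ids": 0}))
```

The trade-off is the one raised above: a single padding branch is easier to audit, at the cost of an intermediate dict instead of named local variables.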

Contributor

@ani300 ani300 left a comment

left some comments on code clarity for future us, but the logic looks good if test_scripts passes

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz
Contributor Author

bot:test
TEST_FILE=test_scripts.py

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz
Contributor Author

bot:test
TEST_FILE=test_scripts.py

Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
@JRosenkranz
Contributor Author

bot:test
TEST_FILE=test_scripts.py

3 participants