[Feature] Add SP support for roll_sequence_context in MTP #1629
Open
HAOCHENYE wants to merge 3 commits into gh/HAOCHENYE/22/base from
Conversation
HAOCHENYE added a commit that referenced this pull request on Mar 24, 2026

- Add raw_input_ids, raw_inputs_embeds, raw_position_ids, raw_rollout_routed_experts properties to SequenceContext for reconstructing full tensors from SP shards
- Store raw_input_ids (full padded tensor), shard_start, shard_size in SequenceContext.split() for zero-communication input_ids rolling
- raw_inputs_embeds triggers a single allgather on first access and caches the result, amortising communication across MTP layers
- roll_sequence_context: remove SP assert; always operate on full tensors via raw_* properties, slice to local shard only when in SP

ghstack-source-id: 2d574e3
Pull-Request: #1629
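The "single allgather on first access, then cache" behaviour described in the commit message can be sketched as a lazy property. This is a hypothetical illustration, not the PR's actual code: SequenceContext, raw_inputs_embeds, and the SP group are names from the PR, while FakeSPGroup is a single-process stand-in for a torch.distributed process group, invented here so the example runs without a communication backend.

```python
# Hypothetical sketch of the cached-allgather pattern from the commit
# message. FakeSPGroup simulates an SP process group in one process.

class FakeSPGroup:
    """Holds every rank's shard so 'all_gather' can be simulated locally."""
    def __init__(self, shards):
        self.shards = shards        # list of per-rank sequence shards
        self.gather_calls = 0       # count simulated collectives

    def all_gather(self):
        self.gather_calls += 1
        # Concatenate shards along the sequence dimension.
        return [tok for shard in self.shards for tok in shard]


class SequenceContextSketch:
    def __init__(self, local_shard, sp_group):
        self.inputs_embeds = local_shard     # this rank's SP shard
        self._sp_group = sp_group
        self._raw_inputs_embeds = None       # cache for the allgather result

    @property
    def raw_inputs_embeds(self):
        # First access triggers a single (simulated) allgather; the cached
        # result makes later accesses free, amortising communication
        # across MTP layers.
        if self._raw_inputs_embeds is None:
            self._raw_inputs_embeds = self._sp_group.all_gather()
        return self._raw_inputs_embeds


shards = [[0, 1, 2, 3], [4, 5, 6, 7]]       # 2 SP ranks, 4 tokens each
group = FakeSPGroup(shards)
ctx = SequenceContextSketch(shards[0], group)

full = ctx.raw_inputs_embeds                # triggers the one allgather
for _ in range(3):                          # e.g. one access per MTP layer
    _ = ctx.raw_inputs_embeds
```

However many MTP layers touch raw_inputs_embeds afterwards, only one collective is ever issued per forward pass.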
This was referenced Mar 24, 2026
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request on Mar 24, 2026

ghstack-source-id: 2d574e3
Pull-Request: InternLM#1629
HAOCHENYE added a commit that referenced this pull request on Mar 24, 2026

ghstack-source-id: cc60a14
Pull-Request: #1629
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request on Mar 25, 2026

ghstack-source-id: cc60a14
Pull-Request: InternLM#1629
HAOCHENYE added a commit that referenced this pull request on Mar 25, 2026

ghstack-source-id: 79251cf
Pull-Request: #1629
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request on Mar 26, 2026

ghstack-source-id: 79251cf
Pull-Request: InternLM#1629
Stack from ghstack (oldest at bottom):

- Add raw_input_ids, raw_inputs_embeds, raw_position_ids, raw_rollout_routed_experts properties to SequenceContext for reconstructing full tensors from SP shards
- Store raw_input_ids (full padded tensor), shard_start, shard_size in SequenceContext.split() for zero-communication input_ids rolling
- raw_inputs_embeds triggers a single allgather on first access and caches the result, amortising communication across MTP layers
- roll_sequence_context: remove SP assert; always operate on full tensors via raw_* properties, slice to local shard only when in SP
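The "zero-communication input_ids rolling" point above can be sketched as follows. This is a hypothetical illustration, not the PR's implementation: the names split, roll_sequence_context, raw_input_ids, shard_start, and shard_size come from the PR, while the shift-left-by-one roll and the use of plain Python lists in place of padded tensors are simplifying assumptions.

```python
# Hypothetical sketch of zero-communication input_ids rolling: because
# split() keeps the full padded tensor plus shard coordinates, each rank
# can roll on the full tensor and re-slice locally with no collective.

def split(full_input_ids, sp_rank, sp_size):
    """Mimics SequenceContext.split(): each rank keeps its local shard
    plus the full padded tensor and its shard coordinates."""
    shard_size = len(full_input_ids) // sp_size
    shard_start = sp_rank * shard_size
    return {
        "input_ids": full_input_ids[shard_start:shard_start + shard_size],
        "raw_input_ids": full_input_ids,   # full tensor kept on every rank
        "shard_start": shard_start,
        "shard_size": shard_size,
    }

def roll_sequence_context(ctx, pad_id=0):
    """Roll on the FULL tensor, then re-slice to the local shard.
    No communication is needed: every rank already holds raw_input_ids."""
    rolled_full = ctx["raw_input_ids"][1:] + [pad_id]   # shift left by one
    s, n = ctx["shard_start"], ctx["shard_size"]
    return {**ctx,
            "raw_input_ids": rolled_full,
            "input_ids": rolled_full[s:s + n]}

ctx = split(list(range(8)), sp_rank=0, sp_size=2)
rolled = roll_sequence_context(ctx)
```

Note why the full tensor matters: rank 0's rolled shard ends with the token that originally lived at the start of rank 1's shard, so rolling only the local shard would require fetching that boundary token from the neighbouring rank. That is exactly the communication the stored raw_input_ids avoids.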