[Feature] Add SP support for roll_sequence_context in MTP#1629

Open
HAOCHENYE wants to merge 3 commits into gh/HAOCHENYE/22/base from gh/HAOCHENYE/22/head

Conversation

@HAOCHENYE
Collaborator

@HAOCHENYE HAOCHENYE commented Mar 24, 2026

Stack from ghstack (oldest at bottom):


  • Add raw_input_ids, raw_inputs_embeds, raw_position_ids, and
    raw_rollout_routed_experts properties to SequenceContext for
    reconstructing full tensors from sequence-parallel (SP) shards
  • Store raw_input_ids (the full padded tensor), shard_start, and
    shard_size in SequenceContext.split(), enabling zero-communication
    input_ids rolling
  • raw_inputs_embeds triggers a single allgather on first access and
    caches the result, amortising communication across MTP layers
  • roll_sequence_context: remove the SP assert; always operate on full
    tensors via the raw_* properties, slicing back to the local shard
    only when running under SP

[ghstack-poisoned]
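The pattern the bullets describe can be sketched as follows. This is a hypothetical illustration, not xtuner's actual code: the names mirror the PR's identifiers (SequenceContext, split, raw_input_ids, roll_sequence_context), but the bodies are simplified, tensors are plain Python lists, and the allgather is simulated by passing every rank's shard into `raw_inputs_embeds`, so the sketch runs without torch.distributed.

```python
# Hypothetical sketch of the raw_* reconstruction pattern (not xtuner's code).
from dataclasses import dataclass, field


@dataclass
class SequenceContext:
    input_ids: list                 # this rank's SP shard
    raw_input_ids: list             # full padded sequence, kept at split time
    shard_start: int                # offset of the local shard in the full tensor
    shard_size: int                 # length of the local shard
    _raw_embeds_cache: list = field(default=None, repr=False)

    @classmethod
    def split(cls, full_ids, sp_world_size):
        """Pad to a multiple of sp_world_size and cut into per-rank shards.

        Each shard keeps raw_input_ids plus (shard_start, shard_size), so the
        full tensor can later be rebuilt with zero communication.
        """
        pad = (-len(full_ids)) % sp_world_size
        padded = full_ids + [0] * pad
        shard_size = len(padded) // sp_world_size
        return [
            cls(
                input_ids=padded[r * shard_size:(r + 1) * shard_size],
                raw_input_ids=padded,
                shard_start=r * shard_size,
                shard_size=shard_size,
            )
            for r in range(sp_world_size)
        ]

    def raw_inputs_embeds(self, all_shard_embeds):
        """On first access, 'allgather' (here: concatenate) per-rank embeds
        and cache the result, so repeated MTP layers pay for one gather."""
        if self._raw_embeds_cache is None:
            self._raw_embeds_cache = [
                e for shard in all_shard_embeds for e in shard
            ]
        return self._raw_embeds_cache


def roll_sequence_context(ctx, shifts=-1):
    """Roll the full tensor via raw_input_ids, then slice back to the local
    shard: the 'operate on full, slice to shard' strategy from the PR."""
    n = len(ctx.raw_input_ids)
    rolled = [ctx.raw_input_ids[(i - shifts) % n] for i in range(n)]
    ctx.input_ids = rolled[ctx.shard_start:ctx.shard_start + ctx.shard_size]
    return ctx
```

With two SP ranks over `[0..7]`, rank 0 holds `[0, 1, 2, 3]`; rolling by `-1` shifts the full sequence left and re-slices, so rank 0 ends up with `[1, 2, 3, 4]` without any communication, since the full padded tensor was retained at split time.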
HAOCHENYE added a commit that referenced this pull request Mar 24, 2026
ghstack-source-id: 2d574e3
Pull-Request: #1629
@HAOCHENYE HAOCHENYE closed this Mar 24, 2026
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request Mar 24, 2026
ghstack-source-id: 2d574e3
Pull-Request: InternLM#1629
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request Mar 24, 2026
ghstack-source-id: 2d574e3
Pull-Request: InternLM#1629
@HAOCHENYE HAOCHENYE reopened this Mar 24, 2026
[ghstack-poisoned]
HAOCHENYE added a commit that referenced this pull request Mar 24, 2026
ghstack-source-id: cc60a14
Pull-Request: #1629
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request Mar 25, 2026
ghstack-source-id: cc60a14
Pull-Request: InternLM#1629
[ghstack-poisoned]
HAOCHENYE added a commit that referenced this pull request Mar 25, 2026
ghstack-source-id: 79251cf
Pull-Request: #1629
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request Mar 26, 2026
ghstack-source-id: 79251cf
Pull-Request: InternLM#1629
