[Feature] Add SP support for roll_sequence_context in MTP #1629
Open
HAOCHENYE wants to merge 3 commits into gh/HAOCHENYE/22/base from
Conversation
HAOCHENYE added a commit that referenced this pull request on Mar 24, 2026

- Add raw_input_ids, raw_inputs_embeds, raw_position_ids, raw_rollout_routed_experts properties to SequenceContext for reconstructing full tensors from SP shards
- Store raw_input_ids (full padded tensor), shard_start, shard_size in SequenceContext.split() for zero-communication input_ids rolling
- raw_inputs_embeds triggers a single allgather on first access and caches the result, amortising communication across MTP layers
- roll_sequence_context: remove SP assert; always operate on full tensors via raw_* properties, slice to local shard only when in SP

ghstack-source-id: 2d574e3
Pull-Request: #1629
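The "single allgather on first access, then cache" behaviour described in the commit message can be sketched as a lazy property. This is a hypothetical illustration, not the PR's actual code: SequenceContext, raw_inputs_embeds, and the SP group are names from the PR, while FakeSPGroup is a single-process stand-in for a torch.distributed process group, invented here so the example runs without a communication backend.

```python
# Hypothetical sketch of the cached-allgather pattern from the commit
# message. FakeSPGroup simulates an SP process group in one process.

class FakeSPGroup:
    """Holds every rank's shard so 'all_gather' can be simulated locally."""
    def __init__(self, shards):
        self.shards = shards        # list of per-rank sequence shards
        self.gather_calls = 0       # count simulated collectives

    def all_gather(self):
        self.gather_calls += 1
        # Concatenate shards along the sequence dimension.
        return [tok for shard in self.shards for tok in shard]


class SequenceContextSketch:
    def __init__(self, local_shard, sp_group):
        self.inputs_embeds = local_shard     # this rank's SP shard
        self._sp_group = sp_group
        self._raw_inputs_embeds = None       # cache for the allgather result

    @property
    def raw_inputs_embeds(self):
        # First access triggers a single (simulated) allgather; the cached
        # result makes later accesses free, amortising communication
        # across MTP layers.
        if self._raw_inputs_embeds is None:
            self._raw_inputs_embeds = self._sp_group.all_gather()
        return self._raw_inputs_embeds


shards = [[0, 1, 2, 3], [4, 5, 6, 7]]       # 2 SP ranks, 4 tokens each
group = FakeSPGroup(shards)
ctx = SequenceContextSketch(shards[0], group)

full = ctx.raw_inputs_embeds                # triggers the one allgather
for _ in range(3):                          # e.g. one access per MTP layer
    _ = ctx.raw_inputs_embeds
```

However many MTP layers touch raw_inputs_embeds afterwards, only one collective is ever issued per forward pass.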
This was referenced Mar 24, 2026
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request on Mar 24, 2026

ghstack-source-id: 2d574e3
Pull-Request: InternLM#1629
HAOCHENYE added a commit that referenced this pull request on Mar 24, 2026

ghstack-source-id: cc60a14
Pull-Request: #1629
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request on Mar 25, 2026

ghstack-source-id: cc60a14
Pull-Request: InternLM#1629
HAOCHENYE added a commit that referenced this pull request on Mar 25, 2026

ghstack-source-id: 79251cf
Pull-Request: #1629
HAOCHENYE added a commit to HAOCHENYE/xtuner that referenced this pull request on Mar 26, 2026

ghstack-source-id: 79251cf
Pull-Request: InternLM#1629
Stack from ghstack (oldest at bottom):

- Add raw_input_ids, raw_inputs_embeds, raw_position_ids, raw_rollout_routed_experts properties to SequenceContext for reconstructing full tensors from SP shards
- Store raw_input_ids (full padded tensor), shard_start, shard_size in SequenceContext.split() for zero-communication input_ids rolling
- raw_inputs_embeds triggers a single allgather on first access and caches the result, amortising communication across MTP layers
- roll_sequence_context: remove SP assert; always operate on full tensors via raw_* properties, slice to local shard only when in SP
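The "zero-communication input_ids rolling" point above can be sketched as follows. This is a hypothetical illustration, not the PR's implementation: the names split, roll_sequence_context, raw_input_ids, shard_start, and shard_size come from the PR, while the shift-left-by-one roll and the use of plain Python lists in place of padded tensors are simplifying assumptions.

```python
# Hypothetical sketch of zero-communication input_ids rolling: because
# split() keeps the full padded tensor plus shard coordinates, each rank
# can roll on the full tensor and re-slice locally with no collective.

def split(full_input_ids, sp_rank, sp_size):
    """Mimics SequenceContext.split(): each rank keeps its local shard
    plus the full padded tensor and its shard coordinates."""
    shard_size = len(full_input_ids) // sp_size
    shard_start = sp_rank * shard_size
    return {
        "input_ids": full_input_ids[shard_start:shard_start + shard_size],
        "raw_input_ids": full_input_ids,   # full tensor kept on every rank
        "shard_start": shard_start,
        "shard_size": shard_size,
    }

def roll_sequence_context(ctx, pad_id=0):
    """Roll on the FULL tensor, then re-slice to the local shard.
    No communication is needed: every rank already holds raw_input_ids."""
    rolled_full = ctx["raw_input_ids"][1:] + [pad_id]   # shift left by one
    s, n = ctx["shard_start"], ctx["shard_size"]
    return {**ctx,
            "raw_input_ids": rolled_full,
            "input_ids": rolled_full[s:s + n]}

ctx = split(list(range(8)), sp_rank=0, sp_size=2)
rolled = roll_sequence_context(ctx)
```

Note why the full tensor matters: rank 0's rolled shard ends with the token that originally lived at the start of rank 1's shard, so rolling only the local shard would require fetching that boundary token from the neighbouring rank. That is exactly the communication the stored raw_input_ids avoids.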