[Feature] Add Multi-Token Prediction (MTP) module implementation #1570
HAOCHENYE wants to merge 6 commits into gh/HAOCHENYE/17/base from
Conversation
ghstack-source-id: 1b45af1 Pull-Request: InternLM#1570
ghstack-source-id: 2d84ad6 Pull-Request: InternLM#1570
ghstack-source-id: db84856 Pull-Request: InternLM#1570
Kirrito-k423
left a comment
Thanks for this well-structured MTP implementation! The overall architecture is clean with good separation of concerns (MTPConfig, MTPLayer, MTPBlock). I have a few important issues and suggestions below.
# Step 3: Pass through the standard decoder layer
# This includes attention, MLP, and their respective normalizations
# TODO: TMP hardcode here.
🟠 Important: This TODO comment indicates unfinished logic. What is hardcoded here? If this is a temporary workaround, please clarify what needs to be done before merging, or create a follow-up issue.
# xtuner: mtp_block.layers.{idx}.enorm -> HF: mtp.pre_fc_norm_embedding
# xtuner: mtp_block.layers.{idx}.hnorm -> HF: mtp.pre_fc_norm_hidden
# xtuner: mtp_block.layers.{idx}.final_layernorm -> HF: mtp.norm
# Note: Currently assuming single MTP layer (idx=0), may need adjustment for multiple layers
🟠 Important: The comment indicates this assumes single MTP layer, but MTPConfig.num_layers supports multiple layers. For num_layers > 1, the key mappings will be incorrect. Please either add validation to reject num_layers > 1 if not supported, or implement correct multi-layer key mapping.
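A minimal sketch of the validation option the review suggests, assuming a hypothetical helper `mtp_key_mapping` that builds the xtuner-to-HF key table (the function name and dict-based return are illustrative, not the PR's actual API):

```python
def mtp_key_mapping(num_layers: int) -> dict[str, str]:
    # Hypothetical guard: the HF-side names below have no layer index,
    # so the mapping is only well-defined for a single MTP layer.
    if num_layers > 1:
        raise NotImplementedError(
            "HF key mapping currently assumes a single MTP layer (idx=0); "
            "extend the mapping before enabling num_layers > 1."
        )
    idx = 0
    # Mappings taken from the comment quoted above.
    return {
        f"mtp_block.layers.{idx}.enorm": "mtp.pre_fc_norm_embedding",
        f"mtp_block.layers.{idx}.hnorm": "mtp.pre_fc_norm_hidden",
        f"mtp_block.layers.{idx}.final_layernorm": "mtp.norm",
    }
```

Failing fast here keeps a silently wrong checkpoint conversion from reaching users until multi-layer mapping is actually implemented.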
attention mask, etc.

Returns:
    list[tuple[torch.Tensor, torch.Tensor, torch.Tensor]]: List of 3-tuples
🟡 Suggestion: The return type annotation says list[tuple[...]], but the append logic could be clearer: at line 105, mtp_outputs.append(current_hidden_states) appends a tuple, yet the name current_hidden_states suggests a single tensor. Consider renaming the variable so the element type is obvious at the append site.
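The renaming pattern suggested above might look like this sketch (the loop body and variable names are illustrative stand-ins, not the PR's code; plain floats stand in for torch.Tensor so the example is self-contained):

```python
# Bind the per-depth result to an explicitly named 3-tuple before appending,
# so the element type of mtp_outputs matches the annotated return type.
mtp_outputs: list[tuple[float, float, float]] = []
for depth in range(2):  # stand-in for iterating over MTP depths
    hidden_states = 0.0  # would be a torch.Tensor in the real code
    logits = 0.0
    loss = 0.0
    depth_output = (hidden_states, logits, loss)  # name says "tuple", not "tensor"
    mtp_outputs.append(depth_output)
```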
)

# Compute MTP losses for each depth
mtp_losses = torch.tensor(0.0, device=DEVICE)
🟡 Suggestion: Creating a tensor with torch.tensor(0.0, device=DEVICE) inside a loop can be inefficient. Consider initializing outside the loop with torch.zeros(1, device=DEVICE).
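A sketch of the accumulation pattern the review suggests, with the accumulator created once outside the loop (the function name and the `DEVICE` stand-in are hypothetical, not the PR's actual code):

```python
import torch

DEVICE = torch.device("cpu")  # hypothetical stand-in for the module's DEVICE

def accumulate_mtp_losses(depth_losses: list[torch.Tensor]) -> torch.Tensor:
    # Create the scalar accumulator once, before the per-depth loop,
    # instead of calling torch.tensor(0.0, device=DEVICE) on each iteration.
    mtp_losses = torch.zeros((), device=DEVICE)
    for loss in depth_losses:
        mtp_losses = mtp_losses + loss
    return mtp_losses
```

Using a 0-dim tensor (`torch.zeros(())`) keeps the result a scalar, matching the shape of `torch.tensor(0.0, ...)` in the original code.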
ghstack-source-id: 8c98b3a Pull-Request: InternLM#1570
ghstack-source-id: e63ad27 Pull-Request: InternLM#1570
Stack from ghstack (oldest at bottom):