[bugfix]Fix Megatron model device placement when `use_cpu_initialization` is enabled by ShiroNyaa · Pull Request #9446 · modelscope/ms-swift

ShiroNyaa · 2026-05-29T09:12:11Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Fix Megatron model device placement when use_cpu_initialization is enabled.

Previously, wrap_model only moved model modules to CUDA when args.use_cpu_initialization was disabled. With CPU initialization enabled, some modules could remain on CPU before fp16/DDP wrapping. This can cause device mismatch errors during training, especially in LoRA tuning, where frozen base modules such as embeddings are still used in the forward pass.

This change moves the model to the current CUDA device when args.use_cpu_initialization is enabled, avoiding CPU/GPU tensor mismatch during Megatron training.

Experiment results

Before this change, LoRA training with use_cpu_initialization=True could fail with errors like:

RuntimeError: Expected all tensors to be on the same device, but got indices is on cuda:1, different from other tensors on cpu
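
One way to confirm this kind of mismatch before training starts is to list which parameters are not on a CUDA device. The helper below is a hypothetical sketch, not part of ms-swift or Megatron. It takes plain (name, device) pairs so it runs without torch; in a real run the pairs could presumably be built from `model.named_parameters()`. The parameter names in the example data are made up.

```python
from collections import defaultdict


def group_by_device(named_devices):
    """Group parameter names by device string.

    `named_devices` is any iterable of (name, device) pairs. In a real run it
    could be built as ((n, str(p.device)) for n, p in model.named_parameters()).
    """
    groups = defaultdict(list)
    for name, device in named_devices:
        groups[str(device)].append(name)
    return dict(groups)


def stragglers(named_devices, expected_prefix="cuda"):
    """Return names whose device string does not start with `expected_prefix`."""
    return [name for name, device in named_devices
            if not str(device).startswith(expected_prefix)]


# Illustrative data only: these parameter names are invented for the example.
example = [
    ("embedding.word_embeddings.weight", "cpu"),
    ("decoder.layers.0.lora_A.weight", "cuda:1"),
    ("decoder.layers.0.lora_B.weight", "cuda:1"),
]
print(stragglers(example))
print(group_by_device(example))
```

A non-empty result from `stragglers` just before fp16/DDP wrapping would point at the modules that were never moved off the CPU.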

gemini-code-assist

Code Review

This pull request modifies the condition under which the model is moved to the CUDA device, changing it from when CPU initialization is disabled to when it is enabled. The reviewer suggests moving the model to the GPU unconditionally instead, which would align with Megatron-LM's standard behavior and ensure that all CPU-initialized parameters are correctly transferred to the GPU.

gemini-code-assist · 2026-05-29T09:13:03Z

+        if args.use_cpu_initialization:
            m.cuda(torch.cuda.current_device())


Instead of conditionally moving the model to the CUDA device only when use_cpu_initialization is enabled, it is safer and more robust to move the model to the current CUDA device unconditionally. This aligns with Megatron-LM's standard behavior and ensures that any parameters initialized on the CPU (such as newly added adapter weights or embeddings) are correctly transferred to the GPU regardless of the initialization setting.

m.cuda(torch.cuda.current_device())
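
The three placement rules discussed in this thread (the original condition, the condition in this PR, and the unconditional move) can be compared with a toy model. This is only a sketch: `ToyModule`, `place`, and `final_devices` are hypothetical names, the module tracks a device string instead of real tensors, and it assumes that CPU initialization leaves every weight on CPU while GPU initialization leaves every weight on the GPU.

```python
class ToyModule:
    """Stand-in for a model chunk; tracks only a device string (no torch needed)."""

    def __init__(self, use_cpu_initialization):
        # Assumption: CPU initialization leaves weights on CPU, otherwise on GPU.
        self.device = "cpu" if use_cpu_initialization else "cuda"

    def cuda(self):
        self.device = "cuda"
        return self


def place(module, use_cpu_initialization, variant):
    """Apply one of the three placement rules and return the final device."""
    if variant == "before":            # original: move only when CPU init is off
        if not use_cpu_initialization:
            module.cuda()
    elif variant == "pr":              # this PR: move only when CPU init is on
        if use_cpu_initialization:
            module.cuda()
    elif variant == "unconditional":   # reviewer suggestion: always move
        module.cuda()
    else:
        raise ValueError(f"unknown variant: {variant}")
    return module.device


def final_devices(variant):
    """Final device for use_cpu_initialization=False and True under `variant`."""
    return {flag: place(ToyModule(flag), flag, variant) for flag in (False, True)}


for variant in ("before", "pr", "unconditional"):
    print(variant, final_devices(variant))
```

In this toy, only the "before" rule leaves the CPU-initialized module on the CPU. The toy does not model the reviewer's further point about parameters created on the CPU while the flag is off (for example, newly added adapter weights); in that situation only the unconditional move would transfer them.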

Jintao-Huang · 2026-05-30T15:23:33Z

if args.use_cpu_initialization:

Let's remove this.

[bugfix]Fix the mcore backend not moving weights to GPU memory after CPU weight initialization

d74068e

gemini-code-assist Bot reviewed May 29, 2026


Jintao-Huang approved these changes May 30, 2026


[bugfix]Fix the mcore backend not moving weights to GPU memory after CPU weight initialization

a920f35

ShiroNyaa wants to merge 2 commits into modelscope:main from ShiroNyaa:fix/mcore_cpu_initialization


2 participants

