Skip to content

[bugfix]Fix Megatron model device placement when use_cpu_initialization is enabled#9446

Open
ShiroNyaa wants to merge 2 commits into
modelscope:mainfrom
ShiroNyaa:fix/mcore_cpu_initialization
Open

[bugfix]Fix Megatron model device placement when use_cpu_initialization is enabled#9446
ShiroNyaa wants to merge 2 commits into
modelscope:mainfrom
ShiroNyaa:fix/mcore_cpu_initialization

Conversation

@ShiroNyaa
Copy link
Copy Markdown

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Fix Megatron model device placement when use_cpu_initialization is enabled.

Previously, wrap_model only moved model modules to CUDA when args.use_cpu_initialization was disabled. With CPU initialization enabled, some modules could remain on CPU before fp16/DDP wrapping, which may cause device mismatch errors during training, especially in LoRA tuning where frozen base modules such as embeddings are still used in forward.

This change moves the model to the current CUDA device when args.use_cpu_initialization is enabled, avoiding CPU/GPU tensor mismatch during Megatron training.

Experiment results

Before this change, LoRA training with use_cpu_initialization=True could fail with errors like:

RuntimeError: Expected all tensors to be on the same device, but got indices is on cuda:1, different from other tensors on cpu

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request modifies the condition under which the model is moved to the CUDA device, changing it from when CPU initialization is disabled to when it is enabled. The reviewer suggests moving the model to the GPU unconditionally instead, which would align with Megatron-LM's standard behavior and ensure that all CPU-initialized parameters are correctly transferred to the GPU.

Comment on lines 496 to 497
if args.use_cpu_initialization:
m.cuda(torch.cuda.current_device())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Instead of conditionally moving the model to the CUDA device only when use_cpu_initialization is enabled, it is safer and more robust to move the model to the current CUDA device unconditionally. This aligns with Megatron-LM's standard behavior and ensures that any parameters initialized on the CPU (such as newly added adapter weights or embeddings) are correctly transferred to the GPU regardless of the initialization setting.

        m.cuda(torch.cuda.current_device())

@Jintao-Huang
Copy link
Copy Markdown
Collaborator

if args.use_cpu_initialization:

Let's remove this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants