
fix(pt): select device before loading torch ops #5612

Draft
njzjz-bot wants to merge 1 commit into deepmodeling:master from njzjz-bot:fix/pt-preselect-device-4171

Conversation

njzjz-bot commented Jun 30, 2026

Contributor

Summary

  • Add a shared PyTorch backend helper to choose the rank-local CUDA/HIP device before PyTorch CUDA queries or custom-op loading can create a default context.
  • Move the PT C++ model init paths (DeepPotPT, DeepSpinPT, DeepTensorPT, DeepPotPTExpt, DeepSpinPTExpt) to call this helper before deepmd::load_op_library().
  • This should avoid each MPI rank leaving a small unused context on GPU 0 while preserving CPU fallback behavior.
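
The ordering described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `DeviceRuntime` and `preselect_device_sketch` are hypothetical names, the callbacks stand in for the real runtime calls so the sketch runs without a GPU, and the round-robin mapping `gpu_rank % gpu_num` is an assumption about how the rank-local device is chosen. The sketch also derives `gpu_enabled` from the device count, which is a simplification.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the GPU runtime calls, so the sketch
// compiles and runs on a machine without CUDA/HIP.
struct DeviceRuntime {
  std::function<int()> device_count;    // number of visible GPUs
  std::function<void(int)> set_device;  // bind the calling process to a GPU
};

// Pick the rank-local GPU and bind to it first, so that later GPU work
// (queries, op-library loading) does not create a context on device 0.
inline void preselect_device_sketch(int gpu_rank,
                                    int& gpu_id,
                                    bool& gpu_enabled,
                                    const DeviceRuntime& rt) {
  assert(gpu_rank >= 0);
  const int gpu_num = rt.device_count();
  gpu_enabled = gpu_num > 0;  // simplification of the availability check
  if (gpu_enabled) {
    gpu_id = gpu_rank % gpu_num;  // assumed round-robin rank-to-GPU mapping
    rt.set_device(gpu_id);
  } else {
    gpu_id = 0;  // CPU fallback: no device is bound
  }
}
```

A caller would invoke this before any other GPU-touching step; with four visible devices, rank 5 would bind to device 1 under the assumed mapping.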

Verification

  • git diff --check HEAD~1..HEAD
  • Static check: verified preselect_torch_device precedes deepmd::load_op_library() in all touched PT init paths.
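
The PR does not say how the static check was carried out. One way to approximate it is a plain textual ordering test over each source file, sketched below. `call_precedes` is a hypothetical helper; it only compares the first occurrence of each name and does not account for comments, strings, or multiple init paths in one file.

```cpp
#include <string>

// Returns true when both names occur in `source` and the first occurrence
// of `first` comes before the first occurrence of `second`.
inline bool call_precedes(const std::string& source,
                          const std::string& first,
                          const std::string& second) {
  const auto a = source.find(first);
  const auto b = source.find(second);
  return a != std::string::npos && b != std::string::npos && a < b;
}
```

Applied to the text of an init function, `call_precedes(text, "preselect_torch_device", "load_op_library")` would be expected to return true for each touched file.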

Fixes #4171

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)

Summary by CodeRabbit

  • Bug Fixes
    • Improved device selection so models start on the correct GPU more reliably.
    • Reduced cases where GPU-enabled environments could accidentally fall back to the wrong device.
    • CPU fallback behavior remains intact when no supported GPU is available.

Preselect the CUDA/HIP device from the rank-local GPU before PyTorch CUDA queries or torch custom-op loading can create a default-device context. This avoids each MPI rank leaving an unused context on GPU 0 in multi-GPU LAMMPS runs.

Fixes deepmodeling#4171

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)

Signed-off-by: njzjz-bot (driven by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5))[bot] <48687836+njzjz-bot@users.noreply.github.com>
@github-actions github-actions Bot added the C++ label Jun 30, 2026
coderabbitai Bot commented Jun 30, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a4338421-7b9b-450b-867a-37dfe436ba0b

📥 Commits

Reviewing files that changed from the base of the PR and between 73de44b and 050f97e.

📒 Files selected for processing (6)
  • source/api_cc/include/commonPT.h
  • source/api_cc/src/DeepPotPT.cc
  • source/api_cc/src/DeepPotPTExpt.cc
  • source/api_cc/src/DeepSpinPT.cc
  • source/api_cc/src/DeepSpinPTExpt.cc
  • source/api_cc/src/DeepTensorPT.cc

📝 Walkthrough


A new inline helper deepmd::preselect_torch_device is added to commonPT.h. It centralizes rank-local GPU selection (via CUDA/ROCm or Torch APIs) and sets gpu_id/gpu_enabled. All five PyTorch model init functions (DeepPotPT, DeepPotPTExpt, DeepSpinPT, DeepSpinPTExpt, DeepTensorPT) replace their duplicated inline GPU-selection logic with a single call to this helper.

Changes

GPU Preselection Refactor

| Layer / File(s) | Summary |
|---|---|
| preselect_torch_device helper: source/api_cc/include/commonPT.h | Adds #include "device.h" and defines deepmd::preselect_torch_device, selecting a rank-local GPU via DPGetDeviceCount/DPSetDevice (under GOOGLE_CUDA/TENSORFLOW_USE_ROCM) or torch::cuda::device_count(), and assigning gpu_enabled from torch::cuda::is_available(). |
| Adopt helper in all init paths: source/api_cc/src/DeepPotPT.cc, source/api_cc/src/DeepPotPTExpt.cc, source/api_cc/src/DeepSpinPT.cc, source/api_cc/src/DeepSpinPTExpt.cc, source/api_cc/src/DeepTensorPT.cc | Each ::init replaces its inline torch::cuda::device_count() / torch::cuda::is_available() / DPSetDevice block with preselect_torch_device(gpu_rank, gpu_id, gpu_enabled). DeepTensorPT.cc additionally adds the commonPT.h include. Subsequent CUDA-vs-CPU device construction and logging remain unchanged. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: preselecting the device before loading Torch ops. |
| Linked Issues check | ✅ Passed | The changes match issue #4171 by selecting the rank-local GPU before op-library loading in all touched PT init paths. |
| Out of Scope Changes check | ✅ Passed | No obvious unrelated changes are present beyond the device-preselection refactor needed for the linked bug fix. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

codecov Bot commented Jun 30, 2026


Codecov Report

❌ Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 81.98%. Comparing base (73de44b) to head (050f97e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| source/api_cc/include/commonPT.h | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5612      +/-   ##
==========================================
- Coverage   81.98%   81.98%   -0.01%     
==========================================
  Files         959      959              
  Lines      105430   105423       -7     
  Branches     4071     4069       -2     
==========================================
- Hits        86442    86426      -16     
- Misses      17518    17528      +10     
+ Partials     1470     1469       -1     


@njzjz njzjz marked this pull request as draft June 30, 2026 05:41
njzjz commented Jun 30, 2026

Member

Need a real test on a machine with multiple GPUs.

Development

Successfully merging this pull request may close these issues.

[BUG] VRAM is wasted when running Lammps with multiple GPUs

2 participants