Conversation
Pull request overview
This PR follows up on #4466 to make RoPE inverse-frequency initialization consistent across several model implementations/backends by creating the arange/exponent on the default device first and then moving the resulting inv_freq to the target device.
Changes:
- Update multiple model-specific rotary embedding initializations to compute `inv_freq` without specifying `device` in `torch.arange`, then move `inv_freq` to the requested device.
- Update dlinfer rotary embedding init to remove the hardcoded CUDA allocation for `inv_freq`.
- Adjust the default backend's Dynamic NTK `_ntk_inv_freq` to match the "no device specified in arange" pattern (but this currently introduces a device-mismatch bug).
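The "compute then move" pattern described above can be sketched as follows (a minimal illustration with a hypothetical helper name, not the actual lmdeploy code):

```python
import torch

def make_inv_freq(dim: int, base: float, device=None):
    # Build the exponent and inv_freq on the default (CPU) device first,
    # instead of hardcoding a device inside torch.arange.
    inv_freq = 1.0 / (base**(torch.arange(0, dim, 2, dtype=torch.int64).float() / dim))
    # Then move the finished tensor to the requested device in one transfer.
    if device is not None:
        inv_freq = inv_freq.to(device)
    return inv_freq

inv_freq = make_inv_freq(dim=64, base=10000.0)
print(inv_freq.shape)  # torch.Size([32])
```

This keeps initialization device-agnostic; the caller (or `forward`) decides where `inv_freq` ultimately lives.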
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| lmdeploy/pytorch/models/qwen3_5.py | Compute inv_freq then move it to device for default RoPE params. |
| lmdeploy/pytorch/models/qwen2_vl.py | Vision rotary embedding now moves inv_freq to device after computation. |
| lmdeploy/pytorch/models/qwen2_5_vl.py | Same “compute then move” update for vision rotary embedding. |
| lmdeploy/pytorch/models/glm4_1v.py | Same “compute then move” update for vision rotary embedding. |
| lmdeploy/pytorch/backends/dlinfer/rotary_embedding.py | Remove hardcoded device='cuda' for inv_freq init. |
| lmdeploy/pytorch/backends/default/rotary_embedding.py | Change Dynamic NTK _ntk_inv_freq arange creation (currently breaks CUDA). |
```diff
  base = self.base * ((self.scaling_factor * seq_len / self.max_position_embeddings) -
                      (self.scaling_factor - 1))**(self.dim / (self.dim - 2))
- inv_freq = 1.0 / (base**(torch.arange(0, self.dim, 2, dtype=torch.int64, device=device).float() / self.dim))
+ inv_freq = 1.0 / (base**(torch.arange(0, self.dim, 2, dtype=torch.int64).float() / self.dim))
```
In _ntk_inv_freq, base is derived from seq_len (so it will be on seq_len.device), but torch.arange(...) is now created on the default device (CPU). This will raise a device-mismatch error when seq_len is on CUDA (common during inference). Ensure the exponent tensor is on the same device as base/seq_len (e.g., move the arange to seq_len.device before the pow), or compute the full NTK inv_freq on CPU and then move it to x.device in forward before torch.where.
Suggested change:

```diff
- inv_freq = 1.0 / (base**(torch.arange(0, self.dim, 2, dtype=torch.int64).float() / self.dim))
+ inv_freq = 1.0 / (base**(torch.arange(0, self.dim, 2, dtype=torch.int64, device=seq_len.device).float() / self.dim))
```
This will be handled by

```python
if self.inv_freq.device != x.device:
    self.inv_freq = self.inv_freq.to(x.device)
```

so there is no need to add the device during the initial calculation.
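The lazy-move guard mentioned in this reply can be sketched as a minimal module (hypothetical class and method names; the real implementation also builds cos/sin caches from `inv_freq`):

```python
import torch

class RotaryEmbeddingSketch:
    def __init__(self, dim: int = 64, base: float = 10000.0):
        # inv_freq starts on the default (CPU) device; no device is passed to arange.
        self.inv_freq = 1.0 / (base**(torch.arange(0, dim, 2, dtype=torch.int64).float() / dim))

    def forward(self, x: torch.Tensor):
        # Move inv_freq to x's device once, on first use; later calls skip the copy.
        if self.inv_freq.device != x.device:
            self.inv_freq = self.inv_freq.to(x.device)
        # ... the real forward would compute cos/sin embeddings from self.inv_freq ...
        return self.inv_freq

rope = RotaryEmbeddingSketch()
out = rope.forward(torch.zeros(4, 64))
print(out.device)  # cpu (the input tensor here lives on CPU)
```

Because the guard runs in `forward`, the constructor never needs a device argument, which is why the review concludes the suggested `device=seq_len.device` is unnecessary for the initial calculation.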