[https://nvbugs/6017720][fix] Fix moe backend mismatch on Blackwell in perf test (#13470)
dominicshanshan wants to merge 1 commit into NVIDIA:main
Conversation
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
📝 Walkthrough

Adds support for DeepSeek FP8 block-scale model handling by introducing a model-name allowlist and a safe SM-version detection helper. Updates model YAML configuration detection to match PyTorch model labels case-insensitively and to conditionally inject the DEEPGEMM MoE backend for SM versions 100 and higher.
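The override flow the walkthrough describes can be sketched roughly as follows. This is a hypothetical illustration: the allowlist contents, function names, and config keys are assumptions, not the actual implementation in `tests/integration/defs/perf/pytorch_model_config.py`.

```python
from typing import Optional

# Assumed allowlist entry, for illustration only.
DEEPSEEK_FP8_BLOCK_SCALE_MODELS = {
    "deepseek_v3_fp8_blockscale",
}


def _get_sm_version_safe() -> int:
    """Return the GPU SM version, or 0 when it cannot be determined."""
    try:
        from tensorrt_llm._utils import get_sm_version
        return get_sm_version()
    except (ImportError, RuntimeError):
        return 0


def apply_moe_backend_override(model_label: str, config: dict,
                               sm_version: Optional[int] = None) -> dict:
    """Inject the DEEPGEMM MoE backend for allowlisted models on SM >= 100."""
    sm = _get_sm_version_safe() if sm_version is None else sm_version
    # Model labels are matched case-insensitively, per the walkthrough.
    if model_label.lower() in DEEPSEEK_FP8_BLOCK_SCALE_MODELS and sm >= 100:
        config.setdefault("moe_config", {})["backend"] = "DEEPGEMM"
    return config
```

The `sm_version` parameter is only there to make the sketch easy to exercise without a GPU; the real code presumably always queries the hardware.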
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/perf/pytorch_model_config.py`:
- Around line 40-44: The helper _get_sm_version_safe() currently catches a bare
Exception which can hide unrelated bugs; change the except block to only catch
specific exceptions (e.g., except (ImportError, RuntimeError):) when calling
from tensorrt_llm._utils.get_sm_version() so import failures and CUDA runtime
errors are handled but other exceptions propagate; update the handler to return
0 for those specific exceptions (optionally bind the exception to a variable if
you want to log it) and remove the bare except Exception.
ℹ️ Review info
Configuration used: `.coderabbit.yaml`
Review profile: CHILL
Plan: Enterprise
Run ID: a0012d56-058d-4b28-816d-92a4f3786d60
📒 Files selected for processing (1)
tests/integration/defs/perf/pytorch_model_config.py
```python
try:
    from tensorrt_llm._utils import get_sm_version
    return get_sm_version()
except Exception:
    return 0
```
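A small illustration of why the broad handler is risky: a typo inside the try block (here a hypothetical misspelled name) is silently converted into the fallback value instead of raising, so the bug never surfaces. Narrowing the handler lets the `NameError` propagate.

```python
def lookup_broad() -> int:
    try:
        return get_sm_versoin()  # typo: NameError...
    except Exception:            # ...silently swallowed
        return 0


def lookup_narrow() -> int:
    try:
        return get_sm_versoin()  # same typo...
    except (ImportError, RuntimeError):
        return 0                 # ...but NameError is not caught here


print(lookup_broad())  # prints 0 -- the typo is hidden

try:
    lookup_narrow()
except NameError:
    print("bug surfaced")  # the typo is visible immediately
```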
🧩 Analysis chain
🏁 Script executed:

```shell
# Find the get_sm_version function in tensorrt_llm._utils
fd -t f "_utils.py" | head -20
```

Repository: NVIDIA/TensorRT-LLM
Length of output: 950
🏁 Script executed:

```shell
# Search for get_sm_version implementation
rg "def get_sm_version" -A 15
```

Repository: NVIDIA/TensorRT-LLM
Length of output: 3106
🏁 Script executed:

```shell
# Also check the context of the pytorch_model_config.py file
cat -n tests/integration/defs/perf/pytorch_model_config.py | sed -n '35,50p'
```

Repository: NVIDIA/TensorRT-LLM
Length of output: 680
Narrow the `_get_sm_version_safe()` exception handling to specific exception types.
Line 43 catches bare Exception, which can mask unrelated bugs and hide unexpected failures in the SM100+ backend override path. Narrow the exception handling to import and runtime errors.
Proposed fix

```diff
 def _get_sm_version_safe() -> int:
-    try:
-        from tensorrt_llm._utils import get_sm_version
-        return get_sm_version()
-    except Exception:
+    try:
+        from tensorrt_llm._utils import get_sm_version
+    except ImportError:
         return 0
+
+    try:
+        return get_sm_version()
+    except (RuntimeError, OSError):
+        return 0
```

Per coding guidelines: "Avoid broad exception handling — catch specific exceptions, not bare except:". Note that `torch.cuda.get_device_properties(0)` raises `RuntimeError` when CUDA is unavailable or the device is invalid.
🧰 Tools
🪛 Ruff (0.15.11): [warning] 43-43: Do not catch blind exception: `Exception` (BLE001)
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- [ ] PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- [ ] PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- [ ] Test cases are provided for new code paths (see test instructions).
- [ ] Any new dependencies have been scanned for license and vulnerabilities.
- [ ] CODEOWNERS updated if ownership changes.
- [ ] Documentation updated as needed.
- [ ] Update tava architecture diagram if there is a significant design change in the PR.
- [ ] The reviewers assigned automatically/manually are appropriate for the PR.
- [ ] Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.