[TRTLLM-13429][feat] Switch DeepSeek/NemotronH/Qwen3/Qwen3.5-MoE to sharding-IR canonical models by greg-kwasniewski1 · Pull Request #13478 · NVIDIA/TensorRT-LLM

greg-kwasniewski1 · 2026-04-26T16:14:06Z

Summary

Replace the legacy non-IR modeling_*.py for four model architectures with their sharding-IR variants. The IR-aware implementation (using apply_sharding_hints for TP/EP/BMM via canonical-op kwargs like tp_mode, layer_type, output_sizes, tp_min_local_shape, tp_scaled_dim, shardable) is now the canonical implementation. The AD_USE_IR_MODELS env-var gate is removed.

File swap

git rm modeling_X.py && git mv modeling_X_ir.py modeling_X.py for:

modeling_deepseek.py — DeepSeek V3, MLA + MoE
modeling_nemotron_h.py — NemotronH, hybrid Mamba + MoE
modeling_qwen3.py — Qwen3 dense (was previously only available as _ir; now registered in __init__.py)
modeling_qwen3_5_moe.py — Qwen3.5 MoE, GatedDeltaNet + Gated MHA + MoE

Continuation of #12419 (sharding-IR introduction) and #13272 (post-#12419 cleanup), per the staged plan to migrate model implementations to the hint-driven sharder one cohort at a time.

Skill update for future regenerations

.claude/skills/ad-model-onboard/SKILL.md:

Reverse the misleading noaux_tc "Key Gotcha" bullet — was telling agents to substitute fused trtllm calls with vanilla PyTorch; corrected to KEEP torch.ops.trtllm.{noaux_tc_op, dsv3_router_gemm_op} verbatim (no AD transform recovers vanilla → fused; vanilla replacement is ~17x more kernel launches and loses the FP8-friendly router GEMM).
New "Step 0 — IR delta contract (READ FIRST)" with explicit A1-A5 allowlist + F1-F7 forbidden list of changes an IR port may introduce vs the non-IR base.
New "Step 12 — Pre-finalization self-audit (MANDATORY)" — agents must run diff -u and classify every hunk before reporting done.
Phase 5 position_ids rule now carries an "IR-port exception" pointing at the contract.

Audit summary

Each of the four IR files was regenerated from its non-IR base by parallel generalPurpose subagents using the updated skill, then audited at the parent level. Every hunk classifies as A1-A5 (allowed). Cross-checks: same class count, same nn.Parameter/register_buffer/register_load_state_dict_pre_hook count, same def forward( count, both torch.ops.trtllm.* call sites in DeepSeekV3MoEGate and NemotronHTopkRouter preserved verbatim (the noaux_tc paths are the original fused kernels), and the position_ids contract preserved per architecture.

Test plan

Local smoke on small/fp8/deepseek/r1.yaml with the IR sharder + world_size=2 on 2x H100: exit=0, 51 sharding nodes, 10 prompts generated cleanly. (AD_USE_IR_MODELS=1 no longer needed since IR is canonical.)
Targeted CI run (Group 1 + Group 2 stages per ci-guidelines.md) — will trigger via /bot run --stage-list ... after PR opens.
Full pre-merge /bot run once the targeted run is green.

Refs: #12419 (sharding-IR introduction), #13271 (post-#12419 cleanup feature), #13272 (post-#12419 cleanup PR), #13429 (this feature's tracking issue).

Made with Cursor

Summary by CodeRabbit

New Features
- Enhanced tensor parallelism support for DeepSeek, Nemotron, and Qwen3.5 models with sharding-aware optimizations.
Refactor
- Consolidated model registrations and unified internal variant handling for cleaner module structure.
Documentation
- Updated model onboarding guidelines with IR porting requirements, operation allowlists, and mandatory audit workflows.

greg-kwasniewski1 · 2026-04-26T16:16:34Z

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1"

coderabbitai · 2026-04-26T16:22:00Z

📝 Walkthrough

Walkthrough

This change consolidates four model families from dual-file implementations to IR-fied single files. The non-IR implementations gain sharding-aware custom ops carrying explicit sharding metadata, the parallel _ir.py files are deleted, module registration is consolidated to remove conditional IR logic, and IR porting documentation guidelines are established.

Changes

Cohort / File(s)	Summary
IR Porting Documentation `.claude/skills/ad-model-onboard/SKILL.md`	Adds comprehensive IR delta allowlist/forbidden list, introduces mandatory pre-final audit workflow, and revises `noaux_tc` router instructions to preserve specific `torch.ops.trtllm.*` ops from base implementations.
Module Registration Consolidation `tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py`	Removes conditional `AD_USE_IR_MODELS` logic and parallel `*_ir` module entry registration. Directly registers IR-aware sharding variants as canonical mappings. Adds explicit `modeling_qwen3` entry to expose `Qwen3ForCausalLM`.
DeepSeek Consolidation `tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py`, `modeling_deepseek_ir.py`	Enriches non-IR file with explicit TP sharding intent via custom ops: replaces standard linear projections with `torch_linear_simple` (colwise/rowwise), adds `all_reduce` at MLP/MoE/MLA merge points, and passes `enable_sharding` and `layer_type` hints to `torch_moe` and `torch_mla`. Deletes outdated IR file (768 lines).
Nemotron-H Consolidation `tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_h.py`, `modeling_nemotron_h_ir.py`	Replaces eager PyTorch ops (RMSNorm-gated, linear projections, tensor splits/views) with AutoDeploy custom ops carrying `tp_mode`, `tp_scaled_dim`, `output_sizes`, and `layer_type` metadata. Adds `all_reduce` semantics for Mamba/MLP/MoE final projections. Switches to `torch.ops.auto_deploy.torch_rmsnorm_gated`. Deletes outdated IR file (822 lines).
Qwen3 Updates `tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3.py`	Renames rotary embedding cos/sin caches to `_ad_cos_cached`/`_ad_sin_cached` for AutoDeploy/lift_to_meta compatibility. Adds inline documentation around cos/sin slicing, BSND layout, and prefill-only inference. Minor comment/whitespace adjustments.
Qwen3.5 MoE Consolidation `tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe.py`	Reworks forward passes in GatedDeltaNet, Attention, MLP, and SparseMoeBlock to use `torch_linear_simple`, `split_with_sizes`, `view`, and `all_reduce` custom ops. Tags operations with `layer_type` hints (`"delta"`, `"mha"`, `"moe"`). Adds AutoDeploy custom ops import.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: switching four model families (DeepSeek, NemotronH, Qwen3, Qwen3.5-MoE) to use sharding-IR as the canonical implementation.
Description check	✅ Passed	The PR description comprehensively explains the change (file swaps for four models), includes test plan details, and references related issues `#12419`, `#13271`, `#13272`, and `#13429`.
Linked Issues check	✅ Passed	The PR directly addresses issue `#13429` by promoting four model families to use sharding-IR as canonical [deepseek, nemotron_h, qwen3, qwen3_5_moe], removing AD_USE_IR_MODELS gates, and updating skill documentation as required.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the four target models plus skill documentation updates; no unrelated modifications to other models, infrastructure, or unrelated components are present.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

.claude/skills/ad-model-onboard/SKILL.md (1)
556-559: ⚠️ Potential issue | 🟠 Major

Resolve conflicting guidance on torch.ops.trtllm.* usage.

Line 556 requires preserving specific torch.ops.trtllm.* router ops, but Line 558 says to never use trtllm_*. This contradiction can lead to incorrect ports. Please carve out an explicit exception in the “Key Gotchas” rule for the noaux_tc/dsv3 router case.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ad-model-onboard/SKILL.md around lines 556 - 559, The "Key
Gotchas" rule contradicts earlier guidance about preserving specific router ops;
add an explicit exception stating that for noaux_tc / DeepSeek-V3 style routers
you must keep torch.ops.trtllm.noaux_tc_op and
torch.ops.trtllm.dsv3_router_gemm_op exactly as in the non-IR base (router gate
is TP-REPLICATED, no sharding hints), while retaining the general prohibition on
trtllm_* in AutoDeploy (i.e., only allow these two trtllm ops when the source
model uses them verbatim and do not introduce vanilla replacements), and update
the SKILL.md section text to reflect this exception so both rules are
consistent.
tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py (1)
9-37: ⚠️ Potential issue | 🔴 Critical

Add modeling_llama3_ir to _MODEL_MODULES or restore the AD_USE_IR_MODELS gate.

The file modeling_llama3_ir.py exists but is not discoverable through the custom models __init__.py — it's neither listed in _MODEL_MODULES nor guarded by the AD_USE_IR_MODELS environment variable that tests still set. The module self-registers via AutoModelForCausalLMFactory, but removal of the environment-gated registration breaks the intended conditional loading mechanism referenced in the tests.

Either add "modeling_llama3_ir": ["Llama3ForCausalLM"] to _MODEL_MODULES (consistent with other converted IR variants like deepseek, nemotron_h, qwen3, and qwen3_5_moe), or restore the conditional gate if IR-only loading is required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py` around lines 9 -
37, The custom models registry is missing the modeling_llama3_ir entry so
modeling_llama3_ir (which defines Llama3ForCausalLM and self-registers via
AutoModelForCausalLMFactory) isn't discoverable; fix by either adding
"modeling_llama3_ir": ["Llama3ForCausalLM"] to the _MODEL_MODULES dict in
__init__.py (so the importlib loop loads it like
modeling_deepseek/modeling_qwen3 entries) or restore the AD_USE_IR_MODELS
environment-gate around the registration/import of modeling_llama3_ir
(reintroduce the AD_USE_IR_MODELS check used by tests) so the module is
conditionally registered as intended.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/ad-model-onboard/SKILL.md:
- Around line 503-509: The fenced code block that begins with the table header
"| Hunk lines (in IR file) | Summary of change | Category | Verdict |" is
missing a language tag which triggers MD040; update that fenced block by adding
a language identifier (e.g., "markdown") right after the opening backticks so
the block becomes ```markdown, ensuring the table renders and linters stop
flagging it; locate the block by searching for the exact table header text
within SKILL.md.

In `@tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py`:
- Around line 1-4: This file is missing the required NVIDIA copyright header at
the top; add the standard NVIDIA source-file header (with updated year if this
is a modification) as the very first lines of
tensorrt_llm._torch.auto_deploy.models.custom.__init__ before any imports, then
keep the existing imports and _logger definition unchanged so symbols like
importlib, logging and _logger remain intact.

In `@tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py`:
- Line 1: Update the copyright header year from 2025 to 2026 in the file
modeling_deepseek.py: locate the top-of-file copyright comment and change the
year range to include 2026 so the header reflects the file modification in 2026.

---

Outside diff comments:
In @.claude/skills/ad-model-onboard/SKILL.md:
- Around line 556-559: The "Key Gotchas" rule contradicts earlier guidance about
preserving specific router ops; add an explicit exception stating that for
noaux_tc / DeepSeek-V3 style routers you must keep torch.ops.trtllm.noaux_tc_op
and torch.ops.trtllm.dsv3_router_gemm_op exactly as in the non-IR base (router
gate is TP-REPLICATED, no sharding hints), while retaining the general
prohibition on trtllm_* in AutoDeploy (i.e., only allow these two trtllm ops
when the source model uses them verbatim and do not introduce vanilla
replacements), and update the SKILL.md section text to reflect this exception so
both rules are consistent.

In `@tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py`:
- Around line 9-37: The custom models registry is missing the modeling_llama3_ir
entry so modeling_llama3_ir (which defines Llama3ForCausalLM and self-registers
via AutoModelForCausalLMFactory) isn't discoverable; fix by either adding
"modeling_llama3_ir": ["Llama3ForCausalLM"] to the _MODEL_MODULES dict in
__init__.py (so the importlib loop loads it like
modeling_deepseek/modeling_qwen3 entries) or restore the AD_USE_IR_MODELS
environment-gate around the registration/import of modeling_llama3_ir
(reintroduce the AD_USE_IR_MODELS check used by tests) so the module is
conditionally registered as intended.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c69a2cab-991f-4250-b5a0-f23027751349

📥 Commits

Reviewing files that changed from the base of the PR and between dd907c0 and a5ced6c.

📒 Files selected for processing (9)

.claude/skills/ad-model-onboard/SKILL.md
tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek_ir.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_h.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_h_ir.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe_ir.py

💤 Files with no reviewable changes (2)

tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek_ir.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_h_ir.py

coderabbitai · 2026-04-26T16:22:03Z

+```
+| Hunk lines (in IR file) | Summary of change | Category | Verdict |
+|---|---|---|---|
+| 234-240                 | F.linear → torch_linear_simple, tp_mode="colwise" | A1 + A2 | OK |
+| 264-340                 | noaux_tc_op replaced with vanilla PyTorch         | F1      | REVERTED to base |
+| ...                     | ...                                               | ...     | ... |
+```


⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced code block.

Line 503 starts a fenced block without a language, which triggers markdownlint MD040 and weakens rendering/tooling support.

💡 Proposed fix

-``` +```markdown | Hunk lines (in IR file) | Summary of change | Category | Verdict | |---|---|---|---| | 234-240 | F.linear → torch_linear_simple, tp_mode="colwise" | A1 + A2 | OK | | 264-340 | noaux_tc_op replaced with vanilla PyTorch | F1 | REVERTED to base | | ... | ... | ... | ... |

</details> <details> <summary>🧰 Tools</summary> <details> <summary>🪛 markdownlint-cli2 (0.22.1)</summary> [warning] 503-503: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> <details> <summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ad-model-onboard/SKILL.md around lines 503 - 509, The fenced
code block that begins with the table header "| Hunk lines (in IR file) |
Summary of change | Category | Verdict |" is missing a language tag which
triggers MD040; update that fenced block by adding a language identifier (e.g.,
"markdown") right after the opening backticks so the block becomes ```markdown,
ensuring the table renders and linters stop flagging it; locate the block by
searching for the exact table header text within SKILL.md.

</details>   

coderabbitai · 2026-04-26T16:22:03Z

 import importlib
 import logging
-import os

 _logger = logging.getLogger(__name__)


⚠️ Potential issue | 🟡 Minor

Add the required NVIDIA copyright header.

This modified Python source still starts directly with imports, so it no longer satisfies the repo’s source-file header requirement.

♻️ Proposed fix

+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved. + import importlib import logging

As per coding guidelines: `**/*.{cpp,cc,cxx,h,hpp,py}`: Include NVIDIA copyright header on all new files; update year on modified files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import importlib

import logging

import os

_logger = logging.getLogger(__name__)

# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

import importlib

import logging

_logger = logging.getLogger(__name__)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py` around lines 1 - 4, This file is missing the required NVIDIA copyright header at the top; add the standard NVIDIA source-file header (with updated year if this is a modification) as the very first lines of tensorrt_llm._torch.auto_deploy.models.custom.__init__ before any imports, then keep the existing imports and _logger definition unchanged so symbols like importlib, logging and _logger remain intact.

coderabbitai · 2026-04-26T16:22:03Z

@@ -1,18 +1,52 @@
 # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.


⚠️ Potential issue | 🟡 Minor

Update the copyright year.

This file is modified in 2026, but the header still stops at 2025.

♻️ Proposed fix

-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2025-2026, NVIDIA CORPORATION. All rights reserved.

As per coding guidelines: `**/*.{cpp,cc,cxx,h,hpp,py}`: Include NVIDIA copyright header on all new files; update year on modified files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

# Copyright (c) 2025-2026, NVIDIA CORPORATION. All rights reserved.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py` at line 1, Update the copyright header year from 2025 to 2026 in the file modeling_deepseek.py: locate the top-of-file copyright comment and change the year range to include 2026 so the header reflects the file modification in 2026.

tensorrt-cicd · 2026-04-26T16:24:11Z

PR_Github #45593 [ run ] triggered by Bot. Commit: a5ced6c Link to invocation

greg-kwasniewski1 · 2026-04-26T16:32:38Z

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1"

tensorrt-cicd · 2026-04-26T16:39:52Z

PR_Github #45594 [ run ] triggered by Bot. Commit: 46f9725 Link to invocation

tensorrt-cicd · 2026-04-26T17:17:23Z

PR_Github #45594 [ run ] completed with state FAILURE. Commit: 46f9725
/LLM/main/L0_MergeRequest_PR pipeline #35810 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

greg-kwasniewski1 · 2026-04-27T11:39:03Z

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1"

tensorrt-cicd · 2026-04-27T11:45:01Z

PR_Github #45730 [ run ] triggered by Bot. Commit: 46f9725 Link to invocation

tensorrt-cicd · 2026-04-27T12:24:31Z

PR_Github #45730 [ run ] completed with state FAILURE. Commit: 46f9725
/LLM/main/L0_MergeRequest_PR pipeline #35927 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

greg-kwasniewski1 · 2026-04-27T17:53:27Z

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1" --disable-fail-fast

tensorrt-cicd · 2026-04-27T17:59:58Z

PR_Github #45765 [ run ] triggered by Bot. Commit: 46f9725 Link to invocation

… to sharding-IR canonical Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Made-with: Cursor

…py staging Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Made-with: Cursor

greg-kwasniewski1 · 2026-04-27T18:42:24Z

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1" --disable-fail-fast

tensorrt-cicd · 2026-04-27T18:49:21Z

PR_Github #45771 [ run ] triggered by Bot. Commit: 72cae82 Link to invocation

greg-kwasniewski1 requested a review from a team as a code owner April 26, 2026 16:14

greg-kwasniewski1 requested a review from MrGeva April 26, 2026 16:14

github-actions Bot assigned greg-kwasniewski1 Apr 26, 2026

coderabbitai Bot reviewed Apr 26, 2026

View reviewed changes

greg-kwasniewski1 added 2 commits April 27, 2026 11:40

[None][feat][autodeploy] Switch deepseek/nemotron_h/qwen3/qwen3_5_moe…

1777ad5

… to sharding-IR canonical Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Made-with: Cursor

[None][fix][autodeploy] Restore AD_USE_IR_MODELS hook for future _ir.…

72cae82

…py staging Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Made-with: Cursor

greg-kwasniewski1 force-pushed the gk/switch_to_ir_models branch from 46f9725 to 72cae82 Compare April 27, 2026 18:42

		@@ -1,18 +1,52 @@
		# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

	# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
	# Copyright (c) 2025-2026, NVIDIA CORPORATION. All rights reserved.

Conversation

greg-kwasniewski1 commented Apr 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

File swap

Skill update for future regenerations

Audit summary

Test plan

Summary by CodeRabbit

Uh oh!

greg-kwasniewski1 commented Apr 26, 2026

Uh oh!

coderabbitai Bot commented Apr 26, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

tensorrt-cicd commented Apr 26, 2026

Uh oh!

greg-kwasniewski1 commented Apr 26, 2026

Uh oh!

tensorrt-cicd commented Apr 26, 2026

Uh oh!

tensorrt-cicd commented Apr 26, 2026

Uh oh!

greg-kwasniewski1 commented Apr 27, 2026

Uh oh!

tensorrt-cicd commented Apr 27, 2026

Uh oh!

tensorrt-cicd commented Apr 27, 2026

Uh oh!

greg-kwasniewski1 commented Apr 27, 2026

Uh oh!

tensorrt-cicd commented Apr 27, 2026

Uh oh!

greg-kwasniewski1 commented Apr 27, 2026

Uh oh!

tensorrt-cicd commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greg-kwasniewski1 commented Apr 26, 2026 •

edited by coderabbitai Bot

Loading