[https://nvbugs/6037654][fix] Set DeepEP low-latency token limit for qwen3 CI to prevent OOM#13484
byshiue wants to merge 1 commit into NVIDIA:main from
Conversation
…ghput_latency] to prevent OOM

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~5 minutes
Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_pytorch.py`:
- Around lines 4532-4541: The environment override TRTLLM_DEEP_EP_TOKEN_LIMIT=256 is currently applied unconditionally around the LLM context, so it affects both the latency and throughput_latency cases. Restrict it to the throughput_latency case: move the mock.patch.dict(...) so it wraps only the throughput_latency branch, or apply it conditionally when the test mode equals "throughput_latency", leaving the latency branch unchanged. To find the right instantiation, locate the mock.patch.dict block that currently wraps the LLM(...) context and apply it only when creating the LLM for the throughput_latency case (the LLM(...) call and local variables such as attention_dp and kv_cache_config mark the correct spot).
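The restructuring suggested above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual test code: `run_case` and `create_llm` are stand-ins for the real test branches and the `LLM(...)` instantiation, and the `mode` parameter stands in for the test's latency/throughput_latency variable.

```python
import os
import unittest.mock as mock


def run_case(mode):
    """Hypothetical stand-in for the test: only the throughput_latency
    branch should see the TRTLLM_DEEP_EP_TOKEN_LIMIT override."""

    def create_llm():
        # Stand-in for the real LLM(...) instantiation; here we just
        # report what the environment override resolves to.
        return os.environ.get("TRTLLM_DEEP_EP_TOKEN_LIMIT", "unset")

    if mode == "throughput_latency":
        # Apply the env override only around this branch, so it is
        # restored as soon as the context manager exits.
        with mock.patch.dict(os.environ,
                             {"TRTLLM_DEEP_EP_TOKEN_LIMIT": "256"}):
            return create_llm()

    # Latency branch runs without the override.
    return create_llm()
```

Because `mock.patch.dict` restores the environment on exit, the latency case (and any later tests in the same process) never observes the token limit.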
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e6a3711a-5079-4389-9008-e8d4de25be07
📒 Files selected for processing (2)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
/bot skip --comment "The fixed test is not covered by l0 pre-merge CI"

PR_Github #45665 [ skip ] triggered by Bot. Commit:

PR_Github #45665 [ skip ] completed with state
…ghput_latency] to prevent OOM
Summary by CodeRabbit
- Re-enabled the test_fp8[throughput_latency] test for the Qwen3-235B-A22B model.
- Updated the test_fp8 accuracy test configuration with adjusted runtime parameters.

Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.