[None][test] refresh test constraints #13482
Conversation
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
/bot run
📝 Walkthrough

This change adds hardware architecture gating to accuracy tests for Ada/Blackwell and Hopper GPUs, and introduces a new integration performance benchmark test that measures Llama 3.1 8B throughput and latency scaling under increasing concurrency in MIG mode.
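For context, this kind of hardware gating is typically done with pytest skip markers keyed off the detected compute capability. The sketch below is only an illustration of the idea, not the repository's actual helpers: the marker name and test function are hypothetical, and it queries the device directly via `torch.cuda.get_device_capability()`.

```python
import pytest
import torch


def _compute_capability():
    # (major, minor), e.g. (8, 9) for Ada, (9, 0) for Hopper, (10, 0) for Blackwell.
    if not torch.cuda.is_available():
        return (0, 0)
    return torch.cuda.get_device_capability(0)


# Illustrative marker; the real tests use the repo's own skip helpers.
skip_pre_hopper = pytest.mark.skipif(
    _compute_capability() < (9, 0),
    reason="accuracy reference only validated on Hopper or newer")


@skip_pre_hopper
def test_llama_accuracy_fp8():
    ...  # placeholder for the accuracy harness invocation
```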
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Performance Test
    participant BR as BenchRunner
    participant Model as Llama 3.1 8B Model
    participant MIG as MIG Context
    participant Parser as Metrics Parser
    participant Validator as Scaling Validator
    Test->>BR: Iterate concurrency levels
    loop For each concurrency
        BR->>MIG: Launch benchmark (streaming=False, pytorch_backend=True)
        MIG->>Model: Execute inference requests
        Model-->>MIG: Return results
        MIG-->>BR: Return benchmark metrics (dict)
        BR-->>Test: Metrics dict
        Test->>Parser: Extract throughput & latency
        Parser-->>Test: Parsed values
        Test->>Validator: Check throughput > 1.3x previous
        alt Scaling valid
            Validator-->>Test: Continue to next concurrency
        else Scaling failed
            Validator-->>Test: Fail test
        end
    end
    Test->>Test: Print summary table (concurrency vs. metrics)
```
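Read as code, the loop in the diagram amounts to roughly the following. This is a sketch of the control flow only: `run_benchmark` is a hypothetical helper assumed to return a metrics dict with `throughput` and `latency` keys, whereas the real test drives the repo's BenchRunner and MIG fixtures.

```python
def scan_concurrency(run_benchmark, concurrency_list=(1, 2, 4, 8)):
    """Sketch of the scaling check: throughput must keep rising with concurrency."""
    results = {}
    prev_throughput = None
    for concurrency in concurrency_list:
        metrics = run_benchmark(concurrency)   # parsed benchmark metrics (dict)
        results[concurrency] = metrics
        throughput = float(metrics.get('throughput', 0))
        if prev_throughput is not None:
            # Mirrors the stepwise 1.3x requirement discussed in the review below.
            assert throughput > prev_throughput * 1.3, (
                f"Throughput is not increasing for concurrency {concurrency}")
        prev_throughput = throughput

    print("\n=== Benchmark Results Comparison ===")
    for concurrency, metrics in results.items():
        print(f"{concurrency:>4}  throughput={metrics.get('throughput')}  "
              f"latency={metrics.get('latency')}")
    return results
```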
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/defs/test_e2e.py (1)
2629-2683: ⚠️ Potential issue | 🟠 Major

Register this new integration perf test in the test-db and QA perf lists.
This adds a new benchmark-style integration test, but I don’t see corresponding test-list updates in this PR scope. Without list registration, scheduled/pre-merge coverage can miss it.
Please add/update:
- tests/integration/test_lists/test-db/l0_perf.yml (or the appropriate l0_*.yml)
- tests/integration/test_lists/qa/llm_perf_*.yml (or the multinode perf list, if applicable)

As per coding guidelines, performance/integration test changes must be reflected in both the CI test-db and the QA list files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/test_e2e.py` around lines 2629 - 2683, The new benchmark integration test test_trtllm_bench_mig_launch in tests/integration/defs/test_e2e.py is not registered in the CI test lists; add entries for this test to the test-db and QA perf lists by updating tests/integration/test_lists/test-db/l0_perf.yml (or the appropriate l0_*.yml) and tests/integration/test_lists/qa/llm_perf_*.yml (or the multinode perf list) to include the test path and any required tags/labels (e.g., l0, perf, integration, multinode) so the CI and QA runners will execute it; ensure the YAML entry references the test module and function name (tests.integration.defs.test_e2e::test_trtllm_bench_mig_launch) and matches existing formatting and gating rules in those files.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 56b52213-9959-4a73-a2f1-9e5d80d70c3a
📒 Files selected for processing (3)
- tests/integration/defs/accuracy/test_llm_api_autodeploy.py
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tests/integration/defs/test_e2e.py
```python
output = runner()
results[concurrency] = output

print(f"\n=== Benchmark Results Comparison ===")
```
Remove the placeholder-less f-string at Line 2656.
This triggers F541 and fails lint.
Proposed fix
```diff
- print(f"\n=== Benchmark Results Comparison ===")
+ print("\n=== Benchmark Results Comparison ===")
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
print("\n=== Benchmark Results Comparison ===")
```
🧰 Tools
🪛 Flake8 (7.3.0)
[error] 2656-2656: f-string is missing placeholders
(F541)
🪛 Ruff (0.15.11)
[error] 2656-2656: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/defs/test_e2e.py` at line 2656, Replace the
placeholder-less f-string print(f"\n=== Benchmark Results Comparison ===") with
a normal string literal or add a formatted placeholder; e.g., change it to
print("\n=== Benchmark Results Comparison ===") so the f-prefix is removed and
lint F541 is resolved; locate the exact call print(f"\n=== Benchmark Results
Comparison ===") in the test file and update it accordingly.
```python
if idx > 0:
    prev_throughput = float(results[concurrency_list[idx - 1]].get(
        'throughput', 0))
    assert throughput > prev_throughput * 1.3, f"Throughput is not increasing for concurrency {concurrency_list[idx]}"
```
The stepwise 1.3x throughput assertion is too brittle for CI perf variance.
Requiring every jump to exceed 30% can fail on normal noise/saturation, especially in shared MIG environments.
Stabilization option
```diff
 if idx > 0:
     prev_throughput = float(results[concurrency_list[idx - 1]].get(
         'throughput', 0))
-    assert throughput > prev_throughput * 1.3, f"Throughput is not increasing for concurrency {concurrency_list[idx]}"
+    # Allow small variance between adjacent points.
+    assert throughput >= prev_throughput * 0.9, (
+        f"Throughput regressed for concurrency {concurrency_list[idx]}"
+    )
+
+baseline = float(results[concurrency_list[0]].get('throughput', 0))
+final_tp = float(results[concurrency_list[-1]].get('throughput', 0))
+assert final_tp >= baseline * 1.3, "End-to-end scaling did not improve enough"
```

As per coding guidelines, "Flag tests that assert overly brittle behavior … unless the product contract requires it."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/defs/test_e2e.py` around lines 2679 - 2682, The test's
strict stepwise assertion (throughput > prev_throughput * 1.3) is brittle;
change it to a tolerant check using a configurable relative threshold or
pytest.approx to allow normal CI variance: compute prev_throughput =
float(results[concurrency_list[idx - 1]].get('throughput', 0)) and then assert
throughput >= prev_throughput * (1.0 + RELATIVE_THRESHOLD) (e.g.,
RELATIVE_THRESHOLD = 0.05) or use assert throughput >
pytest.approx(prev_throughput, rel=0.05); make RELATIVE_THRESHOLD configurable
at top of the test file and handle the zero/near-zero prev_throughput case by
skipping the comparison or requiring an absolute minimum delta.
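One way to implement the tolerant variant described in this prompt is sketched below. It is illustrative only: `RELATIVE_THRESHOLD`, `MIN_VALID_THROUGHPUT`, and the `check_scaling` helper are hypothetical names, not code from the PR, and the threshold values are placeholders to be tuned against observed CI noise.

```python
RELATIVE_THRESHOLD = 0.05      # require at least +5% over the previous point
MIN_VALID_THROUGHPUT = 1e-6    # skip the comparison for zero/near-zero baselines


def check_scaling(results, concurrency_list):
    """Tolerant stepwise check: each point must beat the previous by the threshold."""
    for idx, concurrency in enumerate(concurrency_list):
        throughput = float(results[concurrency].get('throughput', 0))
        if idx == 0:
            continue
        prev = float(results[concurrency_list[idx - 1]].get('throughput', 0))
        if prev <= MIN_VALID_THROUGHPUT:
            continue  # nothing meaningful to compare against
        assert throughput >= prev * (1.0 + RELATIVE_THRESHOLD), (
            f"Throughput did not improve enough at concurrency {concurrency}")
```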
PR_Github #45614 [ run ] triggered by Bot. Commit:

PR_Github #45614 [ run ] completed with state
Summary by CodeRabbit
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.