[https://nvbugs/6105768][fix] Runtime GPU detection inside the test function: when total_memory < 80 GiB #13471
Conversation
The test_disaggregated_cancel_large_context_requests test fails with OOM on L40S (44.4 GiB) because DeepSeek-V3-Lite bf16 requires ~37 GiB per disaggregated worker, and two workers sharing a single GPU need ~74 GiB. Add runtime GPU memory detection to fall back to TinyLlama with a smaller config on single-GPU systems with <80 GiB memory. This preserves the test's cancellation stress-test coverage while fitting within L40S memory constraints. On H100 or multi-GPU systems, the original DeepSeek-V3-Lite bf16 model is still used. Also adds a TinyLlama-compatible disagg config with conservative memory fractions to prevent KV cache allocation races on shared GPUs, and removes the test waiver from waives.txt.

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
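The fallback described above can be sketched as a small pure function. The helper name is illustrative, not the actual identifier in test_disaggregated.py; in the real test the inputs would come from `torch.cuda.get_device_properties(0).total_memory` and `torch.cuda.device_count()`:

```python
GIB = 1024**3  # one GiB in bytes


def pick_model_and_config(total_memory_bytes: int, device_count: int):
    """Choose model and disagg config based on available GPU memory.

    Hypothetical helper mirroring the PR's logic: on a lone GPU with
    less than 80 GiB, fall back to TinyLlama and the small config;
    otherwise keep the original DeepSeek-V3-Lite bf16 path.
    """
    if total_memory_bytes < 80 * GIB and device_count < 2:
        return ("TinyLlama-1.1B-Chat-v1.0",
                "disagg_config_cancel_stress_test_small.yaml")
    return ("DeepSeek-V3-Lite-bf16",
            "disagg_config_cancel_stress_test.yaml")


# An L40S (44.4 GiB, single GPU) takes the small-model path;
# an H100 (80 GiB) or any multi-GPU system keeps the original path.
```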
📝 Walkthrough
Introduces a new disaggregated cancellation stress test configuration for the TinyLlama model. Updates the test runner with runtime GPU memory detection that falls back to the small model and config on constrained single-GPU systems.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (1)
tests/integration/defs/disaggregated/test_disaggregated.py (1)
1-1: Consider updating copyright year. The copyright header shows `2022-2024`, but since this file is being modified in 2026, it should be updated to `2022-2026` per coding guidelines ("update year on modified files").

📝 Proposed fix

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/disaggregated/test_disaggregated.py` at line 1, Update the copyright header year range in test_disaggregated.py from "2022-2024" to "2022-2026" so the file reflects the current modification year; locate the SPDX/header comment at the top of the file and change the year substring accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5baeccdf-3fcb-478d-926d-8942b9c81d28
📒 Files selected for processing (3)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cancel_stress_test_small.yaml
- tests/integration/defs/disaggregated/test_disaggregated.py
- tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
Summary
The OOM occurs in `model._apply(init_meta_tensor)`. The test was designed for H100 (80 GiB) with workers on separate GPUs but had no memory guard. When `total_memory < 80 GiB` and `device_count < 2`, fall back to TinyLlama-1.1B-Chat-v1.0 (~2 GiB) with a dedicated small-model config (`disagg_config_cancel_stress_test_small.yaml`) that uses conservative `free_gpu_memory_fraction` values (0.2/0.3 vs 0.3/0.85) to prevent KV cache allocation races on shared GPUs. The parametrize ID `[DeepSeek-V3-Lite-bf16]` is preserved since it comes from the fixture parameter, not the model actually loaded. On H100 or multi-GPU systems, the original DeepSeek-V3-Lite bf16 path is unchanged.

Test plan
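The conservative memory fractions mentioned in the summary might look like this in the small-model config. Only the `free_gpu_memory_fraction` values (0.2/0.3) come from this PR's description; the surrounding keys are an assumed sketch of a TensorRT-LLM disaggregated serving YAML, not the actual file contents:

```yaml
# Hypothetical sketch of disagg_config_cancel_stress_test_small.yaml.
# Lower fractions leave headroom when both workers share one GPU,
# avoiding KV cache allocation races.
context_servers:
  num_instances: 1
  kv_cache_config:
    free_gpu_memory_fraction: 0.2   # vs 0.3 in the large-model config
generation_servers:
  num_instances: 1
  kv_cache_config:
    free_gpu_memory_fraction: 0.3   # vs 0.85 in the large-model config
```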
Links
Summary by CodeRabbit

Tests
- Added a TinyLlama-based disaggregated cancellation stress test config and runtime GPU memory detection that falls back to it on low-memory single-GPU systems.

Chores
- Removed the test waiver from waives.txt.