
Integrate Automated QDQ placement tool - part 4.4#961

Open
willg-nv wants to merge 5 commits into NVIDIA:main from willg-nv:dev-willg-integrate-auto-qdq-placement-part4.4

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Mar 3, 2026

What does this PR do?

Many minor changes:

  1. Add preset mode to AutoQDQ.
  2. Add pattern cache tests.
  3. Increase batch size for stable QDQ insertion.
  4. Update LICENSE year 2024 -> 2026.
  5. Add cuda-python to pyproject.toml for the [onnx] extra.

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other source, did you follow IP policy in CONTRIBUTING.md?: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added mode presets (quick, default, extensive) with a new --mode option for autotuning.
    • Introduced AutoQDQ: automated Q/DQ placement tool for ONNX quantization with pattern caching and checkpoint/resume.
  • Documentation

    • Updated CLI help and examples to show mode usage and override semantics.
  • Tests

    • Added comprehensive tests for pattern cache and mode-presets/explicit-override behavior; re-enabled a GPU autotuning workflow test; minor test updates.
  • Chores

    • Added "cuda-python" to optional dependencies.

@willg-nv willg-nv requested a review from a team as a code owner March 3, 2026 03:00
@willg-nv willg-nv requested a review from gcunhase March 3, 2026 03:00
@copy-pr-bot

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds CLI mode presets (quick/default/extensive) with explicit-flag tracking and application in runtime; expands unit tests for presets and PatternCache, adjusts test model tensor shapes, enables a GPU autotune test, and adds cuda-python to the onnx optional dependencies.

Changes

Cohort / File(s) Summary
Autotuning CLI & Presets
modelopt/onnx/quantization/autotune/__main__.py
Adds MODE_PRESETS, _StoreWithExplicitFlag, apply_mode_presets(args), and integrates preset application in run_autotune; adds --mode CLI option and updates help/semantics for --schemes_per_region, --warmup_runs, and --timing_runs.
Autotune parser helper (tests)
modelopt/onnx/quantization/autotune/__main__.py
Exposes _get_autotune_parser() to retrieve the CLI parser for tests.
Autotune CLI Tests
tests/unit/onnx/quantization/autotune/test_autotune_config.py
Adds TestModePresets verifying preset application, explicit-flag precedence, and short-form flag handling; imports MODE_PRESETS, _get_autotune_parser, and apply_mode_presets.
PatternCache Tests
tests/unit/onnx/quantization/autotune/test_pattern_cache.py
New TestPatternCache suite: creation/accessors, insertion and retrieval of pattern schemes, multiple-pattern handling, dict/YAML serialization round-trips, cache merge/update semantics, and best-scheme selection by latency.
Test Model Fixture
tests/_test_utils/onnx/quantization/autotune/models.py
Updated test ONNX model tensor shapes (input/output and conv weight) to larger batch/channel dimensions and adjusted helper calls accordingly.
GPU Test Activation
tests/gpu/onnx/quantization/autotune/test_workflow.py
Removed pytest.mark.skip from test_export_quantized_model, enabling the GPU workflow test to run.
Minor Header Update
tests/unit/onnx/quantization/autotune/test_region.py
SPDX copyright year updated (2024 → 2026).
Packaging Extras
pyproject.toml
Added cuda-python to the onnx optional dependencies.
Changelog
CHANGELOG.rst
Documents the new AutoQDQ tool and AutoQDQ/AutoQDQ-related features added.
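The PatternCache behaviors the new test suite covers (dict round-trip, merge/update semantics, best-scheme selection by latency) can be sketched with a generic stand-in. All class, field, and method names below are hypothetical illustrations, not the actual modelopt API:

```python
from dataclasses import dataclass, field

# Generic stand-in for the cache behaviors exercised by test_pattern_cache.py:
# serialization round-trip, merging two caches, and picking the lowest-latency
# scheme per pattern. Names here are assumptions, not the real implementation.


@dataclass
class PatternSchemes:
    pattern_signature: str
    schemes: dict[str, float] = field(default_factory=dict)  # scheme -> latency (ms)

    def best_scheme(self) -> str:
        # Lowest measured latency wins.
        return min(self.schemes, key=self.schemes.get)


@dataclass
class PatternCache:
    pattern_schemes: list[PatternSchemes] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {ps.pattern_signature: dict(ps.schemes) for ps in self.pattern_schemes}

    @classmethod
    def from_dict(cls, data: dict) -> "PatternCache":
        return cls([PatternSchemes(sig, dict(s)) for sig, s in data.items()])

    def merge(self, other: "PatternCache") -> None:
        # Update existing patterns in place; append previously unseen ones.
        mine = {ps.pattern_signature: ps for ps in self.pattern_schemes}
        for ps in other.pattern_schemes:
            if ps.pattern_signature in mine:
                mine[ps.pattern_signature].schemes.update(ps.schemes)
            else:
                self.pattern_schemes.append(ps)
```

A round-trip through `to_dict`/`from_dict` preserves the best-scheme choice, which is the invariant the serialization tests pin down.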

Sequence Diagram(s)

mermaid
sequenceDiagram
actor User
participant CLI as "CLI Parser"
participant Preset as "apply_mode_presets"
participant Runner as "Autotune Runner"
User->>CLI: invoke python -m modelopt.onnx.quantization.autotune --mode=<preset> [flags]
CLI->>Preset: pass parsed args (with explicit-flag markers)
Preset->>Preset: resolve mode defaults (quick/default/extensive)
Preset->>CLI: override unset args (schemes_per_region, warmup_runs, timing_runs)
CLI->>Runner: run_autotune(final args)
Runner->>Runner: execute tuning using final args
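The preset-resolution flow in the diagram can be sketched with a minimal argparse-based stand-in. The class and function names mirror the walkthrough (`_StoreWithExplicitFlag`, `apply_mode_presets`, `MODE_PRESETS`) and the preset values come from the review discussion, but the code itself is illustrative, not the actual modelopt source:

```python
import argparse

# Preset values as quoted in the review thread; treat as assumptions.
MODE_PRESETS = {
    "quick": {"schemes_per_region": 30, "warmup_runs": 10, "timing_runs": 50},
    "default": {"schemes_per_region": 50, "warmup_runs": 50, "timing_runs": 100},
    "extensive": {"schemes_per_region": 200, "warmup_runs": 50, "timing_runs": 200},
}


class StoreWithExplicitFlag(argparse.Action):
    """Store the value and record that the user passed the flag explicitly."""

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values)
        if not hasattr(namespace, "explicit_flags"):
            namespace.explicit_flags = set()
        namespace.explicit_flags.add(self.dest)


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=MODE_PRESETS, default="default")
    for flag in ("schemes_per_region", "warmup_runs", "timing_runs"):
        parser.add_argument(f"--{flag}", type=int, action=StoreWithExplicitFlag)
    return parser


def apply_mode_presets(args):
    """Fill preset values only for flags the user did not set explicitly."""
    explicit = getattr(args, "explicit_flags", set())
    for key, value in MODE_PRESETS[args.mode].items():
        if key not in explicit:
            setattr(args, key, value)
    return args
```

For example, `--mode quick --warmup_runs 99` keeps the explicit `warmup_runs=99` while the other two values come from the quick preset.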

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed. The title "Integrate Automated QDQ placement tool - part 4.4" directly aligns with the PR's primary objective to integrate the AutoQDQ feature, as documented in the objectives and change summaries.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 92.59%, which is sufficient; the required threshold is 80.00%.
  • Security Anti-Patterns — ✅ Passed. PR modifications reviewed against five critical security anti-patterns. No instances of torch.load/numpy.load without justification, hardcoded trust_remote_code, eval/exec on external input, or new # nosec comments found.


@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.4 branch from 88793fa to 4604b84 on March 3, 2026 03:05
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/__main__.py`:
- Around line 36-57: Add pytest unit tests for the new
MODE_PRESETS/apply_mode_presets behavior: create tests that verify (1) selecting
each preset name in MODE_PRESETS sets args.num_schemes, args.warmup_runs, and
args.timing_runs when those fields are the DEFAULT_* values, (2) explicitly
passing non-default CLI values prevents overriding by apply_mode_presets, and
(3) explicitly passing values equal to the DEFAULT_* constants still counts as
"explicit" and should be overridden only if the code intends otherwise (cover
both expected behaviors). Use the apply_mode_presets function and the constants
DEFAULT_NUM_SCHEMES, DEFAULT_WARMUP_RUNS, DEFAULT_TIMING_RUNS and test args.mode
invalid case as well; place tests under tests/ (pytest) to ensure coverage and
assert that preset lookup uses MODE_PRESETS entries.
- Around line 43-56: The apply_mode_presets function currently detects
"unspecified" args by comparing against DEFAULT_* constants, which causes preset
values to override explicit CLI flags that happen to equal those defaults;
change the CLI parsing so the relevant options (num_schemes / warmup_runs /
timing_runs / schemes_per_region) default to None when not provided, then update
apply_mode_presets to only apply MODE_PRESETS when the corresponding args
attribute is None (e.g., check args.num_schemes is None instead of equality to
DEFAULT_NUM_SCHEMES); add unit tests exercising "--mode X" together with
explicit flags (including values equal to previous defaults) to assert explicit
flags are preserved. Ensure references: apply_mode_presets, MODE_PRESETS,
args.mode, DEFAULT_NUM_SCHEMES, DEFAULT_WARMUP_RUNS, DEFAULT_TIMING_RUNS.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edde087 and 88793fa.

📒 Files selected for processing (4)
  • modelopt/onnx/quantization/autotune/__main__.py
  • tests/_test_utils/onnx/quantization/autotune/models.py
  • tests/unit/onnx/quantization/autotune/test_pattern_cache.py
  • tests/unit/onnx/quantization/autotune/test_region.py

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.4 branch 2 times, most recently from 148771e to 27b930d on March 3, 2026 03:21
@willg-nv willg-nv requested a review from a team as a code owner March 3, 2026 03:21
@willg-nv willg-nv requested a review from kevalmorabia97 March 3, 2026 03:21
@kevalmorabia97
Collaborator

/ok to test 27b930d

@codecov

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 84.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.75%. Comparing base (42482b1) to head (c1b363b).
⚠️ Report is 4 commits behind head on main.

Files with missing lines | Patch % | Missing lines
modelopt/onnx/quantization/autotune/__main__.py | 84.00% | 4
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #961      +/-   ##
==========================================
- Coverage   72.13%   71.75%   -0.39%     
==========================================
  Files         209      211       +2     
  Lines       23631    23890     +259     
==========================================
+ Hits        17046    17142      +96     
- Misses       6585     6748     +163     

☔ View full report in Codecov by Sentry.

@kevalmorabia97
Collaborator

/ok to test f3491cf

@kevalmorabia97
Collaborator

@willg-nv the test passes in this PR but is too slow. Any way to speed it up?

============================= slowest 50 durations =============================
168.39s call     tests/gpu/onnx/quantization/autotune/test_workflow.py::test_export_quantized_model[True]
...

@gcunhase
Contributor

gcunhase commented Mar 3, 2026

Thank you for filing this bug. Can you please:

  1. Add an update to the CHANGELOG file. For example (feel free to suggest something else):
**New Features**

...
- Add automated Q/DQ placement toolkit for ONNX quantization. This toolkit uses TensorRT latency measurements to detect the desired Q/DQ insertion points.
  2. Confirm that test_config.py is now test_autotune_config.py?

I will approve this PR after those 2 comments are addressed.

Thanks!

@willg-nv
Contributor Author

willg-nv commented Mar 4, 2026

@willg-nv the test passes in this PR but is too slow. Any way to speed it up?

============================= slowest 50 durations =============================
168.39s call     tests/gpu/onnx/quantization/autotune/test_workflow.py::test_export_quantized_model[True]
...

@gcunhase @cjluo-nv can we make these slow tests manual, or move them to nightly tests?

@willg-nv
Contributor Author

willg-nv commented Mar 4, 2026

Thank you for filing this bug. Can you please:

  1. Add an update to the CHANGELOG file. For example (feel free to suggest something else):
**New Features**

...
- Add automated Q/DQ placement toolkit for ONNX quantization. This toolkit uses TensorRT latency measurements to detect the desired Q/DQ insertion points.
  2. Confirm that test_config.py is now test_autotune_config.py?

Will approve this PR after those 2 comments are fulfilled.

Thanks!

  1. Updated the CHANGELOG.
  2. I think test_autotune_config.py is already on the main branch.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
CHANGELOG.rst (1)

20-20: Consider mentioning the preset mode feature.

The changelog entry is comprehensive and well-written. However, based on the PR objectives and changes, this release includes CLI mode presets (quick, default, extensive) that allow users to control the optimization depth. Since this is a user-facing CLI feature, consider adding it to the feature list for completeness.

📝 Suggested addition
-**AutoQDQ**: New tool for automated Q/DQ (Quantize/Dequantize) placement optimization for ONNX models. Uses TensorRT latency measurements to choose insertion schemes that minimize inference time. Discovers regions automatically, groups them by structural pattern, and tests multiple Q/DQ schemes per pattern. Supports INT8 and FP8 quantization, pattern cache for warm-start on similar models, checkpoint/resume, and importing patterns from an existing QDQ baseline. CLI: ``python -m modelopt.onnx.quantization.autotune``. See the AutoQDQ guide in the documentation.
+**AutoQDQ**: New tool for automated Q/DQ (Quantize/Dequantize) placement optimization for ONNX models. Uses TensorRT latency measurements to choose insertion schemes that minimize inference time. Discovers regions automatically, groups them by structural pattern, and tests multiple Q/DQ schemes per pattern. Supports INT8 and FP8 quantization, mode presets (quick/default/extensive) for controlling optimization depth, pattern cache for warm-start on similar models, checkpoint/resume, and importing patterns from an existing QDQ baseline. CLI: ``python -m modelopt.onnx.quantization.autotune``. See the AutoQDQ guide in the documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.rst` at line 20, Update the AutoQDQ changelog entry to mention the
new CLI preset modes by adding a short sentence referencing the CLI entry point
(python -m modelopt.onnx.quantization.autotune) and the available presets
(quick, default, extensive); e.g., append that users can control optimization
depth via these presets so the AutoQDQ feature description (AutoQDQ) includes
the preset-mode capability in the feature list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@CHANGELOG.rst`:
- Line 20: Update the AutoQDQ changelog entry to mention the new CLI preset
modes by adding a short sentence referencing the CLI entry point (python -m
modelopt.onnx.quantization.autotune) and the available presets (quick, default,
extensive); e.g., append that users can control optimization depth via these
presets so the AutoQDQ feature description (AutoQDQ) includes the preset-mode
capability in the feature list.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c8666253-f3d8-4780-9e4f-cbdc81e990ca

📥 Commits

Reviewing files that changed from the base of the PR and between f3491cf and e47c678.

📒 Files selected for processing (1)
  • CHANGELOG.rst

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.4 branch from e47c678 to 6ada4d8 on March 4, 2026 10:39
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/autotune/__main__.py (1)

30-40: ⚠️ Potential issue | 🟡 Minor

Effective defaults can drift from displayed defaults.

Current parser defaults (30/5/20) and MODE_PRESETS["default"] (50/50/100) diverge, while --mode defaults to "default". That makes the runtime defaults differ from what the per-flag help currently advertises, which is confusing for users and performance expectations.

Suggested consolidation (single source of truth)
-DEFAULT_NUM_SCHEMES = 30
+DEFAULT_MODE = "default"
 DEFAULT_QUANT_TYPE = "int8"
 DEFAULT_DQ_DTYPE = "float32"
 DEFAULT_TIMING_CACHE = str(Path(tempfile.gettempdir()) / "trtexec_timing.cache")
-DEFAULT_WARMUP_RUNS = 5
-DEFAULT_TIMING_RUNS = 20
 MODE_PRESETS = {
     "quick": {"schemes_per_region": 30, "warmup_runs": 10, "timing_runs": 50},
     "default": {"schemes_per_region": 50, "warmup_runs": 50, "timing_runs": 100},
     "extensive": {"schemes_per_region": 200, "warmup_runs": 50, "timing_runs": 200},
 }
+DEFAULT_NUM_SCHEMES = MODE_PRESETS[DEFAULT_MODE]["schemes_per_region"]
+DEFAULT_WARMUP_RUNS = MODE_PRESETS[DEFAULT_MODE]["warmup_runs"]
+DEFAULT_TIMING_RUNS = MODE_PRESETS[DEFAULT_MODE]["timing_runs"]
@@
-        default="default",
+        default=DEFAULT_MODE,

Also applies to: 242-252, 257-262, 322-336

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/__main__.py` around lines 30 - 40, The
help and parser defaults (DEFAULT_NUM_SCHEMES, DEFAULT_WARMUP_RUNS,
DEFAULT_TIMING_RUNS) currently disagree with MODE_PRESETS["default"], causing
--mode default behavior to differ from per-flag help; fix by making a single
source of truth: either (A) update DEFAULT_NUM_SCHEMES/WARMUP_RUNS/TIMING_RUNS
to match MODE_PRESETS["default"] (50/50/100) and update help strings, or (B)
change the argument parser to derive its per-flag defaults and help text from
MODE_PRESETS["default"] (e.g., use MODE_PRESETS["default"]["schemes_per_region"]
etc.) so --mode="default" and per-flag help stay consistent; apply the same
consolidation where the other occurrences of these constants are used (refer to
DEFAULT_QUANT_TYPE, DEFAULT_DQ_DTYPE, DEFAULT_TIMING_CACHE if needed) and ensure
--mode default remains "default".
🧹 Nitpick comments (3)
tests/unit/onnx/quantization/autotune/test_pattern_cache.py (1)

107-109: Reduce test brittleness from positional cache indexing.

These assertions depend on list ordering (pattern_schemes[0]). If internal ordering changes, tests can fail despite correct behavior. Prefer selecting by pattern_signature before asserting details.

♻️ Suggested refactor
-        restored_ps = restored.pattern_schemes[0]
-        assert restored_ps.pattern_signature == "Conv->Relu"
+        matches = [x for x in restored.pattern_schemes if x.pattern_signature == "Conv->Relu"]
+        assert len(matches) == 1
+        restored_ps = matches[0]
@@
-        conv_relu_ps = cache.pattern_schemes[0]
-        assert conv_relu_ps.pattern_signature == "Conv->Relu"
+        matches = [x for x in cache.pattern_schemes if x.pattern_signature == "Conv->Relu"]
+        assert len(matches) == 1
+        conv_relu_ps = matches[0]
@@
-        conv_relu_ps = cache.pattern_schemes[0]
-        assert conv_relu_ps.pattern_signature == "Conv->Relu"
+        matches = [x for x in cache.pattern_schemes if x.pattern_signature == "Conv->Relu"]
+        assert len(matches) == 1
+        conv_relu_ps = matches[0]

Also applies to: 152-154, 176-178

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/onnx/quantization/autotune/test_pattern_cache.py` around lines 107
- 109, The test is brittle because it assumes ordering by using
restored.pattern_schemes[0]; change the assertions to locate the PatternScheme
by matching pattern_signature (e.g., find the element in
restored.pattern_schemes whose pattern_signature == "Conv->Relu") and then
assert on its .schemes length and other properties; update the other occurrences
that use positional indexing (the similar assertions around the other checks) to
the same pattern-signature lookup to make the tests order-independent.
tests/_test_utils/onnx/quantization/autotune/models.py (1)

28-33: Consider parameterizing/reducing test shapes to avoid worsening GPU autotune runtime.

The new shapes (Line 28-33) plus weights (Line 45) imply a very compute-heavy Conv if this model is actually executed in GPU autotune tests. Given the existing complaint about a ~168s GPU autotune test, this change risks making it slower.

A low-friction option is to keep defaults but allow call sites (especially GPU tests) to override sizes:

Suggested refactor (backward compatible defaults)
-def _create_simple_conv_onnx_model():
+def _create_simple_conv_onnx_model(*, batch: int = 64, in_ch: int = 32, out_ch: int = 64, hw: int = 224):
     """Build ONNX model: Input -> Conv -> Relu -> Output (minimal for autotuner tests)."""
     input_tensor = helper.make_tensor_value_info(
-        "input", onnx.TensorProto.FLOAT, [64, 32, 224, 224]
+        "input", onnx.TensorProto.FLOAT, [batch, in_ch, hw, hw]
     )
     output_tensor = helper.make_tensor_value_info(
-        "output", onnx.TensorProto.FLOAT, [64, 64, 224, 224]
+        "output", onnx.TensorProto.FLOAT, [batch, out_ch, hw, hw]
     )
@@
         initializer=[
             helper.make_tensor(
-                "conv_weight", onnx.TensorProto.FLOAT, [64, 32, 3, 3], [0.1] * (64 * 32 * 3 * 3)
+                "conv_weight",
+                onnx.TensorProto.FLOAT,
+                [out_ch, in_ch, 3, 3],
+                [0.1] * (out_ch * in_ch * 3 * 3),
             )
         ],
     )

If stability only depends on batch, consider keeping batch=64 but shrinking hw (e.g., 32/64) in the GPU workflow tests to cut runtime.

Also applies to: 44-46

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/_test_utils/onnx/quantization/autotune/models.py` around lines 28 - 33,
The test model uses hard-coded large tensors (input_tensor and output_tensor and
the conv weight tensor around line 45) which makes GPU autotune slow; refactor
the model builder to accept parameters for batch, in_channels, out_channels,
height, and width with sensible, smaller defaults (e.g., batch=64 kept if needed
but default hw=32 or lower) so call sites can override sizes for GPU tests, and
ensure the conv weight tensor shape is derived from those parameters
(maintaining current defaults to preserve backward compatibility).
tests/unit/onnx/quantization/autotune/test_autotune_config.py (1)

125-132: Add one regression test for the implicit parser-default path (--mode omitted).

You already validate --mode default; adding the no---mode case would lock in the intended runtime behavior if parser defaults change later.

Suggested test addition
 class TestModePresets:
@@
+    def test_mode_omitted_uses_parser_default_mode_preset(self):
+        """When --mode is omitted, parser default mode preset is applied."""
+        args = self._parse_cli(["--onnx_path", "model.onnx"])
+        preset = MODE_PRESETS["default"]
+        assert args.num_schemes == preset["schemes_per_region"]
+        assert args.warmup_runs == preset["warmup_runs"]
+        assert args.timing_runs == preset["timing_runs"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/onnx/quantization/autotune/test_autotune_config.py` around lines
125 - 132, Add a new unit test that mirrors
test_mode_default_applies_preset_when_no_explicit_flags but omits the --mode
flag to cover the implicit parser-default path: call the existing _parse_cli
helper with ["--onnx_path", "model.onnx"] (no --mode), look up preset =
MODE_PRESETS["default"], and assert that args.num_schemes ==
preset["schemes_per_region"], args.warmup_runs == preset["warmup_runs"], and
args.timing_runs == preset["timing_runs"]; place it alongside
test_mode_default_applies_preset_when_no_explicit_flags with a clear name like
test_mode_implicit_applies_preset_when_no_explicit_flags so future
parser-default changes are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/_test_utils/onnx/quantization/autotune/models.py`:
- Around line 28-33: The declared output_tensor shape ([64, 64, 224, 224]) is
inconsistent with a 3x3 Conv without padding (spatial dims should be 222x222);
update the model either by adding symmetric padding to the Conv node (set pads
or auto_pad to keep spatial dims at 224) or change output_tensor to [64, 64,
222, 222] to match no-padding behavior; locate and modify the tensor declaration
named output_tensor and/or the Conv node attributes (pads/auto_pad) in the model
definition so shapes are consistent.

---

Outside diff comments:
In `@modelopt/onnx/quantization/autotune/__main__.py`:
- Around line 30-40: The help and parser defaults (DEFAULT_NUM_SCHEMES,
DEFAULT_WARMUP_RUNS, DEFAULT_TIMING_RUNS) currently disagree with
MODE_PRESETS["default"], causing --mode default behavior to differ from per-flag
help; fix by making a single source of truth: either (A) update
DEFAULT_NUM_SCHEMES/WARMUP_RUNS/TIMING_RUNS to match MODE_PRESETS["default"]
(50/50/100) and update help strings, or (B) change the argument parser to derive
its per-flag defaults and help text from MODE_PRESETS["default"] (e.g., use
MODE_PRESETS["default"]["schemes_per_region"] etc.) so --mode="default" and
per-flag help stay consistent; apply the same consolidation where the other
occurrences of these constants are used (refer to DEFAULT_QUANT_TYPE,
DEFAULT_DQ_DTYPE, DEFAULT_TIMING_CACHE if needed) and ensure --mode default
remains "default".

---

Nitpick comments:
In `@tests/_test_utils/onnx/quantization/autotune/models.py`:
- Around line 28-33: The test model uses hard-coded large tensors (input_tensor
and output_tensor and the conv weight tensor around line 45) which makes GPU
autotune slow; refactor the model builder to accept parameters for batch,
in_channels, out_channels, height, and width with sensible, smaller defaults
(e.g., batch=64 kept if needed but default hw=32 or lower) so call sites can
override sizes for GPU tests, and ensure the conv weight tensor shape is derived
from those parameters (maintaining current defaults to preserve backward
compatibility).

In `@tests/unit/onnx/quantization/autotune/test_autotune_config.py`:
- Around line 125-132: Add a new unit test that mirrors
test_mode_default_applies_preset_when_no_explicit_flags but omits the --mode
flag to cover the implicit parser-default path: call the existing _parse_cli
helper with ["--onnx_path", "model.onnx"] (no --mode), look up preset =
MODE_PRESETS["default"], and assert that args.num_schemes ==
preset["schemes_per_region"], args.warmup_runs == preset["warmup_runs"], and
args.timing_runs == preset["timing_runs"]; place it alongside
test_mode_default_applies_preset_when_no_explicit_flags with a clear name like
test_mode_implicit_applies_preset_when_no_explicit_flags so future
parser-default changes are caught.

In `@tests/unit/onnx/quantization/autotune/test_pattern_cache.py`:
- Around line 107-109: The test is brittle because it assumes ordering by using
restored.pattern_schemes[0]; change the assertions to locate the PatternScheme
by matching pattern_signature (e.g., find the element in
restored.pattern_schemes whose pattern_signature == "Conv->Relu") and then
assert on its .schemes length and other properties; update the other occurrences
that use positional indexing (the similar assertions around the other checks) to
the same pattern-signature lookup to make the tests order-independent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 670b0a53-c9e9-494b-8034-fb49328b288b

📥 Commits

Reviewing files that changed from the base of the PR and between e47c678 and 6ada4d8.

📒 Files selected for processing (8)
  • CHANGELOG.rst
  • modelopt/onnx/quantization/autotune/__main__.py
  • pyproject.toml
  • tests/_test_utils/onnx/quantization/autotune/models.py
  • tests/gpu/onnx/quantization/autotune/test_workflow.py
  • tests/unit/onnx/quantization/autotune/test_autotune_config.py
  • tests/unit/onnx/quantization/autotune/test_pattern_cache.py
  • tests/unit/onnx/quantization/autotune/test_region.py
💤 Files with no reviewable changes (1)
  • tests/gpu/onnx/quantization/autotune/test_workflow.py
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/onnx/quantization/autotune/test_region.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • CHANGELOG.rst
  • pyproject.toml

@gcunhase
Contributor

gcunhase commented Mar 4, 2026

@willg-nv the test passes in this PR but is too slow. Any way to speed it up?

============================= slowest 50 durations =============================
168.39s call     tests/gpu/onnx/quantization/autotune/test_workflow.py::test_export_quantized_model[True]
...

@gcunhase @cjluo-nv can we make these slow tests manual tests or move to nightly tests?

@cjluo-nv for input on this.

@kevalmorabia97
Collaborator

@willg-nv the test passes in this PR but is too slow. Any way to speed it up?

============================= slowest 50 durations =============================
168.39s call     tests/gpu/onnx/quantization/autotune/test_workflow.py::test_export_quantized_model[True]
...

@gcunhase @cjluo-nv can we make these slow tests manual tests or move to nightly tests?

Do we know why this test is so slow in the first place? Is there an opportunity to use a smaller model, less data, or fewer iterations to make it faster? Or, even in the smallest case, does it still take ~3 minutes to run?

@gcunhase
Copy link
Contributor

gcunhase commented Mar 5, 2026

@willg-nv the test passes in this PR but is too slow. Any way to speed it up?

============================= slowest 50 durations =============================
168.39s call     tests/gpu/onnx/quantization/autotune/test_workflow.py::test_export_quantized_model[True]
...

@gcunhase @cjluo-nv can we make these slow tests manual tests or move to nightly tests?

Do we know why this test is so slow in the first place? Is there an opportunity to use a smaller model, less data, or fewer iterations to make it faster? Or, even in the smallest case, does it still take ~3 minutes to run?

@willg-nv

@willg-nv
Contributor Author

willg-nv commented Mar 6, 2026

@willg-nv the test passes in this PR but is too slow. Any way to speed it up?

============================= slowest 50 durations =============================
168.39s call     tests/gpu/onnx/quantization/autotune/test_workflow.py::test_export_quantized_model[True]
...

@gcunhase @cjluo-nv can we make these slow tests manual tests or move to nightly tests?

Do we know why this test is so slow in the first place? Is there an opportunity to use a smaller model, less data, or fewer iterations to make it faster? Or, even in the smallest case, does it still take ~3 minutes to run?

@willg-nv

This is an integration test that runs trtexec multiple times to find the best Q/DQ insertion points for a Conv->Relu model. Ideally it runs trtexec 4 times, which is why the test takes about 3 minutes to complete. If the problem size of the Conv->Relu model were decreased, the benchmark timings would become unstable and the test would fail intermittently.
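The stability trade-off described above is the reason the warmup_runs/timing_runs knobs exist: warm-up iterations absorb one-time costs, and aggregating repeated timings damps outliers. A generic sketch of that pattern (not the autotuner's actual trtexec-based measurement code; function and parameter names are illustrative):

```python
import statistics
import time


def measure_latency(fn, warmup_runs: int = 5, timing_runs: int = 20) -> float:
    """Time fn with warm-up and repetition; return the median latency in seconds."""
    for _ in range(warmup_runs):  # discard cold-start iterations
        fn()
    samples = []
    for _ in range(timing_runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)  # median is robust to occasional slow runs
```

The flip side is exactly the comment's point: shrinking the workload pushes per-run latency toward timer and scheduler noise, so the measured ranking of Q/DQ schemes becomes unreliable.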

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.4 branch from 12ac037 to b865901 on March 6, 2026 03:05
willg-nv and others added 5 commits March 6, 2026 03:08
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
[project.optional-dependencies]
onnx = [
"cppimport",
"cuda-python",
Contributor


This might need to be removed in lieu of #998
