
Add Automated QDQ placement example - Part 4.1 #841

Merged
gcunhase merged 5 commits into NVIDIA:main from
willg-nv:dev-willg-integrate-auto-qdq-placement-part4.1
Mar 4, 2026

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 3, 2026

What does this PR do?

Type of change: ?
This change implements a simple example to illustrate the usage of the Automated QDQ placement tool.

Overview: ?

Usage

python3 -m modelopt.onnx.quantization.autotune \
    --model resnet50.bs128.onnx \
    --output ./resnet50_autotuned \
    --qdq-baseline resnet50_quantized.onnx \
    --schemes-per-region 50

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • New Features

    • Log output now includes timestamps for improved debugging and traceability.
  • Documentation

    • Added comprehensive guide for QDQ placement optimization with TensorRT, covering setup, usage examples, and deployment instructions.

@willg-nv willg-nv requested review from a team as code owners February 3, 2026 02:43
@willg-nv willg-nv requested review from ChenhanYu and galagam February 3, 2026 02:43
@copy-pr-bot

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3237acf7-426b-4f54-89ba-d393e4c5073c

📥 Commits

Reviewing files that changed from the base of the PR and between f36aa99 and ee48afc.

📒 Files selected for processing (2)
  • examples/onnx/autoqdq/README.md
  • modelopt/onnx/logging_config.py
✅ Files skipped from review due to trivial changes (1)
  • examples/onnx/autoqdq/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/logging_config.py

📝 Walkthrough

Walkthrough

This pull request adds a timestamp to the log formatter in the ONNX logging configuration and introduces a comprehensive README for the ONNX autoqdq example, documenting quantization optimization with TensorRT.

Changes

Cohort / File(s) Summary
Logging Configuration
modelopt/onnx/logging_config.py
Modified log formatter to prepend timestamp prefix (%(asctime)s) to all log output lines for both console and file handlers.
Documentation
examples/onnx/autoqdq/README.md
New comprehensive README documenting the QDQ placement optimization example, including prerequisites, quick-start usage for INT8/FP8 quantization, region inspection tooling, deployment guidance, autotuning workflows, and API reference links.
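The timestamp change can be illustrated with a short sketch using Python's standard logging module (the exact format string in modelopt/onnx/logging_config.py may differ; this only shows the %(asctime)s prefix being prepended):

```python
import logging

# Sketch only: a formatter with a timestamp prefix, as described for both
# console and file handlers. The real format string in logging_config.py
# may include different fields.
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger("modelopt.onnx")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("quantization started")
# e.g. "2026-02-03 02:43:00,123 - modelopt.onnx - INFO - quantization started"
```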

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Add Automated QDQ placement example - Part 4.1' directly and clearly describes the primary change: adding a new example for Automated QDQ placement, with version context.
Security Anti-Patterns ✅ Passed No security anti-patterns detected in changed files; logging configuration and documentation-only changes.



@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from 1b8d896 to f36aa99 Compare February 3, 2026 02:45
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@examples/qdq_placement/README.md`:
- Around line 191-192: Fix the typo in the README sentence about TensorRT remote
autotuning: change "autouning" to "autotuning" in the line that reads "TensorRT
10.16 support remote autotuning, pass remoteAutoTuningConfig to trtexec to
benchmark with remote autouning." to correctly spell "autotuning" and ensure the
sentence still reads clearly (e.g., "TensorRT 10.16 supports remote autotuning;
pass remoteAutoTuningConfig to trtexec to benchmark with remote autotuning.").
- Around line 128-129: The downloaded filename in the curl command is
misleading: the URL fetches resnet101-v2-7.onnx but the saved name and
subsequent commands use resnet101_Opset17.onnx; update the saved filename and
downstream usage to a consistent, accurate name (e.g., use resnet101-v2-7.onnx
in the curl -o and in the python3 set_batch_size.py command) or add a one-line
clarifying comment above the commands explaining that resnet101_Opset17.onnx is
an alias for resnet101-v2-7.onnx so readers know which model variant is being
used.
🧹 Nitpick comments (3)
examples/qdq_placement/set_batch_size.py (3)

46-48: Consider validating that the model has inputs.

If the model has no graph inputs, accessing graph.input[0] will raise an IndexError. While unlikely for typical models, adding a guard improves robustness.

🛡️ Proposed defensive check
     # Get the input tensor
     graph = model.graph
+    if not graph.input:
+        raise ValueError(f"Model {model_path} has no graph inputs")
     input_tensor = graph.input[0]

60-64: Output batch dimension assumption may not hold for all models.

This code assumes the first dimension of every output is the batch dimension. While true for ResNet50 and most classification models, some models may have scalar outputs or outputs where batch isn't the first dimension. Consider adding a note in the docstring about this assumption, or making output modification opt-in.


78-84: Use the repository's utility functions for saving and checking the model to handle large files consistently.

The codebase provides save_onnx() and check_model() utilities in modelopt/onnx/utils.py that handle models larger than 2GB by using external data. Replace the standard onnx.save() (line 80) and onnx.checker.check_model() (line 84) with calls to modelopt.onnx.utils.save_onnx() and modelopt.onnx.utils.check_model(). While ResNet50 won't encounter this limitation, using the existing utilities ensures consistency across the codebase and prevents issues when the script is applied to larger models.

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from f36aa99 to 7257a12 Compare February 3, 2026 03:42
@modelopt-bot

Code Review: Automated QDQ Placement Example - Part 4.1

Thanks for this well-structured documentation PR. I have reviewed it in the context of the larger Automated QDQ feature series (Parts 1-4). Here are my findings:

📋 Context: PR Dependency Chain

This PR is Part 4.1 of a multi-part feature. The dependency chain:

✅ Positive Aspects

  1. Excellent documentation quality - The README is comprehensive with clear examples for:

    • Basic usage with INT8 and FP8 quantization
    • Pattern cache reuse for similar models
    • Starting from existing QDQ baselines
    • Remote autotuning with TensorRT 10.16+
    • Programmatic API references
  2. Useful utility script - set_batch_size.py is well-designed with:

    • Proper shape inference and model verification
    • Clean argparse interface with epilog examples
    • Good error handling (try/catch around shape inference)
    • Support for both input and output tensor shape updates
  3. Logging improvement - Adding timestamps to the log formatter is a small but valuable quality-of-life improvement.

⚠️ Issues & Suggestions

1. Import Path Issue (Minor)

In set_batch_size.py, the import order could be improved:

import argparse           # stdlib first

import onnx               # third-party next
from onnx import shape_inference

Consider following PEP 8 import ordering convention.

2. Docstring Coverage (Noted by CodeRabbit)

The set_batch_size() function has a docstring but coverage could be improved for module-level documentation. Given this is an example script, this is acceptable but consider adding a module docstring explaining the purpose.

3. Input Validation in set_batch_size.py

The script assumes the first input tensor is the one to modify. This works for ResNet but may not work for all models:

input_tensor = graph.input[0]  # Assumes input[0] is the data tensor

Consider adding a --input-name option to allow users to specify which input to modify.
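A hedged sketch of what that option could look like (argument names and defaults here are hypothetical, not the script's actual interface):

```python
import argparse

# Hypothetical CLI sketch for the suggested --input-name option; the actual
# set_batch_size.py interface may differ.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Set a fixed batch size on an ONNX model")
    parser.add_argument("--model", required=True, help="Path to the input ONNX model")
    parser.add_argument("--output", required=True, help="Path for the modified model")
    parser.add_argument("--batch-size", type=int, default=128, help="Fixed batch size to set")
    parser.add_argument(
        "--input-name",
        default=None,
        help="Name of the graph input to modify; defaults to the first input",
    )
    return parser


args = build_parser().parse_args(
    ["--model", "resnet50.onnx", "--output", "resnet50.bs128.onnx", "--input-name", "data"]
)
# args.input_name == "data"
```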

4. Copyright Year Inconsistency

set_batch_size.py uses copyright year 2024, but other files in the autotune series use 2026. Recommend using 2026 for consistency with the rest of the feature.

5. README Link Verification

The README references:

Please ensure Part 4.2 is merged before this one, or verify these files exist.

🔍 Cross-Reference Check

I spot-checked the CLI args documented in README against #839 (__main__.py). The following are documented correctly:

  • --model, --output
  • --quant-type (int8/fp8)
  • --schemes-per-region
  • --qdq-baseline
  • --pattern-cache
  • --use_trtexec, --trtexec_benchmark_args

📌 Recommendations

  1. Add --input-name option to set_batch_size.py for models with multiple inputs
  2. Update copyright year to 2026 for consistency
  3. Consider defensive coding: Check if tensors actually have dynamic dimensions before modifying (currently assumes dim 0 is batch)
  4. Coordinate merge order: Ensure Integrate Automated QDQ benchmark - part 3.1 (#837) through Integrate Automated QDQ placement tool - part 3.3 (#839) land before this PR so the examples work

📝 Overall Assessment

Approved with minor suggestions. This is a high-quality documentation PR that completes the user-facing portion of the Automated QDQ feature. The examples are clear, the utility script is useful, and the cross-references to API documentation are appropriate.

The minor issues (copyright year, input validation) can be addressed in follow-up or are acceptable for an example script.

Great work on the comprehensive documentation!

@gcunhase
Contributor

Should we rename the examples folder to onnx_autoqdq or onnx_autotuner instead?

@gcunhase
Contributor

Can we also add an example on how to use region inspect for debugging? Please also include any other features that might help the user with debugging. Thanks.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds documentation and helper utilities for the QDQ (Quantize/Dequantize) placement optimization feature as part of a larger feature rollout (Part 4.1). The changes prepare the example directory and improve logging capabilities for upcoming autotune functionality.

Changes:

  • Enhanced ONNX logging with timestamp support for better traceability
  • Added a utility script to convert ONNX models from dynamic to fixed batch size
  • Provided comprehensive README with usage examples for the QDQ placement optimization feature

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
modelopt/onnx/logging_config.py Added timestamp to log formatter for improved traceability
examples/qdq_placement/set_batch_size.py New utility script to set fixed batch size on ONNX models for TensorRT benchmarking
examples/qdq_placement/README.md Comprehensive documentation with prerequisites, usage examples, and advanced features for QDQ placement optimization


@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch 3 times, most recently from 64dfea7 to 71ad7bb Compare March 2, 2026 09:33
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from 71ad7bb to 983dc57 Compare March 3, 2026 03:32
@cjluo-nv cjluo-nv enabled auto-merge (squash) March 3, 2026 07:31
@codecov

codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (a34d613) to head (ee48afc).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #841      +/-   ##
==========================================
+ Coverage   72.10%   72.12%   +0.02%     
==========================================
  Files         209      209              
  Lines       23628    23628              
==========================================
+ Hits        17036    17042       +6     
+ Misses       6592     6586       -6     


@cjluo-nv
Collaborator

cjluo-nv commented Mar 3, 2026

/ok to test 983dc57

@gcunhase gcunhase disabled auto-merge March 3, 2026 20:58
@cjluo-nv
Collaborator

cjluo-nv commented Mar 4, 2026

/ok to test 6057717

@cjluo-nv cjluo-nv enabled auto-merge (squash) March 4, 2026 07:34
willg-nv added 5 commits March 4, 2026 10:37
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
auto-merge was automatically disabled March 4, 2026 10:37

Head branch was pushed to by a user without write access

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from 6057717 to ee48afc Compare March 4, 2026 10:37
@willg-nv
Contributor Author

willg-nv commented Mar 4, 2026

The failed test is a workflow test. This test is unstable and should be waived.

@gcunhase gcunhase enabled auto-merge (squash) March 4, 2026 16:19
@cjluo-nv
Collaborator

cjluo-nv commented Mar 4, 2026

/ok to test ee48afc

@gcunhase gcunhase merged commit e8f9687 into NVIDIA:main Mar 4, 2026
40 checks passed