
Add Automated QDQ placement example - Part 4.1 #841

Merged
gcunhase merged 5 commits into NVIDIA:main from
willg-nv:dev-willg-integrate-auto-qdq-placement-part4.1
Mar 4, 2026

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 3, 2026

What does this PR do?

Type of change: ?
This change implements a simple example to illustrate the usage of the Automated QDQ placement tool.

Overview: ?

Usage

python3 -m modelopt.onnx.quantization.autotune \
    --model resnet50.bs128.onnx \
    --output ./resnet50_autotuned \
    --qdq-baseline resnet50_quantized.onnx \
    --schemes-per-region 50

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • New Features

    • Log output now includes timestamps for improved debugging and traceability.
  • Documentation

    • Added comprehensive guide for QDQ placement optimization with TensorRT, covering setup, usage examples, and deployment instructions.

@willg-nv willg-nv requested review from a team as code owners February 3, 2026 02:43
@willg-nv willg-nv requested review from ChenhanYu and galagam February 3, 2026 02:43
@copy-pr-bot

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3237acf7-426b-4f54-89ba-d393e4c5073c

📥 Commits

Reviewing files that changed from the base of the PR and between f36aa99 and ee48afc.

📒 Files selected for processing (2)
  • examples/onnx/autoqdq/README.md
  • modelopt/onnx/logging_config.py
✅ Files skipped from review due to trivial changes (1)
  • examples/onnx/autoqdq/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/logging_config.py

📝 Walkthrough

Walkthrough

This pull request adds a timestamp to the log formatter in the ONNX logging configuration and introduces a comprehensive README for the ONNX autoqdq example, documenting quantization optimization with TensorRT.

Changes

Cohort / File(s) Summary
Logging Configuration
modelopt/onnx/logging_config.py
Modified log formatter to prepend timestamp prefix (%(asctime)s) to all log output lines for both console and file handlers.
Documentation
examples/onnx/autoqdq/README.md
New comprehensive README documenting the QDQ placement optimization example, including prerequisites, quick-start usage for INT8/FP8 quantization, region inspection tooling, deployment guidance, autotuning workflows, and API reference links.
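The timestamp change can be illustrated with a short sketch using Python's standard logging module (the exact format string in modelopt/onnx/logging_config.py may differ; this only shows the %(asctime)s prefix being prepended):

```python
import logging

# Sketch only: a formatter with a timestamp prefix, as described for both
# console and file handlers. The real format string in logging_config.py
# may include different fields.
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

handler = logging.StreamHandler()
handler.setFormatter(formatter)

logger = logging.getLogger("modelopt.onnx")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("quantization started")
# e.g. "2026-02-03 02:43:00,123 - modelopt.onnx - INFO - quantization started"
```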

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Add Automated QDQ placement example - Part 4.1' directly and clearly describes the primary change: adding a new example for Automated QDQ placement, with version context.
Security Anti-Patterns ✅ Passed No security anti-patterns detected in changed files; logging configuration and documentation-only changes.



@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from 1b8d896 to f36aa99 Compare February 3, 2026 02:45
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@examples/qdq_placement/README.md`:
- Around line 191-192: Fix the typo in the README sentence about TensorRT remote
autotuning: change "autouning" to "autotuning" in the line that reads "TensorRT
10.16 support remote autotuning, pass remoteAutoTuningConfig to trtexec to
benchmark with remote autouning." to correctly spell "autotuning" and ensure the
sentence still reads clearly (e.g., "TensorRT 10.16 supports remote autotuning;
pass remoteAutoTuningConfig to trtexec to benchmark with remote autotuning.").
- Around line 128-129: The downloaded filename in the curl command is
misleading: the URL fetches resnet101-v2-7.onnx but the saved name and
subsequent commands use resnet101_Opset17.onnx; update the saved filename and
downstream usage to a consistent, accurate name (e.g., use resnet101-v2-7.onnx
in the curl -o and in the python3 set_batch_size.py command) or add a one-line
clarifying comment above the commands explaining that resnet101_Opset17.onnx is
an alias for resnet101-v2-7.onnx so readers know which model variant is being
used.
🧹 Nitpick comments (3)
examples/qdq_placement/set_batch_size.py (3)

46-48: Consider validating that the model has inputs.

If the model has no graph inputs, accessing graph.input[0] will raise an IndexError. While unlikely for typical models, adding a guard improves robustness.

🛡️ Proposed defensive check
     # Get the input tensor
     graph = model.graph
+    if not graph.input:
+        raise ValueError(f"Model {model_path} has no graph inputs")
     input_tensor = graph.input[0]

60-64: Output batch dimension assumption may not hold for all models.

This code assumes the first dimension of every output is the batch dimension. While true for ResNet50 and most classification models, some models may have scalar outputs or outputs where batch isn't the first dimension. Consider adding a note in the docstring about this assumption, or making output modification opt-in.


78-84: Use the repository's utility functions for saving and checking the model to handle large files consistently.

The codebase provides save_onnx() and check_model() utilities in modelopt/onnx/utils.py that handle models larger than 2GB by using external data. Replace the standard onnx.save() (line 80) and onnx.checker.check_model() (line 84) with calls to modelopt.onnx.utils.save_onnx() and modelopt.onnx.utils.check_model(). While ResNet50 won't encounter this limitation, using the existing utilities ensures consistency across the codebase and prevents issues when the script is applied to larger models.

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from f36aa99 to 7257a12 Compare February 3, 2026 03:42
@modelopt-bot

Code Review: Automated QDQ Placement Example - Part 4.1

Thanks for this well-structured documentation PR. I have reviewed it in the context of the larger Automated QDQ feature series (Parts 1-4). Here are my findings:

📋 Context: PR Dependency Chain

This PR is Part 4.1 of a multi-part feature. The dependency chain:

✅ Positive Aspects

  1. Excellent documentation quality - The README is comprehensive with clear examples for:

    • Basic usage with INT8 and FP8 quantization
    • Pattern cache reuse for similar models
    • Starting from existing QDQ baselines
    • Remote autotuning with TensorRT 10.16+
    • Programmatic API references
  2. Useful utility script - set_batch_size.py is well-designed with:

    • Proper shape inference and model verification
    • Clean argparse interface with epilog examples
    • Good error handling (try/catch around shape inference)
    • Support for both input and output tensor shape updates
  3. Logging improvement - Adding timestamps to the log formatter is a small but valuable quality-of-life improvement.

⚠️ Issues & Suggestions

1. Import Path Issue (Minor)

In set_batch_size.py, the import order could be improved:

import argparse           # stdlib first

import onnx               # third-party next
from onnx import shape_inference

Consider following PEP 8 import ordering convention.

2. Docstring Coverage (Noted by CodeRabbit)

The set_batch_size() function has a docstring but coverage could be improved for module-level documentation. Given this is an example script, this is acceptable but consider adding a module docstring explaining the purpose.

3. Input Validation in set_batch_size.py

The script assumes the first input tensor is the one to modify. This works for ResNet but may not work for all models:

input_tensor = graph.input[0]  # Assumes input[0] is the data tensor

Consider adding a --input-name option to allow users to specify which input to modify.
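A hedged sketch of what that option could look like (argument names and defaults here are hypothetical, not the script's actual interface):

```python
import argparse

# Hypothetical CLI sketch for the suggested --input-name option; the actual
# set_batch_size.py interface may differ.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Set a fixed batch size on an ONNX model")
    parser.add_argument("--model", required=True, help="Path to the input ONNX model")
    parser.add_argument("--output", required=True, help="Path for the modified model")
    parser.add_argument("--batch-size", type=int, default=128, help="Fixed batch size to set")
    parser.add_argument(
        "--input-name",
        default=None,
        help="Name of the graph input to modify; defaults to the first input",
    )
    return parser


args = build_parser().parse_args(
    ["--model", "resnet50.onnx", "--output", "resnet50.bs128.onnx", "--input-name", "data"]
)
# args.input_name == "data"
```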

4. Copyright Year Inconsistency

set_batch_size.py uses copyright year 2024, but other files in the autotune series use 2026. Recommend using 2026 for consistency with the rest of the feature.

5. README Link Verification

The README references:

Please ensure Part 4.2 is merged before this one, or verify these files exist.

🔍 Cross-Reference Check

I spot-checked the CLI args documented in README against #839 (__main__.py). The following are documented correctly:

  • --model, --output
  • --quant-type (int8/fp8)
  • --schemes-per-region
  • --qdq-baseline
  • --pattern-cache
  • --use_trtexec, --trtexec_benchmark_args

📌 Recommendations

  1. Add --input-name option to set_batch_size.py for models with multiple inputs
  2. Update copyright year to 2026 for consistency
  3. Consider defensive coding: Check if tensors actually have dynamic dimensions before modifying (currently assumes dim 0 is batch)
  4. Coordinate merge order: Ensure Integrate Automated QDQ benchmark - part 3.1 (#837) through Integrate Automated QDQ placement tool - part 3.3 (#839) land before this PR so the examples work

📝 Overall Assessment

Approved with minor suggestions. This is a high-quality documentation PR that completes the user-facing portion of the Automated QDQ feature. The examples are clear, the utility script is useful, and the cross-references to API documentation are appropriate.

The minor issues (copyright year, input validation) can be addressed in follow-up or are acceptable for an example script.

Great work on the comprehensive documentation!

@gcunhase
Contributor

Should we rename the examples folder to onnx_autoqdq or onnx_autotuner instead?

@gcunhase
Contributor

Can we also add an example on how to use region inspect for debugging? Please also include any other features that might help the user with debugging. Thanks.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds documentation and helper utilities for the QDQ (Quantize/Dequantize) placement optimization feature as part of a larger feature rollout (Part 4.1). The changes prepare the example directory and improve logging capabilities for upcoming autotune functionality.

Changes:

  • Enhanced ONNX logging with timestamp support for better traceability
  • Added a utility script to convert ONNX models from dynamic to fixed batch size
  • Provided comprehensive README with usage examples for the QDQ placement optimization feature

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
modelopt/onnx/logging_config.py Added timestamp to log formatter for improved traceability
examples/qdq_placement/set_batch_size.py New utility script to set fixed batch size on ONNX models for TensorRT benchmarking
examples/qdq_placement/README.md Comprehensive documentation with prerequisites, usage examples, and advanced features for QDQ placement optimization


@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch 3 times, most recently from 64dfea7 to 71ad7bb Compare March 2, 2026 09:33
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from 71ad7bb to 983dc57 Compare March 3, 2026 03:32
@cjluo-nv cjluo-nv enabled auto-merge (squash) March 3, 2026 07:31
@codecov

codecov bot commented Mar 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (a34d613) to head (ee48afc).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #841      +/-   ##
==========================================
+ Coverage   72.10%   72.12%   +0.02%     
==========================================
  Files         209      209              
  Lines       23628    23628              
==========================================
+ Hits        17036    17042       +6     
+ Misses       6592     6586       -6     


@cjluo-nv
Collaborator

cjluo-nv commented Mar 3, 2026

/ok to test 983dc57

@gcunhase gcunhase disabled auto-merge March 3, 2026 20:58
@cjluo-nv
Collaborator

cjluo-nv commented Mar 4, 2026

/ok to test 6057717

@cjluo-nv cjluo-nv enabled auto-merge (squash) March 4, 2026 07:34
willg-nv added 5 commits March 4, 2026 10:37
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
auto-merge was automatically disabled March 4, 2026 10:37

Head branch was pushed to by a user without write access

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.1 branch from 6057717 to ee48afc Compare March 4, 2026 10:37
@willg-nv
Contributor Author

willg-nv commented Mar 4, 2026

The failed test is a workflow test. This test is unstable and should be waived.

@gcunhase gcunhase enabled auto-merge (squash) March 4, 2026 16:19
@cjluo-nv
Collaborator

cjluo-nv commented Mar 4, 2026

/ok to test ee48afc

@gcunhase gcunhase merged commit e8f9687 into NVIDIA:main Mar 4, 2026
40 checks passed