
Integrate Automated QDQ placement tool - part 4.3#843

Open
willg-nv wants to merge 7 commits into NVIDIA:main from willg-nv:dev-willg-integrate-auto-qdq-placement-part4.3

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 3, 2026

What does this PR do?

This PR uploads the user guide for the Automated QDQ placement tool. The tool automatically searches for QDQ insertion points that yield better performance.

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Documentation

    • Added comprehensive guide for Automated Q/DQ Placement Optimization workflow, including quick start instructions, advanced usage patterns, configuration options, best practices, and troubleshooting.
  • New Features

    • Exposed public API for CLI parser programmatic access.

@copy-pr-bot

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR adds comprehensive documentation for the Automated Q/DQ Placement Optimization workflow and exposes a public get_parser() function in the autotune CLI module to enable programmatic access to the argument parser, replacing the previous internal-only _get_autotune_parser() reference.

Changes

  • Documentation — docs/source/guides/9_autoqdq.rst: New comprehensive guide covering the Automated Q/DQ Placement Optimization workflow, including Quick Start, How It Works, Advanced Usage, Configuration, Best Practices, Troubleshooting, Architecture, Limitations, FAQs, and Examples.
  • CLI Parser Refactoring — modelopt/onnx/quantization/autotune/__main__.py: Introduces a public get_parser() wrapper that returns the existing internal _get_parser(), enabling external programmatic access to the CLI argument parser.
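The wrapper pattern described above can be sketched with plain argparse. Note this is an illustrative sketch, not the module's actual definition: the flag names and defaults used here (`--output_dir` defaulting to `./autotuner_output`, `-s`/`--schemes_per_region` defaulting to 50) are drawn from values mentioned elsewhere in this review thread and may not match the real CLI.

```python
import argparse


def _get_parser() -> argparse.ArgumentParser:
    # Internal parser construction; flags and defaults are illustrative only.
    parser = argparse.ArgumentParser(prog="autotune")
    parser.add_argument("--output_dir", default="./autotuner_output",
                        help="Directory for autotuner results")
    parser.add_argument("-s", "--schemes_per_region", type=int, default=50,
                        help="Number of Q/DQ placement schemes to try per region")
    return parser


def get_parser() -> argparse.ArgumentParser:
    """Public wrapper around the internal parser, enabling programmatic access."""
    return _get_parser()


# Programmatic use: parse a custom argument list instead of sys.argv.
args = get_parser().parse_args(["--output_dir", "./results", "-s", "30"])
print(args.output_dir, args.schemes_per_region)  # → ./results 30
```

Exposing a thin public wrapper (rather than renaming the private function) keeps the internal symbol free to change while giving external callers a stable entry point.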

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • gcunhase
🚥 Pre-merge checks — 4 passed

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title refers to integrating an Automated QDQ placement tool and is directly related to the main changes: documentation file addition and CLI parser wrapper.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Security Anti-Patterns — ✅ Passed: No security anti-patterns detected in PR changes. Documentation was added as an RST file with no executable code; the Python change only adds a public wrapper function get_parser() without introducing torch.load, numpy.load, trust_remote_code, eval/exec, or # nosec issues.



Comment @coderabbitai help to get the list of available commands and usage tips.

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch from 76c395b to 916687c Compare February 3, 2026 02:53
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print("  Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)

457-457: Clarify scheme selection guidance.

The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation—specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.

💡 Suggested clarification
-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.

@gcunhase
Contributor

Suggestion to rename this as *_onnx_autotuner or *_onnx_autoqdq as per #841 (comment).

Contributor

Copilot AI left a comment


Pull request overview

This PR adds comprehensive user guide documentation for the Automated Q/DQ Placement Optimization tool, which automatically optimizes Quantize/Dequantize node placement in ONNX models for TensorRT deployment. This is part 4.3 of a larger integration effort.

Changes:

  • Adds a 911-line comprehensive guide covering the autotuner tool from quick start to advanced usage
  • Documents both CLI and Python API usage patterns
  • Includes troubleshooting, best practices, FAQs, and multiple examples


@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch from 916687c to 1a57ad7 Compare March 2, 2026 10:12
@cjluo-nv cjluo-nv requested a review from kevalmorabia97 March 2, 2026 16:25
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 569-575: The example should guard against build_serialized_network
returning None before writing; after calling builder.create_builder_config() and
engine = builder.build_serialized_network(network, config) check if engine is
None and handle the failure (e.g., raise a RuntimeError or log an error and
exit) instead of directly calling f.write(engine); update the snippet around
build_serialized_network, the engine variable, and the file write so the code
only opens/writes "model.engine" when engine is a non-None bytes object.
- Around line 40-67: The output directory examples are inconsistent: the first
command states the default output is ./autotuner_output while the output tree
shows results/; update the docs so both examples and the output-tree use the
same directory name (either change the first command's default to ./results or
change the tree to ./autotuner_output) and, in the command examples around
python -m modelopt.onnx.quantization.autotune and the --output_dir ./results
example, add a short parenthetical note clarifying that --output_dir overrides
the default to avoid confusion.
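The None-guard the first inline comment asks for can be sketched independently of TensorRT. Here `write_engine` is a hypothetical helper name introduced for illustration; the point is that `build_serialized_network()` returns None on failure, so the result must be checked before it is written to disk:

```python
from typing import Optional


def write_engine(engine: Optional[bytes], path: str) -> None:
    # build_serialized_network() returns None when the engine build fails,
    # so guard before opening the output file.
    if engine is None:
        raise RuntimeError("Engine build failed; check the builder error log")
    with open(path, "wb") as f:
        f.write(engine)


# A successful build produces serialized bytes; a failed one yields None.
write_engine(b"\x00fake-engine-bytes", "model.engine")
try:
    write_engine(None, "model.engine")
except RuntimeError as e:
    print(e)  # → Engine build failed; check the builder error log
```

Raising (or logging and exiting) on None avoids silently writing an empty or invalid engine file that would only fail later at deserialization time.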

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 916687c and e4890ec.

📒 Files selected for processing (1)
  • docs/source/guides/9_qdq_placement.rst

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch 2 times, most recently from 69477be to a88df46 Compare March 3, 2026 03:23
@kevalmorabia97 kevalmorabia97 requested a review from gcunhase March 3, 2026 06:54
@cjluo-nv cjluo-nv requested a review from Copilot March 3, 2026 19:12
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.



@gcunhase
Contributor

gcunhase commented Mar 3, 2026

Suggestion to rename this as *_onnx_autotuner or *_onnx_autoqdq as per #841 (comment).

@willg-nv ^

@willg-nv willg-nv requested a review from a team as a code owner March 4, 2026 02:16
@willg-nv willg-nv requested a review from galagam March 4, 2026 02:16
Signed-off-by: Will Guo <willg@nvidia.com>
willg-nv added 3 commits March 4, 2026 10:38
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch from 99f0c95 to d4445a7 Compare March 4, 2026 10:38
@codecov

codecov bot commented Mar 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.12%. Comparing base (a34d613) to head (3eeceb1).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #843      +/-   ##
==========================================
+ Coverage   72.10%   72.12%   +0.02%     
==========================================
  Files         209      209              
  Lines       23628    23631       +3     
==========================================
+ Hits        17036    17043       +7     
+ Misses       6592     6588       -4     

☔ View full report in Codecov by Sentry.
Signed-off-by: Will Guo <willg@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/guides/9_autoqdq.rst`:
- Around line 420-424: The docs text for the --schemes_per_region (-s) parameter
contains conflicting defaults ("default (30)" vs "default, -s 50"); choose a
single canonical default (e.g., 50) and make all references consistent: update
the phrase "Typical values align with the default (30)" to match the chosen
default, change the "15–30 schemes" bullet if needed to keep examples coherent,
and ensure the "50 schemes (default, -s 50)" wording is used consistently across
this section and any neighboring sentences.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 00135dfa-3f18-4a9e-9775-3bdd31a6132e

📥 Commits

Reviewing files that changed from the base of the PR and between d4445a7 and bffa726.

📒 Files selected for processing (2)
  • docs/source/guides/9_autoqdq.rst
  • modelopt/onnx/quantization/autotune/__main__.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/quantization/autotune/__main__.py

Signed-off-by: Will Guo <willg@nvidia.com>
@gcunhase
Contributor

gcunhase commented Mar 6, 2026

/ok to test ae5ac66

@gcunhase gcunhase enabled auto-merge (squash) March 6, 2026 14:11
Signed-off-by: Will Guo <willg@nvidia.com>
auto-merge was automatically disabled March 6, 2026 14:54

Head branch was pushed to by a user without write access

@willg-nv
Contributor Author

willg-nv commented Mar 6, 2026

@gcunhase I fixed the doc error, please check. thanks!

@gcunhase gcunhase enabled auto-merge (squash) March 6, 2026 16:30
@gcunhase
Contributor

gcunhase commented Mar 6, 2026

@willg-nv doc is still failing


3 participants