Integrate Automated QDQ placement tool - part 4.3 #843
willg-nv wants to merge 7 commits into NVIDIA:main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the settings, and reviews can be managed with commands.
📝 Walkthrough
This PR adds comprehensive documentation for the Automated Q/DQ Placement Optimization workflow and exposes a public
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Force-pushed 76c395b to 916687c
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print(" Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
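The first inline comment above can be pictured with a small sketch. Everything below except the `current_profile_pattern_schemes` attribute, the `print("  Already profiled, skipping")` line, and the `continue` is a stand-in invented for illustration (the `FakeAutotuner` class, the region names, and the scheme values are not the real modelopt API); it only shows the clarifying comment the review asks for, under the assumption that the API returns None for a region that is already profiled:

```python
class FakeAutotuner:
    """Stand-in for the real autotuner; not the modelopt API."""

    def __init__(self, already_profiled):
        self._already_profiled = set(already_profiled)
        self.current_profile_pattern_schemes = None

    def start_region(self, region):
        # Assumed semantics from the review: None signals that the region
        # is already profiled/complete, not that something went wrong.
        self.current_profile_pattern_schemes = (
            None if region in self._already_profiled else ["scheme_a", "scheme_b"]
        )


def profile_regions(autotuner, regions):
    profiled_now = []
    for region in regions:
        autotuner.start_region(region)
        # NOTE: the API returns None when a region is already profiled,
        # so None here means "nothing left to do", not an error.
        if autotuner.current_profile_pattern_schemes is None:
            print("  Already profiled, skipping")
            continue
        profiled_now.append(region)
    return profiled_now
```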
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)
457: Clarify scheme selection guidance. The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation: specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.
💡 Suggested clarification

```diff
-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.
```
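The suggested thresholds (20-30 schemes for 100+ patterns, 50-100 for fewer than 20) could be captured in a tiny helper. The function name, the midpoint choices, and the fall-through default of 50 are illustrative assumptions layered on top of the suggested text, not part of modelopt:

```python
def suggested_schemes_per_region(num_regions: int) -> int:
    """Heuristic from the suggested doc wording: many small regions get
    fewer schemes each; few large regions get more schemes each."""
    if num_regions >= 100:
        return 25   # midpoint of the suggested 20-30 range
    if num_regions < 20:
        return 75   # midpoint of the suggested 50-100 range
    return 50       # in between: an assumed middle-of-the-road value
```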
Suggestion to rename this as
Pull request overview
This PR adds comprehensive user guide documentation for the Automated Q/DQ Placement Optimization tool, which automatically optimizes Quantize/Dequantize node placement in ONNX models for TensorRT deployment. This is part 4.3 of a larger integration effort.
Changes:
- Adds a 911-line comprehensive guide covering the autotuner tool from quick start to advanced usage
- Documents both CLI and Python API usage patterns
- Includes troubleshooting, best practices, FAQs, and multiple examples
Force-pushed 916687c to 1a57ad7
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 569-575: The example should guard against build_serialized_network
returning None before writing; after calling builder.create_builder_config() and
engine = builder.build_serialized_network(network, config) check if engine is
None and handle the failure (e.g., raise a RuntimeError or log an error and
exit) instead of directly calling f.write(engine); update the snippet around
build_serialized_network, the engine variable, and the file write so the code
only opens/writes "model.engine" when engine is a non-None bytes object.
- Around line 40-67: The output directory examples are inconsistent: the first
command states the default output is ./autotuner_output while the output tree
shows results/; update the docs so both examples and the output-tree use the
same directory name (either change the first command's default to ./results or
change the tree to ./autotuner_output) and, in the command examples around
python -m modelopt.onnx.quantization.autotune and the --output_dir ./results
example, add a short parenthetical note clarifying that --output_dir overrides
the default to avoid confusion.
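The None-guard the first comment asks for can be sketched as a small helper. The function name and error message are hypothetical, but the guard reflects TensorRT's documented behavior that `build_serialized_network` returns None when the build fails:

```python
def write_serialized_engine(path, engine):
    """Write a serialized engine only when the build actually succeeded.

    `engine` is whatever builder.build_serialized_network(network, config)
    returned; TensorRT gives back None on failure, so fail loudly instead
    of writing an empty or invalid file.
    """
    if engine is None:
        raise RuntimeError(
            "Engine build failed: build_serialized_network returned None"
        )
    with open(path, "wb") as f:
        f.write(engine)
```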
Force-pushed 69477be to a88df46
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Signed-off-by: Will Guo <willg@nvidia.com>
Force-pushed 99f0c95 to d4445a7
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```text
@@            Coverage Diff             @@
##             main     #843      +/-   ##
==========================================
+ Coverage   72.10%   72.12%   +0.02%
==========================================
  Files         209      209
  Lines       23628    23631       +3
==========================================
+ Hits        17036    17043       +7
+ Misses       6592     6588       -4
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Will Guo <willg@nvidia.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/guides/9_autoqdq.rst`:
- Around line 420-424: The docs text for the --schemes_per_region (-s) parameter
contains conflicting defaults ("default (30)" vs "default, -s 50"); choose a
single canonical default (e.g., 50) and make all references consistent: update
the phrase "Typical values align with the default (30)" to match the chosen
default, change the "15–30 schemes" bullet if needed to keep examples coherent,
and ensure the "50 schemes (default, -s 50)" wording is used consistently across
this section and any neighboring sentences.
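One way to keep the docs and the CLI consistent is to declare the default in a single place. This argparse sketch is an assumption about how the `-s` / `--schemes_per_region` flag might be wired, with 50 as the canonical default per the comment above; it is not the actual `modelopt.onnx.quantization.autotune` implementation:

```python
import argparse

DEFAULT_SCHEMES_PER_REGION = 50  # single canonical default (assumed value)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="autotune-sketch")
    parser.add_argument(
        "-s",
        "--schemes_per_region",
        type=int,
        default=DEFAULT_SCHEMES_PER_REGION,
        # Help text derives from the same constant, so docs generated from
        # --help can never drift from the real default.
        help=f"Schemes to try per region (default: {DEFAULT_SCHEMES_PER_REGION})",
    )
    return parser
```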
📒 Files selected for processing (2)
- docs/source/guides/9_autoqdq.rst
- modelopt/onnx/quantization/autotune/__main__.py
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/onnx/quantization/autotune/__main__.py
Signed-off-by: Will Guo <willg@nvidia.com>
/ok to test ae5ac66
Signed-off-by: Will Guo <willg@nvidia.com>
Head branch was pushed to by a user without write access
@gcunhase I fixed the doc error, please check. Thanks!
@willg-nv doc is still failing
What does this PR do?
This PR uploads the user guide for the Automated QDQ Placement tool, which automatically searches for QDQ insertion points that yield better performance.
Overview: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Documentation
New Features