
Draft: Integrate TorchRegionBuilder to AutoQDQ#963

Open
willg-nv wants to merge 1 commit into NVIDIA:main from willg-nv:dev-willg-integrate-torch-region-builder

Conversation


@willg-nv willg-nv commented Mar 3, 2026

What does this PR do?

This PR implements a RegionSearch variant that creates autotune regions based on node names for ONNX models exported by torch.

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other source, did you follow IP policy in CONTRIBUTING.md?: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features
    • Enhanced quantization autotuner for PyTorch-exported ONNX models with automatic structure detection and improved hierarchical region analysis for better optimization results.

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv requested a review from a team as a code owner March 3, 2026 08:00
@willg-nv willg-nv changed the title Integrate TorchRegionBuilder to AutoQDQ Draft: Integrate TorchRegionBuilder to AutoQDQ Mar 3, 2026

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

Introduces TorchRegionBuilder class for discovering hierarchical regions in PyTorch-exported ONNX graphs, integrated into the autotune API with conditional region discovery logic based on PyTorch naming convention detection.
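The naming-convention detection keys off the scope-path names (e.g. `/encoder/layer.0/MatMul`) that torch's ONNX exporter assigns to nodes. A minimal, hypothetical sketch of that heuristic follows; the function name and threshold are illustrative and are not the actual `check_torch_naming_convention` API:

```python
def looks_torch_exported(node_names, threshold=0.5):
    """Heuristic: treat the graph as torch-exported if most named nodes
    carry a '/'-prefixed scope path, as torch.onnx.export produces."""
    named = [n for n in node_names if n]
    if not named:  # guard the empty-graph case rather than dividing by zero
        return False
    ratio = sum(1 for n in named if n.startswith("/")) / len(named)
    return ratio >= threshold

# Two scope-path names out of three non-empty names -> detected as torch-exported
print(looks_torch_exported(["/enc/layer.0/MatMul", "/enc/layer.0/Add", "Constant_3"]))  # True
```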

Changes

Cohort / File(s) / Summary
  • API Export (modelopt/onnx/quantization/autotune/__init__.py): Added TorchRegionBuilder to the public exports for access within the autotune module.
  • Region Discovery Integration (modelopt/onnx/quantization/autotune/autotuner.py): Modified _search_regions to conditionally use TorchRegionBuilder when the PyTorch naming convention is detected; otherwise falls back to CombinedRegionSearch.
  • New Region Builder Implementation (modelopt/onnx/quantization/autotune/torch_region_builder.py): New module implementing TorchRegionBuilder for hierarchical region discovery in PyTorch models, with PyTorch naming validation, path-based region construction, quantization awareness, linearization utilities, and CLI inspection support.

Sequence Diagram(s)

sequenceDiagram
    participant A as Autotuner._search_regions
    participant B as check_torch_naming_convention
    participant C as TorchRegionBuilder
    participant D as CombinedRegionSearch
    
    A->>B: check_torch_naming_convention(graph)
    alt PyTorch Convention Detected
        B-->>A: True
        A->>C: TorchRegionBuilder(graph)
        C->>C: build_regions(linearize=True, only_quantizable=True)
        C-->>A: regions_list
    else Fallback
        B-->>A: False
        A->>D: CombinedRegionSearch(graph, ...)
        D->>D: Build regions & reassign_region_ids
        D-->>A: regions_list
    end
    A->>A: Flatten and collect regions

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the primary change: integrating TorchRegionBuilder into the autotune module, which is supported by changes to __init__.py, autotuner.py, and the new torch_region_builder.py file.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 91.67%, which meets the required threshold of 80.00%.
  • Security Anti-Patterns: ✅ Passed. No security anti-patterns detected: torch.load with weights_only=False, numpy.load with allow_pickle=True, hardcoded trust_remote_code=True, eval/exec on external input, and # nosec comments were not found in the modified files.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 101-104: The Torch branch double-flattens regions: when
check_torch_naming_convention(self.graph) is true you call
TorchRegionBuilder(self.graph).build_regions(linearize=True, ...) which already
returns linearized descendants, but the subsequent recursive flatten loop (the
code that iterates over self.regions and extends a flat list) flattens again and
creates duplicates; fix by detecting the linearize=True case and skipping the
extra recursive flatten (i.e., assign self.regions directly to the build_regions
result when linearize is requested) or, alternatively, guard the recursive
flatten with a check for already-linearized entries or deduplicate by region
identity; reference check_torch_naming_convention, TorchRegionBuilder,
build_regions and self.regions when applying the change.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Line 756: The code in torch_region_builder.py is forcibly overriding the
only_quantizable flag (setting only_quantizable = True), which ignores the
caller/CLI value; update the places that set only_quantizable (both occurrences)
to respect the incoming parameter/argument (e.g., remove the hardcoded
assignment and use the function parameter or pass-through value) so the
--only-quantizable option and the function argument control behavior instead of
being overridden; ensure any default remains set in the function signature/CLI
parsing and not by assigning True inside functions like the region-building
routine that references only_quantizable.
- Around line 622-627: torch_node_ratio can raise ZeroDivisionError when there
are no non-Constant nodes; update torch_node_ratio to guard against an empty
non_constant_nodes list by checking its length and returning 0.0 (or another
safe default) when it's zero, otherwise compute slash_count /
len(non_constant_nodes). Modify the method in torch_region_builder.py (function
torch_node_ratio, using self.graph.nodes and non_constant_nodes) to perform this
early-return check before the division.
- Around line 547-549: The loop in _probe_epilogues_recursive currently appends
every consumer_idx to epilogue_ops before validation; change it to validate each
consumer for fusibility and non-divergence first and only append + recurse when
it passes. Replace the direct epilogue_ops.append(consumer_idx) with a
conditional that calls the appropriate checks (e.g.,
self._is_fusible(consumer_idx) and self._is_non_divergent(consumer_idx) or a
combined self._is_fusible_non_divergent(consumer_idx)) and only then call
epilogue_ops.append(consumer_idx) followed by
self._probe_epilogues_recursive(consumer_idx, ...).
- Around line 698-700: After calling
self._filter_out_non_quantizable_nodes(root_region), the region membership has
changed but the region boundary information (inputs/outputs) computed earlier is
stale; fix by recomputing boundaries for root_region and its child regions
immediately after the filter step—either call the same helper used at the
earlier boundary computation or iterate the affected regions and recalculate
their inputs and outputs so downstream insertion logic reads up-to-date
boundaries (update region.inputs and region.outputs for root_region and nested
regions).
- Around line 313-315: The three methods _build_id_to_region_map,
_build_tensor_to_regions_map, and _merge_neighboring_regions use mutable default
arguments ({} and set()) which leak state across calls; change their signatures
to accept None (e.g., id_to_region_map: Optional[dict[int, Region]] = None,
tensor_to_regions_map: Optional[dict[str, set[int]]] = None, to_remove:
Optional[set[int]] = None) and inside each function initialize them to empty
dict/set when None is passed, then use those local containers; adjust any
internal calls (including where these methods are invoked without arguments) to
rely on the new None-default behavior so no shared mutable object is reused
across invocations.
- Around line 68-69: The graph is being topologically sorted after
RegionSearchBase.__init__ runs, so the precomputed tensor_users_map and
forward_reachable_nodes_map in RegionSearchBase are using stale node indices;
fix this by calling self.graph.toposort() before invoking
super().__init__(graph, root=None) so that RegionSearchBase builds its
node-indexed maps against the final node order (alternatively, if reordering
before super is not possible, explicitly rebuild the index maps immediately
after self.graph.toposort() by re-running the same logic that sets
tensor_users_map and forward_reachable_nodes_map).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 860d0b4 and a550ed7.

📒 Files selected for processing (3)
  • modelopt/onnx/quantization/autotune/__init__.py
  • modelopt/onnx/quantization/autotune/autotuner.py
  • modelopt/onnx/quantization/autotune/torch_region_builder.py

Comment on lines +101 to +104
if check_torch_naming_convention(self.graph):
    torch_search = TorchRegionBuilder(self.graph)
    self.regions = torch_search.build_regions(linearize=True, only_quantizable=True)
else:

⚠️ Potential issue | 🔴 Critical

Torch branch double-flattens regions and can produce duplicates.

Line 103 requests linearize=True, but Lines 114–120 still recursively flatten every entry. If the returned list already includes descendants, this duplicates regions and skews later profiling/ordering.
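The hazard is easy to reproduce with a toy region tree; the `Region` class below is an illustrative stand-in, not the real class. Flattening an already-linearized list counts every descendant again:

```python
class Region:  # minimal stand-in for the real Region type
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def linearize(region):
    """Return the region followed by all of its descendants."""
    out = [region]
    for child in region.children:
        out.extend(linearize(child))
    return out

root = Region("root", [Region("a", [Region("b")])])
linearized = linearize(root)  # [root, a, b] -- already flat
# Flattening a second time, as a recursive loop over the result would:
reflattened = [r for top in linearized for r in linearize(top)]
print(len(linearized), len(reflattened))  # 3 6 -- descendants duplicated
```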

Suggested fix
         if check_torch_naming_convention(self.graph):
             torch_search = TorchRegionBuilder(self.graph)
-            self.regions = torch_search.build_regions(linearize=True, only_quantizable=True)
+            self.regions = torch_search.build_regions(linearize=False, only_quantizable=True)
+            self._reassign_region_ids(self.regions)
         else:
             default_search = CombinedRegionSearch(
                 self.graph,
                 maximum_sequence_region_size=self.config.maximum_sequence_region_size,
                 minimum_topdown_search_size=self.config.minimum_topdown_search_size,

Also applies to: 114-120

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/autotuner.py` around lines 101 - 104, The
Torch branch double-flattens regions: when
check_torch_naming_convention(self.graph) is true you call
TorchRegionBuilder(self.graph).build_regions(linearize=True, ...) which already
returns linearized descendants, but the subsequent recursive flatten loop (the
code that iterates over self.regions and extends a flat list) flattens again and
creates duplicates; fix by detecting the linearize=True case and skipping the
extra recursive flatten (i.e., assign self.regions directly to the build_regions
result when linearize is requested) or, alternatively, guard the recursive
flatten with a check for already-linearized entries or deduplicate by region
identity; reference check_torch_naming_convention, TorchRegionBuilder,
build_regions and self.regions when applying the change.

Comment on lines +68 to +69
super().__init__(graph, root=None)
self.graph.toposort()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed (repository: NVIDIA/Model-Optimizer):

find . -type f -name "torch_region_builder.py" | head -5
find . -type f -name "*region*.py" | grep -E "(base|search)" | head -10
git ls-files modelopt/onnx/quantization/autotune/ | head -20
cat -n modelopt/onnx/quantization/autotune/torch_region_builder.py | head -100
wc -l modelopt/onnx/quantization/autotune/torch_region_builder.py
grep -n "class TorchRegionBuilder\|def __init__\|class RegionSearchBase\|toposort" modelopt/onnx/quantization/autotune/torch_region_builder.py | head -20
cat -n modelopt/onnx/quantization/autotune/region_search.py | head -150
grep -n "class RegionSearchBase\|def __init__\|tensor_users_map\|reachability" modelopt/onnx/quantization/autotune/region_search.py | head -30
grep -n "def get_tensor_consumer_node_indices" modelopt/onnx/quantization/autotune/*.py
grep -n "def _build_forward_reachable_nodes_map" modelopt/onnx/quantization/autotune/region_search.py
grep -rn "def get_tensor_consumer_node_indices" modelopt/
cat -n modelopt/onnx/quantization/graph_utils.py | grep -A 30 "def get_tensor_consumer_node_indices"
cat -n modelopt/onnx/quantization/autotune/region_search.py | sed -n '166,185p'
grep -A 20 "def _build_forward_reachable_nodes_map" modelopt/onnx/quantization/autotune/region_search.py | head -30
cat -n modelopt/onnx/quantization/autotune/torch_region_builder.py | sed -n '1,100p'
rg "torch\.load|numpy\.load|trust_remote_code|eval\(|exec\(|# nosec" modelopt/onnx/quantization/autotune/torch_region_builder.py
cat -n modelopt/onnx/quantization/autotune/torch_region_builder.py | sed -n '63,80p'


Node index maps must be built after graph topology is finalized.

RegionSearchBase.__init__() (line 68) pre-computes tensor_users_map and forward_reachable_nodes_map using node indices from the current graph order. Calling toposort() on line 69 reorders all nodes, leaving the precomputed indices pointing to wrong positions in the reordered graph.
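The same hazard can be shown with a plain list: an index map built before a reorder silently resolves to the wrong elements afterwards. The node names below are illustrative:

```python
nodes = ["mul", "add", "relu"]
index_of = {name: i for i, name in enumerate(nodes)}  # built before reordering

nodes.sort()  # reorder, analogous to graph.toposort()
print(nodes[index_of["mul"]])  # 'add' -- the stale index now points elsewhere

index_of = {name: i for i, name in enumerate(nodes)}  # rebuild after reordering
print(nodes[index_of["mul"]])  # 'mul'
```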

Fix
     def __init__(self, graph: gs.Graph):
         """Initialize the TorchRegionBuilder with a computation graph."""
+        graph.toposort()
         super().__init__(graph, root=None)
-        self.graph.toposort()
         self.regions: list[Region] = []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        super().__init__(graph, root=None)
-        self.graph.toposort()
+    def __init__(self, graph: gs.Graph):
+        """Initialize the TorchRegionBuilder with a computation graph."""
+        graph.toposort()
+        super().__init__(graph, root=None)
+        self.regions: list[Region] = []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 68
- 69, The graph is being topologically sorted after RegionSearchBase.__init__
runs, so the precomputed tensor_users_map and forward_reachable_nodes_map in
RegionSearchBase are using stale node indices; fix this by calling
self.graph.toposort() before invoking super().__init__(graph, root=None) so that
RegionSearchBase builds its node-indexed maps against the final node order
(alternatively, if reordering before super is not possible, explicitly rebuild
the index maps immediately after self.graph.toposort() by re-running the same
logic that sets tensor_users_map and forward_reachable_nodes_map).

Comment on lines +313 to +315
def _build_id_to_region_map(
    self, region: Region, id_to_region_map: dict[int, Region] = {}
) -> dict[int, Region]:

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n modelopt/onnx/quantization/autotune/torch_region_builder.py | sed -n '310,345p'



Fix mutable default arguments causing state leakage across function calls.

Three methods use mutable defaults ({} and set()) that are shared across invocations:

  • Line 314: _build_id_to_region_map with id_to_region_map: dict[int, Region] = {}
  • Line 323: _build_tensor_to_regions_map with tensor_to_regions_map: dict[str, set[int]] = {}
  • Line 335: _merge_neighboring_regions with to_remove: set[int] = set()

The issue manifests when lines 337–338 invoke the first two methods without arguments, causing the default mutable objects to be reused and accumulate state from repeated calls. This corrupts state across multiple builds/merges.
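This pitfall is standard Python behavior, independent of this codebase: a default argument value is evaluated once, at function definition time, and then shared by every call. A self-contained demonstration with illustrative helper names:

```python
def build_map_buggy(key, value, acc={}):  # default dict created once, shared forever
    acc[key] = value
    return acc

def build_map_fixed(key, value, acc=None):
    if acc is None:  # fresh container on every no-argument call
        acc = {}
    acc[key] = value
    return acc

first = build_map_buggy("a", 1)
second = build_map_buggy("b", 2)   # same dict object as `first`
print(first is second, second)     # True {'a': 1, 'b': 2}
print(build_map_fixed("a", 1))     # {'a': 1}
print(build_map_fixed("b", 2))     # {'b': 2} -- no state leaks between calls
```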

Suggested fix
-    def _build_id_to_region_map(
-        self, region: Region, id_to_region_map: dict[int, Region] = {}
-    ) -> dict[int, Region]:
+    def _build_id_to_region_map(
+        self, region: Region, id_to_region_map: dict[int, Region] | None = None
+    ) -> dict[int, Region]:
         """Build a map from region ids to regions."""
+        if id_to_region_map is None:
+            id_to_region_map = {}
         id_to_region_map[region.id] = region
         for child in region.get_children():
             self._build_id_to_region_map(child, id_to_region_map)
         return id_to_region_map

-    def _build_tensor_to_regions_map(
-        self, region: Region, tensor_to_regions_map: dict[str, set[int]] = {}
-    ) -> dict[str, set[int]]:
+    def _build_tensor_to_regions_map(
+        self, region: Region, tensor_to_regions_map: dict[str, set[int]] | None = None
+    ) -> dict[str, set[int]]:
         """Build a map from tensor names to regions."""
+        if tensor_to_regions_map is None:
+            tensor_to_regions_map = {}
         for input in region.inputs:
             if input not in tensor_to_regions_map:
                 tensor_to_regions_map[input] = set()
             tensor_to_regions_map[input].add(region.id)
@@
-    def _merge_neighboring_regions(self, region: Region, to_remove: set[int] = set()) -> None:
+    def _merge_neighboring_regions(self, region: Region, to_remove: set[int] | None = None) -> None:
+        if to_remove is None:
+            to_remove = set()
         self._compute_all_boundaries(region)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 313
- 315, The three methods _build_id_to_region_map, _build_tensor_to_regions_map,
and _merge_neighboring_regions use mutable default arguments ({} and set())
which leak state across calls; change their signatures to accept None (e.g.,
id_to_region_map: Optional[dict[int, Region]] = None, tensor_to_regions_map:
Optional[dict[str, set[int]]] = None, to_remove: Optional[set[int]] = None) and
inside each function initialize them to empty dict/set when None is passed, then
use those local containers; adjust any internal calls (including where these
methods are invoked without arguments) to rely on the new None-default behavior
so no shared mutable object is reused across invocations.

Comment on lines +547 to +549
for consumer_idx in consumer_indices:
    epilogue_ops.append(consumer_idx)
    self._probe_epilogues_recursive(consumer_idx, current_step + 1, max_steps, epilogue_ops)

⚠️ Potential issue | 🟠 Major

Epilogue probing adds consumers without fusibility/non-divergence checks.

Line 548 appends every consumer before validating it. This conflicts with the method contract (“fusible non-divergent epilogue nodes”) and can pollute regions with invalid epilogue ops.
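The validate-before-append pattern the review calls for can be sketched generically; the graph, predicate, and names below are illustrative stand-ins, not the real autotune API:

```python
def probe(graph, start, is_valid, max_steps, step=0, collected=None):
    """Collect consumers reachable from `start`, expanding only those
    that pass `is_valid` -- validation happens before append and recursion."""
    if collected is None:
        collected = []
    if step >= max_steps:
        return collected
    for consumer in graph.get(start, []):
        if not is_valid(consumer):  # validate first, then append and recurse
            continue
        collected.append(consumer)
        probe(graph, consumer, is_valid, max_steps, step + 1, collected)
    return collected

# Adjacency map: conv feeds relu (fusible) and shape (not fusible)
graph = {"conv": ["relu", "shape"], "relu": ["add"], "add": []}
fusible = {"relu", "add"}
print(probe(graph, "conv", fusible.__contains__, max_steps=3))  # ['relu', 'add']
```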

Suggested fix
         # For each consumer, check if it's fusible and non-divergent
         for consumer_idx in consumer_indices:
+            consumer_node = self.graph.nodes[consumer_idx]
+            if not self.is_fusible_node(consumer_node.op):
+                continue
+            if self._is_node_divergent(consumer_idx):
+                continue
             epilogue_ops.append(consumer_idx)
             self._probe_epilogues_recursive(consumer_idx, current_step + 1, max_steps, epilogue_ops)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 547
- 549, The loop in _probe_epilogues_recursive currently appends every
consumer_idx to epilogue_ops before validation; change it to validate each
consumer for fusibility and non-divergence first and only append + recurse when
it passes. Replace the direct epilogue_ops.append(consumer_idx) with a
conditional that calls the appropriate checks (e.g.,
self._is_fusible(consumer_idx) and self._is_non_divergent(consumer_idx) or a
combined self._is_fusible_non_divergent(consumer_idx)) and only then call
epilogue_ops.append(consumer_idx) followed by
self._probe_epilogues_recursive(consumer_idx, ...).

Comment on lines +622 to +627
def torch_node_ratio(self) -> float:
    """Count the number of nodes that are exported from PyTorch."""
    non_constant_nodes = [n for n in self.graph.nodes if n.op != "Constant"]
    slash_count = sum(1 for n in non_constant_nodes if n.name and n.name.startswith("/"))
    return slash_count / len(non_constant_nodes)


⚠️ Potential issue | 🟠 Major

torch_node_ratio() can divide by zero.

If the graph has zero non-Constant nodes, Line 626 raises ZeroDivisionError.
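The guard is straightforward to exercise in isolation. The `Node` stand-in below mimics only the two attributes the method reads, so this is a sketch of the guarded computation rather than the real class:

```python
from dataclasses import dataclass

@dataclass
class Node:  # stand-in exposing only the attributes torch_node_ratio reads
    op: str
    name: str

def torch_node_ratio(nodes):
    non_constant = [n for n in nodes if n.op != "Constant"]
    if not non_constant:  # constants-only or empty graph: define the ratio as 0.0
        return 0.0
    slash = sum(1 for n in non_constant if n.name and n.name.startswith("/"))
    return slash / len(non_constant)

print(torch_node_ratio([]))                                # 0.0, no ZeroDivisionError
print(torch_node_ratio([Node("Constant", "Constant_0")]))  # 0.0
print(torch_node_ratio([Node("MatMul", "/blk.0/MatMul")])) # 1.0
```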

Suggested fix
     def torch_node_ratio(self) -> float:
         """Count the number of nodes that are exported from PyTorch."""
         non_constant_nodes = [n for n in self.graph.nodes if n.op != "Constant"]
+        if not non_constant_nodes:
+            return 0.0
         slash_count = sum(1 for n in non_constant_nodes if n.name and n.name.startswith("/"))
         return slash_count / len(non_constant_nodes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 622
- 627, torch_node_ratio can raise ZeroDivisionError when there are no
non-Constant nodes; update torch_node_ratio to guard against an empty
non_constant_nodes list by checking its length and returning 0.0 (or another
safe default) when it's zero, otherwise compute slash_count /
len(non_constant_nodes). Modify the method in torch_region_builder.py (function
torch_node_ratio, using self.graph.nodes and non_constant_nodes) to perform this
early-return check before the division.

Comment on lines +698 to +700
if only_quantizable:
    self._filter_out_non_quantizable_nodes(root_region)
logger.info(

⚠️ Potential issue | 🟠 Major

Region boundaries are stale after quantizable-node filtering.

_filter_out_non_quantizable_nodes mutates region membership, but boundaries are computed earlier (Line 696) and not recomputed afterward. Downstream insertion logic can read outdated inputs/outputs.

Suggested fix
             self._sort_regions(root_region)
             if only_quantizable:
                 self._filter_out_non_quantizable_nodes(root_region)
+                self._compute_all_boundaries(root_region)

Suggested change
-            if only_quantizable:
-                self._filter_out_non_quantizable_nodes(root_region)
-            logger.info(
+            self._sort_regions(root_region)
+            if only_quantizable:
+                self._filter_out_non_quantizable_nodes(root_region)
+                self._compute_all_boundaries(root_region)
+            logger.info(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 698
- 700, After calling self._filter_out_non_quantizable_nodes(root_region), the
region membership has changed but the region boundary information
(inputs/outputs) computed earlier is stale; fix by recomputing boundaries for
root_region and its child regions immediately after the filter step—either call
the same helper used at the earlier boundary computation or iterate the affected
regions and recalculate their inputs and outputs so downstream insertion logic
reads up-to-date boundaries (update region.inputs and region.outputs for
root_region and nested regions).

    Returns:
        List of discovered regions with hierarchical structure
    """
    only_quantizable = True

⚠️ Potential issue | 🟠 Major

only_quantizable argument is overridden, so caller/CLI intent is ignored.

Line 756 hardcodes only_quantizable = True, which makes the --only-quantizable flag and function parameter ineffective.

Suggested fix
-    only_quantizable = True
     logger.info(f"Loading model: {onnx_path}")
@@
-    regions = builder.build_regions(only_quantizable=only_quantizable)
+    regions = builder.build_regions(only_quantizable=only_quantizable)

Also applies to: 770-770

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` at line 756, The
code in torch_region_builder.py is forcibly overriding the only_quantizable flag
(setting only_quantizable = True), which ignores the caller/CLI value; update
the places that set only_quantizable (both occurrences) to respect the incoming
parameter/argument (e.g., remove the hardcoded assignment and use the function
parameter or pass-through value) so the --only-quantizable option and the
function argument control behavior instead of being overridden; ensure any
default remains set in the function signature/CLI parsing and not by assigning
True inside functions like the region-building routine that references
only_quantizable.
