
Draft: Integrate TorchRegionBuilder to AutoQDQ#963

Open
willg-nv wants to merge 1 commit into NVIDIA:main from willg-nv:dev-willg-integrate-torch-region-builder

Conversation


@willg-nv willg-nv commented Mar 3, 2026

What does this PR do?

This PR implements a RegionSearch variant that creates autotune regions based on node names for ONNX models exported by torch.

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other source, did you follow IP policy in CONTRIBUTING.md?: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features
    • Enhanced quantization autotuner for PyTorch-exported ONNX models with automatic structure detection and improved hierarchical region analysis for better optimization results.

Signed-off-by: Will Guo <willg@nvidia.com>
@willg-nv willg-nv requested a review from a team as a code owner March 3, 2026 08:00
@willg-nv willg-nv changed the title Integrate TorchRegionBuilder to AutoQDQ Draft: Integrate TorchRegionBuilder to AutoQDQ Mar 3, 2026

copy-pr-bot bot commented Mar 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

Introduces TorchRegionBuilder class for discovering hierarchical regions in PyTorch-exported ONNX graphs, integrated into the autotune API with conditional region discovery logic based on PyTorch naming convention detection.
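The naming-convention detection keys off the scope-path names (e.g. `/encoder/layer.0/MatMul`) that torch's ONNX exporter assigns to nodes. A minimal, hypothetical sketch of that heuristic follows; the function name and threshold are illustrative and are not the actual `check_torch_naming_convention` API:

```python
def looks_torch_exported(node_names, threshold=0.5):
    """Heuristic: treat the graph as torch-exported if most named nodes
    carry a '/'-prefixed scope path, as torch.onnx.export produces."""
    named = [n for n in node_names if n]
    if not named:  # guard the empty-graph case rather than dividing by zero
        return False
    ratio = sum(1 for n in named if n.startswith("/")) / len(named)
    return ratio >= threshold

# Two scope-path names out of three non-empty names -> detected as torch-exported
print(looks_torch_exported(["/enc/layer.0/MatMul", "/enc/layer.0/Add", "Constant_3"]))  # True
```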

Changes

Cohort / File(s) / Summary
  • API Export (modelopt/onnx/quantization/autotune/__init__.py): Added TorchRegionBuilder to the public exports for access within the autotune module.
  • Region Discovery Integration (modelopt/onnx/quantization/autotune/autotuner.py): Modified _search_regions to conditionally use TorchRegionBuilder when the PyTorch naming convention is detected; otherwise falls back to CombinedRegionSearch.
  • New Region Builder Implementation (modelopt/onnx/quantization/autotune/torch_region_builder.py): New module implementing TorchRegionBuilder for hierarchical region discovery in PyTorch models, with PyTorch naming validation, path-based region construction, quantization awareness, linearization utilities, and CLI inspection support.

Sequence Diagram(s)

sequenceDiagram
    participant A as Autotuner._search_regions
    participant B as check_torch_naming_convention
    participant C as TorchRegionBuilder
    participant D as CombinedRegionSearch
    
    A->>B: check_torch_naming_convention(graph)
    alt PyTorch Convention Detected
        B-->>A: True
        A->>C: TorchRegionBuilder(graph)
        C->>C: build_regions(linearize=True, only_quantizable=True)
        C-->>A: regions_list
    else Fallback
        B-->>A: False
        A->>D: CombinedRegionSearch(graph, ...)
        D->>D: Build regions & reassign_region_ids
        D-->>A: regions_list
    end
    A->>A: Flatten and collect regions

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the primary change: integrating TorchRegionBuilder into the autotune module, which is supported by changes to __init__.py, autotuner.py, and the new torch_region_builder.py file.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 91.67%, which meets the required threshold of 80.00%.
  • Security Anti-Patterns: ✅ Passed. No security anti-patterns detected: torch.load with weights_only=False, numpy.load with allow_pickle=True, hardcoded trust_remote_code=True, eval/exec on external input, and # nosec comments were not found in the modified files.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/autotune/autotuner.py`:
- Around line 101-104: The Torch branch double-flattens regions: when
check_torch_naming_convention(self.graph) is true you call
TorchRegionBuilder(self.graph).build_regions(linearize=True, ...) which already
returns linearized descendants, but the subsequent recursive flatten loop (the
code that iterates over self.regions and extends a flat list) flattens again and
creates duplicates; fix by detecting the linearize=True case and skipping the
extra recursive flatten (i.e., assign self.regions directly to the build_regions
result when linearize is requested) or, alternatively, guard the recursive
flatten with a check for already-linearized entries or deduplicate by region
identity; reference check_torch_naming_convention, TorchRegionBuilder,
build_regions and self.regions when applying the change.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py`:
- Line 756: The code in torch_region_builder.py is forcibly overriding the
only_quantizable flag (setting only_quantizable = True), which ignores the
caller/CLI value; update the places that set only_quantizable (both occurrences)
to respect the incoming parameter/argument (e.g., remove the hardcoded
assignment and use the function parameter or pass-through value) so the
--only-quantizable option and the function argument control behavior instead of
being overridden; ensure any default remains set in the function signature/CLI
parsing and not by assigning True inside functions like the region-building
routine that references only_quantizable.
- Around line 622-627: torch_node_ratio can raise ZeroDivisionError when there
are no non-Constant nodes; update torch_node_ratio to guard against an empty
non_constant_nodes list by checking its length and returning 0.0 (or another
safe default) when it's zero, otherwise compute slash_count /
len(non_constant_nodes). Modify the method in torch_region_builder.py (function
torch_node_ratio, using self.graph.nodes and non_constant_nodes) to perform this
early-return check before the division.
- Around line 547-549: The loop in _probe_epilogues_recursive currently appends
every consumer_idx to epilogue_ops before validation; change it to validate each
consumer for fusibility and non-divergence first and only append + recurse when
it passes. Replace the direct epilogue_ops.append(consumer_idx) with a
conditional that calls the appropriate checks (e.g.,
self._is_fusible(consumer_idx) and self._is_non_divergent(consumer_idx) or a
combined self._is_fusible_non_divergent(consumer_idx)) and only then call
epilogue_ops.append(consumer_idx) followed by
self._probe_epilogues_recursive(consumer_idx, ...).
- Around line 698-700: After calling
self._filter_out_non_quantizable_nodes(root_region), the region membership has
changed but the region boundary information (inputs/outputs) computed earlier is
stale; fix by recomputing boundaries for root_region and its child regions
immediately after the filter step—either call the same helper used at the
earlier boundary computation or iterate the affected regions and recalculate
their inputs and outputs so downstream insertion logic reads up-to-date
boundaries (update region.inputs and region.outputs for root_region and nested
regions).
- Around line 313-315: The three methods _build_id_to_region_map,
_build_tensor_to_regions_map, and _merge_neighboring_regions use mutable default
arguments ({} and set()) which leak state across calls; change their signatures
to accept None (e.g., id_to_region_map: Optional[dict[int, Region]] = None,
tensor_to_regions_map: Optional[dict[str, set[int]]] = None, to_remove:
Optional[set[int]] = None) and inside each function initialize them to empty
dict/set when None is passed, then use those local containers; adjust any
internal calls (including where these methods are invoked without arguments) to
rely on the new None-default behavior so no shared mutable object is reused
across invocations.
- Around line 68-69: The graph is being topologically sorted after
RegionSearchBase.__init__ runs, so the precomputed tensor_users_map and
forward_reachable_nodes_map in RegionSearchBase are using stale node indices;
fix this by calling self.graph.toposort() before invoking
super().__init__(graph, root=None) so that RegionSearchBase builds its
node-indexed maps against the final node order (alternatively, if reordering
before super is not possible, explicitly rebuild the index maps immediately
after self.graph.toposort() by re-running the same logic that sets
tensor_users_map and forward_reachable_nodes_map).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 860d0b4 and a550ed7.

📒 Files selected for processing (3)
  • modelopt/onnx/quantization/autotune/__init__.py
  • modelopt/onnx/quantization/autotune/autotuner.py
  • modelopt/onnx/quantization/autotune/torch_region_builder.py

Comment on lines +101 to +104
if check_torch_naming_convention(self.graph):
    torch_search = TorchRegionBuilder(self.graph)
    self.regions = torch_search.build_regions(linearize=True, only_quantizable=True)
else:

⚠️ Potential issue | 🔴 Critical

Torch branch double-flattens regions and can produce duplicates.

Line 103 requests linearize=True, but Lines 114–120 still recursively flatten every entry. If the returned list already includes descendants, this duplicates regions and skews later profiling/ordering.
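The hazard is easy to reproduce with a toy region tree; the `Region` class below is an illustrative stand-in, not the real class. Flattening an already-linearized list counts every descendant again:

```python
class Region:  # minimal stand-in for the real Region type
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def linearize(region):
    """Return the region followed by all of its descendants."""
    out = [region]
    for child in region.children:
        out.extend(linearize(child))
    return out

root = Region("root", [Region("a", [Region("b")])])
linearized = linearize(root)  # [root, a, b] -- already flat
# Flattening a second time, as a recursive loop over the result would:
reflattened = [r for top in linearized for r in linearize(top)]
print(len(linearized), len(reflattened))  # 3 6 -- descendants duplicated
```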

Suggested fix
         if check_torch_naming_convention(self.graph):
             torch_search = TorchRegionBuilder(self.graph)
-            self.regions = torch_search.build_regions(linearize=True, only_quantizable=True)
+            self.regions = torch_search.build_regions(linearize=False, only_quantizable=True)
+            self._reassign_region_ids(self.regions)
         else:
             default_search = CombinedRegionSearch(
                 self.graph,
                 maximum_sequence_region_size=self.config.maximum_sequence_region_size,
                 minimum_topdown_search_size=self.config.minimum_topdown_search_size,

Also applies to: 114-120

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/autotuner.py` around lines 101 - 104, The
Torch branch double-flattens regions: when
check_torch_naming_convention(self.graph) is true you call
TorchRegionBuilder(self.graph).build_regions(linearize=True, ...) which already
returns linearized descendants, but the subsequent recursive flatten loop (the
code that iterates over self.regions and extends a flat list) flattens again and
creates duplicates; fix by detecting the linearize=True case and skipping the
extra recursive flatten (i.e., assign self.regions directly to the build_regions
result when linearize is requested) or, alternatively, guard the recursive
flatten with a check for already-linearized entries or deduplicate by region
identity; reference check_torch_naming_convention, TorchRegionBuilder,
build_regions and self.regions when applying the change.

Comment on lines +68 to +69
super().__init__(graph, root=None)
self.graph.toposort()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed (repository: NVIDIA/Model-Optimizer):

find . -type f -name "torch_region_builder.py" | head -5
find . -type f -name "*region*.py" | grep -E "(base|search)" | head -10
git ls-files modelopt/onnx/quantization/autotune/ | head -20
cat -n modelopt/onnx/quantization/autotune/torch_region_builder.py | head -100
wc -l modelopt/onnx/quantization/autotune/torch_region_builder.py
grep -n "class TorchRegionBuilder\|def __init__\|class RegionSearchBase\|toposort" modelopt/onnx/quantization/autotune/torch_region_builder.py | head -20
cat -n modelopt/onnx/quantization/autotune/region_search.py | head -150
grep -n "class RegionSearchBase\|def __init__\|tensor_users_map\|reachability" modelopt/onnx/quantization/autotune/region_search.py | head -30
grep -n "def get_tensor_consumer_node_indices" modelopt/onnx/quantization/autotune/*.py
grep -n "def _build_forward_reachable_nodes_map" modelopt/onnx/quantization/autotune/region_search.py
grep -rn "def get_tensor_consumer_node_indices" modelopt/
cat -n modelopt/onnx/quantization/graph_utils.py | grep -A 30 "def get_tensor_consumer_node_indices"
cat -n modelopt/onnx/quantization/autotune/region_search.py | sed -n '166,185p'
grep -A 20 "def _build_forward_reachable_nodes_map" modelopt/onnx/quantization/autotune/region_search.py | head -30
cat -n modelopt/onnx/quantization/autotune/torch_region_builder.py | sed -n '1,100p'
rg "torch\.load|numpy\.load|trust_remote_code|eval\(|exec\(|# nosec" modelopt/onnx/quantization/autotune/torch_region_builder.py
cat -n modelopt/onnx/quantization/autotune/torch_region_builder.py | sed -n '63,80p'


Node index maps must be built after graph topology is finalized.

RegionSearchBase.__init__() (line 68) pre-computes tensor_users_map and forward_reachable_nodes_map using node indices from the current graph order. Calling toposort() on line 69 reorders all nodes, leaving the precomputed indices pointing to wrong positions in the reordered graph.
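The same hazard can be shown with a plain list: an index map built before a reorder silently resolves to the wrong elements afterwards. The node names below are illustrative:

```python
nodes = ["mul", "add", "relu"]
index_of = {name: i for i, name in enumerate(nodes)}  # built before reordering

nodes.sort()  # reorder, analogous to graph.toposort()
print(nodes[index_of["mul"]])  # 'add' -- the stale index now points elsewhere

index_of = {name: i for i, name in enumerate(nodes)}  # rebuild after reordering
print(nodes[index_of["mul"]])  # 'mul'
```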

Fix
     def __init__(self, graph: gs.Graph):
         """Initialize the TorchRegionBuilder with a computation graph."""
+        graph.toposort()
         super().__init__(graph, root=None)
-        self.graph.toposort()
         self.regions: list[Region] = []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        super().__init__(graph, root=None)
-        self.graph.toposort()
+    def __init__(self, graph: gs.Graph):
+        """Initialize the TorchRegionBuilder with a computation graph."""
+        graph.toposort()
+        super().__init__(graph, root=None)
+        self.regions: list[Region] = []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 68
- 69, The graph is being topologically sorted after RegionSearchBase.__init__
runs, so the precomputed tensor_users_map and forward_reachable_nodes_map in
RegionSearchBase are using stale node indices; fix this by calling
self.graph.toposort() before invoking super().__init__(graph, root=None) so that
RegionSearchBase builds its node-indexed maps against the final node order
(alternatively, if reordering before super is not possible, explicitly rebuild
the index maps immediately after self.graph.toposort() by re-running the same
logic that sets tensor_users_map and forward_reachable_nodes_map).

Comment on lines +313 to +315
def _build_id_to_region_map(
    self, region: Region, id_to_region_map: dict[int, Region] = {}
) -> dict[int, Region]:

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n modelopt/onnx/quantization/autotune/torch_region_builder.py | sed -n '310,345p'



Fix mutable default arguments causing state leakage across function calls.

Three methods use mutable defaults ({} and set()) that are shared across invocations:

  • Line 314: _build_id_to_region_map with id_to_region_map: dict[int, Region] = {}
  • Line 323: _build_tensor_to_regions_map with tensor_to_regions_map: dict[str, set[int]] = {}
  • Line 335: _merge_neighboring_regions with to_remove: set[int] = set()

The issue manifests when lines 337–338 invoke the first two methods without arguments, causing the default mutable objects to be reused and accumulate state from repeated calls. This corrupts state across multiple builds/merges.
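This pitfall is standard Python behavior, independent of this codebase: a default argument value is evaluated once, at function definition time, and then shared by every call. A self-contained demonstration with illustrative helper names:

```python
def build_map_buggy(key, value, acc={}):  # default dict created once, shared forever
    acc[key] = value
    return acc

def build_map_fixed(key, value, acc=None):
    if acc is None:  # fresh container on every no-argument call
        acc = {}
    acc[key] = value
    return acc

first = build_map_buggy("a", 1)
second = build_map_buggy("b", 2)   # same dict object as `first`
print(first is second, second)     # True {'a': 1, 'b': 2}
print(build_map_fixed("a", 1))     # {'a': 1}
print(build_map_fixed("b", 2))     # {'b': 2} -- no state leaks between calls
```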

Suggested fix
-    def _build_id_to_region_map(
-        self, region: Region, id_to_region_map: dict[int, Region] = {}
-    ) -> dict[int, Region]:
+    def _build_id_to_region_map(
+        self, region: Region, id_to_region_map: dict[int, Region] | None = None
+    ) -> dict[int, Region]:
         """Build a map from region ids to regions."""
+        if id_to_region_map is None:
+            id_to_region_map = {}
         id_to_region_map[region.id] = region
         for child in region.get_children():
             self._build_id_to_region_map(child, id_to_region_map)
         return id_to_region_map

-    def _build_tensor_to_regions_map(
-        self, region: Region, tensor_to_regions_map: dict[str, set[int]] = {}
-    ) -> dict[str, set[int]]:
+    def _build_tensor_to_regions_map(
+        self, region: Region, tensor_to_regions_map: dict[str, set[int]] | None = None
+    ) -> dict[str, set[int]]:
         """Build a map from tensor names to regions."""
+        if tensor_to_regions_map is None:
+            tensor_to_regions_map = {}
         for input in region.inputs:
             if input not in tensor_to_regions_map:
                 tensor_to_regions_map[input] = set()
             tensor_to_regions_map[input].add(region.id)
@@
-    def _merge_neighboring_regions(self, region: Region, to_remove: set[int] = set()) -> None:
+    def _merge_neighboring_regions(self, region: Region, to_remove: set[int] | None = None) -> None:
+        if to_remove is None:
+            to_remove = set()
         self._compute_all_boundaries(region)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 313
- 315, The three methods _build_id_to_region_map, _build_tensor_to_regions_map,
and _merge_neighboring_regions use mutable default arguments ({} and set())
which leak state across calls; change their signatures to accept None (e.g.,
id_to_region_map: Optional[dict[int, Region]] = None, tensor_to_regions_map:
Optional[dict[str, set[int]]] = None, to_remove: Optional[set[int]] = None) and
inside each function initialize them to empty dict/set when None is passed, then
use those local containers; adjust any internal calls (including where these
methods are invoked without arguments) to rely on the new None-default behavior
so no shared mutable object is reused across invocations.

Comment on lines +547 to +549
for consumer_idx in consumer_indices:
    epilogue_ops.append(consumer_idx)
    self._probe_epilogues_recursive(consumer_idx, current_step + 1, max_steps, epilogue_ops)

⚠️ Potential issue | 🟠 Major

Epilogue probing adds consumers without fusibility/non-divergence checks.

Line 548 appends every consumer before validating it. This conflicts with the method contract (“fusible non-divergent epilogue nodes”) and can pollute regions with invalid epilogue ops.
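The validate-before-append pattern the review calls for can be sketched generically; the graph, predicate, and names below are illustrative stand-ins, not the real autotune API:

```python
def probe(graph, start, is_valid, max_steps, step=0, collected=None):
    """Collect consumers reachable from `start`, expanding only those
    that pass `is_valid` -- validation happens before append and recursion."""
    if collected is None:
        collected = []
    if step >= max_steps:
        return collected
    for consumer in graph.get(start, []):
        if not is_valid(consumer):  # validate first, then append and recurse
            continue
        collected.append(consumer)
        probe(graph, consumer, is_valid, max_steps, step + 1, collected)
    return collected

# Adjacency map: conv feeds relu (fusible) and shape (not fusible)
graph = {"conv": ["relu", "shape"], "relu": ["add"], "add": []}
fusible = {"relu", "add"}
print(probe(graph, "conv", fusible.__contains__, max_steps=3))  # ['relu', 'add']
```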

Suggested fix
         # For each consumer, check if it's fusible and non-divergent
         for consumer_idx in consumer_indices:
+            consumer_node = self.graph.nodes[consumer_idx]
+            if not self.is_fusible_node(consumer_node.op):
+                continue
+            if self._is_node_divergent(consumer_idx):
+                continue
             epilogue_ops.append(consumer_idx)
             self._probe_epilogues_recursive(consumer_idx, current_step + 1, max_steps, epilogue_ops)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 547
- 549, The loop in _probe_epilogues_recursive currently appends every
consumer_idx to epilogue_ops before validation; change it to validate each
consumer for fusibility and non-divergence first and only append + recurse when
it passes. Replace the direct epilogue_ops.append(consumer_idx) with a
conditional that calls the appropriate checks (e.g.,
self._is_fusible(consumer_idx) and self._is_non_divergent(consumer_idx) or a
combined self._is_fusible_non_divergent(consumer_idx)) and only then call
epilogue_ops.append(consumer_idx) followed by
self._probe_epilogues_recursive(consumer_idx, ...).

Comment on lines +622 to +627
def torch_node_ratio(self) -> float:
    """Count the number of nodes that are exported from PyTorch."""
    non_constant_nodes = [n for n in self.graph.nodes if n.op != "Constant"]
    slash_count = sum(1 for n in non_constant_nodes if n.name and n.name.startswith("/"))
    return slash_count / len(non_constant_nodes)


⚠️ Potential issue | 🟠 Major

torch_node_ratio() can divide by zero.

If the graph has zero non-Constant nodes, Line 626 raises ZeroDivisionError.
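The guard is straightforward to exercise in isolation. The `Node` stand-in below mimics only the two attributes the method reads, so this is a sketch of the guarded computation rather than the real class:

```python
from dataclasses import dataclass

@dataclass
class Node:  # stand-in exposing only the attributes torch_node_ratio reads
    op: str
    name: str

def torch_node_ratio(nodes):
    non_constant = [n for n in nodes if n.op != "Constant"]
    if not non_constant:  # constants-only or empty graph: define the ratio as 0.0
        return 0.0
    slash = sum(1 for n in non_constant if n.name and n.name.startswith("/"))
    return slash / len(non_constant)

print(torch_node_ratio([]))                                # 0.0, no ZeroDivisionError
print(torch_node_ratio([Node("Constant", "Constant_0")]))  # 0.0
print(torch_node_ratio([Node("MatMul", "/blk.0/MatMul")])) # 1.0
```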

Suggested fix
     def torch_node_ratio(self) -> float:
         """Count the number of nodes that are exported from PyTorch."""
         non_constant_nodes = [n for n in self.graph.nodes if n.op != "Constant"]
+        if not non_constant_nodes:
+            return 0.0
         slash_count = sum(1 for n in non_constant_nodes if n.name and n.name.startswith("/"))
         return slash_count / len(non_constant_nodes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 622
- 627, torch_node_ratio can raise ZeroDivisionError when there are no
non-Constant nodes; update torch_node_ratio to guard against an empty
non_constant_nodes list by checking its length and returning 0.0 (or another
safe default) when it's zero, otherwise compute slash_count /
len(non_constant_nodes). Modify the method in torch_region_builder.py (function
torch_node_ratio, using self.graph.nodes and non_constant_nodes) to perform this
early-return check before the division.

Comment on lines +698 to +700
if only_quantizable:
    self._filter_out_non_quantizable_nodes(root_region)
logger.info(

⚠️ Potential issue | 🟠 Major

Region boundaries are stale after quantizable-node filtering.

_filter_out_non_quantizable_nodes mutates region membership, but boundaries are computed earlier (Line 696) and not recomputed afterward. Downstream insertion logic can read outdated inputs/outputs.

Suggested fix
             self._sort_regions(root_region)
             if only_quantizable:
                 self._filter_out_non_quantizable_nodes(root_region)
+                self._compute_all_boundaries(root_region)

Suggested change
-            if only_quantizable:
-                self._filter_out_non_quantizable_nodes(root_region)
-            logger.info(
+            self._sort_regions(root_region)
+            if only_quantizable:
+                self._filter_out_non_quantizable_nodes(root_region)
+                self._compute_all_boundaries(root_region)
+            logger.info(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` around lines 698
- 700, After calling self._filter_out_non_quantizable_nodes(root_region), the
region membership has changed but the region boundary information
(inputs/outputs) computed earlier is stale; fix by recomputing boundaries for
root_region and its child regions immediately after the filter step—either call
the same helper used at the earlier boundary computation or iterate the affected
regions and recalculate their inputs and outputs so downstream insertion logic
reads up-to-date boundaries (update region.inputs and region.outputs for
root_region and nested regions).

    Returns:
        List of discovered regions with hierarchical structure
    """
    only_quantizable = True

⚠️ Potential issue | 🟠 Major

only_quantizable argument is overridden, so caller/CLI intent is ignored.

Line 756 hardcodes only_quantizable = True, which makes the --only-quantizable flag and function parameter ineffective.

Suggested fix
-    only_quantizable = True
     logger.info(f"Loading model: {onnx_path}")
@@
-    regions = builder.build_regions(only_quantizable=only_quantizable)
+    regions = builder.build_regions(only_quantizable=only_quantizable)

Also applies to: 770-770

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/torch_region_builder.py` at line 756, The
code in torch_region_builder.py is forcibly overriding the only_quantizable flag
(setting only_quantizable = True), which ignores the caller/CLI value; update
the places that set only_quantizable (both occurrences) to respect the incoming
parameter/argument (e.g., remove the hardcoded assignment and use the function
parameter or pass-through value) so the --only-quantizable option and the
function argument control behavior instead of being overridden; ensure any
default remains set in the function signature/CLI parsing and not by assigning
True inside functions like the region-building routine that references
only_quantizable.
