
fix: layerwise calibration backward-compat, recipe split, batch-size guard#1310

Open
realAsma wants to merge 2 commits into main from asma/fix_bwd_comaptibility

Conversation


@realAsma realAsma commented Apr 21, 2026

Summary

Follow-up to #1251 (which renamed use_sequential to layerwise). Three related fixes are bundled:

  1. Backward-compatible config loading. PTQ checkpoints saved before #1251 (Add
    layerwise calibration for large models) store the legacy use_sequential key in
    the calibration-algorithm config, so loading them now raises
    ValidationError: Extra inputs are not permitted (use_sequential) because
    QuantizeAlgorithmConfig uses extra='forbid'. Accept use_sequential as an alias
    for layerwise via AliasChoices. The field still serializes as layerwise, so
    round-trips through the current schema are clean.

  2. Recipe split. nvfp4_experts_only-fp8_kv previously enabled layerwise calibration
    by default, which changes the calibration flow materially. Split into two recipes:

    • nvfp4_experts_only-fp8_kv.yaml — default (no layerwise)
    • nvfp4_experts_only-fp8_kv_layerwise.yaml — layerwise variant
  3. hf_ptq batch-size guard. Auto batch-size detection is not supported together
    with layerwise calibration. Default to batch_size=1 when layerwise is enabled and
    the user hasn't set a batch size explicitly.
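The alias approach in fix 1 can be sketched with pydantic v2. MaxCalibConfig below is a minimal stand-in for the real config class, not the actual modelopt definition:

```python
from pydantic import AliasChoices, BaseModel, ConfigDict, Field


class MaxCalibConfig(BaseModel):
    """Minimal stand-in for the real calibration-algorithm config."""

    model_config = ConfigDict(extra="forbid")

    # Accept the legacy key "use_sequential" as a validation alias; the
    # field itself (and model_dump output) keeps the canonical name.
    layerwise: bool = Field(
        default=False,
        validation_alias=AliasChoices("layerwise", "use_sequential"),
    )


# Legacy checkpoints load without tripping extra='forbid'...
legacy = MaxCalibConfig.model_validate({"use_sequential": True})
assert legacy.layerwise is True

# ...and round-trip cleanly under the current schema.
assert legacy.model_dump() == {"layerwise": True}
```

Note that extra='forbid' still rejects genuinely unknown keys; the alias only whitelists the one legacy spelling.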

Originally reported by Jenny Chen while resuming a PTQ checkpoint via
restore_sharded_modelopt_state:

pydantic_core._pydantic_core.ValidationError: 1 validation error for MaxCalibConfig
use_sequential
  Extra inputs are not permitted [type=extra_forbidden, input_value=False, input_type=bool]

Test plan

  • tests/unit/torch/quantization/test_config_validation.py — legacy alias accepted, current name accepted, dump serializes under current name, extra='forbid' still rejects unknown keys.
  • pre-commit run — clean.


Summary by CodeRabbit

  • New Features

    • New PTQ recipe for NVFP4 experts-only with layerwise calibration and FP8 KV cache quantization.
  • Improvements

    • Added support for legacy field naming (use_sequential) in quantization configurations.
    • Enhanced automatic batch size handling for layerwise quantization recipes.

@realAsma realAsma requested a review from a team as a code owner April 21, 2026 21:24
@realAsma realAsma requested a review from kinjalpatel27 April 21, 2026 21:24

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

Walkthrough

Adds a Pydantic validation alias so legacy use_sequential maps to layerwise, extends tests to validate alias behavior and serialization, preloads PTQ recipes and forces batch-size=1 when detecting layerwise calibration during probing, and updates/introduces NVFP4 PTQ recipes reflecting layerwise changes.

Changes

Cohort / File(s) Summary
Config declaration
modelopt/torch/quantization/config.py
Added AliasChoices-based validation so use_sequential is accepted as an input alias for the QuantizeAlgorithmConfig.layerwise boolean field. No other calibration validation logic changed.
Tests
tests/unit/torch/quantization/test_config_validation.py
Added tests to verify use_sequential is accepted as legacy input, maps to layerwise correctly, serialization emits canonical layerwise, and unknown fields still raise ValidationError.
Example PTQ runner
examples/llm_ptq/hf_ptq.py
Moves recipe loading earlier (when provided), validates recipe type once, inspects quantize.algorithm for layerwise (including nested/list forms), forces args.batch_size = 1 during probing for layerwise recipes, and reuses preloaded recipe for mono-quantization flow.
Recipes
modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml, modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml
Removed explicit layerwise: true from one recipe and added a new layerwise-focused NVFP4 PTQ recipe with targeted enable/disable quantizer settings for expert and KV paths and updated metadata wording.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 40.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (5 passed)
  • Description Check: skipped; CodeRabbit's high-level summary is enabled.
  • Title check: the title covers all three fixes: backward compatibility for legacy use_sequential, the recipe split, and the batch-size guard for layerwise calibration.
  • Linked Issues check: skipped; no linked issues were found for this pull request.
  • Out of Scope Changes check: skipped; no linked issues were found for this pull request.
  • Security Anti-Patterns: no anti-patterns detected; changes are a Pydantic validation alias, unit tests, batch-size guard logic, and YAML recipe files.




github-actions Bot commented Apr 21, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1310/

Built to branch gh-pages at 2026-04-24 22:12 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.58%. Comparing base (946639a) to head (f78ac50).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1310      +/-   ##
==========================================
- Coverage   75.69%   75.58%   -0.11%     
==========================================
  Files         467      471       +4     
  Lines       50334    51052     +718     
==========================================
+ Hits        38099    38590     +491     
- Misses      12235    12462     +227     
Flag | Coverage | Δ
examples | 41.60% <100.00%> | (+0.92%) ⬆️
gpu | 58.35% <100.00%> | (-0.50%) ⬇️
regression | 14.77% <100.00%> | (+0.07%) ⬆️
unit | 52.72% <100.00%> | (+<0.01%) ⬆️

Flags with carried-forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1246-1250: The migration method _migrate_use_sequential currently
treats explicit layerwise=False as unset by checking "not self.layerwise", which
causes use_sequential to override an explicitly provided False; change the check
to detect whether the field was explicitly set by using '"layerwise" not in
self.model_fields_set' so you only copy use_sequential into layerwise when
layerwise was not provided by the caller, preserving explicit layerwise=False
values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 976f04a8-a415-4ce0-9f53-7ef5c5686d8a

📥 Commits

Reviewing files that changed from the base of the PR and between b4c6a03 and 3e9db5c.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/config.py
  • tests/unit/torch/quantization/test_config_validation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/torch/quantization/test_config_validation.py

@realAsma realAsma force-pushed the asma/fix_bwd_comaptibility branch from 3e9db5c to ad91b29 Compare April 24, 2026 20:09
@realAsma realAsma requested review from a team as code owners April 24, 2026 20:09
@realAsma realAsma requested a review from shengliangxu April 24, 2026 20:09
@realAsma realAsma changed the title fix: load PTQ checkpoints from before use_sequential→layerwise rename fix: layerwise calibration backward-compat, recipe split, batch-size guard Apr 24, 2026
@realAsma realAsma requested a review from jenchen13 April 24, 2026 20:12

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 967-968: The current is_layerwise check only considers dict
recipes and misses algorithms expressed as lists (QuantizeAlgoCfgType); update
the detection for recipe_algorithm (from
recipe.quantize.model_dump().get("algorithm")) so it treats both dicts and
lists: if it's a dict keep the existing .get("layerwise", False) check, and if
it's a list, scan the list for any dict item where item.get("layerwise", False)
is True (set is_layerwise True if any match). Ensure you reference
recipe_algorithm and is_layerwise when making this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f0df803d-ca1c-4640-90ed-c3c5230f9faf

📥 Commits

Reviewing files that changed from the base of the PR and between 3e9db5c and ad91b29.

📒 Files selected for processing (5)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/config.py
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml
  • tests/unit/torch/quantization/test_config_validation.py
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/torch/quantization/test_config_validation.py

Comment thread modelopt/torch/quantization/config.py
Quoted lines from examples/llm_ptq/hf_ptq.py:

    recipe_algorithm = recipe.quantize.model_dump().get("algorithm") if recipe else None
    is_layerwise = isinstance(recipe_algorithm, dict) and recipe_algorithm.get("layerwise", False)

@jenchen13 jenchen13 Apr 24, 2026


You can check layerwise before converting the recipe to a dict, while it is still a pydantic config, via recipe.algorithm.layerwise.

Collaborator


Agreed, we do not need model_dump unless we have to.

Some places call model_dump only to adapt to the existing codebase.

Contributor Author


Addressed in f78ac50: the helper now reads recipe.quantize.algorithm directly as a pydantic config (via a ModelOptPTQRecipe branch that recurses into it), with no .model_dump() round-trip. See examples/llm_ptq/hf_ptq.py:967-974.


@cjluo-nv cjluo-nv left a comment


Overall a clean, well-motivated PR with good tests. Two issues worth addressing:

  1. Copyright year: The new nvfp4_experts_only-fp8_kv_layerwise.yaml uses Copyright (c) 2024 but the project's LICENSE_HEADER says 2026. Should use the current year.

  2. Bare assert for runtime validation: The recipe type check in hf_ptq.py uses assert isinstance(recipe, ModelOptPTQRecipe). Since assert is stripped under -O, this should be a proper ValueError/TypeError raise.

@@ -0,0 +1,94 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Collaborator


The project's LICENSE_HEADER specifies Copyright (c) 2026. This new file has Copyright (c) 2024. Please update to match the canonical header.

Contributor Author


Addressed in f78ac50: bumped the header year to 2026 to match LICENSE_HEADER.

…guard

- config: accept legacy `use_sequential` via AliasChoices on `layerwise`
  so pre-#1251 PTQ checkpoints load; still serializes as `layerwise`
- recipes: split nvfp4_experts_only-fp8_kv into default (no layerwise) and
  _layerwise variants
- hf_ptq: auto batch-size detection not supported with layerwise; default
  to batch_size=1 in that case
- tests: cover alias accept, current-name accept, dump under current name,
  and extra='forbid' still rejecting unknowns

Signed-off-by: realAsma <akuriparambi@nvidia.com>
…validation

- examples/llm_ptq/hf_ptq.py: replace dict-inspection layerwise detection
  with a small recursive helper accepting ModelOptPTQRecipe directly,
  handling list-form QuantizeAlgoCfgType (per coderabbitai, jenchen13).
- examples/llm_ptq/hf_ptq.py: convert recipe-type assert to explicit
  if/raise TypeError so validation is not stripped under python -O
  (per cjluo-nv).
- modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml:
  bump new-file copyright header to 2026 per LICENSE_HEADER (per cjluo-nv).

Signed-off-by: realAsma <akuriparambi@nvidia.com>
@realAsma realAsma force-pushed the asma/fix_bwd_comaptibility branch from ad91b29 to f78ac50 Compare April 24, 2026 22:08
@realAsma realAsma requested review from cjluo-nv and jenchen13 April 24, 2026 22:13
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 968-973: The helper _is_layerwise currently treats dict-form
algorithms as non-layerwise because getattr(obj, "layerwise", False) returns
False for dicts; update _is_layerwise to explicitly handle dicts by checking if
obj is a dict and returning True when obj.get("layerwise") is truthy or when any
of its values (or nested algorithm entries) are layerwise (i.e., recurse into
dict values similar to list handling). Keep the existing branches for
ModelOptPTQRecipe and list, and ensure the final fallback checks dicts before
using getattr to avoid misclassifying dict algorithms and bypassing the
layerwise guard.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7a4ac195-3ed7-4078-ba4d-fadce20d6a46

📥 Commits

Reviewing files that changed from the base of the PR and between ad91b29 and f78ac50.

📒 Files selected for processing (5)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/config.py
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml
  • tests/unit/torch/quantization/test_config_validation.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml

Comment on lines +968 to +973
def _is_layerwise(obj):
    if isinstance(obj, ModelOptPTQRecipe):
        return _is_layerwise(obj.quantize.algorithm)
    if isinstance(obj, list):
        return any(_is_layerwise(a) for a in obj)
    return bool(getattr(obj, "layerwise", False))

⚠️ Potential issue | 🟠 Major

Handle dict-form algorithms in _is_layerwise.

At Line 973, getattr(obj, "layerwise", False) makes dict algorithms evaluate as non-layerwise. That can bypass the Line 990-994 guard and fall back to full-model batch probing.

Suggested fix
     def _is_layerwise(obj):
         if isinstance(obj, ModelOptPTQRecipe):
             return _is_layerwise(obj.quantize.algorithm)
+        if isinstance(obj, dict):
+            if "layerwise" in obj:
+                return bool(obj["layerwise"])
+            if "algorithm" in obj:
+                return _is_layerwise(obj["algorithm"])
+            return False
         if isinstance(obj, list):
             return any(_is_layerwise(a) for a in obj)
         return bool(getattr(obj, "layerwise", False))
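As a standalone illustration of the suggested dict-aware version, the helper could look like this. PTQRecipe is a hypothetical stand-in for ModelOptPTQRecipe (flattened to a single .algorithm attribute) so the sketch runs without the library:

```python
from dataclasses import dataclass


@dataclass
class PTQRecipe:
    """Hypothetical stand-in for ModelOptPTQRecipe."""

    algorithm: object = None


def is_layerwise(obj) -> bool:
    # Recurse through recipe objects, dicts, and lists; fall back to an
    # attribute check for pydantic-style algorithm configs.
    if isinstance(obj, PTQRecipe):
        return is_layerwise(obj.algorithm)
    if isinstance(obj, dict):
        if "layerwise" in obj:
            return bool(obj["layerwise"])
        if "algorithm" in obj:
            return is_layerwise(obj["algorithm"])
        return False
    if isinstance(obj, list):
        return any(is_layerwise(a) for a in obj)
    return bool(getattr(obj, "layerwise", False))


assert is_layerwise({"layerwise": True})
assert is_layerwise([{"method": "max"}, {"layerwise": True}])  # list form
assert not is_layerwise({"method": "max"})  # dict without the flag
assert is_layerwise(PTQRecipe(algorithm=[{"layerwise": True}]))
```

Checking dicts before the getattr fallback is the key point: getattr on a dict never sees its keys, which is exactly the misclassification the review flags.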

@shengliangxu shengliangxu left a comment


LGTM

@@ -0,0 +1,94 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Okay for now. Can you compose this yaml based on the modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml once the composable recipes PR is merged?
