
fix: layerwise calibration backward-compat, recipe split, batch-size guard#1310

Open
realAsma wants to merge 2 commits into main from asma/fix_bwd_comaptibility

Conversation


@realAsma realAsma commented Apr 21, 2026

Summary

Follow-up to #1251 (which renamed use_sequential to layerwise). Three related fixes are bundled:

  1. Backward-compatible config loading. PTQ checkpoints saved before #1251 (Add
    layerwise calibration for large models) store the legacy use_sequential key in
    the calibration-algorithm config, so loading them now raises
    ValidationError: Extra inputs are not permitted (use_sequential) because
    QuantizeAlgorithmConfig uses extra='forbid'. Accept use_sequential as an alias
    for layerwise via AliasChoices. The field still serializes as layerwise, so
    round-trips through the current schema are clean.

  2. Recipe split. nvfp4_experts_only-fp8_kv previously enabled layerwise calibration
    by default, which changes the calibration flow materially. Split into two recipes:

    • nvfp4_experts_only-fp8_kv.yaml — default (no layerwise)
    • nvfp4_experts_only-fp8_kv_layerwise.yaml — layerwise variant
  3. hf_ptq batch-size guard. Auto batch-size detection is not supported together
    with layerwise calibration. Default to batch_size=1 when layerwise is enabled and
    the user hasn't set a batch size explicitly.
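The alias approach in fix 1 can be sketched with pydantic v2. MaxCalibConfig below is a minimal stand-in for the real config class, not the actual modelopt definition:

```python
from pydantic import AliasChoices, BaseModel, ConfigDict, Field


class MaxCalibConfig(BaseModel):
    """Minimal stand-in for the real calibration-algorithm config."""

    model_config = ConfigDict(extra="forbid")

    # Accept the legacy key "use_sequential" as a validation alias; the
    # field itself (and model_dump output) keeps the canonical name.
    layerwise: bool = Field(
        default=False,
        validation_alias=AliasChoices("layerwise", "use_sequential"),
    )


# Legacy checkpoints load without tripping extra='forbid'...
legacy = MaxCalibConfig.model_validate({"use_sequential": True})
assert legacy.layerwise is True

# ...and round-trip cleanly under the current schema.
assert legacy.model_dump() == {"layerwise": True}
```

Note that extra='forbid' still rejects genuinely unknown keys; the alias only whitelists the one legacy spelling.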

Originally reported by Jenny Chen while resuming a PTQ checkpoint via
restore_sharded_modelopt_state:

pydantic_core._pydantic_core.ValidationError: 1 validation error for MaxCalibConfig
use_sequential
  Extra inputs are not permitted [type=extra_forbidden, input_value=False, input_type=bool]

Test plan

  • tests/unit/torch/quantization/test_config_validation.py — legacy alias accepted, current name accepted, dump serializes under current name, extra='forbid' still rejects unknown keys.
  • pre-commit run — clean.


Summary by CodeRabbit

  • New Features

    • New PTQ recipe for NVFP4 experts-only with layerwise calibration and FP8 KV cache quantization.
  • Improvements

    • Added support for legacy field naming (use_sequential) in quantization configurations.
    • Enhanced automatic batch size handling for layerwise quantization recipes.

@realAsma realAsma requested a review from a team as a code owner April 21, 2026 21:24
@realAsma realAsma requested a review from kinjalpatel27 April 21, 2026 21:24

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

Walkthrough

Adds a Pydantic validation alias so legacy use_sequential maps to layerwise, extends tests to validate alias behavior and serialization, preloads PTQ recipes and forces batch-size=1 when detecting layerwise calibration during probing, and updates/introduces NVFP4 PTQ recipes reflecting layerwise changes.

Changes

Cohort / File(s) Summary
Config declaration
modelopt/torch/quantization/config.py
Added AliasChoices-based validation so use_sequential is accepted as an input alias for the QuantizeAlgorithmConfig.layerwise boolean field. No other calibration validation logic changed.
Tests
tests/unit/torch/quantization/test_config_validation.py
Added tests to verify use_sequential is accepted as legacy input, maps to layerwise correctly, serialization emits canonical layerwise, and unknown fields still raise ValidationError.
Example PTQ runner
examples/llm_ptq/hf_ptq.py
Moves recipe loading earlier (when provided), validates recipe type once, inspects quantize.algorithm for layerwise (including nested/list forms), forces args.batch_size = 1 during probing for layerwise recipes, and reuses preloaded recipe for mono-quantization flow.
Recipes
modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml, modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml
Removed explicit layerwise: true from one recipe and added a new layerwise-focused NVFP4 PTQ recipe with targeted enable/disable quantizer settings for expert and KV paths and updated metadata wording.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 40.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (5 passed)
  • Description Check: skipped; CodeRabbit's high-level summary is enabled.
  • Title check: the title covers all three fixes: backward compatibility for legacy use_sequential, the recipe split, and the batch-size guard for layerwise calibration.
  • Linked Issues check: skipped; no linked issues were found for this pull request.
  • Out of Scope Changes check: skipped; no linked issues were found for this pull request.
  • Security Anti-Patterns: no anti-patterns detected; changes are a Pydantic validation alias, unit tests, batch-size guard logic, and YAML recipe files.




github-actions Bot commented Apr 21, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1310/

Built to branch gh-pages at 2026-04-24 22:12 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.58%. Comparing base (946639a) to head (f78ac50).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1310      +/-   ##
==========================================
- Coverage   75.69%   75.58%   -0.11%     
==========================================
  Files         467      471       +4     
  Lines       50334    51052     +718     
==========================================
+ Hits        38099    38590     +491     
- Misses      12235    12462     +227     
Flag | Coverage | Δ
examples | 41.60% <100.00%> | (+0.92%) ⬆️
gpu | 58.35% <100.00%> | (-0.50%) ⬇️
regression | 14.77% <100.00%> | (+0.07%) ⬆️
unit | 52.72% <100.00%> | (+<0.01%) ⬆️

Flags with carried-forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1246-1250: The migration method _migrate_use_sequential currently
treats explicit layerwise=False as unset by checking "not self.layerwise", which
causes use_sequential to override an explicitly provided False; change the check
to detect whether the field was explicitly set by using '"layerwise" not in
self.model_fields_set' so you only copy use_sequential into layerwise when
layerwise was not provided by the caller, preserving explicit layerwise=False
values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 976f04a8-a415-4ce0-9f53-7ef5c5686d8a

📥 Commits

Reviewing files that changed from the base of the PR and between b4c6a03 and 3e9db5c.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/config.py
  • tests/unit/torch/quantization/test_config_validation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/torch/quantization/test_config_validation.py

@realAsma realAsma force-pushed the asma/fix_bwd_comaptibility branch from 3e9db5c to ad91b29 Compare April 24, 2026 20:09
@realAsma realAsma requested review from a team as code owners April 24, 2026 20:09
@realAsma realAsma requested a review from shengliangxu April 24, 2026 20:09
@realAsma realAsma changed the title fix: load PTQ checkpoints from before use_sequential→layerwise rename fix: layerwise calibration backward-compat, recipe split, batch-size guard Apr 24, 2026
@realAsma realAsma requested a review from jenchen13 April 24, 2026 20:12

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 967-968: The current is_layerwise check only considers dict
recipes and misses algorithms expressed as lists (QuantizeAlgoCfgType); update
the detection for recipe_algorithm (from
recipe.quantize.model_dump().get("algorithm")) so it treats both dicts and
lists: if it's a dict keep the existing .get("layerwise", False) check, and if
it's a list, scan the list for any dict item where item.get("layerwise", False)
is True (set is_layerwise True if any match). Ensure you reference
recipe_algorithm and is_layerwise when making this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f0df803d-ca1c-4640-90ed-c3c5230f9faf

📥 Commits

Reviewing files that changed from the base of the PR and between 3e9db5c and ad91b29.

📒 Files selected for processing (5)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/config.py
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml
  • tests/unit/torch/quantization/test_config_validation.py
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/torch/quantization/test_config_validation.py

Comment thread modelopt/torch/quantization/config.py
Quoted lines from examples/llm_ptq/hf_ptq.py:

    recipe_algorithm = recipe.quantize.model_dump().get("algorithm") if recipe else None
    is_layerwise = isinstance(recipe_algorithm, dict) and recipe_algorithm.get("layerwise", False)

@jenchen13 jenchen13 Apr 24, 2026


You can check layerwise before converting the recipe to a dict, while it is still a pydantic config, via recipe.algorithm.layerwise.

Collaborator


Agreed, we do not need model_dump unless we have to.

Some places call model_dump only to adapt to the existing codebase.

Contributor Author


Addressed in f78ac50: the helper now reads recipe.quantize.algorithm directly as a pydantic config (via a ModelOptPTQRecipe branch that recurses into it), with no .model_dump() round-trip. See examples/llm_ptq/hf_ptq.py:967-974.


@cjluo-nv cjluo-nv left a comment


Overall a clean, well-motivated PR with good tests. Two issues worth addressing:

  1. Copyright year: The new nvfp4_experts_only-fp8_kv_layerwise.yaml uses Copyright (c) 2024 but the project's LICENSE_HEADER says 2026. Should use the current year.

  2. Bare assert for runtime validation: The recipe type check in hf_ptq.py uses assert isinstance(recipe, ModelOptPTQRecipe). Since assert is stripped under -O, this should be a proper ValueError/TypeError raise.

@@ -0,0 +1,94 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Collaborator


The project's LICENSE_HEADER specifies Copyright (c) 2026. This new file has Copyright (c) 2024. Please update to match the canonical header.

Contributor Author


Addressed in f78ac50: bumped the header year to 2026 to match LICENSE_HEADER.

…guard

- config: accept legacy `use_sequential` via AliasChoices on `layerwise`
  so pre-#1251 PTQ checkpoints load; still serializes as `layerwise`
- recipes: split nvfp4_experts_only-fp8_kv into default (no layerwise) and
  _layerwise variants
- hf_ptq: auto batch-size detection not supported with layerwise; default
  to batch_size=1 in that case
- tests: cover alias accept, current-name accept, dump under current name,
  and extra='forbid' still rejecting unknowns

Signed-off-by: realAsma <akuriparambi@nvidia.com>
…validation

- examples/llm_ptq/hf_ptq.py: replace dict-inspection layerwise detection
  with a small recursive helper accepting ModelOptPTQRecipe directly,
  handling list-form QuantizeAlgoCfgType (per coderabbitai, jenchen13).
- examples/llm_ptq/hf_ptq.py: convert recipe-type assert to explicit
  if/raise TypeError so validation is not stripped under python -O
  (per cjluo-nv).
- modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml:
  bump new-file copyright header to 2026 per LICENSE_HEADER (per cjluo-nv).

Signed-off-by: realAsma <akuriparambi@nvidia.com>
@realAsma realAsma force-pushed the asma/fix_bwd_comaptibility branch from ad91b29 to f78ac50 Compare April 24, 2026 22:08
@realAsma realAsma requested review from cjluo-nv and jenchen13 April 24, 2026 22:13
@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 968-973: The helper _is_layerwise currently treats dict-form
algorithms as non-layerwise because getattr(obj, "layerwise", False) returns
False for dicts; update _is_layerwise to explicitly handle dicts by checking if
obj is a dict and returning True when obj.get("layerwise") is truthy or when any
of its values (or nested algorithm entries) are layerwise (i.e., recurse into
dict values similar to list handling). Keep the existing branches for
ModelOptPTQRecipe and list, and ensure the final fallback checks dicts before
using getattr to avoid misclassifying dict algorithms and bypassing the
layerwise guard.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7a4ac195-3ed7-4078-ba4d-fadce20d6a46

📥 Commits

Reviewing files that changed from the base of the PR and between ad91b29 and f78ac50.

📒 Files selected for processing (5)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/quantization/config.py
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml
  • tests/unit/torch/quantization/test_config_validation.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv_layerwise.yaml

Comment on lines +968 to +973
def _is_layerwise(obj):
    if isinstance(obj, ModelOptPTQRecipe):
        return _is_layerwise(obj.quantize.algorithm)
    if isinstance(obj, list):
        return any(_is_layerwise(a) for a in obj)
    return bool(getattr(obj, "layerwise", False))

⚠️ Potential issue | 🟠 Major

Handle dict-form algorithms in _is_layerwise.

At Line 973, getattr(obj, "layerwise", False) makes dict algorithms evaluate as non-layerwise. That can bypass the Line 990-994 guard and fall back to full-model batch probing.

Suggested fix
     def _is_layerwise(obj):
         if isinstance(obj, ModelOptPTQRecipe):
             return _is_layerwise(obj.quantize.algorithm)
+        if isinstance(obj, dict):
+            if "layerwise" in obj:
+                return bool(obj["layerwise"])
+            if "algorithm" in obj:
+                return _is_layerwise(obj["algorithm"])
+            return False
         if isinstance(obj, list):
             return any(_is_layerwise(a) for a in obj)
         return bool(getattr(obj, "layerwise", False))
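As a standalone illustration of the suggested dict-aware version, the helper could look like this. PTQRecipe is a hypothetical stand-in for ModelOptPTQRecipe (flattened to a single .algorithm attribute) so the sketch runs without the library:

```python
from dataclasses import dataclass


@dataclass
class PTQRecipe:
    """Hypothetical stand-in for ModelOptPTQRecipe."""

    algorithm: object = None


def is_layerwise(obj) -> bool:
    # Recurse through recipe objects, dicts, and lists; fall back to an
    # attribute check for pydantic-style algorithm configs.
    if isinstance(obj, PTQRecipe):
        return is_layerwise(obj.algorithm)
    if isinstance(obj, dict):
        if "layerwise" in obj:
            return bool(obj["layerwise"])
        if "algorithm" in obj:
            return is_layerwise(obj["algorithm"])
        return False
    if isinstance(obj, list):
        return any(is_layerwise(a) for a in obj)
    return bool(getattr(obj, "layerwise", False))


assert is_layerwise({"layerwise": True})
assert is_layerwise([{"method": "max"}, {"layerwise": True}])  # list form
assert not is_layerwise({"method": "max"})  # dict without the flag
assert is_layerwise(PTQRecipe(algorithm=[{"layerwise": True}]))
```

Checking dicts before the getattr fallback is the key point: getattr on a dict never sees its keys, which is exactly the misclassification the review flags.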

@shengliangxu shengliangxu left a comment


LGTM

@@ -0,0 +1,94 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Okay for now. Can you compose this yaml based on the modelopt_recipes/general/ptq/nvfp4_experts_only-fp8_kv.yaml once the composable recipes PR is merged?
