
[minor] fixes for layerwise calib + MSE#1344

Open
Fridah-nv wants to merge 1 commit into main from fridah/layerwise-mse

Conversation

@Fridah-nv (Contributor)

@Fridah-nv Fridah-nv commented Apr 24, 2026

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved deterministic checkpoint directory resolution for quantization workflows
    • Enhanced handling of resume operations for already-calibrated model runs
    • Fixed cache state management during model calibration to prevent stale cached data

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv self-assigned this Apr 24, 2026
@Fridah-nv Fridah-nv requested review from a team as code owners April 24, 2026 23:25
@coderabbitai (Contributor)

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

Two functions were adjusted to improve checkpoint handling: resolve_checkpoint_dir now derives its hash from only the quantization algorithm (serialized as deterministic JSON) rather than the full config, and layerwise_calibrate handles already-calibrated runs by skipping resume setup and consistently clearing past key values during replay loops.

Changes

  • Checkpoint hash derivation (examples/llm_ptq/example_utils.py):
    Hash computation now uses only the algorithm portion of the quantization config, with deterministic JSON serialization and a scalar-type restriction; the full quant_cfg and the layerwise_checkpoint_dir field are excluded.
  • Calibration control flow (modelopt/torch/quantization/model_calib.py):
    Checkpoint resume setup is now skipped when start_layer == num_layers, and the past_key_values cache is consistently cleared during replay loops by conditionally resetting it and forcing it to None.
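The deterministic hash derivation described above can be sketched as follows. This is a minimal illustration rather than the actual example_utils.py code: the resolve_checkpoint_hash name, the "algorithm" key, and the config layout are assumptions.

```python
import hashlib
import json


def resolve_checkpoint_hash(quant_cfg: dict) -> str:
    """Derive a stable hash from only the algorithm portion of a quant config.

    Restricting to scalar-typed values and serializing with sorted keys makes
    the hash deterministic across runs and insensitive to unrelated config
    fields (e.g. a layerwise_checkpoint_dir path or non-serializable objects).
    """
    algorithm = quant_cfg.get("algorithm", {})
    # Keep only scalar-typed entries so non-serializable values cannot break
    # or destabilize the hash.
    scalars = {
        k: v
        for k, v in algorithm.items()
        if isinstance(v, (str, int, float, bool, type(None)))
    }
    payload = json.dumps(scalars, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

With sorted keys, the same algorithm settings produce the same checkpoint directory regardless of dict insertion order or extra config fields.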

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check ❓ Inconclusive: The title '[minor] fixes for layerwise calib + MSE' aligns with the changes to layerwise_calibrate control flow and checkpoint handling, but is vague about the specific fixes. Resolution: consider a more specific title such as 'Fix layerwise calibration cache handling and checkpoint determinism'.

✅ Passed checks (4 passed)

  • Description Check ✅: Check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues check ✅: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅: Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns ✅: The pull request does not introduce critical security anti-patterns from SECURITY.md, including unsafe deserialization, eval/exec usage, or insecure trust_remote_code defaults.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions (Contributor)

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1344/

Built to branch gh-pages at 2026-04-24 23:29 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
modelopt/torch/quantization/model_calib.py (2)

1658-1669: cache.reset() may be redundant after forcing past_key_values=None; add justification/defensive handling.

In the replay loop you:

  • copy kwargs_input
  • call cache.reset() (if present)
  • then set kwargs_input["past_key_values"] = None

Since the cache object is not passed into the layer after setting to None, reset() is only useful if you’re relying on it for buffer cleanup/releasing resources before dropping the reference. If that’s the intent, a brief comment would help future readers.

If reset() can throw for some cache implementations, consider guarding it (minor hardening) so calibration doesn’t fail unexpectedly on a cache type that exposes reset but can’t safely run it:

Optional hardening diff
                     cache = kwargs_input["past_key_values"]
-                    if cache is not None and hasattr(cache, "reset"):
-                        cache.reset()
+                    if cache is not None and hasattr(cache, "reset"):
+                        try:
+                            cache.reset()
+                        except Exception:
+                            # Best-effort cleanup; we will force past_key_values=None anyway.
+                            pass
                     kwargs_input["past_key_values"] = None
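The hardening suggested in the diff above can be shown as a standalone sketch. The helper name and the cache objects here are hypothetical stand-ins, not the actual model_calib.py code:

```python
def clear_past_key_values(kwargs_input: dict) -> dict:
    """Best-effort cache cleanup before dropping the reference.

    reset() is wrapped in try/except so calibration does not fail on a cache
    type that exposes the method but cannot safely run it; past_key_values is
    forced to None afterwards either way, so the layer replay never sees
    stale cached state.
    """
    cache = kwargs_input.get("past_key_values")
    if cache is not None and hasattr(cache, "reset"):
        try:
            cache.reset()
        except Exception:
            pass  # Defensive cleanup only; the reference is dropped below.
    kwargs_input["past_key_values"] = None
    return kwargs_input
```

A replay loop would call this on the copied kwargs before invoking the layer, so each replay starts from an empty KV cache.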
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 1658 - 1669, The
loop clears past_key_values by setting kwargs_input["past_key_values"]=None but
also calls cache.reset() which is either redundant or risky; either remove the
reset() call and add a short comment noting we drop the cache by setting
past_key_values to None, or harden it by wrapping cache.reset() in a try/except
(logging or ignoring exceptions) and add a comment explaining it's defensive
cleanup before dropping the reference; update the replay loop around _inputs,
kwargs_input, cache, and the call site m(*args, **kwargs_input) accordingly so
behavior is explicit and safe.

1646-1653: OK to skip bootstrapping when no layers remain, but consider an early-exit optimization.

With start_layer >= num_layers, you set layer_inputs = None and the per-layer loop becomes a no-op. That’s logically consistent.

Optional: you could short-circuit earlier (after patching/unpatching strategy review) to avoid unnecessary input_getter._patch_all_layers(...) work when there’s nothing to replay/capture. Not required for correctness, but it can reduce overhead for fully-calibrated restarts.
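The suggested early exit could look like the following sketch. The replay_layers name, the callables, and the loop shape are placeholders for whatever layerwise_calibrate actually does, shown only to illustrate the control flow:

```python
from typing import Callable


def replay_layers(
    start_layer: int,
    num_layers: int,
    patch_all_layers: Callable[[], None],
    run_layer: Callable[[int], None],
) -> int:
    """Replay per-layer calibration, skipping all setup when nothing remains.

    Returns the number of layers actually processed.
    """
    if start_layer >= num_layers:
        # Fully-calibrated restart: skip patching/bootstrapping entirely
        # instead of running a no-op loop after paying the setup cost.
        return 0
    patch_all_layers()
    processed = 0
    for idx in range(start_layer, num_layers):
        run_layer(idx)
        processed += 1
    return processed
```

The point of the guard is that patch_all_layers (and any input bootstrapping) is never invoked when the loop body would not execute.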

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 1646 - 1653, The
per-layer loop is a no-op when start_layer >= num_layers but you still call
input_getter._patch_all_layers(...) and do bootstrapping work; add an early-exit
to skip bootstrapping and patch/unpatch steps when start_layer >= num_layers to
avoid unnecessary overhead—check the start_layer >= num_layers condition before
calling input_getter._patch_all_layers (and before any bootstrapping/patching
logic) and return or bypass the rest of the replay/capture flow so
layer_inputs/resumed_inputs/forward_loop work is skipped for fully-calibrated
restarts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f912c3b9-748a-4790-92cf-c62b7dc38086

📥 Commits

Reviewing files that changed from the base of the PR and between 7c80d85 and b772ab9.

📒 Files selected for processing (2)
  • examples/llm_ptq/example_utils.py
  • modelopt/torch/quantization/model_calib.py

@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.36%. Comparing base (5ffb848) to head (b772ab9).
⚠️ Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/quantization/model_calib.py 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1344      +/-   ##
==========================================
- Coverage   75.40%   73.36%   -2.04%     
==========================================
  Files         464      485      +21     
  Lines       50036    53567    +3531     
==========================================
+ Hits        37729    39301    +1572     
- Misses      12307    14266    +1959     
Flag Coverage Δ
examples 41.53% <0.00%> (+0.86%) ⬆️
gpu 58.58% <85.71%> (-0.48%) ⬇️
regression 14.86% <0.00%> (+0.07%) ⬆️
unit 52.72% <57.14%> (+0.39%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

3 participants