
[minor] fixes for layerwise calib + MSE#1344

Open
Fridah-nv wants to merge 1 commit into main from fridah/layerwise-mse

Conversation

@Fridah-nv (Contributor)

@Fridah-nv Fridah-nv commented Apr 24, 2026

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved deterministic checkpoint directory resolution for quantization workflows
    • Enhanced handling of resume operations for already-calibrated model runs
    • Fixed cache state management during model calibration to prevent stale cached data

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv self-assigned this Apr 24, 2026
@Fridah-nv Fridah-nv requested review from a team as code owners April 24, 2026 23:25
@coderabbitai (Contributor)

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

Two functions were adjusted to improve checkpoint handling: resolve_checkpoint_dir now derives its hash from only the quantization algorithm (serialized as deterministic JSON) rather than the full config, and layerwise_calibrate handles already-calibrated runs by skipping resume setup and consistently clearing past key values during replay loops.

Changes

  • Checkpoint hash derivation (examples/llm_ptq/example_utils.py):
    Hash computation now uses only the algorithm portion of the quantization config, with deterministic JSON serialization and a scalar-type restriction; the full quant_cfg and the layerwise_checkpoint_dir field are excluded.
  • Calibration control flow (modelopt/torch/quantization/model_calib.py):
    Checkpoint resume setup is now skipped when start_layer == num_layers, and the past_key_values cache is consistently cleared during replay loops by conditionally resetting it and forcing it to None.
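The deterministic hash derivation described above can be sketched as follows. This is a minimal illustration rather than the actual example_utils.py code: the resolve_checkpoint_hash name, the "algorithm" key, and the config layout are assumptions.

```python
import hashlib
import json


def resolve_checkpoint_hash(quant_cfg: dict) -> str:
    """Derive a stable hash from only the algorithm portion of a quant config.

    Restricting to scalar-typed values and serializing with sorted keys makes
    the hash deterministic across runs and insensitive to unrelated config
    fields (e.g. a layerwise_checkpoint_dir path or non-serializable objects).
    """
    algorithm = quant_cfg.get("algorithm", {})
    # Keep only scalar-typed entries so non-serializable values cannot break
    # or destabilize the hash.
    scalars = {
        k: v
        for k, v in algorithm.items()
        if isinstance(v, (str, int, float, bool, type(None)))
    }
    payload = json.dumps(scalars, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

With sorted keys, the same algorithm settings produce the same checkpoint directory regardless of dict insertion order or extra config fields.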

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check ❓ Inconclusive: The title '[minor] fixes for layerwise calib + MSE' aligns with the changes to layerwise_calibrate control flow and checkpoint handling, but is vague about the specific fixes. Resolution: consider a more specific title such as 'Fix layerwise calibration cache handling and checkpoint determinism'.

✅ Passed checks (4 passed)

  • Description Check ✅: Check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues check ✅: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅: Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns ✅: The pull request does not introduce critical security anti-patterns from SECURITY.md, including unsafe deserialization, eval/exec usage, or insecure trust_remote_code defaults.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions (Contributor)

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1344/

Built to branch gh-pages at 2026-04-24 23:29 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
modelopt/torch/quantization/model_calib.py (2)

1658-1669: cache.reset() may be redundant after forcing past_key_values=None; add justification/defensive handling.

In the replay loop you:

  • copy kwargs_input
  • call cache.reset() (if present)
  • then set kwargs_input["past_key_values"] = None

Since the cache object is not passed into the layer after setting to None, reset() is only useful if you’re relying on it for buffer cleanup/releasing resources before dropping the reference. If that’s the intent, a brief comment would help future readers.

If reset() can throw for some cache implementations, consider guarding it (minor hardening) so calibration doesn’t fail unexpectedly on a cache type that exposes reset but can’t safely run it:

Optional hardening diff
                     cache = kwargs_input["past_key_values"]
-                    if cache is not None and hasattr(cache, "reset"):
-                        cache.reset()
+                    if cache is not None and hasattr(cache, "reset"):
+                        try:
+                            cache.reset()
+                        except Exception:
+                            # Best-effort cleanup; we will force past_key_values=None anyway.
+                            pass
                     kwargs_input["past_key_values"] = None
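The hardening suggested in the diff above can be shown as a standalone sketch. The helper name and the cache objects here are hypothetical stand-ins, not the actual model_calib.py code:

```python
def clear_past_key_values(kwargs_input: dict) -> dict:
    """Best-effort cache cleanup before dropping the reference.

    reset() is wrapped in try/except so calibration does not fail on a cache
    type that exposes the method but cannot safely run it; past_key_values is
    forced to None afterwards either way, so the layer replay never sees
    stale cached state.
    """
    cache = kwargs_input.get("past_key_values")
    if cache is not None and hasattr(cache, "reset"):
        try:
            cache.reset()
        except Exception:
            pass  # Defensive cleanup only; the reference is dropped below.
    kwargs_input["past_key_values"] = None
    return kwargs_input
```

A replay loop would call this on the copied kwargs before invoking the layer, so each replay starts from an empty KV cache.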
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 1658 - 1669, The
loop clears past_key_values by setting kwargs_input["past_key_values"]=None but
also calls cache.reset() which is either redundant or risky; either remove the
reset() call and add a short comment noting we drop the cache by setting
past_key_values to None, or harden it by wrapping cache.reset() in a try/except
(logging or ignoring exceptions) and add a comment explaining it's defensive
cleanup before dropping the reference; update the replay loop around _inputs,
kwargs_input, cache, and the call site m(*args, **kwargs_input) accordingly so
behavior is explicit and safe.

1646-1653: OK to skip bootstrapping when no layers remain, but consider an early-exit optimization.

With start_layer >= num_layers, you set layer_inputs = None and the per-layer loop becomes a no-op. That’s logically consistent.

Optional: you could short-circuit earlier (after patching/unpatching strategy review) to avoid unnecessary input_getter._patch_all_layers(...) work when there’s nothing to replay/capture. Not required for correctness, but it can reduce overhead for fully-calibrated restarts.
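The suggested early exit could look like the following sketch. The replay_layers name, the callables, and the loop shape are placeholders for whatever layerwise_calibrate actually does, shown only to illustrate the control flow:

```python
from typing import Callable


def replay_layers(
    start_layer: int,
    num_layers: int,
    patch_all_layers: Callable[[], None],
    run_layer: Callable[[int], None],
) -> int:
    """Replay per-layer calibration, skipping all setup when nothing remains.

    Returns the number of layers actually processed.
    """
    if start_layer >= num_layers:
        # Fully-calibrated restart: skip patching/bootstrapping entirely
        # instead of running a no-op loop after paying the setup cost.
        return 0
    patch_all_layers()
    processed = 0
    for idx in range(start_layer, num_layers):
        run_layer(idx)
        processed += 1
    return processed
```

The point of the guard is that patch_all_layers (and any input bootstrapping) is never invoked when the loop body would not execute.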

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 1646 - 1653, The
per-layer loop is a no-op when start_layer >= num_layers but you still call
input_getter._patch_all_layers(...) and do bootstrapping work; add an early-exit
to skip bootstrapping and patch/unpatch steps when start_layer >= num_layers to
avoid unnecessary overhead—check the start_layer >= num_layers condition before
calling input_getter._patch_all_layers (and before any bootstrapping/patching
logic) and return or bypass the rest of the replay/capture flow so
layer_inputs/resumed_inputs/forward_loop work is skipped for fully-calibrated
restarts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f912c3b9-748a-4790-92cf-c62b7dc38086

📥 Commits

Reviewing files that changed from the base of the PR and between 7c80d85 and b772ab9.

📒 Files selected for processing (2)
  • examples/llm_ptq/example_utils.py
  • modelopt/torch/quantization/model_calib.py

@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.36%. Comparing base (5ffb848) to head (b772ab9).
⚠️ Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/quantization/model_calib.py 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1344      +/-   ##
==========================================
- Coverage   75.40%   73.36%   -2.04%     
==========================================
  Files         464      485      +21     
  Lines       50036    53567    +3531     
==========================================
+ Hits        37729    39301    +1572     
- Misses      12307    14266    +1959     
Flag Coverage Δ
examples 41.53% <0.00%> (+0.86%) ⬆️
gpu 58.58% <85.71%> (-0.48%) ⬇️
regression 14.86% <0.00%> (+0.07%) ⬆️
unit 52.72% <57.14%> (+0.39%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

3 participants