Add demo (Puzzletron and Minitron guide) in Model-Optimizer/examples/pruning/ with README and notebooks #1320

achidiac-nv wants to merge 8 commits into main from
Conversation
…xamples/ with README and notebooks
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
📝 Walkthrough

Adds a new pruning-and-distillation demo set: a prerequisites notebook, multiple end-to-end scenario notebooks for Minitron and Puzzletron, and detailed documentation pages describing experiments, benchmarking, distillation, and deployment/benchmark instructions.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ Passed checks (6 passed)
Actionable comments posted: 5
🧹 Nitpick comments (4)
examples/pruning_demo/README.md (3)
683-685: Add trailing newline at end of file.

Per markdown conventions (MD047), files should end with a single newline character.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 683 - 685, The README.md file is missing a trailing newline; add a single newline character at the end of the file (after the final line containing the note about Minitron models and baseline) so the file ends with exactly one newline to satisfy MD047.
320-357: Add language identifier to fenced code block.

The architecture details code block lacks a language specification, which affects syntax highlighting and accessibility.
📝 Suggested fix
Change the opening fence of the architecture block (starting with `block_0: attention kv_heads_8 ffn intermediate_12288`) from a plain fence to one with a language token, i.e. `text` after the opening triple backticks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 320 - 357, The fenced code block that starts with the line "block_0: attention kv_heads_8 ffn intermediate_12288" should include a language identifier to enable proper highlighting; update the opening triple-backtick for that block (the block showing block_0...block_35) to use a language token such as ```text (or ```plain) so the README's architecture details code block is marked correctly.
5-21: Fix markdown formatting issues in Table of Contents.

Lines 5 and 21 have spaces inside link text, which violates markdown best practices.

📝 Suggested fix

```diff
-1.[ Introduction](#1-introduction)
+1. [Introduction](#1-introduction)
-10.[ References](#10-references)
+10. [References](#10-references)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 5 - 21, Fix the markdown link text spacing in the Table of Contents by removing the extra spaces inside the square brackets for the affected entries: change "1.[ Introduction](`#1-introduction`)" to "1. [Introduction](`#1-introduction`)" and "10.[ References](`#10-references`)" to "10. [References](`#10-references`)" so the link text has no leading/trailing spaces and spacing after the list number is consistent; verify similar entries follow the same "N. [Text](`#anchor`)" pattern.

examples/pruning_demo/00_prerequisites.ipynb (1)
37-41: Hardcoded Python version in path may break on different container versions.

The path `/opt/venv/lib/python3.12/site-packages/modelopt` assumes Python 3.12. If the container or environment uses a different Python version, this will silently fail to replace the modelopt installation.

🔧 Suggested improvement using dynamic Python version
```diff
-!rm -rf /opt/venv/lib/python3.12/site-packages/modelopt
-!cp -r /workspace/Model-Optimizer/modelopt /opt/venv/lib/python3.12/site-packages/modelopt
+import sys
+site_packages = f"/opt/venv/lib/python{sys.version_info.major}.{sys.version_info.minor}/site-packages"
+!rm -rf {site_packages}/modelopt
+!cp -r /workspace/Model-Optimizer/modelopt {site_packages}/modelopt
 !mkdir -p /workspace/datasets
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/00_prerequisites.ipynb` around lines 37 - 41, The notebook currently hardcodes /opt/venv/lib/python3.12/site-packages/modelopt which will fail for other Python versions; replace the three shell commands with dynamic site-packages detection (e.g. set PY_SITE=$(python -c 'import sysconfig; print(sysconfig.get_paths()[\"purelib\"])') ) and then run rm -rf "$PY_SITE/modelopt" and cp -r /workspace/Model-Optimizer/modelopt "$PY_SITE"/ and keep the mkdir -p /workspace/datasets line—update the cell that contains the rm, cp and mkdir commands to use the PY_SITE variable instead of the 3.12 hardcoded path.
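The dynamic-detection idea above can also be sketched directly in Python with the standard library; a minimal illustration (not the notebook's actual cell):

```python
import sysconfig

# Resolve the active interpreter's site-packages directory instead of
# hardcoding a python3.12 path; "purelib" is where pure-Python packages live.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```

In a notebook, this value could then be interpolated into the shell commands (e.g. `!rm -rf {site_packages}/modelopt`) so the cell works across Python versions.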
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning_demo/scenario1_minitron.ipynb`:
- Around line 193-195: The cell calls subprocess.run("pkill -f tensorboard") but
does not import subprocess in that cell; add an import for subprocess (e.g.,
import subprocess) before the subprocess.run call or use from subprocess import
run and call run(...) so subprocess.run is defined (refer to the subprocess.run
invocation to locate the call and add the import in the same notebook cell).
In `@examples/pruning_demo/scenario1_puzzletron.ipynb`:
- Around line 231-233: The cell calls subprocess.run(["pkill", "-f",
"tensorboard"]) but does not import subprocess locally, which can raise
NameError if cells are run out of order; add an explicit import subprocess at
the top of the same notebook cell (or the cell immediately above) where
subprocess.run is used so the call in that cell always has the subprocess symbol
available.
- Around line 135-148: Remove the monkey-patch that prepends __version__ =
"0.4.8" into the installed lm_eval/__init__.py (the sed command that writes that
line); instead rely on the existing version-check/warning logic in
examples/llm_eval/lm_eval_hf.py (lines handling version mismatch) or document a
specific lm_eval prerequisite, and if the dtype issue remains, fix the exported
config file in the workspace (the sed that replaces "torch.bfloat16" ->
"bfloat16" in the solution config) or correct the Puzzletron export upstream
rather than editing site-packages.
In `@examples/pruning_demo/scenario2_minitron.ipynb`:
- Around line 167-169: The cell uses subprocess.run(["pkill", "-f",
"tensorboard"]) but doesn't import subprocess locally; add an explicit import
subprocess at the top of this cell (or merge the TensorBoard start/stop logic
into one cell) so subprocess.run is always defined even if cells are executed
out of order; ensure the import appears before the subprocess.run call to avoid
NameError.
In `@examples/pruning_demo/scenario2_puzzletron.ipynb`:
- Around line 364-366: The cell calls subprocess.run([ "pkill", "-f",
"tensorboard" ]) but never imports the subprocess module; add an import
subprocess statement (e.g., at the top of this cell or the notebook) so
subprocess.run is defined, and apply the same fix to all scenario notebooks that
use subprocess.run to keep them consistent.
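The fix the comments above describe, a self-contained cell that imports `subprocess` before using it, can be sketched as follows (the `shutil.which` guard is an addition for environments without `pkill`, not part of the notebooks):

```python
import shutil
import subprocess

# Stop any running TensorBoard instance. check=False because pkill exits
# with code 1 when no matching process exists, which is not an error here.
rc = None
if shutil.which("pkill") is not None:
    rc = subprocess.run(["pkill", "-f", "tensorboard"], check=False).returncode
```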
---
Nitpick comments:
In `@examples/pruning_demo/00_prerequisites.ipynb`:
- Around line 37-41: The notebook currently hardcodes
/opt/venv/lib/python3.12/site-packages/modelopt which will fail for other Python
versions; replace the three shell commands with dynamic site-packages detection
(e.g. set PY_SITE=$(python -c 'import sysconfig;
print(sysconfig.get_paths()[\"purelib\"])') ) and then run rm -rf
"$PY_SITE/modelopt" and cp -r /workspace/Model-Optimizer/modelopt "$PY_SITE"/
and keep the mkdir -p /workspace/datasets line—update the cell that contains the
rm, cp and mkdir commands to use the PY_SITE variable instead of the 3.12
hardcoded path.
In `@examples/pruning_demo/README.md`:
- Around line 683-685: The README.md file is missing a trailing newline; add a
single newline character at the end of the file (after the final line containing
the note about Minitron models and baseline) so the file ends with exactly one
newline to satisfy MD047.
- Around line 320-357: The fenced code block that starts with the line "block_0:
attention kv_heads_8 ffn intermediate_12288" should include a language
identifier to enable proper highlighting; update the opening triple-backtick for
that block (the block showing block_0...block_35) to use a language token such
as ```text (or ```plain) so the README's architecture details code block is
marked correctly.
- Around line 5-21: Fix the markdown link text spacing in the Table of Contents
by removing the extra spaces inside the square brackets for the affected
entries: change "1.[ Introduction](`#1-introduction`)" to "1.
[Introduction](`#1-introduction`)" and "10.[ References](`#10-references`)" to "10.
[References](`#10-references`) so the link text has no leading/trailing spaces and
spacing after the list number is consistent; verify similar entries follow the
same "N. [Text](`#anchor`)" pattern.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0c466b7c-b784-496d-802c-dfe00cad04ea
⛔ Files ignored due to path filters (6)
- examples/pruning_demo/all_curves_throughput_vs_latency.png is excluded by !**/*.png
- examples/pruning_demo/distillation_curves.png is excluded by !**/*.png
- examples/pruning_demo/distillation_loss_7B.png is excluded by !**/*.png
- examples/pruning_demo/memory_sweep.png is excluded by !**/*.png
- examples/pruning_demo/memory_sweep_combined.png is excluded by !**/*.png
- examples/pruning_demo/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
- examples/pruning_demo/00_prerequisites.ipynb
- examples/pruning_demo/README.md
- examples/pruning_demo/advanced_compression_experiments.md
- examples/pruning_demo/scenario1_minitron.ipynb
- examples/pruning_demo/scenario1_puzzletron.ipynb
- examples/pruning_demo/scenario2_minitron.ipynb
- examples/pruning_demo/scenario2_puzzletron.ipynb
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #1320    +/-   ##
========================================
- Coverage   75.72%   75.25%   -0.48%
========================================
  Files         471      484      +13
  Lines       50375    51389    +1014
========================================
+ Hits        38146    38671     +525
- Misses      12229    12718     +489
```

Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry.
…lint issues Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
examples/pruning/demo/scenario1_puzzletron.ipynb (1)
136-140: ⚠️ Potential issue | 🟠 Major

Remove site-packages monkey patch for lm_eval (Line 139).

Editing `/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py` is fragile and environment-specific; `examples/llm_eval/lm_eval_hf.py` already warns when the version is not 0.4.8 instead of requiring file mutation.

🔧 Suggested fix

```diff
 !sed -i 's/"torch\.bfloat16"/"bfloat16"/g' \
   /workspace/puzzle_dir/mip/puzzle_solutions/target_memory_130000MiB-num_params_7G/solutions--checkpoints/solution_0/config.json
-!sed -i '1s/^/__version__ = "0.4.8"\n/' /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py
-
 !cd /workspace/Model-Optimizer && \
   python examples/llm_eval/lm_eval_hf.py \
```

Verification script:

```bash
#!/bin/bash
set -euo pipefail
python - <<'PY'
import json
p = "examples/pruning/demo/scenario1_puzzletron.ipynb"
nb = json.load(open(p))
for i, c in enumerate(nb["cells"]):
    if c.get("cell_type") == "code":
        s = "".join(c.get("source", []))
        if "lm_eval/__init__.py" in s and "__version__" in s:
            print(f"{p} -> cell {i} contains lm_eval monkey patch:")
            print(s)
PY
echo
echo "Version-check behavior in lm_eval_hf.py:"
rg -n 'if not lm_eval\.__version__\.startswith\("0\.4\.8"\)|warnings\.warn' examples/llm_eval/lm_eval_hf.py -C 2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/scenario1_puzzletron.ipynb` around lines 136 - 140, Remove the fragile site-packages monkey-patch that inserts "__version__ = \"0.4.8\"" into lm_eval/__init__.py; locate the notebook cell in scenario1_puzzletron.ipynb that runs the sed command string "!sed -i '1s/^/__version__ = \"0.4.8\"\\n/' /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py" and delete that command (and any related sed edits for lm_eval), relying on the existing version-check/warning in examples/llm_eval/lm_eval_hf.py instead.
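As an alternative to mutating site-packages, a runtime version check keeps the warning behavior the review recommends; a minimal sketch using only the standard library (the message text is illustrative):

```python
from importlib.metadata import PackageNotFoundError, version

# Query the installed lm-eval distribution instead of patching its
# __init__.py; None means the package is not installed at all.
try:
    installed = version("lm-eval")
except PackageNotFoundError:
    installed = None

if installed is not None and not installed.startswith("0.4.8"):
    print(f"warning: lm-eval {installed} found; the demo was validated with 0.4.8")
```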
🧹 Nitpick comments (1)
examples/pruning/demo/README.md (1)
170-173: Avoid token-in-CLI examples for authentication (Line 171).

Using `--token <your_token>` in docs encourages secrets ending up in shell history. Prefer interactive login or env-var based usage.

🔧 Suggested doc update

```diff
-hf auth login --token <your_token>
+hf auth login
+# or:
+# export HF_TOKEN=...
+# hf auth login --token "$HF_TOKEN"
```

As per coding guidelines, "Never hardcode secrets, credentials, tokens, passwords, or API keys in source code. Use environment variables or configuration files listed in .gitignore to store sensitive information."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 170 - 173, The README currently shows a CLI example using the explicit flag "hf auth login --token <your_token>", which risks exposing secrets; update the example to use an interactive login or environment-variable approach instead (e.g., instruct users to run "hf auth login" interactively or to set HF_TOKEN in their environment and call "hf download Qwen/Qwen3-8B --local-dir /workspace/models/Qwen3-8B" without embedding the token). Replace the inline token usage in the example and add a short note advising storing tokens in environment variables or a .env/config file excluded from VCS per the project's secret-handling guidelines.
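The env-var approach can be illustrated from Python as well, without the token ever appearing on a command line (the variable name `HF_TOKEN` follows the convention the comment suggests; the fallback message is illustrative):

```python
import os

# Export HF_TOKEN in the shell (or a .env file excluded from VCS) beforehand;
# never embed the literal token in a command or a committed file.
hf_token = os.environ.get("HF_TOKEN")
if hf_token is None:
    print("HF_TOKEN not set; run `hf auth login` interactively instead.")
```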
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning/demo/README.md`:
- Line 5: The README has remaining markdownlint issues: fix the malformed
list/link "1.[ Introduction](`#1-introduction`)" by normalizing it to a proper
list or heading syntax, add or remove blank lines around headings and lists to
satisfy MD022 (ensure headings are surrounded by blank lines), collapse or
remove extra blank lines to address MD039, and ensure the file ends with a
single newline to resolve MD047; run markdownlint or your project's pre-commit
linter after updating README.md to confirm all warnings are cleared.
In `@examples/pruning/demo/scenario2_puzzletron.ipynb`:
- Around line 269-273: Remove the sed command that mutates
/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py (the "!sed -i
'1s/^/__version__ = \"0.4.8\"\\n/' ..." cell) and any other notebook cell that
patches lm_eval; instead pin the package to lm-eval==0.4.8 in your environment
(requirements, pip install, or container image) so the version check in
examples/llm_eval/lm_eval_hf.py (the warning at line 47) can operate as
intended. Ensure no in-place edits to lm_eval/__init__.py remain in the
notebook.
---
Duplicate comments:
In `@examples/pruning/demo/scenario1_puzzletron.ipynb`:
- Around line 136-140: Remove the fragile site-packages monkey-patch that
inserts "__version__ = \"0.4.8\"" into lm_eval/__init__.py; locate the notebook
cell in scenario1_puzzletron.ipynb that runs the sed command string "!sed -i
'1s/^/__version__ = \"0.4.8\"\\n/'
/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py" and delete that
command (and any related sed edits for lm_eval), relying on the existing
version-check/warning in examples/llm_eval/lm_eval_hf.py instead.
---
Nitpick comments:
In `@examples/pruning/demo/README.md`:
- Around line 170-173: The README currently shows a CLI example using the
explicit flag "hf auth login --token <your_token>", which risks exposing
secrets; update the example to use an interactive login or environment-variable
approach instead (e.g., instruct users to run "hf auth login" interactively or
to set HF_TOKEN in their environment and call "hf download Qwen/Qwen3-8B
--local-dir /workspace/models/Qwen3-8B" without embedding the token). Replace
the inline token usage in the example and add a short note advising storing
tokens in environment variables or a .env/config file excluded from VCS per the
project's secret-handling guidelines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 67767531-5494-4153-9d9d-d17c53732c87
⛔ Files ignored due to path filters (6)
- examples/pruning/demo/all_curves_throughput_vs_latency.png is excluded by !**/*.png
- examples/pruning/demo/distillation_curves.png is excluded by !**/*.png
- examples/pruning/demo/distillation_loss_7B.png is excluded by !**/*.png
- examples/pruning/demo/memory_sweep.png is excluded by !**/*.png
- examples/pruning/demo/memory_sweep_combined.png is excluded by !**/*.png
- examples/pruning/demo/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
- examples/pruning/demo/00_prerequisites.ipynb
- examples/pruning/demo/README.md
- examples/pruning/demo/advanced_compression_experiments.md
- examples/pruning/demo/scenario1_minitron.ipynb
- examples/pruning/demo/scenario1_puzzletron.ipynb
- examples/pruning/demo/scenario2_minitron.ipynb
- examples/pruning/demo/scenario2_puzzletron.ipynb
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/pruning/demo/README.md (1)
5-21: ⚠️ Potential issue | 🟡 Minor

Fix TOC list formatting for consistent Markdown rendering.

Line 5 and Line 21 are missing a space after the numeric list marker (`1.`/`10.`), which breaks standard ordered-list formatting.

✏️ Suggested fix

```diff
-1.[Introduction](#1-introduction)
+1. [Introduction](#1-introduction)
 ...
-10.[References](#10-references)
+10. [References](#10-references)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 5 - 21, The table-of-contents ordered list items "1.[Introduction]" and "10.[References]" are missing a space after the numeric marker which breaks Markdown rendering; edit the TOC entries (look for the strings "1.[Introduction]" and "10.[References]") to insert a space after the period (e.g., "1. [Introduction]" and "10. [References]") and scan the other numbered list entries in that block to ensure all ordered markers follow the same "N. Item" spacing for consistent Markdown formatting.
🧹 Nitpick comments (1)
examples/pruning/demo/README.md (1)
171-174: Avoid inline token patterns in auth instructions.

Using `hf auth login --token <your_token>` encourages token-in-command usage (shell history/log risk). Prefer interactive login or env-var-based token usage in docs.

🔐 Suggested tweak

```diff
-hf auth login --token <your_token>
+hf auth login
+# or (non-interactive):
+# hf auth login --token "$HF_TOKEN"
```

As per coding guidelines: "Never hardcode secrets, credentials, tokens, passwords, or API keys in source code. Use environment variables or configuration files listed in .gitignore to store sensitive information."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 171 - 174, Replace the inline-token pattern shown in the README snippet ("hf auth login --token <your_token>") with a secure alternative: remove examples that put tokens directly in commands and instead instruct users to either use interactive login (e.g., run the CLI without a token to be prompted) or set their token via an environment variable (e.g., export HF_TOKEN=...) or a credentials file, then run the download command ("hf download Qwen/Qwen3-8B --local-dir /workspace/models/Qwen3-8B") normally; update the README guidance accordingly so tokens are never shown inline or hardcoded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning/demo/scenario2_puzzletron.ipynb`:
- Around line 436-441: The cell currently pipes the sweep run through `tee` into
`grep "Puzzletron Progress"`, coupling cell exit status to the grep match;
instead run the sweep command and capture full output to
/workspace/puzzletron_sweep.log via `tee` without piping into `grep`, then
perform any `grep "Puzzletron Progress"` as a separate non-blocking step (or as
a follow-up cell) so the exit status reflects the actual run of
examples/puzzletron/main.py (the --mip-only invocation using the
qwen3_8b_pruneffn_memory config) and not whether the log contained the string.
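The decoupling described in that comment, run first, filter the saved log afterwards, can be sketched in Python (the `echo` command and log path here are stand-ins for the real sweep invocation):

```python
import subprocess
from pathlib import Path

log = Path("sweep.log")

# Step 1: run the command and save its full output; the return code now
# reflects the run itself, not any downstream grep match.
proc = subprocess.run(
    ["echo", "Puzzletron Progress: step 1"], capture_output=True, text=True
)
log.write_text(proc.stdout)

# Step 2: filter the saved log as a separate, non-blocking step.
progress = [ln for ln in log.read_text().splitlines() if "Puzzletron Progress" in ln]
```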
---
Duplicate comments:
In `@examples/pruning/demo/README.md`:
- Around line 5-21: The table-of-contents ordered list items "1.[Introduction]"
and "10.[References]" are missing a space after the numeric marker which breaks
Markdown rendering; edit the TOC entries (look for the strings
"1.[Introduction]" and "10.[References]") to insert a space after the period
(e.g., "1. [Introduction]" and "10. [References]") and scan the other numbered
list entries in that block to ensure all ordered markers follow the same "N.
Item" spacing for consistent Markdown formatting.
---
Nitpick comments:
In `@examples/pruning/demo/README.md`:
- Around line 171-174: Replace the inline-token pattern shown in the README
snippet ("hf auth login --token <your_token>") with a secure alternative: remove
examples that put tokens directly in commands and instead instruct users to
either use interactive login (e.g., run the CLI without a token to be prompted)
or set their token via an environment variable (e.g., export HF_TOKEN=...) or a
credentials file, then run the download command ("hf download Qwen/Qwen3-8B
--local-dir /workspace/models/Qwen3-8B") normally; update the README guidance
accordingly so tokens are never shown inline or hardcoded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b892f30b-4096-418c-ae84-79b6420e9dc3
📒 Files selected for processing (7)
- examples/pruning/demo/00_prerequisites.ipynb
- examples/pruning/demo/README.md
- examples/pruning/demo/advanced_compression_experiments.md
- examples/pruning/demo/scenario1_minitron.ipynb
- examples/pruning/demo/scenario1_puzzletron.ipynb
- examples/pruning/demo/scenario2_minitron.ipynb
- examples/pruning/demo/scenario2_puzzletron.ipynb
✅ Files skipped from review due to trivial changes (1)
- examples/pruning/demo/00_prerequisites.ipynb
What does this PR do?
Type of change: new documentation/example (tutorial + notebooks)
Adds an end-to-end pruning & distillation guide under `examples/pruning_demo/`, walking users through structural compression of Qwen3-8B with NVIDIA Model-Optimizer.

The example compares two methods side-by-side on two concrete scenarios:
Both scenarios are followed by knowledge distillation and evaluated on MMLU (end-to-end in the notebooks) plus HellaSwag and GSM8K (reported in the guide).
Contents:

- `README.md` — full guide (setup, two scenarios, head-to-head analysis, inference benchmarks with vLLM + AIPerf, decision rules, limitations, open questions).
- `00_prerequisites.ipynb` — data prep (WikiText-103 → Megatron binary) and teacher baseline evaluation.
- `scenario1_minitron.ipynb` / `scenario1_puzzletron.ipynb` — 7B-param target.
- `scenario2_minitron.ipynb` / `scenario2_puzzletron.ipynb` — 78k-MiB target, including a Puzzletron memory-sweep bonus section.
- `advanced_compression_experiments.md` — extended results (larger distillation budgets with Nemotron-Post-Training-Dataset-v2, BLD, chained Minitron→Puzzletron, Mamba-Transformer hybrid).
- Figures (`summary_chart.png`, `distillation_curves.png`, `memory_sweep_combined.png`, `all_curves_throughput_vs_latency.png`, ...).

Usage
Follow the setup instructions in README.md, then run the notebooks in order: `00_prerequisites.ipynb` first, followed by the scenario 1 and scenario 2 notebooks for the method(s) you want to compare.
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A — no new runtime dependencies; the notebooks use lm-eval==0.4.8 and the existing ModelOpt/NeMo stack. The vLLM serving appendix references an open PR ([Model] Add AnyModel: generic support for NAS-optimized heterogeneous architectures vllm-project/vllm#36512) for Puzzletron AnyModel support, clearly flagged as pre-release.

Additional Information
Summary by CodeRabbit