Add demo (Puzzletron and Minitron guide) in Model-Optimizer/examples/pruning/ with README and notebooks #1320

achidiac-nv wants to merge 8 commits into main from
Conversation
…xamples/ with README and notebooks
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
📝 Walkthrough

Adds a new pruning-and-distillation demo set: a prerequisites notebook, multiple end-to-end scenario notebooks for Minitron and Puzzletron, and detailed documentation pages describing experiments, benchmarking, distillation, and deployment/benchmark instructions.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ Passed checks (6 passed)
Actionable comments posted: 5
🧹 Nitpick comments (4)
examples/pruning_demo/README.md (3)
683-685: Add trailing newline at end of file.

Per markdown conventions (MD047), files should end with a single newline character.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 683 - 685, The README.md file is missing a trailing newline; add a single newline character at the end of the file (after the final line containing the note about Minitron models and baseline) so the file ends with exactly one newline to satisfy MD047.
320-357: Add language identifier to fenced code block.

The architecture details code block lacks a language specification, which affects syntax highlighting and accessibility.
📝 Suggested fix
Change the opening fence of the architecture block (starting with `block_0: attention kv_heads_8 ffn intermediate_12288`) from a plain fence to one with a language token, i.e. `text` after the opening triple backticks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 320 - 357, The fenced code block that starts with the line "block_0: attention kv_heads_8 ffn intermediate_12288" should include a language identifier to enable proper highlighting; update the opening triple-backtick for that block (the block showing block_0...block_35) to use a language token such as ```text (or ```plain) so the README's architecture details code block is marked correctly.
5-21: Fix markdown formatting issues in Table of Contents.

Lines 5 and 21 have spaces inside link text, which violates markdown best practices.

📝 Suggested fix

```diff
-1.[ Introduction](#1-introduction)
+1. [Introduction](#1-introduction)
-10.[ References](#10-references)
+10. [References](#10-references)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/README.md` around lines 5 - 21, Fix the markdown link text spacing in the Table of Contents by removing the extra spaces inside the square brackets for the affected entries: change "1.[ Introduction](`#1-introduction`)" to "1. [Introduction](`#1-introduction`)" and "10.[ References](`#10-references`)" to "10. [References](`#10-references`)" so the link text has no leading/trailing spaces and spacing after the list number is consistent; verify similar entries follow the same "N. [Text](`#anchor`)" pattern.

examples/pruning_demo/00_prerequisites.ipynb (1)
37-41: Hardcoded Python version in path may break on different container versions.

The path `/opt/venv/lib/python3.12/site-packages/modelopt` assumes Python 3.12. If the container or environment uses a different Python version, this will silently fail to replace the modelopt installation.

🔧 Suggested improvement using dynamic Python version
```diff
-!rm -rf /opt/venv/lib/python3.12/site-packages/modelopt
-!cp -r /workspace/Model-Optimizer/modelopt /opt/venv/lib/python3.12/site-packages/modelopt
+import sys
+site_packages = f"/opt/venv/lib/python{sys.version_info.major}.{sys.version_info.minor}/site-packages"
+!rm -rf {site_packages}/modelopt
+!cp -r /workspace/Model-Optimizer/modelopt {site_packages}/modelopt
 !mkdir -p /workspace/datasets
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning_demo/00_prerequisites.ipynb` around lines 37 - 41, The notebook currently hardcodes /opt/venv/lib/python3.12/site-packages/modelopt which will fail for other Python versions; replace the three shell commands with dynamic site-packages detection (e.g. set PY_SITE=$(python -c 'import sysconfig; print(sysconfig.get_paths()[\"purelib\"])') ) and then run rm -rf "$PY_SITE/modelopt" and cp -r /workspace/Model-Optimizer/modelopt "$PY_SITE"/ and keep the mkdir -p /workspace/datasets line—update the cell that contains the rm, cp and mkdir commands to use the PY_SITE variable instead of the 3.12 hardcoded path.
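The dynamic-detection idea above can also be sketched directly in Python with the standard library; a minimal illustration (not the notebook's actual cell):

```python
import sysconfig

# Resolve the active interpreter's site-packages directory instead of
# hardcoding a python3.12 path; "purelib" is where pure-Python packages live.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```

In a notebook, this value could then be interpolated into the shell commands (e.g. `!rm -rf {site_packages}/modelopt`) so the cell works across Python versions.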
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning_demo/scenario1_minitron.ipynb`:
- Around line 193-195: The cell calls subprocess.run("pkill -f tensorboard") but
does not import subprocess in that cell; add an import for subprocess (e.g.,
import subprocess) before the subprocess.run call or use from subprocess import
run and call run(...) so subprocess.run is defined (refer to the subprocess.run
invocation to locate the call and add the import in the same notebook cell).
In `@examples/pruning_demo/scenario1_puzzletron.ipynb`:
- Around line 231-233: The cell calls subprocess.run(["pkill", "-f",
"tensorboard"]) but does not import subprocess locally, which can raise
NameError if cells are run out of order; add an explicit import subprocess at
the top of the same notebook cell (or the cell immediately above) where
subprocess.run is used so the call in that cell always has the subprocess symbol
available.
- Around line 135-148: Remove the monkey-patch that prepends __version__ =
"0.4.8" into the installed lm_eval/__init__.py (the sed command that writes that
line); instead rely on the existing version-check/warning logic in
examples/llm_eval/lm_eval_hf.py (lines handling version mismatch) or document a
specific lm_eval prerequisite, and if the dtype issue remains, fix the exported
config file in the workspace (the sed that replaces "torch.bfloat16" ->
"bfloat16" in the solution config) or correct the Puzzletron export upstream
rather than editing site-packages.
In `@examples/pruning_demo/scenario2_minitron.ipynb`:
- Around line 167-169: The cell uses subprocess.run(["pkill", "-f",
"tensorboard"]) but doesn't import subprocess locally; add an explicit import
subprocess at the top of this cell (or merge the TensorBoard start/stop logic
into one cell) so subprocess.run is always defined even if cells are executed
out of order; ensure the import appears before the subprocess.run call to avoid
NameError.
In `@examples/pruning_demo/scenario2_puzzletron.ipynb`:
- Around line 364-366: The cell calls subprocess.run([ "pkill", "-f",
"tensorboard" ]) but never imports the subprocess module; add an import
subprocess statement (e.g., at the top of this cell or the notebook) so
subprocess.run is defined, and apply the same fix to all scenario notebooks that
use subprocess.run to keep them consistent.
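The fix the comments above describe, a self-contained cell that imports `subprocess` before using it, can be sketched as follows (the `shutil.which` guard is an addition for environments without `pkill`, not part of the notebooks):

```python
import shutil
import subprocess

# Stop any running TensorBoard instance. check=False because pkill exits
# with code 1 when no matching process exists, which is not an error here.
rc = None
if shutil.which("pkill") is not None:
    rc = subprocess.run(["pkill", "-f", "tensorboard"], check=False).returncode
```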
---
Nitpick comments:
In `@examples/pruning_demo/00_prerequisites.ipynb`:
- Around line 37-41: The notebook currently hardcodes
/opt/venv/lib/python3.12/site-packages/modelopt which will fail for other Python
versions; replace the three shell commands with dynamic site-packages detection
(e.g. set PY_SITE=$(python -c 'import sysconfig;
print(sysconfig.get_paths()[\"purelib\"])') ) and then run rm -rf
"$PY_SITE/modelopt" and cp -r /workspace/Model-Optimizer/modelopt "$PY_SITE"/
and keep the mkdir -p /workspace/datasets line—update the cell that contains the
rm, cp and mkdir commands to use the PY_SITE variable instead of the 3.12
hardcoded path.
In `@examples/pruning_demo/README.md`:
- Around line 683-685: The README.md file is missing a trailing newline; add a
single newline character at the end of the file (after the final line containing
the note about Minitron models and baseline) so the file ends with exactly one
newline to satisfy MD047.
- Around line 320-357: The fenced code block that starts with the line "block_0:
attention kv_heads_8 ffn intermediate_12288" should include a language
identifier to enable proper highlighting; update the opening triple-backtick for
that block (the block showing block_0...block_35) to use a language token such
as ```text (or ```plain) so the README's architecture details code block is
marked correctly.
- Around line 5-21: Fix the markdown link text spacing in the Table of Contents
by removing the extra spaces inside the square brackets for the affected
entries: change "1.[ Introduction](`#1-introduction`)" to "1.
[Introduction](`#1-introduction`)" and "10.[ References](`#10-references`)" to "10.
[References](`#10-references`) so the link text has no leading/trailing spaces and
spacing after the list number is consistent; verify similar entries follow the
same "N. [Text](`#anchor`)" pattern.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0c466b7c-b784-496d-802c-dfe00cad04ea
⛔ Files ignored due to path filters (6)
- examples/pruning_demo/all_curves_throughput_vs_latency.png is excluded by !**/*.png
- examples/pruning_demo/distillation_curves.png is excluded by !**/*.png
- examples/pruning_demo/distillation_loss_7B.png is excluded by !**/*.png
- examples/pruning_demo/memory_sweep.png is excluded by !**/*.png
- examples/pruning_demo/memory_sweep_combined.png is excluded by !**/*.png
- examples/pruning_demo/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
- examples/pruning_demo/00_prerequisites.ipynb
- examples/pruning_demo/README.md
- examples/pruning_demo/advanced_compression_experiments.md
- examples/pruning_demo/scenario1_minitron.ipynb
- examples/pruning_demo/scenario1_puzzletron.ipynb
- examples/pruning_demo/scenario2_minitron.ipynb
- examples/pruning_demo/scenario2_puzzletron.ipynb
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #1320    +/-   ##
========================================
- Coverage   75.72%   75.25%   -0.48%
========================================
  Files         471      484      +13
  Lines       50375    51389    +1014
========================================
+ Hits        38146    38671     +525
- Misses      12229    12718     +489
```

Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry.
…lint issues Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
examples/pruning/demo/scenario1_puzzletron.ipynb (1)
136-140: ⚠️ Potential issue | 🟠 Major

Remove site-packages monkey patch for lm_eval (Line 139).

Editing `/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py` is fragile and environment-specific; `examples/llm_eval/lm_eval_hf.py` already warns when the version is not 0.4.8 instead of requiring file mutation.

🔧 Suggested fix

```diff
 !sed -i 's/"torch\.bfloat16"/"bfloat16"/g' \
   /workspace/puzzle_dir/mip/puzzle_solutions/target_memory_130000MiB-num_params_7G/solutions--checkpoints/solution_0/config.json
-!sed -i '1s/^/__version__ = "0.4.8"\n/' /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py
-
 !cd /workspace/Model-Optimizer && \
   python examples/llm_eval/lm_eval_hf.py \
```

Verification script:

```bash
#!/bin/bash
set -euo pipefail
python - <<'PY'
import json
p = "examples/pruning/demo/scenario1_puzzletron.ipynb"
nb = json.load(open(p))
for i, c in enumerate(nb["cells"]):
    if c.get("cell_type") == "code":
        s = "".join(c.get("source", []))
        if "lm_eval/__init__.py" in s and "__version__" in s:
            print(f"{p} -> cell {i} contains lm_eval monkey patch:")
            print(s)
PY
echo
echo "Version-check behavior in lm_eval_hf.py:"
rg -n 'if not lm_eval\.__version__\.startswith\("0\.4\.8"\)|warnings\.warn' examples/llm_eval/lm_eval_hf.py -C 2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/scenario1_puzzletron.ipynb` around lines 136 - 140, Remove the fragile site-packages monkey-patch that inserts "__version__ = \"0.4.8\"" into lm_eval/__init__.py; locate the notebook cell in scenario1_puzzletron.ipynb that runs the sed command string "!sed -i '1s/^/__version__ = \"0.4.8\"\\n/' /usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py" and delete that command (and any related sed edits for lm_eval), relying on the existing version-check/warning in examples/llm_eval/lm_eval_hf.py instead.
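As an alternative to mutating site-packages, a runtime version check keeps the warning behavior the review recommends; a minimal sketch using only the standard library (the message text is illustrative):

```python
from importlib.metadata import PackageNotFoundError, version

# Query the installed lm-eval distribution instead of patching its
# __init__.py; None means the package is not installed at all.
try:
    installed = version("lm-eval")
except PackageNotFoundError:
    installed = None

if installed is not None and not installed.startswith("0.4.8"):
    print(f"warning: lm-eval {installed} found; the demo was validated with 0.4.8")
```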
🧹 Nitpick comments (1)
examples/pruning/demo/README.md (1)
170-173: Avoid token-in-CLI examples for authentication (Line 171).

Using `--token <your_token>` in docs encourages secrets ending up in shell history. Prefer interactive login or env-var based usage.

🔧 Suggested doc update

```diff
-hf auth login --token <your_token>
+hf auth login
+# or:
+# export HF_TOKEN=...
+# hf auth login --token "$HF_TOKEN"
```

As per coding guidelines, "Never hardcode secrets, credentials, tokens, passwords, or API keys in source code. Use environment variables or configuration files listed in .gitignore to store sensitive information."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 170 - 173, The README currently shows a CLI example using the explicit flag "hf auth login --token <your_token>", which risks exposing secrets; update the example to use an interactive login or environment-variable approach instead (e.g., instruct users to run "hf auth login" interactively or to set HF_TOKEN in their environment and call "hf download Qwen/Qwen3-8B --local-dir /workspace/models/Qwen3-8B" without embedding the token). Replace the inline token usage in the example and add a short note advising storing tokens in environment variables or a .env/config file excluded from VCS per the project's secret-handling guidelines.
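The env-var approach can be illustrated from Python as well, without the token ever appearing on a command line (the variable name `HF_TOKEN` follows the convention the comment suggests; the fallback message is illustrative):

```python
import os

# Export HF_TOKEN in the shell (or a .env file excluded from VCS) beforehand;
# never embed the literal token in a command or a committed file.
hf_token = os.environ.get("HF_TOKEN")
if hf_token is None:
    print("HF_TOKEN not set; run `hf auth login` interactively instead.")
```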
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning/demo/README.md`:
- Line 5: The README has remaining markdownlint issues: fix the malformed
list/link "1.[ Introduction](`#1-introduction`)" by normalizing it to a proper
list or heading syntax, add or remove blank lines around headings and lists to
satisfy MD022 (ensure headings are surrounded by blank lines), collapse or
remove extra blank lines to address MD039, and ensure the file ends with a
single newline to resolve MD047; run markdownlint or your project's pre-commit
linter after updating README.md to confirm all warnings are cleared.
In `@examples/pruning/demo/scenario2_puzzletron.ipynb`:
- Around line 269-273: Remove the sed command that mutates
/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py (the "!sed -i
'1s/^/__version__ = \"0.4.8\"\\n/' ..." cell) and any other notebook cell that
patches lm_eval; instead pin the package to lm-eval==0.4.8 in your environment
(requirements, pip install, or container image) so the version check in
examples/llm_eval/lm_eval_hf.py (the warning at line 47) can operate as
intended. Ensure no in-place edits to lm_eval/__init__.py remain in the
notebook.
---
Duplicate comments:
In `@examples/pruning/demo/scenario1_puzzletron.ipynb`:
- Around line 136-140: Remove the fragile site-packages monkey-patch that
inserts "__version__ = \"0.4.8\"" into lm_eval/__init__.py; locate the notebook
cell in scenario1_puzzletron.ipynb that runs the sed command string "!sed -i
'1s/^/__version__ = \"0.4.8\"\\n/'
/usr/local/lib/python3.12/dist-packages/lm_eval/__init__.py" and delete that
command (and any related sed edits for lm_eval), relying on the existing
version-check/warning in examples/llm_eval/lm_eval_hf.py instead.
---
Nitpick comments:
In `@examples/pruning/demo/README.md`:
- Around line 170-173: The README currently shows a CLI example using the
explicit flag "hf auth login --token <your_token>", which risks exposing
secrets; update the example to use an interactive login or environment-variable
approach instead (e.g., instruct users to run "hf auth login" interactively or
to set HF_TOKEN in their environment and call "hf download Qwen/Qwen3-8B
--local-dir /workspace/models/Qwen3-8B" without embedding the token). Replace
the inline token usage in the example and add a short note advising storing
tokens in environment variables or a .env/config file excluded from VCS per the
project's secret-handling guidelines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 67767531-5494-4153-9d9d-d17c53732c87
⛔ Files ignored due to path filters (6)
- examples/pruning/demo/all_curves_throughput_vs_latency.png is excluded by !**/*.png
- examples/pruning/demo/distillation_curves.png is excluded by !**/*.png
- examples/pruning/demo/distillation_loss_7B.png is excluded by !**/*.png
- examples/pruning/demo/memory_sweep.png is excluded by !**/*.png
- examples/pruning/demo/memory_sweep_combined.png is excluded by !**/*.png
- examples/pruning/demo/summary_chart.png is excluded by !**/*.png
📒 Files selected for processing (7)
- examples/pruning/demo/00_prerequisites.ipynb
- examples/pruning/demo/README.md
- examples/pruning/demo/advanced_compression_experiments.md
- examples/pruning/demo/scenario1_minitron.ipynb
- examples/pruning/demo/scenario1_puzzletron.ipynb
- examples/pruning/demo/scenario2_minitron.ipynb
- examples/pruning/demo/scenario2_puzzletron.ipynb
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Signed-off-by: Alexandre Chidiac <achidiac@nvidia.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/pruning/demo/README.md (1)
5-21: ⚠️ Potential issue | 🟡 Minor

Fix TOC list formatting for consistent Markdown rendering.

Line 5 and Line 21 are missing a space after the numeric list marker (`1.`/`10.`), which breaks standard ordered-list formatting.

✏️ Suggested fix

```diff
-1.[Introduction](#1-introduction)
+1. [Introduction](#1-introduction)
 ...
-10.[References](#10-references)
+10. [References](#10-references)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 5 - 21, The table-of-contents ordered list items "1.[Introduction]" and "10.[References]" are missing a space after the numeric marker which breaks Markdown rendering; edit the TOC entries (look for the strings "1.[Introduction]" and "10.[References]") to insert a space after the period (e.g., "1. [Introduction]" and "10. [References]") and scan the other numbered list entries in that block to ensure all ordered markers follow the same "N. Item" spacing for consistent Markdown formatting.
🧹 Nitpick comments (1)
examples/pruning/demo/README.md (1)
171-174: Avoid inline token patterns in auth instructions.

Using `hf auth login --token <your_token>` encourages token-in-command usage (shell history/log risk). Prefer interactive login or env-var-based token usage in docs.

🔐 Suggested tweak

```diff
-hf auth login --token <your_token>
+hf auth login
+# or (non-interactive):
+# hf auth login --token "$HF_TOKEN"
```

As per coding guidelines: "Never hardcode secrets, credentials, tokens, passwords, or API keys in source code. Use environment variables or configuration files listed in .gitignore to store sensitive information."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/pruning/demo/README.md` around lines 171 - 174, Replace the inline-token pattern shown in the README snippet ("hf auth login --token <your_token>") with a secure alternative: remove examples that put tokens directly in commands and instead instruct users to either use interactive login (e.g., run the CLI without a token to be prompted) or set their token via an environment variable (e.g., export HF_TOKEN=...) or a credentials file, then run the download command ("hf download Qwen/Qwen3-8B --local-dir /workspace/models/Qwen3-8B") normally; update the README guidance accordingly so tokens are never shown inline or hardcoded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/pruning/demo/scenario2_puzzletron.ipynb`:
- Around line 436-441: The cell currently pipes the sweep run through `tee` into
`grep "Puzzletron Progress"`, coupling cell exit status to the grep match;
instead run the sweep command and capture full output to
/workspace/puzzletron_sweep.log via `tee` without piping into `grep`, then
perform any `grep "Puzzletron Progress"` as a separate non-blocking step (or as
a follow-up cell) so the exit status reflects the actual run of
examples/puzzletron/main.py (the --mip-only invocation using the
qwen3_8b_pruneffn_memory config) and not whether the log contained the string.
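The decoupling described in that comment, run first, filter the saved log afterwards, can be sketched in Python (the `echo` command and log path here are stand-ins for the real sweep invocation):

```python
import subprocess
from pathlib import Path

log = Path("sweep.log")

# Step 1: run the command and save its full output; the return code now
# reflects the run itself, not any downstream grep match.
proc = subprocess.run(
    ["echo", "Puzzletron Progress: step 1"], capture_output=True, text=True
)
log.write_text(proc.stdout)

# Step 2: filter the saved log as a separate, non-blocking step.
progress = [ln for ln in log.read_text().splitlines() if "Puzzletron Progress" in ln]
```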
---
Duplicate comments:
In `@examples/pruning/demo/README.md`:
- Around line 5-21: The table-of-contents ordered list items "1.[Introduction]"
and "10.[References]" are missing a space after the numeric marker which breaks
Markdown rendering; edit the TOC entries (look for the strings
"1.[Introduction]" and "10.[References]") to insert a space after the period
(e.g., "1. [Introduction]" and "10. [References]") and scan the other numbered
list entries in that block to ensure all ordered markers follow the same "N.
Item" spacing for consistent Markdown formatting.
---
Nitpick comments:
In `@examples/pruning/demo/README.md`:
- Around line 171-174: Replace the inline-token pattern shown in the README
snippet ("hf auth login --token <your_token>") with a secure alternative: remove
examples that put tokens directly in commands and instead instruct users to
either use interactive login (e.g., run the CLI without a token to be prompted)
or set their token via an environment variable (e.g., export HF_TOKEN=...) or a
credentials file, then run the download command ("hf download Qwen/Qwen3-8B
--local-dir /workspace/models/Qwen3-8B") normally; update the README guidance
accordingly so tokens are never shown inline or hardcoded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b892f30b-4096-418c-ae84-79b6420e9dc3
📒 Files selected for processing (7)
- examples/pruning/demo/00_prerequisites.ipynb
- examples/pruning/demo/README.md
- examples/pruning/demo/advanced_compression_experiments.md
- examples/pruning/demo/scenario1_minitron.ipynb
- examples/pruning/demo/scenario1_puzzletron.ipynb
- examples/pruning/demo/scenario2_minitron.ipynb
- examples/pruning/demo/scenario2_puzzletron.ipynb
✅ Files skipped from review due to trivial changes (1)
- examples/pruning/demo/00_prerequisites.ipynb
What does this PR do?
Type of change: new documentation/example (tutorial + notebooks)
Adds an end-to-end pruning & distillation guide under `examples/pruning_demo/`, walking users through structural compression of Qwen3-8B with NVIDIA Model-Optimizer.

The example compares two methods side-by-side on two concrete scenarios:
Both scenarios are followed by knowledge distillation and evaluated on MMLU (end-to-end in the notebooks) plus HellaSwag and GSM8K (reported in the guide).
Contents:

- `README.md` — full guide (setup, two scenarios, head-to-head analysis, inference benchmarks with vLLM + AIPerf, decision rules, limitations, open questions).
- `00_prerequisites.ipynb` — data prep (WikiText-103 → Megatron binary) and teacher baseline evaluation.
- `scenario1_minitron.ipynb` / `scenario1_puzzletron.ipynb` — 7B-param target.
- `scenario2_minitron.ipynb` / `scenario2_puzzletron.ipynb` — 78k-MiB target, including a Puzzletron memory-sweep bonus section.
- `advanced_compression_experiments.md` — extended results (larger distillation budgets with Nemotron-Post-Training-Dataset-v2, BLD, chained Minitron→Puzzletron, Mamba-Transformer hybrid).
- Figures (`summary_chart.png`, `distillation_curves.png`, `memory_sweep_combined.png`, `all_curves_throughput_vs_latency.png`, ...).

Usage
Follow the setup instructions in README.md, then run the notebooks in order: `00_prerequisites.ipynb` first, followed by the scenario 1 and scenario 2 notebooks for the method(s) you want to compare.
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A — no new runtime dependencies; the notebooks use lm-eval==0.4.8 and the existing ModelOpt/NeMo stack. The vLLM serving appendix references an open PR ([Model] Add AnyModel: generic support for NAS-optimized heterogeneous architectures vllm-project/vllm#36512) for Puzzletron AnyModel support, clearly flagged as pre-release.

Additional Information
Summary by CodeRabbit