diff --git a/.agents/skills/puzzletron/README.md b/.agents/skills/puzzletron/README.md new file mode 100644 index 00000000000..9ee8ed9c6c0 --- /dev/null +++ b/.agents/skills/puzzletron/README.md @@ -0,0 +1,93 @@ +# Puzzletron Agent Skill + +Puzzletron is an end-to-end workflow for model pruning and MIP-based architecture optimization. +This skill exposes it as a slash command for AI coding agents. + +For full environment setup, model configuration, and algorithm details see +[examples/puzzletron/README.md](../../examples/puzzletron/README.md). + +> **Experimental:** AI agent integration is an experimental feature and may change. + +Run `/puzzletron` with no arguments to see available commands. + +## Running the MIP step + +Start the MIP step by telling the agent how many GPUs per node to use: + +```text +/puzzletron mip 4 +``` + +Output is streamed live and also written to `./log.txt`. While it runs (or after it finishes), +check progress with: + +```text +/puzzletron mip progress +``` + +Example output when complete: + +```text +Overall: Puzzletron step 7/8 — MIP sweep (6 compression rates) +────────────────────────────────────────────────────────────── + Status Phase Elapsed +────────────────────────────────────────────────────────────── + [DONE] Prep (teacher memory + rate list) <1s + [DONE] compression_rate=0.5 3m 52s + [DONE] compression_rate=0.6 4m 41s + [DONE] compression_rate=0.7 4m 46s + [DONE] compression_rate=0.8 3m 55s + [DONE] compression_rate=0.9 3m 55s + [DONE] compression_rate=1.0 3m 59s +────────────────────────────────────────────────────────────── + Started: 08:05:30 + Finished: 08:30:38 + Elapsed: 25m 8s + Completed: 6/6 compression rates + Remaining: done estimated + + Results: /workspace/puzzle_dir/mip_sweep_results.csv +``` + +While running, the report shows which rate is active, sub-step detail (MIP solver node count +or validation batch progress), and an estimated time remaining based on completed rates. 
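The remaining-time estimate described above can be sketched as follows. This is a minimal illustration that mirrors the averaging done in `mip_progress.py` (mean duration of completed rates times the number of rates left). The function name `estimate_remaining` is hypothetical, and the two sample durations are taken from the first two rates in the example output (3m 52s and 4m 41s).

```python
def fmt(seconds):
    """Format a duration in seconds as 'Xm Ys'."""
    return f"{int(seconds) // 60}m {int(seconds) % 60}s"


def estimate_remaining(done_durations, total_rates):
    """Estimate seconds left in the sweep, or None if no rate has finished.

    done_durations: elapsed seconds for each completed compression rate.
    total_rates: total number of compression rates in the sweep.
    """
    if not done_durations:
        return None  # nothing to average yet
    remaining = total_rates - len(done_durations)
    avg = sum(done_durations) / len(done_durations)
    return avg * remaining


# Illustrative state: 2 of 6 rates finished, taking 232s and 281s.
est = estimate_remaining([232, 281], total_rates=6)
print(fmt(est))  # → 17m 6s
```

Because the estimate is a plain average, it assumes the remaining rates take about as long as the completed ones; until the first rate finishes there is nothing to average and no estimate is available.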
+ +## Running the full pipeline + +To run all 8 pipeline steps (not just the MIP sweep): + +```text +/puzzletron all 2 +``` + +Check progress with: + +```text +/puzzletron all progress +``` + +Example output while running: + +```text +Overall: Puzzletron full pipeline (steps 1–8) +──────────────────────────────────────────────────────────────────── + Status Step Description Elapsed +──────────────────────────────────────────────────────────────────── + [DONE] 1/8: starting puzzletron pipeline 0m 0s + [DONE] 2/8: converting model to Puzzletron heterogeneous format (single-gpu) 0m 26s + [DONE] 3/8: scoring pruning activations (multi-gpu) 9m 9s + [DONE] 4/8: pruning the model and saving pruned checkpoints (single-gpu) 0m 57s + [DONE] 5/8: building replacement library and subblock statistics (single-gpu) 0m 26s + [RUNNING] 6/8: calculating one block scores (multi-gpu) (270/352 solutions) 100m 6s + [ ] 7/8: pending + [ ] 8/8: pending +──────────────────────────────────────────────────────────────────── + Started: 00:08:50 + Finished: 01:59:54 (in progress) + Elapsed: 111m 4s + Completed: 5/8 steps + Remaining: 56m 24s estimated +``` + +Step 6 progress is tracked via completed `solution_N.json` files on disk for an accurate +remaining estimate. Step 7 (MIP sweep) shows per-rate progress once it starts. diff --git a/.agents/skills/puzzletron/SKILL.md b/.agents/skills/puzzletron/SKILL.md new file mode 100644 index 00000000000..803cd5bf816 --- /dev/null +++ b/.agents/skills/puzzletron/SKILL.md @@ -0,0 +1,94 @@ +--- +name: puzzletron +description: "End-to-end workflow for model pruning and MIP-based optimization. Commands: mip, all. Usage: /puzzletron " +license: Apache-2.0 +--- + +# Puzzletron + +## Routing + +**STEP 1 — Check args before doing anything else. This is MANDATORY.** + +- If args are **empty**, output the block below verbatim and **STOP immediately. 
Do NOT proceed to any command.** +- If the first word of args does **not exactly match** `mip` or `all`, output the block below verbatim and **STOP immediately. Do NOT proceed to any command.** + +--- + +**Puzzletron** — end-to-end workflow for model pruning and MIP-based optimization. + +Available commands: +- `mip <nproc_per_node>` — Run the MIP step (nproc_per_node: number of GPUs per node) +- `mip progress` — Show live MIP progress with timing summary +- `all <nproc_per_node>` — Run the full Puzzletron pipeline (nproc_per_node: number of GPUs per node) +- `all progress` — Show live full pipeline progress with timing summary + +Usage: `/puzzletron [args]` + +--- + +**STEP 2 — Only if the first word of args exactly matches a command name, execute it. Never reach this step if args were empty.** + +## Command: all + +Parse `nproc_per_node` from args using either positional or flag syntax: +- Positional: second word is a number, e.g. `all 2` +- Flag: `--nproc_per_node <nproc_per_node>` anywhere in args, e.g. `all --nproc_per_node 2` + +- If the second word is exactly `progress`, execute the **all progress** sub-command below. +- If no `nproc_per_node` value can be found, ask the user: "Please provide the number of GPUs per node (nproc_per_node)." and **STOP**. +- If the value does not match `^[0-9]+$`, ask the user: "nproc_per_node must be a positive integer." and **STOP**. +- Otherwise use the parsed value and run the full pipeline. + +### all \<nproc_per_node\> + +Run the following Bash command, substituting `<nproc_per_node>` with the parsed value: + +```bash +set -o pipefail && export PYTHONPATH=$PYTHONPATH:/workspace/Model-Optimizer && \ +torchrun --nproc_per_node <nproc_per_node> examples/puzzletron/main.py \ + --config examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml \ + 2>&1 | tee ./log.txt | grep "Puzzletron Progress" +``` + +Stream output to the user as it arrives. When the command finishes, report the exit code. + +### all progress + +Run the following Bash command. 
Present the output to the user wrapped in a fenced code block (``` ... ```). + +```bash +python3 .agents/skills/puzzletron/all_progress.py +``` + +## Command: mip + +Parse `nproc_per_node` from args using either positional or flag syntax: +- Positional: second word is a number, e.g. `mip 2` +- Flag: `--nproc_per_node <nproc_per_node>` anywhere in args, e.g. `mip --nproc_per_node 2` + +- If the second word is exactly `progress`, execute the **mip progress** sub-command below. +- If no `nproc_per_node` value can be found, ask the user: "Please provide the number of GPUs per node (nproc_per_node)." and **STOP**. +- If the value does not match `^[0-9]+$`, ask the user: "nproc_per_node must be a positive integer." and **STOP**. +- Otherwise use the parsed value and run the MIP step. + +### mip \<nproc_per_node\> + +Run the following Bash command, substituting `<nproc_per_node>` with the parsed value: + +```bash +set -o pipefail && export PYTHONPATH=$PYTHONPATH:/workspace/Model-Optimizer && \ +torchrun --nproc_per_node <nproc_per_node> examples/puzzletron/main.py \ + --config examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml \ + --mip-only 2>&1 | tee ./log.txt | grep "Puzzletron Progress" +``` + +Stream output to the user as it arrives. When the command finishes, report the exit code. + +### mip progress + +Run the following Bash command. Present the output to the user wrapped in a fenced code block (``` ... ```). + +```bash +python3 .agents/skills/puzzletron/mip_progress.py +``` diff --git a/.agents/skills/puzzletron/all_progress.py b/.agents/skills/puzzletron/all_progress.py new file mode 100644 index 00000000000..3db4b033e6b --- /dev/null +++ b/.agents/skills/puzzletron/all_progress.py @@ -0,0 +1,174 @@ +# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Generated with Claude Code +"""Progress report for the full Puzzletron pipeline (all 8 steps).""" + +import glob +import re +import sys +from datetime import datetime + +LOG = "./log.txt" +try: + lines = open(LOG).readlines() + text = "".join(lines) +except FileNotFoundError: + print("No log.txt found. Run /puzzletron all first.") + sys.exit(0) + + +def fmt(s): + """Format seconds as 'Xm Ys', or '—' if None.""" + return f"{int(s) // 60}m {int(s) % 60}s" if s is not None else "—" + + +def get_ts(line): + """Extract a datetime from a log line timestamp, or None.""" + m = re.search(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", line) + return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None + + +now = datetime.now().replace(microsecond=0) +DIV = "─" * 68 + +step_events = [] +for line in lines: + m = re.search(r"Puzzletron Progress (\d+)/(\d+): (.+)", line) + if m: + step_num = int(m.group(1)) + total_steps = int(m.group(2)) + desc = m.group(3).strip() + ts = get_ts(line) + step_events.append((step_num, total_steps, desc, ts)) + +total_steps = step_events[-1][1] if step_events else 8 +seen_steps = {e[0]: (e[2], e[3]) for e in step_events} +last_step_num = max(seen_steps.keys()) if seen_steps else 0 + +pipeline_complete_ts = None +if last_step_num == total_steps and total_steps in seen_steps: + pipeline_complete_ts = seen_steps[total_steps][1] + +cur_detail = "" +step_remaining = None +batch_matches = re.findall(r"calculate_losses_pipeline[^:]*:\s*(\d+)%.*?(\d+)/(\d+)", text) +cbc_matches = re.findall(r"After (\d+) nodes.*?\(([\d.]+) 
seconds\)", text) + +sol_dir_match = re.search( + r"'output_dir': '([^']+single_sequence_replacement_solutions--validation[^']*)'", text +) +sol_done, sol_total = None, None +if sol_dir_match: + sol_dir = sol_dir_match.group(1) + sol_done = len(glob.glob(f"{sol_dir}/solution*.json")) + sol_list_match = re.search(r"'solutions_to_validate': \[([\d, ]+)\]", text) + if sol_list_match: + sol_total = len(sol_list_match.group(1).split(",")) +pct, cur_b, total_b = batch_matches[-1] if batch_matches else (None, None, None) +if sol_done is not None and sol_total: + cur_detail = f" ({sol_done}/{sol_total} solutions)" +elif batch_matches: + cur_detail = f" ({cur_b}/{total_b} batches)" +elif cbc_matches: + nodes, secs = cbc_matches[-1] + cur_detail = f" (MIP solver: {int(nodes):,} nodes, {float(secs):.1f}s)" + +pipeline_start = step_events[0][3] if step_events else None +end_ts = pipeline_complete_ts or now +total_elapsed = int((end_ts - pipeline_start).total_seconds()) if pipeline_start else 0 + +step_ts_list = sorted(seen_steps.items()) +cur_step_start_ts = seen_steps[last_step_num][1] if last_step_num in seen_steps else None +if not pipeline_complete_ts and cur_step_start_ts: + cur_step_elapsed = int((now - cur_step_start_ts).total_seconds()) + if sol_done and sol_total and sol_done > 0: + rate_per_sol = cur_step_elapsed / sol_done + step_remaining = rate_per_sol * (sol_total - sol_done) + elif cur_b is not None and total_b is not None and int(cur_b) > 0 and int(cur_b) < int(total_b): + rate_per_batch = cur_step_elapsed / int(cur_b) + step_remaining = rate_per_batch * (int(total_b) - int(cur_b)) + +print(f"\nOverall: Puzzletron full pipeline (steps 1–{total_steps})") # noqa: RUF001 +print(DIV) +print(f" {'Status':<10} {'Step':<4} {'Description':<34} {'Elapsed':>8}") +print(DIV) + +for i, (snum, (sdesc, sts)) in enumerate(step_ts_list): + next_ts = ( + step_ts_list[i + 1][1][1] if i + 1 < len(step_ts_list) else (pipeline_complete_ts or now) + ) + elapsed = int((next_ts - 
sts).total_seconds()) if sts and next_ts else None + is_last = snum == last_step_num + is_done = not is_last or pipeline_complete_ts is not None + detail = "" + if is_last and not is_done: + detail = cur_detail + label = f"{snum}/{total_steps}: {sdesc}{detail}" + status = "[DONE]" if is_done else "[RUNNING]" + print( + f" {status:<10} {'':<4} {label:<34} {fmt(elapsed) if elapsed is not None else '—':>8}" + ) + +for snum in range(last_step_num + 1, total_steps + 1): + print(f" {'[ ]':<10} {'':<4} {f'{snum}/{total_steps}: pending':<34} {'':>8}") + +print(DIV) +done_steps = len([s for s in seen_steps if s != last_step_num or pipeline_complete_ts]) +step_durations = [] +for i, (snum, (sdesc, sts)) in enumerate(step_ts_list): + next_ts = ( + step_ts_list[i + 1][1][1] if i + 1 < len(step_ts_list) else (pipeline_complete_ts or None) + ) + if next_ts and sts: + step_durations.append(int((next_ts - sts).total_seconds())) +avg_step_s = sum(step_durations) / len(step_durations) if step_durations else None + + +def step_est(snum): + """Estimate duration in seconds for a pending pipeline step.""" + if snum == 7: + # Step 7 in the full pipeline is a single MIP solve (~5m), not a sweep + return 296 + elif snum == 8: + return 60 + return avg_step_s or 0 + + +if pipeline_complete_ts: + est_rem = "done" +elif step_remaining is not None: + future_s = sum(step_est(s) for s in range(last_step_num + 1, total_steps + 1)) + est_rem = fmt(step_remaining + future_s) +else: + cur_s = step_est(last_step_num) + future_s = cur_s + sum(step_est(s) for s in range(last_step_num + 1, total_steps + 1)) + est_rem = fmt(future_s) if (cur_s or future_s) else "calculating..." 
+ +finished_str = ( + pipeline_complete_ts.strftime("%H:%M:%S") + if pipeline_complete_ts + else now.strftime("%H:%M:%S") + " (in progress)" +) +print(f" Started: {pipeline_start.strftime('%H:%M:%S') if pipeline_start else '—'}") +print(f" Finished: {finished_str}") +print(f" Elapsed: {fmt(total_elapsed)}") +print(f" Completed: {done_steps}/{total_steps} steps") +print(f" Remaining: {est_rem} estimated") +results_match = re.search(r"Results written to: (\S+)", text) +if not results_match: + results_match = re.search(r"\[run_puzzle\.py:335\]\s+(\S+)", text) +if results_match: + print(f"\n Results: {results_match.group(1)}") diff --git a/.agents/skills/puzzletron/mip_progress.py b/.agents/skills/puzzletron/mip_progress.py new file mode 100644 index 00000000000..2065cc18ce3 --- /dev/null +++ b/.agents/skills/puzzletron/mip_progress.py @@ -0,0 +1,182 @@ +# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Generated with Claude Code +"""Progress report for the Puzzletron MIP step.""" + +import re +import sys +from datetime import datetime + +LOG = "./log.txt" +try: + lines = open(LOG).readlines() + text = "".join(lines) +except FileNotFoundError: + print("No log.txt found. 
Run /puzzletron mip first.") + sys.exit(0) + + +def norm(r): + """Normalize a compression rate to a canonical float string.""" + return str(float(r)) + + +def fmt(s): + """Format seconds as 'Xm Ys', or '—' if None.""" + return f"{int(s) // 60}m {int(s) % 60}s" if s is not None else "—" + + +def get_ts(line): + """Extract a datetime from a log line timestamp, or None.""" + m = re.search(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", line) + return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None + + +now = datetime.now().replace(microsecond=0) + +rates_match = re.search(r"Compression rates: \[(.*?)\]", text) +all_rates = [norm(r.strip()) for r in rates_match.group(1).split(",")] if rates_match else [] + +# Detect completion via step 8 marker or sweep.py:292 +complete_ts = None +for line in lines: + ts = get_ts(line) + if ts and ("Results written to:" in line or "Puzzletron Progress 8/8" in line): + complete_ts = ts + break + +cbc_matches = re.findall(r"After (\d+) nodes.*?\(([\d.]+) seconds\)", text) + +# ── Sweep disabled: single MIP solve ───────────────────────────────────────── +if not all_rates: + step7_ts = None + for line in lines: + ts = get_ts(line) + if ts and "Puzzletron Progress 7/8" in line: + step7_ts = ts + break + + end_ts = complete_ts or now + total_elapsed = int((end_ts - step7_ts).total_seconds()) if step7_ts else 0 + + cbc_detail = "" + if cbc_matches: + nodes, secs = cbc_matches[-1] + cbc_detail = f" ({int(nodes):,} nodes, {float(secs):.1f}s)" + + DIV = "─" * 62 + print("\nOverall: Puzzletron step 7/8 — MIP solve (sweep disabled)") + print(DIV) + print(f" {'Status':<10} {'Phase':<32} {'Elapsed':>8}") + print(DIV) + print(f" {'[DONE]':<10} {'Prep (loading model + scores)':<32} {'<1s':>8}") + status = "[DONE]" if complete_ts else "[RUNNING]" + label = f"MIP solve{cbc_detail}" + print(f" {status:<10} {label:<32} {fmt(total_elapsed):>8}") + print(DIV) + finished_str = ( + complete_ts.strftime("%H:%M:%S") + if complete_ts + else 
now.strftime("%H:%M:%S") + " (in progress)" + ) + print(f" Started: {step7_ts.strftime('%H:%M:%S') if step7_ts else '—'}") + print(f" Finished: {finished_str}") + print(f" Elapsed: {fmt(total_elapsed)}") + print(f" Remaining: {'done' if complete_ts else 'calculating...'}") + results_match = re.search(r"Results written to: (\S+)", text) + if not results_match: + results_match = re.search(r"\[run_puzzle\.py:335\]\s+(\S+)", text) + if results_match: + print(f"\n Results: {results_match.group(1)}") + sys.exit(0) + +# ── Sweep enabled: per-rate progress ───────────────────────────────────────── +rate_start = {} +for line in lines: + m = re.search(r"compression_rate=([\d.]+)", line) + if m: + r = norm(m.group(1)) + if r in all_rates and r not in rate_start: + rate_start[r] = get_ts(line) + +sweep_start = rate_start.get(all_rates[0]) if all_rates else None + +rate_done = set() +for i, r in enumerate(all_rates[:-1]): + if all_rates[i + 1] in rate_start: + rate_done.add(r) +last = all_rates[-1] if all_rates else None +if complete_ts and last and last in rate_start: + rate_done.add(last) + +rate_elapsed = {} +for i, r in enumerate(all_rates): + if r not in rate_start: + continue + end = rate_start[all_rates[i + 1]] if i + 1 < len(all_rates) else (complete_ts or now) + rate_elapsed[r] = int((end - rate_start[r]).total_seconds()) + +running_rate = next((r for r in all_rates if r in rate_start and r not in rate_done), None) + +cur_detail = "" +if running_rate: + batch_matches = re.findall(r"calculate_losses_pipeline[^:]*:\s*(\d+)%.*?(\d+)/(\d+)", text) + if batch_matches: + pct, cur, total = batch_matches[-1] + cur_detail = f" — validating ({cur}/{total} batches)" + elif cbc_matches: + nodes, secs = cbc_matches[-1] + cur_detail = f" — MIP solver ({int(nodes):,} nodes, {float(secs):.1f}s)" + +end_ts = complete_ts or now +total_elapsed = int((end_ts - sweep_start).total_seconds()) if sweep_start else 0 + +done_count = len(rate_done) +remaining_count = len(all_rates) - done_count 
+avg_s = sum(rate_elapsed[r] for r in rate_done) / done_count if done_count else None +est_rem = ( + fmt(avg_s * remaining_count) + if avg_s and remaining_count + else ("done" if not remaining_count else "calculating...") +) + +DIV = "─" * 62 +print(f"\nOverall: Puzzletron step 7/8 — MIP sweep ({len(all_rates)} compression rates)") +print(DIV) +print(f" {'Status':<10} {'Phase':<32} {'Elapsed':>8}") +print(DIV) +print(f" {'[DONE]':<10} {'Prep (teacher memory + rate list)':<32} {'<1s':>8}") +for r in all_rates: + if r not in rate_start: + print(f" {'[ ]':<10} {f'compression_rate={r}':<32} {'pending':>8}") + elif r == running_rate: + print( + f" {'[RUNNING]':<10} {f'compression_rate={r}{cur_detail}':<32} {fmt(rate_elapsed.get(r)):>8}" + ) + else: + print(f" {'[DONE]':<10} {f'compression_rate={r}':<32} {fmt(rate_elapsed.get(r)):>8}") +print(DIV) +finished_str = ( + complete_ts.strftime("%H:%M:%S") if complete_ts else now.strftime("%H:%M:%S") + " (in progress)" +) +print(f" Started: {sweep_start.strftime('%H:%M:%S') if sweep_start else '—'}") +print(f" Finished: {finished_str}") +print(f" Elapsed: {fmt(total_elapsed)}") +print(f" Completed: {done_count}/{len(all_rates)} compression rates") +print(f" Remaining: {est_rem} estimated") +results_match = re.search(r"Results written to: (\S+)", text) +if results_match: + print(f"\n Results: {results_match.group(1)}") diff --git a/.claude/skills/puzzletron b/.claude/skills/puzzletron new file mode 120000 index 00000000000..ef76b5489dd --- /dev/null +++ b/.claude/skills/puzzletron @@ -0,0 +1 @@ +../../.agents/skills/puzzletron \ No newline at end of file diff --git a/CHANGELOG.rst b/CHANGELOG.rst index d3d0ec160ec..852a352d891 100755 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -6,6 +6,7 @@ Changelog **New Features** +- Add **experimental** ``/puzzletron`` Claude Code agent skill (``.agents/skills/puzzletron/``) with ``mip`` and ``all`` commands for running the MIP step or full pipeline, and ``mip progress`` / ``all progress`` 
sub-commands reporting per-step status, elapsed time, and estimated time remaining. See `.agents/skills/puzzletron/README.md `_. - Add the ``day0-release`` agent skill (``.agents/skills/day0-release/``), a deterministic end-to-end driver that chains the PTQ → evaluation → comparison skills (the evaluation stage deploys the checkpoint itself) with an enforced gate after each stage and returns a publish decision (ACCEPT / REGRESSION / ANOMALOUS / INFEASIBLE). Ships three GPU-free, unit-tested gate scripts (``gate_ptq.py``, ``gate_run.py``, ``gate_compare.py``) that validate checkpoint coverage, evaluation-run completeness, and baseline-vs-candidate accuracy threshold. v1 reports and stops on regression; the recipe-search loop is deferred. - Add **streaming** speculative-decoding training (EAGLE3 / DFlash): the draft trains on base-model hidden states produced on the fly by a co-located ``vllm serve`` (no disk dump), moved trainer-side over NIXL RDMA, scaling to multi-node (dedicated serve replicas + DDP trainers). New launcher examples for NVFP4 Kimi-K2.5 / K2.6 on GB200/aarch64 under ``tools/launcher/examples/moonshotai/``. - Add a fused Triton fast path for ``local_hessian`` NVFP4 weight-scale search (the Hessian-weighted FP8-E4M3 scale sweep). For each NVFP4 block it minimizes ``dwᵀ H dw`` over the 126 candidate scales using the per-cin-block local Hessian on tensor cores, replacing the per-weight Python reference sweep — roughly **34x** faster on a single 8192x4096 weight and bit-exact with the reference for fp32/fp16 weights. Used automatically during ``local_hessian`` calibration for both dense and fused-MoE expert weights; falls back to the reference sweep on CPU, when Triton is unavailable, or via ``MODELOPT_NVFP4_TRITON_SWEEP=0``. 
diff --git a/examples/puzzletron/README.md b/examples/puzzletron/README.md index 48954a2b773..d5ce1e4535c 100644 --- a/examples/puzzletron/README.md +++ b/examples/puzzletron/README.md @@ -388,3 +388,10 @@ Due to non-linear extension of the runtime stats of single subblocks to the tota ## Advanced Usage Modify `llama-3_1-8B_pruneffn_memory.yaml` file for advanced compression scenarios. + +## Using with AI agents + +> **Experimental:** AI agent integration is an experimental feature and may change. + +Puzzletron ships a skill for AI coding agents (Claude Code, Cursor, Codex). +See [`.agents/skills/puzzletron/README.md`](../../.agents/skills/puzzletron/README.md) for setup, commands, and example output.