Changed file: .github/ISSUE_TEMPLATE/1_bug_report.md (2 additions, 2 deletions)
@@ -6,7 +6,7 @@ labels: bug
 assignees: ''
 ---

-**Before submitting an issue, please make sure it hasn't been already addressed by searching through the [existing and past issues](https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues?q=is%3Aissue).**
+**Before submitting an issue, please make sure it hasn't been already addressed by searching through the [existing and past issues](https://github.com/NVIDIA/Model-Optimizer/issues?q=is%3Aissue).**

 ## Describe the bug
 <!-- Description of what the bug is, its impact (blocker, should have, nice to have) and any stack traces or error messages. -->
@@ -30,7 +30,7 @@ If you are unsure about whom to tag, you can leave it blank, and we will make su

 ## System information

-<!-- Run this script to automatically collect system information: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/.github/ISSUE_TEMPLATE/get_system_info.py -->
+<!-- Run this script to automatically collect system information: https://github.com/NVIDIA/Model-Optimizer/blob/main/.github/ISSUE_TEMPLATE/get_system_info.py -->

 - Container used (if applicable): ?
 - OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ? <!-- If Windows, please add the `windows` label to the issue. -->
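
The issue template above points to a `get_system_info.py` helper for collecting system details. As a minimal sketch, assuming the script takes no required arguments, it can be run from the root of a repository checkout like this:

```bash
# Collect system information for an issue report.
# Assumption: the script needs no arguments; check its --help output otherwise.
python .github/ISSUE_TEMPLATE/get_system_info.py
```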
Changed file: .github/ISSUE_TEMPLATE/3_question.md (2 additions, 2 deletions)
@@ -6,7 +6,7 @@ labels: question
 assignees: ''
 ---

-Make sure you already checked the [examples](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples) and [documentation](https://nvidia.github.io/TensorRT-Model-Optimizer/) before submitting an issue.
+Make sure you already checked the [examples](https://github.com/NVIDIA/Model-Optimizer/tree/main/examples) and [documentation](https://nvidia.github.io/Model-Optimizer/) before submitting an issue.

 ## How would you like to use ModelOpt
@@ -23,7 +23,7 @@ If you are unsure about whom to tag, you can leave it blank, and we will make su

 ## System information

-<!-- Run this script to automatically collect system information: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/.github/ISSUE_TEMPLATE/get_system_info.py -->
+<!-- Run this script to automatically collect system information: https://github.com/NVIDIA/Model-Optimizer/blob/main/.github/ISSUE_TEMPLATE/get_system_info.py -->

 - Container used (if applicable): ?
 - OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ? <!-- If Windows, please add the `windows` label to the issue. -->
Changed file: .github/PULL_REQUEST_TEMPLATE.md (2 additions, 2 deletions)
@@ -17,11 +17,11 @@
 ## Before your PR is "*Ready for review*"
 <!-- If you haven't finished some of the above items you can still open `Draft` PR. -->

--**Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
+-**Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
 -**Is this change backward compatible?**: Yes/No <!--- If No, explain why. -->
 -**Did you write any new necessary tests?**: Yes/No
 -**Did you add or update any necessary documentation?**: Yes/No
--**Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. -->
+-**Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. -->
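
The pull request template above asks that commits be signed. Assuming this refers to a DCO-style sign-off trailer (rather than cryptographic signing), a commit can be signed like this:

```bash
# Append a "Signed-off-by:" trailer to the commit message (DCO sign-off).
git commit -s -m "Describe your change"

# If cryptographic (GPG/SSH) signing is required instead, -S signs the commit.
git commit -S -m "Describe your change"
```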
Changed file: CHANGELOG-Windows.rst (8 additions, 9 deletions)
@@ -1,34 +1,33 @@
-===================================
-Model Optimizer Changelog (Windows)
-===================================
+NVIDIA Model Optimizer Changelog (Windows)
+==========================================

 0.33 (2025-07-21)
 ^^^^^^^^^^^^^^^^^

 **New Features**

-- TensorRT Model Optimizer for Windows now supports `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution-provider.
+- Model Optimizer for Windows now supports `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution-provider.

 0.27 (2025-04-30)
 ^^^^^^^^^^^^^^^^^

 **New Features**

-- New LLM models like DeepSeek etc. are supported with ONNX INT4 AWQ quantization on Windows. Refer `Windows Support Matrix <https://nvidia.github.io/TensorRT-Model-Optimizer/guides/0_support_matrix.html>`_ for details about supported features and models.
-- TensorRT Model Optimizer for Windows now supports ONNX INT8 and FP8 quantization (W8A8) of SAM2 and Whisper models. Check `example scripts <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/windows/onnx_ptq>`_ for getting started with quantizing these models.
+- New LLM models like DeepSeek etc. are supported with ONNX INT4 AWQ quantization on Windows. Refer `Windows Support Matrix <https://nvidia.github.io/Model-Optimizer/guides/0_support_matrix.html>`_ for details about supported features and models.
+- Model Optimizer for Windows now supports ONNX INT8 and FP8 quantization (W8A8) of SAM2 and Whisper models. Check `example scripts <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq>`_ for getting started with quantizing these models.

 0.19 (2024-11-18)
 ^^^^^^^^^^^^^^^^^

 **New Features**

-- This is the first official release of TensorRT Model Optimizer for Windows
+- This is the first official release of Model Optimizer for Windows
 - **ONNX INT4 Quantization:** :meth:`modelopt.onnx.quantization.quantize_int4 <modelopt.onnx.quantization.int4.quantize>` now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See :ref:`Support_Matrix` for details about supported features and models.
-- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer `example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_
+- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer `example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_
-- **MMLU Benchmark for Accuracy Evaluations:** Introduced `MMLU benchmarking <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/README.md>`_ for accuracy evaluation of ONNX models on DirectML (DML).
+- **MMLU Benchmark for Accuracy Evaluations:** Introduced `MMLU benchmarking <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/README.md>`_ for accuracy evaluation of ONNX models on DirectML (DML).
 - **Published quantized ONNX models collection:** Published quantized ONNX models at HuggingFace `NVIDIA collections <https://huggingface.co/collections/nvidia/optimized-onnx-models-for-nvidia-rtx-gpus>`_.
Changed file: CHANGELOG.rst (16 additions, 16 deletions)
@@ -1,5 +1,5 @@
-Model Optimizer Changelog (Linux)
-=================================
+NVIDIA Model Optimizer Changelog (Linux)
+========================================

 0.40 (2025-12-11)
 ^^^^^^^^^^^^^^^^^
@@ -12,7 +12,7 @@ Model Optimizer Changelog (Linux)
 **New Features**

 - Add MoE (e.g. Qwen3-30B-A3B, gpt-oss-20b) pruning support for ``num_moe_experts``, ``moe_ffn_hidden_size`` and ``moe_shared_expert_intermediate_size`` parameters in Minitron pruning (``mcore_minitron``).
-- Add ``specdec_bench`` example to benchmark speculative decoding performance. See `examples/specdec_bench/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/specdec_bench#speculative-decoding-benchmark>`_ for more details.
+- Add ``specdec_bench`` example to benchmark speculative decoding performance. See `examples/specdec_bench/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/specdec_bench#speculative-decoding-benchmark>`_ for more details.
 - Add FP8/NVFP4 KV cache quantization support for Megatron Core models.
 - Add flag ``trt_plugins_precision`` in ONNX autocast to indicate custom ops precision. This is similar to the flag already existing in the quantization workflow.
 - Add support for PyTorch Geometric quantization.
@@ -30,7 +30,7 @@ Model Optimizer Changelog (Linux)

 **Deprecations**

-- Deprecated ``modelopt.torch._deploy.utils.get_onnx_bytes`` API. Please use ``modelopt.torch._deploy.utils.get_onnx_bytes_and_metadata`` instead to access the ONNX model bytes with external data. see `examples/onnx_ptq/download_example_onnx.py <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/onnx_ptq/download_example_onnx.py>`_ for example usage.
+- Deprecated ``modelopt.torch._deploy.utils.get_onnx_bytes`` API. Please use ``modelopt.torch._deploy.utils.get_onnx_bytes_and_metadata`` instead to access the ONNX model bytes with external data. see `examples/onnx_ptq/download_example_onnx.py <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/onnx_ptq/download_example_onnx.py>`_ for example usage.

 **New Features**
@@ -40,7 +40,7 @@ Model Optimizer Changelog (Linux)
 - Add support for ``nemotron-post-training-dataset-v2`` and ``nemotron-post-training-dataset-v1`` in ``examples/llm_ptq``. Default to a mix of ``cnn_dailymail`` and ``nemotron-post-training-dataset-v2`` (gated dataset accessed using ``HF_TOKEN`` environment variable) if no dataset is specified.
 - Allow specifying ``calib_seq`` in ``examples/llm_ptq`` to set the maximum sequence length for calibration.
 - Add support for MCore MoE PTQ/QAT/QAD.
-- Add support for multi-node PTQ and export with FSDP2 in ``examples/llm_ptq/multinode_ptq.py``. See `examples/llm_ptq/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq#multi-node-post-training-quantization-with-fsdp2>`_ for more details.
+- Add support for multi-node PTQ and export with FSDP2 in ``examples/llm_ptq/multinode_ptq.py``. See `examples/llm_ptq/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq#multi-node-post-training-quantization-with-fsdp2>`_ for more details.
 - Add support for Nemotron Nano VL v1 & v2 models in FP8/NVFP4 PTQ workflow.
 - Add flags ``nodes_to_include`` and ``op_types_to_include`` in AutoCast to force-include nodes in low precision, even if they would otherwise be excluded by other rules.
 - Add support for ``torch.compile`` and benchmarking in ``examples/diffusers/quantization/diffusion_trt.py``.
@@ -51,15 +51,15 @@ Model Optimizer Changelog (Linux)

 **Documentation**

-- Add general guidelines for Minitron pruning and distillation. See `examples/pruning/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/pruning#pruning-guidelines>`_ for more details.
-- Added example for exporting QLoRA checkpoint for vLLM deployment. Refer to `examples/llm_qat/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/79ef31bc7269ba4da0cfab446da5b64509cbfcef/examples/llm_qat/README.md#qlora-deployment>`_ for more details
+- Add general guidelines for Minitron pruning and distillation. See `examples/pruning/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/pruning#pruning-guidelines>`_ for more details.
+- Added example for exporting QLoRA checkpoint for vLLM deployment. Refer to `examples/llm_qat/README.md <https://github.com/NVIDIA/Model-Optimizer/blob/79ef31bc7269ba4da0cfab446da5b64509cbfcef/examples/llm_qat/README.md#qlora-deployment>`_ for more details

 0.37 (2025-10-08)
 ^^^^^^^^^^^^^^^^^

 **Deprecations**

-- Deprecated ModelOpt's custom docker images. Please use the PyTorch, TensorRT-LLM or TensorRT docker image directly or refer to the `installation guide <https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html>`_ for more details.
+- Deprecated ModelOpt's custom docker images. Please use the PyTorch, TensorRT-LLM or TensorRT docker image directly or refer to the `installation guide <https://nvidia.github.io/Model-Optimizer/getting_started/2_installation.html>`_ for more details.
 - Deprecated ``quantize_mode`` argument in ``examples/onnx_ptq/evaluate.py`` to support strongly typing. Use ``engine_precision`` instead.
 - Deprecated TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. Tasks ``build`` and ``benchmark`` support are removed and replaced with ``quant``. ``engine_dir`` is replaced with ``checkpoint_dir`` in ``examples/llm_ptq`` and ``examples/vlm_ptq``. For performance evaluation, please use ``trtllm-bench`` directly.
 - ``--export_fmt`` flag in ``examples/llm_ptq`` is removed. By default we export to the unified Hugging Face checkpoint format.
@@ -226,16 +226,16 @@ Model Optimizer Changelog (Linux)
 - Disabled saving modelopt state in unified hf export APIs by default, i.e., added ``save_modelopt_state`` flag in ``export_hf_checkpoint`` API and by default set to False.
 - Add FP8 and NVFP4 real quantization support with LLM QLoRA example.
 - The :class:`modelopt.deploy.llm.LLM` now support use the :class:`tensorrt_llm._torch.LLM` backend for the quantized HuggingFace checkpoints.
-- Add `NVFP4 PTQ example for DeepSeek-R1 <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/deepseek>`_.
-- Add end-to-end `AutoDeploy example for AutoQuant LLM models <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_autodeploy>`_.
+- Add `NVFP4 PTQ example for DeepSeek-R1 <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/deepseek>`_.
+- Add end-to-end `AutoDeploy example for AutoQuant LLM models <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_autodeploy>`_.

 0.23 (2025-01-29)
 ^^^^^^^^^^^^^^^^^

 **Backward Breaking Changes**

 - Support TensorRT-LLM to 0.17. Examples (e.g. benchmark task in llm_ptq) may not be fully compatible with TensorRT-LLM 0.15.
-- Nvidia TensorRT Model Optimizer has changed its LICENSE from NVIDIA Proprietary (library wheel) and MIT (examples) to Apache 2.0 in this first full OSS release.
+- Nvidia Model Optimizer has changed its LICENSE from NVIDIA Proprietary (library wheel) and MIT (examples) to Apache 2.0 in this first full OSS release.
 - Deprecate Python 3.8, Torch 2.0, and Cuda 11.x support.
 - ONNX Runtime dependency upgraded to 1.20 which no longer supports Python 3.9.
 - In the Huggingface examples, the ``trust_remote_code`` is by default set to false and require users to explicitly turning it on with ``--trust_remote_code`` flag.
@@ -283,7 +283,7 @@ Model Optimizer Changelog (Linux)
 **Backward Breaking Changes**

 - Deprecated the summarize task in the ``llm_ptq`` example.
-- Deprecated the ``type`` flag in the `huggingface_example.sh <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq/scripts/huggingface_example.sh>`_
+- Deprecated the ``type`` flag in the `huggingface_example.sh <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq/scripts/huggingface_example.sh>`_
 - Deprecated Python plugin support in ONNX.
 - Support TensorRT-LLM 0.13. Examples not compatible with TensorRT-LLM 0.12.
 - :meth:`mtq.auto_quantize <modelopt.torch.quantization.model_quant.auto_quantize>` API has been updated. The API now
@@ -320,7 +320,7 @@ Model Optimizer Changelog (Linux)
 - New APIs and examples: :mod:`modelopt.torch.prune` for pruning Conv, Linear, and Attention heads for
 - New API: :mod:`modelopt.torch.distill` for knowledge distillation, along with guides and example.
-- New Example: `HF BERT Prune, Distill & Quantize <https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/examples/chained_optimizations>`_
+- New Example: `HF BERT Prune, Distill & Quantize <https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/chained_optimizations>`_
 showcasing how to chain pruning, distillation, and quantization to achieve the best performance on a given model.
 - Added INT8/FP8 DQ-only support for ONNX model.
 - New API: :mod:`modelopt.torch.speculative` for end-to-end support of Medusa models.
@@ -383,13 +383,13 @@ Model Optimizer Changelog (Linux)

 **Backward Breaking Changes**

-- `PTQ examples <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq>`_ have been
+- `PTQ examples <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq>`_ have been
 upgraded to use TensorRT-LLM 0.10.

 **New Features**

 - Adding TensorRT-LLM checkpoint export support for Medusa decoding (official ``MedusaModel`` and Megatron Core ``GPTModel``).
-- Enable support for mixtral, recurrentgemma, starcoder, qwen in `PTQ examples <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq>`_.
+- Enable support for mixtral, recurrentgemma, starcoder, qwen in `PTQ examples <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq>`_.
 - Adding TensorRT-LLM checkpoint export and engine building support for sparse models.
 - Import scales from TensorRT calibration cache and use them for quantization.
 - (Experimental) Enable low GPU memory FP8 calibration for the Hugging Face models when the original model size does not fit into the GPU memory.
@@ -403,7 +403,7 @@ Model Optimizer Changelog (Linux)
 **Backward Breaking Changes**

 - [!!!] The package was renamed from ``ammo`` to ``modelopt``. The new full product
-name is *Nvidia TensorRT Model Optimizer*. PLEASE CHANGE ALL YOUR REFERENCES FROM ``ammo`` to
+name is *Nvidia Model Optimizer*. PLEASE CHANGE ALL YOUR REFERENCES FROM ``ammo`` to
 ``modelopt`` including any paths and links!
 - Default installation ``pip install nvidia-modelopt`` will now only install minimal core
 dependencies. Following optional dependencies are available depending on the features that are
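
The 0.3.x entry above notes that the default `pip install nvidia-modelopt` now installs only the minimal core dependencies, with optional extras for additional features. As a rough sketch, assuming an `[all]` extra is published for the package (extra names may differ by release), a fuller installation would look like:

```bash
# Minimal core install, as described in the changelog entry above.
pip install nvidia-modelopt

# Assumed example of pulling in optional feature dependencies via an extra.
pip install "nvidia-modelopt[all]"
```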
Changed file: CONTRIBUTING.md (4 additions, 4 deletions)
@@ -1,17 +1,17 @@
-# Contributing to TensorRT Model Optimizer
+# Contributing to Model Optimizer

-Thanks for your interest in contributing to TensorRT Model Optimizer (ModelOpt)!
+Thanks for your interest in contributing to Model Optimizer (ModelOpt)!

 ## 🛠️ Setting up your environment

-Ensure that TensorRT Model Optimizer (ModelOpt) is installed in editable mode and that all `dev` optional requirements are installed:
+Ensure that Model Optimizer (ModelOpt) is installed in editable mode and that all `dev` optional requirements are installed:

 ```bash
 pip install -e ".[dev]"
 ```

 If you are working on features that require dependencies like TensorRT-LLM or Megatron-Core, consider using a docker container to simplify the setup process.
-Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.
+Visit our [installation docs](https://nvidia.github.io/Model-Optimizer/getting_started/2_installation.html) for more information.
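
The contributing guide above suggests a docker container when features need TensorRT-LLM or Megatron-Core. A minimal sketch of starting from NVIDIA's PyTorch container and then installing ModelOpt in editable mode might look like the following; the image tag is an assumption, so pick a current one:

```bash
# Start an NVIDIA PyTorch container (tag is an assumption) and mount a local
# checkout of the repository into the container's working directory.
docker run --gpus all -it --rm -v "$PWD":/workspace/modelopt -w /workspace/modelopt \
  nvcr.io/nvidia/pytorch:25.01-py3

# Inside the container, install ModelOpt in editable mode with dev extras.
pip install -e ".[dev]"
```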