autotune target_bits example for llama recipe #2344
base: master
Conversation
Signed-off-by: He, Xin3 <xin3.he@intel.com>
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
PR Code Suggestions ✨
Pull request overview
This PR enhances AutoRound quantization configuration by improving parameter management and enabling flexible mixed-precision quantization through the target_bits parameter.
Key Changes:
- Changed `target_bits` from `int` to `float` in `AutoRoundConfig` to support fractional bit-width targets
- Refactored config classes to remove redundant `params_list` attributes and generate the parameter list dynamically instead
- Added a `non_tunable_params` mechanism in `TorchBaseConfig` to exclude specific parameters from tuning
- Added an `output_dir` parameter to `AutoRoundConfig` for managing temporary file storage
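A fractional `target_bits` implies mixed precision: no single integer bit-width averages to, say, 5.0 bits, so different layers must receive different bit-widths. The following is a minimal, hypothetical sketch of that idea — the function name and the greedy assignment strategy are illustrative assumptions, not the AutoRound implementation:

```python
# Hypothetical sketch: meeting a fractional average bit-width target by
# mixing two integer bit-widths across layers. This is NOT the actual
# AutoRound algorithm, only an illustration of why target_bits is a float.

def assign_mixed_bits(layers, target_bits, low=4, high=8):
    """Assign `low` or `high` bits per layer so the average bit-width
    approximates `target_bits` (simple proportional split)."""
    n = len(layers)
    # Number of layers kept at the higher precision to hit the average.
    n_high = round(n * (target_bits - low) / (high - low))
    n_high = max(0, min(n, n_high))
    # Assumption for illustration: earlier layers stay at high precision.
    return {name: (high if i < n_high else low)
            for i, name in enumerate(layers)}

layers = [f"model.layers.{i}" for i in range(8)]
plan = assign_mixed_bits(layers, target_bits=5.0)
print(sum(plan.values()) / len(plan))  # 5.0
```

With 8 layers and a 5.0-bit target, two layers get 8 bits and six get 4 bits, averaging exactly 5.0.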
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| neural_compressor/torch/quantization/config.py | Refactored config classes: removed hardcoded `params_list`, added `non_tunable_params` initialization in `TorchBaseConfig.__init__`, changed `target_bits` type to `float`, added `output_dir` parameter |
| neural_compressor/common/base_config.py | Updated tuning-parameter filtering logic to check against `non_tunable_params`; added internal-parameter filtering in `to_dict()` |
| neural_compressor/torch/algorithms/weight_only/autoround.py | Refactored device handling to store the accelerator object; added model-reloading logic for specific export formats, with memory cleanup |
| neural_compressor/torch/utils/auto_accelerator.py | Changed the CPU accelerator's `empty_cache()` to call `gc.collect()` instead of being a no-op |
| examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/quantize.py | New example script demonstrating AutoRound quantization with the `target_bits` parameter |
| examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/README.md | New documentation covering Llama3 quantization recipes and inference |
Signed-off-by: He, Xin3 <xin3.he@intel.com>
yiliu30
left a comment
Overall, LGTM.
It would be better not to mix example changes and new features in one PR.
...es/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/README.md
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tang Kaihui <kaihui.tang@intel.com>
Right, most of these issues were found while enabling the example, so the changes were mixed together to speed up development.
PR Type
Enhancement
Description
- Add `target_bits` as a `float` in `AutoRoundConfig`
- Remove redundant `params_list` in multiple classes
- Introduce `non_tunable_params` in `TorchBaseConfig`
- Add `output_dir` to `AutoRoundConfig`

Diagram Walkthrough
File Walkthrough
- 1 file: Modify `AutoRoundConfig` and clean up config classes
- 11 files
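The `non_tunable_params` mechanism described in this PR can be sketched as follows. All class and attribute names here are illustrative stand-ins, not the actual `neural_compressor` implementation: the point is only that the tunable-parameter list is derived dynamically from the config's attributes minus an explicit exclusion set, rather than hardcoded per class:

```python
# Hypothetical sketch of the non_tunable_params idea: a base config
# derives its tunable-parameter list from its own attributes, excluding
# names registered as non-tunable. Names are illustrative only.

class BaseConfigSketch:
    def __init__(self, non_tunable_params=None):
        self._non_tunable_params = set(non_tunable_params or [])

    @property
    def params_list(self):
        # Generated dynamically instead of a hardcoded per-class list.
        return [
            name for name in vars(self)
            if not name.startswith("_")
            and name not in self._non_tunable_params
        ]

class AutoRoundConfigSketch(BaseConfigSketch):
    def __init__(self, target_bits=8.0, output_dir="tmp_autoround"):
        # output_dir only controls temporary-file storage, so it is
        # excluded from the autotune search space.
        super().__init__(non_tunable_params=["output_dir"])
        self.target_bits = target_bits
        self.output_dir = output_dir

cfg = AutoRoundConfigSketch(target_bits=5.5)
print(cfg.params_list)  # ['target_bits']
```

This removes the redundancy the PR targets: adding a new attribute to a subclass makes it tunable by default, and opting out is a one-line exclusion rather than a maintained parallel list.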