
Conversation

@xin3he
Contributor

@xin3he xin3he commented Nov 25, 2025

PR Type

Enhancement


Description

  • Add target_bits as float in AutoRoundConfig

  • Remove redundant params_list in multiple classes

  • Introduce non_tunable_params in TorchBaseConfig

  • Add output_dir to AutoRoundConfig
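A fractional target_bits only makes sense as an average over a mixed-precision assignment of per-layer bit-widths. The sketch below is a hypothetical helper (not the neural_compressor API) showing how a float target such as 7.0 relates to integer per-layer bits:

```python
# Illustrative only: a float target_bits (e.g. 7.0) is interpreted here as
# a parameter-weighted average bit-width over a mixed-precision layout.

def average_bits(layer_bits, layer_params):
    """Parameter-weighted average bit-width over layers."""
    total = sum(layer_params)
    return sum(b * p for b, p in zip(layer_bits, layer_params)) / total

# Three equally sized layers: two at 8 bits, one at 4 bits
avg = average_bits([8, 8, 4], [100, 100, 100])  # about 6.67 bits on average
```

Under this reading, a recipe like the 7-bit Llama JSON files mixes 8-bit and lower-bit layers so the weighted average lands near the float target.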


Diagram Walkthrough

```mermaid
flowchart LR
  A["Add target_bits as float"] -- "in AutoRoundConfig" --> B["Remove redundant params_list"]
  B -- "Introduce non_tunable_params" --> C["Add output_dir"]
```

File Walkthrough

Relevant files

Enhancement (1 file):
  • config.py: Modify AutoRoundConfig and clean up config classes (+11/-167)

Additional files (11 files):
  • README.md (+217/-0)
  • quantize.py (+260/-0)
  • requirements.txt
  • README.md (+0/-125)
  • quantize.py (+0/-261)
  • Meta-Llama-3.1-8B-Instruct_7bits.json (+0/-2242)
  • Meta-Llama-3.3-70B-Instruct_5bits.json (+0/-5602)
  • run_hf_inf.py (+0/-29)
  • base_config.py (+2/-1)
  • autoround.py (+16/-1)
  • auto_accelerator.py (+2/-1)

Signed-off-by: He, Xin3 <xin3.he@intel.com>
@PRAgent4INC
Collaborator

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Exception Handling

The except block in the convert method is too broad: it silently swallows all exceptions, which can hide errors and make debugging difficult.

pass
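A minimal sketch of the suggested fix, assuming the surrounding convert() logic; the function name and the specific exception types are illustrative, not taken from the PR:

```python
import logging

logger = logging.getLogger(__name__)

def convert_safely(convert_fn, model):
    """Run a conversion step, narrowing the except clause to expected failures."""
    try:
        return convert_fn(model)
    except (RuntimeError, ValueError) as e:  # catch only anticipated errors
        logger.warning("convert failed: %s", e)
        raise  # re-raise (or handle explicitly); never `except: pass`
```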
Resource Management

The del model statement and self.accelerator.empty_cache() call should be inside a finally block to ensure they are executed even if an exception occurs.

del model
self.accelerator.empty_cache()
logger.info("Quantization is done, reloading model from saved directory...")
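The pattern the reviewer suggests can be sketched as follows; quantize_fn and the accelerator argument are stand-ins for the actual autoround.py code, not its real signatures:

```python
def quantize_and_reload(quantize_fn, model, accelerator):
    """Ensure cleanup runs whether or not quantization raises."""
    try:
        return quantize_fn(model)
    finally:
        # Runs on success *and* on exception, unlike straight-line cleanup.
        del model                  # drop the local reference to the model
        accelerator.empty_cache()  # release cached device memory
```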
Garbage Collection

Calling gc.collect() in the empty_cache method might not be necessary and could impact performance. Consider removing it unless profiling shows a significant benefit.

gc.collect()
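One way to address this concern, sketched below with a hypothetical collect flag (not part of the PR): make the full collection pass opt-in so hot paths can skip it.

```python
import gc

def empty_cache(collect=True):
    # gc.collect() runs a full collection pass over all generations, which
    # is not free on large heaps; a flag lets callers opt out.
    if collect:
        return gc.collect()  # number of unreachable objects collected
    return 0
```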

@PRAgent4INC
Collaborator

PR Code Suggestions ✨

Contributor

Copilot AI left a comment


Pull request overview

This PR enhances AutoRound quantization configuration by improving parameter management and enabling flexible mixed-precision quantization through the target_bits parameter.

Key Changes:

  • Modified target_bits from int to float in AutoRoundConfig to support fractional bit-width targets
  • Refactored config classes to remove redundant params_list attributes and use dynamic generation instead
  • Added non_tunable_params mechanism in TorchBaseConfig to exclude specific parameters from tuning
  • Added output_dir parameter to AutoRoundConfig for managing temporary file storage
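A hypothetical sketch of the described mechanism (not the actual TorchBaseConfig implementation): tunable parameters are generated dynamically from the instance's attributes, minus an explicit non-tunable set.

```python
class BaseConfigSketch:
    # Hypothetical: parameters excluded from the tuning search space.
    non_tunable_params = {"output_dir"}

    def tunable_params(self):
        # Derive the tunable list from instance attributes instead of a
        # hardcoded params_list, filtering out non-tunable names.
        return [n for n in vars(self) if n not in self.non_tunable_params]

cfg = BaseConfigSketch()
cfg.bits = 4
cfg.group_size = 128
cfg.output_dir = "/tmp/autoround"
```

With this layout, adding a new config attribute makes it tunable by default, and opting out is a one-line change to non_tunable_params.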

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file

  • neural_compressor/torch/quantization/config.py: Refactored config classes: removed hardcoded params_list, added non_tunable_params initialization in TorchBaseConfig.__init__, changed target_bits type to float, added output_dir parameter
  • neural_compressor/common/base_config.py: Updated tuning-parameter filtering logic to check against non_tunable_params, added internal parameter filtering in to_dict()
  • neural_compressor/torch/algorithms/weight_only/autoround.py: Refactored device handling to store the accelerator object, added model reloading logic for specific export formats with memory cleanup
  • neural_compressor/torch/utils/auto_accelerator.py: Changed the CPU accelerator's empty_cache() to call gc.collect() instead of being a no-op
  • examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/quantize.py: New example script demonstrating AutoRound quantization with the target_bits parameter
  • examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/README.md: New comprehensive documentation for Llama3 quantization recipes and inference


Contributor

@yiliu30 yiliu30 left a comment


Overall, LGTM.
It would be better not to mix example changes and new features in one PR.

xin3he and others added 3 commits November 26, 2025 14:03
@xin3he
Contributor Author

xin3he commented Nov 26, 2025

> Overall, LGTM. It would be better not to mix example changes and new features in one PR.

Right, most of these issues were found while enabling the example, so the feature fixes and example changes were mixed together for quicker development.
