autotune target_bits example for llama recipe #2344
base: master
Conversation
Signed-off-by: He, Xin3 <xin3.he@intel.com>
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
PR Code Suggestions ✨
Pull request overview
This PR enhances AutoRound quantization configuration by improving parameter management and enabling flexible mixed-precision quantization through the target_bits parameter.
Key Changes:
- Changed `target_bits` from `int` to `float` in `AutoRoundConfig` to support fractional bit-width targets
- Refactored config classes to remove redundant `params_list` attributes and generate the parameter list dynamically instead
- Added a `non_tunable_params` mechanism in `TorchBaseConfig` to exclude specific parameters from tuning
- Added an `output_dir` parameter to `AutoRoundConfig` for managing temporary file storage
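A fractional `target_bits` implies mixed precision: no single integer bit-width averages to, say, 5.0 bits, so different layers must receive different bit-widths. The following is a minimal, hypothetical sketch of that idea — the function name and the greedy assignment strategy are illustrative assumptions, not the AutoRound implementation:

```python
# Hypothetical sketch: meeting a fractional average bit-width target by
# mixing two integer bit-widths across layers. This is NOT the actual
# AutoRound algorithm, only an illustration of why target_bits is a float.

def assign_mixed_bits(layers, target_bits, low=4, high=8):
    """Assign `low` or `high` bits per layer so the average bit-width
    approximates `target_bits` (simple proportional split)."""
    n = len(layers)
    # Number of layers kept at the higher precision to hit the average.
    n_high = round(n * (target_bits - low) / (high - low))
    n_high = max(0, min(n, n_high))
    # Assumption for illustration: earlier layers stay at high precision.
    return {name: (high if i < n_high else low)
            for i, name in enumerate(layers)}

layers = [f"model.layers.{i}" for i in range(8)]
plan = assign_mixed_bits(layers, target_bits=5.0)
print(sum(plan.values()) / len(plan))  # 5.0
```

With 8 layers and a 5.0-bit target, two layers get 8 bits and six get 4 bits, averaging exactly 5.0.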
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| neural_compressor/torch/quantization/config.py | Refactored config classes: removed hardcoded `params_list`, added `non_tunable_params` initialization in `TorchBaseConfig.__init__`, changed `target_bits` type to `float`, added `output_dir` parameter |
| neural_compressor/common/base_config.py | Updated tuning-parameter filtering logic to check against `non_tunable_params`; added internal-parameter filtering in `to_dict()` |
| neural_compressor/torch/algorithms/weight_only/autoround.py | Refactored device handling to store the accelerator object; added model-reloading logic for specific export formats, with memory cleanup |
| neural_compressor/torch/utils/auto_accelerator.py | Changed the CPU accelerator's `empty_cache()` to call `gc.collect()` instead of being a no-op |
| examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/quantize.py | New example script demonstrating AutoRound quantization with the `target_bits` parameter |
| examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/README.md | New documentation covering Llama3 quantization recipes and inference |
Signed-off-by: He, Xin3 <xin3.he@intel.com>
yiliu30
left a comment
Overall, LGTM.
It would be better not to mix example changes and new features in one PR.
...es/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/llama3/README.md
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tang Kaihui <kaihui.tang@intel.com>
Right, most of these issues were found while enabling the example, so the changes were mixed together to speed up development.
PR Type
Enhancement
Description
- Add `target_bits` as a `float` in `AutoRoundConfig`
- Remove redundant `params_list` in multiple classes
- Introduce `non_tunable_params` in `TorchBaseConfig`
- Add `output_dir` to `AutoRoundConfig`

Diagram Walkthrough
File Walkthrough
- 1 file: Modify `AutoRoundConfig` and clean up config classes
- 11 files
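The `non_tunable_params` mechanism described in this PR can be sketched as follows. All class and attribute names here are illustrative stand-ins, not the actual `neural_compressor` implementation: the point is only that the tunable-parameter list is derived dynamically from the config's attributes minus an explicit exclusion set, rather than hardcoded per class:

```python
# Hypothetical sketch of the non_tunable_params idea: a base config
# derives its tunable-parameter list from its own attributes, excluding
# names registered as non-tunable. Names are illustrative only.

class BaseConfigSketch:
    def __init__(self, non_tunable_params=None):
        self._non_tunable_params = set(non_tunable_params or [])

    @property
    def params_list(self):
        # Generated dynamically instead of a hardcoded per-class list.
        return [
            name for name in vars(self)
            if not name.startswith("_")
            and name not in self._non_tunable_params
        ]

class AutoRoundConfigSketch(BaseConfigSketch):
    def __init__(self, target_bits=8.0, output_dir="tmp_autoround"):
        # output_dir only controls temporary-file storage, so it is
        # excluded from the autotune search space.
        super().__init__(non_tunable_params=["output_dir"])
        self.target_bits = target_bits
        self.output_dir = output_dir

cfg = AutoRoundConfigSketch(target_bits=5.5)
print(cfg.params_list)  # ['target_bits']
```

This removes the redundancy the PR targets: adding a new attribute to a subclass makes it tunable by default, and opting out is a one-line exclusion rather than a maintained parallel list.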