Commit 63c175b
Add Intel AutoRound algorithm support (#1994)
Resolve #1968
### Highlights
- Introduced `AutoRoundModifier` to enable AutoRound quantization for
`wNa16`.
- Added an end-to-end example and unit tests.
- Verified functionality with local accuracy tests (GSM8K with a limit
of 1000; the results may fluctuate due to non-determinism):
```text
- LLMC-AutoRound
vllm (pretrained=/storage/yiliu7/Meta-Llama-3-8B-Instruct-W4A16-G128-disbale-shuffule,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.737|± |0.0139|
| | |strict-match | 5|exact_match|↑ |0.736|± |0.0139|
- AutoRound result as ref
vllm (pretrained=/storage/yiliu7/meta-llama/Meta-Llama-3-8B-Instruct-ar/Meta-Llama-3-8B-Instruct-w4g128/,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version| Filter |n-shot| Metric | |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.739|± |0.0139|
| | |strict-match | 5|exact_match|↑ |0.740|± |0.0139|
```
The [evaluation
command](https://gist.github.com/yiliu30/a7881cd1cbf0d676e3ffac3e3833aa8e)
is attached for reference.
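As a rough sanity check on the two result tables, the sketch below compares the LLMC-AutoRound and reference AutoRound accuracies using the reported values. It rests on two assumptions that the commit does not state: that the reported stderr is the usual standard error of a mean over 1000 binary outcomes, and that the two runs can be treated as independent (they share the same questions, so this is only an approximation). The helper names are illustrative and not part of any library.

```python
import math

N_SAMPLES = 1000  # "limit: 1000.0" in the evaluation header


def binomial_stderr(accuracy, n):
    """Standard error of a mean of n 0/1 outcomes (sample-variance form)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


def z_score(acc_a, se_a, acc_b, se_b):
    """Difference of two accuracies in units of their combined stderr.

    Assumes the two measurements are independent, which is an approximation here.
    """
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (LLMC-AutoRound, AutoRound reference) exact_match values from the tables above
results = {
    "flexible-extract": (0.737, 0.739),
    "strict-match": (0.736, 0.740),
}

for name, (llmc, ref) in results.items():
    se_llmc = binomial_stderr(llmc, N_SAMPLES)
    se_ref = binomial_stderr(ref, N_SAMPLES)
    z = z_score(llmc, se_llmc, ref, se_ref)
    print(f"{name}: stderr={se_llmc:.4f}, z={z:+.2f}")
```

Under these assumptions, an absolute z value well below about 2 suggests the gap between the two runs is within sampling noise, which is in line with the note that the results may fluctuate.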
### Next stage (in later PRs)
- [ ] Extend support for additional data types.
- [ ] Add group-wise quantization recipes mapping between LLMC and
AutoRound.
- [ ] Add end-to-end tests.
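For readers unfamiliar with the `wNa16` scheme and the group-wise quantization mentioned above, here is a minimal pure-Python sketch of 4-bit symmetric weight quantization with a group size of 128, the setting suggested by the `W4A16-G128` checkpoint name. It is not the code added by this commit. The optional `offsets` argument only illustrates the general idea behind AutoRound, which tunes a per-weight rounding offset in [-0.5, 0.5]; the tuning itself is omitted, and all function names are hypothetical.

```python
import math
import random


def quantize_group(weights, bits=4, offsets=None):
    """Quantize one group to signed integers with a shared symmetric scale.

    offsets: optional per-weight rounding offsets in [-0.5, 0.5]; zeros give
    plain round-to-nearest. (Illustrative stand-in for a learned offset.)
    """
    qmax = 2 ** (bits - 1) - 1      # 7 for 4 bits
    qmin = -(2 ** (bits - 1))       # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    if offsets is None:
        offsets = [0.0] * len(weights)
    levels = []
    for w, v in zip(weights, offsets):
        level = math.floor(w / scale + v + 0.5)  # round half up after the offset
        levels.append(min(qmax, max(qmin, level)))
    return levels, scale


def quantize_groupwise(weights, group_size=128, bits=4):
    """Quantize-dequantize a flat weight list, one scale per group."""
    dequantized = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        levels, scale = quantize_group(group, bits)
        dequantized.extend(level * scale for level in levels)
    return dequantized


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # two groups of 128
dequantized = quantize_groupwise(weights)
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(f"max reconstruction error: {max_error:.6f}")
```

Activations stay in 16-bit in a weight-only scheme like this, so only the stored weights and their per-group scales change.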
cc @hshen14 @thuang6 @wenhuach21
---------
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
1 parent c600e2e commit 63c175b
File tree
8 files changed, +508 -3 lines changed
- .github/workflows
- examples/autoround
- src/llmcompressor
  - modifiers/autoround
  - pipelines/sequential
  - utils
- tests/llmcompressor/transformers/autoround