Commit 33c6bf9
WIP Compute per layer LIM Scores during imatrix
*WARNING*: This is mostly vibe code. Hope I'm not wasting y'all's time.
Compute Layer Importance Modification (LIM) Scores
The goal of this PR is to rank the layers of a given tensor by their sensitivity to quantization error. Now that `llama-quantize --custom-q ...` accepts regex rules, these LIM Scores could be used to decide which layers of a given tensor to quantize more or less aggressively, aiming to preserve generation quality (e.g. low perplexity) while reducing memory footprint compared to using the same quant size across all layers of that tensor.
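As a rough illustration of how such scores might feed a quantization recipe (this is not code from this commit), the sketch below sorts the layers of one tensor by a made-up set of per-layer LIM scores and keeps the most sensitive quarter at a larger quant type. The layer count, scores, tensor name, and quant types are all placeholders, and the printed list would still have to be translated by hand into `--custom-q` rules.

```cpp
// Hypothetical post-processing sketch: given per-layer LIM scores for one tensor
// (e.g. ffn_down), keep the most sensitive layers at a larger quant type and
// drop the rest to a smaller one. The scores below are made-up placeholders.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // assumed per-layer LIM scores for an 8-layer toy model
    const std::vector<float> lim = { 0.12f, 0.87f, 0.35f, 0.91f, 0.05f, 0.44f, 0.78f, 0.20f };

    // rank layer indices from most to least sensitive
    std::vector<int> order(lim.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = (int) i;
    std::sort(order.begin(), order.end(), [&](int a, int b) { return lim[a] > lim[b]; });

    // give the top quarter of layers a bigger quant, the rest a smaller one;
    // the resulting list could then be turned into --custom-q regex rules
    const size_t n_big = order.size() / 4;
    for (size_t r = 0; r < order.size(); ++r) {
        const char * qtype = r < n_big ? "q6_K" : "q4_K";
        std::printf("blk.%d.ffn_down -> %s (LIM %.2f)\n", order[r], qtype, lim[order[r]]);
    }
    return 0;
}
```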
This experimental PR was motivated by this comment and PR: ggml-org/llama.cpp#12718
I may force-push this after more testing and experimenting to see whether it is actually doing the right thing and whether the output is actually useful for improving quantization quality, e.g. PPL per GiB... This may just be a big mistake, lol.
This is built on the existing imatrix computation and assumes the values of `x[j]` are the activations going into/out of the given tensor layer. I don't know GGML and generally work in Python or vanilla C, not so much C++, so a lot of this was vibe coded while running the [ubergarm/DeepSeek-V3-0324-GGUF IQ4_K_R4 quant](https://huggingface.co/ubergarm/DeepSeek-V3-0324-GGUF/tree/main/DeepSeek-V3-0324-IQ4_K_R4). So this is partly an experiment in actually using an LLM instead of just enjoying the meta of manual quantization min-maxing.
```
@misc{dumitru2024layerwisequantizationpragmaticeffective,
title={Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels},
author={Razvan-Gabriel Dumitru and Vikas Yadav and Rishabh Maheshwary and Paul-Ioan Clotan and Sathwik Tejaswi Madhusudhan and Mihai Surdeanu},
year={2024},
eprint={2406.17415},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.17415},
code={https://github.com/RazvanDu/LayerwiseQuant/},
}
```
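For reference, the paper cited above defines a layer's LIM score as the negative cosine similarity between the activations entering the layer and those leaving it, so layers that transform their input the most rank as most sensitive. The sketch below is a minimal, self-contained illustration of that formula, not the code in this commit; the `lim_score` helper and the toy vectors are made up, and how per-layer input/output activations would actually be accumulated from the imatrix callback is left assumed.

```cpp
// Minimal sketch (not this PR's code): a per-layer LIM score computed as the
// negative cosine similarity between the activations entering and leaving a
// layer, following the cited paper. Accumulating `layer_in` / `layer_out`
// over tokens during the imatrix pass is assumed to happen elsewhere.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// negative cosine similarity between input and output activations of one layer
static float lim_score(const std::vector<float> & layer_in, const std::vector<float> & layer_out) {
    double dot = 0.0, norm_in = 0.0, norm_out = 0.0;
    const size_t n = std::min(layer_in.size(), layer_out.size());
    for (size_t j = 0; j < n; ++j) {
        dot      += (double) layer_in[j] * layer_out[j];
        norm_in  += (double) layer_in[j] * layer_in[j];
        norm_out += (double) layer_out[j] * layer_out[j];
    }
    if (norm_in == 0.0 || norm_out == 0.0) {
        return 0.0f;
    }
    // lower similarity -> the layer changes its input more -> higher score
    return (float) (-dot / (std::sqrt(norm_in) * std::sqrt(norm_out)));
}

int main() {
    // toy activations for two "layers": the second barely changes its input,
    // so it should get a lower LIM score than the first
    std::vector<float> in   = { 1.0f,  0.5f, -0.2f };
    std::vector<float> out1 = {-0.8f,  0.1f,  0.9f };
    std::vector<float> out2 = { 1.1f,  0.4f, -0.3f };
    std::printf("LIM layer 0: %8.4f\n", lim_score(in, out1));
    std::printf("LIM layer 1: %8.4f\n", lim_score(in, out2));
    return 0;
}
```

In the real imatrix pass these activations would presumably be averaged over many tokens before scoring, which is what the per-tensor accumulation here is assumed to provide.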
3 files changed: +113, -1 lines changed