Support KIMI K2 Thinking int4 checkpoint PTQ #669
base: main
Changes from all commits
```diff
@@ -22,6 +22,8 @@
 from typing import TYPE_CHECKING

 import torch
+from torch import Tensor
+from torch.nn.functional import linear

 try:
     from torch.distributed.tensor import Shard
```
```diff
@@ -501,6 +503,22 @@ def top_k(self, value):
         self.router.moe_top_k = value


+class _QuantCompressedLinear(QuantModule):
+    def _setup(self):
+        self.input_quantizer = TensorQuantizer()
+        self.weight_quantizer = TensorQuantizer()
+
+    def forward(self, input: Tensor) -> Tensor:
+        from compressed_tensors.quantization import QuantizationStatus
+
+        if self.quantization_status == QuantizationStatus.COMPRESSED:
+            weight_data = self.compressor.decompress_module(self)
+        else:
+            weight_data = self.weight
+
+        return linear(self.input_quantizer(input), self.weight_quantizer(weight_data), self.bias)
+
+
 try:
     from transformers.models.llama4.modeling_llama4 import Llama4TextExperts, Llama4TextMoe
```
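For intuition, here is a self-contained sketch (not part of the PR) of the forward pattern the new wrapper implements: decompress the weight only while the checkpoint is still in COMPRESSED status, fake-quantize input and weight, then call `F.linear`. `TensorQuantizer` is stubbed as an identity module, and `compressor.decompress_module` is replaced by a placeholder; both names below are illustrative stand-ins.

```python
import torch
from torch import Tensor, nn
from torch.nn.functional import linear


class IdentityQuantizer(nn.Module):
    """Stand-in for modelopt's TensorQuantizer; the real one applies fake quantization."""

    def forward(self, x: Tensor) -> Tensor:
        return x


class SketchCompressedLinear(nn.Module):
    def __init__(self, out_features: int, in_features: int, compressed: bool):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = None
        self.compressed = compressed
        self.input_quantizer = IdentityQuantizer()
        self.weight_quantizer = IdentityQuantizer()

    def _decompress(self) -> Tensor:
        # Placeholder for compressor.decompress_module(self), which would
        # reconstruct a dense floating-point tensor from the packed int4 payload.
        return self.weight

    def forward(self, x: Tensor) -> Tensor:
        # Same branching as _QuantCompressedLinear: decompress only while the
        # module is still COMPRESSED, otherwise use the stored weight directly.
        w = self._decompress() if self.compressed else self.weight
        return linear(self.input_quantizer(x), self.weight_quantizer(w), self.bias)


m = SketchCompressedLinear(4, 8, compressed=True)
print(m(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```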
```diff
@@ -576,6 +594,16 @@ def top_k(self, value):
 except ImportError:
     pass

+try:
+    from compressed_tensors.linear.compressed_linear import CompressedLinear
```
Contributor:
Should we add compressed-tensors as an optional dependency?

Collaborator (Author):
@kevalmorabia97 @realAsma what do you think?

Collaborator:
If a user is quantizing a model with CompressedLinear, wouldn't they already have compressed-tensors installed?

Contributor:
Can we move this to a separate file?

Contributor:
This is a good point. +1

Collaborator (Author):
Not right now.

Collaborator (Author):
How strong do you feel about it? Right now I feel this still falls under hf plugins, as it's part of the HF invocation.
```diff
+
+    if CompressedLinear not in QuantModuleRegistry:
+        QuantModuleRegistry.register({CompressedLinear: "hf.CompressedLinear"})(
+            _QuantCompressedLinear
+        )
+except ImportError:
+    pass


 class _QuantGptOssExperts(_QuantFunctionalMixin):
     """Quantized wrapper for `transformers.GptOssExperts`.
```
This change is suspicious to me.
Won't this impact other models?
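For reference, a minimal sketch of how the new registration would be exercised end to end during PTQ. The checkpoint name and quantization config below are illustrative assumptions, not taken from the PR:

```python
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM

# Loading a compressed-tensors int4 checkpoint yields CompressedLinear modules;
# with this PR they are present in QuantModuleRegistry and get swapped for
# _QuantCompressedLinear when the model is converted for quantization.
model = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Kimi-K2-Thinking",  # illustrative target checkpoint
    torch_dtype="auto",
)


def forward_loop(model):
    # Run calibration batches through the model here.
    ...


# Config choice is an assumption; any ModelOpt quantization config applies.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```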