CUDA out of memory when calling mtq.compress on the model after QAT #696

**Before submitting an issue, please make sure it hasn't been already addressed by searching through the [existing and past issues](https://github.com/NVIDIA/Model-Optimizer/issues?q=is%3Aissue).**

## Describe the bug
I am trying to apply LoRA first and then run QAT with compressed weights (as far as I know, that is how QLoRA needs to be implemented).

However, when I compress the weights with `mtq.compress` after `mtq.quantize`, I get a CUDA out-of-memory error.
The model is gpt-oss-120b.

```python
import torch
from peft import LoraConfig
from trl import SFTTrainer

import modelopt.torch.quantization as mtq

# `model`, `training_args`, `dataset`, `script_args` and `tokenizer`
# are defined earlier in the training script (not shown).

# Attach LoRA adapters to all linear layers through the trainer.
peft_config = LoraConfig(r=256, lora_alpha=16, target_modules="all-linear")

trainer = SFTTrainer(
    peft_config=peft_config,
    model=model,
    args=training_args,
    train_dataset=dataset[script_args.dataset_train_split],
    eval_dataset=dataset[script_args.dataset_test_split],
    processing_class=tokenizer,
)

quantization_config = mtq.MXFP4_MLP_WEIGHT_ONLY_CFG
calib_size = 128

# Use at most `calib_size` samples from the eval split for calibration.
dataset = torch.utils.data.Subset(
    trainer.eval_dataset, list(range(min(len(trainer.eval_dataset), calib_size)))
)
data_loader = trainer.get_eval_dataloader(dataset)


def forward_loop(model):
    # Run the calibration batches through the model.
    for data in data_loader:
        model(**data)


q_model = mtq.quantize(model, quantization_config, forward_loop)
qc_model = mtq.compress(model)  # CUDA OOM is raised here
trainer.train()
```
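For scale, here is a rough back-of-the-envelope estimate of the weight storage involved. It is only a sketch under stated assumptions: the parameter count of about 117e9 for gpt-oss-120b is an assumption, the helper `weight_gib` is hypothetical, and treating every weight as MXFP4 gives a lower bound because `MXFP4_MLP_WEIGHT_ONLY_CFG` only targets MLP weights. It does not model LoRA parameters, activations, or any temporary buffers `mtq.compress` may allocate.

```python
def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: roughly 117e9 total parameters for gpt-oss-120b.
NUM_PARAMS = 117e9

# MXFP4 stores 4-bit elements plus one 8-bit shared scale per block of 32.
MXFP4_BITS = 4 + 8 / 32

bf16_gib = weight_gib(NUM_PARAMS, 16)
mxfp4_gib = weight_gib(NUM_PARAMS, MXFP4_BITS)

print(f"BF16 weights:  {bf16_gib:.1f} GiB")
print(f"MXFP4 weights: {mxfp4_gib:.1f} GiB (lower bound, all weights packed)")
```

Comparing these figures with the per-GPU allocation reported at the point of failure (for example via `torch.cuda.memory_summary()`) may help narrow down whether the uncompressed and compressed copies coexist during compression.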



## System information

- Container used (if applicable): ?
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 22.04.5 LTS
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): NVIDIA A100-SXM4-80GB
- GPU memory size: 80.0 GB
- Number of GPUs: 8
- Library versions (if applicable):
  - Python: 3.10.12
  - ModelOpt version or commit hash: 0.40.0
  - CUDA: 12.2
  - PyTorch: 2.6.0+cu124
  - Transformers: 4.57.3
  - TensorRT-LLM: ?
  - ONNXRuntime: 1.22.0
  - TensorRT: ?

