
Conversation

@ovuruska

What does this PR do?

Type of change: New feature

Overview:
This PR adds support for Flux.2-dev (black-forest-labs/FLUX.2-dev) quantization and ONNX export. It enables users to quantize Flux.2 models (INT8/FP8/FP4) and export them through ONNX to TensorRT by handling the architectural differences in the new model version, specifically the updated RoPE embedding dimensions and attention mechanisms.

Key changes include:

  • Quantization Registry: Registered Flux2Attention and Flux2ParallelSelfAttention in modelopt/torch/quantization/plugins/diffusers.py to enable quantization for the new architecture (a registration sketch follows this list).
  • ONNX Export Logic: Implemented _gen_dummy_inp_and_dyn_shapes_flux2 in export.py to correctly handle Flux.2 input shapes, accounting for the RoPE dimension change (from 3 to 4) and the updated ID tensor shapes (a dummy-input sketch appears under Additional Information below).
  • Pipeline Integration: Added ModelType.FLUX_2_DEV and mapped it to Flux2Pipeline in quantize.py and diffusion_trt.py (a wiring sketch also follows this list).
  • Documentation: Updated examples/diffusers/README.md with the support status and a Flux.2-specific trtexec command.
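
A minimal sketch of the registry change, using modelopt's public QuantModule/TensorQuantizer extension points; the wrapper class, the quantizer placement, and the diffusers import path are illustrative assumptions, not the PR's exact code:

import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.nn import QuantModule, TensorQuantizer
from diffusers.models.transformers.transformer_flux2 import (  # import path may differ
    Flux2Attention,
)

class _QuantFlux2Attention(QuantModule):  # hypothetical wrapper
    def _setup(self):
        # Attach quantizers to the activations feeding the attention matmuls,
        # mirroring how modelopt handles other diffusers attention classes.
        self.q_bmm_quantizer = TensorQuantizer()
        self.k_bmm_quantizer = TensorQuantizer()
        self.v_bmm_quantizer = TensorQuantizer()

mtq.register(original_cls=Flux2Attention, quantized_cls=_QuantFlux2Attention)
# Flux2ParallelSelfAttention gets an analogous wrapper and registration.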

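And a sketch of the pipeline wiring from the third bullet; apart from ModelType.FLUX_2_DEV and Flux2Pipeline (both named in this PR), the names are hypothetical:

from enum import Enum

from diffusers import Flux2Pipeline  # requires a diffusers build with FLUX.2 support

class ModelType(str, Enum):
    FLUX_DEV = "flux-dev"      # assumed Flux.1 entry, shown for context
    FLUX_2_DEV = "flux-2-dev"  # added by this PR

# Hypothetical name for the model-to-pipeline map in quantize.py / diffusion_trt.py.
MODEL_TYPE_TO_PIPELINE = {
    ModelType.FLUX_2_DEV: Flux2Pipeline,
}
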
Usage

You can now quantize and export Flux.2-dev using the standard quantization script:

# Example for FP8 quantization
python examples/diffusers/quantization/quantize.py \
    --model flux-2-dev \
    --format fp8 \
    --batch-size 1 \
    --calib-size 128 \
    --quantized-torch-ckpt-save-path ./flux2_fp8.pt \
    --onnx-dir ./onnx_flux2
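
To reload the saved torch checkpoint later without re-calibrating, modelopt's opt-state restore API can be used. A minimal sketch, assuming Flux2Pipeline is available in your diffusers build and the transformer is the quantized module:

import torch
import modelopt.torch.opt as mto
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev", torch_dtype=torch.bfloat16
)
# Restores the quantizer states and weights saved by the script above.
mto.restore(pipe.transformer, "./flux2_fp8.pt")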

Testing

Tested manually by:

  1. successfully quantizing the black-forest-labs/FLUX.2-dev model.
  2. exporting the quantized model to ONNX.
  3. verifying that the ONNX graph input shapes match the Flux.2 requirements (specifically img_ids and txt_ids with the last dimension equal to 4); a check along these lines is sketched after this list.
  4. successfully building a TensorRT engine using trtexec.
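
Step 3 can be reproduced with the standard onnx Python API; the exported filename below is an assumption:

import onnx

model = onnx.load("./onnx_flux2/model.onnx")  # filename is an assumption
for inp in model.graph.input:
    if inp.name in ("img_ids", "txt_ids"):
        dims = inp.type.tensor_type.shape.dim
        last = dims[-1].dim_value
        assert last == 4, f"{inp.name}: expected last dim 4, got {last}"
        print(inp.name, [d.dim_value or d.dim_param for d in dims])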


Additional Information

This PR adds support for the Flux.2 architecture, which introduces Flux2ParallelSelfAttention and increases the positional embedding (RoPE) dimension from 3 to 4, requiring handling distinct from Flux.1.
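
For concreteness, a sketch of the kind of dummy inputs _gen_dummy_inp_and_dyn_shapes_flux2 has to produce; the input names, hidden sizes, and sequence lengths are illustrative assumptions, but the trailing ID dimension of 4 (versus 3 in Flux.1) is the point of the change:

import torch

def _gen_dummy_inp_and_dyn_shapes_flux2(batch=1, latent_len=1024, txt_len=512):
    # Shapes are illustrative; only the trailing dim of the ID tensors is load-bearing.
    dummy = {
        "hidden_states": torch.randn(batch, latent_len, 64),
        "encoder_hidden_states": torch.randn(batch, txt_len, 4096),
        "img_ids": torch.zeros(latent_len, 4),  # Flux.1 used (latent_len, 3)
        "txt_ids": torch.zeros(txt_len, 4),     # Flux.1 used (txt_len, 3)
        "timestep": torch.ones(batch),
    }
    # dynamic_axes-style mapping for torch.onnx.export.
    dyn_shapes = {
        "hidden_states": {0: "batch", 1: "latent_len"},
        "encoder_hidden_states": {0: "batch", 1: "txt_len"},
        "img_ids": {0: "latent_len"},
        "txt_ids": {0: "txt_len"},
    }
    return dummy, dyn_shapes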

@copy-pr-bot

copy-pr-bot bot commented Dec 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ovuruska ovuruska marked this pull request as ready for review December 19, 2025 10:24
@ovuruska ovuruska requested review from a team as code owners December 19, 2025 10:24
@ovuruska ovuruska requested a review from ajrasane December 19, 2025 10:24
Added support for various quantization features including Flux.2-dev and Transformer Engine.

Signed-off-by: Oguz Vuruskaner <ovuruska@outlook.com>
