
Conversation

@ovuruska

What does this PR do?

Type of change: New feature

Overview:
This PR adds support for Flux.2-dev (black-forest-labs/FLUX.2-dev) quantization and ONNX export. It enables users to quantize Flux.2 models (INT8/FP8/FP4) and export them through ONNX to TensorRT by handling the architectural differences in the new model version, specifically the updated RoPE embedding dimensions and attention mechanisms.

Key changes include:

  • Quantization Registry: Registered Flux2Attention and Flux2ParallelSelfAttention in modelopt/torch/quantization/plugins/diffusers.py to enable quantization for the new architecture (a registration sketch follows this list).
  • ONNX Export Logic: Implemented _gen_dummy_inp_and_dyn_shapes_flux2 in export.py to correctly handle Flux.2 input shapes, accounting for the RoPE dimension change (from 3 to 4) and the updated ID tensor shapes (a dummy-input sketch appears under Additional Information below).
  • Pipeline Integration: Added ModelType.FLUX_2_DEV and mapped it to Flux2Pipeline in quantize.py and diffusion_trt.py (a wiring sketch also follows this list).
  • Documentation: Updated examples/diffusers/README.md with the support status and a Flux.2-specific trtexec command.
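
A minimal sketch of the registry change, using modelopt's public QuantModule/TensorQuantizer extension points; the wrapper class, the quantizer placement, and the diffusers import path are illustrative assumptions, not the PR's exact code:

import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.nn import QuantModule, TensorQuantizer
from diffusers.models.transformers.transformer_flux2 import (  # import path may differ
    Flux2Attention,
)

class _QuantFlux2Attention(QuantModule):  # hypothetical wrapper
    def _setup(self):
        # Attach quantizers to the activations feeding the attention matmuls,
        # mirroring how modelopt handles other diffusers attention classes.
        self.q_bmm_quantizer = TensorQuantizer()
        self.k_bmm_quantizer = TensorQuantizer()
        self.v_bmm_quantizer = TensorQuantizer()

mtq.register(original_cls=Flux2Attention, quantized_cls=_QuantFlux2Attention)
# Flux2ParallelSelfAttention gets an analogous wrapper and registration.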

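And a sketch of the pipeline wiring from the third bullet; apart from ModelType.FLUX_2_DEV and Flux2Pipeline (both named in this PR), the names are hypothetical:

from enum import Enum

from diffusers import Flux2Pipeline  # requires a diffusers build with FLUX.2 support

class ModelType(str, Enum):
    FLUX_DEV = "flux-dev"      # assumed Flux.1 entry, shown for context
    FLUX_2_DEV = "flux-2-dev"  # added by this PR

# Hypothetical name for the model-to-pipeline map in quantize.py / diffusion_trt.py.
MODEL_TYPE_TO_PIPELINE = {
    ModelType.FLUX_2_DEV: Flux2Pipeline,
}
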
Usage

You can now quantize and export Flux.2-dev using the standard quantization script:

# Example for FP8 quantization
python examples/diffusers/quantization/quantize.py \
    --model flux-2-dev \
    --format fp8 \
    --batch-size 1 \
    --calib-size 128 \
    --quantized-torch-ckpt-save-path ./flux2_fp8.pt \
    --onnx-dir ./onnx_flux2
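
To reload the saved torch checkpoint later without re-calibrating, modelopt's opt-state restore API can be used. A minimal sketch, assuming Flux2Pipeline is available in your diffusers build and the transformer is the quantized module:

import torch
import modelopt.torch.opt as mto
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev", torch_dtype=torch.bfloat16
)
# Restores the quantizer states and weights saved by the script above.
mto.restore(pipe.transformer, "./flux2_fp8.pt")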

Testing

Tested manually by:

  1. successfully quantizing the black-forest-labs/FLUX.2-dev model.
  2. exporting the quantized model to ONNX.
  3. verifying that the ONNX graph input shapes match the Flux.2 requirements (specifically img_ids and txt_ids with the last dimension equal to 4); a check along these lines is sketched after this list.
  4. successfully building a TensorRT engine using trtexec.
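
Step 3 can be reproduced with the standard onnx Python API; the exported filename below is an assumption:

import onnx

model = onnx.load("./onnx_flux2/model.onnx")  # filename is an assumption
for inp in model.graph.input:
    if inp.name in ("img_ids", "txt_ids"):
        dims = inp.type.tensor_type.shape.dim
        last = dims[-1].dim_value
        assert last == 4, f"{inp.name}: expected last dim 4, got {last}"
        print(inp.name, [d.dim_value or d.dim_param for d in dims])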


Additional Information

This PR adds support for the Flux.2 architecture, which introduces Flux2ParallelSelfAttention and increases the positional embedding (RoPE) dimension from 3 to 4, requiring handling distinct from Flux.1.
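
For concreteness, a sketch of the kind of dummy inputs _gen_dummy_inp_and_dyn_shapes_flux2 has to produce; the input names, hidden sizes, and sequence lengths are illustrative assumptions, but the trailing ID dimension of 4 (versus 3 in Flux.1) is the point of the change:

import torch

def _gen_dummy_inp_and_dyn_shapes_flux2(batch=1, latent_len=1024, txt_len=512):
    # Shapes are illustrative; only the trailing dim of the ID tensors is load-bearing.
    dummy = {
        "hidden_states": torch.randn(batch, latent_len, 64),
        "encoder_hidden_states": torch.randn(batch, txt_len, 4096),
        "img_ids": torch.zeros(latent_len, 4),  # Flux.1 used (latent_len, 3)
        "txt_ids": torch.zeros(txt_len, 4),     # Flux.1 used (txt_len, 3)
        "timestep": torch.ones(batch),
    }
    # dynamic_axes-style mapping for torch.onnx.export.
    dyn_shapes = {
        "hidden_states": {0: "batch", 1: "latent_len"},
        "encoder_hidden_states": {0: "batch", 1: "txt_len"},
        "img_ids": {0: "latent_len"},
        "txt_ids": {0: "txt_len"},
    }
    return dummy, dyn_shapes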

@copy-pr-bot

copy-pr-bot bot commented Dec 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ovuruska ovuruska marked this pull request as ready for review December 19, 2025 10:24
@ovuruska ovuruska requested review from a team as code owners December 19, 2025 10:24
@ovuruska ovuruska requested a review from ajrasane December 19, 2025 10:24
Added support for various quantization features including Flux.2-dev and Transformer Engine.

Signed-off-by: Oguz Vuruskaner <ovuruska@outlook.com>
