MXFP8 training bug fixes for quantized_model_init and Torch FSDP fp8 all gather by sudhu2k · Pull Request #587 · ROCm/TransformerEngine

sudhu2k · 2026-05-15T17:29:08Z

Description

Ensure the keep_fp8_weight_transpose_cache flag is set to True not only under autocast but also under quantized_model_init.
Fix scale-inv padding during the fp8 all-gather.

Fixes: #15425
#15420

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

alextmagro · 2026-05-19T15:37:48Z

+            # NOTE: ROCm/HIP backend uses an unpadded scale-inv layout (see `MXFP8Quantizer.make_empty`),
+            # so applying the padding here would produce a per-shard scale-inv whose dim-0
+            # does not match the destination scale-inv allocated for the FSDP2 local shard.
+            padding_multiples = [128, 4] if not IS_HIP_EXTENSION else [1, 1]


I think gfx1250 has some other padding requirements; this should be unified with #568.

Agreed. These changes should also be present in that PR accordingly. But I think for now, let's fix the issue on existing archs and make the appropriate changes along with the #568 PR.

OK, @matthiasdiener can you work with Sudharshan to make sure this is in your PR one way or another?
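
For readers following this thread, here is a minimal sketch of why the padding multiples matter. It is not the code from this PR: the helper names `round_up` and `rowwise_scale_inv_shape`, the `is_hip` argument, and the block size of 32 are assumptions made for illustration. It only shows how rounding with `[128, 4]` versus `[1, 1]` changes the row-wise scale-inv shape computed for a local shard, which is the mismatch the diff comment describes.

```python
# Illustrative sketch only; helper names and the block size are assumptions,
# not the TransformerEngine implementation.

MXFP8_BLOCK = 32  # assumed: one scale factor per 32 elements along the last dim


def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def rowwise_scale_inv_shape(rows: int, cols: int, is_hip: bool) -> tuple:
    """Shape of the row-wise scale-inv tensor for a (rows, cols) shard."""
    # Mirrors the choice in the diff: padded layout off HIP, unpadded on HIP.
    padding_multiples = [128, 4] if not is_hip else [1, 1]
    return (
        round_up(rows, padding_multiples[0]),
        round_up(cols // MXFP8_BLOCK, padding_multiples[1]),
    )


# A 96 x 64 local shard: the padded and unpadded layouts disagree in both dims.
padded = rowwise_scale_inv_shape(96, 64, is_hip=False)
unpadded = rowwise_scale_inv_shape(96, 64, is_hip=True)
print(padded, unpadded)
```

If the destination scale-inv for the shard was allocated with the unpadded layout, a per-shard scale-inv computed with the padded multiples would not match it in dim 0, which is the failure the NOTE in the diff describes.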

alextmagro · 2026-05-19T22:05:31Z

LGTM! Just sync with Matthias on that one padding thing please.

Ensure keep_fp8_weight_transpose_cache is True even for quantized_mod…

0deec6e

…el_init case and not just autocast case. Fix padding during fp8 all-gather

sudhu2k requested review from ipanfilo, wangye805 and wenchenvincent as code owners May 15, 2026 17:29

sudhu2k self-assigned this May 15, 2026

sudhu2k added the ci-level 3 (CI test level 3) label May 15, 2026

alextmagro requested changes May 19, 2026

Refactor padding logic in MXFP8Tensor to remove unnecessary condition…

d9d7a69

… on scale_inv_out

sudhu2k requested a review from alextmagro May 19, 2026 20:10

alextmagro approved these changes May 19, 2026

sudhu2k wants to merge 2 commits into dev from sudhu/mxfp8_bug_fixes
