ck_tile grouped gemm: more padding by matthiasdiener · Pull Request #574 · ROCm/TransformerEngine

matthiasdiener · 2026-05-05T00:15:07Z

Description

Always enabling padding causes a significant (~15%) slowdown, so enable it only when necessary.
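
The "only when necessary" decision can be illustrated with a minimal sketch. This is not the PR's dispatch code: the helper name `needs_padding` and the tile sizes are hypothetical, and the real ck_tile kernel configuration may use different values. The idea is that a dimension needs padding only if some group's size is not a multiple of the tile size along that dimension.

```python
from typing import Dict, Sequence

# Hypothetical tile sizes (assumption); the real kernel configuration may differ.
TILE = {"M": 256, "N": 256, "K": 64}


def needs_padding(m_vals: Sequence[int], n: int, k: int,
                  tile: Dict[str, int] = TILE) -> Dict[str, bool]:
    """Return, per dimension, whether any group is misaligned to the tile size."""
    return {
        "M": any(m % tile["M"] != 0 for m in m_vals),
        "N": n % tile["N"] != 0,
        "K": k % tile["K"] != 0,
    }


# All dimensions aligned: no padding is needed for any dimension.
aligned = needs_padding([256, 512], n=1024, k=128)

# K = 100 is not a multiple of 64: only K needs padding.
unaligned_k = needs_padding([256, 512], n=1024, k=100)
```

Under these assumptions, only the all-`False` case would take the unpadded (faster) kernel; any `True` entry would select a padded variant for that dimension.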

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

aris134 · 2026-05-20T11:52:23Z

+    reason="Only enable CUTLASS/CK grouped gemm on Hopper or ROCm",
+)
+@pytest.mark.parametrize("dtype", [torch.bfloat16, torch.float16], ids=str)
+@pytest.mark.parametrize("layout", ["TN", "NN", "NT"])


Added TT in aee2c4c.

aris134 · 2026-05-20T12:27:42Z

+            k_val = k_aligned
+            m_vals = [m_aligned] * z
+            n_val = unaligned_n
+


Would we want an MKN unaligned test? Would that cover something that isn't included in the current test sweep?

I added an MKN test in aee2c4c.

aris134 · 2026-05-20T12:40:23Z

+        if pad_dim == "K":
+            k_val = unaligned_k
+            m_vals = [m_aligned] * z
+            n_val = n_aligned
+        elif pad_dim == "M":
+            k_val = k_aligned
+            m_vals = unaligned_m
+            n_val = n_aligned
+        elif pad_dim == "MK":
+            k_val = unaligned_k
+            m_vals = unaligned_m
+            n_val = n_aligned
+        else:  # N
+            k_val = k_aligned
+            m_vals = [m_aligned] * z
+            n_val = unaligned_n


Can we factor out this if-elif-elif-else block that seems repeated for each layout?

Restructured this in aee2c4c
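
One way such a block could be factored out is sketched below. This is an illustration only and not necessarily how aee2c4c restructured the test; the helper name `select_shapes` and its signature are hypothetical. It treats `pad_dim` as a set of letters, so a dimension is unaligned exactly when its letter appears in `pad_dim`, which also covers the `"MKN"` case.

```python
from typing import List, Tuple


def select_shapes(pad_dim: str, z: int,
                  m_aligned: int, k_aligned: int, n_aligned: int,
                  unaligned_m: List[int], unaligned_k: int,
                  unaligned_n: int) -> Tuple[List[int], int, int]:
    """Return (m_vals, k_val, n_val) for the requested padded dimensions."""
    if not pad_dim or set(pad_dim) - set("MKN"):
        raise ValueError(f"unsupported pad_dim: {pad_dim!r}")
    # A dimension uses its unaligned size only if its letter is in pad_dim.
    m_vals = list(unaligned_m) if "M" in pad_dim else [m_aligned] * z
    k_val = unaligned_k if "K" in pad_dim else k_aligned
    n_val = unaligned_n if "N" in pad_dim else n_aligned
    return m_vals, k_val, n_val


# Example sizes are arbitrary and chosen only for illustration.
shapes_mk = select_shapes("MK", z=2, m_aligned=256, k_aligned=128,
                          n_aligned=512, unaligned_m=[100, 200],
                          unaligned_k=70, unaligned_n=300)
```

Each layout branch can then call the helper once instead of repeating the if-elif chain.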

aris134 · 2026-05-20T13:15:41Z

    }
-    return launch_grouped_gemm_kernel<Kernel>(descs, ctx, stream_cfg);
+    // Dispatch with B's columnwise buffer as RowMajor (transB=false).
+    GroupedGemmRunContext ctx_nn = ctx;


nit: ctx_nn seems a bit misleading since this only rewrites B as non-transposed via columnwise_data; A can still be T or N. Maybe rename to something like ctx_b_colwise?

Renamed in aee2c4c.

aris134 · 2026-05-20T13:16:20Z

+        grad = True
+        single_output = True
+    else:  # NT
+        # NT GEMM: out[i] = A[i]^T @ B[i], A[i]: (m_i, k), B[i]: (m_i, n), out[i]: (n, k)


nit: this comment is a little confusing. For the grouped path, the user-facing NT inputs are A=(m_i,k), B=(m_i,n), out=(n,k), but normalization swaps operands/layouts before dispatch, so the actual dispatched gemm is B^T @ A = (n,m_i) @ (m_i,k).

Right, I removed this comment in aee2c4c
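
The shape bookkeeping discussed in this thread can be checked with a small sketch. It tracks shapes only and does not call any TransformerEngine or ck_tile API; the helper names and example sizes are hypothetical. With A=(m_i, k) and B=(m_i, n), the product B^T @ A has shape (n, k), matching the stated output, whereas A^T @ B as written in the removed comment would have shape (k, n).

```python
from typing import Tuple

Shape = Tuple[int, int]


def transpose(shape: Shape) -> Shape:
    """Shape of the transposed matrix."""
    return (shape[1], shape[0])


def matmul_shape(lhs: Shape, rhs: Shape) -> Shape:
    """Shape of lhs @ rhs; raises if the inner dimensions do not match."""
    if lhs[1] != rhs[0]:
        raise ValueError(f"inner dimensions differ: {lhs} @ {rhs}")
    return (lhs[0], rhs[1])


m_i, k, n = 96, 64, 128  # arbitrary example sizes
a_shape = (m_i, k)       # user-facing A
b_shape = (m_i, n)       # user-facing B

# Dispatched GEMM after normalization: B^T @ A = (n, m_i) @ (m_i, k)
dispatched = matmul_shape(transpose(b_shape), a_shape)

# Product as written in the removed comment: A^T @ B = (k, m_i) @ (m_i, n)
as_commented = matmul_shape(transpose(a_shape), b_shape)
```

The two results are transposes of each other, which is why the original comment read as confusing next to the stated output shape (n, k).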

ck_tile grouped gemm: more padding

95f984c

matthiasdiener requested a review from sudhu2k May 5, 2026 00:15

matthiasdiener self-assigned this May 5, 2026

matthiasdiener requested review from aris134 May 5, 2026 15:36

aris134 reviewed May 6, 2026

Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_fp16.cpp

aris134 reviewed May 6, 2026

Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_common.h Outdated

aris134 reviewed May 6, 2026

Comment thread tests/pytorch/test_numerics.py Outdated

aris134 reviewed May 6, 2026

Comment thread tests/pytorch/test_numerics.py Outdated

aris134 requested changes May 6, 2026

matthiasdiener added the ci-level 1 label (CI test level 1) May 15, 2026

matthiasdiener added 3 commits May 15, 2026 19:57

Merge branch 'dev' into mdiener/cktile-grouped-gemm-padding

cfbc537

address review comments

225c3dc

NT workaround, split, address review comments

2939017

matthiasdiener marked this pull request as ready for review May 15, 2026 22:24

matthiasdiener requested review from ipanfilo, wangye805 and wenchenvincent as code owners May 15, 2026 22:24

matthiasdiener requested a review from aris134 May 15, 2026 22:24

aris134 reviewed May 19, 2026

Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_fp16_nn.cpp

aris134 requested changes May 19, 2026

matthiasdiener added 2 commits May 19, 2026 15:51

Merge remote-tracking branch 'origin/dev' into mdiener/cktile-grouped-gemm-padding

fa87ccc

factor out templating

01f62d0

matthiasdiener requested a review from aris134 May 19, 2026 19:13

aris134 reviewed May 20, 2026

aris134 requested changes May 20, 2026

matthiasdiener added 2 commits May 20, 2026 16:59

address review comments, capture fallbacks

aee2c4c

Merge remote-tracking branch 'origin/dev' into mdiener/cktile-grouped-gemm-padding

f830b89

matthiasdiener requested a review from aris134 May 20, 2026 22:06

matthiasdiener wants to merge 8 commits into dev from mdiener/cktile-grouped-gemm-padding
