
Conversation

@matthiasdiener (Contributor) commented Jan 28, 2026

Description

See https://github.com/ROCm/frameworks-internal/issues/13792 for context.

TODOs:

  • Enable tests in test_numerics.py
  • Make kernels selectable & tunable
  • Handle gelu/bias (or make sure these are not passed in)
  • Performance analysis and improvements
  • More tests

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@matthiasdiener matthiasdiener self-assigned this Jan 28, 2026
@matthiasdiener matthiasdiener changed the title [WIP] proof-of-concept: grouped GEMM with ck_tile [WIP] Grouped GEMM with ck_tile Jan 29, 2026
delay_wgrad_compute,
):
os.environ["NVTE_USE_CUTLASS_GROUPED_GEMM"] = "1"
if IS_HIP_EXTENSION:
Collaborator commented:
Is our CK grouped GEMM a drop-in replacement for NV upstream's CUTLASS grouped GEMM? If so, we can share the same env var. It's analogous to cublasLt vs. hipblasLt...
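A minimal sketch of what sharing one env var across both backends could look like (hypothetical helper, not the PR's code — `grouped_gemm_backend_enabled` is an illustrative name; TE's actual code uses `transformer_engine::getenv<bool>`):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical sketch: gate both the NV CUTLASS path and the ROCm CK path
// behind the same env var, so callers (e.g. the Python test that sets
// NVTE_USE_CUTLASS_GROUPED_GEMM) don't need a platform-specific switch.
inline bool grouped_gemm_backend_enabled() {
  const char* v = std::getenv("NVTE_USE_CUTLASS_GROUPED_GEMM");
  return v != nullptr && std::string(v) == "1";
}
```

The platform-specific dispatch (CUTLASS vs. CK) would then happen behind this single check, mirroring the cublasLt/hipblasLt pattern.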

Comment on lines +21 to +81
struct TileCfg_basic {
  static constexpr ck_tile::index_t M_Tile = 256;
  static constexpr ck_tile::index_t N_Tile = 128;
  static constexpr ck_tile::index_t K_Tile = 64;

  static constexpr ck_tile::index_t M_Warp = 2;
  static constexpr ck_tile::index_t N_Warp = 2;
  static constexpr ck_tile::index_t K_Warp = 1;

  static constexpr ck_tile::index_t M_Warp_Tile = 32;
  static constexpr ck_tile::index_t N_Warp_Tile = 32;
  static constexpr ck_tile::index_t K_Warp_Tile = 16;

  static constexpr bool kPadM = true;
  static constexpr bool kPadN = true;
  static constexpr bool kPadK = true;

  static constexpr bool DoubleSmemBuffer = false;

  static constexpr ck_tile::index_t TilePartitionerGroupNum = 8;
  static constexpr ck_tile::index_t TilePartitionerM01 = 1;
};

template <typename AType, typename BType, typename CType,
          typename ALayout, typename BLayout, typename CLayout,
          typename TileCfg, ck_tile::memory_operation_enum MemOp,
          typename AccType = float>
class Runner {
 public:
  using GemmShape = ck_tile::TileGemmShape<
      ck_tile::sequence<TileCfg::M_Tile, TileCfg::N_Tile, TileCfg::K_Tile>,
      ck_tile::sequence<TileCfg::M_Warp, TileCfg::N_Warp, TileCfg::K_Warp>,
      ck_tile::sequence<TileCfg::M_Warp_Tile, TileCfg::N_Warp_Tile, TileCfg::K_Warp_Tile>>;

  using Partitioner = ck_tile::GemmSpatiallyLocalTilePartitioner<
      GemmShape, TileCfg::TilePartitionerGroupNum, TileCfg::TilePartitionerM01>;

  using UniversalTraits = ck_tile::PersistentTileGemmUniversalTraits<
      TileCfg::kPadM, TileCfg::kPadN, TileCfg::kPadK,
      TileCfg::DoubleSmemBuffer, ALayout, BLayout, CLayout>;

  static constexpr ck_tile::GemmPipelineScheduler Scheduler =
      ck_tile::GemmPipelineScheduler::Intrawave;

  using Problem = ck_tile::UniversalGemmPipelineProblem<
      AType, BType, AccType, GemmShape, UniversalTraits, Scheduler>;

  using Pipeline = ck_tile::GemmPipelineAgBgCrCompV3<Problem>;

  using Epilogue = ck_tile::CShuffleEpilogue<
      ck_tile::CShuffleEpilogueProblem<
          AType, BType, ck_tile::tuple<>, AccType,
          CType, ck_tile::tuple<>, CLayout,
          ck_tile::element_wise::PassThrough,
          Partitioner::MPerBlock, Partitioner::NPerBlock,
          TileCfg::M_Warp, TileCfg::N_Warp,
          TileCfg::M_Warp_Tile, TileCfg::N_Warp_Tile, TileCfg::K_Warp_Tile,
          Problem::TransposeC, MemOp>>;

  using Kernel = ck_tile::GroupedGemmKernel<Partitioner, Pipeline, Epilogue>;
};
Collaborator commented:
Is this code from the CK repo? If so, can you add a comment pointing to the reference?

Comment on lines +120 to +121
std::vector<ck_tile::GroupedGemmHostArgs<0>> descs;
descs.reserve(group_num);
Collaborator commented:
Why not pass group_num to the descs vector's constructor instead?
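For context, the two options differ in initialization semantics; a standalone illustration (the `Desc` struct and `make_descs_reserve` helper here are hypothetical stand-ins for the PR's `GroupedGemmHostArgs` descriptors):

```cpp
#include <cassert>
#include <vector>

// Stand-in for a per-group GEMM descriptor.
struct Desc { int m = 0, n = 0, k = 0; };

// reserve() only allocates capacity (size stays 0), so each descriptor is
// constructed exactly once by push_back. Passing group_num to the vector's
// constructor instead would value-initialize group_num elements up front,
// which would then have to be assigned rather than constructed in place.
std::vector<Desc> make_descs_reserve(int group_num) {
  std::vector<Desc> descs;
  descs.reserve(group_num);          // capacity == group_num, size == 0
  for (int i = 0; i < group_num; ++i)
    descs.push_back(Desc{i, i, i});  // construct each entry in place
  return descs;
}
```

For a trivially-constructible descriptor type the difference is mostly stylistic; for non-trivial types, reserve + push_back avoids the redundant default construction.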

Comment on lines +111 to +112
using R = Runner<T, T, T, ALayout, BLayout, CLayout, TileCfg_basic, MemOp>;
using Kernel = typename R::Kernel;
Collaborator commented:
This R is not used anywhere else

Comment on lines +128 to +131
if (a.shape.size() != 2 || b.shape.size() != 2 || d.shape.size() != 2) {
NVTE_ERROR("grouped_gemm_ck_tile: expected all groups to be 2D.");
return false;
}
Collaborator commented:
Does grouped GEMM support generalized matrices from higher-dimensional tensors? Regular GEMM supports that, and TE treats the last dimension as the column with all other dimensions flattened into the row:

  size_t flat_first_dim() const {
    const auto &full_shape = shape();
    size_t ret = 1;
    if (!full_shape.empty()) {
      for (size_t i = 0; i < full_shape.size() - 1; i++) {
        ret *= full_shape[i];
      }
    }
    return ret;
  }
  /*! Matrix width after tensor is flattened to 2D
   *
   * If a tensor has dimensions (D1, D2, ..., Dn), it is reinterpreted
   * as a (D1*D2*...*D(n-1), Dn) matrix.
   */
  size_t flat_last_dim() const {
    const auto &full_shape = shape();
    if (full_shape.empty()) {
      return 1;
    } else {
      return full_shape.back();
    }
  }
};

}
}
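A self-contained restatement of that flattening convention, operating on a plain shape vector rather than TE's Tensor class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A tensor of shape (D1, ..., Dn) is viewed as a (D1*...*D(n-1), Dn) matrix:
// the product of all leading dimensions forms the row count, the last
// dimension forms the column count. Empty shapes flatten to a 1x1 matrix.
size_t flat_first_dim(const std::vector<size_t>& shape) {
  size_t ret = 1;
  for (size_t i = 0; i + 1 < shape.size(); ++i) ret *= shape[i];
  return ret;
}

size_t flat_last_dim(const std::vector<size_t>& shape) {
  return shape.empty() ? 1 : shape.back();
}
```

So a (2, 3, 4) tensor is treated as a 6x4 matrix; if the CK path only accepts strictly 2D groups, callers with higher-rank tensors would need this flattening applied before dispatch.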

bool grouped_gemm_ck_tile(const NVTETensor* A,
Collaborator commented:
Why do we overload this function? In cublaslt_gemm.cu it is only called with this signature. Perhaps we can rename the grouped_gemm_ck_tile in line 255 instead.

transformer_engine::getenv<bool>("NVTE_CK_GROUPED_GEMM_WARN_FALLBACK", false);

auto is_supported_dtype = [&]() -> bool {
auto *inputA = transformer_engine::convertNVTETensorCheck(A[0]);
Collaborator commented:

Is it possible that num_group == 0, so the A[0] access is invalid?
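The concern can be addressed with an early-out before touching the first group. A hypothetical sketch — `Tensor` and `first_group_dtype` are illustrative stand-ins, not TE's NVTETensor API:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Stand-in for a tensor handle carrying a dtype tag.
struct Tensor { int dtype; };

// Guard against an empty group list before indexing A[0]; returns nullopt
// when there are no groups, so the caller can fall back gracefully.
std::optional<int> first_group_dtype(const std::vector<Tensor>& A) {
  if (A.empty()) return std::nullopt;  // num_group == 0: nothing to inspect
  return A[0].dtype;
}
```

In the PR's code the equivalent would be checking num_gemms before calling convertNVTETensorCheck(A[0]) inside the is_supported_dtype lambda.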

set(CK_ROOT ${CMAKE_SOURCE_DIR}/../../3rdparty/aiter/3rdparty/composable_kernel)

target_include_directories(transformer_engine
BEFORE PRIVATE
Collaborator commented:
Why use the keyword BEFORE in this target_include_directories? Is it because CMake would otherwise not find the correct header files without prioritizing the CK include dirs?

target_include_directories(transformer_engine PUBLIC
"${CMAKE_CURRENT_SOURCE_DIR}/include")

set(CK_ROOT ${CMAKE_SOURCE_DIR}/../../3rdparty/aiter/3rdparty/composable_kernel)
Collaborator commented:
CMAKE_SOURCE_DIR --> CMAKE_CURRENT_SOURCE_DIR? Not sure whether other upstream libs will depend on us, but let's make it future-proof.

#include "common/util/cuda_runtime.h"
#include "common/util/system.h"
#ifndef __HIP_PLATFORM_AMD__
#include "cutlass_grouped_gemm.cuh"
Collaborator commented:
NV upstream put their cutlass_grouped_gemm in a separate .cu file and compiled it separately. Maybe we can follow their structure for better isolation (avoiding CK-defined macros contaminating our cublaslt_gemm.cu).
