
tune triton gemm kernel for MI355 DSV3 DP+EP configuration #2016

Open

inkcherry wants to merge 3 commits into ROCm:main from inkcherry:tune

Conversation

inkcherry commented on Feb 10, 2026

Decode-side benchmark results. cc @Duyi-Wang

| M | N | K | Time_old (ms) | Time_new (ms) | Speedup (old/new) | Delta (new-old, ms) |
| --- | --- | --- | --- | --- | --- | --- |
| 16 | 2112 | 7168 | 0.055951 | 0.056129 | 0.997 | 0.000178 |
| 32 | 2112 | 7168 | 0.048471 | 0.048778 | 0.994 | 0.000307 |
| 64 | 2112 | 7168 | 0.046766 | 0.047719 | 0.980 | 0.000953 |
| 128 | 2112 | 7168 | 0.047608 | 0.047576 | 1.001 | -0.000032 |
| 256 | 2112 | 7168 | 0.097757 | 0.046720 | 2.092 | -0.051037 |
| 16 | 4096 | 7168 | 0.055328 | 0.054776 | 1.010 | -0.000552 |
| 32 | 4096 | 7168 | 0.057020 | 0.054329 | 1.050 | -0.002691 |
| 64 | 4096 | 7168 | 0.055331 | 0.056177 | 0.985 | 0.000846 |
| 128 | 4096 | 7168 | 0.049443 | 0.048664 | 1.016 | -0.000779 |
| 256 | 4096 | 7168 | 0.109374 | 0.046923 | 2.331 | -0.062451 |
| 16 | 7168 | 16384 | 0.183842 | 0.055107 | 3.336 | -0.128735 |
| 32 | 7168 | 16384 | 0.192137 | 0.053328 | 3.603 | -0.138809 |
| 64 | 7168 | 16384 | 0.200794 | 0.048574 | 4.134 | -0.152220 |
| 128 | 7168 | 16384 | 0.220417 | 0.052922 | 4.165 | -0.167495 |
| 256 | 7168 | 16384 | 0.208873 | 0.085082 | 2.455 | -0.123791 |
| 16 | 7168 | 2048 | 0.055840 | 0.055515 | 1.006 | -0.000325 |
| 32 | 7168 | 2048 | 0.056272 | 0.024918 | 2.258 | -0.031354 |
| 64 | 7168 | 2048 | 0.024018 | 0.023520 | 1.021 | -0.000498 |
| 128 | 7168 | 2048 | 0.020884 | 0.021803 | 0.958 | 0.000919 |
| 256 | 7168 | 2048 | 0.032093 | 0.020644 | 1.555 | -0.011449 |
| 16 | 16384 | 1536 | 0.024423 | 0.020252 | 1.206 | -0.004171 |
| 32 | 16384 | 1536 | 0.025440 | 0.020578 | 1.236 | -0.004862 |
| 64 | 16384 | 1536 | 0.025345 | 0.018211 | 1.392 | -0.007134 |
| 128 | 16384 | 1536 | 0.026026 | 0.018262 | 1.425 | -0.007764 |
| 256 | 16384 | 1536 | 0.026593 | 0.022947 | 1.159 | -0.003646 |
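
For context, per-shape timings like these are typically collected with `triton.testing.do_bench`; below is a minimal sketch of such a measurement loop, where `gemm_fn` is a placeholder for the A8W8 block-scale GEMM under test (an assumption, not the aiter API):

```python
import torch
import triton.testing

# Hypothetical stand-in for the A8W8 block-scale GEMM being benchmarked.
def gemm_fn(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a @ b

shapes = [(16, 2112, 7168), (256, 4096, 7168), (128, 7168, 16384)]
for m, n, k in shapes:
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    ms = triton.testing.do_bench(lambda: gemm_fn(a, b))  # measured runtime in ms
    print(f"M={m} N={n} K={k}: {ms:.6f} ms")
```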

inkcherry requested review from a team and Copilot on February 10, 2026 04:41
Copilot AI (Contributor) left a comment

Pull request overview

Updates Triton GEMM tuning configs for gfx950 (MI355), targeting the A8W8 block-scale path for specific (N, K) shapes and small-M specializations.

Changes:

  • Add new tuned config files for (N=7168, K=16384) and (N=16384, K=1536).
  • Refine existing per-M-threshold tuning parameters (block sizes, warps/stages, waves_per_eu, k-splitting).
  • Add/adjust ultra-small-M special cases (e.g., M_LEQ_8) for some shapes; see the selection sketch after this list.
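
For reference, a minimal sketch of how such per-M buckets could be resolved at dispatch time, assuming keys of the form `M_LEQ_<threshold>` with an `"any"` fallback as seen in these config files; the actual selection logic in aiter may differ:

```python
import json

def select_gemm_config(config_path: str, m: int) -> dict:
    """Pick the tuning config for a given M: tightest matching M_LEQ_<t> bucket, else 'any'."""
    with open(config_path) as f:
        configs = json.load(f)
    # Gather M_LEQ_<threshold> keys, sorted by threshold ascending so the
    # tightest applicable bucket wins (e.g. M_LEQ_8 before M_LEQ_256).
    buckets = sorted(
        (int(key.rsplit("_", 1)[1]), key)
        for key in configs
        if key.startswith("M_LEQ_")
    )
    for threshold, key in buckets:
        if m <= threshold:
            return configs[key]
    return configs["any"]

# Hypothetical usage against one of the retuned files:
# cfg = select_gemm_config(
#     "aiter/ops/triton/configs/gemm/gfx950-GEMM-A8W8_BLOCKSCALE-N=2112-K=7168.json", m=8)
```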

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| aiter/ops/triton/configs/gemm/gfx950-GEMM-A8W8_BLOCKSCALE-N=7168-K=2048.json | Retunes per-M configs and adds an M_LEQ_8 specialization for K=2048. |
| aiter/ops/triton/configs/gemm/gfx950-GEMM-A8W8_BLOCKSCALE-N=7168-K=16384.json | Adds a new tuned config for larger K=16384. |
| aiter/ops/triton/configs/gemm/gfx950-GEMM-A8W8_BLOCKSCALE-N=4096-K=7168.json | Retunes per-M configs and adds an M_LEQ_8 specialization for N=4096/K=7168. |
| aiter/ops/triton/configs/gemm/gfx950-GEMM-A8W8_BLOCKSCALE-N=2112-K=7168.json | Retunes per-M configs, adds M_LEQ_8, and introduces M_LEQ_256. |
| aiter/ops/triton/configs/gemm/gfx950-GEMM-A8W8_BLOCKSCALE-N=16384-K=1536.json | Adds a new tuned config for N=16384/K=1536. |


Comment on lines +62 to +73
"any": {
"BLOCK_SIZE_M": 64,
"BLOCK_SIZE_N": 64,
"BLOCK_SIZE_K": 128,
"GROUP_SIZE_M": 8,
"num_warps": 4,
"num_stages": 2,
"waves_per_eu": 1,
"matrix_instr_nonkdim": 16,
"cache_modifier": null,
"NUM_KSPLIT": 1
}

Copilot AI Feb 10, 2026


This JSON appears to be missing the final closing brace for the root object. The last line closes the any object, but there is no subsequent } to close the top-level {, making the file invalid JSON. Add a final } at end-of-file.
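
A quick guard against this class of error is to round-trip every config file through `json.load`, e.g. as a CI check; a minimal sketch follows (the glob path is illustrative):

```python
import glob
import json

# Raises json.JSONDecodeError on any malformed file, e.g. a missing closing brace.
for path in glob.glob("aiter/ops/triton/configs/gemm/*.json"):
    with open(path) as f:
        json.load(f)
    print(f"valid JSON: {path}")
```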

