[AMD] Add DeepSeek-R1-0528 FP8 MI355X ATOM MTP3 benchmark by seungrokj · Pull Request #1628 · SemiAnalysisAI/InferenceX

seungrokj · 2026-05-31T05:20:58Z

Summary

Update dsr1-fp8-mi355x-atom-mtp image from rocm/atom:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom20260511 to rocm/atom-dev:nightly_202605301523
Source: ATOM upstream benchmark run #26690241645 (all jobs passed)

Performance vs current InferenceX numbers (DeepSeek-R1-0528 fp8, spec_method=mtp, TP=8, 8×MI355X)

isl=1024, osl=1024

Conc	InferenceX tput/GPU	ATOM tput/GPU	Delta
4	67.14	145.20	+116.3%
8	106.22	201.45	+89.7%
16	182.50	384.17	+110.5%
32	303.67	642.73	+111.7%
64	446.65	877.83	+96.5%
128	659.22	1273.90	+93.2%
256	944.95	1805.72	+91.1%

isl=8192, osl=1024

Conc	InferenceX tput/GPU	ATOM tput/GPU	Delta
4	278.04	643.18	+131.3%
8	430.54	955.00	+121.8%
16	701.15	1411.93	+101.4%
32	1078.31	1757.25	+63.0%
64	1449.00	2361.44	+63.0%
128	1895.33	3008.09	+58.7%
256	2307.03	3392.79	+47.1%
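
The Delta column is the relative change in per-GPU throughput, i.e. (ATOM / InferenceX minus 1) times 100. A minimal sketch for re-deriving it from the two throughput columns is shown below; it uses awk for the floating-point math, and the function name `delta_pct` is illustrative rather than part of this PR.

```shell
# Hypothetical helper (not part of the PR): percent change from a baseline
# throughput ($1) to a new throughput ($2), printed with a sign and one decimal.
delta_pct() {
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%+.1f%%\n", (cur / old - 1) * 100 }'
}

delta_pct 67.14 145.20      # isl=1024, conc=4   -> +116.3%
delta_pct 2307.03 3392.79   # isl=8192, conc=256 -> +47.1%
```

Running the same helper over the remaining rows is a quick way to confirm the table before comparing against a fresh benchmark run.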

Test plan

Verify server starts with new image rocm/atom-dev:nightly_202605301523
Confirm benchmark numbers match expected tput/GPU values above

🤖 Generated with Claude Code

Note

Low Risk
Benchmark config and shell script only; no auth or production serving paths. Main operational impact is longer/higher-concurrency CI sweeps on MI355X.

Overview
Updates the DeepSeek-R1 FP8 MI355X ATOM MTP benchmark to a newer ROCm/ATOM stack and widens the throughput sweep.

For dsr1-fp8-mi355x-atom-mtp in .github/configs/amd-master.yaml, the container moves from rocm/atom:rocm7.2.3_..._atom20260511 to rocm/atom:rocm7.2.4_..._atom0.1.3, and fixed-seq-len search space conc-end rises from 256 to 1024 for both isl=1024 and isl=8192 (TP=8, MTP).
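
To make the widened search space concrete, the sketch below enumerates concurrency points from a start value up to conc-end. It assumes the sweep doubles at each step, as the 4 to 256 rows in the tables above suggest; the actual step rule lives in the benchmark tooling and is not shown in this PR. `conc_points` is a hypothetical name.

```shell
# Hypothetical helper (not part of the PR): list concurrency points from
# conc-start ($1) to conc-end ($2), assuming each step doubles the previous one.
conc_points() {
    c="$1"
    out=""
    while [ "$c" -le "$2" ]; do
        out="${out:+$out }$c"   # append with a separating space after the first item
        c=$((c * 2))
    done
    printf '%s\n' "$out"
}

conc_points 4 256    # 4 8 16 32 64 128 256
conc_points 4 1024   # 4 8 16 32 64 128 256 512 1024
```

Under that assumption, raising conc-end from 256 to 1024 adds two points (512 and 1024) per sequence-length scenario.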

benchmarks/single_node/dsr1_fp8_mi355x_atom_mtp.sh no longer sets --max-model-len from ISL/OSL (empty by default; eval-only still uses EVAL_MAX_MODEL_LEN). Server launch is refactored into PARALLEL_ARGS (TP, optional --enable-dp-attention and --enable-expert-parallel when DP_ATTENTION/EP_SIZE apply) and SPEC_ARGS for MTP (--method mtp --num-speculative-tokens 3).

perf-changelog.yaml records the image bump and reported ~47–131% throughput gains vs prior InferenceX numbers for conc 4–256.

^{Reviewed by Cursor Bugbot for commit c1e8e6f. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-05-31T05:21:06Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

…attn support
- Switch image from nightly_202605301523 to stable atom0.1.3
- Expand concurrency search space from 4-256 to 4-1024
- Refactor benchmark script to use PARALLEL_ARGS/SPEC_ARGS pattern
- Remove ISL/OSL-based max-model-len calculation
- Update perf-changelog image reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor · 2026-05-31T05:32:28Z

+    else #DP+TP
+        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
+    fi
+fi 


Expert parallel ignores EP_SIZE alone

Medium Severity

The script only adds --enable-expert-parallel when DP_ATTENTION is true and EP_SIZE is greater than 1. The prior logic enabled expert parallel whenever EP_SIZE exceeded 1, matching how master config passes EP_SIZE independently of dp-attn.

^{Reviewed by Cursor Bugbot for commit a251a0d. Configure here.}
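
A minimal sketch of the gating the review asks for, where `--enable-expert-parallel` depends on EP_SIZE alone and `--enable-dp-attention` depends on DP_ATTENTION alone. The function name `parallel_args` is hypothetical, and the flags are returned as a plain string for portability, whereas the PR's script stores them in the PARALLEL_ARGS bash array.

```shell
# Hypothetical helper (not the PR's code): build the parallelism flags.
# $1 = TP, $2 = DP_ATTENTION ("true" or "false"), $3 = EP_SIZE (defaults to 1)
parallel_args() {
    args="-tp $1"
    if [ "$2" = "true" ]; then
        args="$args --enable-dp-attention"
    fi
    # Expert parallelism is gated on EP_SIZE only, independent of DP attention.
    if [ "${3:-1}" -gt 1 ]; then
        args="$args --enable-expert-parallel"
    fi
    printf '%s\n' "$args"
}

parallel_args 8 false 8   # -tp 8 --enable-expert-parallel
parallel_args 8 true 8    # -tp 8 --enable-dp-attention --enable-expert-parallel
parallel_args 8 true 1    # -tp 8 --enable-dp-attention
```

With this shape, a matrix entry that sets EP_SIZE greater than 1 and dp-attn false still gets the expert-parallel flag.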

github-actions · 2026-05-31T05:32:51Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26704356763
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26704356763

github-actions · 2026-05-31T05:42:50Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26704542048
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26704542048

cursor · 2026-05-31T05:44:17Z

-if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
-    CALCULATED_MAX_MODEL_LEN=""
-else
-    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "


8192 ISL missing max-model-len

Medium Severity

The script no longer sets --max-model-len 10240 for the isl=8192/osl=1024 scenario in amd-master.yaml. Other atom MTP launchers still apply that cap for non-1024 inputs, so long-context runs may use a different KV budget than before and can skew or destabilize high-concurrency sweeps.

^{Reviewed by Cursor Bugbot for commit 4a32700. Configure here.}
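
One way to address this would be to restore the removed branch. The sketch below mirrors the deleted logic shown in the diff: no flag for the 1k/1k case and `--max-model-len 10240` otherwise. The function name `max_model_len_arg` is hypothetical, and whether the script should restore the cap or rely on the server default is a decision for the PR author.

```shell
# Hypothetical helper (not the PR's code): reproduce the removed ISL/OSL branch.
# $1 = ISL, $2 = OSL
max_model_len_arg() {
    if [ "$1" = "1024" ] && [ "$2" = "1024" ]; then
        printf '%s\n' ""                        # 1k/1k: no explicit cap
    else
        printf '%s\n' "--max-model-len 10240"   # all other sequence lengths
    fi
}

max_model_len_arg 1024 1024   # prints an empty line
max_model_len_arg 8192 1024   # --max-model-len 10240
```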

cursor · 2026-05-31T05:44:17Z

+    else #DP+TP
+        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
+    fi
+fi 


EP ignored without DP attention

Medium Severity

--enable-expert-parallel is only added when DP_ATTENTION is true. Config YAML can set EP_SIZE greater than 1 with dp-attn false; the prior script and other atom benchmarks still enable expert parallel from EP_SIZE alone, so those matrix entries would silently run without EP.

^{Reviewed by Cursor Bugbot for commit 4a32700. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 5 total unresolved issues (including 3 from previous reviews).

^{Reviewed by Cursor Bugbot for commit c1e8e6f. Configure here.}

cursor · 2026-05-31T05:47:43Z

+    else #DP+TP
+        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
+    fi
+fi 


Expert parallel ignores EP_SIZE

Medium Severity

--enable-expert-parallel is only added when DP_ATTENTION is true, so configs with EP_SIZE greater than 1 and dp-attn: false no longer enable expert parallelism on the ATOM server. Other ATOM benchmark scripts gate that flag on EP_SIZE alone, matching how the YAML passes ep.

^{Reviewed by Cursor Bugbot for commit c1e8e6f. Configure here.}

cursor · 2026-05-31T05:47:43Z

-if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
-    CALCULATED_MAX_MODEL_LEN=""
-else
-    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "


8k1k missing max model len

Medium Severity

The ISL/OSL branch that passed --max-model-len 10240 for non-1k1k runs was removed, so the isl=8192/osl=1024 scenario in amd-master.yaml starts the server with no explicit context limit while still driving 8192-token inputs. Peer dsr1_fp8_mi355x_atom.sh still sets that flag for the same sequence lengths.

^{Reviewed by Cursor Bugbot for commit c1e8e6f. Configure here.}

github-actions · 2026-05-31T09:00:49Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26704598283
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26704598283

github-actions · 2026-05-31T12:26:25Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26704598283
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26704598283

github-actions · 2026-05-31T13:53:54Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26712556288
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26712556288

functionstackx

The verification run was cancelled by an AMD engineer, bill. Any reason why? @seungrokj

feat(atom/dsr1-fp8-mi355x-mtp): update ATOM image to nightly_20260530…

76a6073

…1523 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

seungrokj requested a review from a team May 31, 2026 05:20

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners May 31, 2026 05:21

github-project-automation Bot added this to InferenceMAX Board May 31, 2026

chore: update perf-changelog pr-link to #1628

621f897

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

seungrokj changed the title ~~feat(atom): update dsr1-fp8-mi355x-atom-mtp image to nightly_202605301523~~ [AMD] Add DeepSeek-R1-0528 FP8 MI355X ATOM MTP3 benchmark May 31, 2026

claude Bot reviewed May 31, 2026

Comment thread perf-changelog.yaml

Comment thread perf-changelog.yaml

functionstackx added the full-sweep-enabled label May 31, 2026

cursor Bot reviewed May 31, 2026

dsr1_fp8_mi355x_atom_mtp.sh: remove trailing whitespace

4a32700

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed May 31, 2026

perf-changelog: fix unclosed quote in dsr1-fp8-mi355x-atom-mtp entry

c1e8e6f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj added the AMD label May 31, 2026

cursor Bot reviewed May 31, 2026

functionstackx added full-sweep-enabled and removed full-sweep-enabled labels May 31, 2026

functionstackx requested changes May 31, 2026

2 participants