[AMD] Add DeepSeek-R1-0528 FP8 MI355X ATOM MTP3 benchmark#1628
seungrokj wants to merge 5 commits into
…1523 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
2 similar comments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…attn support
- Switch image from nightly_202605301523 to stable atom0.1.3
- Expand concurrency search space from 4-256 to 4-1024
- Refactor benchmark script to use PARALLEL_ARGS/SPEC_ARGS pattern
- Remove ISL/OSL-based max-model-len calculation
- Update perf-changelog image reference
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    else #DP+TP
        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
    fi
fi
Expert parallel ignores EP_SIZE alone
Medium Severity
The script only adds --enable-expert-parallel when DP_ATTENTION is true and EP_SIZE is greater than 1. The prior logic enabled expert parallel whenever EP_SIZE exceeded 1, matching how master config passes EP_SIZE independently of dp-attn.
Reviewed by Cursor Bugbot for commit a251a0d.
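One way to address this finding is sketched below. It is only an illustration, not the PR's actual script: `build_parallel_args` is a hypothetical helper name, and the sketch assumes `DP_ATTENTION` arrives as the string `true` or `false` and `EP_SIZE` as an integer. The point it shows is that `--enable-expert-parallel` is decided by `EP_SIZE` alone, independently of the DP-attention branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): builds the PARALLEL_ARGS array
# from TP, DP_ATTENTION and EP_SIZE.
build_parallel_args() {
    local tp="$1" dp_attention="$2" ep_size="$3"
    PARALLEL_ARGS=(-tp "$tp")
    # DP attention is optional and only affects its own flag.
    if [ "$dp_attention" = "true" ]; then
        PARALLEL_ARGS+=(--enable-dp-attention)
    fi
    # Expert parallel is gated on EP_SIZE alone, as the review suggests.
    if [ "${ep_size:-1}" -gt 1 ]; then
        PARALLEL_ARGS+=(--enable-expert-parallel)
    fi
}

# Example: EP_SIZE=8 with dp-attn false still enables expert parallel.
build_parallel_args 8 false 8
echo "${PARALLEL_ARGS[*]}"
```

With this shape, a matrix entry that sets ep greater than 1 and dp-attn false would no longer silently run without EP.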
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26704356763
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26704542048
if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
    CALCULATED_MAX_MODEL_LEN=""
else
    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "
8192 ISL missing max-model-len
Medium Severity
The script no longer sets --max-model-len 10240 for the isl=8192/osl=1024 scenario in amd-master.yaml. Other atom MTP launchers still apply that cap for non-1024 inputs, so long-context runs may use a different KV budget than before and can skew or destabilize high-concurrency sweeps.
Reviewed by Cursor Bugbot for commit 4a32700.
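If the cap is meant to stay, the removed branch could be restored in array form, roughly as sketched here. The helper name `max_model_len_args` and the array `MAX_MODEL_LEN_ARGS` are hypothetical; only the 1024/1024 condition and the `--max-model-len 10240` value come from the diff above. For the 8k1k scenario the cap leaves headroom, since 8192 + 1024 = 9216 tokens, which is below 10240.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): mirrors the removed ISL/OSL branch,
# but returns the flag as an array so it can be appended to other launch args.
max_model_len_args() {
    local isl="$1" osl="$2"
    MAX_MODEL_LEN_ARGS=()
    # Only the 1k1k scenario runs without an explicit context limit.
    if [ "$isl" != "1024" ] || [ "$osl" != "1024" ]; then
        MAX_MODEL_LEN_ARGS=(--max-model-len 10240)
    fi
}

# Example: the isl=8192/osl=1024 scenario gets the explicit cap.
max_model_len_args 8192 1024
echo "${MAX_MODEL_LEN_ARGS[*]}"
```

Using an array rather than a padded string avoids relying on word splitting when the flag is passed to the server command.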
    else #DP+TP
        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
    fi
fi
EP ignored without DP attention
Medium Severity
--enable-expert-parallel is only added when DP_ATTENTION is true. Config YAML can set EP_SIZE greater than 1 with dp-attn false; the prior script and other atom benchmarks still enable expert parallel from EP_SIZE alone, so those matrix entries would silently run without EP.
Reviewed by Cursor Bugbot for commit 4a32700.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 5 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit c1e8e6f.
    else #DP+TP
        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
    fi
fi
Expert parallel ignores EP_SIZE
Medium Severity
--enable-expert-parallel is only added when DP_ATTENTION is true, so configs with EP_SIZE greater than 1 and dp-attn: false no longer enable expert parallelism on the ATOM server. Other ATOM benchmark scripts gate that flag on EP_SIZE alone, matching how the YAML passes ep.
Reviewed by Cursor Bugbot for commit c1e8e6f.
if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
    CALCULATED_MAX_MODEL_LEN=""
else
    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "
8k1k missing max model len
Medium Severity
The ISL/OSL branch that passed --max-model-len 10240 for non-1k1k runs was removed, so the isl=8192/osl=1024 scenario in amd-master.yaml starts the server with no explicit context limit while still driving 8192-token inputs. Peer dsr1_fp8_mi355x_atom.sh still sets that flag for the same sequence lengths.
Reviewed by Cursor Bugbot for commit c1e8e6f.
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26704598283
See unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26712556288
functionstackx left a comment
The verification run was cancelled by an AMD engineer (bill). Any reason why? @seungrokj


Summary
dsr1-fp8-mi355x-atom-mtp image from rocm/atom:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom20260511 to rocm/atom-dev:nightly_202605301523
Performance vs current InferenceX numbers (DeepSeek-R1-0528 fp8, spec_method=mtp, TP=8, 8×MI355X)
isl=1024, osl=1024
isl=8192, osl=1024
Test plan
rocm/atom-dev:nightly_202605301523
🤖 Generated with Claude Code
Note
Low Risk
Benchmark config and shell script only; no auth or production serving paths. Main operational impact is longer/higher-concurrency CI sweeps on MI355X.
Overview
Updates the DeepSeek-R1 FP8 MI355X ATOM MTP benchmark to a newer ROCm/ATOM stack and widens the throughput sweep.
For dsr1-fp8-mi355x-atom-mtp in .github/configs/amd-master.yaml, the container moves from rocm/atom:rocm7.2.3_..._atom20260511 to rocm/atom:rocm7.2.4_..._atom0.1.3, and fixed-seq-len search space conc-end rises from 256 to 1024 for both isl=1024 and isl=8192 (TP=8, MTP).

benchmarks/single_node/dsr1_fp8_mi355x_atom_mtp.sh no longer sets --max-model-len from ISL/OSL (empty by default; eval-only still uses EVAL_MAX_MODEL_LEN). Server launch is refactored into PARALLEL_ARGS (TP, optional --enable-dp-attention and --enable-expert-parallel when DP_ATTENTION/EP_SIZE apply) and SPEC_ARGS for MTP (--method mtp --num-speculative-tokens 3).

perf-changelog.yaml records the image bump and reported ~47–131% throughput gains vs prior InferenceX numbers for conc 4–256.

Reviewed by Cursor Bugbot for commit c1e8e6f. Bugbot is set up for automated code reviews on this repo.
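For readers unfamiliar with the PARALLEL_ARGS/SPEC_ARGS pattern mentioned in the overview, a minimal sketch of how the two arrays might be composed follows. `build_spec_args` and `SERVER_ARGS` are hypothetical names introduced for illustration; the actual server launch command is not shown in this conversation, so the sketch only prints the assembled flags. The `--method mtp --num-speculative-tokens 3` values are taken from the overview.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): builds the speculative-decoding flags.
build_spec_args() {
    local num_tokens="${1:-3}"   # MTP3 in this benchmark
    SPEC_ARGS=(--method mtp --num-speculative-tokens "$num_tokens")
}

build_spec_args 3
PARALLEL_ARGS=(-tp 8)            # TP=8, as in the benchmark summary

# Concatenate the groups; quoting each expansion keeps every flag a separate word.
SERVER_ARGS=("${PARALLEL_ARGS[@]}" "${SPEC_ARGS[@]}")
printf '%s\n' "${SERVER_ARGS[*]}"
```

Keeping each concern in its own array makes it easier to review which flags a given matrix entry actually passes.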