[AMD] Add DeepSeek-R1-0528 FP8 MI355X ATOM MTP3 benchmark #1628

Open
seungrokj wants to merge 5 commits into main from atom/dsr1-fp8-mi355x-atom-mtp-nightly-20260530

Conversation

Collaborator

@seungrokj seungrokj commented May 31, 2026

Summary

  • Update dsr1-fp8-mi355x-atom-mtp image from rocm/atom:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom20260511 to rocm/atom-dev:nightly_202605301523
  • Source: ATOM upstream benchmark run #26690241645 (all jobs passed)

Performance vs current InferenceX numbers (DeepSeek-R1-0528 fp8, spec_method=mtp, TP=8, 8×MI355X)

isl=1024, osl=1024

| Conc | InferenceX tput/GPU | ATOM tput/GPU | Delta |
|---:|---:|---:|---:|
| 4 | 67.14 | 145.20 | +116.3% |
| 8 | 106.22 | 201.45 | +89.7% |
| 16 | 182.50 | 384.17 | +110.5% |
| 32 | 303.67 | 642.73 | +111.7% |
| 64 | 446.65 | 877.83 | +96.5% |
| 128 | 659.22 | 1273.90 | +93.2% |
| 256 | 944.95 | 1805.72 | +91.1% |

isl=8192, osl=1024

| Conc | InferenceX tput/GPU | ATOM tput/GPU | Delta |
|---:|---:|---:|---:|
| 4 | 278.04 | 643.18 | +131.3% |
| 8 | 430.54 | 955.00 | +121.8% |
| 16 | 701.15 | 1411.93 | +101.4% |
| 32 | 1078.31 | 1757.25 | +63.0% |
| 64 | 1449.00 | 2361.44 | +63.0% |
| 128 | 1895.33 | 3008.09 | +58.7% |
| 256 | 2307.03 | 3392.79 | +47.1% |
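
For anyone re-checking the Delta column, the values are consistent with a plain relative change, (ATOM - InferenceX) / InferenceX. The helper below is a minimal sketch of that arithmetic; `pct_delta` is a hypothetical name and is not part of the benchmark tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: relative change of tput/GPU, printed as a signed percentage.
# $1 = baseline (InferenceX) value, $2 = new (ATOM) value.
pct_delta() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

# Two rows taken from the tables above.
pct_delta 67.14 145.20      # isl=1024, conc=4   -> +116.3%
pct_delta 2307.03 3392.79   # isl=8192, conc=256 -> +47.1%
```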

Test plan

  • Verify server starts with new image rocm/atom-dev:nightly_202605301523
  • Confirm benchmark numbers match expected tput/GPU values above

🤖 Generated with Claude Code


Note

Low Risk
Benchmark config and shell script only; no auth or production serving paths. Main operational impact is longer/higher-concurrency CI sweeps on MI355X.

Overview
Updates the DeepSeek-R1 FP8 MI355X ATOM MTP benchmark to a newer ROCm/ATOM stack and widens the throughput sweep.

For dsr1-fp8-mi355x-atom-mtp in .github/configs/amd-master.yaml, the container moves from rocm/atom:rocm7.2.3_..._atom20260511 to rocm/atom:rocm7.2.4_..._atom0.1.3, and fixed-seq-len search space conc-end rises from 256 to 1024 for both isl=1024 and isl=8192 (TP=8, MTP).
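
To make the wider search space concrete, here is a rough sketch of the concurrency points it would cover. It assumes the sweep starts at 4 and doubles each step, which is inferred from the conc values in the tables above rather than from the YAML itself; `conc_sweep` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print concurrency values from start to end (inclusive),
# doubling each step. The doubling rule is an assumption, not read from the config.
conc_sweep() {
  local conc="$1" end="$2"
  while [ "$conc" -le "$end" ]; do
    printf '%s\n' "$conc"
    conc=$((conc * 2))
  done
}

# Old upper bound (256) versus new upper bound (1024), one value per line joined by spaces.
conc_sweep 4 256 | paste -sd' ' -
conc_sweep 4 1024 | paste -sd' ' -
```

Under that assumption the change adds two points per sequence-length scenario (512 and 1024), which is where the longer CI sweeps mentioned in the note come from.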

benchmarks/single_node/dsr1_fp8_mi355x_atom_mtp.sh no longer sets --max-model-len from ISL/OSL (empty by default; eval-only still uses EVAL_MAX_MODEL_LEN). Server launch is refactored into PARALLEL_ARGS (TP, optional --enable-dp-attention and --enable-expert-parallel when DP_ATTENTION/EP_SIZE apply) and SPEC_ARGS for MTP (--method mtp --num-speculative-tokens 3).

perf-changelog.yaml records the image bump and reported ~47–131% throughput gains vs prior InferenceX numbers for conc 4–256.

Reviewed by Cursor Bugbot for commit c1e8e6f. Bugbot is set up for automated code reviews on this repo. Configure here.

…1523

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

2 similar comments

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@seungrokj seungrokj changed the title from "feat(atom): update dsr1-fp8-mi355x-atom-mtp image to nightly_202605301523" to "[AMD] Add DeepSeek-R1-0528 FP8 MI355X ATOM MTP3 benchmark" May 31, 2026
Comment thread perf-changelog.yaml
Comment thread perf-changelog.yaml
…attn support

- Switch image from nightly_202605301523 to stable atom0.1.3
- Expand concurrency search space from 4-256 to 4-1024
- Refactor benchmark script to use PARALLEL_ARGS/SPEC_ARGS pattern
- Remove ISL/OSL-based max-model-len calculation
- Update perf-changelog image reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread benchmarks/single_node/dsr1_fp8_mi355x_atom_mtp.sh
else #DP+TP
PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
fi
fi

Expert parallel ignores EP_SIZE alone

Medium Severity

The script only adds --enable-expert-parallel when DP_ATTENTION is true and EP_SIZE is greater than 1. The prior logic enabled expert parallel whenever EP_SIZE exceeded 1, matching how master config passes EP_SIZE independently of dp-attn.

Reviewed by Cursor Bugbot for commit a251a0d. Configure here.
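
A minimal sketch of the gating this comment asks for, where expert parallel follows EP_SIZE alone and DP attention is toggled separately. It assumes DP_ATTENTION carries the string `true` or `false` and EP_SIZE is an integer; `build_parallel_args` is a hypothetical helper, not code from the PR. Only the flags `-tp`, `--enable-dp-attention` and `--enable-expert-parallel` come from the thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill PARALLEL_ARGS from TP, DP_ATTENTION and EP_SIZE.
# $1 = tensor-parallel size, $2 = "true"/"false" for DP attention, $3 = EP size.
build_parallel_args() {
  local tp="$1" dp_attention="$2" ep_size="$3"
  PARALLEL_ARGS=(-tp "$tp")

  # DP attention is opt-in and does not affect the EP decision.
  if [ "$dp_attention" = "true" ]; then
    PARALLEL_ARGS+=(--enable-dp-attention)
  fi

  # Expert parallel is gated on EP_SIZE alone.
  if [ "$ep_size" -gt 1 ]; then
    PARALLEL_ARGS+=(--enable-expert-parallel)
  fi
}

# The case the review flags: EP_SIZE > 1 with DP attention off.
build_parallel_args 8 false 8
printf '%s\n' "${PARALLEL_ARGS[*]}"   # -tp 8 --enable-expert-parallel
```

Keeping the two conditions as independent `if` blocks, rather than nesting one inside the other, means a matrix entry with EP_SIZE greater than 1 and dp-attn false cannot silently drop the flag.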

Comment thread benchmarks/single_node/dsr1_fp8_mi355x_atom_mtp.sh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
CALCULATED_MAX_MODEL_LEN=""
else
CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "

8192 ISL missing max-model-len

Medium Severity

The script no longer sets --max-model-len 10240 for the isl=8192/osl=1024 scenario in amd-master.yaml. Other atom MTP launchers still apply that cap for non-1024 inputs, so long-context runs may use a different KV budget than before and can skew or destabilize high-concurrency sweeps.

Reviewed by Cursor Bugbot for commit 4a32700. Configure here.
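
One way the removed branch could be kept while fitting the array-based launch style is sketched below. `max_model_len_args` and `MAX_MODEL_LEN_ARGS` are hypothetical names; the 10240 cap and the 1k1k exception are taken from the snippet quoted above. For isl=8192 and osl=1024 the cap covers the 9216 prompt-plus-output tokens with 1024 to spare.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the --max-model-len arguments from ISL/OSL.
# Mirrors the removed branch: no flag for 1024/1024, a 10240 cap otherwise.
max_model_len_args() {
  local isl="$1" osl="$2"
  if [ "$isl" = "1024" ] && [ "$osl" = "1024" ]; then
    MAX_MODEL_LEN_ARGS=()                       # 1k1k: leave the server default
  else
    MAX_MODEL_LEN_ARGS=(--max-model-len 10240)  # e.g. 8192 + 1024 = 9216 <= 10240
  fi
}

max_model_len_args 8192 1024
printf '%s\n' "${MAX_MODEL_LEN_ARGS[*]}"   # --max-model-len 10240
```

An array avoids the leading and trailing spaces that the string form needed, and an empty array expands to no arguments at all when appended to the server command.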

else #DP+TP
PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
fi
fi

EP ignored without DP attention

Medium Severity

--enable-expert-parallel is only added when DP_ATTENTION is true. Config YAML can set EP_SIZE greater than 1 with dp-attn false; the prior script and other atom benchmarks still enable expert parallel from EP_SIZE alone, so those matrix entries would silently run without EP.

Reviewed by Cursor Bugbot for commit 4a32700. Configure here.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@seungrokj seungrokj added the AMD label May 31, 2026
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 5 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit c1e8e6f. Configure here.

else #DP+TP
PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
fi
fi

Expert parallel ignores EP_SIZE

Medium Severity

--enable-expert-parallel is only added when DP_ATTENTION is true, so configs with EP_SIZE greater than 1 and dp-attn: false no longer enable expert parallelism on the ATOM server. Other ATOM benchmark scripts gate that flag on EP_SIZE alone, matching how the YAML passes ep.

Reviewed by Cursor Bugbot for commit c1e8e6f. Configure here.

if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
CALCULATED_MAX_MODEL_LEN=""
else
CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "

8k1k missing max model len

Medium Severity

The ISL/OSL branch that passed --max-model-len 10240 for non-1k1k runs was removed, so the isl=8192/osl=1024 scenario in amd-master.yaml starts the server with no explicit context limit while still driving 8192-token inputs. Peer dsr1_fp8_mi355x_atom.sh still sets that flag for the same sequence lengths.

Reviewed by Cursor Bugbot for commit c1e8e6f. Configure here.

Collaborator

@functionstackx functionstackx left a comment

The verification run was cancelled by an AMD engineer, Bill. Any reason why? @seungrokj

Image
