6 changes: 3 additions & 3 deletions .github/configs/amd-master.yaml
@@ -1197,7 +1197,7 @@ dsr1-fp8-mi355x-atom:
     - { tp: 8, conc-start: 4, conc-end: 128 }

 dsr1-fp8-mi355x-atom-mtp:
-  image: rocm/atom:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom20260511
+  image: rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3
   model: deepseek-ai/DeepSeek-R1-0528
   model-prefix: dsr1
   runner: mi355x
@@ -1209,11 +1209,11 @@ dsr1-fp8-mi355x-atom-mtp:
   - isl: 1024
     osl: 1024
     search-space:
-    - { tp: 8, conc-start: 4, conc-end: 256, spec-decoding: mtp }
+    - { tp: 8, conc-start: 4, conc-end: 1024, spec-decoding: mtp }
   - isl: 8192
     osl: 1024
     search-space:
-    - { tp: 8, conc-start: 4, conc-end: 256, spec-decoding: mtp }
+    - { tp: 8, conc-start: 4, conc-end: 1024, spec-decoding: mtp }

 dsr1-fp8-mi355x-sglang-disagg:
   image: rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0227-2
30 changes: 14 additions & 16 deletions benchmarks/single_node/dsr1_fp8_mi355x_atom_mtp.sh
@@ -24,23 +24,22 @@ PORT=${PORT:-8888}

 export OMP_NUM_THREADS=1

-# Calculate max-model-len based on ISL and OSL
-if [ "$ISL" = "1024" ] && [ "$OSL" = "1024" ]; then
-    CALCULATED_MAX_MODEL_LEN=""
-else
-    CALCULATED_MAX_MODEL_LEN=" --max-model-len 10240 "
Comment thread
seungrokj marked this conversation as resolved.

8192 ISL missing max-model-len

Medium Severity

The script no longer sets --max-model-len 10240 for the isl=8192/osl=1024 scenario in amd-master.yaml. Other atom MTP launchers still apply that cap for non-1024 inputs, so long-context runs may use a different KV budget than before and can skew or destabilize high-concurrency sweeps.

Reviewed by Cursor Bugbot for commit 4a32700.

8k1k missing max model len

Medium Severity

The ISL/OSL branch that passed --max-model-len 10240 for non-1k1k runs was removed, so the isl=8192/osl=1024 scenario in amd-master.yaml starts the server with no explicit context limit while still driving 8192-token inputs. Peer dsr1_fp8_mi355x_atom.sh still sets that flag for the same sequence lengths.

Reviewed by Cursor Bugbot for commit c1e8e6f.
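
One way the two findings above could be addressed is to keep the removed ISL/OSL branch while still letting eval-only runs override it. The following is only a sketch under assumptions: `pick_max_model_len` is a hypothetical helper that is not part of this PR, the 1024/1024 condition and the 10240 cap are copied from the deleted branch, and it assumes `setup_eval_context` has already populated `EVAL_MAX_MODEL_LEN` before an eval-only call.

```shell
# Hypothetical helper (not in the PR): print the --max-model-len argument
# for the ATOM server, or nothing when no explicit cap is wanted.
#   $1 = ISL, $2 = OSL, $3 = EVAL_ONLY ("true"/"false"), $4 = EVAL_MAX_MODEL_LEN
pick_max_model_len() {
    local isl="$1" osl="$2" eval_only="${3:-false}" eval_len="${4:-}"
    if [ "$eval_only" = "true" ]; then
        # Eval-only runs take the limit computed by the eval setup step.
        printf '%s' "--max-model-len $eval_len"
    elif [ "$isl" = "1024" ] && [ "$osl" = "1024" ]; then
        # 1k/1k runs pass no flag, as in the removed branch.
        printf '%s' ""
    else
        # Every other sequence length keeps the 10240 cap from the removed branch.
        printf '%s' "--max-model-len 10240"
    fi
}

# Possible use in the launcher (assumption, mirrors the existing variable name):
# CALCULATED_MAX_MODEL_LEN=" $(pick_max_model_len "$ISL" "$OSL" "${EVAL_ONLY:-false}" "${EVAL_MAX_MODEL_LEN:-}") "
```

With this shape the isl=8192/osl=1024 entry would again start the server with the cap that the peer `dsr1_fp8_mi355x_atom.sh` is reported to use, and the eval-only path keeps its current behavior.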

-fi
-
+CALCULATED_MAX_MODEL_LEN=""
 if [ "${EVAL_ONLY}" = "true" ]; then
     setup_eval_context
     CALCULATED_MAX_MODEL_LEN=" --max-model-len $EVAL_MAX_MODEL_LEN "
 fi

-if [ "$EP_SIZE" -gt 1 ]; then
-    EP=" --enable-expert-parallel"
-else
-    EP=" "
-fi
+PARALLEL_ARGS=(-tp "$TP") #TP
+if [ "$DP_ATTENTION" = "true" ]; then
+    if [ "$EP_SIZE" -gt 1 ]; then #DP+EP
+        PARALLEL_ARGS=(-tp "$TP" --enable-expert-parallel --enable-dp-attention )
+    else #DP+TP
+        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
+    fi
+fi

Expert parallel ignores EP_SIZE alone

Medium Severity

The script only adds --enable-expert-parallel when DP_ATTENTION is true and EP_SIZE is greater than 1. The prior logic enabled expert parallel whenever EP_SIZE exceeded 1, matching how master config passes EP_SIZE independently of dp-attn.

Reviewed by Cursor Bugbot for commit a251a0d.

EP ignored without DP attention

Medium Severity

--enable-expert-parallel is only added when DP_ATTENTION is true. Config YAML can set EP_SIZE greater than 1 with dp-attn false; the prior script and other atom benchmarks still enable expert parallel from EP_SIZE alone, so those matrix entries would silently run without EP.

Reviewed by Cursor Bugbot for commit 4a32700.

Expert parallel ignores EP_SIZE

Medium Severity

--enable-expert-parallel is only added when DP_ATTENTION is true, so configs with EP_SIZE greater than 1 and dp-attn: false no longer enable expert parallelism on the ATOM server. Other ATOM benchmark scripts gate that flag on EP_SIZE alone, matching how the YAML passes ep.

Reviewed by Cursor Bugbot for commit c1e8e6f.
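
The three findings above describe the same gap, so a single change could cover them: build the argument array additively, so that `EP_SIZE` and `DP_ATTENTION` each contribute their flag independently. This is a sketch only; `build_parallel_args` is a hypothetical helper that is not part of this PR, and the flag names and their order are taken from the `PARALLEL_ARGS` lines in the diff.

```shell
# Hypothetical helper (not in the PR): print one server argument per line.
#   $1 = TP, $2 = EP_SIZE (default 1), $3 = DP_ATTENTION ("true"/"false")
build_parallel_args() {
    local tp="$1" ep_size="${2:-1}" dp_attention="${3:-false}"
    local args=(-tp "$tp")
    # Expert parallelism depends on EP_SIZE alone, as in the prior script.
    if [ "$ep_size" -gt 1 ]; then
        args+=(--enable-expert-parallel)
    fi
    # DP attention is an independent switch.
    if [ "$dp_attention" = "true" ]; then
        args+=(--enable-dp-attention)
    fi
    printf '%s\n' "${args[@]}"
}

# Possible use in the launcher (assumption; mapfile needs bash 4 or newer):
# mapfile -t PARALLEL_ARGS < <(build_parallel_args "$TP" "$EP_SIZE" "$DP_ATTENTION")
```

Under this sketch a matrix entry with `EP_SIZE` greater than 1 and dp-attn false would still pass `--enable-expert-parallel`, while the DP+EP and DP+TP cases produce the same arguments as the array in the diff.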


+SPEC_ARGS=(--method mtp --num-speculative-tokens 3 )
+
 # Start GPU monitoring (power, temperature, clocks every second)
 start_gpu_monitor
@@ -50,10 +49,9 @@ set -x
 python3 -m atom.entrypoints.openai_server \
     --model $MODEL \
     --server-port $PORT \
-    -tp $TP \
-    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN $EP \
-    --method mtp \
Comment thread
seungrokj marked this conversation as resolved.
-    --num-speculative-tokens 3 \
+    "${PARALLEL_ARGS[@]}" \
+    "${SPEC_ARGS[@]}" \
+    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN \
     > $SERVER_LOG 2>&1 &

 SERVER_PID=$!
8 changes: 8 additions & 0 deletions perf-changelog.yaml
@@ -3330,6 +3330,14 @@
     - "Update vLLM ROCm image from nightly-4f940896a32c9e2a0eba7f50d521bf5f6b4de458 to v0.22.0"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1624

+- config-keys:
+    - dsr1-fp8-mi355x-atom-mtp
+  description:
+    - "Update ATOM image to rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3"
+    - "isl=1024/osl=1024: +47% to +116% improvement across conc 4-256 vs prior InferenceX numbers"
Comment thread
seungrokj marked this conversation as resolved.
+    - "isl=8192/osl=1024: +47% to +131% improvement across conc 4-256"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1628
Comment thread
seungrokj marked this conversation as resolved.

 - config-keys:
     - kimik2.5-fp4-mi355x-vllm
   description: