148 commits
846af91
mi355x kimi-fp4 agentic: switch from SimpleCPUOffloadConnector to Off…
cquil11 May 17, 2026
9996180
dsv4-fp4-b200-vllm-agentic: bump image to cquil v0.21.0 custom build
cquil11 May 17, 2026
aae82c0
Add dsv4-fp4-mi355x-sglang-agentic config + launcher
cquil11 May 17, 2026
8feadd4
dsv4-fp4-b200-vllm-agentic: drop docker.io/ prefix from image
cquil11 May 17, 2026
5e1ca4e
Add dsv4-fp4-gb300-dynamo-vllm-agentic with local-recipe overlay
cquil11 May 18, 2026
3eb9cbf
gb300 agentic recipes: quote PORT as string for fork srtctl schema
cquil11 May 18, 2026
7b3756e
launch_gb300-cw.sh: mirror IS_AGENTIC branch from launch_gb300-nv.sh
cquil11 May 18, 2026
2ae4bf9
gb300 agentic launchers: use upstream NVIDIA/srt-slurm + fix venv pip
cquil11 May 18, 2026
b858480
gb300 launchers: use real upstream srt-slurm SHA (was fabricated)
cquil11 May 18, 2026
893f5b8
gb300 agentic: strip chat parser flags from worker config + harden cw…
cquil11 May 18, 2026
43b3a05
gb300-nv launcher: point dsv4 MODEL_PATH at the real shared NFS path
cquil11 May 18, 2026
4195071
gb300-nv launcher: switch dsv4 MODEL_PATH to /data/ mount to dodge NF…
cquil11 May 18, 2026
948eaa5
gb300 agentic launchers: pin to fork branch with --mem=0 patch
cquil11 May 18, 2026
a3512cb
gb300-nv launcher: move squash files to /data/ mount (same NFS ELOOP)
cquil11 May 18, 2026
52af9d4
gb300 agentic: set --mem=0 via recipe srun_options (canonical mechanism)
cquil11 May 18, 2026
3274dea
gb300 agentic: add sbatch_directives.mem=0 (the missing layer)
cquil11 May 18, 2026
92d2738
gb300 agentic: add sbatch_directives.cpus-per-task=72 (fix etcd starv…
cquil11 May 18, 2026
1614e7f
gb300 agentic: pin to nv-only + try /scratch model path
cquil11 May 18, 2026
4ff2e50
gb300-nv agentic: clone cquil11 fork + pass --no-preflight
cquil11 May 18, 2026
a3d946c
gb300 agentic: wire aiperf mmap dataset cache
cquil11 May 18, 2026
7530760
bump aiperf submodule: sync with ai-dynamo/aiperf PR #875
cquil11 May 18, 2026
0678059
agentic: install git on-demand for aiperf editable install
cquil11 May 18, 2026
62ef027
agentic: switch to no-subagents loader + sudo git install for non-roo…
cquil11 May 18, 2026
18bc0bc
agentic: drop -e from aiperf install (sidesteps git + userns-remap)
cquil11 May 18, 2026
ea13e41
agentic: simplify git install to bare apt-get update && install; keep -e
cquil11 May 18, 2026
3f4b095
gb300-nv agentic: add srun_options.container-remap-root
cquil11 May 18, 2026
482348c
gb300-nv launcher: bump srt-slurm SHA to include benchmark_stage fix
cquil11 May 18, 2026
dac50f7
bump aiperf submodule: hang fix on cancel path
cquil11 May 18, 2026
e575981
runners(gb300): snapshot server-log tarball on script EXIT (handle ca…
cquil11 May 18, 2026
609b74d
agentic: bump --failed-request-threshold 0.05 -> 0.20
cquil11 May 18, 2026
afacd5b
bump aiperf submodule: quieter warnings + tqdm in non-tty
cquil11 May 19, 2026
4c9a4b5
benchmark_lib: disable failed-request threshold (1.0) for capacity-bo…
cquil11 May 19, 2026
b2ffd9b
launch_gb300-nv: snapshot server logs BEFORE rm -rf outputs
cquil11 May 19, 2026
48f151e
bump aiperf to a6812b03: fix UIType.TQDM crash
cquil11 May 19, 2026
a4ee9a7
bump aiperf to 2f30ea86: revert TQDM + warning-downgrade changes
cquil11 May 19, 2026
329d168
agentic recipes: raise NATS max_payload from 1MiB default to 32MiB
cquil11 May 19, 2026
f8b85c9
bump aiperf to 61a9ed80: per-lane start-token counts in TrajectorySou…
cquil11 May 19, 2026
fa28004
add dsv4-fp4-gb300-cw-dynamo-vllm-agentic — CoreWeave sibling config
cquil11 May 19, 2026
4a46881
bump aiperf to a2b9d6b5: cc-traces dataset 051226 -> 051826 (98 traces)
cquil11 May 19, 2026
20d4dd8
bump aiperf to 90c93aba: revert per-lane start-token logging
cquil11 May 19, 2026
21f71b6
bump aiperf to a61553fd: drop preemptions from realtime log
cquil11 May 19, 2026
6d10eaf
b200/b300 vllm-agentic: no-offload curves vs new cc-traces 051826
cquil11 May 19, 2026
2ce6131
launch_b300-nv: drop nonexistent b300-020 from salloc nodelist
cquil11 May 19, 2026
c2c04df
launch_b300-nv: add --container-remap-root to enable apt-get inside c…
cquil11 May 19, 2026
a70f1ba
remove utils/trace-replay submodule
cquil11 May 19, 2026
6558228
remove trace-replay references; standardize on aiperf_artifacts
cquil11 May 19, 2026
3af753b
update aiperf submodule branch tracking to cjq/agentx-v0.3
cquil11 May 19, 2026
aa7348f
track aiperf submodule on cjq/agentx-v0.3-subagents
cquil11 May 20, 2026
3273663
chore: update aiperf tiered subagent joins
cquil11 May 20, 2026
1722c11
chore: update aiperf tiered join docs
cquil11 May 20, 2026
9d94969
chore: update aiperf idle gap cap
cquil11 May 20, 2026
dc35e35
chore: update aiperf idle gap cap precedence
cquil11 May 20, 2026
da16c0b
chore: update aiperf idle gap semantics
cquil11 May 20, 2026
bd290a0
chore: update aiperf join examples
cquil11 May 20, 2026
9ea7370
benchmarks(agentic): switch to with-subagents corpus + idle-gap cap
cquil11 May 20, 2026
a2707d4
benchmarks(agentic): trim workload distribution analyzer to ISL/OSL only
cquil11 May 20, 2026
8a267b7
benchmarks(agentic): restore generate_aiperf_plots.py for server-metr…
cquil11 May 20, 2026
a258f90
benchmarks(agentic): drop conc=96,128 from b200 dsv4 vllm agentic sweep
cquil11 May 20, 2026
d79bc5f
benchmarks(agentic): fix generate_aiperf_plots.py artifact dir lookup
cquil11 May 20, 2026
5c15fa9
chore: bump aiperf submodule to de702eaf (mmap cache hardlink)
cquil11 May 21, 2026
c149b9d
feat: add lmcache mp agentic offload
cquil11 May 21, 2026
ed79577
fix: run lmcache on dsv4 tep agentic
cquil11 May 21, 2026
01ed357
fix: clean lmcache agentic startup logs
cquil11 May 21, 2026
21ed1eb
fix: disable lmcache dsv4 offload
cquil11 May 21, 2026
907ad2e
switch to native offloading
cquil11 May 21, 2026
4abc590
switch to native offloading
cquil11 May 21, 2026
3d7bfe2
fix: size native dsv4 offload to 2.8tb
cquil11 May 21, 2026
ad505ff
switch to native offloading
cquil11 May 21, 2026
1cede80
switch to native offloading
cquil11 May 21, 2026
99cd035
benchmarks(agentic): drop dsv4 b200 native offload from 2.8TB to 1.2TB
cquil11 May 21, 2026
b07bd58
feat(agentic): add Kimi LMCache offload coverage
cquil11 May 21, 2026
327c4d9
feat(agentic): add Qwen SGLang HiCache starts
cquil11 May 21, 2026
6b87f49
fix(agentic): size SGLang HiCache per rank
cquil11 May 21, 2026
9e6e81a
fix(agentic): tune B300 HiCache defaults
cquil11 May 21, 2026
1a300d3
fix(agentic): tune MI355X HiCache defaults
cquil11 May 21, 2026
859aec5
fix(agentic): cap Kimi LMCache CPU pool per rank
cquil11 May 21, 2026
5398ba9
fix(agentic): cap MI355X HiCache per-rank memory
cquil11 May 21, 2026
9a8f89c
fix(agentic): skip MI355X HiCache server warmup
cquil11 May 21, 2026
9f3fb05
fix(agentic): skip non-finite SGLang metrics
cquil11 May 21, 2026
5ebf81f
fix(agentic): size Qwen HiCache host pools
cquil11 May 21, 2026
dbfbd56
fix(agentic): cap replay contexts to server window
cquil11 May 21, 2026
8fa3c96
fix(agentic): cap MI355X HiCache graph capture
cquil11 May 22, 2026
83fa8ec
fix(matrix): apply runner filter to agentic configs
cquil11 May 22, 2026
f999fef
fix(config): use registered MI355X runner labels
cquil11 May 22, 2026
afaec72
fix(agentic): use direct HiCache copies for Qwen MI355X
cquil11 May 22, 2026
1e730d7
mi355x qwen sgl offload
cquil11 May 22, 2026
e29fb3b
fix(agentic): use LMCache MP for Kimi B200
cquil11 May 22, 2026
bb64d3e
feat(agentic): add LMCache MP for Kimi MI355X
cquil11 May 22, 2026
91b24b5
mi355x qwen sgl offload
cquil11 May 22, 2026
5a3cd6a
fix(agentic): avoid CUDA NIXL import on MI355X LMCache
cquil11 May 22, 2026
4fec279
chore: bump aiperf submodule to 5b3db5a2 (merge PR #2)
cquil11 May 22, 2026
4a51237
fix(agentic): fail replay above 10 percent request errors
cquil11 May 22, 2026
4aeb164
benchmarks(agentic): retarget HF dataset constants to with-subagents-…
cquil11 May 22, 2026
36cb524
fix(agentic): propagate replay failures
cquil11 May 22, 2026
10222f4
fix(agentic): remove CUDA LMCache deps on ROCm
cquil11 May 22, 2026
8f01cb4
fix(agentic): keep LMCache cupy deps on ROCm
cquil11 May 22, 2026
265fc75
fix(agentic): use ROCm CuPy for Kimi LMCache MP
cquil11 May 22, 2026
f34e024
fix(agentic): add ROCm LMCache MP block fallback
cquil11 May 22, 2026
20d6508
fix(agentic): defer ROCm LMCache pinned expansion
cquil11 May 22, 2026
0103241
fix(agentic): lazily patch ROCm LMCache allocator
cquil11 May 22, 2026
5db2668
fix(agentic): avoid partial LMCache import patching
cquil11 May 22, 2026
5819b31
fix(agentic): filter Kimi MI355X replay context
cquil11 May 22, 2026
165d41c
fix(agentic): normalize Kimi MI355X max context
cquil11 May 22, 2026
229d541
fix(agentic): update AIPerf replay metadata
cquil11 May 22, 2026
81fd6bf
fix(agentic): refresh AIPerf mmap cache schema
cquil11 May 22, 2026
e80a843
fix(agentic): carry AIPerf prefix metadata
cquil11 May 22, 2026
69cdbc2
fix(agentic): use final LMCache capacity on ROCm
cquil11 May 22, 2026
03a85ab
fix(agentic): extend Kimi MI355X LMCache read lease
cquil11 May 22, 2026
4941697
feat: Kimi-K2.5-MXFP4 LMCache MP offloading for MI355X agentic benchm…
andyluo7 May 26, 2026
9e41c1a
chore(agentx): update aiperf prefix cache metric
cquil11 May 26, 2026
380dcd7
fix(agentx): refresh aiperf mmap cache schema
cquil11 May 26, 2026
bc41a72
fix(agentx): carry prefix counts into mmap metadata
cquil11 May 26, 2026
60fcd42
fix(agentx): default to pre-canned assistant replay
cquil11 May 26, 2026
81d381d
dsv4
cquil11 May 26, 2026
8724609
dsv4
cquil11 May 26, 2026
06c606a
dsv4
cquil11 May 26, 2026
7ad7dd4
fix(agentx): update aiperf realtime cache metrics
cquil11 May 26, 2026
5403a6b
testing kimi
cquil11 May 26, 2026
1c6d297
testing kimi
cquil11 May 26, 2026
acc2c73
chore(aiperf): bump submodule for unique_in_srv realtime metric
cquil11 May 26, 2026
967c50c
runners(h200-dgxc-slurm): remap container UID to root to match b200-dgxc
cquil11 May 26, 2026
4be3ef0
fix(agentx): re-enable weka live assistant replay
cquil11 May 26, 2026
8eec0d4
benchmarks(single_node): move fixed-seq-len scripts into fixed_seq_le…
cquil11 May 27, 2026
f89cdfe
Merge origin/main into chore/agentx-v0.3 with fixed_seq_len/ reorg fi…
cquil11 May 27, 2026
049a873
chore: update agentx v0.3 aiperf
cquil11 May 27, 2026
711cb85
chore: update agentx weka dataset
cquil11 May 27, 2026
284cfa5
chore: update agentx snapshot logging
cquil11 May 27, 2026
1b41cd0
benchmarks: drop redundant ${VAR:-default} defaults from recipe scripts
cquil11 May 27, 2026
a98fcaa
runners(h200-{nb,cw}): wire AIPERF mmap cache mount + env
cquil11 May 27, 2026
e1e4d44
benchmarks(agentic): add WEKA_LOADER_OVERRIDE; switch minimax to 256k…
cquil11 May 27, 2026
4e62c59
benchmarks: retarget WEKA_LOADER_OVERRIDE 256k variant to 052726-256k
cquil11 May 27, 2026
eab58e9
utils(proxy_to_weka): drop exact-duplicate rows in load_session_rows
cquil11 May 27, 2026
88a1153
nvidia-master(kimik2.5-fp4-b200-vllm-agentic): bump vLLM v0.20.2 -> v…
cquil11 May 27, 2026
72cf856
feat(agentic): add qwen3.5-fp8-h100-sglang-agentic recipe
cquil11 May 27, 2026
3406355
runners(h100-dgxc-slurm): wire AIPERF mmap cache mount + env
cquil11 May 27, 2026
4933cf3
chore(aiperf): bump submodule for SGLang realtime srv-row fallbacks
cquil11 May 27, 2026
6d884b9
chore(aiperf): bump submodule for _total counter-lookup fix
cquil11 May 27, 2026
77e648d
agentic(sglang): drop --disable-radix-cache from every recipe
cquil11 May 27, 2026
b27295c
chore(aiperf): bump submodule for SGLang counter-pair cache hit rate
cquil11 May 27, 2026
842a0cf
testing qwen
cquil11 May 28, 2026
5d10625
testing qwen
cquil11 May 28, 2026
717385a
testing qwen
cquil11 May 28, 2026
6a77acb
testing qwen
cquil11 May 28, 2026
c00454e
chore(aiperf): bump submodule for weka_trace id()-keyed dict fix
cquil11 May 28, 2026
ae8ba76
chore(aiperf): bump submodule for parallel reconstruction dropped-sub…
cquil11 May 28, 2026
bcf338c
chore(aiperf): bump submodule for mmap-cache stale-lock bypass
cquil11 May 28, 2026
0e8ac92
testing qwen
cquil11 May 28, 2026
57fdef7
chore(aiperf): bump submodule for snapshot warmup fix
cquil11 May 28, 2026
69 changes: 56 additions & 13 deletions .github/configs/amd-master.yaml
@@ -323,6 +323,21 @@ qwen3.5-fp8-mi355x-sglang-agentic:
       search-space:
       - { tp: 8, ep: 1, offloading: none, conc-list: [1, 2, 4, 8, 16, 32] }

+qwen3.5-fp8-mi355x-sglang-agentic-hicache:
+  image: lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260521
+  model: Qwen/Qwen3.5-397B-A17B-FP8
+  model-prefix: qwen3.5
+  runner: mi355x
+  precision: fp8
+  framework: sglang
+  multinode: false
+  scenarios:
+    agentic-coding:
+    - duration: 1800
+      search-space:
+      - { tp: 8, ep: 1, offloading: none, conc-list: [1, 2, 4, 8, 16, 32] }
+      - { tp: 8, ep: 1, offloading: hicache, conc-list: [16, 32, 48, 64] }
+
 qwen3.5-fp8-mi355x-atom:
   image: rocm/atom:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom20260511
   model: Qwen/Qwen3.5-397B-A17B-FP8
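
Reviewer note: one way to read the new `-hicache` entry is as a flat list of benchmark points. The sketch below assumes each `conc-list` value becomes exactly one point. The `expand` helper is hypothetical and is not the repository's matrix generator; the two entries are copied from the hunk above.

```python
from typing import Iterator

# search-space entries copied from qwen3.5-fp8-mi355x-sglang-agentic-hicache
SEARCH_SPACE = [
    {"tp": 8, "ep": 1, "offloading": "none", "conc-list": [1, 2, 4, 8, 16, 32]},
    {"tp": 8, "ep": 1, "offloading": "hicache", "conc-list": [16, 32, 48, 64]},
]


def expand(search_space: list[dict]) -> Iterator[dict]:
    """Hypothetical helper: yield one point per concurrency value (assumption)."""
    for entry in search_space:
        fixed = {k: v for k, v in entry.items() if k != "conc-list"}
        for conc in entry["conc-list"]:
            yield {**fixed, "conc": conc}


points = list(expand(SEARCH_SPACE))

# Concurrencies measured both with and without HiCache, i.e. where the
# two offloading modes can be compared point for point.
none_conc = {p["conc"] for p in points if p["offloading"] == "none"}
hicache_conc = {p["conc"] for p in points if p["offloading"] == "hicache"}
overlap = sorted(none_conc & hicache_conc)

print(len(points), overlap)
```

Under that reading, the two modes share only the concurrencies that appear in both `conc-list` values.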
@@ -653,10 +668,6 @@ kimik2.5-fp4-mi355x-vllm:
 # its fixed-seq-len sweep is unaffected.
 # - image: 'vllm/vllm-openai-rocm:v0.18.0' -> 'vllm/vllm-openai-rocm:v0.21.0'
 kimik2.5-fp4-mi355x-vllm-agentic:
-  # v0.21.0 (released 2026-05-14) supersedes the prior nightly pin
-  # (51f22dcf...) which was carrying the SimpleCPUOffloadConnector ROCm
-  # cpu_offload_blocks > 0 fix. v0.21.0 is much newer than that fix and
-  # includes all subsequent ROCm offload work.
   image: vllm/vllm-openai-rocm:v0.21.0
   model: amd/Kimi-K2.5-MXFP4
   model-prefix: kimik2.5
@@ -669,16 +680,9 @@ kimik2.5-fp4-mi355x-vllm-agentic:
     - duration: 1800
       search-space:
       - { tp: 8, offloading: none, conc-list: [1, 2, 4, 8, 16, 24, 32, 40, 48] }
-      # CPU offload only above the KV cliff. Lower concurrencies fit
-      # entirely on-GPU, so paying the offload-path overhead there would
-      # just slow them down without measuring anything new.
-      - { tp: 8, offloading: cpu, conc-list: [32, 40, 48, 56] }
-      # TP=4 probe: half-node layout doubles per-GPU weight footprint
-      # (~62 GB on MI355X's 288 GB HBM, plenty of headroom). Restrict to
-      # cliff-region concurrencies on both offload modes so we can directly
-      # compare TP=4 vs TP=8 at the same conc points.
+      - { tp: 8, offloading: lmcache, conc-list: [32, 40, 48, 56] }
       - { tp: 4, offloading: none, conc-list: [16, 24, 32, 40] }
-      - { tp: 4, offloading: cpu, conc-list: [16, 24, 32, 40] }
+      - { tp: 4, offloading: lmcache, conc-list: [16, 24, 32, 40] }

 kimik2.5-fp4-mi355x-atom:
   image: rocm/atom:rocm7.2.3_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom20260511
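
Reviewer note: the removed comment said the TP=4 rows exist to compare TP=4 against TP=8 at the same concurrency points. A quick way to see which points remain comparable after this change is to intersect the `conc-list` values per offloading mode. This is a minimal sketch; `shared_conc` is a hypothetical helper, and the entries are the post-change rows of `kimik2.5-fp4-mi355x-vllm-agentic`.

```python
from collections import defaultdict

# Post-change search-space rows for kimik2.5-fp4-mi355x-vllm-agentic
ENTRIES = [
    {"tp": 8, "offloading": "none", "conc-list": [1, 2, 4, 8, 16, 24, 32, 40, 48]},
    {"tp": 8, "offloading": "lmcache", "conc-list": [32, 40, 48, 56]},
    {"tp": 4, "offloading": "none", "conc-list": [16, 24, 32, 40]},
    {"tp": 4, "offloading": "lmcache", "conc-list": [16, 24, 32, 40]},
]


def shared_conc(entries: list[dict]) -> dict[str, list[int]]:
    """Hypothetical helper: per offloading mode, concurrencies present at every TP size."""
    by_mode: dict[str, dict[int, set[int]]] = defaultdict(dict)
    for entry in entries:
        by_mode[entry["offloading"]][entry["tp"]] = set(entry["conc-list"])
    return {
        mode: sorted(set.intersection(*per_tp.values()))
        for mode, per_tp in by_mode.items()
    }


shared = shared_conc(ENTRIES)
print(shared)
```

With the rows above, the `lmcache` mode has fewer directly comparable TP=4 vs TP=8 points than the `none` mode, because the TP=8 `lmcache` list starts at 32.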
@@ -701,6 +705,22 @@ kimik2.5-fp4-mi355x-atom:
       - { tp: 8, conc-start: 4, conc-end: 128 }
       - { tp: 4, conc-start: 4, conc-end: 128 }

+dsv4-fp4-mi355x-vllm-agentic:
+  image: vllm/vllm-openai-rocm:v0.21.0
+  model: deepseek-ai/DeepSeek-V4-Pro
+  model-prefix: dsv4
+  runner: mi355x
+  precision: fp4
+  framework: vllm
+  multinode: false
+  scenarios:
+    agentic-coding:
+    - duration: 1800
+      search-space:
+      - { tp: 8, offloading: none, conc-list: [1, 2, 4] }
+      - { tp: 4, offloading: none, conc-list: [1, 2, 4, 8, 10, 12, 16] }
+      - { tp: 4, ep: 4, dp-attn: true, offloading: none, conc-list: [16, 24, 32, 40, 48] }
+
 minimaxm2.5-fp8-mi355x-vllm:
   image: vllm/vllm-openai-rocm:v0.21.0
   model: MiniMaxAI/MiniMax-M2.5
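
Reviewer note: a rough lower bound on what the new `dsv4-fp4-mi355x-vllm-agentic` entry costs in runner time. The arithmetic below rests on assumptions not stated in the diff: that `duration: 1800` is seconds per concurrency point, that points run one after another, and that server startup and warmup are excluded. The dictionary keys are illustrative labels only.

```python
DURATION_S = 1800  # `duration` from the config; assumed to be seconds per point

# conc-list values copied from the three search-space rows (labels are illustrative)
CONC_LISTS = {
    "tp8": [1, 2, 4],
    "tp4": [1, 2, 4, 8, 10, 12, 16],
    "tp4-ep4-dp-attn": [16, 24, 32, 40, 48],
}

total_points = sum(len(conc) for conc in CONC_LISTS.values())
total_hours = total_points * DURATION_S / 3600

print(total_points, total_hours)
```

If the runner instead executes points in parallel or shortens `duration` per point, the estimate shrinks accordingly.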
@@ -1833,6 +1853,29 @@ dsv4-fp4-mi355x-sglang:
       - { tp: 8, dp-attn: true, conc-start: 64, conc-end: 2048 }
       - { tp: 8, dp-attn: false, conc-start: 1, conc-end: 32 }

+# Diverged from dsv4-fp4-mi355x-sglang (agentic-coding sibling). Reasons below;
+# the original dsv4-fp4-mi355x-sglang entry is left identical to origin/main so
+# its fixed-seq-len sweep is unaffected.
+# - scenarios: replaced fixed-seq-len with agentic-coding.
+# Image is identical to the base entry (rocm/sgl-dev DSv4 build).
+# CONC ranges mirror dsv4-fp4-b200-vllm-agentic for cross-hardware
+# comparability. Offload sweep is none-only (SGLang has no equivalent of
+# vLLM's SimpleCPUOffloadConnector path that we exercise on b200).
+dsv4-fp4-mi355x-sglang-agentic:
+  image: rocm/sgl-dev:rocm720-mi35x-0363e6c-20260509-DSv4
+  model: deepseek-ai/DeepSeek-V4-Pro
+  model-prefix: dsv4
+  runner: mi355x
+  precision: fp4
+  framework: sglang
+  multinode: false
+  scenarios:
+    agentic-coding:
+    - duration: 1800
+      search-space:
+      - { tp: 8, offloading: none, conc-list: [16, 32, 64] }
+      - { tp: 8, dp-attn: true, offloading: none, conc-list: [64, 128, 256] }
+
 # DSv4 on MI355X via vLLM, using the official vllm/vllm-openai-rocm
 # nightly image. DSv4 base ROCm support (vllm-project/vllm#40871) merged
 # on 2026-05-05, so any nightly built after that includes the