kimik2.5-fp4-gb200-dynamo-vllm: bump vLLM image to v0.21.0 #1582
Ankur-singh wants to merge 2 commits into
Conversation
Bumps the kimik2.5-fp4-gb200-dynamo-vllm image from vllm/vllm-openai:v0.18.0-cu130 to vllm/vllm-openai:v0.21.0.
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 424a294.
kimik2.5-fp4-gb200-dynamo-vllm:
- image: vllm/vllm-openai:v0.18.0-cu130
+ image: vllm/vllm-openai:v0.21.0
Missing -ubuntu2404 suffix for GB200 vLLM image tag
High Severity
The image tag vllm/vllm-openai:v0.21.0 lacks the -ubuntu2404 suffix required for GB200 runners. All other GB200 dynamo-vllm configs in this file (dsv4-fp4-gb200-dynamo-vllm, dsv4-fp4-gb200-dynamo-vllm-mtp2, dsv4-fp4-gb300-dynamo-vllm) use the -ubuntu2404 variant which provides CUDA 13.0 and aarch64 support needed by Grace Blackwell hardware. The old image correctly used -cu130. The v0.21.0-ubuntu2404 tag exists and is the appropriate image for this runner.
🔴 The bumped image vllm/vllm-openai:v0.21.0 is the x86_64 tag, but the gb200 runner is Grace+Blackwell (ARM64). Every other gb200/gb300 vllm entry in this file uses the -ubuntu2404 suffix (e.g. dsv4-fp4-gb200-dynamo-vllm pins v0.20.0-ubuntu2404), and the srt-slurm recipe disagg-gb200-mid-curve-megamoe-mtp2.yaml already pins vllm/vllm-openai:v0.21.0-ubuntu2404 — proving the ARM64 tag exists. Change to vllm/vllm-openai:v0.21.0-ubuntu2404 to match the gb200 convention; the prior v0.18.0-cu130 value also had an arch/CUDA suffix, so dropping to the bare tag will almost certainly fail at container start with a manifest/exec-format error.
Extended reasoning...
What the bug is
The diff bumps kimik2.5-fp4-gb200-dynamo-vllm from vllm/vllm-openai:v0.18.0-cu130 to the bare tag vllm/vllm-openai:v0.21.0. The bare vX.Y.Z tag on vllm/vllm-openai is the x86_64 image. The runner: gb200 field directly below this line targets a Grace+Blackwell node, whose CPU is ARM64 (aarch64). Pulling the x86 image on an ARM64 host will either fail with no matching manifest for linux/arm64 or fall back to an emulated/wrong image that fails with exec format error.
Convention evidence in this file
Every other gb200/gb300 vllm entry in .github/configs/nvidia-master.yaml uses the -ubuntu2404 ARM-suitable suffix:
| Config | runner | image tag |
|---|---|---|
| dsv4-fp4-gb200-dynamo-vllm | gb200 | v0.20.0-ubuntu2404 |
| dsv4-fp4-gb200-dynamo-vllm-mtp2 | gb200 | v0.20.1-ubuntu2404 |
| dsv4-fp4-gb300-dynamo-vllm | gb300-nv | v0.20.0-ubuntu2404 |
| kimik2.5-fp4-gb200-dynamo-vllm (this PR) | gb200 | v0.21.0 ← missing suffix |
Conversely, every bare v0.21.0 (no suffix) usage in this file is on an x86 runner (h200, b200, b300, h100). The bare tag is correct only for x86 hosts.
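The convention above lends itself to a simple automated check. The following is a minimal sketch, not something that exists in the repo: the helper name `find_suspect_images`, the flat dict layout, and the assumption that every runner whose name starts with `gb200` or `gb300` is an ARM64 host are all illustrative. The entries mirror the table above.

```python
# Hypothetical lint: flag Grace-Blackwell entries whose image tag lacks the ARM64 suffix.
# Assumption: runner names starting with these prefixes are aarch64 hosts.
ARM_RUNNER_PREFIXES = ("gb200", "gb300")
# Suffix convention described in the review above.
ARM_TAG_SUFFIX = "-ubuntu2404"


def find_suspect_images(configs):
    """Return (config_name, image) pairs on ARM runners whose tag lacks the suffix."""
    suspects = []
    for name, entry in configs.items():
        runner = entry.get("runner", "")
        image = entry.get("image", "")
        if runner.startswith(ARM_RUNNER_PREFIXES) and not image.endswith(ARM_TAG_SUFFIX):
            suspects.append((name, image))
    return suspects


# Simplified stand-in for the relevant entries of nvidia-master.yaml (layout is assumed).
configs = {
    "dsv4-fp4-gb200-dynamo-vllm": {
        "runner": "gb200",
        "image": "vllm/vllm-openai:v0.20.0-ubuntu2404",
    },
    "dsv4-fp4-gb200-dynamo-vllm-mtp2": {
        "runner": "gb200",
        "image": "vllm/vllm-openai:v0.20.1-ubuntu2404",
    },
    "dsv4-fp4-gb300-dynamo-vllm": {
        "runner": "gb300-nv",
        "image": "vllm/vllm-openai:v0.20.0-ubuntu2404",
    },
    "kimik2.5-fp4-gb200-dynamo-vllm": {
        "runner": "gb200",
        "image": "vllm/vllm-openai:v0.21.0",
    },
}

suspects = find_suspect_images(configs)
for name, image in suspects:
    print(f"{name}: {image} is missing {ARM_TAG_SUFFIX}")
```

Note that a check this strict would also have flagged the previous `v0.18.0-cu130` value, so a real version would likely need an allow-list of accepted suffixes.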
Proof the correct tag exists upstream
The repository already pins the exact correct tag for gb200: benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/8k1k/disagg-gb200-mid-curve-megamoe-mtp2.yaml uses vllm/vllm-openai:v0.21.0-ubuntu2404 (lines 5 and 143). So this is not a new tag we need vLLM to publish — it is already published and already used elsewhere in the repo for gb200.
Why the PR description's justification doesn't apply
The PR description says it "matches the v0.21.0 tag convention from PR #1461 (dsv4-fp8-h200-vllm)" and "the recent kimik2.5-fp4-b200-vllm-agentic bump". Both of those configs are x86 (H200 host and B200-dgxc host), so plain v0.21.0 is correct for them. They are exactly the wrong precedent for an ARM64 GB200 entry. The author appears to have copied the tag from an x86 sibling without noticing the architecture difference. The prior value v0.18.0-cu130 had a -cu130 (CUDA 13) suffix, which is what provided the ARM64-compatible image — that signal was lost in the bump.
Step-by-step proof of failure
- CI dispatches the benchmark to a `gb200` runner (line 8276: `runner: gb200`). GB200 = Grace ARM64 CPU + Blackwell GPU.
- The runner attempts `docker pull vllm/vllm-openai:v0.21.0` (per line 8273).
- The bare `v0.21.0` tag on Docker Hub for `vllm/vllm-openai` has only a `linux/amd64` manifest (the ARM64 build is published under the `-ubuntu2404` suffix, confirmed because the repo's own `disagg-gb200-mid-curve-megamoe-mtp2.yaml` pins `v0.21.0-ubuntu2404` specifically because that's the ARM build).
- Docker on ARM64 will fail with `no matching manifest for linux/arm64/v8 in the manifest list entries` (or, if a single-arch x86 manifest is matched anyway, the container will fail to start with `exec /usr/bin/python: exec format error`).
- The benchmark never runs; the job fails at container start.
Fix
Change line 8273 from:
`image: vllm/vllm-openai:v0.21.0`
to:
`image: vllm/vllm-openai:v0.21.0-ubuntu2404`
This matches the established gb200 convention in nvidia-master.yaml and the existing disagg-gb200-mid-curve-megamoe-mtp2.yaml recipe. The perf-changelog entry should be updated to reference the suffixed tag as well.
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26591796805


Summary
- Bumps the `kimik2.5-fp4-gb200-dynamo-vllm` image from `vllm/vllm-openai:v0.18.0-cu130` to `vllm/vllm-openai:v0.21.0`.
- Matches the v0.21.0 tag convention from PR #1461 (`dsv4-fp8-h200-vllm`) and the recent `kimik2.5-fp4-b200-vllm-agentic` bump.
- … `pr-link` will be patched in a follow-up commit).

Note
Low Risk
Config-only container tag bump and changelog; no application or auth logic changes.
Overview
Updates the `kimik2.5-fp4-gb200-dynamo-vllm` benchmark in `nvidia-master.yaml` to use `vllm/vllm-openai:v0.21.0` instead of `v0.18.0-cu130`, aligning with other recent vLLM v0.21.0 bumps in the repo.

Adds a matching `perf-changelog.yaml` entry for that config key documenting the image change.

Reviewed by Cursor Bugbot for commit 424a294. Bugbot is set up for automated code reviews on this repo.