kimik2.5-fp4-gb200-dynamo-vllm: bump vLLM image to v0.21.0 by Ankur-singh · Pull Request #1582 · SemiAnalysisAI/InferenceX

Ankur-singh · 2026-05-28T17:42:33Z

Summary

Bump kimik2.5-fp4-gb200-dynamo-vllm image from vllm/vllm-openai:v0.18.0-cu130 to vllm/vllm-openai:v0.21.0.
Matches the v0.21.0 tag convention from PR #1461 (dsv4-fp8-h200-vllm) and the recent kimik2.5-fp4-b200-vllm-agentic bump.
Appends a perf-changelog entry (pr-link will be patched in a follow-up commit).

Note

Low Risk
Config-only container tag bump and changelog; no application or auth logic changes.

Overview
Updates the kimik2.5-fp4-gb200-dynamo-vllm benchmark in nvidia-master.yaml to use vllm/vllm-openai:v0.21.0 instead of v0.18.0-cu130, aligning with other recent vLLM v0.21.0 bumps in the repo.

Adds a matching perf-changelog.yaml entry for that config key documenting the image change.

^{Reviewed by Cursor Bugbot for commit 424a294. Bugbot is set up for automated code reviews on this repo. Configure here.}


github-actions · 2026-05-28T17:42:43Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-05-28T17:45:34Z


 kimik2.5-fp4-gb200-dynamo-vllm:
-  image: vllm/vllm-openai:v0.18.0-cu130
+  image: vllm/vllm-openai:v0.21.0


Missing -ubuntu2404 suffix for GB200 vLLM image tag

High Severity

The image tag vllm/vllm-openai:v0.21.0 lacks the -ubuntu2404 suffix required for GB200 runners. All other GB200 dynamo-vllm configs in this file (dsv4-fp4-gb200-dynamo-vllm, dsv4-fp4-gb200-dynamo-vllm-mtp2, dsv4-fp4-gb300-dynamo-vllm) use the -ubuntu2404 variant which provides CUDA 13.0 and aarch64 support needed by Grace Blackwell hardware. The old image correctly used -cu130. The v0.21.0-ubuntu2404 tag exists and is the appropriate image for this runner.


claude · 2026-05-28T17:51:12Z


 kimik2.5-fp4-gb200-dynamo-vllm:
-  image: vllm/vllm-openai:v0.18.0-cu130
+  image: vllm/vllm-openai:v0.21.0


🔴 The bumped image vllm/vllm-openai:v0.21.0 is the x86_64 tag, but the gb200 runner is Grace+Blackwell (ARM64). Every other gb200/gb300 vllm entry in this file uses the -ubuntu2404 suffix (e.g. dsv4-fp4-gb200-dynamo-vllm pins v0.20.0-ubuntu2404), and the srt-slurm recipe disagg-gb200-mid-curve-megamoe-mtp2.yaml already pins vllm/vllm-openai:v0.21.0-ubuntu2404 — proving the ARM64 tag exists. Change to vllm/vllm-openai:v0.21.0-ubuntu2404 to match the gb200 convention; the prior v0.18.0-cu130 value also had an arch/CUDA suffix, so dropping to the bare tag will almost certainly fail at container start with a manifest/exec-format error.

Extended reasoning...

What the bug is

The diff bumps kimik2.5-fp4-gb200-dynamo-vllm from vllm/vllm-openai:v0.18.0-cu130 to the bare tag vllm/vllm-openai:v0.21.0. The bare vX.Y.Z tag on vllm/vllm-openai is the x86_64 image. The runner: gb200 field directly below this line targets a Grace+Blackwell node, whose CPU is ARM64 (aarch64). Pulling the x86 image on an ARM64 host will either fail with no matching manifest for linux/arm64 or fall back to an emulated/wrong image that fails with exec format error.

Convention evidence in this file

Every other gb200/gb300 vllm entry in .github/configs/nvidia-master.yaml uses the -ubuntu2404 ARM-suitable suffix:

| Config | runner | image tag |
| --- | --- | --- |
| dsv4-fp4-gb200-dynamo-vllm | gb200 | v0.20.0-ubuntu2404 |
| dsv4-fp4-gb200-dynamo-vllm-mtp2 | gb200 | v0.20.1-ubuntu2404 |
| dsv4-fp4-gb300-dynamo-vllm | gb300-nv | v0.20.0-ubuntu2404 |
| kimik2.5-fp4-gb200-dynamo-vllm (this PR) | gb200 | v0.21.0 ← missing suffix |

Conversely, every bare v0.21.0 (no suffix) usage in this file is on an x86 runner (h200, b200, b300, h100). The bare tag is correct only for x86 hosts.
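
The convention described above could be checked mechanically. The following is a minimal sketch, not part of the repo: the function name `missing_arm_suffix`, the prefix list, and the in-memory `configs` dict are illustrative assumptions (the real entries live in nvidia-master.yaml and may carry more fields). It treats any runner whose name starts with `gb200` or `gb300` as an aarch64 host and requires the `-ubuntu2404` suffix on its image.

```python
# Hypothetical lint: flag Grace Blackwell entries whose image lacks the ARM64-capable suffix.
ARM_RUNNER_PREFIXES = ("gb200", "gb300")  # assumption: these runner names are aarch64 hosts
ARM_TAG_SUFFIX = "-ubuntu2404"            # suffix named in the review above


def missing_arm_suffix(configs):
    """Return config keys whose runner is gb200/gb300 but whose image lacks the suffix."""
    offenders = []
    for key, entry in configs.items():
        runner = entry.get("runner", "")
        image = entry.get("image", "")
        # str.startswith accepts a tuple, so one call covers every ARM runner prefix.
        if runner.startswith(ARM_RUNNER_PREFIXES) and not image.endswith(ARM_TAG_SUFFIX):
            offenders.append(key)
    return offenders


# Entries copied from the table above, reduced to the two fields the check needs.
configs = {
    "dsv4-fp4-gb200-dynamo-vllm": {"runner": "gb200", "image": "vllm/vllm-openai:v0.20.0-ubuntu2404"},
    "dsv4-fp4-gb300-dynamo-vllm": {"runner": "gb300-nv", "image": "vllm/vllm-openai:v0.20.0-ubuntu2404"},
    "kimik2.5-fp4-gb200-dynamo-vllm": {"runner": "gb200", "image": "vllm/vllm-openai:v0.21.0"},
}
print(missing_arm_suffix(configs))  # ['kimik2.5-fp4-gb200-dynamo-vllm']
```

A check this strict would also have flagged the earlier v0.18.0-cu130 tag, so a real version would probably need an allow-list of accepted suffixes.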

Proof the correct tag exists upstream

The repository already pins the exact correct tag for gb200: benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/8k1k/disagg-gb200-mid-curve-megamoe-mtp2.yaml uses vllm/vllm-openai:v0.21.0-ubuntu2404 (lines 5 and 143). So this is not a new tag we need vLLM to publish — it is already published and already used elsewhere in the repo for gb200.

Why the PR description's justification doesn't apply

The PR description says it "matches the v0.21.0 tag convention from PR #1461 (dsv4-fp8-h200-vllm)" and "the recent kimik2.5-fp4-b200-vllm-agentic bump". Both of those configs are x86 (H200 host and B200-dgxc host), so plain v0.21.0 is correct for them. They are exactly the wrong precedent for an ARM64 GB200 entry. The author appears to have copied the tag from an x86 sibling without noticing the architecture difference. The prior value v0.18.0-cu130 had a -cu130 (CUDA 13) suffix, which is what provided the ARM64-compatible image — that signal was lost in the bump.

Step-by-step proof of failure

1. CI dispatches the benchmark to a gb200 runner (line 8276: runner: gb200). GB200 = Grace ARM64 CPU + Blackwell GPU.

2. The runner attempts docker pull vllm/vllm-openai:v0.21.0 (per line 8273).

3. The bare v0.21.0 tag on Docker Hub for vllm/vllm-openai has only a linux/amd64 manifest. The ARM64 build is published under the -ubuntu2404 suffix, which is why the repo's own disagg-gb200-mid-curve-megamoe-mtp2.yaml pins v0.21.0-ubuntu2404.

4. Docker on ARM64 will fail with no matching manifest for linux/arm64/v8 in the manifest list entries (or, if a single-arch x86 manifest is matched anyway, the container will fail to start with exec /usr/bin/python: exec format error).

5. The benchmark never runs; the job fails at container start.
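
The claim about which platforms a tag carries can be verified before merging by inspecting the tag's manifest list. The sketch below assumes the JSON has the shape of an OCI image index or Docker manifest list, as printed by `docker manifest inspect`: a top-level `manifests` array whose entries each have a `platform` object with `os` and `architecture`. The helper names and the sample documents are hypothetical; the samples are not the real manifests of any vllm/vllm-openai tag.

```python
import json


def platforms(manifest_list_json):
    """Return the 'os/architecture' strings listed in a manifest list / image index."""
    doc = json.loads(manifest_list_json)
    found = set()
    # A single-image manifest has no "manifests" key, so this yields an empty set.
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        found.add(f"{platform.get('os')}/{platform.get('architecture')}")
    return found


def supports(manifest_list_json, os_name="linux", arch="arm64"):
    """True if the manifest list advertises an image for the given OS and architecture."""
    return f"{os_name}/{arch}" in platforms(manifest_list_json)


# Hypothetical sample documents for illustration only.
amd64_only = json.dumps({
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
    ]
})
multi_arch = json.dumps({
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64"}},
    ]
})

print(supports(amd64_only))  # False
print(supports(multi_arch))  # True
```

In practice the input would be the captured output of `docker manifest inspect <image>` run against the candidate tag. A `False` result for `linux/arm64` would correspond to the "no matching manifest" failure described in step 4.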

Fix

Change line 8273 from:

image: vllm/vllm-openai:v0.21.0

to:

image: vllm/vllm-openai:v0.21.0-ubuntu2404

This matches the established gb200 convention in nvidia-master.yaml and the existing disagg-gb200-mid-curve-megamoe-mtp2.yaml recipe. The perf-changelog entry should be updated to reference the suffixed tag as well.

github-actions · 2026-05-28T20:32:04Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26591796805
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26591796805

kimik2.5-fp4-gb200-dynamo-vllm: bump vllm image to v0.21.0

f40f14f

Bumps the kimik2.5-fp4-gb200-dynamo-vllm image from vllm/vllm-openai:v0.18.0-cu130 to vllm/vllm-openai:v0.21.0.

Ankur-singh requested a review from a team May 28, 2026 17:42

Ankur-singh requested review from jgangani and kedarpotdar-nv as code owners May 28, 2026 17:42

github-project-automation Bot added this to InferenceMAX Board May 28, 2026

perf-changelog: set PR link to #1582

424a294

Ankur-singh added the full-sweep-enabled label May 28, 2026

Ankur-singh wants to merge 2 commits into main from update-kimi-image
