
Add Kimi K2.5 1T INT4 vLLM benchmark for B200 #735

Open

functionstackx wants to merge 12 commits into main from claude/issue-727-20260218-0415

Conversation

@functionstackx
Contributor

Summary

  • Add Kimi K2.5 INT4 vLLM benchmark for B200 single-node
  • Image: vllm/vllm-openai:v0.15.1
  • TP=8, concurrency 4-64 for 1k1k, 1k8k, 8k1k
  • Flags: --mm-encoder-tp-mode data and --trust-remote-code, on both vllm serve and the benchmark client (see the sketch below)
  • Branches off claude/issue-723-20260218-0123 (the MI355X Kimi K2.5 work)

Closes #727
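The server side of this summary reduces to one vllm serve invocation. A minimal sketch, assuming only what is stated in this PR (the port is illustrative):

```bash
# Sketch of the B200 server launch; all flags are listed in this PR,
# the port is an assumed default.
vllm serve moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 8 \
  --mm-encoder-tp-mode data \
  --trust-remote-code \
  --port 8000
```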

Generated with Claude Code

github-actions bot and others added 5 commits February 18, 2026 01:25
- Add kimik2.5-int4-mi355x-vllm config to amd-master.yaml
- Image: vllm/vllm-openai-rocm:v0.15.1 (per Andy Luo's recipe)
- Model: moonshotai/Kimi-K2.5 with --mm-encoder-tp-mode data
- TP=8, concurrency 4-64 for 1k1k, 1k8k, 8k1k
- No AITER env vars, no --no-enable-prefix-caching (launch sketched below)

Closes #723
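A hedged sketch of how this recipe might be launched: the image, model, and vLLM flags come from the commit message above, while the docker wiring (kfd/dri devices, host networking) is a standard ROCm-container assumption, not this repo's actual harness.

```bash
# Assumed container wiring for the MI355X run; ROCm containers generally
# need the kfd/dri devices exposed. Image, model, and flags are per the
# commit message above.
docker run --rm --ipc=host --network=host \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  vllm/vllm-openai-rocm:v0.15.1 \
  vllm serve moonshotai/Kimi-K2.5 \
    --tensor-parallel-size 8 \
    --mm-encoder-tp-mode data
# Per the recipe: no AITER env vars and no --no-enable-prefix-caching.
```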

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
The flag is not recognized by vLLM v0.15.1 and causes the server to fail on startup.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
The benchmark_serving.py script already accepts --trust-remote-code, but
benchmark_lib.sh's run_benchmark_serving() function wasn't passing it
through. This caused tokenizer loading failures for models like
Kimi-K2.5 that require trust_remote_code=True.

- Add --trust-remote-code flag parsing in run_benchmark_serving()
- Pass the flag through to benchmark_serving.py when set
- Enable --trust-remote-code in the Kimi-K2.5 benchmark script (see the sketch below)
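A minimal sketch of the fix, assuming run_benchmark_serving() walks its arguments in a loop; the variable names and surrounding structure are illustrative, not the repo's actual code:

```bash
# Illustrative flag parsing and passthrough for --trust-remote-code.
run_benchmark_serving() {
  local passthrough=() args=()
  for arg in "$@"; do
    case "$arg" in
      --trust-remote-code) passthrough+=("--trust-remote-code") ;;
      *) args+=("$arg") ;;
    esac
  done
  # benchmark_serving.py already accepts --trust-remote-code, so it only
  # needs to be forwarded when the caller sets it.
  python3 benchmark_serving.py "${args[@]}" "${passthrough[@]}"
}
```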

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
- Create benchmarks/kimik2.5_int4_b200.sh with --trust-remote-code on
  both vllm serve and run_benchmark_serving (client loop sketched below)
- Add kimik2.5-int4-b200-vllm config to nvidia-master.yaml
- Update perf-changelog.yaml with the new entry

Image: vllm/vllm-openai:v0.15.1
Model: moonshotai/Kimi-K2.5, TP=8, concurrency 4-64
Flags: --mm-encoder-tp-mode data, --trust-remote-code

Closes #727
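The client side of the script might look roughly like this; the option names of run_benchmark_serving and the exact concurrency steps within 4-64 are assumptions:

```bash
# Illustrative benchmark loop for benchmarks/kimik2.5_int4_b200.sh.
for shape in "1024 1024" "1024 8192" "8192 1024"; do  # 1k1k, 1k8k, 8k1k
  read -r isl osl <<< "$shape"
  for conc in 4 8 16 32 64; do   # assumed doubling steps across 4-64
    run_benchmark_serving \
      --model moonshotai/Kimi-K2.5 \
      --input-len "$isl" \
      --output-len "$osl" \
      --max-concurrency "$conc" \
      --trust-remote-code
  done
done
```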

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@functionstackx
Contributor Author

/sweep

@github-actions
Contributor

@functionstackx Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22127197803
Command: ``
Pinned ref: eebc476
Approval: not required (trusted collaborator).

@functionstackx
Contributor Author

/sweep test-config --config-keys kimik2.5-int4-b200-vllm --runner-config .github/configs/runners.yaml --config-files .github/configs/nvidia-master.yaml

functionstackx changed the base branch from claude/issue-723-20260218-0123 to main on February 18, 2026 05:05
functionstackx changed the title from "Add Kimi K2.5 INT4 vLLM benchmark for B200" to "Add Kimi K2.5 1T INT4 vLLM benchmark for B200" on Feb 18, 2026
@kedarpotdar-nv
Collaborator

I will take this over

kedarpotdar-nv self-assigned this Feb 19, 2026
kedarpotdar-nv marked this pull request as draft February 19, 2026 20:11
kedarpotdar-nv marked this pull request as ready for review February 20, 2026 23:22
kedarpotdar-nv removed their request for review February 21, 2026 05:00
