
Add Kimi K2.5 1T INT4 vLLM benchmark for B200 #735

Open

functionstackx wants to merge 12 commits into main from claude/issue-727-20260218-0415

Conversation

@functionstackx
Contributor

Summary

  • Add Kimi K2.5 INT4 vLLM benchmark for B200 single-node
  • Image: vllm/vllm-openai:v0.15.1
  • TP=8, concurrency 4-64 for 1k1k, 1k8k, 8k1k
  • Flags: --mm-encoder-tp-mode data and --trust-remote-code, on both vllm serve and the benchmark client (see the sketch below)
  • Branches off claude/issue-723-20260218-0123 (the MI355X Kimi K2.5 work)

Closes #727
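The server side of this summary reduces to one vllm serve invocation. A minimal sketch, assuming only what is stated in this PR (the port is illustrative):

```bash
# Sketch of the B200 server launch; all flags are listed in this PR,
# the port is an assumed default.
vllm serve moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 8 \
  --mm-encoder-tp-mode data \
  --trust-remote-code \
  --port 8000
```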

Generated with Claude Code

github-actions bot and others added 5 commits February 18, 2026 01:25
- Add kimik2.5-int4-mi355x-vllm config to amd-master.yaml
- Image: vllm/vllm-openai-rocm:v0.15.1 (per Andy Luo's recipe)
- Model: moonshotai/Kimi-K2.5 with --mm-encoder-tp-mode data
- TP=8, concurrency 4-64 for 1k1k, 1k8k, 8k1k
- No AITER env vars, no --no-enable-prefix-caching (launch sketched below)

Closes #723
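A hedged sketch of how this recipe might be launched: the image, model, and vLLM flags come from the commit message above, while the docker wiring (kfd/dri devices, host networking) is a standard ROCm-container assumption, not this repo's actual harness.

```bash
# Assumed container wiring for the MI355X run; ROCm containers generally
# need the kfd/dri devices exposed. Image, model, and flags are per the
# commit message above.
docker run --rm --ipc=host --network=host \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  vllm/vllm-openai-rocm:v0.15.1 \
  vllm serve moonshotai/Kimi-K2.5 \
    --tensor-parallel-size 8 \
    --mm-encoder-tp-mode data
# Per the recipe: no AITER env vars and no --no-enable-prefix-caching.
```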

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
The flag is not recognized by vLLM v0.15.1 and causes the server to fail on startup.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
The benchmark_serving.py script already accepts --trust-remote-code, but
benchmark_lib.sh's run_benchmark_serving() function wasn't passing it
through. This caused tokenizer loading failures for models like
Kimi-K2.5 that require trust_remote_code=True.

- Add --trust-remote-code flag parsing in run_benchmark_serving()
- Pass the flag through to benchmark_serving.py when set
- Enable --trust-remote-code in the Kimi-K2.5 benchmark script (see the sketch below)
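A minimal sketch of the fix, assuming run_benchmark_serving() walks its arguments in a loop; the variable names and surrounding structure are illustrative, not the repo's actual code:

```bash
# Illustrative flag parsing and passthrough for --trust-remote-code.
run_benchmark_serving() {
  local passthrough=() args=()
  for arg in "$@"; do
    case "$arg" in
      --trust-remote-code) passthrough+=("--trust-remote-code") ;;
      *) args+=("$arg") ;;
    esac
  done
  # benchmark_serving.py already accepts --trust-remote-code, so it only
  # needs to be forwarded when the caller sets it.
  python3 benchmark_serving.py "${args[@]}" "${passthrough[@]}"
}
```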

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
- Create benchmarks/kimik2.5_int4_b200.sh with --trust-remote-code on
  both vllm serve and run_benchmark_serving (client loop sketched below)
- Add kimik2.5-int4-b200-vllm config to nvidia-master.yaml
- Update perf-changelog.yaml with the new entry

Image: vllm/vllm-openai:v0.15.1
Model: moonshotai/Kimi-K2.5, TP=8, concurrency 4-64
Flags: --mm-encoder-tp-mode data, --trust-remote-code

Closes #727
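The client side of the script might look roughly like this; the option names of run_benchmark_serving and the exact concurrency steps within 4-64 are assumptions:

```bash
# Illustrative benchmark loop for benchmarks/kimik2.5_int4_b200.sh.
for shape in "1024 1024" "1024 8192" "8192 1024"; do  # 1k1k, 1k8k, 8k1k
  read -r isl osl <<< "$shape"
  for conc in 4 8 16 32 64; do   # assumed doubling steps across 4-64
    run_benchmark_serving \
      --model moonshotai/Kimi-K2.5 \
      --input-len "$isl" \
      --output-len "$osl" \
      --max-concurrency "$conc" \
      --trust-remote-code
  done
done
```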

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@functionstackx
Contributor Author

/sweep

@github-actions
Contributor

@functionstackx Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22127197803
Command: ``
Pinned ref: eebc476
Approval: not required (trusted collaborator).

@functionstackx
Contributor Author

/sweep test-config --config-keys kimik2.5-int4-b200-vllm --runner-config .github/configs/runners.yaml --config-files .github/configs/nvidia-master.yaml

functionstackx changed the base branch from claude/issue-723-20260218-0123 to main on February 18, 2026 05:05
functionstackx changed the title from "Add Kimi K2.5 INT4 vLLM benchmark for B200" to "Add Kimi K2.5 1T INT4 vLLM benchmark for B200" on Feb 18, 2026
@kedarpotdar-nv
Collaborator

I will take this over

kedarpotdar-nv self-assigned this Feb 19, 2026
kedarpotdar-nv marked this pull request as draft February 19, 2026 20:11
kedarpotdar-nv marked this pull request as ready for review February 20, 2026 23:22
kedarpotdar-nv removed their request for review February 21, 2026 05:00
