Draft
189 commits
c3af2af
Split PR. Second part. Compile ranges
ilmarkov Sep 4, 2025
0cbb065
Remove general shape graph
ilmarkov Sep 4, 2025
d5392f5
Add test to test pipeline
ilmarkov Sep 5, 2025
027c9eb
Fix pre-commit
ilmarkov Sep 9, 2025
b2992d3
Upd
ilmarkov Oct 16, 2025
3499384
Upd config
ilmarkov Oct 16, 2025
5336ee6
Fix
ilmarkov Oct 16, 2025
4958474
Priotitize compile_sizes
ilmarkov Oct 17, 2025
04306ed
Fix inductor config
ilmarkov Oct 28, 2025
9dc4eea
Laith's fix
ilmarkov Nov 3, 2025
2c63f0b
Upd
ilmarkov Nov 4, 2025
8b8d01d
Merge branch 'imarkov/fused_allreduce_torch_native' into imarkov/cond…
ilmarkov Nov 4, 2025
fcebc21
Add caching
ilmarkov Nov 4, 2025
65151bc
Address comments
ilmarkov Nov 5, 2025
df22202
Update benchmark
ilmarkov Nov 5, 2025
a21de2b
Fix
ilmarkov Nov 5, 2025
ada24e6
Merge branch 'imarkov/fused_allreduce_torch_native' into imarkov/cond…
ilmarkov Nov 6, 2025
6766e4f
Update fakify for compile sizes
ilmarkov Nov 5, 2025
af87d7a
Linter fix
ilmarkov Nov 6, 2025
459f71c
Merge branch 'imarkov/fused_allreduce_torch_native' into imarkov/cond…
ilmarkov Nov 6, 2025
b4c1b1d
Address the review
ilmarkov Nov 10, 2025
f080a83
[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like …
vllmellm Nov 10, 2025
d0e186c
[V0 Deprecation] Remove unused `context_len` and `seq_len` from M-RoP…
DarkLight1337 Nov 10, 2025
a3e7bdc
Merge branch 'imarkov/fused_allreduce_torch_native' into imarkov/cond…
ilmarkov Nov 10, 2025
b039bfd
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366)
varun-sundar-rabindranath Nov 10, 2025
34553b9
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3ne…
jiahanc Nov 10, 2025
6d54336
[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#…
Flechman Nov 10, 2025
9c84ca8
[FA/Chore] Bump FA version for FP8 two-level accumulation (#27889)
jmkuebler Nov 10, 2025
40d3326
[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#…
SageMoore Nov 10, 2025
bf6a3d0
[Misc] Add more scoping for improved trace (#28329)
frank-wei Nov 10, 2025
6dec9f6
[BugFix] Fix DeepGEMM over-allocating workspace (#28254)
LucasWilkinson Nov 10, 2025
4b94ed8
[Frontend][2/n] remove empty content from _parse_tool_calls_from_cont…
qandrew Nov 10, 2025
30700b1
[CI] Fix Plugin Tests Tests (#28413)
robertgshaw2-redhat Nov 10, 2025
0211435
[ROCm] Add missing gemm_a8w8_blockscale import (#28378)
sarckk Nov 10, 2025
d17ecc6
[PERF] Allreduce fusion. Support torch native matching. Tuning of the…
ilmarkov Nov 10, 2025
b30372c
[Perf] Move gc.freeze logic from EngineCoreProc to EngineCore for bet…
Jialin Nov 10, 2025
a5a790e
[Bugfix] Ensure calculated KV scales are applied in attention. (#27232)
adabeyta Nov 10, 2025
0bf29fa
[Test] Remove old non-varlen FA2 test (#28420)
MatthewBonanni Nov 10, 2025
35d801f
[Feature] Refactor batch invariant fp8 DeepGEMM (#27606)
yewentao256 Nov 11, 2025
39029d5
[CI/Test Fix] Fix CP tests on Blackwell (#28404)
LucasWilkinson Nov 11, 2025
de540c0
[Feature] Add env var `VLLM_MOE_USE_DEEP_GEMM` (#28422)
yewentao256 Nov 11, 2025
f2d9ad0
Only register rocm_aiter_ops if aiter is found (#28428)
mgoin Nov 11, 2025
57201a6
Fix rotary embedding benchmark script (#28323)
xyang16 Nov 11, 2025
8d706cc
[Misc] FlattenLogprobs -> FlatLogprobs (#28335)
zhuohan123 Nov 11, 2025
bca74e3
[Frontend] Add sagemaker_standards dynamic lora adapter and stateful …
zhaozuy Nov 11, 2025
e605e8e
[Bugfix] Fix Stream Sync for Shared Expert Overlap (#28430)
robertgshaw2-redhat Nov 11, 2025
a7adbc6
[Doc] Sleep mode documentation (#28357)
iAmir97 Nov 11, 2025
cc07976
[BugFix] Avoid calling KV connector layer APIs when metadata is unset…
sdavidbd Nov 11, 2025
4fd4b74
[Bugfix] Fix max image size for PaddleOCR-VL (#28442)
ywang96 Nov 11, 2025
798c7be
[EPLB] Refactor balance_packing to use numpy and optimize GPU-CPU tra…
SageMoore Nov 11, 2025
f0359ff
[Bugfix] fix qwen3-next crash (#28202)
ZJY0516 Nov 11, 2025
c799126
[BugFix] 'DeepseekV2Config' object has no attribute 'use_mla'` (#28387)
faaany Nov 11, 2025
9973e6e
[Model][Qwen3VL] Slighly speedup `fast_pos_embed_interpolate` (#28434)
lgeiger Nov 11, 2025
a810969
Merge branch 'main' into imarkov/conditional_compilation_ranges
ilmarkov Nov 11, 2025
d381eb9
Multi turn benchmark progress bar for synthetic conversation generati…
segevido Nov 11, 2025
2e78150
[CI] Add mergify rules for `nvidia` label (#28417)
mgoin Nov 11, 2025
b30dfa0
[Attention] Refactor CUDA attention backend selection logic (#24794)
MatthewBonanni Nov 11, 2025
7dbe6d8
Fix Fused MoE LoRA Triton kernel bug (#28450)
chaojun-zhang Nov 11, 2025
afffd3c
[Model] Pass `mm_features` directly into `get_mrope_input_positions` …
DarkLight1337 Nov 11, 2025
3380543
Add request timeout override for multi-turn benchmarks (#28386)
segevido Nov 11, 2025
fa19702
[Docs] Fix grammar in CPU installation guide (#28461)
maryamtahhan Nov 11, 2025
a1448b4
[Kernels] Split up fused_moe/layer.py, isolate more modular kernel co…
bnellnm Nov 11, 2025
533b018
[BugFix] Fix Failing Ruff Check (#28469)
jvlunteren Nov 11, 2025
a90ad7d
Add @markmc to CODEOWNERS for Observability (#28457)
markmc Nov 11, 2025
b886068
[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU (#28444)
faaany Nov 11, 2025
3143eb2
[BugFix] Add test_outputs.py to CI pipeline (#28466)
usberkeley Nov 11, 2025
287bbbe
[Doc] Fix typo in serving docs (#28474)
the-codeboy Nov 11, 2025
f9a4087
Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel …
mgoin Nov 11, 2025
a7ef3eb
[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282)
NickLucche Nov 11, 2025
68c09ef
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Mo…
izhuhaoran Nov 11, 2025
05576df
[ROCm][Quantization] extend AMD Quark to support mixed-precision quan…
xuebwang-amd Nov 11, 2025
5a1271d
[Quantization] fix attention quantization of gpt_oss model (#27334)
xuebwang-amd Nov 11, 2025
e553424
[CI/Build] Refactor Attention backend for test_prefix_prefill from xf…
zhewenl Nov 11, 2025
684f254
Prefer FlashAttention MLA as default over FlashMLA (#27363)
MatthewBonanni Nov 11, 2025
6c3c0f8
[Kernel] Optimize rms_norm kernel (#27931)
xyang16 Nov 11, 2025
d5edcb8
[BugFix] Fix Siglip2Attention on XPU (#28448)
faaany Nov 11, 2025
76e4dcf
[Misc] Remove unused attention prefix prefill ops functions (#26971)
lgeiger Nov 11, 2025
4228be7
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhea…
Jialin Nov 11, 2025
de120bc
[V0 deprecation] Clean up num_prefill_tokens logic for V0 (#28203)
gcanlin Nov 11, 2025
8c32c6e
[Misc] fix typo in DCP comment (#28389)
Livinfly Nov 11, 2025
9d1c474
[LoRA][1/N]Remove LoRA extra vocab (#28382)
jeejeelee Nov 11, 2025
df4d3a4
[TPU] Rename path to tpu platform (#28452)
kyuyeunk Nov 11, 2025
d4902ba
[Misc] Cleanup Executor interface (#28441)
wangxiyuan Nov 11, 2025
28534b9
Add Zurich vLLM Meetup (#28488)
mgoin Nov 11, 2025
e5f599d
[Bugfix] Disable shared expert overlap if Marlin MoE is used (#28410)
mgoin Nov 11, 2025
412e153
[Feature] Allow configuring FlashInfer workspace size (#28269)
maxyanghu Nov 11, 2025
d235395
Use FLASHINFER MLA backend when testing fp8_kv_scale_compile (#28491)
adabeyta Nov 12, 2025
1788aa1
[BugFix] Graceful handling of torch symm mem errors. (#27671)
ilmarkov Nov 12, 2025
48c8793
[Frontend] Change CompilationMode to a proper Enum (#28165)
gmagogsfm Nov 12, 2025
3f770f4
[Performance] Cache loaded custom logitsprocs to avoid overheads (#28…
Isotr0py Nov 12, 2025
e171039
[[V0 deprecation]]Remove VLLM_USE_V1 env (#28204)
wangxiyuan Nov 12, 2025
7f829be
[CPU] Refactor CPU attention backend (#27954)
bigPYJ1151 Nov 12, 2025
9f0247c
`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611)
AndreasKaratzas Nov 12, 2025
cbb799e
[Model][Qwen3VL] Simplify `get_mrope_input_positions` using numpy (#2…
lgeiger Nov 12, 2025
4ccffe5
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#…
fake0fan Nov 12, 2025
b9ce9a3
[BugFix] Add fallback path in `apply_rotary_pos_emb_flashattn` for no…
faaany Nov 12, 2025
f31419e
[Benchmark] Add retry support to fix workload bias in multi-turn benc…
ai-jz Nov 12, 2025
ac0bb2c
[Core] Cache `vllm_is_batch_invariant` (#28304)
lgeiger Nov 12, 2025
91864b7
[CI/Build] Fix crash due to removed VLLM_USE_V1 attribute in EPD (#28…
fake0fan Nov 12, 2025
c748355
[CI] Introduce autorun_on_main feature (#27836)
hl475 Nov 12, 2025
1761dea
[BugFix]: --enable-lora with model granite-4.0-micro crash (#27733)
yyzxw Nov 12, 2025
d3ade61
[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. (#27597)
wuyaoxuehun Nov 12, 2025
a4730c1
[XPU]Fix crash due to removed VLLM_USE_V1 attribute (#28520)
chaojun-zhang Nov 12, 2025
d143152
[KVConnector] Enable get_block_ids_with_load_errors() in LMCache conn…
ziruiliu Nov 12, 2025
c5f10cc
add cpu option for p/d in nixl_connector (#28356)
ZhengHongming888 Nov 12, 2025
edb59a9
[ROCm] [Bugfix] Fix `fused_qknorm_rope_kernel` rocm compatibility (#2…
tjtanaa Nov 12, 2025
a9d18b5
[Bugfix] Fix gpt_oss packed_modules_mapping (#28536)
jeejeelee Nov 12, 2025
10138c9
[V0 deprecation] Deprecate use_v1 parameter (#28112)
wangxiyuan Nov 12, 2025
54aecd9
Fix pre-commit (and XPU) on `main` (#28556)
hmellor Nov 12, 2025
f76e85c
[Performance][Hopper] Avoid M dim padding to 4x for most cases (due t…
alexm-redhat Nov 12, 2025
bc5bd45
[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL (#…
gcanlin Nov 12, 2025
728a9eb
[Misc] Refactor Attention kv transfer methods into decorator (#27816)
NickLucche Nov 12, 2025
a742134
Remove deprecated fields from `CompilationConfig` (#27593)
hmellor Nov 12, 2025
3044195
[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec…
benchislett Nov 12, 2025
bac9045
Implement ARC KV cache eviction policy for CPU offloader (#27039)
albertoperdomo2 Nov 12, 2025
a1e7fa3
[EPLB][ROCm]: support EPBL for ROCm backend (#27731)
PerryZhang01 Nov 12, 2025
64d57c3
[Model] [Config] Correctly identify granite-4.0-micro as non-hybrid m…
tdoublep Nov 12, 2025
319abd5
Remove dynamic shape
ilmarkov Nov 12, 2025
a39dd7b
[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken i…
hmellor Nov 12, 2025
94a9ebc
[KV connector][WIP] KV cache proxy based on LMCache multi-process mod…
ApostaC Nov 12, 2025
58ce8d1
[BugFix] Priority scheduling and spec tokens preemption (#28558)
andylolu2 Nov 12, 2025
478ee51
[Misc]Fix typo in llm_engine.py (#28584)
frank-wei Nov 12, 2025
74a9a9f
[Performance][B200] Fix deepgemm prologue (#27897)
varun-sundar-rabindranath Nov 12, 2025
d8140b9
[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel na…
vllmellm Nov 12, 2025
3eb0c26
[TPU] Support GCS path in VLLM_TORCH_PROFILER_DIR (#28487)
QiliangCui Nov 12, 2025
10f01d5
[Bugfix] Adjust Marlin CUDA arch selection to 8.0+PTX;9.0+PTX (#28294)
mgoin Nov 12, 2025
4ca5cd5
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#1…
HollowMan6 Nov 12, 2025
69d0e90
[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406)
alexm-redhat Nov 12, 2025
51c599f
Skip models that cannot currently init on Transformers v5 (#28471)
hmellor Nov 12, 2025
52eadce
[Docs] Update meetups.md description (#28583)
mgoin Nov 13, 2025
d75ad04
[ROCm][Bugfix] Revert removing setuptools version restriction (#28592)
gshtras Nov 13, 2025
2dacd57
[platform] Move get_cu_count to utils (#27005)
wangxiyuan Nov 13, 2025
a543e67
[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support…
mgoin Nov 13, 2025
8832fff
[BugFix] Fix `mm_encoder_attn_backend` arg type checking (#28599)
njhill Nov 13, 2025
3226283
[Docs] Add some details about what the MoE block needs for the Transf…
hmellor Nov 13, 2025
97d1c99
Rename clashing method names for vLLM model protocol (#27583)
hmellor Nov 13, 2025
a1d3866
[n-gen] DO NOT repeatedly return finished child requests (#28591)
Jialin Nov 13, 2025
7c38ed0
[Frontend] split append tool output (#28333)
qandrew Nov 13, 2025
1a0b157
[Frontend][responsesAPI][1/n] convert responses API tool input to cha…
qandrew Nov 13, 2025
7dca0c9
[BugFix][ROCm] Fix `get_cu_count` missing variable error (#28608)
ganyi1996ppo Nov 13, 2025
dbbe0c7
[XPU] Support Triton path for LoRA operations on XPU (#28511)
faaany Nov 13, 2025
7e082bc
Support DeepEP for Kimi-k2-thinking through enabling gemm selection f…
luccafong Nov 13, 2025
d44fbba
[build][cmake]: Bundle static ACL and torch libgomp for CPU extension…
Radu2k Nov 13, 2025
ca00b1b
[ROCm][BugFix] Remove the usage of `device_info` from aiter (#28383)
ganyi1996ppo Nov 13, 2025
4504e80
[Bugfix] Prevent crash on empty grammar string (#28210)
tjandy98 Nov 13, 2025
c33b87e
Use official xformers-0.0.33 built for PT 2.9 (#28600)
huydhn Nov 13, 2025
4ab34f6
Add NUMA node validation for CPU thread binding (#28555)
usberkeley Nov 13, 2025
fa183e9
[Bugfix] fix kimi-linear crash (#28445)
ZJY0516 Nov 13, 2025
5c9ad13
[Frontend] supports interleaved thinking (#28531)
chaunceyjiang Nov 13, 2025
11ac9dd
Support all interleaved layer types (#28485)
sarckk Nov 13, 2025
d168de0
Make ranges inclusive-inclusive
ilmarkov Nov 13, 2025
e63fd44
Fix: Correctly filter special tokens in benchmark_prefix_caching (#28…
dw2761 Nov 13, 2025
5e97320
[BugFix] Fix type error when assign a trition kernel tensor to a torc…
liuzijing2014 Nov 13, 2025
c428e8d
Fix io processor pooling #28273 (#28484)
baonudesifeizhai Nov 13, 2025
c47b6c8
[XPU] add sym params to IPEXConfig (#28611)
zufangzhu Nov 13, 2025
c9fe6ab
[Bugfix] Fix FPS value type for Qwen2.5-Omni video processing (#28630)
faaany Nov 13, 2025
86d15bf
[Hardware][PowerPC] Fix fp16 compilation error for Power in cpu atten…
Akashcodes732 Nov 13, 2025
8da2f28
[ROCm][BugFix]Fix `get_cu_count` in rocm_aiter_fa.py (#28618)
ganyi1996ppo Nov 13, 2025
a7791ea
[CI/Build] Install uv for AMD MI300: Language Models Tests (Hybrid) %…
amdfaa Nov 13, 2025
07a606a
[CI Failure] Fix backend selection for encoder-only models (#28534)
hl475 Nov 13, 2025
3035d1a
[BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy p…
YuanpingSong Nov 13, 2025
b230286
Fix `get_num_experts` when config sets it explicitly to `None` (#28652)
hmellor Nov 13, 2025
d338775
[Misc] Turn off encoder torch compile by default (#28634)
ywang96 Nov 13, 2025
06c4873
Rewrite C++ meta funcs to Python (#28595)
janeyx99 Nov 13, 2025
327c0a9
[BugFix] Ensure `EngineArgs.create_engine_config` is idempotent (#28515)
njhill Nov 13, 2025
fdfd507
[TPU] patch TPU wheel build script to resolve metadata issue (#27279)
jcyang43 Nov 13, 2025
fe1cd77
[Performance][B200] silu_mul_quant: pack scales in int32 (#28358)
varun-sundar-rabindranath Nov 13, 2025
119c492
[Bugfix] Fix validate model input for decoder models (#27099)
yannicks1 Nov 13, 2025
f9f3b59
[Attention][Bugfix] Fix FA sink support (#28660)
MatthewBonanni Nov 13, 2025
5d6ce2b
[Perf] Support stream interval for reducing host overhead (#27869)
elvischenv Nov 13, 2025
968060c
[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long c…
pisceskkk Nov 13, 2025
262d263
[Bugfix] Eliminate tuple inputs to submodules in graph partitioning (…
gmagogsfm Nov 13, 2025
faed7bf
[Bugfix] [CPU] bump torch to 2.9.0 for Darwin to fix segmentation fau…
kebe7jun Nov 13, 2025
1b622de
[Misc] Update CODEOWNERS for simon-mo and comaniac (#28675)
simon-mo Nov 13, 2025
e64011f
[CI] Bug: Fix ci entrypoint pooling (#28684)
yewentao256 Nov 13, 2025
6e25b1c
[KV Connector] Test async mode in scheduler tests (#28550)
markmc Nov 13, 2025
f2b8e1c
Mirrored test group definitions for AMD (2025-11-11) (#28573)
Alexei-V-Ivanov-AMD Nov 14, 2025
4d5943b
[quantization][config] enable override existing quant_config (#28510)
ILikeIneine Nov 14, 2025
2aa75c7
[ROCm] Bump up the version of amd-smi to 6.4.3 (#28680)
SageMoore Nov 14, 2025
622e610
[CPU][Bugfix] Fix Apple Silicon M1 compilation failure (#28681)
mgoin Nov 14, 2025
b39a502
[ci][amd] fix basic models extra init test (#28676)
bradleyhd Nov 14, 2025
01bea11
[Misc] Remove `warn_for_unimplemented_methods` (#28613)
DarkLight1337 Nov 14, 2025
da14ae0
[XPU][CI]disable lm cache uts (#28696)
jikunshang Nov 14, 2025
0aecd91
[Misc] Update xformers to 0.33.0.post1 (#28678)
ywang96 Nov 14, 2025
0b25498
[Misc] add ignore mapper for quark quantization (#28275)
haoyangli-amd Nov 14, 2025
15ae8e0
[Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_i…
rasmith Nov 14, 2025
9310357
[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropr…
rasmith Nov 14, 2025
529cea3
use default CCL_ZE_IPC_EXCHANGE (#28700)
yma11 Nov 14, 2025
b65e752
Merge branch 'main' into imarkov/conditional_compilation_ranges
ilmarkov Nov 14, 2025
2 changes: 1 addition & 1 deletion .buildkite/release-pipeline.yaml
Original file line number Diff line number Diff line change
@@ -132,7 +132,7 @@ steps:
queue: cpu_queue_postmerge
commands:
- "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg GIT_REPO_CHECK=1 --build-arg VLLM_CPU_AVX512BF16=true --build-arg VLLM_CPU_AVX512VNNI=true --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version) --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest --progress plain --target vllm-openai -f docker/Dockerfile.cpu ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg GIT_REPO_CHECK=1 --build-arg VLLM_CPU_AVX512BF16=true --build-arg VLLM_CPU_AVX512VNNI=true --build-arg VLLM_CPU_AMXBF16=true --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version) --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest --progress plain --target vllm-openai -f docker/Dockerfile.cpu ."
- "docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest"
- "docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)"
env:
18 changes: 7 additions & 11 deletions .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -59,7 +59,7 @@ while true; do
fi
done

echo "--- Pulling container"
echo "--- Pulling container"
image_name="rocm/vllm-ci:${BUILDKITE_COMMIT}"
container_name="rocm_${BUILDKITE_COMMIT}_$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10; echo)"
docker pull "${image_name}"
@@ -78,17 +78,13 @@ HF_MOUNT="/root/.cache/huggingface"
commands=$@
echo "Commands:$commands"

if [[ $commands == *"pytest -v -s basic_correctness/test_basic_correctness.py"* ]]; then
commands=${commands//"pytest -v -s basic_correctness/test_basic_correctness.py"/"VLLM_USE_TRITON_FLASH_ATTN=0 pytest -v -s basic_correctness/test_basic_correctness.py"}
fi
commands=${commands//"pytest -v -s basic_correctness/test_basic_correctness.py"/"pytest -v -s basic_correctness/test_basic_correctness.py"}

if [[ $commands == *"pytest -v -s models/test_registry.py"* ]]; then
commands=${commands//"pytest -v -s models/test_registry.py"/"pytest -v -s models/test_registry.py -k 'not BambaForCausalLM and not GritLM and not Mamba2ForCausalLM and not Zamba2ForCausalLM'"}
fi

if [[ $commands == *"pytest -v -s compile/test_basic_correctness.py"* ]]; then
commands=${commands//"pytest -v -s compile/test_basic_correctness.py"/"VLLM_USE_TRITON_FLASH_ATTN=0 pytest -v -s compile/test_basic_correctness.py"}
fi
commands=${commands//"pytest -v -s compile/test_basic_correctness.py"/"pytest -v -s compile/test_basic_correctness.py"}

if [[ $commands == *"pytest -v -s lora"* ]]; then
commands=${commands//"pytest -v -s lora"/"VLLM_ROCM_CUSTOM_PAGED_ATTN=0 pytest -v -s lora"}
@@ -181,13 +177,13 @@ if [[ -z "$render_gid" ]]; then
exit 1
fi

# check if the command contains shard flag, we will run all shards in parallel because the host have 8 GPUs.
# check if the command contains shard flag, we will run all shards in parallel because the host have 8 GPUs.
if [[ $commands == *"--shard-id="* ]]; then
# assign job count as the number of shards used
commands=${commands//"--num-shards= "/"--num-shards=${PARALLEL_JOB_COUNT} "}
# assign job count as the number of shards used
commands=$(echo "$commands" | sed -E "s/--num-shards[[:blank:]]*=[[:blank:]]*[0-9]*/--num-shards=${PARALLEL_JOB_COUNT} /g" | sed 's/ \\ / /g')
for GPU in $(seq 0 $(($PARALLEL_JOB_COUNT-1))); do
# assign shard-id for each shard
commands_gpu=${commands//"--shard-id= "/"--shard-id=${GPU} "}
commands_gpu=$(echo "$commands" | sed -E "s/--shard-id[[:blank:]]*=[[:blank:]]*[0-9]*/--shard-id=${GPU} /g" | sed 's/ \\ / /g')
echo "Shard ${GPU} commands:$commands_gpu"
echo "Render devices: $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES"
docker run \
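The shard-handling change above replaces bash fixed-string substitutions (`${commands//"--num-shards= "/...}`), which only match one exact spelling, with `sed -E` patterns that tolerate whitespace around `=` and overwrite whatever shard count was already present. A minimal sketch of that rewrite, using an illustrative command string and made-up values for `PARALLEL_JOB_COUNT` and `GPU`:

```shell
# Illustrative input: note the spaces around '=' that the old fixed-string
# substitution would silently fail to match.
PARALLEL_JOB_COUNT=8
GPU=3
commands='pytest -v -s lora --shard-id = 0 --num-shards=2'

# Rewrite the shard count, tolerating optional blanks around '='.
commands=$(echo "$commands" | sed -E "s/--num-shards[[:blank:]]*=[[:blank:]]*[0-9]*/--num-shards=${PARALLEL_JOB_COUNT} /g")

# Rewrite the shard id the same way, per GPU.
commands_gpu=$(echo "$commands" | sed -E "s/--shard-id[[:blank:]]*=[[:blank:]]*[0-9]*/--shard-id=${GPU} /g")

echo "$commands_gpu"
```

Both flags end up normalized to `--shard-id=3` and `--num-shards=8` regardless of how the original spelled the `=`, which is what lets the CI script fan one templated command out across all GPUs.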
5 changes: 3 additions & 2 deletions .buildkite/scripts/hardware_ci/run-cpu-test.sh
@@ -49,6 +49,7 @@ function cpu_tests() {
# Run kernel tests
docker exec cpu-test-"$NUMA_NODE" bash -c "
set -e
pytest -x -v -s tests/kernels/attention/test_cpu_attn.py
pytest -x -v -s tests/kernels/test_onednn.py"

# Run basic model test
@@ -76,7 +77,7 @@ function cpu_tests() {
# Run AWQ test
# docker exec cpu-test-"$NUMA_NODE" bash -c "
# set -e
# VLLM_USE_V1=0 pytest -x -s -v \
# pytest -x -s -v \
# tests/quantization/test_ipex_quant.py"

# Run multi-lora tests
@@ -116,4 +117,4 @@ function cpu_tests() {

# All of CPU tests are expected to be finished less than 40 mins.
export -f cpu_tests
timeout 2h bash -c "cpu_tests $CORE_RANGE $NUMA_NODE"
timeout 2.5h bash -c "cpu_tests $CORE_RANGE $NUMA_NODE"
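The `export -f cpu_tests` / `timeout 2.5h bash -c ...` pair above relies on a bash-specific mechanism: exporting a function puts it in the environment, so the child `bash -c` process spawned by `timeout` can still call it. A minimal sketch of the same pattern, with a hypothetical stand-in function (the outer `bash -c` wrapper here just keeps the example runnable from any POSIX shell):

```shell
# Sketch, assuming GNU coreutils 'timeout' and bash are available.
# 'export -f' (bash-only) exposes the function to the child shell that
# 'timeout' supervises, mirroring the CI script's structure.
result=$(bash -c '
  cpu_tests() { echo "running cpu tests on core range $1 numa node $2"; }
  export -f cpu_tests
  timeout 5s bash -c "cpu_tests 0-7 0"
')
echo "$result"
```

Without `export -f`, the inner `bash -c` would fail with "cpu_tests: command not found", since functions are not inherited by child shells by default; the timeout bound (40 min expected, 2.5 h hard limit in the script) then applies to the whole function, not to individual pytest invocations.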
2 changes: 1 addition & 1 deletion .buildkite/scripts/hardware_ci/run-xpu-test.sh
@@ -46,6 +46,6 @@ docker run \
pytest -v -s v1/worker --ignore=v1/worker/test_gpu_model_runner.py
pytest -v -s v1/structured_output
pytest -v -s v1/spec_decode --ignore=v1/spec_decode/test_max_len.py --ignore=v1/spec_decode/test_tree_attention.py --ignore=v1/spec_decode/test_speculators_eagle3.py
pytest -v -s v1/kv_connector/unit --ignore=v1/kv_connector/unit/test_multi_connector.py --ignore=v1/kv_connector/unit/test_nixl_connector.py --ignore=v1/kv_connector/unit/test_shared_storage_connector.py
pytest -v -s v1/kv_connector/unit --ignore=v1/kv_connector/unit/test_multi_connector.py --ignore=v1/kv_connector/unit/test_nixl_connector.py --ignore=v1/kv_connector/unit/test_shared_storage_connector.py --ignore=v1/kv_connector/unit/test_lmcache_integration.py
pytest -v -s v1/test_serial_utils.py
'