Draft
581 commits
6965a39
Fix: Resolve circular import in model_loader/utils.py (#29189)
nandan2003 Nov 22, 2025
2d4978a
fix: clean up function never use in setup.py (#29061)
yihong0618 Nov 22, 2025
5f7209a
[tiny] Remove unsupported TRITON_MLA backend from batch invariance (#…
bwasti Nov 22, 2025
066209a
[Attention] Refactor FA `block_size` limitations to hybrid models onl…
NickLucche Nov 22, 2025
d44a63c
[BugFix] Fix returned logprobs with spec decode + prefill chunking (#…
njhill Nov 22, 2025
ae66818
[Misc] Fix pre-commit (#29238)
DarkLight1337 Nov 22, 2025
d84d8f4
Fix EVS crash when using `video_embeds` inputs in Qwen2.5-VL (#29232)
skyloevil Nov 22, 2025
f55c76c
chore: add RTX_PRO_6000 GLM4.6-FP8 kernel tuning (#29240)
coval3nte Nov 22, 2025
730bd35
[perf][cpu] Accelerate paged attention GEMMs (QK, PV) on Arm CPUs wit…
fadara01 Nov 22, 2025
d1cf821
[Bugfix] Use HF config fields as fallback when loading Mistral config…
DarkLight1337 Nov 22, 2025
eb5352a
[CI/build] Removes source compilation from runtime image (#26966)
bbartels Nov 22, 2025
7df331c
[BugFix] Fix chunked prompt logprobs + preemption (#29071)
njhill Nov 22, 2025
df78aee
Refactor: Move CUDA graph dispatch logic earlier (#27382)
yiz-liu Nov 22, 2025
472fdee
[Chore] Update batch invariant code owner (#29246)
yewentao256 Nov 22, 2025
4587063
Patch DeepEP when building docker image with CUDA 13 (#29154)
soodoshll Nov 22, 2025
5f96c00
[Fix] Add SM check to flashinfer MOE backend (#29144)
jiahanc Nov 23, 2025
3ed767e
docs: fixes distributed executor backend config for multi-node vllm (…
michaelact Nov 23, 2025
389aa1b
[Doc] Update more docs with respect to V1 (#29188)
DarkLight1337 Nov 23, 2025
20ee418
[Model Runner V2] Minor fix for cudagraph_utils (#29256)
WoosukKwon Nov 23, 2025
71362ff
[CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency …
rasmith Nov 23, 2025
3999442
[CI/Build][AMD] Add check for flash_att_varlen_func to test_tree_atte…
rasmith Nov 23, 2025
55c21c8
[ROCm][CI] Fix "Cannot re-initialize CUDA in forked subprocess" in te…
micah-wil Nov 23, 2025
6fb0215
[Bugfix] Use lazy string reference for DeepseekV3Config in config reg…
yongming-qin Nov 23, 2025
7f12c82
[Model Runner V2] Change bookkeeping logic in preparation for spec de…
WoosukKwon Nov 23, 2025
b004c00
[Model Runner V2] Support spec decoding [1/N] (#29274)
WoosukKwon Nov 23, 2025
62d54ba
[Model Runner V2] Optimize CUDA graph capture time (#29275)
WoosukKwon Nov 23, 2025
3e1ad40
[Model Runner V2] Add apply_temperature option to gumbel_sample (#29276)
WoosukKwon Nov 23, 2025
c309bb5
[Bugfix] Update Gradio OpenAI Chatbot Webserver example to new Gradio…
joshiemoore Nov 24, 2025
1073ba6
[LoRA] Optimize 3D MoE logic (#29222)
jeejeelee Nov 24, 2025
3085478
[Model] Add OpenCUA-7B support (#29068)
lim4349 Nov 24, 2025
5253f42
[ROCm] Support for Whisper v1 with Aiter Unified Attention and Aiter …
apinge Nov 24, 2025
0ff7082
[Core] Deprecate `xformers` (#29262)
ywang96 Nov 24, 2025
ed40d85
[BugFix] Fix R-VL model loading error (#29299)
faaany Nov 24, 2025
68dfe28
[Feature][Benchmark] add --link-vars can filter when serve_param equa…
lengrongfu Nov 24, 2025
8005e60
[Bugfix][Rocm] Fix shared expert weight loading failure in DeepSeek-M…
zhyajie Nov 24, 2025
eca7a8f
[Doc]: fix typos in various files (#29230)
didier-durand Nov 24, 2025
4de8786
[CPU][IBM Z] Fix BF16 support and vectorize math operations for s390x…
R3hankhan123 Nov 24, 2025
2601f18
[EPLB] Optimize EPLB for Async Rearrange Experts (#22179)
david6666666 Nov 24, 2025
f716a15
Update KServe guide link in documentation (#29258)
terrytangyuan Nov 24, 2025
7a228b5
Add option to use unbacked, and backed size obl dynamic shapes for mo…
laithsakka Nov 24, 2025
e48b2e6
[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic (#26980)
vllmellm Nov 24, 2025
656516c
[Bugfix] properly handle nested json with llama3 tool parser (#27701)
Aydin-ab Nov 24, 2025
e924bbb
[Build/CI][DP/EP] Add QWen/Qwen3-30B-A3B-FP8 + EPLB tests to Nightly …
varun-sundar-rabindranath Nov 24, 2025
26a4655
[NIXL] Use config to enable telemetry + NIXL version bump (#29305)
NickLucche Nov 24, 2025
cc313cb
[Model Runner V2] Implement Single-step Eagle 1 (#29300)
WoosukKwon Nov 24, 2025
cec418b
[Model Runner V2] Change Numba AoT to JIT (#29328)
WoosukKwon Nov 24, 2025
8f06614
[MoE][Refactor] Make select_experts a non-static method (#29067)
bnellnm Nov 24, 2025
839c6b7
[Multimodal][Qwen3 Omni] Make Qwen3 Omni work with audio-in-video inp…
huachenheli Nov 24, 2025
97588c4
[Model Runner V2] Add minor clarification comments for Eagle (#29332)
WoosukKwon Nov 24, 2025
4d6afca
[CI/Build] Moves to cuda-base runtime image while retaining minimal J…
bbartels Nov 24, 2025
3cfa63a
[XPU]fix Kimi-VL-A3B-thinking on xpu (#29309)
yma11 Nov 24, 2025
f32c7d6
[Model Runner V2] Simplify Eagle bookkeeping with num_rejected (#29347)
WoosukKwon Nov 24, 2025
84371da
[Tests] Verify gpt_oss package is installed in harmony tests (#29336)
njhill Nov 24, 2025
4dd42db
Remove VLLM_SKIP_WARMUP tip (#29331)
tlrmchlsmth Nov 24, 2025
71df2a5
[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#…
heheda12345 Nov 24, 2025
c17610e
[Bugfix] Only use triton_kernels for MXFP4 on SM90 and SM100 (#29339)
mgoin Nov 24, 2025
699bca7
[UX] Raise error for attn backend of batch invariant (#29348)
yewentao256 Nov 25, 2025
5f9679a
[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden…
hjjq Nov 25, 2025
b8328b4
[XPU] upgrade torch & ipex 2.9 on XPU platform (#29307)
jikunshang Nov 25, 2025
a178a0b
[BugFix] Fix duplicate id tool-call race condition (#29355)
njhill Nov 25, 2025
a4ad43a
Scheduled removal of `ParallelConfig`'s direct child EPLB fields (#29…
hmellor Nov 25, 2025
6f1355a
[Perf] Disable DeepGEMM MoE by default when TP=8 is used (#29346)
mgoin Nov 25, 2025
77e10c9
[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's per…
ganyi1996ppo Nov 25, 2025
cb7214d
[ROCm][MLA] enable fp8 MLA decode on ROCm (#28032)
gbyu-amd Nov 25, 2025
22b42b5
[CI][ROCm] Install arctic-inference on ROCm tests (#29344)
divakar-amd Nov 25, 2025
7012d8b
[Docker] Optimize Dockerfile: consolidate apt-get and reduce image si…
princepride Nov 25, 2025
9cf4eda
[Metrics] Scheduled removal of deprecated metrics (#29330)
markmc Nov 25, 2025
87185c8
[Bugfix] Make deprecated `--task embedding` consistent with `--runner…
maryamtahhan Nov 25, 2025
92effb0
[Model] Add HunyuanOCR support (#29327)
Isotr0py Nov 25, 2025
81db702
[Attention] add `_cudagraph_support` for linear attention (#28934)
ZJY0516 Nov 25, 2025
2d9ee28
[CI/Test Fix] Fix CP tests on Blackwell (#29338)
LucasWilkinson Nov 25, 2025
316c849
Scheduled removal of `guided_*` config fields (#29326)
hmellor Nov 25, 2025
a21256c
Add TP CLI argument to multimodal inference examples (#29301)
faaany Nov 25, 2025
ce58fdc
Fix PoolingParams.skip_reading_prefix_cache type (#29364)
kflu Nov 25, 2025
40a6f53
Display warning only when ROCm version is less than Pytorch required …
Inokinoki Nov 25, 2025
7992324
[BugFix] Use unique ids for different transcription prompts (#29372)
njhill Nov 25, 2025
64deead
[Bugfix] [ROCm] [UX]: revert Flex attention backend (#29371)
vllmellm Nov 25, 2025
98caead
[fix][cpu] Use a SwigluOAI impl which supports interleaved gate-up we…
fadara01 Nov 25, 2025
fe3a4f5
[CI/Build] Pin torchgeo dependency for AMD (#29353)
rjrock Nov 25, 2025
888152b
Allow oot custom compiler extension via CompilerInterface (#28623)
wxsIcey Nov 25, 2025
f242cfc
[Perf] use cpu all reduce to avoid sync when async_scheduling & dp > …
izhuhaoran Nov 25, 2025
12c007e
EAGLE Support DP>1 (#26086)
Flechman Nov 25, 2025
ef1f703
[ROCm][CI] Fix test_cudagraph_mode failure in AMD CI (#29367)
micah-wil Nov 25, 2025
6330f94
[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)
elvischenv Nov 25, 2025
67fc16c
[Bugfix] If chunked_prefill is disabled, end the scheduling early. (#…
noooop Nov 25, 2025
db29061
[Misc] Streamline unique id generation (#29375)
njhill Nov 25, 2025
32c40b9
[BugFix] bad_words filtering ineffective when n > 1 (#29313)
GOavi101 Nov 25, 2025
a685b47
[responsesAPI] refactor construct_input_messages (#29359)
qandrew Nov 25, 2025
e1dd706
[Frontend] Respect Chat Completion parallel_tool_calls param (#26233)
bbrowning Nov 25, 2025
7a80b01
[CI] Resettle pooling entrypoints tests. (#29370)
noooop Nov 25, 2025
de68899
[Misc] Suppress log outputs when constructing the default vllm config…
noooop Nov 25, 2025
798e87d
[Core] Generalize Encoder-Decoder `seq_lens` computation to avoid Whi…
NickLucche Nov 25, 2025
c2c661a
[Bugfix] Fix overallocation in MM profiling (#29386)
ywang96 Nov 25, 2025
bf0c75c
Make Transformers Nightly tests soft-fail and enable all tests (#29401)
hmellor Nov 25, 2025
51fc9e0
Scheduled removal of `CompilationConfig.use_inductor` (#29323)
hmellor Nov 25, 2025
516c3f7
[Bugfix] Fix logic for choosing default prefix caching setting (#29393)
tdoublep Nov 25, 2025
0231ce8
Revert back to torch.equal over torch.allclose from #28819 (#29086)
eldarkurtic Nov 25, 2025
794029f
[Feature]: Improve GGUF loading from HuggingFace user experience like…
sts07142 Nov 25, 2025
dbc3d99
[UX] Put CUDA attention backend selection log into one line (#29337)
mgoin Nov 25, 2025
e502098
[Kernel] Add NVFP4 MoE CUTLASS support for SM120 (#29242)
mgoin Nov 25, 2025
48ddb02
[Hybrid Allocator] Support KV cache groups with different block_size …
ivanium Nov 25, 2025
a1f2676
Scheduled removal of `override_pooler_config` and `disable_log_reques…
hmellor Nov 25, 2025
0353d2e
Fix RoPE related failures in Transformers nightly tests (#29333)
hmellor Nov 25, 2025
b07555d
[responsesAPI][2] parse ResponseFunctionToolCallOutputItem (#29383)
qandrew Nov 25, 2025
c32a18c
Attempt to fix GPU OOM in a spec-decoding test (#29419)
eldarkurtic Nov 25, 2025
e7d7762
[Compile] Refactor. Move PostGradPassManager out of Compilation confi…
ilmarkov Nov 25, 2025
4e57c65
[Core] Support logprobs with spec decode + async scheduling (#29223)
njhill Nov 25, 2025
0abc794
[caching] Add enable_prompt_embeds and cpu_offload_gb to compile hash…
zhxchen17 Nov 25, 2025
7df0289
Change warning logs to debug for unimplemented MXFP4 Linear/Attention…
mgoin Nov 25, 2025
de75b0b
[BugFix] Fix initialization of draft model. (#29319)
halyavin Nov 25, 2025
d8819c8
fix assertion for single world use case (uni) (#29429)
luccafong Nov 26, 2025
12866af
dummy run corner case (#29433)
xieyangxu Nov 26, 2025
56531b7
[Misc] Add backup hash algorithm for FIPS constrained environments (#…
geodavic Nov 26, 2025
8d6a89d
[UX] Suppress gloo log spam (#29250)
mgoin Nov 26, 2025
c5ee430
Bump actions/checkout from 4 to 6 (#29293)
dependabot[bot] Nov 26, 2025
53d7f1f
[Kernel] Use pre-allocated output buffer for triton kernel fused_expe…
xyang16 Nov 26, 2025
d9d342d
[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457)
ganyi1996ppo Nov 26, 2025
452a7c9
[Misc] Allow LM only loading for Pixtral (#29451)
ywang96 Nov 26, 2025
e30859d
[Bugfix] Fix handling of image embeds in models (#29480)
DarkLight1337 Nov 26, 2025
bb706d6
Fix TeleChatForCausalLM not register issue (#29473)
Yejing-Lai Nov 26, 2025
3650a74
Optimize the wording of the document and unify the terminology and th…
Adityayxt Nov 26, 2025
70d5953
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)" (#29483)
hl475 Nov 26, 2025
0b0aa87
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10…
yewentao256 Nov 26, 2025
e603129
[refactor] CTConfig methods to static/class methods (#28870)
HDCharles Nov 26, 2025
c4c0354
[CI/Build] allow user modify pplx and deepep ref by ENV or command li…
alec-flowers Nov 26, 2025
430dd4d
[Attention] Remove imports from `vllm/attention/__init__.py` (#29342)
MatthewBonanni Nov 26, 2025
56539cd
[Core] Refactor padding logic and pad for CUDA graphs before attentio…
LucasWilkinson Nov 26, 2025
ba1fcd8
[TPU] add tpu_inference (#27277)
jcyang43 Nov 26, 2025
df01eda
[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)
HDCharles Nov 27, 2025
7774019
[Attention][Async] Eliminate `seq_lens_cpu` in FlashAttention metadat…
MatthewBonanni Nov 27, 2025
a67dec7
[Bugfix] fix IMA issue in certain cases of the moe marlin kernel (#28…
jinzhen-lin Nov 27, 2025
9bb33c8
add xpu supported model and model id for cpu (#29380)
louie-tsai Nov 27, 2025
0aeb698
[Model Runner V2] Minor code cleanup (#29570)
WoosukKwon Nov 27, 2025
ee80aee
[Model Runner V2] Minor cleanup for build_attn_metadata (#29576)
WoosukKwon Nov 27, 2025
da8e1a1
[DOC] Add vLLM Bangkok Meetup info (#29561)
tjtanaa Nov 27, 2025
ecb1952
[cpu][fix] Fix Arm CI tests (#29552)
fadara01 Nov 27, 2025
11ea5ec
[Model Runner V2] Refactor CudaGraphManager (#29583)
WoosukKwon Nov 27, 2025
c069086
[Bugfix] Fix getting device for MoE LoRA (#29475)
jeejeelee Nov 27, 2025
3ecabd0
Fix tpu-inference platform path (#29554)
jcyang43 Nov 27, 2025
43c5792
[ROCm][CI] Fix test_cpu_offloading for ROCm (#29548)
micah-wil Nov 27, 2025
da3222f
[Model Runner V2] Implement multi-step Eagle with CUDA graph (#29559)
WoosukKwon Nov 27, 2025
00d3310
[Bugfix] Update Ultravox compatibility (#29588)
DarkLight1337 Nov 27, 2025
0838b52
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up…
morrison-turnansky Nov 27, 2025
51906c8
[Docs] Improve `priority` parameter documentation (#29572)
maang-h Nov 27, 2025
e6d4f3c
[Bugfix] Fix pre-commit (#29601)
DarkLight1337 Nov 27, 2025
a5abd1d
[CI] Auto label CPU related issues (#29602)
bigPYJ1151 Nov 27, 2025
cf348c8
[Bugfix] Fix HunyuanVL XD-RoPE (#29593)
ywang96 Nov 27, 2025
2f5f9ac
[LoRA] Continue optimizing MoE LoRA weight loading (#29322)
jeejeelee Nov 27, 2025
882851d
[CI/Build][Bugfix] Fix auto label issues for CPU (#29610)
bigPYJ1151 Nov 27, 2025
bab438f
[CI/Build] Skip ray tests on ROCm (#29556)
rjrock Nov 27, 2025
66d3d54
[Doc]: fixing typos in diverse files (#29492)
didier-durand Nov 27, 2025
cd007a5
[bugfix] avoid NIXL_ERR_REMOTE_DISCONNECT in nixl_connector when Pref…
hasB4K Nov 27, 2025
fc1d8be
[Attention] Update attention imports (#29540)
MatthewBonanni Nov 27, 2025
e1f2623
Update Transformers pin in CI to 4.57.3 (#29418)
hmellor Nov 27, 2025
0840abd
[BugFix] Optional tokenizer argument when loading GGUF models (#29582)
sts07142 Nov 27, 2025
ee9841d
[Bugfix] Fix doc build on main (#29619)
DarkLight1337 Nov 27, 2025
d45269b
add skip_reading_prefix_cache in repr for PoolingParams (#29620)
guodongxiaren Nov 27, 2025
ea228b4
[Misc] Remove unused code from `protocol.py` (#29616)
DarkLight1337 Nov 27, 2025
a24ea54
[Deprecation] Advance deprecation status (#29617)
DarkLight1337 Nov 27, 2025
38658ec
[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing G…
Isotr0py Nov 27, 2025
e5a621b
[CI] Add batched audios Whisper test (#29308)
NickLucche Nov 27, 2025
a5345bf
[BugFix] Fix `plan` API Mismatch when using latest FlashInfer (#29426)
askliar Nov 27, 2025
ae0ce1b
[Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutpu…
WoosukKwon Nov 27, 2025
be493e0
[BugFix] Fix new nightly failures (#29578)
LucasWilkinson Nov 27, 2025
35657bc
[CPU]Update CPU PyTorch to 2.9.0 (#29589)
scydas Nov 28, 2025
745a3ba
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)
xyang16 Nov 28, 2025
18523b8
[Docs] Update supported models for Olmo 3 in tool calling documentati…
wilsonwu Nov 28, 2025
c7ba1f6
[BugFix] Fix ValueError in NewRequestData repr methods (#29392)
maang-h Nov 28, 2025
37b15e9
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qw…
EanWang211123 Nov 28, 2025
f4b7605
Improve enable chunked_prefill & prefix_caching logic. (#26623)
noooop Nov 28, 2025
b34e877
Revert "[CPU]Update CPU PyTorch to 2.9.0 (#29589)" (#29647)
DarkLight1337 Nov 28, 2025
4805989
[Feature][Bench] Add pareto visualization (#29477)
lengrongfu Nov 28, 2025
cc0f2a0
[Doc] Improve abnormal information string (#29655)
maang-h Nov 28, 2025
b2c1d29
[BUGFIX] MistralTokenizer._call__ adds an invalid EOS token (#29607)
juliendenize Nov 28, 2025
5f5521b
Fix parameter order in GPT-OSS weight loading function for non-MXFP4 …
qGentry Nov 28, 2025
ccbdf51
[Doc] Reorganize benchmark docs (#29658)
DarkLight1337 Nov 28, 2025
3cb32e5
[Rocm] Set VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS default is disab…
zhyajie Nov 28, 2025
5c2b5cb
[Docs] Add SPLADE and Ultravox models to supported models documentati…
wilsonwu Nov 28, 2025
33b06a6
[Misc] Remove redundant attention var constants (#29650)
DarkLight1337 Nov 28, 2025
953d9c8
[mypy] Pass type checking for `vllm/utils` and `vllm/v1/pool` (#29666)
DarkLight1337 Nov 28, 2025
8e7a891
[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542)
njhill Nov 28, 2025
1168768
[Optimization] Early return for `_apply_matches` and `_iter_placehold…
DarkLight1337 Nov 28, 2025
f8151b6
Revert "Supress verbose logs from model_hosting_container_standards (…
HappyAmazonian Nov 28, 2025
e2f56c3
[CPU] Update torch 2.9.1 for CPU backend (#29664)
bigPYJ1151 Nov 28, 2025
460d8bb
Remove upstream fa checks (#29471)
Victor49152 Nov 28, 2025
0808eb8
[Misc] Remove `yapf` directives (#29675)
DarkLight1337 Nov 28, 2025
9eec282
Guard FlashInfer sampler using the same check as FlashInfer attention…
hmellor Nov 28, 2025
9e6bcda
[mypy] Enable type checking for more directories (#29674)
DarkLight1337 Nov 28, 2025
3bcbb30
add add_truncate_prompt_tokens in repr for PoolingParams (#29683)
guodongxiaren Nov 28, 2025
fae6943
[Doc]: fixing typos in multiple files. (#29685)
didier-durand Nov 28, 2025
6f9d81d
[V0 deprecation] Clean up legacy paged attention helper functions (#2…
Isotr0py Nov 28, 2025
f946a8d
[Chore]: Reorganize model repo operating functions in `transformers_u…
Isotr0py Nov 28, 2025
4332955
[Docs] Add CLI reference doc for `vllm bench sweep plot_pareto` (#29689)
hmellor Nov 28, 2025
d40c854
[CI/Build] Rework CPU multimodal processor test (#29684)
Isotr0py Nov 28, 2025
8d9338f
[Chore] Rename `Processor` to `InputProcessor` (#29682)
DarkLight1337 Nov 28, 2025
fecae12
Remove `all_special_tokens_extended` from tokenizer code (#29686)
hmellor Nov 28, 2025
3461e7e
[Frontend] Remap -O to -cc commandline flag (#29557)
gmagogsfm Nov 28, 2025
1986de1
[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597)
benchislett Nov 28, 2025
7c1ed45
[CI/Build]: make it possible to build with a free-threaded interprete…
rgommers Nov 28, 2025
7675ba3
[Misc] Remove redundant `ClassRegistry` (#29681)
DarkLight1337 Nov 28, 2025
a51f418
[Bugfix] fix dots.llm1.inst (#29687)
ZJY0516 Nov 28, 2025
3fd1fb0
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)…
hl475 Nov 28, 2025
9726e64
bugfix: correct attn output with base 2 or e (#28840)
staugust Nov 28, 2025
6173682
[compile] Include `enable_sleep_mode` into caching factors. (#29696)
zhxchen17 Nov 28, 2025
c625d7b
[Bugfix] Fix O(n²) multimodal string prompt processing (#29667)
mertunsall Nov 29, 2025
ea3370b
[ROCm][Bugfix] Patch for the `Multi-Modal Processor Test` group (#29702)
AndreasKaratzas Nov 29, 2025
1dcafb3
[Model Runner V2] Support penalties using bin counts (#29703)
WoosukKwon Nov 29, 2025
b2c50ed
[Bugfix] Fix wrong mock attribute (#29704)
DarkLight1337 Nov 29, 2025
762a4a6
[Frontend] Perform offline path replacement to `tokenizer` (#29706)
a4lg Nov 29, 2025
ca1b1e7
[Model Runner V2] Refactor prefill token preparation (#29712)
WoosukKwon Nov 29, 2025
e23f665
[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not ite…
LucasWilkinson Nov 29, 2025
4b17ce6
Add gpu memory wait before test_async_tp (#28893)
angelayi Nov 29, 2025
4a80ad0
[Model Runner V2] Don't use UVA buffer for prefill_len (#29713)
WoosukKwon Nov 29, 2025
39e63de
[LoRA] Cleanup LoRA unused code (#29611)
jeejeelee Nov 29, 2025
6afc0ff
[Model Runner V2] Add sample/ directory and reorganize files (#29719)
WoosukKwon Nov 29, 2025
04a797c
[Doc]: fixing typos in various files. (#29717)
didier-durand Nov 29, 2025
f223ed4
[Model Runner V2] Fuse penalties and temperature into single kernel (…
WoosukKwon Nov 29, 2025
34a9842
[Misc] Refactor tokenizer interface (#29693)
DarkLight1337 Nov 29, 2025
f4341f4
[Doc]: fix code block rendering (#29728)
dublc Nov 29, 2025
ad7f714
hfrunner.classify should return list[list[float]] not list[str] (#29671)
nwaughachukwuma Nov 29, 2025
fe3398f
[Chore] Enable passing `tokenizer=None` into MM processor (#29724)
DarkLight1337 Nov 29, 2025
fa59fe4
[Chore] Move `detokenizer_utils` to `vllm/tokenizers` (#29727)
DarkLight1337 Nov 29, 2025
1656ad3
[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
jinzhen-lin Nov 29, 2025
b9d0504
[Bugfix] Revert test_tokenization.py (#29729)
jeejeelee Nov 29, 2025
a491b09
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708)
xyang16 Nov 30, 2025
e1464c3
[Quantization] Enable compressed-tensors AWQ for Turing GPU (#29732)
Isotr0py Nov 30, 2025
82c795d
Fix AttributeError about _use_fi_prefill (#29734)
hl475 Nov 30, 2025
66b5840
[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) …
Flink-ddd Nov 30, 2025
9381b5c
[Doc]: Fix typo in fused_moe layer (#29731)
BowTen Nov 30, 2025
2afcec4
[Misc] Update `TokenizerLike` interface and move `get_cached_tokenize…
DarkLight1337 Nov 30, 2025
47539cf
[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742)
Isotr0py Nov 30, 2025
64bc09b
[Core] Enable `inputs_embeds_size` separate from `hidden_size` (#29741)
DarkLight1337 Nov 30, 2025
8c363ed
[ROCm][Attention] Sliding window support for `AiterFlashAttentionBack…
ganyi1996ppo Nov 30, 2025
cd719de
Fix RoPE failures in Transformers nightly (#29700)
hmellor Nov 30, 2025
39d2810
[Feat] Support non-gated activations in NVFP4 modelopt path (#29004)
omera-nv Nov 30, 2025
21c2627
[Misc]Remove redundant hidden_size property in ModelConfig (#29749)
charlotte12l Nov 30, 2025
ec38a73
[Model Runner V2] Use packed mask for prompt bin counts (#29756)
WoosukKwon Nov 30, 2025
f72a817
[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141)
wenscarl Dec 1, 2025
1ab8fc8
Make PyTorch profiler gzip and CUDA time dump configurable (#29568)
zhangruoxu Dec 1, 2025
83805a6
[CI] Skip paddleocr_vl for transformer 4.57.3 (#29758)
hl475 Dec 1, 2025
62de4f4
[Frontend] Resettle pooling entrypoints (#29634)
noooop Dec 1, 2025
014ece9
[Frontend] Add tool filtering support to ToolServer (#29224)
daniel-salib Dec 1, 2025
86e178f
[crashfix] Eagle + multimodal can crash on mm cache miss (#29750)
mickaelseznec Dec 1, 2025
f0a28bf
[Misc] Unify tokenizer registration (#29767)
DarkLight1337 Dec 1, 2025
f37e893
[XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774)
faaany Dec 1, 2025
ad9d656
[multimodal][test] Reduce memory utilization for test_siglip to avoid…
zhxchen17 Dec 1, 2025
b95db24
[v1] Add real sliding window calculation to FlexAttention direct Bloc…
Isotr0py Dec 1, 2025
5cfa967
[Bugfix] TypeError: 'NoneType' object is not callable (#29414)
mostrowskix Dec 1, 2025
36db0a3
[CI] Renovation of nightly wheel build & generation (#29690)
Harry-Chen Dec 1, 2025
30624ea
sync upstream
vllmellm Dec 2, 2025
46 changes: 0 additions & 46 deletions .buildkite/generate_index.py

This file was deleted.

16 changes: 1 addition & 15 deletions .buildkite/release-pipeline.yaml
@@ -8,7 +8,7 @@ steps:
     commands:
       # #NOTE: torch_cuda_arch_list is derived from upstream PyTorch build files here:
       # https://github.com/pytorch/pytorch/blob/main/.ci/aarch64_linux/aarch64_ci_build.sh#L7
-      - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg VLLM_MAIN_CUDA_VERSION=12.9 --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
+      - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
       - "mkdir artifacts"
       - "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
       - "bash .buildkite/scripts/upload-wheels.sh"
@@ -30,19 +30,6 @@ steps:
     DOCKER_BUILDKIT: "1"

 # x86 + CUDA builds
-- label: "Build wheel - CUDA 12.8"
-  depends_on: ~
-  id: build-wheel-cuda-12-8
-  agents:
-    queue: cpu_queue_postmerge
-  commands:
-    - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.8.1 --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
-    - "mkdir artifacts"
-    - "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
-    - "bash .buildkite/scripts/upload-wheels.sh"
-  env:
-    DOCKER_BUILDKIT: "1"
-
- label: "Build wheel - CUDA 12.9"
depends_on: ~
id: build-wheel-cuda-12-9
@@ -109,7 +96,6 @@ steps:
 - label: "Annotate release workflow"
   depends_on:
     - create-multi-arch-manifest
-    - build-wheel-cuda-12-8
   id: annotate-release-workflow
   agents:
     queue: cpu_queue_postmerge
9 changes: 5 additions & 4 deletions .buildkite/scripts/annotate-release.sh
@@ -23,8 +23,8 @@ To download the wheel (by version):
 aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}/vllm-${RELEASE_VERSION}-cp38-abi3-manylinux1_x86_64.whl .
 aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}/vllm-${RELEASE_VERSION}-cp38-abi3-manylinux2014_aarch64.whl .

-aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}+cu126/vllm-${RELEASE_VERSION}+cu126-cp38-abi3-manylinux1_x86_64.whl .
 aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}+cu129/vllm-${RELEASE_VERSION}+cu129-cp38-abi3-manylinux1_x86_64.whl .
+aws s3 cp s3://vllm-wheels/${RELEASE_VERSION}+cu130/vllm-${RELEASE_VERSION}+cu130-cp38-abi3-manylinux1_x86_64.whl .
\`\`\`

To download and upload the image:
@@ -45,9 +45,10 @@ docker tag vllm/vllm-openai:aarch64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
 docker push vllm/vllm-openai:latest-aarch64
 docker push vllm/vllm-openai:v${RELEASE_VERSION}-aarch64

-docker manifest create vllm/vllm-openai:latest vllm/vllm-openai:latest-x86_64 vllm/vllm-openai:latest-aarch64 --amend
-docker manifest create vllm/vllm-openai:v${RELEASE_VERSION} vllm/vllm-openai:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64 --amend
+docker manifest rm vllm/vllm-openai:latest
+docker manifest create vllm/vllm-openai:latest vllm/vllm-openai:latest-x86_64 vllm/vllm-openai:latest-aarch64
+docker manifest create vllm/vllm-openai:v${RELEASE_VERSION} vllm/vllm-openai:v${RELEASE_VERSION}-x86_64 vllm/vllm-openai:v${RELEASE_VERSION}-aarch64
 docker manifest push vllm/vllm-openai:latest
 docker manifest push vllm/vllm-openai:v${RELEASE_VERSION}
\`\`\`
EOF
EOF
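The final hunk drops `docker manifest create … --amend` in favor of an explicit `docker manifest rm` followed by a fresh `create` and `push`, so the annotated release instructions stay correct on repeated runs. A minimal sketch of that idempotent re-create pattern, wrapped in a hypothetical `publish_manifest` helper (not part of the repo's scripts):

```shell
# Sketch: recreate-and-push a multi-arch manifest idempotently, assuming the
# per-arch images <tag>-x86_64 and <tag>-aarch64 have already been pushed.
publish_manifest() {
  local tag="$1"
  # Remove any stale local manifest so repeated runs start from a clean
  # slate; suppress the "no such manifest" error on the first run.
  docker manifest rm "${tag}" 2>/dev/null || true
  docker manifest create "${tag}" "${tag}-x86_64" "${tag}-aarch64"
  docker manifest push "${tag}"
}

# Example (mirrors the annotate script's instructions):
# publish_manifest vllm/vllm-openai:latest
```

Compared with `--amend`, which merges into whatever manifest list already exists locally, the rm-then-create flow guarantees the pushed manifest references exactly the two architectures listed and nothing left over from a previous release.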