Commit c51d37d

Fix an issue where running out of requests leaves no parallelism in PP mode with the Ray backend.
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
1 parent 552cac9 commit c51d37d

1 file changed: +6 -1 lines changed

vllm/v1/core/sched/scheduler.py

Lines changed: 6 additions & 1 deletion
@@ -397,7 +397,12 @@ def schedule(self) -> SchedulerOutput:
             while self.waiting and token_budget > 0:
                 if len(self.running) == self.max_num_running_reqs:
                     break
-
+                if len(scheduled_resumed_reqs) + len(scheduled_new_reqs) >= max(
+                    1,
+                    self.max_num_running_reqs
+                    // self.parallel_config.pipeline_parallel_size,
+                ):
+                    break
                 request = self.waiting.peek_request()
 
                 # KVTransfer: skip request if still waiting for remote kvs.
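
A minimal sketch of the effect of the new check, assuming a toy model in which each `schedule()` call only admits requests from a waiting list. The helpers `per_step_admission_cap` and `admit` are hypothetical names for illustration and are not part of vLLM; the sketch ignores the token budget, KV-cache limits, and the `max_num_running_reqs` check that the real scheduler also applies.

```python
def per_step_admission_cap(max_num_running_reqs: int,
                           pipeline_parallel_size: int) -> int:
    """Hypothetical helper mirroring the expression added in the diff.

    At least one request may always be admitted, even when
    pipeline_parallel_size exceeds max_num_running_reqs.
    """
    return max(1, max_num_running_reqs // pipeline_parallel_size)


def admit(waiting: list[str], cap: int) -> list[str]:
    """Toy stand-in for one schedule() call: pop up to `cap` waiting requests."""
    admitted: list[str] = []
    while waiting and len(admitted) < cap:
        admitted.append(waiting.pop(0))
    return admitted


# Assumed example values: 8 waiting requests, max_num_running_reqs=8, PP size 4.
waiting = [f"req{i}" for i in range(8)]
cap = per_step_admission_cap(max_num_running_reqs=8, pipeline_parallel_size=4)

# One batch per pipeline stage; with the cap, every batch gets some requests.
batches = [admit(waiting, cap) for _ in range(4)]
```

Without the cap, the first call in this toy model would admit all eight requests and the remaining three batches would be empty, which appears to be the loss of parallelism the commit message describes.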
