Describe the bug
I deployed Qwen3-VL-32B-Thinking for inference with the following command:

```shell
IMAGE_MAX_TOKEN_NUM=128 CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy --model /Exp008/v0-20251231-163417/iter_0000600_hf --infer_backend vllm --port 8001 --api_key abc123 --served_model_name Exp008_step600 --vllm_tensor_parallel_size 4 --max_new_tokens 30000 --vllm_gpu_memory_utilization 0.8
```

When running inference with 16 concurrent requests, I hit the following error:
```text
[INFO:swift] Traceback (most recent call last):
  File "/root/training_env_sft/ms-swift/swift/llm/infer/deploy.py", line 198, in create_chat_completion
    res_or_gen = await self.infer_async(infer_request, request_config, template=self.template, **infer_kwargs)
  File "/root/training_env_sft/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 806, in infer_async
    return await self._infer_full_async(**kwargs)
  File "/root/training_env_sft/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 671, in _infer_full_async
    async for result in result_generator:
  File "/root/training_env_sft/.venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 370, in generate
    q = await self.add_request(
  File "/root/training_env_sft/.venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 276, in add_request
    raise EngineDeadError()
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO: 28.18.40.30:47747 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
```
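For reference, a minimal sketch of the client side that triggers this. The endpoint, API key, and served model name come from the deploy command above; the image URL and prompt text are placeholders I chose for illustration:

```python
# Reproduction sketch (assumptions: endpoint/model/key taken from the
# deploy command above; the image URL and prompt are placeholders).
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8001/v1", api_key="abc123")

def one_request(i: int) -> str:
    # Each request sends one image plus a text prompt, mirroring the
    # multimodal traffic hitting the /v1/chat/completions route.
    resp = client.chat.completions.create(
        model="Exp008_step600",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/test.jpg"}},  # placeholder
                {"type": "text", "text": "Describe this image."},
            ],
        }],
    )
    return resp.choices[0].message.content

# 16 concurrent requests, matching the concurrency at which the engine dies.
with ThreadPoolExecutor(max_workers=16) as pool:
    for result in pool.map(one_request, range(16)):
        print(result[:80])
```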
Your hardware and system info
Here is my environment info:
```text
vllm      0.11.0
torch     2.8.0
ms-swift  3.13.0.dev0
```
Additional context
Looking forward to your reply.