Open
Labels
bug (Something isn't working)
Description
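Hi everyone.
I used swift export to produce an FP8-quantized bge-reranker-v2-m3, and the export completes without errors. However, starting the exported model with vLLM fails. The original, unquantized model starts fine under vLLM.
FP8 quantization command: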
CUDA_VISIBLE_DEVICES=5 swift export \
--model /data/xinference/.cache/modelscope/hub/AI-ModelScope/bge-reranker-v2-m3 \
--model_type bge_reranker \
--quant_method fp8 \
--output_dir ./bge-reranker-v2-m3-fp8
This step works fine.
vLLM launch command:
CUDA_VISIBLE_DEVICES=5 python -m vllm.entrypoints.api_server \
--model ./bge-reranker-v2-m3-fp8 \
--served-model-name bge-reranker-v2-m3 \
--max-model-len 8192 \
--max-num-batched-tokens 32768 \
--max-num-seqs 64 \
--quantization fp8 \
--block-size 32 \
--port 9995
Error at startup:
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] self.collective_rpc("load_model")
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] self.model = model_loader.load_model(
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] self.load_weights(model, model_config)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] loaded_weights = model.load_weights(
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/roberta.py", line 219, in load_weights
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] return loader.load_weights(weights, mapper=self.jina_to_vllm_mapper)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 252, in _load_module
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] yield from self._load_module(prefix,
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 252, in _load_module
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] yield from self._load_module(prefix,
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 280, in _load_module
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] raise ValueError(msg)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] ValueError: There is no module or parameter named 'classifier.dense.weight_scale_inv' in RobertaForSequenceClassification
(EngineCore_DP0 pid=2883861) Process EngineCore_DP0:
(EngineCore_DP0 pid=2883861) Traceback (most recent call last):
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2883861) self.run()
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2883861) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=2883861) raise e
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=2883861) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=2883861) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=2883861) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=2883861) self._init_executor()
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=2883861) self.collective_rpc("load_model")
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=2883861) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=2883861) return func(*args, **kwargs)
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=2883861) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=2883861) self.model = model_loader.load_model(
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=2883861) self.load_weights(model, model_config)
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(EngineCore_DP0 pid=2883861) loaded_weights = model.load_weights(
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/roberta.py", line 219, in load_weights
(EngineCore_DP0 pid=2883861) return loader.load_weights(weights, mapper=self.jina_to_vllm_mapper)
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
(EngineCore_DP0 pid=2883861) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 252, in _load_module
(EngineCore_DP0 pid=2883861) yield from self._load_module(prefix,
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 252, in _load_module
(EngineCore_DP0 pid=2883861) yield from self._load_module(prefix,
(EngineCore_DP0 pid=2883861) File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 280, in _load_module
(EngineCore_DP0 pid=2883861) raise ValueError(msg)
(EngineCore_DP0 pid=2883861) ValueError: There is no module or parameter named 'classifier.dense.weight_scale_inv' in RobertaForSequenceClassification
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
(EngineCore_DP0 pid=2883861)
Traceback (most recent call last):
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 178, in <module>
asyncio.run(run_server(args))
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 127, in run_server
app = await init_app(args, llm_engine)
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 113, in init_app
if llm_engine is not None else AsyncLLMEngine.from_engine_args(
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 235, in from_engine_args
return cls(
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
self.engine_core = EngineCoreClient.make_async_mp_client(
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
return AsyncMPClient(*client_args)
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
super().__init__(
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
with launch_core_engines(vllm_config, executor_class,
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/contextlib.py", line 142, in __exit__
next(self.gen)
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
wait_for_engine_startup(
File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
What is going on here? Is the model_type I used for quantization wrong?
vLLM version: 0.13.0
ms-swift version: 3.10.1
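To narrow down the ValueError above, it may help to check which tensors in the exported checkpoint carry the `weight_scale_inv` suffix that vLLM rejects. The following is a minimal sketch, not taken from the ms-swift or vLLM documentation. It relies only on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header), and the helper names `list_tensor_names` and `find_scale_tensors` are hypothetical.

```python
import json
import struct
from pathlib import Path


def list_tensor_names(path):
    """Return the tensor names stored in one .safetensors file (header only)."""
    with open(path, "rb") as f:
        # First 8 bytes: JSON header length as a little-endian unsigned 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")


def find_scale_tensors(checkpoint_dir, suffix="weight_scale_inv"):
    """Collect tensor names ending in `suffix` across all shards in a directory."""
    hits = []
    for shard in sorted(Path(checkpoint_dir).glob("*.safetensors")):
        hits.extend(n for n in list_tensor_names(shard) if n.endswith(suffix))
    return hits
```

Calling `find_scale_tensors("./bge-reranker-v2-m3-fp8")` on the export directory from the command above should show whether `classifier.dense.weight_scale_inv` is the only such entry or whether the encoder layers have scale tensors too. That would indicate whether only the classifier head was quantized unexpectedly or whether the whole checkpoint uses a naming scheme this vLLM model class does not accept.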