Error at startup after exporting an fp8-quantized bge-reranker-v2-m3 with swift export #7288

@Tian14267

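Hi everyone.
I used swift export to produce an fp8-quantized bge-reranker-v2-m3, and the export itself completes normally. However, launching the exported model with vLLM fails with an error. The unquantized model launches with vLLM without problems.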

fp8 quantization command:

CUDA_VISIBLE_DEVICES=5 swift export \
    --model /data/xinference/.cache/modelscope/hub/AI-ModelScope/bge-reranker-v2-m3 \
    --model_type bge_reranker \
    --quant_method fp8 \
    --output_dir ./bge-reranker-v2-m3-fp8

This step completes without errors.

Launching with vLLM:

CUDA_VISIBLE_DEVICES=5 python -m vllm.entrypoints.api_server \
    --model ./bge-reranker-v2-m3-fp8 \
    --served-model-name bge-reranker-v2-m3 \
    --max-model-len 8192 \
    --max-num-batched-tokens 32768 \
    --max-num-seqs 64 \
    --quantization fp8 \
    --block-size 32 \
    --port 9995

Startup error:

(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     self.collective_rpc("load_model")
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     return func(*args, **kwargs)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     self.model = model_loader.load_model(
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     self.load_weights(model, model_config)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     loaded_weights = model.load_weights(
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/roberta.py", line 219, in load_weights
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     return loader.load_weights(weights, mapper=self.jina_to_vllm_mapper)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 252, in _load_module
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     yield from self._load_module(prefix,
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 252, in _load_module
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     yield from self._load_module(prefix,
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 280, in _load_module
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708]     raise ValueError(msg)
(EngineCore_DP0 pid=2883861) ERROR 01-05 17:53:43 [core.py:708] ValueError: There is no module or parameter named 'classifier.dense.weight_scale_inv' in RobertaForSequenceClassification
(EngineCore_DP0 pid=2883861) Process EngineCore_DP0:
(EngineCore_DP0 pid=2883861) Traceback (most recent call last):
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2883861)     self.run()
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2883861)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=2883861)     raise e
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=2883861)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=2883861)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=2883861)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=2883861)     self._init_executor()
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=2883861)     self.collective_rpc("load_model")
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=2883861)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=2883861)     return func(*args, **kwargs)
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=2883861)     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=2883861)     self.model = model_loader.load_model(
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=2883861)     self.load_weights(model, model_config)
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(EngineCore_DP0 pid=2883861)     loaded_weights = model.load_weights(
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/roberta.py", line 219, in load_weights
(EngineCore_DP0 pid=2883861)     return loader.load_weights(weights, mapper=self.jina_to_vllm_mapper)
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
(EngineCore_DP0 pid=2883861)     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 252, in _load_module
(EngineCore_DP0 pid=2883861)     yield from self._load_module(prefix,
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 252, in _load_module
(EngineCore_DP0 pid=2883861)     yield from self._load_module(prefix,
(EngineCore_DP0 pid=2883861)   File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 280, in _load_module
(EngineCore_DP0 pid=2883861)     raise ValueError(msg)
(EngineCore_DP0 pid=2883861) ValueError: There is no module or parameter named 'classifier.dense.weight_scale_inv' in RobertaForSequenceClassification
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
(EngineCore_DP0 pid=2883861) 
Traceback (most recent call last):
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 178, in <module>
    asyncio.run(run_server(args))
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 127, in run_server
    app = await init_app(args, llm_engine)
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/entrypoints/api_server.py", line 113, in init_app
    if llm_engine is not None else AsyncLLMEngine.from_engine_args(
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 235, in from_engine_args
    return cls(
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
    self.engine_core = EngineCoreClient.make_async_mp_client(
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
    return AsyncMPClient(*client_args)
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
    super().__init__(
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
    wait_for_engine_startup(
  File "/data/miniconda3/envs/fffan_debug/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

What is going on here? Is the model_type I used for quantization wrong?

vLLM version: 0.13.0
ms_swift version: 3.10.1
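
To narrow this down, it may help to check which tensors in the exported checkpoint carry the `weight_scale_inv` suffix named in the error. The sketch below is only a diagnostic under assumptions: the helper names `find_scale_tensors` and `scan_checkpoint` are hypothetical, the path passed to `scan_checkpoint` would be whichever `.safetensors` file sits in the export directory, and tensor names inside the file may carry a different prefix than the one vLLM prints after its own name mapping.

```python
from collections import defaultdict
from typing import Iterable

# Suffix copied from the ValueError in the traceback above.
SCALE_SUFFIX = ".weight_scale_inv"


def find_scale_tensors(names: Iterable[str]) -> dict[str, list[str]]:
    """Group tensor names ending in SCALE_SUFFIX by their top-level module."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for name in names:
        if name.endswith(SCALE_SUFFIX):
            top_level = name.split(".", 1)[0]
            grouped[top_level].append(name)
    return {module: sorted(found) for module, found in grouped.items()}


def scan_checkpoint(path: str) -> dict[str, list[str]]:
    """Read tensor names from one .safetensors file and group the scale tensors.

    Needs the `safetensors` package (and torch for framework="pt").
    `path` is a placeholder for a file inside the export directory.
    """
    from safetensors import safe_open  # imported lazily so the helper above has no dependencies

    with safe_open(path, framework="pt") as f:
        return find_scale_tensors(f.keys())
```

If `scan_checkpoint` reports scale tensors under the classifier head as well as the encoder, that would at least be consistent with the error: the checkpoint contains a per-weight scale for `classifier.dense`, while the `RobertaForSequenceClassification` module vLLM builds has no parameter by that name. It would not by itself show whether the fix belongs on the export side or the loading side.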

Labels: bug
