Description
Deployment command:

```
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve /xxx/qwen3-next/20251017/hf-full-202510171312 --served-model-name qwen3-next --port 9001 --trust-remote-code --tensor-parallel-size 2 --gpu-memory-utilization 0.90 --enable-prefix-caching --max-model-len 4096 --reasoning-parser deepseek_r1 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
```
Deploying the original Qwen3-Next model with vLLM using these same arguments works without error.
Error log:

```
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] worker = WorkerProc(*args, **kwargs)
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] self.worker.load_model()
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2642, in load_model
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] self.drafter.load_model(self.model)
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/spec_decode/eagle.py", line 821, in load_model
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] self.model = get_model(vllm_config=self.vllm_config,
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] return loader.load_model(vllm_config=vllm_config,
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] self.load_weights(model, model_config)
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/default_loader.py", line 276, in load_weights
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] raise ValueError("Following weights were not initialized from "
(Worker_TP0 pid=1766) ERROR 10-20 16:59:32 [multiproc_executor.py:597] ValueError: Following weights were not initialized from checkpoint: {'model.layers.0.self_attn.q_norm.weight', 'model.pre_fc_norm_hidden.weight', 'model.layers.0.post_attention_layernorm.weight', 'model.layers.0.mlp.shared_expert.gate_up_proj.weight', 'model.layers.0.self_attn.qkv_proj.weight', 'model.layers.0.mlp.shared_expert_gate.weight', 'model.layers.0.input_layernorm.weight', 'model.layers.0.mlp.experts.w2_weight', 'model.layers.0.mlp.experts.w13_weight', 'model.fc.weight', 'model.layers.0.self_attn.o_proj.weight', 'model.norm.weight', 'model.layers.0.self_attn.k_norm.weight', 'model.layers.0.mlp.shared_expert.down_proj.weight', 'model.pre_fc_norm_embedding.weight', 'model.layers.0.mlp.gate.weight'}
```
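The uninitialized weights above are all part of the MTP draft module that the `qwen3_next_mtp` speculative method tries to load (`model.fc.weight`, `model.pre_fc_norm_*`, plus one drafter decoder layer), which suggests the fine-tuned checkpoint was saved without the MTP tensors. One way to check is to scan the checkpoint's safetensors index for them; below is a minimal sketch, assuming the MTP tensors carry an `mtp.` prefix as in the released Qwen3-Next checkpoints (the helper names and the toy weight map are illustrative, not vLLM API):

```python
import json
import os


def find_mtp_tensors(weight_map: dict) -> list:
    """Return checkpoint tensor names that look like MTP drafter weights."""
    return sorted(k for k in weight_map if k.startswith("mtp."))


def load_weight_map(ckpt_dir: str) -> dict:
    """Read the weight map from a sharded HF checkpoint's safetensors index."""
    with open(os.path.join(ckpt_dir, "model.safetensors.index.json")) as f:
        return json.load(f)["weight_map"]


# Demonstration on a toy weight map; for a real checkpoint, pass
# load_weight_map("/path/to/checkpoint") instead. An empty result here
# would explain the "weights were not initialized" error above.
toy_map = {
    "model.layers.0.self_attn.qkv_proj.weight": "model-00001.safetensors",
    "mtp.fc.weight": "model-00002.safetensors",
    "mtp.norm.weight": "model-00002.safetensors",
}
print(find_mtp_tensors(toy_map))
```

If the scan finds no `mtp.*` tensors, the checkpoint cannot drive MTP speculative decoding; serving without `--speculative-config`, or copying the MTP tensors from the original model, would be ways to confirm the diagnosis.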
config.json:
```json
{
"architectures": [
"Qwen3NextForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"decoder_sparse_step": 1,
"eos_token_id": 151645,
"full_attention_interval": 4,
"head_dim": 256,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5120,
"linear_conv_kernel_dim": 4,
"linear_key_head_dim": 128,
"linear_num_key_heads": 16,
"linear_num_value_heads": 32,
"linear_value_head_dim": 128,
"max_position_embeddings": 262144,
"mlp_only_layers": [],
"model_type": "qwen3_next",
"moe_intermediate_size": 512,
"norm_topk_prob": true,
"num_attention_heads": 16,
"num_experts": 512,
"num_experts_per_tok": 10,
"num_hidden_layers": 48,
"num_key_value_heads": 2,
"output_router_logits": false,
"partial_rotary_factor": 0.25,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 10000000,
"router_aux_loss_coef": 0.001,
"shared_expert_intermediate_size": 512,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.57.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
```
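For reference, this config interleaves linear and full attention: with `full_attention_interval` of 4 and 48 hidden layers, every fourth layer is a full-attention layer. A small sketch of the resulting schedule (the every-fourth-layer rule is an assumption inferred from the `full_attention_interval` field, not vLLM's exact implementation):

```python
num_hidden_layers = 48          # from config.json
full_attention_interval = 4     # from config.json

# Assumed rule: layer i (0-indexed) uses full attention when
# (i + 1) % full_attention_interval == 0, i.e. layers 3, 7, 11, ...;
# all other layers use linear attention.
layer_types = [
    "full_attention" if (i + 1) % full_attention_interval == 0
    else "linear_attention"
    for i in range(num_hidden_layers)
]
print(layer_types.count("full_attention"), layer_types.count("linear_attention"))
```

Under that rule, 12 of the 48 layers are full attention and 36 are linear attention, so most of the stack does not use the standard KV cache.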