
I found that when -ctk is also set to q4_0, the performance boost comes back.

| model | size | params | backend | ngl | n_ubatch | type_k | type_v | fa | dev | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: | --- | ---: | ---: |
| qwen3vlmoe 30B.A3B Q4_K - Medium | 16.49 GiB | 30.53 B | CUDA | 99 | 4096 | q8_0 | q8_0 | 1 | CUDA1 | pp512 | 3240.28 ± 142.19 |
| qwen3vlmoe 30B.A3B Q4_K - Medium | 16.49 GiB | 30.53 B | CUDA | 99 | 4096 | q8_0 | q8_0 | 1 | CUDA1 | tg128 | 150.97 ± 1.08 |
| qwen3vlmoe 30B.A3B Q4_K - Medium | 16.49 GiB | 30.53 B | CUDA | 99 | 4096 | q8_0 | q4_0 | 1 | CUDA1 | pp512 | 164.68 ± 16.57 |
| qwen3vlmoe 30B.A3B Q4_K - Medium | 16.49 GiB | 30.53 B | CUDA | 99 | 4096 | q8_0 | q4_0 | 1 | CUDA1 | tg128 | 30.97 ± 0.77 |
| qwen3vlmoe 30B.A3B Q4_K - Medium | 16.49 GiB | 30.53 B | CUDA | 99 | 4096 | q4_0 | q8_0 | 1 | CUDA1 | pp512 | 172.30 ± 5.63 |
qwen3vlmoe 30B.A3B Q4_K - Medium 1…
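
To put a number on the gap between the matched and mixed cache types, here is a minimal sketch that divides the q8_0/q8_0 throughput by each mixed configuration's throughput. The t/s values are copied from the complete rows above; the `results` dictionary and the `slowdown` helper are illustrative names, not part of llama-bench, and the truncated rows are left out because their values are not visible.

```python
# Mean t/s copied from the complete rows above, keyed by (type_k, type_v, test).
# The truncated rows are omitted on purpose: their values are not shown.
results = {
    ("q8_0", "q8_0", "pp512"): 3240.28,
    ("q8_0", "q8_0", "tg128"): 150.97,
    ("q8_0", "q4_0", "pp512"): 164.68,
    ("q8_0", "q4_0", "tg128"): 30.97,
    ("q4_0", "q8_0", "pp512"): 172.30,
}


def slowdown(type_k, type_v, test, baseline=("q8_0", "q8_0")):
    """Return how many times slower a configuration is than the baseline
    (hypothetical helper: baseline t/s divided by configuration t/s)."""
    return results[(*baseline, test)] / results[(type_k, type_v, test)]


if __name__ == "__main__":
    for (type_k, type_v, test) in results:
        ratio = slowdown(type_k, type_v, test)
        print(f"ctk={type_k} ctv={type_v} {test}: {ratio:.2f}x slower than q8_0/q8_0")
```

This only compares the reported means and ignores the ± spread, so treat the ratios as rough.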

Answer selected by TkskKurumi