Performance Discrepancy with -ctv q4_0 compared to q8_0 #17493
Found that -ctv q4_0 is much slower than -ctv q8_0 (more than 10x slower on pp512) in the following setup.
Launch command:
.\llama-bench.exe --model "E:\LLM\Qwen3-VL-30B-A3B-Instruct\Qwen3-VL-30B-A3B-Instruct-UD-Q4_K_XL.gguf" `
-ctk q8_0 `
-ctv q8_0,q4_0 `
-fa 1 `
-ngl 99 `
--threads 8 `
--device CUDA1 `
-ub 4096 `
-b 2048

Is this performance discrepancy expected or not?
I would be happy to provide any additional information needed. Thank you for your time and for the continued work on this project.
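To put a number on the gap between configurations from a single benchmark run, here is a minimal Python sketch. It assumes llama-bench was run with -o json (stdout redirected to a file) and that each JSON record carries type_k, type_v, n_prompt, n_gen and avg_ts fields. Those field names are assumed from llama-bench's JSON output and should be checked against your build; the function name pp_slowdown is hypothetical.

```python
import json

# Hypothetical helper: compute how many times slower each K/V cache type pair is
# than a baseline pair on the prompt-processing test (pp512 by default).
# Assumes llama-bench JSON records with fields type_k, type_v, n_prompt, n_gen, avg_ts.
def pp_slowdown(json_text, baseline=("q8_0", "q8_0"), n_prompt=512):
    records = json.loads(json_text)
    # Keep only pure prompt-processing runs: pp512 has n_prompt == 512 and n_gen == 0.
    pp = {
        (r["type_k"], r["type_v"]): r["avg_ts"]
        for r in records
        if r["n_prompt"] == n_prompt and r["n_gen"] == 0
    }
    # Raises KeyError if the baseline pair was not part of the run.
    base = pp[baseline]
    # avg_ts is tokens/second, so baseline / config > 1 means the config is slower.
    return {kv: base / ts for kv, ts in pp.items()}
```

For example, append -o json to the command above, redirect its output to a file (hypothetically bench.json), and call pp_slowdown(open("bench.json").read()). A value above 10 for ("q8_0", "q4_0") would correspond to the slowdown reported here.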
Replies: 1 comment
Found that when -ctk is also set to q4_0 (so that K and V use the same cache type), performance recovers.
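To check this observation in a single run, one option is to sweep both cache types at once. llama-bench runs every combination of comma-separated parameter values, so passing q8_0,q4_0 to both -ctk and -ctv yields four runs: two with matching K/V types and two mismatched. The sketch below (Python, with a hypothetical helper name build_sweep_cmd) only builds that argument list, reusing the flags from the original command; actually running it is left to the reader.

```python
# Hypothetical helper: build a llama-bench argument list that sweeps all K/V
# cache type combinations, reusing the flags from the original command.
# llama-bench expands comma-separated values into their Cartesian product,
# so two types for -ctk and two for -ctv give four benchmark configurations.
def build_sweep_cmd(bench_exe, model_path, cache_types=("q8_0", "q4_0")):
    types = ",".join(cache_types)
    return [
        bench_exe,
        "--model", model_path,
        "-ctk", types,
        "-ctv", types,
        "-fa", "1",
        "-ngl", "99",
        "--threads", "8",
        "--device", "CUDA1",
        "-ub", "4096",
        "-b", "2048",
        "-o", "json",  # machine-readable output for later comparison
    ]


if __name__ == "__main__":
    cmd = build_sweep_cmd(
        r".\llama-bench.exe",
        r"E:\LLM\Qwen3-VL-30B-A3B-Instruct\Qwen3-VL-30B-A3B-Instruct-UD-Q4_K_XL.gguf",
    )
    # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

Comparing the (q4_0, q4_0) and (q8_0, q8_0) rows against the two mixed rows isolates whether the slowdown comes from q4_0 itself or from K and V having different types.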