llama-server cannot host DeepSeek-R1-Distill-Qwen-1.5B on CUDA #11673
I carefully followed this document to build the project. After the build, I ran llama-server with my command and tracked the GPU; its VRAM usage looked like this (screenshot omitted). I don't know whether I misunderstood something, but when I ran the server and then ran my command, the model did not appear to be hosted on CUDA. So my question is: is there any way I can host and run inference with my model on CUDA?
You need to add the `-ngl` parameter to the command line. Try `-ngl 99`.
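For illustration, a minimal command sketch; the model path, the `--port` value, and the binary location are assumptions, not taken from the original post:

```sh
# Hypothetical invocation: the model path and port are placeholders.
# -ngl 99 asks llama.cpp to offload up to 99 layers to the GPU; the value is
# clamped to the model's actual layer count, so 99 effectively means "all layers".
./llama-server -m ./DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf -ngl 99 --port 8080
```

Once the layers are offloaded, GPU memory usage (for example as reported by nvidia-smi) should rise accordingly when the server starts.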