Name and Version
./llama-server --version
version: 9867 (e727109)
built with GNU 14.2.0 for Linux x86_64
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./llama-server --model model.gguf --port 8081 -ngl 32 -c 8192 --flash-attn on -ctk turbo4 -ctv turbo3 -t 6
Problem description & steps to reproduce
CPU: Intel i7
GPU: NVIDIA Quadro T1000 Max-Q (SM75 / Turing 4GB)
CUDA: 12.8
The server segfaults immediately after initializing slots whenever any layers are offloaded to CUDA (-ngl > 0) on SM75 (Turing) hardware. A CPU-only run (-ngl 0) loads and serves correctly.
Observations:
Reproducible with both MTP and non-MTP models
Occurs with and without --spec-type mtp
Occurs with and without --no-warmup
The warning "fused Gated Delta Net (chunked) not supported, set to disabled" appears consistently before the crash
Model: Qwopus3.5-4B-v3-MTP Q5_K_M GGUF
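For anyone trying to reproduce this, the combinations described above can be enumerated with a small script. This is only a sketch: it prints the command lines rather than running them, it reuses the flags from the command line in this report, and the helper name `repro_matrix` and the choice of `-ngl` values (0 and 32) are assumptions made for illustration.

```python
import itertools
import shlex

# Flags common to every run, copied from the command line in this report.
BASE = [
    "./llama-server", "--model", "model.gguf", "--port", "8081",
    "-c", "8192", "--flash-attn", "on",
    "-ctk", "turbo4", "-ctv", "turbo3", "-t", "6",
]


def repro_matrix(ngl_values=(0, 32)):
    """Return one argument list per combination of the toggles in this report.

    Hypothetical helper: the toggles are -ngl, --spec-type mtp and --no-warmup.
    """
    commands = []
    toggles = itertools.product(ngl_values, (False, True), (False, True))
    for ngl, use_mtp, no_warmup in toggles:
        cmd = BASE + ["-ngl", str(ngl)]
        if use_mtp:
            cmd += ["--spec-type", "mtp"]
        if no_warmup:
            cmd.append("--no-warmup")
        commands.append(cmd)
    return commands


# Print each command so it can be copied into a shell and run by hand.
for cmd in repro_matrix():
    print(shlex.join(cmd))
```

Based on the behaviour reported here, the `-ngl 0` variants are expected to load and serve, while the `-ngl 32` variants are expected to crash after "initializing slots".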
First Bad Commit
No response
Relevant log output
Here are the last log lines before the segfault:
W sched_reserve: layer 0 is assigned to device CPU but the fused Gated Delta Net tensor is assigned to device CUDA0
W sched_reserve: fused Gated Delta Net (chunked) not supported, set to disabled
I srv load_model: initializing slots, n_slots = 4
[segfault]