Description
When running DMR on a Linux machine with runtime_flags set, the flags are always ignored after a reboot. The configuration also seems to revert to the defaults unexpectedly when docker model configure is run manually. Recreating the containers applies the flags again. Any idea why this is happening and how to handle it?
Docker compose:
models:
  llm:
    model: ai/qwen3-vl:8B
    context_size: 16384
    runtime_flags:
      - "-c"
      - "16384"
      - "-np"
      - "4"
      - "-b"
      - "2048"
      - "--no-mmap"
      - "--flash-attn"
      - "on"
      - "--cache-type-k"
      - "q8_0"
      - "--cache-type-v"
      - "q8_0"
  instant:
    model: ai/qwen3-vl:2B-UD-Q4_K_XL
    context_size: 8192
    runtime_flags:
      - "-c"
      - "8192"
      - "-np"
      - "4"
      - "-b"
      - "2048"
      - "--no-mmap"
      - "--flash-attn"
      - "on"
      - "--cache-type-k"
      - "q8_0"
      - "--cache-type-v"
      - "q8_0"
After a reboot, on the first request (see the last line: no runtime_flags):
time="2026-03-03T10:13:24Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="Listing available models" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="Successfully listed models, count: 2" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="Loading llama.cpp backend runner with model sha256:a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5 in completion mode"
time="2026-03-03T10:13:24Z" level=info msg="Getting model by reference: sha256:a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="Listing available models" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="Successfully listed models, count: 2" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="Listing available models" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="Successfully listed models, count: 2" component=model-manager
time="2026-03-03T10:13:24Z" level=info msg="llama.cpp args: [-ngl 999 --metrics --model /models/bundles/sha256/a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5/model/model.gguf --host inference-runner-0.sock --mmproj /models/bundles/sha256/a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5/model/model.mmproj]"
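To make the difference between the two runs easy to spot, here is a small sketch (a hypothetical helper, not part of DMR; model paths shortened) that parses the logged "llama.cpp args: [...]" line and reports which of the configured runtime_flags are missing:

```python
import re

# Expected runtime_flags for the llm model (taken from the compose file above).
EXPECTED_FLAGS = ["-c", "-np", "-b", "--no-mmap", "--flash-attn",
                  "--cache-type-k", "--cache-type-v"]

def missing_flags(log_line: str) -> list[str]:
    """Extract the [...] args from a 'llama.cpp args: [...]' log line
    and return the expected flags that are absent."""
    match = re.search(r"llama\.cpp args: \[(.*)\]", log_line)
    if not match:
        raise ValueError("no 'llama.cpp args' found in line")
    args = match.group(1).split()
    return [flag for flag in EXPECTED_FLAGS if flag not in args]

# Args line after reboot (paths shortened): runtime_flags are gone.
after_reboot = ("llama.cpp args: [-ngl 999 --metrics --model "
                "/models/model.gguf --host inference-runner-0.sock]")
# Args line after --force-recreate: runtime_flags present.
after_recreate = ("llama.cpp args: [-ngl 999 --metrics --model "
                  "/models/model.gguf -c 8192 -np 4 -b 2048 --no-mmap "
                  "--flash-attn on --cache-type-k q8_0 --cache-type-v q8_0]")

print(missing_flags(after_reboot))    # every expected flag is missing
print(missing_flags(after_recreate))  # []
```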
After a --force-recreate of the containers (see last line, runtime_flags applied):
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Configuring llama.cpp runner for sha256:50f70f7f0ca537b2ca4843bae5456bb4b6f9d9d58d4b357faf1a3ad9b574888d"
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Configuring llama.cpp runner for sha256:a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5"
time="2026-03-03T10:19:40Z" level=info msg="Loading llama.cpp backend runner with model sha256:50f70f7f0ca537b2ca4843bae5456bb4b6f9d9d58d4b357faf1a3ad9b574888d in completion mode"
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: sha256:50f70f7f0ca537b2ca4843bae5456bb4b6f9d9d58d4b357faf1a3ad9b574888d" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Listing available models" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Successfully listed models, count: 2" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Listing available models" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Successfully listed models, count: 2" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:19:40Z" level=info msg="Loading llama.cpp backend runner with model sha256:a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5 in completion mode"
time="2026-03-03T10:19:40Z" level=info msg="llama.cpp args: [-ngl 999 --metrics --model /models/bundles/sha256/50f70f7f0ca537b2ca4843bae5456bb4b6f9d9d58d4b357faf1a3ad9b574888d/model/model.gguf --host inference-runner-0.sock --ctx-size 8192 -c 8192 -np 4 -b 2048 --no-mmap --flash-attn on --cache-type-k q8_0 --cache-type-v q8_0 --mmproj /models/bundles/sha256/50f70f7f0ca537b2ca4843bae5456bb4b6f9d9d58d4b357faf1a3ad9b574888d/model/model.mmproj]"
I've also tried using cronjobs to keep the models loaded with the config, but after a reboot the flags are ignored as well and the context size reverts to the default of 4096 tokens, causing issues with larger requests:
*/5 * * * * docker model configure --context-size 16384 ai/qwen3-vl:8B -- -c 16384 -np 4 -b 2048 --no-mmap --flash-attn on --cache-type-k q8_0 --cache-type-v q8_0 && docker model run --detach ai/qwen3-vl:8B 2>&1
*/5 * * * * docker model configure --context-size 16384 ai/qwen3-vl:2B-UD-Q4_K_XL -- -c 8192 -np 4 -b 2048 --no-mmap --flash-attn on --cache-type-k q8_0 --cache-type-v q8_0 && docker model run --detach ai/qwen3-vl:2B-UD-Q4_K_XL 2>&1
It also seems to reset to 4096 tokens when the cronjob runs again later on; at least, that's how I interpret n_ctx_slot = 4096:
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Loading llama.cpp backend runner with model sha256:a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5 in completion mode"
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="srv params_from_: Chat format: Hermes 2 Pro" component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.583 (> 0.100 thold), f_keep = 0.003" component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist " component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="slot launch_slot_: id 3 | task 4254 | processing task, is_child = 0" component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="slot update_slots: id 3 | task 4254 | new prompt, n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12" component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="slot update_slots: id 3 | task 4254 | n_tokens = 7, memory_seq_rm [7, end)" component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="slot update_slots: id 3 | task 4254 | prompt processing progress, n_tokens = 12, batch.n_tokens = 5, progress = 1.000000" component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="slot update_slots: id 3 | task 4254 | prompt done, n_tokens = 12, batch.n_tokens = 5" component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="slot init_sampler: id 3 | task 4254 | init sampler, took 0.00 ms, tokens: text = 12, total = 12" component=llama.cpp
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Configuration for llama.cpp runner for modelID sha256:50f70f7f0ca537b2ca4843bae5456bb4b6f9d9d58d4b357faf1a3ad9b574888d unchanged"
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Loading llama.cpp backend runner with model sha256:50f70f7f0ca537b2ca4843bae5456bb4b6f9d9d58d4b357faf1a3ad9b574888d in completion mode"
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Configuration for llama.cpp runner for modelID sha256:a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5 unchanged"
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Loading llama.cpp backend runner with model sha256:a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5 in completion mode"
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:2B-UD-Q4_K_XL" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Loading llama.cpp backend runner with model sha256:50f70f7f0ca537b2ca4843bae5456bb4b6f9d9d58d4b357faf1a3ad9b574888d in completion mode"
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Getting model by reference: ai/qwen3-vl:8B" component=model-manager
time="2026-03-03T10:40:01Z" level=info msg="Loading llama.cpp backend runner with model sha256:a18971a77b8fda79c555f603c4f94ca0183cb40499f336dc2825659504e29fc5 in completion mode"
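To double-check that reading, a quick sketch (hypothetical parsing, only to confirm my interpretation of the log) that pulls n_ctx_slot out of the slot log line and compares it against the configured context size:

```python
import re

def ctx_slot(log_line: str) -> int:
    """Return the n_ctx_slot value reported in a llama.cpp slot log line."""
    match = re.search(r"n_ctx_slot = (\d+)", log_line)
    if not match:
        raise ValueError("no n_ctx_slot in line")
    return int(match.group(1))

line = ("slot update_slots: id 3 | task 4254 | new prompt, "
        "n_ctx_slot = 4096, n_keep = 0, task.n_tokens = 12")

configured = 16384  # context_size for ai/qwen3-vl:8B in the compose file
print(ctx_slot(line))               # 4096
print(ctx_slot(line) < configured)  # True: the slot is on the default, not the config
```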
Versions:
$ docker version
Client: Docker Engine - Community
 Version:           29.2.1
 API version:       1.53
 Go version:        go1.25.6
 Git commit:        a5c7197
 Built:             Mon Feb 2 17:17:26 2026
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          29.2.1
  API version:      1.53 (minimum version 1.44)
  Go version:       go1.25.6
  Git commit:       6bc6209
  Built:            Mon Feb 2 17:17:26 2026
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v2.2.1
  GitCommit:        dea7da592f5d1d2b7755e3a161be07f43fad8f75
 runc:
  Version:          1.3.4
  GitCommit:        v1.3.4-0-gd6d73eb8
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ docker model version
Client:
 Version: v1.1.5
 OS/Arch: linux/amd64
Server:
 Version: v1.1.0
 Engine:  Docker Engine