
feat: add ErrorTransformer for llama.cpp #729

Merged
doringeman merged 1 commit into docker:main from doringeman:llamacpp-error-transformer
Mar 4, 2026
Conversation

@doringeman (Contributor)

Add ErrorTransformer for llama.cpp to surface friendly error messages.

E.g.,

$ MODEL_RUNNER_HOST=http://localhost:8080/ docker model run gpt-oss hi
Failed to generate a response: error response: status=500 body=unable to load runner: error waiting for runner to be ready: llama.cpp terminated unexpectedly: llama.cpp failed: not enough GPU memory to load the model (CUDA)
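
The PR's actual errors.go is not reproduced here, but a regex-driven transformer producing a friendly message like the one above might look roughly like this Go sketch (all names and patterns are illustrative assumptions, not the merged code):

```go
package main

import (
	"fmt"
	"regexp"
)

// errorPattern pairs a regex over llama.cpp's raw output with a
// friendlier replacement message. Hypothetical names for illustration.
type errorPattern struct {
	re      *regexp.Regexp
	message string
}

var patterns = []errorPattern{
	{regexp.MustCompile(`cudaMalloc failed: out of memory`),
		"not enough GPU memory to load the model (CUDA)"},
	{regexp.MustCompile(`failed to allocate .* buffer`),
		"not enough memory to load the model"},
}

// transformError returns a friendly message for a known failure mode,
// or the raw output unchanged when no pattern matches.
func transformError(raw string) string {
	for _, p := range patterns {
		if p.re.MatchString(raw) {
			return "llama.cpp failed: " + p.message
		}
	}
	return raw
}

func main() {
	raw := "ggml_backend_cuda_buffer_type_alloc_buffer: allocating 10723.15 MiB on device 0: cudaMalloc failed: out of memory"
	fmt.Println(transformError(raw))
}
```

Each pattern is checked in order against the backend's output, so the most specific diagnostics (e.g. the CUDA case) should come first in the list.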

Logs:

time=2026-03-03T21:33:19.366Z level=INFO msg="load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)"
time=2026-03-03T21:36:19.179Z level=INFO msg="ggml_backend_cuda_buffer_type_alloc_buffer: allocating 10723.15 MiB on device 0: cudaMalloc failed: out of memory"
time=2026-03-03T21:36:19.181Z level=INFO msg="alloc_tensor_range: failed to allocate CUDA0 buffer of size 11244037120"
time=2026-03-03T21:36:20.133Z level=INFO msg="llama_model_load: error loading model: unable to allocate CUDA0 buffer"
time=2026-03-03T21:36:20.133Z level=INFO msg="llama_model_load_from_file_impl: failed to load model"
time=2026-03-03T21:36:21.070Z level=INFO msg="common_init_from_params: failed to load model '/models/bundles/sha256/9398339cb0d3b150931212377c16d2a105ddce053ec187e4397ba6e10f3ea112/model/model.gguf'"
time=2026-03-03T21:36:21.070Z level=INFO msg="srv    load_model: failed to load model, '/models/bundles/sha256/9398339cb0d3b150931212377c16d2a105ddce053ec187e4397ba6e10f3ea112/model/model.gguf'"
time=2026-03-03T21:36:21.070Z level=INFO msg="srv    operator(): operator(): cleaning up before exit..."
time=2026-03-03T21:36:21.077Z level=INFO msg="main: exiting due to model loading error"
time=2026-03-03T21:36:21.933Z level=WARN msg="Backend running model exited with error" backend=llama.cpp model=gpt-oss error="llama.cpp terminated unexpectedly: llama.cpp failed: not enough GPU memory to load the model (CUDA)"
time=2026-03-03T21:36:22.342Z level=INFO msg="getting model by reference" component=model-manager reference=sha256:9398339cb0d3b150931212377c16d2a105ddce053ec187e4397ba6e10f3ea112
time=2026-03-03T21:36:22.343Z level=INFO msg="Listing available models" component=model-manager
time=2026-03-03T21:36:22.362Z level=INFO msg="successfully listed models" component=model-manager count=1
time=2026-03-03T21:36:22.371Z level=INFO msg="Removed records for model" component=openai-recorder model=sha256:9398339cb0d3b150931212377c16d2a105ddce053ec187e4397ba6e10f3ea112
time=2026-03-03T21:36:22.371Z level=WARN msg="Backend runner initialization failed" backend=llama.cpp model=sha256:9398339cb0d3b150931212377c16d2a105ddce053ec187e4397ba6e10f3ea112 mode=completion error="llama.cpp terminated unexpectedly: llama.cpp failed: not enough GPU memory to load the model (CUDA)"

Improves UX for #709.

Signed-off-by: Dorin Geman <dorin.geman@docker.com>

@gemini-code-assist (bot) left a comment:


Code Review

This pull request introduces an ErrorTransformer for the llama.cpp backend to improve error reporting. A new errors.go file defines regular expressions to catch common llama.cpp errors and replace them with more user-friendly messages. This is accompanied by unit tests in errors_test.go. The new error transformer is then integrated into the main llama.cpp backend logic. The changes are well-structured and effectively address the goal of providing clearer error feedback to users.


@sourcery-ai (bot) left a comment:


Hey - I've left some high level feedback:

  • Consider making the regex patterns slightly more robust (e.g., using case-insensitive flags or matching just the key terms like failed to allocate buffer / out of memory) so they continue to work if upstream log wording changes slightly.
  • Returning the full stderr output when no pattern matches may surface very noisy logs to the user; you might want to truncate or sanitize the fallback message, or prepend a short generic summary and include the raw output only for debugging.
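
The two suggestions above could be sketched together as follows; the regex flag, phrase choices, and truncation limit are assumptions for illustration, not code from this PR:

```go
package main

import (
	"fmt"
	"regexp"
)

// Case-insensitive pattern keyed on stable phrases rather than exact
// upstream log wording, per the review suggestion.
var oomRe = regexp.MustCompile(`(?i)out of memory|failed to allocate .* buffer`)

// maxFallback bounds how much raw output is surfaced when nothing matches.
const maxFallback = 512

func friendlyError(raw string) string {
	if oomRe.MatchString(raw) {
		return "not enough memory to load the model"
	}
	// Fallback: short generic summary plus a bounded tail of the raw
	// output, instead of the full (possibly very noisy) stderr dump.
	if len(raw) > maxFallback {
		return "llama.cpp failed (output truncated): " + raw[len(raw)-maxFallback:]
	}
	return "llama.cpp failed: " + raw
}

func main() {
	fmt.Println(friendlyError("CUDAMALLOC FAILED: OUT OF MEMORY"))
}
```

The `(?i)` flag and the broader key phrases keep the match working if upstream capitalization or surrounding wording shifts, while the truncated fallback keeps the user-facing error readable without discarding the most recent (and usually most relevant) lines of output.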


@doringeman doringeman merged commit 9039311 into docker:main Mar 4, 2026
6 checks passed
