
docker model runner llmfit launch #837

Open
sathiraumesh wants to merge 7 commits into docker:main from sathiraumesh:launch-llmfit-747

Conversation

@sathiraumesh
Contributor

sathiraumesh commented Apr 6, 2026

Changes

  • Add llmfit CLI tool to docker model launch

fixes #747

Contributor

sourcery-ai (bot) left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • In printAppConfig, the host port is only printed when ca.defaultHostPort > 0, which means an explicitly overridden --port for apps with no defaultHostPort (e.g. future apps like llmfit) would be hidden; consider checking hostPort > 0 instead so overrides are reflected in the config output.
  • In launchContainerApp, a non-zero portOverride is silently ignored when ca.containerPort == 0 (e.g. for llmfit); consider either validating and erroring on --port usage in that case or logging a message so users know their port setting is not applied.

## Individual Comments

### Comment 1
<location path="cmd/cli/commands/launch.go" line_range="223-224" />
<code_context>
+		if ca.containerPort > 0 {
+			cmd.Printf("  Container port: %d\n", ca.containerPort)
+		}
+		if ca.defaultHostPort > 0 {
+			cmd.Printf("  Host port:      %d\n", hostPort)
+		}
 		if ca.envFn != nil {
</code_context>
<issue_to_address>
**issue (bug_risk):** Host port visibility should depend on the effective hostPort value, not defaultHostPort.

Using `ca.defaultHostPort > 0` here can suppress valid user overrides. For example, if `defaultHostPort == 0` but the user passes `--port`, `hostPort` will be non-zero yet the host port line won’t print. This should check `hostPort > 0` instead so the output matches the actual binding.
</issue_to_address>
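
The suggested fix can be sketched as follows. The struct here is a hypothetical reduction built only from the field names mentioned in this review (`containerPort`, `defaultHostPort`), not the actual `launch.go` definition:

```go
package main

import "fmt"

// containerApp is a hypothetical reduction of the fields named in this
// review; not the real definition from cmd/cli/commands/launch.go.
type containerApp struct {
	containerPort   int
	defaultHostPort int
}

// printPorts gates the host-port line on the effective hostPort value
// (which already reflects a --port override) instead of defaultHostPort,
// so an override still prints even when the app has no default host port.
func printPorts(ca containerApp, hostPort int) string {
	var out string
	if ca.containerPort > 0 {
		out += fmt.Sprintf("  Container port: %d\n", ca.containerPort)
	}
	if hostPort > 0 { // was: ca.defaultHostPort > 0
		out += fmt.Sprintf("  Host port:      %d\n", hostPort)
	}
	return out
}

func main() {
	// An llmfit-like app with no defaults, launched with --port 3000:
	fmt.Print(printPorts(containerApp{}, 3000))
}
```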


Comment thread cmd/cli/commands/launch.go Outdated
Contributor

gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for the llmfit container application and updates the launch command to support container apps without port mappings. The changes include conditional logic for port display and Docker flags, along with corresponding tests. Review feedback identifies a logic error in how host ports are displayed when no default is provided and notes a formatting inconsistency in the containerApps configuration map.

Comment thread cmd/cli/commands/launch.go
Comment thread cmd/cli/commands/launch.go
@ericcurtin
Contributor

More work needed, I think; this doesn't behave as expected. I tested on macOS and got this:

$ ./cmd/cli/model-cli launch llmfit
{
"models": [
{
"best_quant": "Q5_K_M",
"category": "Chat",
"context_length": 4096,
"estimated_tps": 76.7,
"fit_level": "Marginal",
"gguf_sources": [],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 4.3,
"moe_offloaded_gb": null,
"name": "microsoft/Phi-mini-MoE-instruct",
"notes": [
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Best quantization for hardware: Q5_K_M (model default: Q4_K_M)",
"Baseline estimated speed: 76.7 tok/s"
],
"parameter_count": "7.6B",
"params_b": 7.65,
"provider": "Microsoft",
"release_date": null,
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 89.2,
"score_components": {
"context": 100.0,
"fit": 100.0,
"quality": 73.0,
"speed": 100.0
},
"total_memory_gb": 4.3,
"use_case": "Instruction following, chat",
"utilization_pct": 68.0
},
{
"best_quant": "Q4_K_M",
"category": "General",
"context_length": 128000,
"estimated_tps": 75.9,
"fit_level": "Marginal",
"gguf_sources": [
{
"provider": "unsloth",
"repo": "unsloth/LFM2-8B-A1B-GGUF"
}
],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 4.7,
"moe_offloaded_gb": null,
"name": "LiquidAI/LFM2-8B-A1B",
"notes": [
"Context capped at 8192 tokens for estimation (model supports up to 128000; use --max-context to override)",
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Baseline estimated speed: 75.9 tok/s"
],
"parameter_count": "8.3B",
"params_b": 8.34,
"provider": "Liquid AI",
"release_date": "2025-10-07",
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 86.5,
"score_components": {
"context": 100.0,
"fit": 100.0,
"quality": 70.0,
"speed": 100.0
},
"total_memory_gb": 4.7,
"use_case": "General purpose text generation",
"utilization_pct": 74.4
},
{
"best_quant": "Q6_K",
"category": "Chat",
"context_length": 4096,
"estimated_tps": 80.5,
"fit_level": "Marginal",
"gguf_sources": [
{
"provider": "mradermacher",
"repo": "mradermacher/OLMoE-1B-7B-0125-Instruct-GGUF"
}
],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 3.9,
"moe_offloaded_gb": null,
"name": "allenai/OLMoE-1B-7B-0125-Instruct",
"notes": [
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Best quantization for hardware: Q6_K (model default: Q4_K_M)",
"Baseline estimated speed: 80.5 tok/s"
],
"parameter_count": "6.9B",
"params_b": 6.92,
"provider": "allenai",
"release_date": null,
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 83.6,
"score_components": {
"context": 100.0,
"fit": 100.0,
"quality": 59.0,
"speed": 100.0
},
"total_memory_gb": 3.9,
"use_case": "Instruction following, chat",
"utilization_pct": 61.7
},
{
"best_quant": "Q6_K",
"category": "General",
"context_length": 262144,
"estimated_tps": 185.0,
"fit_level": "Marginal",
"gguf_sources": [],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 3.6,
"moe_offloaded_gb": null,
"name": "apolo13x/Qwen3.5-35B-A3B-quantized.w4a16",
"notes": [
"Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)",
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Best quantization for hardware: Q6_K (model default: Q4_K_M)",
"Baseline estimated speed: 185.0 tok/s"
],
"parameter_count": "6.4B",
"params_b": 6.38,
"provider": "apolo13x",
"release_date": null,
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 82.5,
"score_components": {
"context": 100.0,
"fit": 100.0,
"quality": 61.0,
"speed": 100.0
},
"total_memory_gb": 3.6,
"use_case": "General purpose",
"utilization_pct": 57.0
},
{
"best_quant": "Q8_0",
"category": "Chat",
"context_length": 4096,
"estimated_tps": 125.0,
"fit_level": "Marginal",
"gguf_sources": [],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 2.1,
"moe_offloaded_gb": null,
"name": "microsoft/Phi-tiny-MoE-instruct",
"notes": [
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Best quantization for hardware: Q8_0 (model default: Q4_K_M)",
"Baseline estimated speed: 125.0 tok/s"
],
"parameter_count": "3.8B",
"params_b": 3.76,
"provider": "Microsoft",
"release_date": null,
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 82.0,
"score_components": {
"context": 100.0,
"fit": 86.6,
"quality": 60.0,
"speed": 100.0
},
"total_memory_gb": 2.1,
"use_case": "Instruction following, chat",
"utilization_pct": 33.2
}
],
"system": {
"available_ram_gb": 6.32,
"backend": "CPU (ARM)",
"cpu_cores": 14,
"cpu_name": "0",
"gpu_count": 0,
"gpu_name": null,
"gpu_vram_gb": null,
"gpus": [],
"has_gpu": false,
"total_ram_gb": 7.65,
"unified_memory": false
}
}

@sathiraumesh
Contributor Author

sathiraumesh commented Apr 6, 2026

Hi @ericcurtin, can you detail the expected behavior? If you want the TUI version, we need to make more changes to implement it. Is that the behavior you're expecting?

@ericcurtin
Contributor

Yes, the TUI version

@sathiraumesh
Contributor Author

Hi @ericcurtin, I have implemented the TUI version. Can you have a look?

@sathiraumesh
Contributor Author

Hi @ericcurtin, can you have a look at this PR?

@ilopezluna
Contributor

Thanks for adding llmfit support @sathiraumesh! I tested this locally and found a few issues:

Black screen when running docker model launch llmfit

The interactive: true flag adds -it to the docker run command, but the image's default CMD is ["recommend", "--json"], which is not interactive. The combination of -it with a non-TUI command causes a blank/black screen in the terminal. Running without -it works correctly and produces JSON output.

Suggestion: Remove interactive: true since llmfit's default behavior is non-interactive. If you want to support the TUI mode, it should be opt-in via args (e.g., docker model launch llmfit -- tui).

Redundant --entrypoint override

The image already defines ENTRYPOINT ["/usr/local/bin/llmfit"]. The extraDockerArgs: []string{"--entrypoint", "llmfit"} override is functionally redundant (just changes full path to short name via PATH). Consider removing it to keep things simpler and avoid potential issues if llmfit isn't in PATH in future image versions.

--port flag silently ignored

Since llmfit has no containerPort, running docker model launch llmfit --port 3000 silently ignores the port override. Consider either logging a warning or returning an error when --port is used with apps that don't expose ports.
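
A minimal sketch of the warning approach, using hypothetical names based on the fields mentioned in this thread (`containerPort`, `portOverride`), not the real `launchContainerApp` code:

```go
package main

import (
	"fmt"
	"os"
)

// containerApp is a hypothetical reduction of the fields discussed in this
// review; not the real definition from cmd/cli/commands/launch.go.
type containerApp struct {
	containerPort   int
	defaultHostPort int
}

// portArgs builds the docker run port flags, warning on stderr instead of
// silently dropping a --port override when the app exposes no container port.
func portArgs(ca containerApp, portOverride int) []string {
	if ca.containerPort == 0 {
		if portOverride > 0 {
			fmt.Fprintf(os.Stderr,
				"Warning: --port %d ignored: this app does not expose a port\n",
				portOverride)
		}
		return nil
	}
	hostPort := ca.defaultHostPort
	if portOverride > 0 {
		hostPort = portOverride
	}
	return []string{"-p", fmt.Sprintf("%d:%d", hostPort, ca.containerPort)}
}

func main() {
	// llmfit-like case: warning on stderr, no -p flag emitted.
	fmt.Println(portArgs(containerApp{}, 3000))
	// App with a container port: the override replaces the default host port.
	fmt.Println(portArgs(containerApp{containerPort: 8080, defaultHostPort: 8080}, 3000))
}
```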

Minor: typos in test app names

The updated existing tests pass "openapi" and "openpai" as the new appName parameter (lines 162, 180, 194, 209). These seem like typos and should probably be "openwebui" or a consistent test name.


Development

Successfully merging this pull request may close these issues.

docker model launch llmfit

3 participants