
docker model runner llmfit launch #837

Open
sathiraumesh wants to merge 7 commits into docker:main from sathiraumesh:launch-llmfit-747

Conversation

@sathiraumesh
Contributor

sathiraumesh commented Apr 6, 2026

Changes

  • Add llmfit CLI tool to docker model launch

fixes #747

Contributor

sourcery-ai (bot) left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • In printAppConfig, the host port is only printed when ca.defaultHostPort > 0, which means an explicitly overridden --port for apps with no defaultHostPort (e.g. future apps like llmfit) would be hidden; consider checking hostPort > 0 instead so overrides are reflected in the config output.
  • In launchContainerApp, a non-zero portOverride is silently ignored when ca.containerPort == 0 (e.g. for llmfit); consider either validating and erroring on --port usage in that case or logging a message so users know their port setting is not applied.

## Individual Comments

### Comment 1
<location path="cmd/cli/commands/launch.go" line_range="223-224" />
<code_context>
+		if ca.containerPort > 0 {
+			cmd.Printf("  Container port: %d\n", ca.containerPort)
+		}
+		if ca.defaultHostPort > 0 {
+			cmd.Printf("  Host port:      %d\n", hostPort)
+		}
 		if ca.envFn != nil {
</code_context>
<issue_to_address>
**issue (bug_risk):** Host port visibility should depend on the effective hostPort value, not defaultHostPort.

Using `ca.defaultHostPort > 0` here can suppress valid user overrides. For example, if `defaultHostPort == 0` but the user passes `--port`, `hostPort` will be non-zero yet the host port line won’t print. This should check `hostPort > 0` instead so the output matches the actual binding.
</issue_to_address>
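
The suggested fix can be sketched as follows. The struct here is a hypothetical reduction built only from the field names mentioned in this review (`containerPort`, `defaultHostPort`), not the actual `launch.go` definition:

```go
package main

import "fmt"

// containerApp is a hypothetical reduction of the fields named in this
// review; not the real definition from cmd/cli/commands/launch.go.
type containerApp struct {
	containerPort   int
	defaultHostPort int
}

// printPorts gates the host-port line on the effective hostPort value
// (which already reflects a --port override) instead of defaultHostPort,
// so an override still prints even when the app has no default host port.
func printPorts(ca containerApp, hostPort int) string {
	var out string
	if ca.containerPort > 0 {
		out += fmt.Sprintf("  Container port: %d\n", ca.containerPort)
	}
	if hostPort > 0 { // was: ca.defaultHostPort > 0
		out += fmt.Sprintf("  Host port:      %d\n", hostPort)
	}
	return out
}

func main() {
	// An llmfit-like app with no defaults, launched with --port 3000:
	fmt.Print(printPorts(containerApp{}, 3000))
}
```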


Comment thread cmd/cli/commands/launch.go Outdated
Contributor

gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for the llmfit container application and updates the launch command to support container apps without port mappings. The changes include conditional logic for port display and Docker flags, along with corresponding tests. Review feedback identifies a logic error in how host ports are displayed when no default is provided and notes a formatting inconsistency in the containerApps configuration map.

Comment thread cmd/cli/commands/launch.go
Comment thread cmd/cli/commands/launch.go
@ericcurtin
Contributor

More work needed, I think; this doesn't behave as expected. I tested on macOS and got this:

$ ./cmd/cli/model-cli launch llmfit
{
"models": [
{
"best_quant": "Q5_K_M",
"category": "Chat",
"context_length": 4096,
"estimated_tps": 76.7,
"fit_level": "Marginal",
"gguf_sources": [],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 4.3,
"moe_offloaded_gb": null,
"name": "microsoft/Phi-mini-MoE-instruct",
"notes": [
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Best quantization for hardware: Q5_K_M (model default: Q4_K_M)",
"Baseline estimated speed: 76.7 tok/s"
],
"parameter_count": "7.6B",
"params_b": 7.65,
"provider": "Microsoft",
"release_date": null,
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 89.2,
"score_components": {
"context": 100.0,
"fit": 100.0,
"quality": 73.0,
"speed": 100.0
},
"total_memory_gb": 4.3,
"use_case": "Instruction following, chat",
"utilization_pct": 68.0
},
{
"best_quant": "Q4_K_M",
"category": "General",
"context_length": 128000,
"estimated_tps": 75.9,
"fit_level": "Marginal",
"gguf_sources": [
{
"provider": "unsloth",
"repo": "unsloth/LFM2-8B-A1B-GGUF"
}
],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 4.7,
"moe_offloaded_gb": null,
"name": "LiquidAI/LFM2-8B-A1B",
"notes": [
"Context capped at 8192 tokens for estimation (model supports up to 128000; use --max-context to override)",
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Baseline estimated speed: 75.9 tok/s"
],
"parameter_count": "8.3B",
"params_b": 8.34,
"provider": "Liquid AI",
"release_date": "2025-10-07",
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 86.5,
"score_components": {
"context": 100.0,
"fit": 100.0,
"quality": 70.0,
"speed": 100.0
},
"total_memory_gb": 4.7,
"use_case": "General purpose text generation",
"utilization_pct": 74.4
},
{
"best_quant": "Q6_K",
"category": "Chat",
"context_length": 4096,
"estimated_tps": 80.5,
"fit_level": "Marginal",
"gguf_sources": [
{
"provider": "mradermacher",
"repo": "mradermacher/OLMoE-1B-7B-0125-Instruct-GGUF"
}
],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 3.9,
"moe_offloaded_gb": null,
"name": "allenai/OLMoE-1B-7B-0125-Instruct",
"notes": [
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Best quantization for hardware: Q6_K (model default: Q4_K_M)",
"Baseline estimated speed: 80.5 tok/s"
],
"parameter_count": "6.9B",
"params_b": 6.92,
"provider": "allenai",
"release_date": null,
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 83.6,
"score_components": {
"context": 100.0,
"fit": 100.0,
"quality": 59.0,
"speed": 100.0
},
"total_memory_gb": 3.9,
"use_case": "Instruction following, chat",
"utilization_pct": 61.7
},
{
"best_quant": "Q6_K",
"category": "General",
"context_length": 262144,
"estimated_tps": 185.0,
"fit_level": "Marginal",
"gguf_sources": [],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 3.6,
"moe_offloaded_gb": null,
"name": "apolo13x/Qwen3.5-35B-A3B-quantized.w4a16",
"notes": [
"Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)",
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Best quantization for hardware: Q6_K (model default: Q4_K_M)",
"Baseline estimated speed: 185.0 tok/s"
],
"parameter_count": "6.4B",
"params_b": 6.38,
"provider": "apolo13x",
"release_date": null,
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 82.5,
"score_components": {
"context": 100.0,
"fit": 100.0,
"quality": 61.0,
"speed": 100.0
},
"total_memory_gb": 3.6,
"use_case": "General purpose",
"utilization_pct": 57.0
},
{
"best_quant": "Q8_0",
"category": "Chat",
"context_length": 4096,
"estimated_tps": 125.0,
"fit_level": "Marginal",
"gguf_sources": [],
"installed": false,
"is_moe": true,
"license": null,
"memory_available_gb": 6.32,
"memory_required_gb": 2.1,
"moe_offloaded_gb": null,
"name": "microsoft/Phi-tiny-MoE-instruct",
"notes": [
"CPU-only: model loaded into system RAM",
"MoE architecture, but expert offloading requires a GPU",
"No GPU -- inference will be slow",
"Best quantization for hardware: Q8_0 (model default: Q4_K_M)",
"Baseline estimated speed: 125.0 tok/s"
],
"parameter_count": "3.8B",
"params_b": 3.76,
"provider": "Microsoft",
"release_date": null,
"run_mode": "CPU",
"runtime": "llama.cpp",
"runtime_label": "llama.cpp",
"score": 82.0,
"score_components": {
"context": 100.0,
"fit": 86.6,
"quality": 60.0,
"speed": 100.0
},
"total_memory_gb": 2.1,
"use_case": "Instruction following, chat",
"utilization_pct": 33.2
}
],
"system": {
"available_ram_gb": 6.32,
"backend": "CPU (ARM)",
"cpu_cores": 14,
"cpu_name": "0",
"gpu_count": 0,
"gpu_name": null,
"gpu_vram_gb": null,
"gpus": [],
"has_gpu": false,
"total_ram_gb": 7.65,
"unified_memory": false
}
}

@sathiraumesh
Contributor Author

sathiraumesh commented Apr 6, 2026

Hi @ericcurtin, can you detail the expected behavior? If you want the TUI version, we need to make more changes to implement it. Is that the behavior you're expecting?

@ericcurtin
Contributor

Yes, the TUI version

@sathiraumesh
Contributor Author

Hi @ericcurtin, I have implemented the TUI version. Can you have a look?

@sathiraumesh
Contributor Author

Hi @ericcurtin, can you have a look at this PR?

@ilopezluna
Contributor

Thanks for adding llmfit support @sathiraumesh! I tested this locally and found a few issues:

Black screen when running docker model launch llmfit

The interactive: true flag adds -it to the docker run command, but the image's default CMD is ["recommend", "--json"], which is not interactive. The combination of -it with a non-TUI command causes a blank/black screen in the terminal. Running without -it works correctly and produces JSON output.

Suggestion: Remove interactive: true since llmfit's default behavior is non-interactive. If you want to support the TUI mode, it should be opt-in via args (e.g., docker model launch llmfit -- tui).

Redundant --entrypoint override

The image already defines ENTRYPOINT ["/usr/local/bin/llmfit"]. The extraDockerArgs: []string{"--entrypoint", "llmfit"} override is functionally redundant (just changes full path to short name via PATH). Consider removing it to keep things simpler and avoid potential issues if llmfit isn't in PATH in future image versions.

--port flag silently ignored

Since llmfit has no containerPort, running docker model launch llmfit --port 3000 silently ignores the port override. Consider either logging a warning or returning an error when --port is used with apps that don't expose ports.
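
A minimal sketch of the warning approach, using hypothetical names based on the fields mentioned in this thread (`containerPort`, `portOverride`), not the real `launchContainerApp` code:

```go
package main

import (
	"fmt"
	"os"
)

// containerApp is a hypothetical reduction of the fields discussed in this
// review; not the real definition from cmd/cli/commands/launch.go.
type containerApp struct {
	containerPort   int
	defaultHostPort int
}

// portArgs builds the docker run port flags, warning on stderr instead of
// silently dropping a --port override when the app exposes no container port.
func portArgs(ca containerApp, portOverride int) []string {
	if ca.containerPort == 0 {
		if portOverride > 0 {
			fmt.Fprintf(os.Stderr,
				"Warning: --port %d ignored: this app does not expose a port\n",
				portOverride)
		}
		return nil
	}
	hostPort := ca.defaultHostPort
	if portOverride > 0 {
		hostPort = portOverride
	}
	return []string{"-p", fmt.Sprintf("%d:%d", hostPort, ca.containerPort)}
}

func main() {
	// llmfit-like case: warning on stderr, no -p flag emitted.
	fmt.Println(portArgs(containerApp{}, 3000))
	// App with a container port: the override replaces the default host port.
	fmt.Println(portArgs(containerApp{containerPort: 8080, defaultHostPort: 8080}, 3000))
}
```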

Minor: typos in test app names

The updated existing tests pass "openapi" and "openpai" as the new appName parameter (lines 162, 180, 194, 209). These seem like typos and should probably be "openwebui" or a consistent test name.


Development

Successfully merging this pull request may close these issues.

docker model launch llmfit

3 participants