docker model runner llmfit launch #837

Open

sathiraumesh wants to merge 4 commits into docker:main from sathiraumesh:launch-llmfit-747

Conversation

@sathiraumesh (Contributor) commented Apr 6, 2026

Changes

  • Add llmfit CLI tool to docker model launch

fixes #747

@sourcery-ai bot (Contributor) left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In printAppConfig, the host port is only printed when ca.defaultHostPort > 0, which means an explicitly overridden --port for apps with no defaultHostPort (e.g. future apps like llmfit) would be hidden; consider checking hostPort > 0 instead so overrides are reflected in the config output.
  • In launchContainerApp, a non-zero portOverride is silently ignored when ca.containerPort == 0 (e.g. for llmfit); consider either validating and erroring on --port usage in that case or logging a message so users know their port setting is not applied.
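The second point could be handled with a small guard before launch. A minimal sketch, assuming hypothetical names (`containerApp`, `validatePortOverride`) that may not match the actual code in `cmd/cli/commands/launch.go`:

```go
package main

import (
	"errors"
	"fmt"
)

// containerApp is a hypothetical stand-in for the struct discussed in the
// review; the real fields in cmd/cli/commands/launch.go may differ.
type containerApp struct {
	containerPort   int
	defaultHostPort int
}

// validatePortOverride rejects an explicit --port for apps that expose no
// container port (e.g. llmfit), instead of silently ignoring it.
func validatePortOverride(ca containerApp, portOverride int) error {
	if portOverride > 0 && ca.containerPort == 0 {
		return errors.New("--port is not supported for this app: it exposes no container port")
	}
	return nil
}

func main() {
	llmfit := containerApp{containerPort: 0}
	if err := validatePortOverride(llmfit, 8080); err != nil {
		fmt.Println("error:", err)
	}
}
```

Erroring out (rather than logging) has the advantage of failing fast: the user immediately learns their flag had no effect, instead of discovering it later.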
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `printAppConfig`, the host port is only printed when `ca.defaultHostPort > 0`, which means an explicitly overridden `--port` for apps with no defaultHostPort (e.g. future apps like `llmfit`) would be hidden; consider checking `hostPort > 0` instead so overrides are reflected in the config output.
- In `launchContainerApp`, a non-zero `portOverride` is silently ignored when `ca.containerPort == 0` (e.g. for `llmfit`); consider either validating and erroring on `--port` usage in that case or logging a message so users know their port setting is not applied.

## Individual Comments

### Comment 1
<location path="cmd/cli/commands/launch.go" line_range="223-224" />
<code_context>
+		if ca.containerPort > 0 {
+			cmd.Printf("  Container port: %d\n", ca.containerPort)
+		}
+		if ca.defaultHostPort > 0 {
+			cmd.Printf("  Host port:      %d\n", hostPort)
+		}
 		if ca.envFn != nil {
</code_context>
<issue_to_address>
**issue (bug_risk):** Host port visibility should depend on the effective hostPort value, not defaultHostPort.

Using `ca.defaultHostPort > 0` here can suppress valid user overrides. For example, if `defaultHostPort == 0` but the user passes `--port`, `hostPort` will be non-zero yet the host port line won’t print. This should check `hostPort > 0` instead so the output matches the actual binding.
</issue_to_address>
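A minimal sketch of the suggested fix, using a hypothetical helper (`formatHostPort`) rather than the actual `printAppConfig` signature:

```go
package main

import "fmt"

// formatHostPort illustrates the review point: compute the effective host
// port first, then gate the output on it, so an explicit --port override is
// shown even when defaultHostPort is zero.
func formatHostPort(defaultHostPort, portOverride int) string {
	hostPort := defaultHostPort
	if portOverride > 0 {
		hostPort = portOverride
	}
	if hostPort > 0 { // was: ca.defaultHostPort > 0, which hid overrides
		return fmt.Sprintf("  Host port:      %d\n", hostPort)
	}
	return ""
}

func main() {
	// defaultHostPort == 0 but --port given: the line should still print.
	fmt.Print(formatHostPort(0, 8080))
}
```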


@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request adds support for the llmfit container application and updates the launch command to support container apps without port mappings. The changes include conditional logic for port display and Docker flags, along with corresponding tests. Review feedback identifies a logic error in how host ports are displayed when no default is provided and notes a formatting inconsistency in the containerApps configuration map.

@ericcurtin (Contributor) commented

More work is needed, I think; this doesn't behave as expected. I tested on macOS and got this:

$ ./cmd/cli/model-cli launch llmfit
{
  "models": [
    {
      "best_quant": "Q5_K_M",
      "category": "Chat",
      "context_length": 4096,
      "estimated_tps": 76.7,
      "fit_level": "Marginal",
      "gguf_sources": [],
      "installed": false,
      "is_moe": true,
      "license": null,
      "memory_available_gb": 6.32,
      "memory_required_gb": 4.3,
      "moe_offloaded_gb": null,
      "name": "microsoft/Phi-mini-MoE-instruct",
      "notes": [
        "CPU-only: model loaded into system RAM",
        "MoE architecture, but expert offloading requires a GPU",
        "No GPU -- inference will be slow",
        "Best quantization for hardware: Q5_K_M (model default: Q4_K_M)",
        "Baseline estimated speed: 76.7 tok/s"
      ],
      "parameter_count": "7.6B",
      "params_b": 7.65,
      "provider": "Microsoft",
      "release_date": null,
      "run_mode": "CPU",
      "runtime": "llama.cpp",
      "runtime_label": "llama.cpp",
      "score": 89.2,
      "score_components": {
        "context": 100.0,
        "fit": 100.0,
        "quality": 73.0,
        "speed": 100.0
      },
      "total_memory_gb": 4.3,
      "use_case": "Instruction following, chat",
      "utilization_pct": 68.0
    },
    {
      "best_quant": "Q4_K_M",
      "category": "General",
      "context_length": 128000,
      "estimated_tps": 75.9,
      "fit_level": "Marginal",
      "gguf_sources": [
        {
          "provider": "unsloth",
          "repo": "unsloth/LFM2-8B-A1B-GGUF"
        }
      ],
      "installed": false,
      "is_moe": true,
      "license": null,
      "memory_available_gb": 6.32,
      "memory_required_gb": 4.7,
      "moe_offloaded_gb": null,
      "name": "LiquidAI/LFM2-8B-A1B",
      "notes": [
        "Context capped at 8192 tokens for estimation (model supports up to 128000; use --max-context to override)",
        "CPU-only: model loaded into system RAM",
        "MoE architecture, but expert offloading requires a GPU",
        "No GPU -- inference will be slow",
        "Baseline estimated speed: 75.9 tok/s"
      ],
      "parameter_count": "8.3B",
      "params_b": 8.34,
      "provider": "Liquid AI",
      "release_date": "2025-10-07",
      "run_mode": "CPU",
      "runtime": "llama.cpp",
      "runtime_label": "llama.cpp",
      "score": 86.5,
      "score_components": {
        "context": 100.0,
        "fit": 100.0,
        "quality": 70.0,
        "speed": 100.0
      },
      "total_memory_gb": 4.7,
      "use_case": "General purpose text generation",
      "utilization_pct": 74.4
    },
    {
      "best_quant": "Q6_K",
      "category": "Chat",
      "context_length": 4096,
      "estimated_tps": 80.5,
      "fit_level": "Marginal",
      "gguf_sources": [
        {
          "provider": "mradermacher",
          "repo": "mradermacher/OLMoE-1B-7B-0125-Instruct-GGUF"
        }
      ],
      "installed": false,
      "is_moe": true,
      "license": null,
      "memory_available_gb": 6.32,
      "memory_required_gb": 3.9,
      "moe_offloaded_gb": null,
      "name": "allenai/OLMoE-1B-7B-0125-Instruct",
      "notes": [
        "CPU-only: model loaded into system RAM",
        "MoE architecture, but expert offloading requires a GPU",
        "No GPU -- inference will be slow",
        "Best quantization for hardware: Q6_K (model default: Q4_K_M)",
        "Baseline estimated speed: 80.5 tok/s"
      ],
      "parameter_count": "6.9B",
      "params_b": 6.92,
      "provider": "allenai",
      "release_date": null,
      "run_mode": "CPU",
      "runtime": "llama.cpp",
      "runtime_label": "llama.cpp",
      "score": 83.6,
      "score_components": {
        "context": 100.0,
        "fit": 100.0,
        "quality": 59.0,
        "speed": 100.0
      },
      "total_memory_gb": 3.9,
      "use_case": "Instruction following, chat",
      "utilization_pct": 61.7
    },
    {
      "best_quant": "Q6_K",
      "category": "General",
      "context_length": 262144,
      "estimated_tps": 185.0,
      "fit_level": "Marginal",
      "gguf_sources": [],
      "installed": false,
      "is_moe": true,
      "license": null,
      "memory_available_gb": 6.32,
      "memory_required_gb": 3.6,
      "moe_offloaded_gb": null,
      "name": "apolo13x/Qwen3.5-35B-A3B-quantized.w4a16",
      "notes": [
        "Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)",
        "CPU-only: model loaded into system RAM",
        "MoE architecture, but expert offloading requires a GPU",
        "No GPU -- inference will be slow",
        "Best quantization for hardware: Q6_K (model default: Q4_K_M)",
        "Baseline estimated speed: 185.0 tok/s"
      ],
      "parameter_count": "6.4B",
      "params_b": 6.38,
      "provider": "apolo13x",
      "release_date": null,
      "run_mode": "CPU",
      "runtime": "llama.cpp",
      "runtime_label": "llama.cpp",
      "score": 82.5,
      "score_components": {
        "context": 100.0,
        "fit": 100.0,
        "quality": 61.0,
        "speed": 100.0
      },
      "total_memory_gb": 3.6,
      "use_case": "General purpose",
      "utilization_pct": 57.0
    },
    {
      "best_quant": "Q8_0",
      "category": "Chat",
      "context_length": 4096,
      "estimated_tps": 125.0,
      "fit_level": "Marginal",
      "gguf_sources": [],
      "installed": false,
      "is_moe": true,
      "license": null,
      "memory_available_gb": 6.32,
      "memory_required_gb": 2.1,
      "moe_offloaded_gb": null,
      "name": "microsoft/Phi-tiny-MoE-instruct",
      "notes": [
        "CPU-only: model loaded into system RAM",
        "MoE architecture, but expert offloading requires a GPU",
        "No GPU -- inference will be slow",
        "Best quantization for hardware: Q8_0 (model default: Q4_K_M)",
        "Baseline estimated speed: 125.0 tok/s"
      ],
      "parameter_count": "3.8B",
      "params_b": 3.76,
      "provider": "Microsoft",
      "release_date": null,
      "run_mode": "CPU",
      "runtime": "llama.cpp",
      "runtime_label": "llama.cpp",
      "score": 82.0,
      "score_components": {
        "context": 100.0,
        "fit": 86.6,
        "quality": 60.0,
        "speed": 100.0
      },
      "total_memory_gb": 2.1,
      "use_case": "Instruction following, chat",
      "utilization_pct": 33.2
    }
  ],
  "system": {
    "available_ram_gb": 6.32,
    "backend": "CPU (ARM)",
    "cpu_cores": 14,
    "cpu_name": "0",
    "gpu_count": 0,
    "gpu_name": null,
    "gpu_vram_gb": null,
    "gpus": [],
    "has_gpu": false,
    "total_ram_gb": 7.65,
    "unified_memory": false
  }
}

@sathiraumesh (Contributor, Author) commented Apr 6, 2026

Hi @ericcurtin, can you detail the expected behavior? If you want the TUI version, we'd need to make more changes to implement it. Is that what you're expecting?

@ericcurtin (Contributor) commented

Yes, the TUI version



Development

Successfully merging this pull request may close these issues.

docker model launch llmfit

2 participants