
MLX Code Server

A launcher for running Qwen2.5 Coder 32B (or any MLX model) as an OpenAI‑compatible API on your Mac. Use it to power local coding assistants, IDE plugins, or your own LLM‑powered apps.

Features

  • One command to start – mlx-code-server launches the server with sensible defaults.
  • OpenAI‑compatible endpoint – works with any client or tool that speaks the /v1/chat/completions API.
  • Isolated & global – installed as a uv tool, so its dependencies stay sandboxed while the command is available everywhere on your PATH.
  • Configurable – change the model, port, or host by editing a few lines.

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3) – MLX is built for Metal.
  • uv installed (if not: curl -LsSf https://astral.sh/uv/install.sh | sh)
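
To double-check that you're on Apple Silicon before installing, `uname` should report arm64:

```shell
# Prints the CPU architecture: "arm64" on Apple Silicon,
# "x86_64" on an Intel Mac (where MLX's Metal backend won't work).
uname -m
```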

Installation

# Clone this repo (or your fork)
git clone https://github.com/xenofect/mlx-code-server
cd mlx-code-server
# Install globally in editable mode
uv tool install --editable .

Usage

mlx-code-server

The first time you run it, the model weights (Qwen2.5‑Coder‑32B‑Instruct‑4bit) are downloaded automatically – roughly 18 GB, so expect the first launch to take a while. Subsequent starts skip the download and only need to load the weights from disk.

Your server will be available at http://127.0.0.1:8080/v1 – try it with curl:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Write a Python function to reverse a string."}]
  }'
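
The server replies with the standard OpenAI chat-completions JSON. A minimal sketch of pulling the assistant's reply out of such a response body (the payload below is illustrative, not actual server output):

```python
import json

# An illustrative response body in the OpenAI chat-completions format.
body = json.loads("""
{
  "choices": [
    {"message": {"role": "assistant", "content": "def reverse(s):\\n    return s[::-1]"}}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 20}
}
""")

# The generated text lives at choices[0].message.content.
reply = body["choices"][0]["message"]["content"]
print(reply)
```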

Customising the server

You can override the default host, port, or model by passing flags:

mlx-code-server --port 9000 --model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit

To change the permanent defaults, edit src/mlx_server/cli.py and tweak the default_args list. (No need to reinstall – because we used --editable, changes take effect immediately.)
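
As a rough idea of what that wrapper involves, here is a hypothetical sketch of cli.py. The names default_args and merge_args mirror the description above, but the actual file in your checkout may differ; it is assumed here to shell out to mlx_lm.server, the OpenAI-compatible server shipped with the mlx-lm package:

```python
# Hypothetical sketch of src/mlx_server/cli.py -- the real file may differ.
import subprocess
import sys

# Defaults applied when the corresponding flag is not given on the CLI.
default_args = [
    "--model", "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    "--host", "127.0.0.1",
    "--port", "8080",
]

def merge_args(defaults: list[str], overrides: list[str]) -> list[str]:
    """Merge flag/value pairs; flags passed on the command line win."""
    merged = dict(zip(defaults[::2], defaults[1::2]))
    merged.update(zip(overrides[::2], overrides[1::2]))
    return [token for pair in merged.items() for token in pair]

def main() -> None:
    """Entry point wired up as the mlx-code-server console script."""
    args = merge_args(default_args, sys.argv[1:])
    subprocess.run([sys.executable, "-m", "mlx_lm.server", *args], check=True)
```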

Connecting to your tools

  • VS Code – install the Local LLM for VS Code extension and set the endpoint to http://127.0.0.1:8080/v1.
  • Open WebUI – run with docker run -d -p 3000:8080 -e OPENAI_API_BASE_URL=http://host.docker.internal:8080/v1 ghcr.io/open-webui/open-webui:main
  • Your own Python app – use the OpenAI client:
from openai import OpenAI

# The API key is required by the client library but ignored by the local server.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Repo structure

mlx-code-server/
├── README.md
├── pyproject.toml          # package definition and dependencies
└── src/
    └── mlx_server/
        ├── __init__.py
        └── cli.py           # the wrapper script with your defaults

Stopping the server

Press Ctrl+C in the terminal where it’s running.

Notes

The model is downloaded to ~/.cache/huggingface/ – you can safely delete it if you need to reclaim space. If you have trouble with the MLX backend, ensure your macOS is up to date and you have the latest Xcode command line tools installed.
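
If you want to check how much disk the cache is using before deleting anything, a small sketch (assumes the default Hugging Face cache location mentioned above):

```python
from pathlib import Path

def cache_size_bytes(root: Path) -> int:
    """Total size of all files under root; 0 if the path doesn't exist."""
    if not root.exists():
        return 0
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file())

if __name__ == "__main__":
    cache = Path.home() / ".cache" / "huggingface"
    print(f"{cache_size_bytes(cache) / 1e9:.2f} GB in {cache}")
```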
