A launcher for running Qwen2.5 Coder 32B (or any MLX model) as an OpenAI‑compatible API on your Mac. Use it to power local coding assistants, IDE plugins, or your own LLM‑powered apps.
- **One command to start** – `mlx-code-server` launches the server with defaults.
- **OpenAI-compatible endpoint** – works with any client that speaks the OpenAI chat-completions API.
- **Isolated & global** – installed as a `uv tool`, so it stays out of your project environments.
- **Configurable** – change the model, port, or host by editing a few lines.
- macOS with Apple Silicon (M1/M2/M3) – MLX is built for Metal.
- `uv` installed (if not: `curl -LsSf https://astral.sh/uv/install.sh | sh`)
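A quick way to check both requirements before installing (nothing here is specific to this project):

```shell
# Apple Silicon Macs report arm64; Intel Macs report x86_64 (MLX needs arm64).
uname -m

# Report whether uv is on PATH (prints its version if so).
command -v uv >/dev/null && uv --version || echo "uv not found - run the install script above"
```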
```shell
# Clone this repo (or your fork)
git clone https://github.com/xenofect/mlx-code-server
cd mlx-code-server

# Install globally in editable mode
uv tool install --editable .

# Launch the server
mlx-code-server
```

The first time you run it, the model weights (Qwen2.5-Coder-32B-Instruct-4bit, about 5 GB) are downloaded automatically. Subsequent starts are instant.
Your server will be available at `http://127.0.0.1:8080/v1` – try it with curl:

```shell
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Write a Python function to reverse a string."}]
  }'
```

You can override the default host, port, or model by passing flags:
```shell
mlx-code-server --port 9000 --model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
```

To change the permanent defaults, edit `src/mlx_server/cli.py` and tweak the `default_args` list.
(No need to reinstall – because the install used `--editable`, changes take effect immediately.)
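For orientation, `default_args` is just an argv-style list of flag/value pairs handed to the underlying server. A hypothetical sketch of what it might contain – names and values are illustrative, so check your copy of `cli.py` for the real ones:

```python
# Illustrative sketch only -- the actual contents of src/mlx_server/cli.py
# may differ. Each flag/value pair here becomes a permanent default.
default_args = [
    "--model", "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    "--host", "127.0.0.1",
    "--port", "8080",
]

# Changing a default is just editing the value next to its flag:
default_args[default_args.index("--port") + 1] = "9000"
print(default_args)
```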
- **VS Code** – install the Local LLM for VS Code extension and set the endpoint to `http://127.0.0.1:8080/v1`.
- **Open WebUI** – run with:

  ```shell
  docker run -d -p 3000:8080 \
    -e OPENAI_API_BASE_URL=http://host.docker.internal:8080/v1 \
    ghcr.io/open-webui/open-webui:main
  ```

- **Your own Python app** – use the OpenAI client:

  ```python
  from openai import OpenAI

  client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
  response = client.chat.completions.create(
      model="mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
      messages=[{"role": "user", "content": "Hello!"}],
  )
  print(response.choices[0].message.content)
  ```

Project layout:

```
mlx-code-server/
├── README.md
├── pyproject.toml        # package definition and dependencies
└── src/
    └── mlx_server/
        ├── __init__.py
        └── cli.py        # the wrapper script with your defaults
```
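Whichever client you use, the body that comes back is a standard OpenAI chat-completion object. A minimal sketch of pulling the generated text out of the raw JSON – the payload below is abbreviated and hand-written, not real server output:

```python
import json

# Abbreviated, hand-written example of a chat-completion response body.
raw = '''
{
  "object": "chat.completion",
  "model": "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "def reverse(s):\\n    return s[::-1]"},
      "finish_reason": "stop"
    }
  ]
}
'''

response = json.loads(raw)
# The generated text lives at choices[0].message.content.
reply = response["choices"][0]["message"]["content"]
print(reply)
```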
To stop the server, press `Ctrl+C` in the terminal where it's running.
The model is downloaded to `~/.cache/huggingface/` – you can safely delete it if you need to reclaim space; it will be re-downloaded on the next start.
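A couple of commands for checking and reclaiming that space (the directory won't exist until the first download):

```shell
# Show how much disk the cached weights are using, if any.
du -sh ~/.cache/huggingface/ 2>/dev/null || echo "nothing cached yet"

# Uncomment to delete the cache; the weights are re-downloaded on next start.
# rm -rf ~/.cache/huggingface/
```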
If you have trouble with the MLX backend, ensure your macOS is up to date and you have the latest Xcode command line tools installed.