
MLX Code Server

A launcher for running Qwen2.5 Coder 32B (or any MLX model) as an OpenAI‑compatible API on your Mac. Use it to power local coding assistants, IDE plugins, or your own LLM‑powered apps.

Features

  • One command to start – mlx-code-server launches the server with sensible defaults.
  • OpenAI‑compatible endpoint – works with any client or tool that speaks the /v1/chat/completions API.
  • Isolated & global – installed as a uv tool, so its dependencies stay sandboxed while the command is available everywhere on your PATH.
  • Configurable – change the model, port, or host by editing a few lines.

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3) – MLX is built for Metal.
  • uv installed (if not: curl -LsSf https://astral.sh/uv/install.sh | sh)
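
To double-check that you're on Apple Silicon before installing, `uname` should report arm64:

```shell
# Prints the CPU architecture: "arm64" on Apple Silicon,
# "x86_64" on an Intel Mac (where MLX's Metal backend won't work).
uname -m
```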

Installation

# Clone this repo (or your fork)
git clone https://github.com/xenofect/mlx-code-server
cd mlx-code-server
# Install globally in editable mode
uv tool install --editable .

Usage

mlx-code-server

The first time you run it, the model weights (Qwen2.5‑Coder‑32B‑Instruct‑4bit) are downloaded automatically – roughly 18 GB, so expect the first launch to take a while. Subsequent starts skip the download and only need to load the weights from disk.

Your server will be available at http://127.0.0.1:8080/v1 – try it with curl:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Write a Python function to reverse a string."}]
  }'
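
The server replies with the standard OpenAI chat-completions JSON. A minimal sketch of pulling the assistant's reply out of such a response body (the payload below is illustrative, not actual server output):

```python
import json

# An illustrative response body in the OpenAI chat-completions format.
body = json.loads("""
{
  "choices": [
    {"message": {"role": "assistant", "content": "def reverse(s):\\n    return s[::-1]"}}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 20}
}
""")

# The generated text lives at choices[0].message.content.
reply = body["choices"][0]["message"]["content"]
print(reply)
```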

Customising the server

You can override the default host, port, or model by passing flags:

mlx-code-server --port 9000 --model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit

To change the permanent defaults, edit src/mlx_server/cli.py and tweak the default_args list. (No need to reinstall – because we used --editable, changes take effect immediately.)
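
As a rough idea of what that wrapper involves, here is a hypothetical sketch of cli.py. The names default_args and merge_args mirror the description above, but the actual file in your checkout may differ; it is assumed here to shell out to mlx_lm.server, the OpenAI-compatible server shipped with the mlx-lm package:

```python
# Hypothetical sketch of src/mlx_server/cli.py -- the real file may differ.
import subprocess
import sys

# Defaults applied when the corresponding flag is not given on the CLI.
default_args = [
    "--model", "mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    "--host", "127.0.0.1",
    "--port", "8080",
]

def merge_args(defaults: list[str], overrides: list[str]) -> list[str]:
    """Merge flag/value pairs; flags passed on the command line win."""
    merged = dict(zip(defaults[::2], defaults[1::2]))
    merged.update(zip(overrides[::2], overrides[1::2]))
    return [token for pair in merged.items() for token in pair]

def main() -> None:
    """Entry point wired up as the mlx-code-server console script."""
    args = merge_args(default_args, sys.argv[1:])
    subprocess.run([sys.executable, "-m", "mlx_lm.server", *args], check=True)
```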

Connecting to your tools

  • VS Code – install the Local LLM for VS Code extension and set the endpoint to http://127.0.0.1:8080/v1.
  • Open WebUI – run with docker run -d -p 3000:8080 -e OPENAI_API_BASE_URL=http://host.docker.internal:8080/v1 ghcr.io/open-webui/open-webui:main
  • Your own Python app – use the OpenAI client:
from openai import OpenAI

# The API key is required by the client library but ignored by the local server.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Repo structure

mlx-code-server/
├── README.md
├── pyproject.toml          # package definition and dependencies
└── src/
    └── mlx_server/
        ├── __init__.py
        └── cli.py           # the wrapper script with your defaults

Stopping the server

Press Ctrl+C in the terminal where it’s running.

Notes

The model is downloaded to ~/.cache/huggingface/ – you can safely delete it if you need to reclaim space. If you have trouble with the MLX backend, ensure your macOS is up to date and you have the latest Xcode command line tools installed.
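
If you want to check how much disk the cache is using before deleting anything, a small sketch (assumes the default Hugging Face cache location mentioned above):

```python
from pathlib import Path

def cache_size_bytes(root: Path) -> int:
    """Total size of all files under root; 0 if the path doesn't exist."""
    if not root.exists():
        return 0
    return sum(f.stat().st_size for f in root.rglob("*") if f.is_file())

if __name__ == "__main__":
    cache = Path.home() / ".cache" / "huggingface"
    print(f"{cache_size_bytes(cache) / 1e9:.2f} GB in {cache}")
```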
