AI workflows you can run, review, and own.
Contenox is an open-source AI workflow runtime for developers. It turns repeatable coding and tool workflows into versioned Chains: files that declare prompts, model/provider routing, tool allowlists, retries, branches, budgets, and human approval gates.
Many coding workflows do not need a frontier model. Contenox gives you a way to run that work where the code is, with a proper agent loop instead of hidden prompt habits or one-off glue, and route to network or cloud models when the job needs them.
Run the same workflow from the CLI, VS Code, or any ACP client. Use modeld for
the edge path, Ollama or vLLM on your network, or hosted providers, while
sessions, config, telemetry, and runtime state stay on your machine.
- It speaks Unix: Pipe data directly into your workflows: git diff | contenox run commit-msg, or git log | contenox run release-notes.
- It respects boundaries: Human-in-the-loop isn't a UI toggle, it's a strict policy file. The AI pauses and asks for terminal approval before running destructive commands.
- It routes inference: Use edge modeld, private-network backends, or hosted providers per workflow. modeld is built for one active local model and resident coding context, not model multiplexing.
You own the workflow. The vendor doesn't decide how it behaves on your machine. You do.
It is built for specific, reviewable AI work, not vague promises of fully autonomous agents.
Package a repeatable AI task as a chain, then run it the same way every time:
- Review a diff — run the tests, summarize the risk, and gate on your approval before it acts.
- Draft release evidence — turn git log, PRs, and CI output into a changelog and reviewer packet.
- Wrap an internal API — expose a safe, curated tool subset with approval required on mutating calls.
- Automate repo chores — take an issue, produce a patch, run the tests, write the PR description.
- Ask an owned model — codebase chat and one-off prompts through local modeld or a private inference endpoint.
- Use edge autocomplete — keep VS Code ghost text on a local or local-network coder model while chat uses a larger hosted model.
The same chains run from the CLI, VS Code, or any ACP client. Inference can sit on the device, on your network, or with a cloud vendor, while sessions and state stay local. Detailed examples are in What it is good for below.
curl -fsSL https://contenox.com/install.sh | sh
# Configure a provider/model for this machine
contenox setup
# Use it from the CLI
contenox "say hello world in python"
contenox chat -e # open $EDITOR to compose a prompt
For normal CLI/VS Code installs, choose local Ollama, a private network backend,
or a hosted provider in setup. Owned local GGUF/OpenVINO inference uses the
separate native modeld daemon, which is not bundled in release installs yet.
If you choose a local modeld provider, setup prints source-build commands. Full guide:
modeld Source Build and Packaging.
Resume past sessions with contenox session list and
contenox session switch <name>. Backends are summarized below.
Developing the source-built local backend? See modeld Source Build and Packaging.
Inline autocomplete is intentionally separate from chat. That lets you run low-latency ghost text at the edge, on a LAN Ollama box, or on a FIM/coder cloud model while keeping chat and tool workflows on a larger provider.
# Chat can stay on a hosted model:
contenox config set default-provider openai
contenox config set default-model gpt-5-mini
# Autocomplete can stay local via modeld:
contenox config set default-autocomplete-provider llama
contenox config set default-autocomplete-model qwen3-coder-30b-a3b
# Or point autocomplete at a local-network Ollama coder model:
contenox config set default-autocomplete-provider ollama
contenox config set default-autocomplete-model qwen2.5-coder:7b
In VS Code, enable it with Contenox: Enable Autocomplete and verify with
Contenox: Test Autocomplete At Cursor.
The workflow behavior is a chain file. Every decision is a JSON key:
{
"id": "review",
"tasks": [
{
"id": "review",
"handler": "chat_completion",
"system_instruction": "You are a code reviewer. Analyze the diff, run the tests if tools are available, then give a concise review.",
"execute_config": {
"model": "{{var:model}}",
"provider": "{{var:provider}}",
"tools": ["local_shell", "local_fs"],
"tools_policies": {
"local_shell": { "_allowed_commands": "go,make,npm,cargo,grep,cat" }
}
},
"transition": {
"branches": [
{ "operator": "equals", "when": "tool_call", "goto": "run_tools" },
{ "operator": "default", "goto": "end" }
]
}
},
{
"id": "run_tools",
"handler": "execute_tool_calls",
"input_var": "review",
"transition": {
"branches": [
{ "operator": "default", "goto": "review" }
]
}
}
]
}
System prompt, model, tool policy, allowed commands, retry budget, and transitions are all visible. Save the chain and pipe in a diff:
git diff | contenox run --chain ./review.json
Walk through your first chain step by step: contenox.com/docs/guide/first-chain.
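The transitions in the review chain above can be read as a small state machine: the model task hands off to the tool task when it requests a tool, and the tool task always hands back. The sketch below walks a trimmed copy of that chain to show the loop. It is not Contenox's engine. It assumes the first task is the entry point, that an "equals" branch compares its "when" value against an outcome label produced by the handler, and that "end" stops the run. The helpers next_task and walk, the stub handlers, and the max_steps budget are hypothetical.

```python
from typing import Callable, Dict

# Trimmed copy of the review chain: only ids, handlers, and transitions are kept.
CHAIN = {
    "id": "review",
    "tasks": [
        {
            "id": "review",
            "handler": "chat_completion",
            "transition": {"branches": [
                {"operator": "equals", "when": "tool_call", "goto": "run_tools"},
                {"operator": "default", "goto": "end"},
            ]},
        },
        {
            "id": "run_tools",
            "handler": "execute_tool_calls",
            "transition": {"branches": [
                {"operator": "default", "goto": "review"},
            ]},
        },
    ],
}


def next_task(task: dict, outcome: str) -> str:
    """Return the goto target of the first branch that matches the outcome."""
    for branch in task["transition"]["branches"]:
        if branch["operator"] == "equals" and branch.get("when") == outcome:
            return branch["goto"]
        if branch["operator"] == "default":
            return branch["goto"]
    raise ValueError(f"no branch matched {outcome!r} in task {task['id']!r}")


def walk(chain: dict, handlers: Dict[str, Callable[[], str]], max_steps: int = 20) -> list:
    """Follow transitions from the first task until 'end'; return visited task ids."""
    tasks = {t["id"]: t for t in chain["tasks"]}
    current = chain["tasks"][0]["id"]  # assumption: the first task is the entry point
    visited = []
    for _ in range(max_steps):  # hypothetical step budget against endless loops
        if current == "end":
            return visited
        task = tasks[current]
        visited.append(current)
        outcome = handlers[task["handler"]]()  # stub handler returns an outcome label
        current = next_task(task, outcome)
    raise RuntimeError("step budget exhausted")


# Stub handlers: the model asks for a tool once, then finishes.
model_outcomes = iter(["tool_call", "done"])
handlers = {
    "chat_completion": lambda: next(model_outcomes),
    "execute_tool_calls": lambda: "ok",
}
trace = walk(CHAIN, handlers)
print(trace)  # ['review', 'run_tools', 'review']
```

Because the loop is expressed as branches in the file rather than in code, a reviewer can see from the chain alone that tool execution always returns control to the model task.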
Contenox is strongest when the workflow is specific and repeatable: known inputs, known tools, known output shape, and explicit review gates.
Examples of workflows you can package as chains:
Release evidence pack
Input: git log, PRs, tickets, CI output
Output: changelog, risk notes, deployment checklist, reviewer packet
Gate: human approval before publishing
API-to-workflow wrapper
Input: internal OpenAPI spec
Output: curated tool subset, hidden tenant/env args, auth handling, HITL policy
Gate: approval for mutating calls
Repo maintenance chain
Input: issue or migration request
Output: patch, test run, PR description
Gate: shell/filesystem approval and human merge
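For the API-to-workflow wrapper above, the curated tool subset could be prepared with a short script before it is handed to the runtime. The sketch below filters an OpenAPI document, held as a plain dictionary, down to an explicit set of operations and lists the ones that mutate state. It makes several assumptions: the spec content, the curate_spec and needs_approval helpers, and the rule that POST/PUT/PATCH/DELETE count as mutating are all illustrative, and this is not how Contenox itself classifies calls. Serializing the result to a file is left out to keep the example dependency-free.

```python
import copy

# Assumption: these HTTP methods are treated as state-changing.
MUTATING_METHODS = {"post", "put", "patch", "delete"}


def curate_spec(spec: dict, allowed: dict) -> dict:
    """Return a copy of an OpenAPI document that keeps only allowed operations.

    `allowed` maps a path to the set of lowercase HTTP methods to keep.
    Path-level keys that are not listed methods are dropped as well.
    """
    subset = copy.deepcopy(spec)
    subset["paths"] = {}
    for path, methods in allowed.items():
        item = spec.get("paths", {}).get(path, {})
        kept = {m: copy.deepcopy(op) for m, op in item.items() if m in methods}
        if kept:
            subset["paths"][path] = kept
    return subset


def needs_approval(spec: dict) -> list:
    """List (METHOD, path) pairs that change state and should sit behind a gate."""
    return sorted(
        (method.upper(), path)
        for path, item in spec["paths"].items()
        for method in item
        if method in MUTATING_METHODS
    )


# Hypothetical internal API; none of these paths come from a real system.
full_spec = {
    "openapi": "3.0.3",
    "info": {"title": "ERP (hypothetical)", "version": "1.0.0"},
    "paths": {
        "/invoices": {
            "get": {"operationId": "listInvoices"},
            "post": {"operationId": "createInvoice"},
        },
        "/invoices/{id}": {
            "get": {"operationId": "getInvoice"},
            "delete": {"operationId": "deleteInvoice"},
        },
        "/payroll": {"get": {"operationId": "listPayroll"}},
    },
}

billing = curate_spec(
    full_spec,
    {"/invoices": {"get", "post"}, "/invoices/{id}": {"get"}},
)
print(sorted(billing["paths"]))  # ['/invoices', '/invoices/{id}']
print(needs_approval(billing))   # [('POST', '/invoices')]
```

Keeping the subset explicit means the payroll endpoints and the delete operation never reach the model as callable tools, and the approval list stays short enough to review by hand.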
State lives locally in SQLite. Sessions persist across invocations. The AI
provider is a config line: local modeld (llama/openvino), Ollama, vLLM,
OpenAI, Anthropic, Mistral, Gemini, AWS Bedrock, OpenRouter, or Vertex. Use
edge inference, private network inference, or a hosted vendor depending on the
workflow, latency target, cost, and data boundary. Autocomplete has its own
provider/model defaults, so editor ghost text can stay local even when chat
uses the cloud.
Contenox is the agent layer you control from terminal to editor. Its category is an AI workflow runtime with edge, private-network, and cloud inference routing; its architecture is a developer agent runtime.
| Nearby world | Why Contenox is different |
|---|---|
| Cursor / IDE copilots | Runtime-first, not editor-first. The same engine works from the terminal, VS Code, and ACP clients. |
| Aider / CLI coding agents | Broader workflow, session, tool policy, and provider scope than a single coding loop. |
| LangChain / agent frameworks | End-user executable product, not just a library you wire into an app. |
| Dify / n8n / web AI workflow tools | Local desktop/workspace-first, not web-app/SaaS-first. |
| Ollama wrappers | Provider-neutral and workflow/tool/HITL-oriented, spanning owned local inference, private network backends, and hosted vendors. |
Anything you can reach over MCP, an OpenAPI spec, or a shell command can become a scoped tool in a chain:
# Any MCP-compatible server (Notion, Linear, Playwright, GitHub, Postgres, …)
contenox mcp add notion https://mcp.notion.com/mcp --auth-type oauth
# Any HTTP API with an OpenAPI spec (no glue code required)
# Slice a monolithic API into safe subsets by pointing --spec at a curated local file
contenox tools add erp_billing --url https://erp.internal.example.com --spec ./billing-subset.yaml
# The shell, with your own command policy declared in the chain
contenox --shell "check Proxmox and flag anything red"
ACP/editor support is an optional way to run the same local chains inside an editor. Contenox speaks the Agent Client Protocol over stdio. Drop this into ~/.config/zed/settings.json:
{
"agent_servers": {
"Contenox": {
"type": "custom",
"command": "contenox",
"args": ["acp"]
}
}
}
Open Zed's agent panel and pick Contenox. Your chain runs inside the editor: tool calls render as cards with the actual command/path, HITL prompts route through Zed's permission UI, and session history replays when you reopen the project. Chain selection lives at ~/.contenox/default-acp-chain.json (or set CONTENOX_ACP_CHAIN_PATH). Full guide → contenox.com/docs/guide/zed.
JetBrains (GoLand, IntelliJ IDEA, …) reads agent servers from ~/.jetbrains/acp.json — same binary, different schema (no "type" field):
{
"default_mcp_settings": { "use_custom_mcp": true, "use_idea_mcp": false },
"agent_servers": {
"Contenox": {
"command": "contenox",
"args": ["acp"]
}
}
}
Verified with GoLand 2026.1.2. Full guide → contenox.com/docs/guide/jetbrains.
AionUi — a free, local, open-source desktop chat UI for ACP agents. Add a Custom Agent: command contenox, args ["acp"]. Verified with AionUi 2.0.0. Full guide → contenox.com/docs/guide/aionui.
Most of Contenox runs against whatever provider you choose. The native modeld
daemon exists for one specific bet: a local AI coding agent on a single consumer
accelerator that serves real, long-context work — an effective context far
beyond a model's native window (the goal is ~200k tokens) on limited hardware, by
treating context as resident state kept hot rather than a prompt resent every turn.
modeld is shaped entirely around that bet:
- One model, one user, many sessions. A single active model slot serves many persistent sessions for one owner, so the device's whole memory and KV budget go to making that model deep and fast instead of multiplexing several.
- Warm-reuse sessions. Each session keeps its stable prefix's KV hot and re-prefills only the changed suffix (EnsurePrefix → PrefillSuffix → Decode), so a long working context is paid for once, not resent on every turn.
- Snapshot / restore. Session state is durable and branchable, so effective context outlives a single live process.
- Accelerator-driven, no knobs. modeld detects the accelerator and derives offload and the effective window from the device at runtime — no per-model flags.
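The warm-reuse idea in the list above can be illustrated with a toy model that only counts tokens. The sketch keeps the cached token sequence of a session, finds the longest prefix it shares with the next request, and "prefills" only what follows. The step names EnsurePrefix and PrefillSuffix come from the text; the Session class, its fields, and the integer token ids are assumptions made for illustration, and no real KV cache or decoding is involved.

```python
from dataclasses import dataclass, field
from typing import List


def common_prefix_len(a: List[int], b: List[int]) -> int:
    """Length of the longest shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Session:
    """Toy stand-in for a session whose cache covers the tokens in `cached`."""
    cached: List[int] = field(default_factory=list)
    prefilled_total: int = 0  # tokens pushed through prefill over the session's life

    def ensure_prefix(self, tokens: List[int]) -> int:
        """Keep the cached part that matches the new request, drop the rest."""
        keep = common_prefix_len(self.cached, tokens)
        del self.cached[keep:]
        return keep

    def prefill_suffix(self, tokens: List[int], keep: int) -> int:
        """Process only the tokens after the reused prefix; return how many."""
        suffix = tokens[keep:]
        self.cached.extend(suffix)
        self.prefilled_total += len(suffix)
        return len(suffix)

    def turn(self, tokens: List[int]) -> int:
        """One request: reuse the prefix, prefill the suffix (decode omitted)."""
        keep = self.ensure_prefix(tokens)
        return self.prefill_suffix(tokens, keep)


session = Session()
stable_context = list(range(1000))  # stands in for system prompt plus repo context

first = session.turn(stable_context + [2001, 2002])
second = session.turn(stable_context + [2001, 2002, 3001, 3002, 3003])
third = session.turn(stable_context + [4001])  # conversation branches after the prefix

print(first, second, third)     # 1002 3 1
print(session.prefilled_total)  # 1006
```

In this toy run the three turns would cost 3008 prefilled tokens if each resent its full prompt; with prefix reuse the total is 1006, which is the effect the warm-reuse bullet describes.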
This is the direction the local backend is built toward, not a shipped guarantee on every model and device. The workflow runtime above doesn't depend on it; you can use any hosted or local provider today. For how it maps onto the code (KV cache, warm reuse, capacity, the latency budget, and what's still required), see Effective Context North Star.
The llama and openvino backends are local modeld-backed inference providers.
contenox init registers them automatically and contenox model pull <name>
downloads artifacts into ~/.contenox/models/<backend>/. The current CLI/VSIX
release assets do not bundle modeld, so local modeld providers require a
source build for now:
modeld Source Build and Packaging.
To add other backends:
# Private network / self-hosted inference
contenox backend add ollama --type ollama
contenox backend add myvllm --type vllm --url http://gpu-host:8000
# Hosted AI vendors
contenox backend add openai --type openai --api-key-env OPENAI_API_KEY
contenox backend add anthropic --type anthropic --api-key-env ANTHROPIC_API_KEY
contenox backend add mistral --type mistral --api-key-env MISTRAL_API_KEY
contenox backend add gemini --type gemini --api-key-env GEMINI_API_KEY
contenox backend add bedrock --type bedrock --url https://bedrock-runtime.us-east-1.amazonaws.com
contenox backend add vertex --type vertex-google --url "https://us-central1-aiplatform.googleapis.com/v1/projects/$GOOGLE_CLOUD_PROJECT/locations/us-central1"
# Set your defaults
contenox config set default-model qwen3-8b
contenox config set default-provider llama
Requires Go 1.25+.
git clone https://github.com/contenox/runtime
cd runtime
make build-contenox
Build and run local modeld for llama.cpp:
CONTENOX_MODELD_BACKEND=llama make run-modeld
Build and run local modeld for OpenVINO:
make deps-modeld
CONTENOX_MODELD_BACKEND=openvino make run-modeld
Build a relocatable Linux modeld bundle:
MODELD_DIST_DIR="$PWD/bin/modeld-linux-amd64" make package-modeld
tar -C bin -czf bin/modeld-linux-amd64.tar.gz modeld-linux-amd64
See modeld Source Build and Packaging for the complete local modeld flow.
The contenox CLI is pure Go. Local inference lives in the separate modeld
daemon, which builds on these upstream projects (pinned in mk/llama-flags.mk and
mk/openvino-flags.mk):
| Project | Role | License |
|---|---|---|
| llama.cpp | GGUF inference and the ggml CPU/CUDA/HIP/Metal backends | MIT |
| OpenVINO | Inference runtime (CPU / iGPU / NPU) | Apache-2.0 |
| OpenVINO GenAI | LLM pipeline over OpenVINO | Apache-2.0 |
| OpenVINO Tokenizers | Tokenizer extension for OpenVINO GenAI | Apache-2.0 |
| minja | Chat-template engine (vendored by OpenVINO GenAI) | MIT |
| gguf-tools | GGUF parsing headers (vendored by OpenVINO GenAI) | see upstream |
Native backends are compiled, not embedded: modeld links these at build time and
ships their runtime libraries inside each release package. Upstream license texts
travel with the artifacts (licenses/ in dependency bundles, LICENSES/ in modeld
packages). Other Go dependencies are listed in go.mod.
Provider integrations contenox talks to over the network (Ollama, vLLM, and hosted OpenAI-compatible vendors) are not built into contenox and are not listed here.
Questions: hello@contenox.com