bitloops-inference is a small Rust workspace that runs semantic-summary inference out of process for Bitloops. Bitloops launches the runtime as a child process, speaks a versioned line-delimited JSON protocol over stdin and stdout, and leaves all provider-specific HTTP, auth, parsing, and error handling inside this repository.
- bitloops-inference-protocol: shared protocol types, versioning, and JSON-line serialisation helpers.
- bitloops-inference: config loading, CLI, provider registry, provider implementations, and the stdio runtime loop.
Bitloops core stays provider-agnostic. Adding or changing a summary provider only requires a new bitloops-inference release rather than a Bitloops release.
bitloops-inference run --config config.toml --profile openai_fast
bitloops-inference validate-config --config config.toml
bitloops-inference describe-profile --config config.toml --profile openai_fast
The run command reserves stdout strictly for line-delimited JSON protocol responses. Diagnostics and failures go to stderr.
bitloops-inference reads the Bitloops daemon inference config. Text-generation profiles live under [inference.profiles.<name>] and reference a runtime from [inference.runtimes.<name>].
[inference.runtimes.bitloops_inference]
request_timeout_secs = 60
[inference.profiles.openai_fast]
task = "text_generation"
driver = "openai_chat_completions"
runtime = "bitloops_inference"
model = "gpt-4.1-mini"
base_url = "https://api.openai.com/v1/chat/completions"
api_key = "${OPENAI_API_KEY}"
temperature = "0.1"
max_output_tokens = 200
[inference.profiles.ollama_local]
task = "text_generation"
driver = "ollama_chat"
runtime = "bitloops_inference"
model = "qwen2.5-coder:14b"
base_url = "http://127.0.0.1:11434/api/chat"
temperature = "0.1"
max_output_tokens = 200
String fields support ${ENV_VAR} interpolation. Missing environment variables fail validation immediately. Non-text-generation profiles in the same daemon config are ignored by bitloops-inference.
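The interpolation rule can be illustrated with a minimal, std-only Rust sketch. It is not the runtime's actual implementation: the function name interpolate, the lookup parameter, and the error messages are hypothetical, and treating an unterminated ${ as an error is an assumption.

```rust
/// Expands `${NAME}` placeholders in `input` using `lookup`.
/// Hypothetical helper: returns Err for the first variable `lookup` cannot resolve.
fn interpolate(input: &str, lookup: impl Fn(&str) -> Option<String>) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        // Assumption: an unterminated placeholder is a validation error.
        let end = after
            .find('}')
            .ok_or_else(|| format!("unterminated placeholder in {:?}", input))?;
        let name = &after[..end];
        // A missing variable fails immediately instead of expanding to "".
        let value = lookup(name)
            .ok_or_else(|| format!("missing environment variable {}", name))?;
        out.push_str(&value);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    // Resolve against the real process environment.
    match interpolate("${OPENAI_API_KEY}", |name| std::env::var(name).ok()) {
        Ok(_) => println!("api_key resolved"),
        Err(err) => eprintln!("validation failed: {}", err),
    }
}
```

Passing the lookup as a closure keeps the sketch testable without touching the process environment.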
The public Bitloops platform gateway has a dedicated bitloops_platform_chat driver. It defaults to the production Bitloops platform endpoint, and the Bitloops host can optionally provide base_url when a test or non-production override is needed:
[inference.runtimes.bitloops_inference]
request_timeout_secs = 300
[inference.profiles.platform_summary]
task = "text_generation"
driver = "bitloops_platform_chat"
runtime = "bitloops_inference"
model = "ministral-3-3b-instruct"
api_key = "${BITLOOPS_PLATFORM_GATEWAY_TOKEN}"
temperature = "0.1"
max_output_tokens = 200
If base_url is omitted, bitloops-inference uses https://platform.bitloops.net/v1/chat/completions. When base_url is present, it overrides that default for the selected profile.
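The fallback described above amounts to a one-line default. A minimal Rust sketch follows. The constant and function names are hypothetical; only the URL comes from this README.

```rust
/// Default endpoint quoted above for the bitloops_platform_chat driver.
const DEFAULT_PLATFORM_URL: &str = "https://platform.bitloops.net/v1/chat/completions";

/// Hypothetical helper: uses the profile's base_url when present, else the default.
fn effective_base_url(configured: Option<&str>) -> &str {
    configured.unwrap_or(DEFAULT_PLATFORM_URL)
}

fn main() {
    // No override in the profile: the production default applies.
    println!("{}", effective_base_url(None));
    // A test or non-production override wins when present.
    println!("{}", effective_base_url(Some("http://127.0.0.1:8080/v1/chat/completions")));
}
```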
- openai_chat_completions
- bitloops_platform_chat
- ollama_chat
All providers normalise their outputs into one canonical inference response with text, optional parsed_json, optional token usage, finish reason, provider name, and model name.
- Start the runtime once for a selected profile.
- Send JSON requests over stdin, one line per request.
- Read one JSON response line per request from stdout.
- Send shutdown when the session is finished.
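Because each request must occupy exactly one line, a host has to escape newlines and quotes inside prompts before writing them to stdin. The std-only Rust sketch below builds request lines by hand to make that visible. The helper names json_escape, simple_line, and infer_line are hypothetical; the field names are taken from the example request stream in this README. Real code would more likely use a JSON library such as serde_json.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a request that carries only an id and a type (describe, shutdown).
fn simple_line(id: &str, kind: &str) -> String {
    format!(r#"{{"request_id":"{}","type":"{}"}}"#, json_escape(id), json_escape(kind))
}

/// Builds an infer request; response_mode is fixed to json_object for brevity.
fn infer_line(id: &str, system_prompt: &str, user_prompt: &str, temperature: f64, max_output_tokens: u32) -> String {
    format!(
        r#"{{"request_id":"{}","type":"infer","system_prompt":"{}","user_prompt":"{}","response_mode":"json_object","temperature":{},"max_output_tokens":{}}}"#,
        json_escape(id),
        json_escape(system_prompt),
        json_escape(user_prompt),
        temperature,
        max_output_tokens
    )
}

fn main() {
    // One line per request, ending the session with shutdown.
    println!("{}", simple_line("1", "describe"));
    println!("{}", infer_line("2", "You write terse semantic summaries.", "Summarise this diff.", 0.1, 200));
    println!("{}", simple_line("3", "shutdown"));
}
```

Echoing request_id back in every response, as the example responses do, lets the host match each response line to the request that produced it.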
Example request stream:
{"request_id":"1","type":"describe"}
{"request_id":"2","type":"infer","system_prompt":"You write terse semantic summaries.","user_prompt":"Summarise this diff.","response_mode":"json_object","temperature":0.1,"max_output_tokens":200}
{"request_id":"3","type":"shutdown"}
Example responses:
{"request_id":"1","type":"describe","protocol_version":1,"runtime_name":"bitloops-inference","runtime_version":"0.1.2","profile_name":"openai_fast","provider":{"kind":"openai_chat_completions","provider_name":"openai","model_name":"gpt-4.1-mini","endpoint":"https://api.openai.com/v1/chat/completions","capabilities":{"response_modes":["text","json_object"],"usage_reporting":true}}}
{"request_id":"2","type":"infer","text":"{\"summary\":\"Adds provider isolation\",\"confidence\":0.92}","parsed_json":{"summary":"Adds provider isolation","confidence":0.92},"usage":{"prompt_tokens":120,"completion_tokens":24,"total_tokens":144},"finish_reason":"stop","provider_name":"openai","model_name":"gpt-4.1-mini"}
{"request_id":"3","type":"shutdown"}
Run config validation first:
cargo run -p bitloops-inference -- validate-config --config ./bitloops-daemon-config.toml
Describe a profile:
cargo run -p bitloops-inference -- describe-profile --config ./bitloops-daemon-config.toml --profile ollama_local
Start the stdio runtime:
cargo run -p bitloops-inference -- run --config ./bitloops-daemon-config.toml --profile ollama_local
You can then write protocol lines to stdin manually or from another process.
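Driving the runtime from another process might look like the following Rust sketch. It uses only the standard library and the CLI flags shown in this README. The helper name runtime_command is hypothetical, and the sketch assumes that an installed bitloops-inference binary is on PATH and that the runtime exits after answering shutdown.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

/// Builds the command that starts the stdio runtime for one profile.
fn runtime_command(binary: &str, config: &str, profile: &str) -> Command {
    let mut cmd = Command::new(binary);
    cmd.args(["run", "--config", config, "--profile", profile])
        .stdin(Stdio::piped()) // protocol requests go in here
        .stdout(Stdio::piped()) // protocol responses only
        .stderr(Stdio::inherit()); // diagnostics stay visible to the operator
    cmd
}

fn main() -> std::io::Result<()> {
    let mut cmd = runtime_command("bitloops-inference", "./bitloops-daemon-config.toml", "ollama_local");
    let mut child = match cmd.spawn() {
        Ok(child) => child,
        Err(err) => {
            // The binary may not be installed on this machine.
            eprintln!("could not start bitloops-inference: {}", err);
            return Ok(());
        }
    };
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let mut stdout = BufReader::new(child.stdout.take().expect("stdout was piped"));

    let requests = [
        r#"{"request_id":"1","type":"describe"}"#,
        r#"{"request_id":"2","type":"shutdown"}"#,
    ];
    for request in requests {
        // One request line in, one response line out.
        writeln!(stdin, "{}", request)?;
        stdin.flush()?;
        let mut response = String::new();
        stdout.read_line(&mut response)?;
        println!("{}", response.trim_end());
    }

    drop(stdin); // close the pipe so the child sees end of input
    child.wait()?;
    Ok(())
}
```

Leaving stderr inherited keeps diagnostics out of the protocol stream, which matches the rule that stdout carries only JSON responses.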
The test suite avoids live network calls. Provider integrations use mocked HTTP servers and the stdio loop is exercised through spawned child-process tests.
cargo nextest run
cargo dev-clippy
GitHub Actions runs a lean hosted-runner CI pipeline for formatting, clippy, nextest, and native release-build smoke checks on Linux, macOS, and Windows.
Tagged releases are published from v* tags. The release workflow builds packaged artefacts for:
- aarch64-apple-darwin
- x86_64-apple-darwin
- x86_64-unknown-linux-musl
- aarch64-unknown-linux-musl
- x86_64-pc-windows-msvc
- aarch64-pc-windows-msvc
macOS signing and notarisation use the same secret and variable names as the main Bitloops repository:
- Secrets: APPLE_CERT_P12_BASE64, APPLE_CERT_PASSWORD, APPSTORE_CONNECT_API_KEY_P8_BASE64
- Variables: APPLE_SIGNING_IDENTITY, APPSTORE_CONNECT_KEY_ID, APPSTORE_CONNECT_ISSUER_ID
Optional release notification:
- Secret: SLACK_WEBHOOK_URL
Possible later provider families include anthropic_messages and other explicit provider integrations. v1 deliberately avoids a generic mapping DSL, streaming, batching, local in-process model serving, and runtime orchestration.