
SmooAI/smooth-operator


smooth-operator — Polyglot AI agent service. One protocol.



smooth-operator gives you hybrid retrieval (dense + sparse + rerank), durable agent checkpoints, human-in-the-loop approvals, and multi-participant conversations — one operator binary that runs the same way on Kubernetes, AWS serverless, or a single laptop process. Built in the open, test-first.


What is this?

smooth-operator is a polyglot AI agent service. The agent orchestration is done by smooth-operator-core — a 5-language parity engine; the service wraps it with conversations, knowledge ingestion + retrieval, a tool catalog, and one schema-driven WebSocket protocol that clients in five languages speak natively.

You get hybrid retrieval (dense + sparse + rerank), durable agent checkpoints, human-in-the-loop approvals, and multi-participant conversations (user · ai-agent · human-agent) — behind a stable wire protocol, with storage, backplane, and auth selected by config, not by a code fork.
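The README doesn't say how the dense and sparse result lists are combined before reranking. As a minimal sketch, the TypeScript below assumes reciprocal rank fusion (RRF), a common fusion method swapped in here for illustration. Hit, reciprocalRankFusion, and hybridSearch are hypothetical names, not the project's API, and the reranker is a stub passed in as a function.

```typescript
// Hypothetical type: one ranked hit from a single retriever.
interface Hit {
  docId: string;
  score: number; // retriever-specific (cosine, BM25, ...); RRF only uses the order
}

// Reciprocal rank fusion (assumed technique): each list adds 1 / (k + rank)
// for every document it returns, so documents both retrievers like rise to the top.
function reciprocalRankFusion(lists: Hit[][], k = 60): Hit[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    const ranked = [...list].sort((a, b) => b.score - a.score); // best first
    ranked.forEach((hit, i) => {
      fused.set(hit.docId, (fused.get(hit.docId) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...fused.entries()]
    .map(([docId, score]) => ({ docId, score }))
    .sort((a, b) => b.score - a.score);
}

// Hybrid search: fuse dense + sparse candidates, keep the top N, then rerank.
// `rerank` stands in for a cross-encoder or reranking-model call.
function hybridSearch(
  dense: Hit[],
  sparse: Hit[],
  rerank: (docIds: string[]) => string[],
  topN = 5,
): string[] {
  const candidates = reciprocalRankFusion([dense, sparse])
    .slice(0, topN)
    .map((h) => h.docId);
  return rerank(candidates);
}

const dense: Hit[] = [
  { docId: "a", score: 0.91 },
  { docId: "b", score: 0.88 },
  { docId: "c", score: 0.12 },
];
const sparse: Hit[] = [
  { docId: "b", score: 12.4 },
  { docId: "d", score: 7.1 },
  { docId: "a", score: 1.3 },
];
const topDocs = hybridSearch(dense, sparse, (ids) => ids, 3);
```

With an identity reranker this yields b, a, d: b ranks well in both lists, so it edges out a, which only the dense retriever puts first. Because RRF works on ranks rather than raw scores, cosine similarities and BM25 scores never have to be put on a common scale.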

One operator binary, three deployment flavors (see below):

  • Kubernetes — the primary self-host target: a long-running service with Postgres + pgvector and a Redis/NATS backplane for multi-replica scale-out.
  • AWS serverless — API Gateway WebSocket + Lambda + DynamoDB + S3 Vectors, deployed with SST.
  • Local — a single in-memory process with auth off and zero external services, for laptop dev or to embed in-process.

The same binary picks its flavor from the environment (SMOOTH_AGENT_STORAGE · SMOOTH_AGENT_BACKPLANE · AUTH_MODE). No build flags, no second codebase.
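Here is a minimal sketch of config-driven flavor selection. It assumes the storage value alone picks the flavor, following the deployment table's mapping of postgres, dynamodb, and memory. resolveConfig is hypothetical and the backplane values are placeholders. Refusing AUTH_MODE=none outside the local flavor is also an assumption, drawn from the table's "dev only" note.

```typescript
type Env = Record<string, string | undefined>;
type Flavor = "local" | "kubernetes" | "serverless";

interface ResolvedConfig {
  flavor: Flavor;
  storage: string;
  backplane: string;
  authMode: string;
}

// Storage backend -> deployment flavor, per the deployment table.
const STORAGE_TO_FLAVOR = new Map<string, Flavor>([
  ["memory", "local"],
  ["postgres", "kubernetes"],
  ["dynamodb", "serverless"],
]);

function resolveConfig(env: Env): ResolvedConfig {
  const storage = env.SMOOTH_AGENT_STORAGE ?? "memory"; // memory is the documented default
  const flavor = STORAGE_TO_FLAVOR.get(storage);
  if (flavor === undefined) throw new Error(`unsupported SMOOTH_AGENT_STORAGE=${storage}`);

  // Placeholder values (e.g. "redis", "nats"); unset means in-memory, single process.
  const backplane = env.SMOOTH_AGENT_BACKPLANE ?? "memory";

  const authMode = env.AUTH_MODE ?? "none";
  if (!["none", "jwt", "smoo"].includes(authMode)) {
    throw new Error(`unsupported AUTH_MODE=${authMode}`);
  }
  // Assumed guard: AUTH_MODE=none is dev-only, so refuse it outside the local flavor.
  if (authMode === "none" && flavor !== "local") {
    throw new Error(`AUTH_MODE=none is only allowed for the local flavor (got ${flavor})`);
  }
  return { flavor, storage, backplane, authMode };
}
```

In a Node host you would call resolveConfig(process.env); taking a plain record keeps the function easy to test.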

Built in the open, test-first. See docs/Planning/Roadmap.md for what works today and what's queued.


30-second quickstart

Run the reference server locally — fully in-memory, no database, no auth, no AWS — and drive a real agent turn. The server talks to the SmooAI LLM gateway (llm.smoo.ai); bring a gateway key.

git clone https://github.com/SmooAI/smooth-operator && cd smooth-operator/rust

# Point at the gateway and seed a distinctive "17-day return window" demo doc.
export SMOOAI_GATEWAY_KEY=sk-…           # your llm.smoo.ai key
export SMOOTH_AGENT_SEED_KB=1            # seeds the demo knowledge docs

cargo run -p smooai-smooth-operator-server
# → smooth-operator-server (local flavor) listening on ws://127.0.0.1:8787/ws (model claude-haiku-4-5)

That's it — an agent backend on ws://127.0.0.1:8787/ws, with knowledge retrieval, tool-calling, and streaming. With no env set, the binary boots the local flavor: in-memory storage, in-memory backplane, loopback bind, admin off. Set SMOOTH_AGENT_STORAGE=postgres (or dynamodb) and a backplane to graduate the same binary to the k8s or serverless flavor.

No key? The server still boots and answers protocol actions; only send_message, which needs the LLM, returns a clean error until SMOOAI_GATEWAY_KEY is set.
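One way to get that behavior, sketched under assumptions: only actions that need the LLM check for the key, and a missing key becomes an error reply for that one action rather than a startup failure. The frame shapes, the NEEDS_LLM set, and the error string are hypothetical; the real wire format is in docs/PROTOCOL.md.

```typescript
// Hypothetical frame shapes for illustration only.
interface ActionFrame {
  action: string;
}

type Reply =
  | { ok: true; action: string }
  | { ok: false; action: string; error: string };

// Only actions that call the LLM depend on the gateway key.
const NEEDS_LLM = new Set(["send_message"]);

function dispatch(frame: ActionFrame, gatewayKey: string | undefined): Reply {
  if (NEEDS_LLM.has(frame.action) && !gatewayKey) {
    // Fail this action cleanly instead of crashing the server or dropping the socket.
    return { ok: false, action: frame.action, error: "SMOOAI_GATEWAY_KEY is not set" };
  }
  // Everything else (sessions, history, ...) is served without touching the LLM.
  return { ok: true, action: frame.action };
}
```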

You can also embed the local flavor in-process from Rust — smooth_operator_server::local::serve_local("127.0.0.1:8787"), or LocalServer::builder().seed_kb(true).spawn() for a handle with a graceful-shutdown switch. See deploy/local/README.md.


Watch it stream

Connect, start a session, send a turn, and watch tokens stream in — then await the authoritative terminal response. Here in TypeScript (@smooai/smooth-operator); the same shape exists in Go, .NET, Python, and Rust.

import { SmoothAgentClient } from '@smooai/smooth-operator';

const client = new SmoothAgentClient({ url: 'ws://127.0.0.1:8787/ws' });
await client.connect();

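// agentId: the ID of an agent configured on the server (assumed to be defined elsewhere in your app).
const session = await client.createConversationSession({ agentId, userName: 'Alice' });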

// One turn. Iterate the stream; `await` the same handle for the final state.
const turn = client.sendMessage({ sessionId: session.sessionId, message: 'How long is your return window?' });

for await (const ev of turn) {
  if (ev.type === 'stream_chunk') console.error(`  ↳ node: ${ev.node}`); // knowledge_search, response_gen, …
  if (ev.type === 'stream_token') process.stdout.write(ev.token ?? '');  // "Our return window is 17 days…"
  if (ev.type === 'write_confirmation_required') {
    // HITL: a tool wants to write — approve, and the resumed stream flows back into this same turn.
    client.confirmToolAction({ sessionId: session.sessionId, requestId: turn.requestId, approved: true });
  }
}

const final = await turn; // EventualResponse — cost, tokens, messageId

The model autonomously calls knowledge_search, retrieves the seeded 17-day return window, and grounds its answer in it — verified live against llm.smoo.ai and across every client.
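The example above iterates a turn and then awaits the same handle. The following minimal sketch shows how one object can be both an async iterator of stream events and a thenable for the final result. TurnHandle, push, and complete are hypothetical names, not the @smooai/smooth-operator implementation.

```typescript
type TurnEvent =
  | { type: "stream_token"; token: string }
  | { type: "stream_chunk"; node: string };

interface FinalResponse {
  text: string;
  tokens: number;
}

class TurnHandle implements AsyncIterable<TurnEvent>, PromiseLike<FinalResponse> {
  private readonly queue: TurnEvent[] = [];
  private readonly waiters: Array<(result: IteratorResult<TurnEvent>) => void> = [];
  private done = false;
  // Declared before finalResult so the executor's assignment is not overwritten.
  private resolveFinal!: (value: FinalResponse) => void;
  private readonly finalResult = new Promise<FinalResponse>((resolve) => {
    this.resolveFinal = resolve;
  });

  // Producer side: called as stream frames arrive.
  push(ev: TurnEvent): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: ev, done: false });
    else this.queue.push(ev);
  }

  // Called on the terminal frame: ends iteration and settles the awaitable result.
  complete(result: FinalResponse): void {
    this.done = true;
    for (const waiter of this.waiters.splice(0)) waiter({ value: undefined, done: true });
    this.resolveFinal(result);
  }

  [Symbol.asyncIterator](): AsyncIterator<TurnEvent> {
    return {
      next: (): Promise<IteratorResult<TurnEvent>> => {
        const ev = this.queue.shift();
        if (ev !== undefined) return Promise.resolve<IteratorResult<TurnEvent>>({ value: ev, done: false });
        if (this.done) return Promise.resolve<IteratorResult<TurnEvent>>({ value: undefined, done: true });
        return new Promise<IteratorResult<TurnEvent>>((resolve) => {
          this.waiters.push(resolve);
        });
      },
    };
  }

  // Makes `await handle` resolve to the final response.
  then<R1 = FinalResponse, R2 = never>(
    onFulfilled?: ((value: FinalResponse) => R1 | PromiseLike<R1>) | null,
    onRejected?: ((reason: unknown) => R2 | PromiseLike<R2>) | null,
  ): PromiseLike<R1 | R2> {
    return this.finalResult.then(onFulfilled, onRejected);
  }
}

// Usage: iterate the stream, then await the same handle.
async function demo(): Promise<{ streamed: string; result: FinalResponse }> {
  const turn = new TurnHandle();
  const consumer = (async () => {
    let streamed = "";
    for await (const ev of turn) {
      if (ev.type === "stream_token") streamed += ev.token;
    }
    return streamed;
  })();
  // In a real client, incoming WebSocket frames would drive these calls.
  turn.push({ type: "stream_chunk", node: "knowledge_search" });
  turn.push({ type: "stream_token", token: "Our return window " });
  turn.push({ type: "stream_token", token: "is 17 days." });
  turn.complete({ text: "Our return window is 17 days.", tokens: 2 });
  return { streamed: await consumer, result: await turn };
}
```

complete() resolves the final promise directly, not through the iterator. So a caller that only awaits the handle, without iterating, still gets the result.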

Need an embeddable web UI? The TypeScript side ships a React binding and an embeddable widget (a custom element) on top of the same client.


Deployment flavors

One operator binary, one codebase. The StorageAdapter + backplane + auth seams are what let the same agent code run on any of three flavors — application code never names a backend. The flavor is selected by config, not by a build.
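The following minimal sketch shows that seam, assuming a message-log shape: application code depends only on a StorageAdapter interface, and config picks the implementation. The interface and method names are hypothetical (the real trait is described in docs/STORAGE.md), and only an in-memory adapter is implemented.

```typescript
interface StoredMessage {
  role: "user" | "assistant";
  text: string;
}

// Hypothetical seam; real method names and semantics are in docs/STORAGE.md.
interface StorageAdapter {
  appendMessage(sessionId: string, msg: StoredMessage): Promise<void>;
  listMessages(sessionId: string): Promise<StoredMessage[]>;
}

// In-memory adapter, similar in spirit to the local flavor's default storage.
class InMemoryStorage implements StorageAdapter {
  private readonly sessions = new Map<string, StoredMessage[]>();

  async appendMessage(sessionId: string, msg: StoredMessage): Promise<void> {
    const log = this.sessions.get(sessionId) ?? [];
    log.push(msg);
    this.sessions.set(sessionId, log);
  }

  async listMessages(sessionId: string): Promise<StoredMessage[]> {
    return [...(this.sessions.get(sessionId) ?? [])]; // copy so callers can't mutate storage
  }
}

// Config selects the adapter; Postgres and DynamoDB adapters are out of scope here.
function createStorage(kind: string): StorageAdapter {
  if (kind === "memory") return new InMemoryStorage();
  throw new Error(`storage "${kind}" is not implemented in this sketch`);
}

// Application code names only the interface, never a backend.
async function recordTurn(store: StorageAdapter, sessionId: string, user: string, reply: string): Promise<number> {
  await store.appendMessage(sessionId, { role: "user", text: user });
  await store.appendMessage(sessionId, { role: "assistant", text: reply });
  return (await store.listMessages(sessionId)).length;
}
```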

| | Kubernetes (primary self-host) | AWS serverless (SST) | Local (dev / embed) |
|---|---|---|---|
| Compute | Long-running pods | API GW WebSocket → Lambda | One in-process server |
| Storage | Postgres + pgvector | DynamoDB + S3 Vectors | In-memory |
| Backplane | Redis / NATS (multi-replica) | API GW connections | In-memory (single process) |
| Auth | AUTH_MODE=jwt / smoo | AUTH_MODE=jwt / smoo | AUTH_MODE=none (dev only) |
| SMOOTH_AGENT_STORAGE | postgres | dynamodb | memory (default) |
| Deploy | helm install smooth-operator ./deploy/k8s | npx sst deploy in deploy/sst | cargo run -p smooai-smooth-operator-server |

# Kubernetes (Helm + ArgoCD) — service + WS ingress, Postgres + pgvector, Redis/NATS backplane
helm install smooth-operator ./deploy/k8s --set image.tag=$(git rev-parse --short HEAD)

# AWS serverless (SST) — API GW WebSocket + Lambda + DynamoDB + S3 Vectors
cd deploy/sst && pnpm install && npx sst deploy --stage prod

# Local — fully in-memory, auth off, no external services
cargo run -p smooai-smooth-operator-server

What every flavor keeps: hybrid (vector + keyword) retrieval with reranking, a clean Chat · RAG · Agents · Actions decomposition, connector-style ingestion, document-level ACLs over org isolation, and the MIT, batteries-included self-host story. See deploy/README.md and docs/DEPLOY.md for the full matrix.
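Document-level ACLs over org isolation can be sketched as a two-step check: the org boundary first, then the document's allow-list. canAccess and filterHits are hypothetical TypeScript stand-ins. The test list mentions a can_access unit, whose real semantics are in docs/ACCESS-CONTROL.md. Treating a document with an empty allow-list as visible org-wide is an assumption.

```typescript
interface Principal {
  orgId: string;
  userId: string;
  groups: string[];
}

interface DocAcl {
  orgId: string;
  allowUsers?: string[];
  allowGroups?: string[];
}

// Org isolation first: no document ACL can grant access across orgs.
function canAccess(p: Principal, doc: DocAcl): boolean {
  if (p.orgId !== doc.orgId) return false;
  const users = doc.allowUsers ?? [];
  const groups = doc.allowGroups ?? [];
  // Assumption: a document with no allow-list entries is visible to its whole org.
  if (users.length === 0 && groups.length === 0) return true;
  return users.includes(p.userId) || p.groups.some((g) => groups.includes(g));
}

// Drop retrieval hits the caller can't see before they reach the model.
function filterHits<T extends { acl: DocAcl }>(p: Principal, hits: T[]): T[] {
  return hits.filter((hit) => canAccess(p, hit.acl));
}

const alice: Principal = { orgId: "acme", userId: "alice", groups: ["support"] };
const visible = filterHits(alice, [
  { id: "returns-policy", acl: { orgId: "acme" } },
  { id: "payroll", acl: { orgId: "acme", allowGroups: ["finance"] } },
  { id: "support-macros", acl: { orgId: "acme", allowGroups: ["support"] } },
  { id: "other-org-doc", acl: { orgId: "globex" } },
]).map((hit) => hit.id);
```

Filtering before hits reach the model means retrieval can't quote a restricted document back to a user who couldn't open it directly.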


Architecture

One protocol in front; a swappable engine and storage behind it. A client never names a language, a backend, or whether the engine is embedded or remote — it only ever sees the protocol.

%%{init: {'theme':'base','themeVariables':{
  'background':'#020618','primaryColor':'#0b1426','primaryTextColor':'#e6edf6','primaryBorderColor':'#2b3a52',
  'lineColor':'#7c8aa0','secondaryColor':'#0b1426','tertiaryColor':'#0b1426','fontFamily':'ui-sans-serif, system-ui, sans-serif',
  'clusterBkg':'#0b1426','clusterBorder':'#22304a'}}}%%
flowchart LR
  CLIENTS["5 native clients<br/>TS · Go · .NET · Python · Rust"]
  CLIENTS -->|"WebSocket protocol"| SVC

  subgraph SVC["smooth-operator · service"]
    PROTO["Protocol layer"] --> RT["KnowledgeChatRuntime"]
  end

  RT -->|"Agent::run"| ENGINE["smooth-operator-core<br/>5-language engine"]
  ENGINE -->|"LlmProvider"| GW[("llm.smoo.ai<br/>or BYO gateway")]
  RT -->|"StorageAdapter"| KB[("Knowledge + conversations<br/>pgvector / DynamoDB + S3 Vectors / in-memory")]

  classDef warm fill:#f49f0a,stroke:#ff6b6c,color:#1a0f00;
  classDef teal fill:#00a6a6,stroke:#00c2c2,color:#011;
  class ENGINE warm
  class GW,KB teal

An agent turn, end to end

%%{init: {'theme':'base','themeVariables':{
  'background':'#020618','primaryColor':'#0b1426','primaryTextColor':'#e6edf6','primaryBorderColor':'#2b3a52',
  'lineColor':'#7c8aa0','actorBkg':'#0b1426','actorBorder':'#2b3a52','actorTextColor':'#e6edf6',
  'signalColor':'#7c8aa0','signalTextColor':'#e6edf6','noteBkgColor':'#f49f0a','noteTextColor':'#1a0f00','noteBorderColor':'#ff6b6c',
  'fontFamily':'ui-sans-serif, system-ui, sans-serif'}}}%%
sequenceDiagram
  participant C as Client
  participant S as Service
  participant A as Agent
  participant K as Knowledge / Tools
  participant L as LLM gateway

  C->>S: send_message { sessionId, message }
  S->>A: run turn (replay prior messages)
  S-->>C: immediate_response (202, ack)
  A->>K: knowledge_search("return window")
  K-->>A: top-K snippets (the 17-day fact)
  A->>L: chat completion (grounded prompt)
  L-->>A: token deltas …
  A-->>S: TokenDelta / PhaseStart / ToolCallComplete
  S-->>C: stream_token "Our" "return" "window" …
  S-->>C: stream_chunk { node: response_gen }
  A-->>S: Completed { cost, tokens }
  S-->>C: eventual_response (200, final)
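The diagram's translation from engine events to wire frames can be sketched as a single exhaustive switch. The event and frame shapes below are hypothetical simplifications; the authoritative AgentEvent mapping is in docs/PROTOCOL.md. Mapping ToolCallComplete to a stream_chunk is an assumption, and the cost and token numbers are illustrative.

```typescript
// Hypothetical engine events, named after the sequence diagram.
type AgentEvent =
  | { kind: "TokenDelta"; text: string }
  | { kind: "PhaseStart"; node: string }
  | { kind: "ToolCallComplete"; tool: string }
  | { kind: "Completed"; cost: number; tokens: number };

// Hypothetical wire frames; field names are illustrative.
type Frame =
  | { type: "immediate_response"; status: 202 }
  | { type: "stream_token"; token: string }
  | { type: "stream_chunk"; node: string }
  | { type: "eventual_response"; status: 200; cost: number; tokens: number };

function toFrame(ev: AgentEvent): Frame {
  switch (ev.kind) {
    case "TokenDelta":
      return { type: "stream_token", token: ev.text };
    case "PhaseStart":
      return { type: "stream_chunk", node: ev.node };
    case "ToolCallComplete":
      // Assumption: tool completions surface as a progress chunk named after the tool.
      return { type: "stream_chunk", node: ev.tool };
    case "Completed":
      return { type: "eventual_response", status: 200, cost: ev.cost, tokens: ev.tokens };
    default: {
      const unreachable: never = ev; // compile-time exhaustiveness check
      throw new Error(`unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}

// One turn: acknowledge immediately (202), then translate engine events as they arrive.
function* turnFrames(events: Iterable<AgentEvent>): Generator<Frame> {
  yield { type: "immediate_response", status: 202 };
  for (const ev of events) yield toFrame(ev);
}

const frames = [
  ...turnFrames([
    { kind: "PhaseStart", node: "knowledge_search" },
    { kind: "ToolCallComplete", tool: "knowledge_search" },
    { kind: "TokenDelta", text: "Our" },
    { kind: "TokenDelta", text: " return window" },
    { kind: "Completed", cost: 0.0004, tokens: 42 },
  ]),
];
```

The never-typed default makes the compiler flag any engine event the mapping forgets to handle.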

Protocol lifecycle (incl. HITL)

%%{init: {'theme':'base','themeVariables':{
  'background':'#020618','primaryColor':'#0b1426','primaryTextColor':'#e6edf6','primaryBorderColor':'#2b3a52',
  'lineColor':'#7c8aa0','secondaryColor':'#0b1426','tertiaryColor':'#0b1426','fontFamily':'ui-sans-serif, system-ui, sans-serif'}}}%%
stateDiagram-v2
  [*] --> Connected: connect
  Connected --> SessionOpen: create_session
  SessionOpen --> Streaming: send_message
  Streaming --> Streaming: stream_token · chunk
  Streaming --> AwaitingApproval: confirm_required
  AwaitingApproval --> Streaming: approve
  Streaming --> AwaitingOtp: otp_required
  AwaitingOtp --> Streaming: verify_otp
  Streaming --> SessionOpen: eventual_response
  SessionOpen --> [*]: disconnect

Full action/event tables, the AgentEvent mapping, and connection-state keys are in docs/PROTOCOL.md.
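The lifecycle diagram can also be written as a transition table. In this minimal sketch the state names follow the diagram, and the message names reuse its edge labels, which may not match the wire action names exactly. A Closed state stands in for the diagram's final state, and step is a hypothetical helper that rejects any message not valid in the current state.

```typescript
type State = "Connected" | "SessionOpen" | "Streaming" | "AwaitingApproval" | "AwaitingOtp" | "Closed";
type Msg =
  | "create_session"
  | "send_message"
  | "stream_token"
  | "stream_chunk"
  | "confirm_required"
  | "approve"
  | "otp_required"
  | "verify_otp"
  | "eventual_response"
  | "disconnect";

// Transition table mirroring the diagram; anything missing is an invalid move.
const TRANSITIONS: Record<State, Partial<Record<Msg, State>>> = {
  Connected: { create_session: "SessionOpen" },
  SessionOpen: { send_message: "Streaming", disconnect: "Closed" },
  Streaming: {
    stream_token: "Streaming",
    stream_chunk: "Streaming",
    confirm_required: "AwaitingApproval",
    otp_required: "AwaitingOtp",
    eventual_response: "SessionOpen",
  },
  AwaitingApproval: { approve: "Streaming" },
  AwaitingOtp: { verify_otp: "Streaming" },
  Closed: {},
};

function step(state: State, msg: Msg): State {
  const next = TRANSITIONS[state][msg];
  if (next === undefined) throw new Error(`${msg} is not valid in state ${state}`);
  return next;
}

// A turn with one HITL approval, ending back in an open session.
const path: Msg[] = ["create_session", "send_message", "stream_token", "confirm_required", "approve", "stream_chunk", "eventual_response"];
let finalState: State = "Connected";
for (const msg of path) finalState = step(finalState, msg);
```

Keeping transitions as data rather than nested conditionals makes the approval and OTP detours easy to audit against the diagram.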


The polyglot story (honest status)

One protocol, defined once in spec/ (JSON Schema). Everything else is generated or hand-written to match it.

| Surface | Status |
|---|---|
| Engine (smooth-operator-core) | 5-language parity engine: Rust · C# · Python · TypeScript · Go, each published (crates.io / NuGet / PyPI / npm / Go module). Rust is the reference; the others mirror its surface. |
| Protocol clients | All five languages: TypeScript (@smooai/smooth-operator), Go, .NET (with a Microsoft.Extensions.AI IChatClient facade), Python, Rust. The TS side also ships a React binding and an embeddable widget. |
| Servers | All five languages: Rust · C# · Python · TypeScript · Go, each consuming its own language's engine so a host can run the full service in its native stack. Rust + C# carry the full surface (ingestion, admin, ACL, storage adapters); Python/TS/Go are native servers (transport · frame dispatch · per-turn engine · sessions · auth · graceful drain). All five run the shared scenario conformance corpus: protocol parity, tested. |

All five native servers now exist and run the same spec/conformance/scenarios corpus — driven by the engine's deterministic mock, so they must produce identical protocol output (the corpus already caught and fixed real error-handling divergences in the TS and C# servers). The Rust + C# servers carry the full surface; the Python/TS/Go servers are native and at protocol parity, growing toward the full feature surface. The five clients, five engines, and five servers are all real.


Test-driven by default

Nothing here is vibe-coded — it's verified against a real LLM gateway. Substring tests prove a reply contains the right number; an LLM-as-judge proves the agent reasoned its way there and didn't hallucinate. We run both.

%%{init: {'theme':'base','themeVariables':{
  'background':'#020618','primaryColor':'#0b1426','primaryTextColor':'#e6edf6','primaryBorderColor':'#2b3a52',
  'lineColor':'#7c8aa0','secondaryColor':'#0b1426','tertiaryColor':'#0b1426','fontFamily':'ui-sans-serif, system-ui, sans-serif'}}}%%
flowchart TD
  U["Unit tests<br/>chunker · SSRF guard · can_access"] --> C
  C["Testcontainers conformance<br/>pgvector + DynamoDB-Local"] --> E
  E["Live cross-language E2E<br/>all 5 clients, real WebSocket turns"] --> J
  J["LLM-as-judge quality evals<br/>real gateway, rubric-scored 1–5"]

  classDef warm fill:#f49f0a,stroke:#ff6b6c,color:#1a0f00;
  classDef teal fill:#00a6a6,stroke:#00c2c2,color:#011;
  class U teal
  class J warm

All five native servers run a shared scenario conformance corpus (spec/conformance/scenarios) — language-neutral protocol flows driven by the engine's deterministic mock, so every server must produce identical output. That's the polyglot parity oracle, on top of each server's own protocol/ingestion/ACL/rerank/embedder suites and the engine's offline suite (337 tests on a deterministic MockLlmClient). The five protocol clients are exercised against a real WebSocket in a cross-language E2E harness.
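A parity oracle's core check can be sketched in a few lines: feed each server the same scenario input, then compare its emitted frames (ignoring key order) with the expected frames. The Scenario shape, the error code, and the three stand-in servers are hypothetical; the real corpus format lives in spec/conformance/scenarios.

```typescript
type Frame = Record<string, unknown>;

interface Scenario {
  name: string;
  input: Frame[]; // frames the client sends
  expected: Frame[]; // frames every server must emit, in order
}

// A server under test, driven by a deterministic mock engine so output is reproducible.
type ServerUnderTest = (frame: Frame) => Frame[];

// Key-order-independent serialization, so {a, b} and {b, a} compare equal.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function passes(server: ServerUnderTest, scenario: Scenario): boolean {
  const actual = scenario.input.flatMap((frame) => server(frame));
  return canonical(actual) === canonical(scenario.expected);
}

// Illustrative scenario: a message for an unknown session must yield one identical error frame.
const unknownSession: Scenario = {
  name: "send_message_unknown_session",
  input: [{ action: "send_message", sessionId: "missing" }],
  expected: [{ type: "error", code: "session_not_found" }],
};

const serverA: ServerUnderTest = () => [{ type: "error", code: "session_not_found" }];
const serverB: ServerUnderTest = () => [{ code: "session_not_found", type: "error" }]; // key order differs: still equal
const serverC: ServerUnderTest = () => [{ type: "error", message: "no such session" }]; // divergent error shape

const results = [serverA, serverB, serverC].map((s) => passes(s, unknownSession));
```

serverC fails for the kind of reason the README describes: an error-handling path whose frame shape drifted from the other servers.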

The proof story

The headline isn't a count — it's a real defect a substring test would have missed. On the first live run, our LLM-as-judge scored a multi-turn answer 1/5: the runtime built a fresh agent per turn, so turn 2 had no memory of turn 1's delivery date and couldn't compute the last return day. A contains("the 22nd") assertion would have stayed green on a hallucinated guess. The judge caught it; the fix wired per-session memory; it now scores 5/5.
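The bug and its fix can be sketched with a toy model that can only answer from the history it is given. Everything here is hypothetical and simplified. toyModel stands in for the LLM, and the 17 comes from the seeded demo doc. Adding 17 to the delivery day is this sketch's own arithmetic, not the agent's rule.

```typescript
interface Msg {
  role: "user" | "assistant";
  text: string;
}

// Stand-in for the LLM: it can only use facts present in the history it receives.
type Model = (history: Msg[]) => string;

const toyModel: Model = (history) => {
  const question = history[history.length - 1]?.text ?? "";
  if (!question.includes("return")) return "Noted.";
  for (const m of history) {
    const match = /delivered on the (\d+)/.exec(m.text);
    if (match) return `Last return day: day ${Number(match[1]) + 17}`; // 17-day window from the demo doc
  }
  return "I don't know your delivery date.";
};

// The bug: a fresh agent per turn sees only the current message.
function freshAgentTurn(model: Model, userText: string): string {
  return model([{ role: "user", text: userText }]);
}

// The fix: per-session memory, replayed into every turn.
class SessionRuntime {
  private readonly memory = new Map<string, Msg[]>();

  turn(model: Model, sessionId: string, userText: string): string {
    const history = this.memory.get(sessionId) ?? [];
    history.push({ role: "user", text: userText });
    const reply = model(history);
    history.push({ role: "assistant", text: reply });
    this.memory.set(sessionId, history);
    return reply;
  }
}

freshAgentTurn(toyModel, "My order was delivered on the 5th.");
const freshReply = freshAgentTurn(toyModel, "What's my last return day?");

const runtime = new SessionRuntime();
runtime.turn(toyModel, "s1", "My order was delivered on the 5th.");
const sessionReply = runtime.turn(toyModel, "s1", "What's my last return day?");
```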

That's the whole bet: quality regressions that only a grader can see, caught in CI. Details — the five scenarios, the rubric, the same-model-judge knob — in docs/EVALS.md.

Gated, never silently skipped

Live tests need a gateway key. They are gated, not deleted: with SMOOTH_AGENT_E2E=1 and SMOOAI_GATEWAY_KEY set, they run (and print every per-scenario score under --nocapture); without them, they print an explicit skip and return. That keeps credential-free cargo test runs and CI green, while the nightly job runs the full live suite. The gateway key is read from the environment and never printed.

# Unit + conformance — no creds, runs everywhere
cd rust && cargo test

# + live LLM-as-judge evals
export SMOOAI_GATEWAY_KEY=sk-… SMOOTH_AGENT_E2E=1
cargo test -p smooai-smooth-operator-evals --test llm_judge -- --nocapture --test-threads=1
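The gate is simple enough to sketch outside Rust. The real suites are Rust tests; this TypeScript version only illustrates the assumed logic: both variables are required, a skip prints an explicit message, and the key is passed to the test body but never logged. liveGate and liveTest are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;
type Gate = { run: true; key: string } | { run: false; reason: string };

// Both variables are required; either one missing means skip, never fail.
function liveGate(env: Env): Gate {
  if (env.SMOOTH_AGENT_E2E !== "1") return { run: false, reason: "SMOOTH_AGENT_E2E is not 1" };
  const key = env.SMOOAI_GATEWAY_KEY;
  if (!key) return { run: false, reason: "SMOOAI_GATEWAY_KEY is not set" };
  return { run: true, key };
}

function liveTest(
  env: Env,
  body: (gatewayKey: string) => void,
  log: (line: string) => void = console.log,
): "ran" | "skipped" {
  const gate = liveGate(env);
  if (gate.run === false) {
    log(`SKIP (live): ${gate.reason}`); // explicit skip, not a silent pass
    return "skipped";
  }
  body(gate.key); // the key reaches the test body but is never logged
  return "ran";
}
```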

Smoo-powered or bring-your-own

A recurring principle across the whole stack: same code, two postures.

| Capability | Smoo-powered (hosted) | Bring-your-own (self-host) |
|---|---|---|
| LLM gateway | llm.smoo.ai | any OpenAI-compatible endpoint |
| Embeddings | gateway (text-embedding-3-small) | DeterministicEmbedder or your provider |
| Web search | Smoo provider | Brave / Bing / Tavily via WebSearchProvider |
| Identity / RBAC | Smoo identity (AUTH_MODE=smoo) | AUTH_MODE=jwt (BYO JWT/OIDC) |
| Connectors | managed GitHub/Slack apps | your tokens, same Connector trait |

Self-hosters bring their own; the hosted service wires in Smoo's apps. The seams are identical; see docs/INGESTION.md, docs/TOOLS.md, and docs/STORAGE.md.
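If both gateways speak the OpenAI chat-completions request shape (an assumption), switching postures takes a config change, not a code change. In this minimal sketch, buildChatRequest is hypothetical and the base URL and key in the usage are placeholders, not real endpoints or credentials.

```typescript
// Hypothetical config: the same code targets a hosted gateway or any OpenAI-compatible endpoint.
interface GatewayConfig {
  baseUrl: string; // the gateway's API root, e.g. ".../v1"
  apiKey: string;
  model: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds, but does not send, an OpenAI-style chat completions request.
function buildChatRequest(cfg: GatewayConfig, messages: ChatMessage[]) {
  return {
    url: `${cfg.baseUrl.replace(/\/+$/, "")}/chat/completions`, // tolerate trailing slashes
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body: JSON.stringify({ model: cfg.model, messages, stream: true }),
  };
}

// Placeholder BYO endpoint; only the config differs between postures.
const byo = buildChatRequest(
  { baseUrl: "https://llm.example.com/v1/", apiKey: "sk-placeholder", model: "my-model" },
  [{ role: "user", content: "How long is your return window?" }],
);
// fetch(byo.url, { method: byo.method, headers: byo.headers, body: byo.body }) would send it.
```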


The two-repo split

| Repo | What it is |
|---|---|
| smooth-operator-core | The agent engine: Agent, Workflow, Tool, CheckpointStore, LlmProvider, Memory, KnowledgeBase. A 5-language parity engine (Rust · C# · Python · TypeScript · Go), each published. |
| smooth-operator (this repo) | The service: conversations, knowledge ingestion + retrieval, the tool catalog, the WebSocket protocol, the five clients, the management console, and the Kubernetes / AWS / local deploy flavors. |

Repository layout

smooth-operator/
├── spec/         # The language-neutral wire protocol (JSON Schema) — source of truth for all clients
├── rust/         # Reference server + service crate (smooai-smooth-operator) + adapters, lambda, evals, ingestion
├── typescript/   # @smooai/smooth-operator — client + React binding + embeddable widget
├── go/           # github.com/SmooAI/smooth-operator/go — protocol.Client
├── dotnet/       # SmooAI.SmoothOperator — client (+ Microsoft.Extensions.AI facade) and the C# server
├── python/       # smooth-operator (import smooth_operator) — async client
├── console/      # Next.js management console for the auth-gated /admin/* API
├── adapters/     # Storage adapters: postgres (pgvector) and dynamodb (S3 Vectors)
├── deploy/
│   ├── k8s/      # Kubernetes (Helm + ArgoCD) — Postgres + pgvector + Redis/NATS backplane
│   ├── sst/      # AWS serverless (API GW WebSocket + Lambda + DynamoDB + S3 Vectors)
│   └── local/    # Local / embed-in-process — in-memory, auth off, no external services
└── docs/         # Architecture, protocol, storage, evals, ingestion, access-control, observability, deploy, roadmap

Run it hosted

Don't want to operate it yourself? lom.smoo.ai runs smooth-operator as a managed, multi-tenant service.

Documentation

| Doc | What |
|---|---|
| docs/ARCHITECTURE.md | System design, the agent pipeline, how it consumes the engine |
| docs/PROTOCOL.md | The schema-driven WebSocket protocol |
| docs/STORAGE.md | The StorageAdapter trait; Postgres and DynamoDB/S3 Vectors designs |
| docs/EVALS.md | The LLM-as-judge quality harness (the 1/5 → 5/5 story) |
| docs/INGESTION.md | Connectors, chunking, the embedder seam |
| docs/TOOLS.md | The built-in tool catalog + authoring your own |
| docs/ACCESS-CONTROL.md | Document-level ACLs over org isolation |
| docs/ADMIN-API.md | The auth-gated /admin/* API the console consumes |
| docs/OBSERVABILITY.md | OpenTelemetry gen_ai.* tracing |
| docs/DEPLOY.md | The three deploy flavors + the shared SmooAI/deploy package |
| docs/Planning/Roadmap.md | Phased build plan + current status |

🧩 Part of Smoo AI

smooth-operator is built and open-sourced by Smoo AI — the AI-powered business platform with AI built into every product: CRM, customer support, campaigns, field service, observability, and developer tools.

🤝 Contributing

Built in the open, test-first. Issues and PRs welcome — see the docs vault for architecture, protocol, and the eval harness, and docs/Planning/Roadmap.md for what's queued.

📄 License

MIT © 2026 Smoo AI. See LICENSE.


Built by Smoo AI — AI built into every product.

About

Polyglot AI agent service — knowledge chat, tools, durable checkpoints, human-in-the-loop, and multi-participant conversations over one schema-driven WebSocket protocol. Built on the smooth-operator-core engine (5-language parity). Deploy to Kubernetes, AWS serverless, or run locally. Hosted at lom.smoo.ai.
