TeaAgent is a governance-first agent harness that gives an LLM-controlled agent access to a workspace directory through registered tools. The primary threats are:
- Model misbehavior — the LLM generates tool calls that escape the workspace, execute dangerous shell commands, or exfiltrate data.
- Untrusted MCP clients — remote MCP clients that attempt unauthorized tool execution or credential theft.
- Network attackers — attackers on the same network that intercept MCP HTTP traffic or replay authenticated sessions.
- Multi-tenant workspace collision — concurrent agent runs on the same workspace root corrupting each other's state.
- Prompt injection — attacker-controlled content in workspace files influencing the agent's decision-making loop.
TeaAgent assumes the LLM provider and local process boundary are trusted. It does not protect against a compromised Python runtime, operating system, or LLM provider infrastructure.
| Mode | Read | File Write | Shell Mutate | Approval Required |
|---|---|---|---|---|
read-only |
Yes | No | No | N/A |
workspace-write |
Yes | Yes | No | N/A |
prompt |
Yes | Conditional | Conditional | Human-in-the-loop |
allow |
Yes | Yes | Yes | Session-scoped |
danger-full-access |
Yes | Yes | Yes | None |
Even in danger-full-access, the harness enforces:
- Workspace path confinement (tools reject
../, absolute paths, symlink escapes) - File size limits (
max_read_bytes,max_write_bytes,max_shell_output_bytes) - Shell command size limits (
max_shell_command_bytes) - Shell command timeout ceiling (
max_shell_timeout_seconds) - Iteration and tool-call budgets
Shell commands are classified as inspect or mutate before execution:
- Inspect:
pwd,ls,find,rg,grep,cat,head,tail,wc, safegitsubcommands (status,diff,log,show,branch,grep) - Mutate: everything else
The classifier (workspace_tools.py:classify_shell_command_policy) uses quote-aware
scanning to detect unquoted shell operators (>, <, |, &&, ;, `, $().
Quoted operators (e.g., git log --grep='>') are correctly classified as inspect
because the shell treats them as string content, not operators.
Property tests (tests/test_workspace_tools.py:ShellClassifierPropertyTests)
verify that:
- All inspect commands are classified as inspect
- All mutate commands are classified as mutate
- Quoted operators do not trigger mutate
- Actual redirect/pipe/chain operators trigger mutate
- Command substitution triggers mutate
- Workspace-escape paths trigger mutate
Inspect commands are executed via shlex.split() with shell=False, which:
- Prevents shell metacharacter expansion
- Prevents command chaining (the
;,&&,||operators are rejected by the classifier before reaching execution) - Rejects workspace-escaping arguments (
/,~,..)
- The classifier uses a heuristic approach. An adversarial prompt may discover edge cases. Report findings as described below.
shell_arg_escapes_workspaceonly checks literal path prefixes; it does not expand environment variables or glob patterns.- Commands that match the inspect allowlist but spawn subprocesses (e.g.,
githooks, custom pager configurations) can execute arbitrary code.
The MCP HTTP server enforces authentication at two layers:
- CLI layer (
cli/__init__.py:mcp_serve_command): refuses to start on a non-loopback host (0.0.0.0,::, external IPs) unless--auth-tokenor--oauth-issueris provided. - Library layer (
mcp_http.py:build_mcp_http_server): raisesValueErrorwhen constructed with a non-loopback host and noauth_tokenoroauth_server.
- Bearer token: static shared secret via
--auth-token. Every request must includeAuthorization: Bearer <token>. - OAuth 2.1 with DPoP: proof-of-possession token binding via
--oauth-issuer+--oauth-signing-key. Access tokens are bound to the client's DPoP key, preventing token replay by network observers.
Pass --allowed-origin (repeatable) to restrict browser-originated requests.
Without it, all origins are accepted.
The MCP HTTP server does not support TLS natively. When serving on non-loopback hosts, place a reverse proxy (nginx, Caddy, Cloudflare Tunnel) with TLS termination in front of the server.
Code Mode executes LLM-generated Python code only after AST allow-list validation (limited node types, restricted builtins, no imports, no attributes, no arbitrary function calls). It has two backends:
- Child process backend (default): runs
exec()in a child Python process with wall-clock timeout,RLIMIT_CPU, and best-effortRLIMIT_ASmemory limits. - Container backend (
ContainerCodeModeBackend): delegates execution to a Docker/Podman-style runtime with--network none,--read-only,--cap-drop=ALL,--security-opt=no-new-privileges, non-root--user, tmpfs/tmp, memory/swap limits, CPU ulimit, PID limit, optional image digest pinning, and optional image allowlist enforcement.
Code Mode is still not a complete production sandbox:
- The default backend runs inside the same Python interpreter family; isolate it from untrusted workloads.
- The container backend does not install a project-specific seccomp/AppArmor/SELinux
profile. Production callers should enable
require_image_digestandallowed_images. - Memory limits are advisory on macOS for the child-process backend.
For high-risk production use, prefer a hardened external execution service, VM sandbox, V8 isolate, or container runtime with mandatory security profiles.
All workspace tools enforce path confinement through resolve_workspace_path():
- Paths are resolved relative to the workspace root
../escapes, absolute paths, and symlink escapes are rejected.gitdirectories are excluded fromlist_filesandsearch_text
Write tools (workspace_write_file, workspace_apply_patch, workspace_edit_at_hash)
enforce max_write_bytes before any write occurs.
workspace_apply_patchrequires theoldtext to appear exactly once in the target file. If it appears multiple times, the edit is rejected — the caller must provide more context or useworkspace_edit_at_hash.workspace_edit_at_hashuses CRC32 line anchors (LINE#HASH|content). If the hash of the target line has changed (stale read), the edit is rejected.
Every tool call, approval decision, iteration, and final result is recorded in the audit log with:
- Per-run JSONL persistence (file mode
0600) - Argument redaction for sensitive keys (API keys, passwords, tokens, secrets)
- Result content redaction for stdout/stderr
- Truncation of long strings at 20,000 characters
Audit logs are append-only. TeaAgent does not rotate or expire audit files — set up
external log rotation for long-running deployments. The CLI provides manual lifecycle
commands: teaagent audit list, teaagent audit show, and teaagent audit prune.
- LLM API keys are read from environment variables only; never from files or command-line arguments
- MCP
--auth-tokenand--oauth-signing-keyare command-line arguments (visible inps). Prefer environment variables or a secrets manager for production SQLiteOAuthStorestores OAuth client secrets as PBKDF2-SHA256 hashes with per-client salts; the in-memory store keeps secrets in process memory only.- OAuth DPoP nonces are consumed on validation; replaying the same nonce fails.
- OAuth DPoP proof
jtivalues are cached for the proof freshness window; replaying the same proof against the authorization or resource server fails. - Audit logs redact keys matching
api_key,authorization,credential,password,secret,tokenin any casing
- The LLM adapter layer includes configurable exponential-backoff retry
(
LLMRetryConfig) for transient errors (HTTP 429, 5xx, connection failures). - Cost budget pre-flight (
RunBudget.check_cost_preflight) estimates the maximum possible cost of an LLM call before spending money, rejecting calls that would exceed the budget. - Every run has hard limits on iterations and tool calls, enforced by
AgentRunner.
AuditLogger and MemoryCatalog write JSONL through teaagent.storage.append_jsonl_line(),
which acquires an fcntl.LOCK_EX advisory lock and fsync()s after every append. On
platforms without fcntl (Windows), the lock is a best-effort no-op but the append
and fsync still run. RunStore final-state writes and UltraworkStore worker
records use teaagent.storage.atomic_write_text() (lock + temp file + os.replace).
Remaining concurrency limitations:
- The lock is per-file advisory; non-cooperating writers can still corrupt the file.
- Cross-host concurrency (NFS, SMB) is not supported — the
fcntllock is local. - For multi-worker production, use separate workspace roots per worker or replace the JSONL backend with a transactional store (SQLite, PostgreSQL).
Report security vulnerabilities to the project maintainers. Do not file public issues for security-sensitive findings.
Scope: vulnerabilities in TeaAgent's harness logic, tool governance, sandbox escape vectors, or authentication bypasses.
Out of scope: vulnerabilities in LLM providers, the Python standard library, the operating system, or upstream dependencies.