English | 中文
LLM-supervised persistent memory for AI agents.
LLM agents forget everything between sessions. Context compaction drops critical decisions, cross-session knowledge vanishes, and long conversations push early information out of the window.
Mnemon gives your agent persistent, cross-session memory — a four-graph knowledge store with intent-aware recall, importance decay, and automatic deduplication. Single binary, zero API keys, one setup command.
Claude Max / Pro subscriber? Mnemon works entirely through your existing subscription — no separate API key required. Your LLM subscription is the intelligence layer. Two commands and you're done.
Most memory tools embed their own LLM inside the pipeline. Mnemon takes a different approach: your host LLM is the supervisor. The binary handles deterministic computation (storage, graph indexing, search, decay); the LLM makes judgment calls (what to remember, how to link, when to forget). No middleman, no extra inference cost.
| Pattern | LLM Role | Representative |
|---|---|---|
| LLM-Embedded | Executor inside the pipeline | Mem0, Letta |
| File Injection | None — reads file at session start | Claude Code Memory |
| MCP Server | Tool provider via MCP protocol | claude-mem |
| LLM-Supervised | External supervisor of a standalone binary | Mnemon |
Mnemon also addresses a gap in the protocol stack. MCP standardizes how LLMs discover and invoke tools. ODBC/JDBC standardizes how applications access databases. But how LLMs interact with databases using memory semantics — this layer has no protocol. Mnemon's three primitives — remember, link, recall — form an intent-native protocol: command names map to the LLM's cognitive vocabulary (remember not INSERT, recall not SELECT), and output is structured JSON with signal transparency rather than raw database rows.
The LLM-Supervised pattern: hooks drive the lifecycle, the host LLM makes judgment calls, the binary handles deterministic computation.
Memory has a compound interest effect — the longer it accumulates, the greater its value. LLM engines iterate constantly, skill files cost nearly nothing to write, but memory is a private asset that grows with the user. It is the only component in the agent ecosystem worth deep investment.
A real knowledge graph built by Mnemon — 87 insights, 2150 edges across four graph types.
See Design & Architecture for details.
Homebrew (macOS / Linux):

```bash
brew install mnemon-dev/tap/mnemon
```

Go install:

```bash
go install github.com/mnemon-dev/mnemon@latest
```

From source:

```bash
git clone https://github.com/mnemon-dev/mnemon.git && cd mnemon
make install
```

Verify installation:

```bash
mnemon --version
```

```bash
mnemon setup
```

`mnemon setup` auto-detects Claude Code, then interactively deploys the skill, hooks, and behavioral guide. Start a new session — memory just works.
```bash
mnemon setup --target openclaw --yes
```

One command deploys the skill, hook, plugin, and behavioral guide to `~/.openclaw/`. Restart the OpenClaw gateway to activate.
NanoClaw runs agents inside Linux containers. Use the /add-mnemon skill to integrate:
- Install mnemon on the host (see above)
- In your NanoClaw project, run `/add-mnemon` — Claude Code will modify the Dockerfile, add a container skill, and set up volume mounts
- Each WhatsApp group gets its own isolated memory store, with optional global shared memory (read-only)

The skill is available at `.claude/skills/add-mnemon/` in the NanoClaw repo.
To remove all integrations:

```bash
mnemon setup --eject
```

Once set up, memory operates through a lightweight harness: `SKILL.md` teaches commands, `GUIDELINE.md` teaches judgment, hooks remind the agent at lifecycle boundaries, and the `mnemon` binary executes deterministic memory operations. Supported setup commands automate this, but the harness is installable from markdown alone.
```
Session starts
      |
      v
Prime -> make skill, guideline, and active store visible
      |
      v
User prompt arrives
      |
      v
Remind -> decide whether recall could change this task
      |
      v
Agent works and calls Mnemon only when useful
      |
      v
Nudge -> decide whether durable writeback is justified
      |
      v
Before context compaction
      |
      v
Compact -> preserve only critical continuity
```
The four hook phases are reminders, not a hard workflow. Prime makes the skill, guideline, and active store visible. Remind prompts a recall decision. Nudge prompts a writeback decision. Compact preserves only critical continuity before context compression.
You don't run mnemon commands yourself. The agent does when the guideline says memory is useful.
- Zero user-side operation — install once; supported runtimes can use hooks, minimal runtimes can use persistent rules
- LLM-supervised — the host LLM decides what to remember, update, and forget; no embedded LLM, no API keys
- Markdown-installable harness — `SKILL.md`, `INSTALL.md`, `GUIDELINE.md`, and four lifecycle reminders
- Four-graph architecture — temporal, entity, causal, and semantic edges, not just vector similarity
- Intent-native protocol — three primitives (`remember`, `link`, `recall`) map to the LLM's cognitive vocabulary, not database syntax; structured JSON output with signal transparency
- Intent-aware recall — graph traversal + optional vector search (RRF fusion), enabled by default for all queries
- Built-in deduplication — `remember` auto-detects duplicates and conflicts; skips or auto-replaces
- Retention lifecycle — importance decay, access-count boosting, and garbage collection
- Optional embeddings — works fully without Ollama; add local Ollama for enhanced vector+keyword hybrid search
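The RRF fusion mentioned above can be sketched generically. Reciprocal Rank Fusion scores each candidate as the sum of 1/(k + rank) across the ranked lists being merged; this is the standard technique (commonly with k = 60), not Mnemon's exact implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges several ranked result lists with Reciprocal Rank Fusion:
// score(d) = sum over lists of 1 / (k + rank(d)), where rank is 1-based.
// Items appearing high in multiple lists win.
func rrfFuse(lists [][]string, k float64) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, id := range list {
			scores[id] += 1.0 / (k + float64(rank+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j] // deterministic tie-break
	})
	return ids
}

func main() {
	graph := []string{"m1", "m3", "m2"}  // graph-traversal ranking
	vector := []string{"m2", "m1", "m4"} // vector-search ranking
	fmt.Println(rrfFuse([][]string{graph, vector}, 60))
	// m1 ranks first: it is near the top of both lists.
}
```

Because RRF only uses ranks, it fuses graph scores and vector distances without having to normalize their incompatible scales.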
All your local agentic AIs — across sessions and frameworks — sharing one pool of live memory.
```
Claude Code ──┐
              │
OpenClaw ─────┤
              │
NanoClaw ─────┤
              ├──▶ ~/.mnemon ◀── shared memory
OpenCode ─────┤
              │
Gemini CLI ───┘
```
The foundation is in place: a single ~/.mnemon database that any agent can
read and write. Claude Code setup automates hook installation; OpenClaw can use
plugin hooks; NanoClaw integrates via container skills and volume mounts. The
same harness can be installed in any LLM CLI that supports skills, rules,
system prompts, or event hooks.
The longer-term direction is a memory gateway: protocol decoupled from storage engine. The current SQLite backend is the first adapter; the protocol surface (remember / link / recall) can sit on top of PostgreSQL, Neo4j, or any graph database. Agent-side optimization (when to recall, what to remember) and storage-side optimization (indexing, graph algorithms) evolve independently. See Future Direction for details.
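A minimal sketch of what that decoupling could look like in Go (the interface, types, and names here are hypothetical illustrations, not Mnemon's actual API): the three primitives become an interface, and each storage engine is one adapter behind it.

```go
package main

import (
	"fmt"
	"strings"
)

// MemoryStore is a hypothetical protocol surface: the three memory
// primitives, independent of any storage engine. A SQLite, PostgreSQL,
// or Neo4j adapter would each implement this same interface.
type MemoryStore interface {
	Remember(content string) (id string, err error)
	Link(fromID, toID, edgeType string) error
	Recall(query string) ([]string, error)
}

// inMemoryStore is a toy adapter used here to show the shape; a real
// adapter would do indexing, graph maintenance, and decay.
type inMemoryStore struct {
	items map[string]string
	next  int
}

func (s *inMemoryStore) Remember(content string) (string, error) {
	s.next++
	id := fmt.Sprintf("mem-%d", s.next)
	s.items[id] = content
	return id, nil
}

func (s *inMemoryStore) Link(fromID, toID, edgeType string) error {
	return nil // a real adapter would write a typed graph edge
}

func (s *inMemoryStore) Recall(query string) ([]string, error) {
	var hits []string
	for _, c := range s.items {
		if strings.Contains(c, query) { // naive substring stand-in for real retrieval
			hits = append(hits, c)
		}
	}
	return hits, nil
}

func main() {
	var store MemoryStore = &inMemoryStore{items: map[string]string{}}
	id, _ := store.Remember("prefer blue-green deploys")
	fmt.Println(id)
}
```

With this split, agent-side logic depends only on the interface, so swapping the backend never touches the recall/remember decision logic.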
Do different sessions share memory?
Yes. By default, all sessions use the same default store — a decision remembered in one session is available in every future session.
Can I isolate memory per project or agent? Yes. Use named stores to separate memory:
```bash
mnemon store create work                 # create a new store
mnemon store set work                    # set as default
MNEMON_STORE=work mnemon recall "query"  # or use env var per-process
```

Different agents/processes can use different stores via the `MNEMON_STORE` environment variable — no global state contention.
Local or global mode?
mnemon setup defaults to local (project-scoped .claude/), recommended for most users. Global (mnemon setup --global, installed to ~/.claude/) activates mnemon across all projects — convenient if you want other frameworks (e.g., OpenClaw) to share memory by forwarding requests through Claude Code CLI, but may add maintenance overhead.
How do I customize the behavior?
Edit the generated guideline (`~/.mnemon/prompt/guide.md` in current setup flows) or use the installable memory loop GUIDE as the source. The skill file should stay focused on command syntax.
What is sub-agent delegation?
Sub-agent delegation is optional. When a runtime supports it, the main agent can
decide what to remember and ask a cheaper or isolated worker to execute
mnemon remember. It is a useful execution strategy, not a required part of the
Mnemon architecture.
| Environment Variable | Default | Description |
|---|---|---|
| `MNEMON_DATA_DIR` | `~/.mnemon` | Base data directory |
| `MNEMON_STORE` | (active file or default) | Named memory store for data isolation |
Ollama-specific (only relevant if using embeddings):
| Environment Variable | Default | Description |
|---|---|---|
| `MNEMON_EMBED_ENDPOINT` | `http://localhost:11434` | Ollama API endpoint |
| `MNEMON_EMBED_MODEL` | `nomic-embed-text` | Embedding model name |
```bash
make build           # build binary
make install         # build + install to $GOBIN
make test            # run E2E test suite
mnemon setup         # interactive setup
mnemon setup --eject # remove all integrations
make help            # show all targets
```

Dependencies: Go 1.24+, `modernc.org/sqlite`, `spf13/cobra`, `google/uuid`
See Development and Deployment for Docker, Compose, Ollama embedding, and release setup.
- Modular Self-Evolution Harness — formal harness docs for modular agent, memory loop, and skill loop design
- Memory Loop Harness — installable memory loop assets
- Skill Loop Harness — installable skill loop assets
- Design & Architecture — current engine architecture, algorithms, integration design
- Usage & Reference — CLI commands, embedding support, architecture overview
- Architecture Diagrams — system architecture, pipelines, lifecycle management
Mnemon combines the paradigm of one paper with the methodology of another, grounded in the structural insight that graph memory is isomorphic to LLM attention. See Theoretical Foundations for details.
- RLM — Zhang, Kraska & Khattab. Recursive Language Models. 2025. Establishes the paradigm: LLMs are more effective as orchestrators of external environments than as direct data processors.
- MAGMA — Zou et al. A Multi-Graph based Agentic Memory Architecture. 2025. Provides the methodology: four-graph model (temporal, entity, causal, semantic) with intent-adaptive retrieval.
- Graph-LLM Structural Insight — Joshi & Zhu. Building Powerful GNNs from Transformers. 2025; and the Graph-based Agent Memory survey (Chang Yang et al., 2026). Confirms that LLM attention is computationally equivalent to GNN operations — graph memory is a structural match, not an engineering convenience.