Local-first memory for every message you've ever sent.
Index your email and Slack — once. Recall it from your CLI, a local web UI, or any AI assistant via MCP. 100% on-device storage. BYO model: works with Ollama, OpenAI, Claude, Gemini, LM Studio, OpenRouter, Groq, Together, or anything OpenAI-compatible. Discord, WhatsApp, and iMessage on the way.
recallr is a tiny TypeScript engine that gives any AI assistant total
recall over every message you've ever sent — without uploading a single
byte. Maintained by Flowdesk.
What if your AI could remember every conversation you've ever had?
Today, when you ask Cursor or Claude "what did Ana decide about pricing in March?", they have nothing to go on. Your inbox lives in twelve different silos, none of which speak to your AI. Recallr fixes that — locally, with one command.
npx recallr index ~/Downloads/gmail-takeout.mbox
npx recallr index ~/Downloads/slack-export/
npx recallr ask "what did Ana decide about pricing?"

After the customer interviews, Ana locked Q3 pricing on March 7 [#1].
She'd flagged it in the team Slack two days earlier [#4] and worked
through the open questions with Marc in email [#2][#3]:
- Pro tier at $19/month with a 20% annual prepay discount
- Education/Nonprofit Pro at $9.50/month, domain-verified
- Team tier discontinued; existing subs grandfathered through Dec 31
Sources:
[#1] 2026-03-07 · Ana Diaz · email · Re: Q3 pricing decision — LOCKED
[#2] 2026-03-04 · Ana Diaz · email · Re: Q3 pricing decision
[#3] 2026-03-03 · Marc Liu · email · Re: Q3 pricing decision
[#4] 2026-03-03 · Ana · slack · #general
- Local-first. Your messages never leave your machine. Embeddings run on-device via transformers.js. The LLM is whatever you point it at — Ollama, LM Studio, OpenAI, OpenRouter.
- One file, zero daemons. SQLite + FTS5 + dense vectors stored as `BLOB` columns. Backup is `cp recallr.db elsewhere`.
- Hybrid search. BM25 for precision, embeddings for recall, fused with min-max normalization. Works well immediately — no tuning required.
- MCP-native. A single `recallr mcp` command exposes your memory to any MCP client (Cursor, Claude Desktop, Goose, Zed). No plugins, no configuration ceremony.
- Hackable. ~3k lines of strict TypeScript across a handful of focused files. Add a new connector in an afternoon.
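To make "dense vectors stored as `BLOB` columns" concrete, here is an illustrative sketch — not recallr's actual store code — of how a `Float32Array` embedding round-trips through a Node `Buffer`, which is what SQLite drivers read and write for `BLOB` values:

```typescript
// Illustrative: a Float32Array embedding round-tripping through the
// Buffer form a SQLite BLOB column would hold. Not recallr's store code.

function vecToBlob(vec: Float32Array): Buffer {
  // View the vector's underlying bytes; a SQLite driver stores this as a BLOB.
  return Buffer.from(vec.buffer, vec.byteOffset, vec.byteLength);
}

function blobToVec(blob: Buffer): Float32Array {
  // Defensive copy guarantees 4-byte alignment before re-viewing as floats.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const original = new Float32Array([0.1, 0.2, 0.3, 0.4]);
const restored = blobToVec(vecToBlob(original));
console.log(cosine(original, restored)); // ≈ 1 — the bytes round-trip losslessly
```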
npm i -g recallr # global CLI
# or
npx recallr --help # zero-install

Requires Node 20.10+. The default model (~33MB) downloads on first index.
# A local mbox export from Gmail, Apple Mail, Thunderbird, mutt, etc.
recallr index ~/mail.mbox
# A Slack workspace export (extract the .zip first)
unzip slack-export.zip -d slack-export/
recallr index ./slack-export/

Or run `recallr init`, edit `~/.recallr/config.json`, and add real sources:
{
"sources": [
{ "type": "mbox", "name": "takeout", "path": "~/Downloads/All mail Including Spam and Trash.mbox" },
{ "type": "slack", "name": "work", "path": "~/Downloads/slack-export/" },
{
"type": "imap",
"name": "fastmail",
"host": "imap.fastmail.com",
"user": "you@example.com",
"pass": "app-password-here",
"mailboxes": ["INBOX", "Sent", "Archive"]
}
]
}

Then:
recallr index # syncs every configured source
recallr status # see what's in the database

Recallr talks to any OpenAI-compatible chat endpoint — pick whichever one you want. Resolution order, most-specific wins:
- CLI flags (`--llm-base-url`, `--llm-model`, `--llm-api-key`) — one-off per call
- Env vars (`RECALLR_LLM_BASE_URL`, `RECALLR_LLM_MODEL`, `RECALLR_LLM_API_KEY`) — per shell
- `llm` block in `~/.recallr/config.json` — your persistent setup
- Cloud-provider shortcut env vars — set one of these and you're done:
  - `OPENAI_API_KEY` → OpenAI (`gpt-5.5-mini`)
  - `ANTHROPIC_API_KEY` → Anthropic Claude (`claude-haiku-4-7-latest`)
  - `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) → Google Gemini (`gemini-3.0-flash`)
- Default → Ollama at `http://localhost:11434/v1` (`llama3.2`)
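That ladder is easy to picture as a small "first defined wins" function. The sketch below is illustrative, not recallr's actual resolver, and the compat base URLs for the cloud shortcuts are assumptions:

```typescript
// Illustrative sketch of the "most-specific wins" LLM resolution ladder.
// Not recallr's actual config code; the cloud base URLs are assumptions.

interface LlmTarget {
  baseUrl: string;
  model: string;
}

function resolveLlm(opts: {
  cliBaseUrl?: string; // --llm-base-url
  cliModel?: string;   // --llm-model
  env: Record<string, string | undefined>; // process.env
  config?: { baseUrl?: string; model?: string }; // llm block in config.json
}): LlmTarget {
  const { env } = opts;
  // 1. CLI flags, 2. env vars, 3. config file
  const explicit =
    opts.cliBaseUrl ?? env.RECALLR_LLM_BASE_URL ?? opts.config?.baseUrl;
  if (explicit) {
    return {
      baseUrl: explicit,
      model:
        opts.cliModel ?? env.RECALLR_LLM_MODEL ?? opts.config?.model ?? "llama3.2",
    };
  }
  // 4. cloud-provider shortcut env vars (base URLs assumed for illustration)
  if (env.OPENAI_API_KEY)
    return { baseUrl: "https://api.openai.com/v1", model: "gpt-5.5-mini" };
  if (env.ANTHROPIC_API_KEY)
    return { baseUrl: "https://api.anthropic.com/v1", model: "claude-haiku-4-7-latest" };
  if (env.GEMINI_API_KEY ?? env.GOOGLE_API_KEY)
    return { baseUrl: "https://generativelanguage.googleapis.com/v1beta/openai", model: "gemini-3.0-flash" };
  // 5. default: local Ollama
  return { baseUrl: "http://localhost:11434/v1", model: "llama3.2" };
}

console.log(resolveLlm({ env: {} }).baseUrl); // → http://localhost:11434/v1
```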
The recommended place for "this is my setup" is the config file:
{
"llm": {
"baseUrl": "https://openrouter.ai/api/v1",
"model": "anthropic/claude-opus-4.7",
"apiKey": "sk-or-..."
},
"sources": [ /* ... */ ]
}

Env vars are still useful for "different model on this run" without editing the file; CLI flags for a single call.
# 1. Install Ollama: https://ollama.com
ollama serve # leave running in another terminal
ollama pull llama3.2 # ~2GB, one-time
# 2. That's it — recallr finds it automatically.
recallr ask "what did Ana decide about pricing?"

Want a different local model? Either `ollama pull qwen2.5:7b` and:
export RECALLR_LLM_MODEL=qwen2.5:7b # bash / zsh
$env:RECALLR_LLM_MODEL = "qwen2.5:7b" # PowerShell

Or pass it per-call: `recallr ask --llm-model qwen2.5:7b "..."`.
export OPENAI_API_KEY=sk-... # bash / zsh
$env:OPENAI_API_KEY = "sk-..." # PowerShell
setx OPENAI_API_KEY "sk-..." # PowerShell, persistent
recallr ask "..." # uses gpt-5.5-mini
recallr ask --llm-model gpt-5.5 "..." # any OpenAI model

Recallr uses Anthropic's official OpenAI-compat layer — no extra config beyond an API key:
export ANTHROPIC_API_KEY=sk-ant-... # bash / zsh
$env:ANTHROPIC_API_KEY = "sk-ant-..." # PowerShell
recallr ask "..." # uses claude-haiku-4-7-latest
recallr ask --llm-model claude-sonnet-4-7-latest "..."
recallr ask --llm-model claude-opus-4-7-latest "..."

Recallr uses Gemini's OpenAI-compat layer. Get a free key at aistudio.google.com:
export GEMINI_API_KEY=AIza... # bash / zsh
$env:GEMINI_API_KEY = "AIza..." # PowerShell
recallr ask "..." # uses gemini-3.0-flash (fast + free tier)
recallr ask --llm-model gemini-3.1-pro "..."

`GOOGLE_API_KEY` is accepted as an alias for `GEMINI_API_KEY` for compatibility with Google's other SDKs.
Start LM Studio's local server, then:
recallr ask --llm-base-url http://localhost:1234/v1 \
    --llm-model my-local-model "..."

Or set it permanently:
export RECALLR_LLM_BASE_URL=http://localhost:1234/v1
export RECALLR_LLM_MODEL=my-local-model

# Example: OpenRouter (gives you Claude, GPT-4, Llama, Gemini, ... behind one URL)
export RECALLR_LLM_BASE_URL=https://openrouter.ai/api/v1
export RECALLR_LLM_MODEL=anthropic/claude-opus-4.7
export RECALLR_LLM_API_KEY=sk-or-...
# Example: Groq (extremely fast)
export RECALLR_LLM_BASE_URL=https://api.groq.com/openai/v1
export RECALLR_LLM_MODEL=llama-3.3-70b-versatile
export RECALLR_LLM_API_KEY=gsk_...
# Example: Together
export RECALLR_LLM_BASE_URL=https://api.together.xyz/v1
export RECALLR_LLM_MODEL=meta-llama/Llama-3.3-70B-Instruct-Turbo
export RECALLR_LLM_API_KEY=...
recallr ask "..."

Run `recallr ask --help` to see all the per-call overrides.
recallr ask "what did the team decide about pricing?"
recallr ask "summarize what Ana said this quarter" --source mbox
recallr ask "find the figma link for the onboarding redesign" --show-context
recallr ask -k 16 "what's the latest from Marc?" # pull more context

recallr serve
# → http://127.0.0.1:7474 (auto-opens in your browser)

A clean local chat UI with:
- Streaming answers — citations land first, tokens flow in as the model writes
- Faceted search — filter the next question by source / date range / participant
- Thread browser — recent-conversation rail on the left; click to open
- Theme toggle — dark / light / system (your choice persists across reloads)
Click any citation to expand the full thread inline. Bound to 127.0.0.1 only —
your messages never touch a network.
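Streaming arrives over SSE, which on the wire is just `data:` frames separated by blank lines. A minimal client-side parser looks roughly like this — the JSON payload shapes in the example are hypothetical, not recallr's documented wire format:

```typescript
// Minimal SSE frame parser, as a sketch of what a streaming client does.
// The event payloads shown below are hypothetical examples, not
// recallr's actual wire format.

function parseSseChunk(chunk: string): string[] {
  // Each event is "data: <payload>\n\n"; several events can arrive
  // in a single network chunk.
  return chunk
    .split("\n\n")
    .map((frame) => frame.trim())
    .filter((frame) => frame.startsWith("data:"))
    .map((frame) => frame.slice("data:".length).trim());
}

const events = parseSseChunk(
  'data: {"type":"citations","count":4}\n\n' +
  'data: {"type":"token","text":"Ana"}\n\n'
);
console.log(events.length); // → 2
```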
recallr serve --port 9000 # different port
recallr serve --host 0.0.0.0 # expose on LAN (use carefully)
recallr serve --no-open # don't auto-open the browser
recallr serve --no-embed # lexical-only (skip loading the embedder)

Claude Desktop — add to `claude_desktop_config.json`:
{
"mcpServers": {
"recallr": {
"command": "npx",
"args": ["-y", "recallr", "mcp"]
}
}
}

Cursor — Settings → MCP → add server:
{
"name": "recallr",
"command": "npx",
"args": ["-y", "recallr", "mcp"]
}

Now ask Cursor/Claude things like "summarize every conversation I had with Ana about pricing this year" and it will call `search_messages` against your local index, with citations.
The MCP server exposes four tools:
| Tool | Purpose |
|---|---|
| `search_messages` | Hybrid BM25 + embedding search, with source/date/people filters |
| `get_message` | Fetch a single message by id |
| `get_thread` | Fetch the full conversation containing a message |
| `status` | Report database stats by source |
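Under the hood, an MCP client drives these tools with standard JSON-RPC `tools/call` requests. A hypothetical `search_messages` invocation might look like the following — the argument names are illustrative, and a real client should read the input schema the server advertises via `tools/list`:

```typescript
// Shape of an MCP tools/call request as a client would send it over stdio.
// Argument names are illustrative; check the schema advertised by tools/list.
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "search_messages",
    arguments: {
      query: "pricing decision",
      source: "slack",     // hypothetical filter name
      after: "2026-03-01", // hypothetical date filter
    },
  },
};

console.log(request.params.name); // → search_messages
```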
Recallr reads (in priority order) explicit overrides → environment variables →
~/.recallr/config.json → built-in defaults.
| Variable | Default | Purpose |
|---|---|---|
| `RECALLR_HOME` | `~/.recallr` | Where the database, model cache, and config live |
| `RECALLR_DB` | `$RECALLR_HOME/recallr.db` | Path to the SQLite database file |
| `RECALLR_EMBED_MODEL` | `Xenova/bge-small-en-v1.5` | Hugging Face id of the embedding model |
| `RECALLR_EMBED_DIM` | `384` | Vector dimension produced by the embedder |
| `RECALLR_LLM_BASE_URL` | (auto) | OpenAI-compatible base URL |
| `RECALLR_LLM_MODEL` | (auto) | Model id passed to the LLM |
| `RECALLR_LLM_API_KEY` | (none) | Bearer token for the LLM endpoint |
| `OPENAI_API_KEY` | (none) | Shortcut: enables OpenAI (`gpt-5.5-mini`) |
| `ANTHROPIC_API_KEY` | (none) | Shortcut: enables Anthropic (`claude-haiku-4-7-latest`) |
| `GEMINI_API_KEY` | (none) | Shortcut: enables Google Gemini (`gemini-3.0-flash`) |
| `GOOGLE_API_KEY` | (none) | Alias for `GEMINI_API_KEY` |
The same fields are settable in ~/.recallr/config.json:
{
"embedModel": "Xenova/bge-small-en-v1.5",
"embedDimension": 384,
"llm": {
"baseUrl": "https://api.openai.com/v1",
"model": "gpt-5.5-mini",
"apiKey": "sk-..."
},
"sources": [ /* ... see Quickstart ... */ ]
}

Heads up: API keys committed to a config file are still secrets. If you share `config.json` (e.g. in dotfiles), prefer leaving `apiKey` out and exporting `RECALLR_LLM_API_KEY` from your shell instead.
recallr ask says "failed to reach LLM at http://localhost:11434/v1"
You don't have Ollama running and no provider env var is set. Either:
- start Ollama (`ollama serve` + `ollama pull llama3.2`), or
- set one of `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` / `GEMINI_API_KEY`, or
- point at any OpenAI-compatible endpoint via `RECALLR_LLM_BASE_URL` + `RECALLR_LLM_MODEL`.
See Connect an LLM for full instructions.
recallr ask says "LLM returned 401"
The RECALLR_LLM_API_KEY (or OPENAI_API_KEY) is missing or wrong for
the base URL you're using. Double-check that the key matches the provider
of RECALLR_LLM_BASE_URL.
recallr ask says "LLM returned 404 / model not found"
The model id in RECALLR_LLM_MODEL doesn't exist on that endpoint. List
available models from the provider's docs and set RECALLR_LLM_MODEL
(or pass --llm-model per call).
recallr index is slow on first run
The embedding model (~33MB, Xenova/bge-small-en-v1.5) downloads once
into ~/.recallr/. After that indexing is fast. Pass --no-embed for a
~10× faster lexical-only index if you want a quick smoke test.
recallr status shows 0 messages
Run recallr init, edit ~/.recallr/config.json to add real sources,
then recallr index. Or just recallr index <path-to-mbox-or-slack-export>.
MCP tools don't show up in Cursor/Claude Desktop
Confirm the absolute path to npx resolves on the host (some configs need
"command": "/usr/local/bin/npx" or the full Windows path). On first call
the model is downloaded — give it 10-20s.
┌─────────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Connectors │ │ Indexer │ │ Store │
│ ─────────────────── │ │ ──────────────── │ │ ──────────────── │
│ IMAP mbox │ ─▶ │ fetch → embed │ ─▶ │ SQLite + FTS5 │
│ Slack │ │ → upsert │ │ + dense vectors │
│ Gmail / Discord │ │ (idempotent) │ │ (Float32 BLOBs) │
│ (v0.2) │ │ │ │ │
└─────────────────────┘ └──────────────────┘ └──────────────────┘
│
┌───────────────────────────┼────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌────────────────┐ ┌───────────┐
│ recallr ask │ │ recallr mcp │ │ recallr │
│ (RAG, CLI) │ │ (Cursor/Claude)│ │ serve (UI)│
└──────────────┘ └────────────────┘ └───────────┘
| Source | Live or one-shot? | Status |
|---|---|---|
| IMAP (Fastmail, iCloud, Proton, …) | Live | shipped |
| mbox (Gmail Takeout, Apple Mail, …) | One-shot file | shipped |
| Slack workspace export `.zip` | One-shot folder | shipped |
| Gmail API | Live | v0.2 |
| Slack live API | Live | v0.2 |
| Discord export, WhatsApp, iMessage | One-shot folder | v0.2 |
Each connector normalizes its source into a single Message shape. The indexer is idempotent: re-running recallr index only fetches what's new and only embeds what hasn't been embedded. Search is hybrid — FTS5 BM25 pulls candidates, embedding cosine reranks them, results are fused by min-max-normalized score.
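That fusion step can be sketched in a few lines of TypeScript — illustrative, not the shipped ranking code: min-max normalize each score list into [0, 1], then sum the normalized scores per message and sort.

```typescript
// Sketch of min-max score fusion for hybrid search. Not recallr's actual
// ranking code; weighting and tie handling are simplified.

type Scores = Map<string, number>; // message id → raw score

function minMax(scores: Scores): Scores {
  const vals = [...scores.values()];
  const lo = Math.min(...vals);
  const hi = Math.max(...vals);
  const range = hi - lo || 1; // avoid divide-by-zero on uniform scores
  return new Map(
    [...scores].map(([id, s]): [string, number] => [id, (s - lo) / range]),
  );
}

function fuse(bm25: Scores, cosineScores: Scores): [string, number][] {
  const a = minMax(bm25);
  const b = minMax(cosineScores);
  const ids = new Set([...a.keys(), ...b.keys()]);
  return [...ids]
    .map((id): [string, number] => [id, (a.get(id) ?? 0) + (b.get(id) ?? 0)])
    .sort((x, y) => y[1] - x[1]); // highest fused score first
}

const fused = fuse(
  new Map([["m1", 12.3], ["m2", 10.0], ["m4", 2.0]]), // BM25 candidates
  new Map([["m2", 0.91], ["m3", 0.72]]),              // embedding rerank
);
console.log(fused[0][0]); // → m2 (strong in both rankings)
```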
For corpora under ~250k messages everything fits comfortably on a laptop. Past that, swap in sqlite-vec (planned for v0.3).
Everything the CLI does is also a public TypeScript API. Embed recallr inside your own Node service to give each of your users a queryable memory over their own messages.
import {
SqliteStore,
LocalEmbedder,
MboxConnector,
SlackExportConnector,
indexConnector,
ask,
llmFromEnv,
} from "recallr";
const store = await SqliteStore.open("./alice.recallr.db");
const embedder = await LocalEmbedder.load();
await indexConnector({
connector: new MboxConnector("./alice.mbox"),
store,
embedder,
});
await indexConnector({
connector: new SlackExportConnector({ path: "./alice-slack-export/" }),
store,
embedder,
});
const result = await ask({
question: "what did Ana decide about pricing?",
store,
llm: llmFromEnv(),
embedder,
});
console.log(result.answer);
console.log(result.citations.map((c) => c.message.subject));

The full type surface is exported from `recallr` and `recallr/mcp`.
recallr is brand new. The roadmap is community-driven — open an issue
if you want to drive a track. Shipped versions live in the
changelog.
v0.2 — more sources

- Gmail API connector (live + Takeout)
- Slack live API connector (export.zip works today)
- Discord export connector
- WhatsApp chat exports
- iMessage (macOS `chat.db` reader)
- Slack zip-file ingestion (today: extract first, then point at the directory)

v0.3 — performance & scale

- `sqlite-vec` backend for >100k message corpora
- Int8 / binary vector quantization (4–32× smaller index)
- Incremental re-embed on model upgrades
- `recallr watch` daemon: continuously sync configured live sources

v1.0 — polish

- Encrypted-at-rest mode (libsodium-wrapped db)
- Per-source redaction rules
- Connector plugin system (`recallr-connector-*` packages)
- Node ≥ 20.10 (the engines field is enforced — older versions miss `Float32Array` features used by the embedder)
- Git
- An LLM endpoint for the `ask` command. The fastest free path is Ollama: `ollama serve && ollama pull llama3.2`. See "Connect an LLM" above for cloud alternatives.
- Build tools for `better-sqlite3` (auto-built on install):
  - macOS: nothing — Xcode CLT is enough
  - Linux: `python3`, `make`, `g++`
  - Windows: ships with `node-gyp` prebuilt; if your install fails, run `npm install --global windows-build-tools` once
git clone https://github.com/flowdesktech/recallr && cd recallr
npm install
npm run typecheck
npm run test # 45 tests, ~2s
npm run build # produces dist/ and dist-web/

You can now drive everything from dist/:
node dist/cli/bin.js init
node dist/cli/bin.js index ./examples/sample.mbox
node dist/cli/bin.js ask "what did the team decide about pricing?"
node dist/cli/bin.js serve
node dist/cli/bin.js mcp

The included examples/sample.mbox is a tiny multi-thread fixture so you can exercise the full pipeline without touching real mail.
Pick the loop that matches what you're changing.
npm run dev # tsup --watch
# in another shell:
node dist/cli/bin.js ask "your question here"

`tsup --watch` rebuilds dist/ on every save. The CLI runs against the freshly-built bundle each time. For fast inner-loop testing:
npm run test:watch # vitest watch

The web UI lives in web/ and is built into dist-web/ by Vite.
For UI iteration you want two processes: the recallr API and Vite's
dev server (with HMR). Vite is pre-configured to proxy /api to
localhost:7474, so it's all transparent.
# terminal 1 — start the recallr backend on its production port
node dist/cli/bin.js serve --no-open
# terminal 2 — start Vite with hot reload
npm run dev:web
# → http://localhost:5173 (proxies /api → :7474 automatically)

Edit anything under web/src/ and the page hot-reloads in milliseconds.
When you're happy, npm run build:web regenerates dist-web/ so
recallr serve ships the new UI.
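The proxy wiring is conventional Vite config. As an illustrative sketch of what `web/vite.config.ts` might contain — see the actual file in the repository for the real thing:

```typescript
// Illustrative sketch of the /api proxy in web/vite.config.ts.
// The real file may differ; this only shows the proxy idea.
export default {
  server: {
    proxy: {
      // Forward API calls from the Vite dev server (5173) to the
      // recallr backend on its production port (7474).
      "/api": "http://localhost:7474",
    },
  },
};
```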
Everything respects two env vars that let you keep dev runs out of your
real ~/.recallr:
export RECALLR_HOME=/tmp/recallr-scratch
export RECALLR_DB=/tmp/recallr-scratch/dev.db
node dist/cli/bin.js init
node dist/cli/bin.js index ./examples/sample.mbox

The integration test suite uses exactly this pattern — see src/server/server.test.ts.
ask and the web UI need some LLM. If you don't have one configured,
recallr defaults to Ollama at http://localhost:11434/v1 and will print
a friendly error if it can't reach it. Quick options:
# Free, local, runs offline:
ollama serve && ollama pull llama3.2
# Cloud (one env var each):
export OPENAI_API_KEY=sk-... # → gpt-5.5-mini
export ANTHROPIC_API_KEY=sk-ant-... # → claude-haiku-4-7-latest
export GEMINI_API_KEY=AIza... # → gemini-3.0-flash
# Any OpenAI-compatible endpoint:
export RECALLR_LLM_BASE_URL=https://openrouter.ai/api/v1
export RECALLR_LLM_MODEL=anthropic/claude-opus-4.7
export RECALLR_LLM_API_KEY=sk-or-...

Or set these once in `~/.recallr/config.json` under the `llm` block — they survive across shells. See "Connect an LLM" earlier in this README for the full precedence ladder.
| Script | What it does |
|---|---|
| `npm run dev` | Watch-mode build of the server / CLI bundle |
| `npm run dev:web` | Vite dev server with HMR for the web UI |
| `npm run build` | Production build of both server (dist/) and web (dist-web/) |
| `npm run typecheck` | `tsc --noEmit` for both server and web TS projects |
| `npm run test` | Full vitest suite (no network required) |
| `npm run test:watch` | Vitest in watch mode |
| `npm run lint` | Biome lint + format check |
| `npm run format` | Apply Biome formatting in place |
| `npm run demo` | Index examples/sample.mbox and ask a canned question |
src/
cli/ Commander-based CLI (recallr <command>)
connectors/ Source adapters: mbox, IMAP, Slack export
embed/ On-device embedder (transformers.js)
llm/ OpenAI-compatible chat client (also streaming)
mcp/ MCP server exposing search/ask as agent tools
server/ Local HTTP + SSE server backing the web UI
store/ SQLite + FTS5 + dense-vector store
ask.ts RAG pipeline (sync and streaming)
config.ts ~/.recallr/config.json loader + precedence
indexer.ts Connector → embedder → store wiring
types.ts Domain types: Message, Store, Connector, LlmClient
web/
src/ React UI (App, Sidebar, FilterBar, Composer, ...)
vite.config.ts Dev server config (proxies /api → :7474)
examples/
sample.mbox Multi-thread fixture used by tests + the demo
dist/ Server / CLI build output (npm publishes this)
dist-web/ Web UI build output (npm publishes this)
The highest-leverage contributions:
- New connectors. Implement the `Connector` interface and emit normalized `Message` objects from `fetch()`. References: `mbox.ts`, `slack.ts`, `imap.ts`.
- Bug reports with a reproducer. Ideally a tiny mbox or JSON export in `examples/` plus a vitest case.
- Web UI polish. Streaming UX, accessibility, keyboard navigation — the bar in `recallr serve` is intentionally minimal so this is wide open.
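If you're eyeing a connector, a minimal one might look like the sketch below. The `Message` and `Connector` shapes here are assumptions for illustration — the real interfaces live in src/types.ts and may differ:

```typescript
// Hypothetical sketch of a minimal connector. The real Connector and
// Message interfaces live in src/types.ts; treat these shapes as
// assumptions, not the actual API.

interface Message {
  id: string;
  source: string;
  from: string;
  date: string; // ISO 8601
  text: string;
}

interface Connector {
  name: string;
  fetch(): AsyncIterable<Message>;
}

// A toy in-memory connector that emits normalized Message objects.
class ArrayConnector implements Connector {
  name = "demo";
  constructor(private messages: Message[]) {}
  async *fetch() {
    yield* this.messages;
  }
}

const demo = new ArrayConnector([
  { id: "1", source: "demo", from: "ana", date: "2026-03-07", text: "pricing locked" },
]);

const collected: Message[] = [];
for await (const m of demo.fetch()) collected.push(m);
console.log(collected.length); // → 1
```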
PRs run on GitHub Actions — see
.github/workflows/ci.yml. The CI matrix is
{ubuntu, macos, windows} × {node 20, 22} and runs typecheck, lint,
build, and tests. Keep npm run lint && npm run typecheck && npm test
green locally and CI will be happy too.
- Streaming answers. `recallr serve` now streams over SSE — citations land the moment retrieval finishes, then tokens flow in with a blinking caret. Backed by a new `LlmClient.chatStream` and `POST /api/ask/stream`.
- Faceted search bar. Filter the next question by source, date range ("this week" / "this month" / "last 90d" / "this year"), and a free-text participant match. Filters plumb through `/api/ask`, `/api/ask/stream`, and `/api/search`.
- Thread browser sidebar. New `Store.listThreads()` and `GET /api/threads`; collapsible left rail with recent-conversation snippets.
- Manual theme toggle. Dark / light / system, persisted in `localStorage`, applied pre-React so there's no flash on reload.
- SQLite + FTS5 hybrid store with on-device embeddings (`bge-small-en-v1.5`)
- CLI: `init`, `index`, `ask`, `status`, `serve`, `mcp`
- Connectors: mbox, IMAP, Slack export
- MCP server exposing `search_messages`, `get_message`, `get_thread`, `status`
- Bundled web UI (`recallr serve`)
MIT © Flowdesk
Recallr is part of Flowdesk's open source initiative.