## Problem

The `bm-local` provider spawns a new `bm tool search-notes` CLI subprocess for every query. On the full LoCoMo run (1,982 queries), this produces 2,948ms mean latency versus Mem0's 229ms, because each call cold-starts Python, opens SQLite, runs the search, and exits.

Mem0's provider keeps a warm `Memory()` instance across all queries, which is a fairer comparison to how BM is actually used in production (a persistent MCP server).
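The cold-start gap can be illustrated by timing a throwaway Python subprocess that just imports and opens SQLite versus doing the same work in-process. This is a rough illustration of where the per-query overhead comes from, not the benchmark harness itself:

```python
import sqlite3
import subprocess
import sys
import time

# Cold start: each call pays interpreter startup + import + DB open.
t0 = time.perf_counter()
subprocess.run(
    [sys.executable, "-c", "import sqlite3; sqlite3.connect(':memory:').close()"],
    check=True,
)
cold_ms = (time.perf_counter() - t0) * 1000

# Warm: the same work in an already-running process, no new interpreter.
t0 = time.perf_counter()
sqlite3.connect(":memory:").close()
warm_ms = (time.perf_counter() - t0) * 1000

print(f"cold: {cold_ms:.1f} ms, warm: {warm_ms:.3f} ms")
```

On a typical machine the cold path costs tens of milliseconds before any search work happens, while the warm path is sub-millisecond; the full `bm` CLI additionally loads its own imports and index state on every call.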
## Proposal

Add a `bm-mcp` provider that:

- Spawns `bm mcp` once via stdio transport at the start of the run
- Sends `search_notes` tool calls over the MCP protocol
- Keeps the connection warm for all queries
- Tears down on cleanup
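The provider lifecycle above boils down to: spawn one server process, keep its stdio pipes open, send one JSON-RPC request per query, and only tear down at the end of the run. A minimal sketch of that pattern follows; the real provider would launch `bm mcp` and speak full MCP (e.g. via an MCP client SDK), but here a tiny stub echo server stands in so the sketch is self-contained. The `WarmStdioProvider` class and the newline-delimited framing are illustrative assumptions, not the actual implementation:

```python
import json
import subprocess
import sys

# Stub stand-in for `bm mcp`: answers one JSON-RPC request per input line.
STUB_SERVER = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    resp = {"jsonrpc": "2.0", "id": req["id"],
            "result": {"tool": req["params"]["name"], "hits": []}}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
"""


class WarmStdioProvider:
    """Keeps one server subprocess alive across all queries."""

    def __init__(self, argv):
        # Spawned once at the start of the run; pipes stay open throughout.
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )
        self.next_id = 0

    def call_tool(self, name, arguments):
        # One request/response round-trip per query; no process startup cost.
        self.next_id += 1
        req = {
            "jsonrpc": "2.0",
            "id": self.next_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
        self.proc.stdin.write(json.dumps(req) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        # Closing stdin signals EOF; the server exits and we reap it.
        self.proc.stdin.close()
        self.proc.wait(timeout=5)


provider = WarmStdioProvider([sys.executable, "-c", STUB_SERVER])
resp = provider.call_tool("search_notes", {"query": "test"})
provider.close()
```

The key design point is that interpreter startup, imports, and index/DB open all happen once in `__init__`, so each `call_tool` pays only the round-trip and search cost.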
This matches real-world usage (the OpenClaw plugin, Claude Desktop) and should bring latency into the 200-400ms range.

The existing `bm-local` CLI provider should stay as-is; it's the most reproducible option (anyone with `bm` installed can run it).
## Benchmark evidence

Full LoCoMo run (`locomo-full-20260226T055634Z`):

| Provider | Mean ms | P95 ms |
|---|---|---|
| `bm-local` (CLI) | 2,948 | 3,195 |
| `mem0-local` (warm) | 229 | 321 |