Context
Analyzed memory-lancedb-pro (a LanceDB-based OpenClaw memory plugin) and its demo. While it is architecturally different from BM (opaque vectors vs. human-readable plain text), its retrieval pipeline has several techniques worth adopting.
Our LoCoMo benchmark baseline: Recall@5 76.4%, Recall@10 85.5%, MRR 0.658. Weakest areas: single_hop 57.7%, temporal 59.1%. Root cause identified in #577: RRF scoring flattens results, FTS outperforms vector for observations.
Proposed Improvements
1. Cross-encoder reranking (highest impact)
After initial hybrid retrieval (vector + FTS), run a second pass through a cross-encoder reranker (e.g., Jina reranker-v3, Voyage rerank-2.5). This would re-score candidates based on query-document semantic relevance rather than just embedding similarity.
memory-lancedb-pro uses: 60% cross-encoder score + 40% original fused score, with graceful fallback to cosine similarity on API failure.
Implementation options:
- Cloud API: Jina, Voyage, Pinecone (cheap per-query cost)
- Local: BAAI/bge-reranker via Ollama-compatible endpoint
- Config-driven: optional, off by default, provider-agnostic
This directly addresses #577 (RRF flattening) and should significantly improve Recall@5 and MRR.
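A minimal sketch of the blending step, assuming candidates carry their fused hybrid score and the reranker call returns one relevance score per candidate (the 60/40 split and the fallback behavior mirror what memory-lancedb-pro does; function and field names here are hypothetical):

```python
def rerank(candidates, cross_encoder_scores=None,
           ce_weight=0.6, fused_weight=0.4):
    """Blend cross-encoder scores with the original fused (RRF) scores.

    `candidates`: list of dicts with at least a 'fused_score' key.
    `cross_encoder_scores`: one score per candidate from a reranker API
    (Jina, Voyage, ...), or None if the API call failed.
    """
    if cross_encoder_scores is None:
        # Graceful fallback: keep the original hybrid ranking untouched.
        return sorted(candidates, key=lambda c: c["fused_score"], reverse=True)
    for c, ce in zip(candidates, cross_encoder_scores):
        # 60% cross-encoder relevance + 40% original fused score.
        c["final_score"] = ce_weight * ce + fused_weight * c["fused_score"]
    return sorted(candidates, key=lambda c: c["final_score"], reverse=True)
```

Keeping the reranker optional and provider-agnostic means this function only ever sees plain score lists, never provider-specific response shapes.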
2. Length normalization
Long notes/observations currently have an outsized influence on search results. A length normalization step (anchored around 500 characters) would prevent verbose entries from dominating precise, short observations.
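One possible shape for the penalty, assuming scores are dampened logarithmically past the 500-character anchor (the exact curve is an open design choice; this is just one option):

```python
import math

ANCHOR_CHARS = 500  # entries at or below this length keep their raw score

def length_normalize(score, text, anchor=ANCHOR_CHARS):
    """Dampen the score of entries much longer than the anchor length.

    A log-based penalty demotes a 2000-char note only mildly, and short
    notes are never boosted above their raw score.
    """
    length = max(len(text), 1)
    if length <= anchor:
        return score
    return score / (1.0 + math.log(length / anchor))
```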
3. Noise filtering / adaptive retrieval
Skip memory retrieval entirely for queries that don't need it: greetings, slash commands, simple confirmations, emoji-only messages. Also filter low-quality content from capture: agent refusals, meta-questions, boilerplate.
This reduces wasted retrieval cycles and keeps the index cleaner.
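The query-side gate could be as simple as a few patterns checked before any index hit (the patterns below are illustrative placeholders, not a vetted list; they would need tuning against real traffic):

```python
import re

# Hypothetical skip patterns; extend from observed query logs.
SKIP_PATTERNS = [
    re.compile(r"^\s*/\w+"),  # slash commands like /help or /clear
    re.compile(r"^(hi|hey|hello|thanks|thank you|ok|okay|yes|no)[.!?]*$",
               re.IGNORECASE),  # greetings and simple confirmations
]
NO_WORD_CHARS = re.compile(r"^[\W\s]+$")  # emoji-only / punctuation-only

def should_retrieve(query: str) -> bool:
    """Return False for queries that don't need memory retrieval."""
    q = query.strip()
    if not q or NO_WORD_CHARS.match(q):
        return False
    return not any(p.match(q) for p in SKIP_PATTERNS)
```

The same idea applies on the capture side (agent refusals, meta-questions, boilerplate), just with a different pattern set.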
4. MMR diversity filtering
When multiple results are very similar (cosine > 0.85), demote duplicates to increase result diversity. Prevents near-duplicate observations from consuming all top-K slots.
What we already do better
- Human-readable plain text (their memories are opaque vector rows)
- Knowledge graph with relational structure (observations + relations)
- Bidirectional editing (humans can correct memories by editing files)
- Schema system for structured note types
- Git history as provenance trail
Priority
- Cross-encoder reranking (biggest recall improvement per effort)
- Length normalization (quick win)
- MMR diversity (moderate effort)
- Noise filtering / adaptive retrieval (nice to have)
References
- memory-lancedb-pro: https://github.com/CortexReach/memory-lancedb-pro
- LanceDB demo: https://github.com/lancedb/openclaw-lancedb-demo
- LoCoMo benchmark results: #608
- RRF scoring issue: #577 (BUG: Hybrid RRF fusion dilutes vector scores with weak FTS scores)