
Retrieval pipeline improvements: reranking, length normalization, noise filtering, adaptive recall #666

@bm-clawd

Description

Context

Analyzed memory-lancedb-pro (a LanceDB-based OpenClaw memory plugin) and its demo. While architecturally different from BM (opaque vectors vs. human-readable plain text), its retrieval pipeline has several techniques worth adopting.

Our LoCoMo benchmark baseline: Recall@5 76.4%, Recall@10 85.5%, MRR 0.658. Weakest areas: single_hop 57.7%, temporal 59.1%. Root cause identified in #577: RRF scoring flattens results, and FTS outperforms vector search for observations.
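To make the flattening concrete: Reciprocal Rank Fusion scores by rank position only, so a large raw-score gap between two candidates collapses into a tiny fused-score gap. A minimal sketch (the `k=60` constant is the conventional RRF default, assumed here, not taken from our config):

```python
# Why RRF "flattens": fusion sees only ranks, never raw scores, so two
# documents that were far apart in raw relevance end up nearly tied.

def rrf(ranks, k=60):
    """Fuse one document's ranks from multiple retrievers (vector, FTS)."""
    return sum(1.0 / (k + r) for r in ranks)

# Doc A: rank 1 in both the vector and FTS lists.
# Doc B: rank 2 in both, even if its raw scores were far lower.
score_a = rrf([1, 1])  # 2/61 ~= 0.0328
score_b = rrf([2, 2])  # 2/62 ~= 0.0323 -- a gap of less than 0.001
```

This is why a second-pass reranker (below) that looks at the query-document pair directly can recover the signal RRF throws away.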

Proposed Improvements

1. Cross-encoder reranking (highest impact)

After initial hybrid retrieval (vector + FTS), run a second pass through a cross-encoder reranker (e.g., Jina reranker-v3, Voyage rerank-2.5). This would re-score candidates based on query-document semantic relevance rather than just embedding similarity.

memory-lancedb-pro uses: 60% cross-encoder score + 40% original fused score, with graceful fallback to cosine similarity on API failure.

Implementation options:

  • Cloud API: Jina, Voyage, Pinecone (cheap per-query cost)
  • Local: BAAI/bge-reranker via Ollama-compatible endpoint
  • Config-driven: optional, off by default, provider-agnostic

This directly addresses #577 (RRF flattening) and should significantly improve Recall@5 and MRR.
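A minimal sketch of the blended scoring described above (60% cross-encoder, 40% original fused score, cosine fallback on API failure). `call_reranker` and the candidate dict fields are hypothetical names for illustration, not an existing API:

```python
# Sketch: blend cross-encoder scores with the original fused scores,
# degrading gracefully to cosine similarity if the reranker call fails.

def blend_scores(candidates, query, call_reranker, alpha=0.6):
    """candidates: dicts with 'text', 'fused_score', 'cosine' (assumed shape).
    call_reranker(query, texts) -> list of relevance scores (hypothetical)."""
    try:
        ce_scores = call_reranker(query, [c["text"] for c in candidates])
        for c, ce in zip(candidates, ce_scores):
            c["score"] = alpha * ce + (1 - alpha) * c["fused_score"]
    except Exception:
        # Graceful fallback: rank by the raw cosine similarity we already have.
        for c in candidates:
            c["score"] = c["cosine"]
    return sorted(candidates, key=lambda c: c["score"], reverse=True)
```

Making `alpha` and the provider config-driven keeps this optional and provider-agnostic, as proposed.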

2. Length normalization

Long notes/observations currently have an outsized influence on search results. A length normalization step (anchored around 500 chars) would prevent verbose entries from dominating precise, short observations.
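One possible shape for the penalty, using the 500-char anchor from the proposal (the logarithmic curve itself is an assumption, chosen so the damping grows slowly rather than cliff-dropping):

```python
import math

def length_penalty(text, anchor=500):
    """Score multiplier in (0, 1]: 1.0 at or below the anchor length,
    then shrinking logarithmically as the entry grows past it."""
    n = max(len(text), 1)
    if n <= anchor:
        return 1.0
    return 1.0 / (1.0 + math.log(n / anchor))
```

A 5,000-char note would be damped to roughly 0.3x while a 400-char observation keeps its full score, which matches the "precise beats verbose" intent.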

3. Noise filtering / adaptive retrieval

Skip memory retrieval entirely for queries that don't need it: greetings, slash commands, simple confirmations, emoji-only messages. Also filter low-quality content at capture time: agent refusals, meta-questions, boilerplate.

This reduces wasted retrieval cycles and keeps the index cleaner.
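A heuristic gate for the query side could look like the following. The pattern list is illustrative, not exhaustive; a production version would be config-driven:

```python
import re

# Illustrative skip-lists for the categories above (assumed, not final).
GREETINGS = {"hi", "hello", "hey", "thanks", "thank you", "ok", "okay", "yes", "no"}
NON_WORD_ONLY = re.compile(r"^[\W\s]+$")  # emoji/punctuation only, no word chars

def should_retrieve(query: str) -> bool:
    """Return False for queries that can't benefit from memory retrieval."""
    q = query.strip().lower()
    if not q or q in GREETINGS:
        return False
    if q.startswith("/"):        # slash command
        return False
    if NON_WORD_ONLY.match(q):   # emoji-only / punctuation-only message
        return False
    return True
```

The capture-side filter (refusals, meta-questions, boilerplate) would be a separate, similar predicate applied before indexing.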

4. MMR diversity filtering

When multiple results are very similar (cosine > 0.85), demote duplicates to increase result diversity. Prevents near-duplicate observations from consuming all top-K slots.
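A greedy MMR-style pass implementing the demotion (the 0.85 threshold comes from the proposal; plain-list vectors are used here for clarity, where a real implementation would reuse the store's vectors):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def diversify(results, top_k, threshold=0.85):
    """results: (score, vector, payload) tuples, sorted by score desc.
    Near-duplicates of an already-kept result are pushed to the back
    instead of being dropped, so they can still fill unused slots."""
    selected, demoted = [], []
    for item in results:
        _, vec, _ = item
        if any(cosine(vec, kept[1]) > threshold for kept in selected):
            demoted.append(item)
        else:
            selected.append(item)
    return (selected + demoted)[:top_k]
```

Demoting rather than dropping keeps recall intact when the result set is small, while freeing top-K slots for distinct observations.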

What we already do better

  • Human-readable plain text (their memories are opaque vector rows)
  • Knowledge graph with relational structure (observations + relations)
  • Bidirectional editing (humans can correct memories by editing files)
  • Schema system for structured note types
  • Git history as provenance trail

Priority

  1. Cross-encoder reranking (biggest recall improvement per effort)
  2. Length normalization (quick win)
  3. MMR diversity (moderate effort)
  4. Noise filtering / adaptive retrieval (nice to have)
