Investigate low content hit rate for bm-local (15.5% vs Mem0 34.3%) #2

## Problem

On the full LoCoMo run, BM's content hit rate is 15.5% vs Mem0's 34.3% — despite BM winning on every retrieval metric (R@5, R@10, MRR).

This means BM finds the **right document** more often, but the retrieved text less often contains the **exact answer string**.

## Root cause analysis

Content hit is measured by checking if `expected_answer` appears as a literal substring in the concatenated `hit.text` of the top results.

Three factors contribute to the gap:

### 1. BM returns `matched_chunk` not full note content
`bm tool search-notes` returns `matched_chunk` (the specific chunk that matched) plus truncated content. The `expected_answer` might be in a different part of the same note that isn't in the returned text. The correct *document* is found (recall is high) but the answer text isn't in the returned snippet.

**Fix opportunity (basic-memory core):** Return more context around matched chunks, or return full note content when notes are small enough.
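
A minimal sketch of that fix, assuming the search layer has access to both the full note text and the chunk that matched. The helper name `expand_hit_text` and both thresholds are hypothetical, not part of basic-memory:

```python
def expand_hit_text(
    note_text: str,
    matched_chunk: str,
    small_note_chars: int = 2000,  # hypothetical cutoff for "small enough"
    window_chars: int = 500,       # hypothetical context on each side
) -> str:
    """Return more context for a search hit than the matched chunk alone."""
    # Small notes: return the whole note so the answer cannot be cut off.
    if len(note_text) <= small_note_chars:
        return note_text

    start = note_text.find(matched_chunk)
    if start == -1:
        # Chunk not found verbatim (e.g. whitespace was normalized):
        # fall back to the chunk itself rather than guessing a position.
        return matched_chunk

    # Large notes: return the chunk plus a window of surrounding text.
    end = start + len(matched_chunk)
    return note_text[max(0, start - window_chars): end + window_chars]
```

A character window is the simplest choice here; expanding to neighbouring chunks or paragraph boundaries would likely read better but depends on how chunks are stored.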

### 2. Mem0 stores atomic extracted memories
Mem0 extracts "important sentences" during ingestion, creating small atomic memory units. These are closer to answer phrasing by design. BM stores full conversation sessions and relies on chunk matching.

This is a fundamental architectural difference — Mem0 trades context for precision, BM preserves full context. But we could improve by:
- Extracting observations/facts at ingestion time (which BM already does via the observation system)
- Ensuring search returns observation-level hits, not just entity-level

### 3. Scoring methodology
The `content_hit` function does exact substring matching:
```python
needle = expected_answer.strip().lower()
haystack = '\n'.join((hit.text or '') for hit in hits).lower()
return needle in haystack
```

This is brittle: semantically correct answers with different wording score as misses. We could supplement it with fuzzy or semantic matching.
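
One possible supplementary metric is a token overlap ratio: the fraction of distinct answer tokens that appear anywhere in the retrieved text. The sketch below is an assumption about how that could look; `token_overlap`, `fuzzy_content_hit`, and the `0.8` threshold are illustrative names and values, not existing benchmark code:

```python
import re


def _tokens(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap(expected_answer: str, hit_texts: list[str]) -> float:
    """Fraction of distinct answer tokens found in the retrieved text."""
    answer = set(_tokens(expected_answer))
    if not answer:
        return 0.0
    retrieved = set(_tokens("\n".join(hit_texts)))
    return len(answer & retrieved) / len(answer)


def fuzzy_content_hit(
    expected_answer: str, hit_texts: list[str], threshold: float = 0.8
) -> bool:
    # Hypothetical threshold; it would need tuning against labeled examples.
    return token_overlap(expected_answer, hit_texts) >= threshold
```

For example, an expected answer of `7 May 2023` against retrieved text `They met on May 7, 2023.` fails the exact substring check but has full token overlap. The metric ignores word order, so it can over-credit long retrieved texts; it is best reported alongside the exact-match number rather than in place of it.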

## Benchmark evidence

Full LoCoMo run (`locomo-full-20260226T055634Z`):

| Provider | R@5 | Content Hit |
|----------|-----|-------------|
| bm-local | 74.3% | 15.5% |
| mem0-local | 64.6% | 34.3% |

## Potential improvements

1. **Benchmark repo:** Add fuzzy content matching (e.g., token overlap ratio) as a supplementary metric
2. **BM core:** Return more text context per search hit (full note for small notes, larger chunks for large notes)
3. **BM core:** Ensure observation-level entities surface in search results with their full text
4. **BM benchmark provider:** Try fetching the full note via `read-note` for the top-K hits to get complete content
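
Item 4 could be prototyped in the provider without touching BM core. The sketch below is hypothetical: the `Hit` shape, `hydrate_top_k`, and the `fetch_note` callable are assumptions, and `fetch_note` stands in for whatever wraps `read-note` in the real provider:

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Hit:
    # Hypothetical hit shape; the real provider's fields may differ.
    note_id: str
    text: Optional[str]


def hydrate_top_k(
    hits: list[Hit],
    fetch_note: Callable[[str], Optional[str]],
    k: int = 5,
) -> list[Hit]:
    """Replace the text of the first k hits with the full note content."""
    hydrated = []
    for rank, hit in enumerate(hits):
        if rank < k:
            full_text = fetch_note(hit.note_id)
            # Keep the original snippet if the note could not be fetched.
            if full_text:
                hit = replace(hit, text=full_text)
        hydrated.append(hit)
    return hydrated
```

Passing the fetcher in as a callable keeps the sketch independent of the actual CLI invocation and makes it easy to test with a plain dictionary lookup.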
