Bug Description
After a successful bm reindex, bm project info still reports missing embeddings and recommends another embeddings reindex. In my case, this appears to be caused by stale rows in derived search tables rather than actual current entities missing embeddings.
Steps To Reproduce
- Install Basic Memory version 0.20.2 in Docker with semantic search enabled.
- Use a project with fastembed configured as the embedding provider.
- Run:
  bm reindex
- After it completes successfully, run:
  bm project info main
- Observe that bm project info still reports:
  Indexed 1148/1172
  Reindex recommended
  24 entities missing embeddings — run: bm reindex --embeddings
Expected Behavior
After a successful bm reindex, I would expect bm project info to stop recommending another embeddings reindex unless there are actually current entities missing embeddings.
Actual Behavior
bm reindex completes successfully:
Project: main
Rebuilding full-text search index...
✓ Full-text search index rebuilt
Building vector embeddings...
Embedding entities... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
✓ Embeddings complete: 1140 entities embedded, 0 skipped, 0 errors
Reindex complete!
But immediately afterward, bm project info main still shows Indexed 1148/1172 and recommends bm reindex --embeddings for 24 entities missing embeddings.
I inspected the database and found:
- 0 current entities missing chunks
- stale entity_ids remain in search_index
- stale entity_ids remain in search_vector_chunks
In my case:
- stale search_index entity IDs: 32
- stale search_vector_chunks entity IDs: 8
So the reported gap of 24 appears to be coming from stale derived-table rows rather than live notes that still need embedding.
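The arithmetic behind that conclusion can be checked directly. This is only a sketch of my reading of the numbers: it assumes bm project info counts distinct entity IDs in each derived table, and that all 1140 freshly embedded entities appear in both tables (consistent with 0 current entities missing chunks). I have not confirmed that this is how the command computes its stats.

```python
# Figures taken from the report above.
embedded_live = 1140        # "1140 entities embedded" from bm reindex
stale_search_index = 32     # stale entity IDs found in search_index
stale_vector_chunks = 8     # stale entity IDs found in search_vector_chunks

# Assumption: the totals are distinct entity IDs per derived table,
# so stale IDs inflate both sides of "Indexed X/Y".
reported_total = embedded_live + stale_search_index      # denominator
reported_indexed = embedded_live + stale_vector_chunks   # numerator
reported_missing = reported_total - reported_indexed

print(reported_indexed, reported_total, reported_missing)  # 1148 1172 24
```

Under those assumptions the stale rows alone reproduce Indexed 1148/1172 and the gap of 24, with no live entity missing an embedding.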
Environment
- OS: macOS (host) with Dockerized Basic Memory container
- Python version: Python 3.13.12 in the container
- Basic Memory version: 0.20.2
- Installation method: Docker
- Claude Desktop version (if applicable): N/A
Additional Context
Example stale derived rows I found:
- entity_id = 1130
  - title: Ops Home
  - file path: ops/index.md
  - type: relation
- entity_id = 1246
  - title: Dialog without buttons
  - file path: conversations/chatgpt-20240716-Dialog_without_buttons.md
  - type: entity
These rows still exist in derived search tables but no longer correspond to current rows in the canonical entity table.
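For anyone who wants to check their own database, here is a minimal sketch of the kind of query that surfaces such rows. The table names entity, search_index and search_vector_chunks and the entity_id column are the ones mentioned in this report; the entity.id primary key, the other columns, and the toy data are assumptions for illustration, and the real schema may differ (search_index may be a virtual table, for example).

```python
import sqlite3

DERIVED_TABLES = {"search_index", "search_vector_chunks"}

def stale_entity_ids(conn, derived_table):
    """Return entity_ids in derived_table that have no row in entity."""
    # The table name is interpolated into SQL, so only allow known names.
    if derived_table not in DERIVED_TABLES:
        raise ValueError(f"unexpected table: {derived_table}")
    rows = conn.execute(
        f"SELECT DISTINCT d.entity_id FROM {derived_table} AS d "
        "LEFT JOIN entity AS e ON e.id = d.entity_id "  # entity.id is assumed
        "WHERE d.entity_id IS NOT NULL AND e.id IS NULL "
        "ORDER BY d.entity_id"
    ).fetchall()
    return [row[0] for row in rows]

# Toy in-memory schema and data, not the real Basic Memory schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE search_index (entity_id INTEGER, type TEXT);
CREATE TABLE search_vector_chunks (entity_id INTEGER, chunk_text TEXT);
INSERT INTO entity VALUES (1, 'Live note');
INSERT INTO search_index VALUES (1, 'entity'), (901, 'relation'), (902, 'entity');
INSERT INTO search_vector_chunks VALUES (1, 'chunk'), (902, 'chunk');
""")

stale_index = stale_entity_ids(conn, "search_index")
stale_chunks = stale_entity_ids(conn, "search_vector_chunks")
print(stale_index)   # [901, 902]
print(stale_chunks)  # [902]
```

Against a real project, the same function could be pointed at a copy of the project's SQLite file instead of the in-memory database.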
As a result, bm project info appears to overstate the number of missing embeddings after a successful reindex.
Possible Solution
bm project info may be calculating embedding coverage from derived tables without excluding stale rows whose entity_id no longer exists in the canonical entity table.
Possible fixes:
- when computing embedding coverage, only count entity IDs that still exist in entity
- or ensure bm reindex also cleans up stale rows in search_index / search_vector_chunks so the post-reindex stats remain consistent
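Both options could look roughly like the sketch below. This is not Basic Memory's actual code: the function names embedding_coverage and purge_stale_rows are hypothetical, the entity.id column and the toy schema are assumptions, and only the table names and entity_id come from what I saw in the database.

```python
import sqlite3

def embedding_coverage(conn):
    """Return (embedded, total), counting only entities that still exist."""
    total = conn.execute("SELECT COUNT(*) FROM entity").fetchone()[0]
    embedded = conn.execute(
        "SELECT COUNT(DISTINCT c.entity_id) "
        "FROM search_vector_chunks AS c "
        "JOIN entity AS e ON e.id = c.entity_id"  # inner join drops stale IDs
    ).fetchone()[0]
    return embedded, total

def purge_stale_rows(conn):
    """Delete derived rows whose entity_id is no longer in entity."""
    deleted = {}
    for table in ("search_index", "search_vector_chunks"):  # fixed names only
        cur = conn.execute(
            f"DELETE FROM {table} "
            "WHERE entity_id IS NOT NULL "
            "AND entity_id NOT IN (SELECT id FROM entity)"
        )
        deleted[table] = cur.rowcount
    conn.commit()
    return deleted

# Toy in-memory schema and data, not the real Basic Memory schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE search_index (entity_id INTEGER, type TEXT);
CREATE TABLE search_vector_chunks (entity_id INTEGER, chunk_text TEXT);
INSERT INTO entity VALUES (1, 'Note A'), (2, 'Note B');
INSERT INTO search_index VALUES (1, 'entity'), (2, 'entity'),
                                (901, 'relation'), (902, 'entity');
INSERT INTO search_vector_chunks VALUES (1, 'a'), (2, 'b'), (902, 'stale');
""")

coverage = embedding_coverage(conn)
deleted = purge_stale_rows(conn)
print(coverage)  # (2, 2)
print(deleted)   # {'search_index': 2, 'search_vector_chunks': 1}
```

The first function fixes only the reporting, while the second removes the stale rows themselves, so a naive count over the derived tables would also come out right afterward. Doing both would be the most defensive choice.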