
fix(search): batch missing-embedding calls to avoid API batch-size limits#1542

Open
octo-patch wants to merge 1 commit into MemTensor:main from octo-patch:fix/issue-1482-embed-batch-missing-documents

Conversation

@octo-patch
Contributor

Fixes #1482

Problem

_extract_embeddings in search_handler.py collects every document that lacks a cached embedding and passes the whole list to embedder.embed(all_missing) in a single call. Providers such as Dashscope text-embedding-v4 reject or silently return None when the batch is too large (e.g. 25 documents); the code then either raises a TypeError when it tries to iterate over the None result, or silently drops all of the missing embeddings.

The warning logged just before the failure:

[SearchHandler] MMR embedding metadata missing; will compute missing embeddings: missing_total=25

followed by a crash or silent failure in the MMR deduplication path.

Solution

Split missing_documents into chunks of _EMBED_BATCH_SIZE (16) and call embed() for each chunk, extending a combined result list. Batches that return None/empty are skipped gracefully so remaining embeddings can still be used.

_EMBED_BATCH_SIZE = 16

computed: list[list[float]] = []
for i in range(0, len(missing_documents), _EMBED_BATCH_SIZE):
    batch = missing_documents[i : i + _EMBED_BATCH_SIZE]
    batch_result = self.searcher.embedder.embed(batch)
    if batch_result:  # skip None/empty batches so the rest still succeed
        computed.extend(batch_result)

Testing

  • Verified with 25 missing documents: previously crashed with TypeError; now completes successfully using two batches of 16 and 9.
  • Verified with <16 missing documents: behaviour unchanged (single call).
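The verifications above can be sketched as a standalone test with a fake embedder that records batch sizes (`FakeEmbedder` and `compute_missing` are hypothetical names; the loop body mirrors the fix in the diff):

```python
_EMBED_BATCH_SIZE = 16

class FakeEmbedder:
    """Records the size of every batch it is asked to embed."""
    def __init__(self) -> None:
        self.batch_sizes: list[int] = []

    def embed(self, texts: list[str]) -> list[list[float]]:
        self.batch_sizes.append(len(texts))
        return [[0.0] * 4 for _ in texts]  # dummy vectors

def compute_missing(embedder: FakeEmbedder, missing_documents: list[str]) -> list[list[float]]:
    # Same chunking logic as the fix: never hand the provider more
    # than _EMBED_BATCH_SIZE documents at once.
    computed: list[list[float]] = []
    for i in range(0, len(missing_documents), _EMBED_BATCH_SIZE):
        batch = missing_documents[i : i + _EMBED_BATCH_SIZE]
        batch_result = embedder.embed(batch)
        if batch_result:
            computed.extend(batch_result)
    return computed

embedder = FakeEmbedder()
out = compute_missing(embedder, ["doc"] * 25)
# 25 documents -> batches of [16, 9], 25 embeddings in total;
# fewer than 16 documents -> a single call, behaviour unchanged.
```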

…mits

When _extract_embeddings encounters many documents without cached
embeddings it previously called embedder.embed(all_missing) in one shot.
Providers like Dashscope text-embedding-v4 reject or silently return None
for large batches (e.g. 25 documents), causing a TypeError / empty result
downstream in the MMR deduplication path.

Fix: split missing_documents into chunks of _EMBED_BATCH_SIZE (16) and
accumulate results, skipping any batch that returns None/empty so the
rest of the embeddings can still be used.

Fixes MemTensor#1482

Co-Authored-By: Octopus <liyuan851277048@icloud.com>


Development

Successfully merging this pull request may close these issues.

fix: Extracting too many missing_documents embeddings cause embedder error

1 participant