feat: add run_async to TextEmbeddingRetriever, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever by sachinn854 · Pull Request #11367 · deepset-ai/haystack

sachinn854 · 2026-05-21T19:03:01Z

Related Issues

closes feat: Add run_async to MultiQueryEmbeddingRetriever, MultiQueryTextRetriever, and TextEmbeddingRetriever #11358

Proposed Changes:

TextEmbeddingRetriever, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever did not
implement run_async, so AsyncPipeline fell back to running them in a thread executor even when
their wrapped components supported native async execution.

TextEmbeddingRetriever.run_async: chains text_embedder and retriever calls sequentially,
using each component's run_async if available, otherwise falls back to loop.run_in_executor
MultiQueryEmbeddingRetriever.run_async: replaces ThreadPoolExecutor with asyncio.gather
via a new _run_one_async helper; uses run_async on both the embedder and retriever when available
MultiQueryTextRetriever.run_async: same pattern as MultiQueryEmbeddingRetriever

All three follow the pattern already established in MultiRetriever.run_async.

How did you test it?

Added async unit test files for all three components:
- test/components/retrievers/test_text_embedding_retriever_async.py
- test/components/retrievers/test_multi_query_embedding_retriever_async.py
- test/components/retrievers/test_multi_query_text_retriever_async.py
Tests cover: empty inputs, sorted results, deduplication, fallback to sync when run_async is absent
All existing sync tests continue to pass

Notes for the reviewer

_run_on_thread (used by sync run()) is unchanged — only new async methods were added.
The run_async signature matches run exactly in all three components.

Checklist

I have read the contributors guidelines and the code of conduct.
I have updated the related issue with new insights and changes.
I have added unit tests and updated the docstrings.
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
I have documented my code.
I have added a release note file, following the contributors guidelines.
I have run pre-commit hooks and fixed any issue.

…riever, and MultiQueryTextRetriever

vercel · 2026-05-21T19:03:08Z

@sachinn854 is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2026-05-21T19:03:12Z

All committers have signed the CLA.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds native coroutine support (run_async) to several retriever components so they can execute efficiently inside AsyncPipeline, with a thread-based fallback for sync-only wrapped components.

Changes:

Implemented run_async in TextEmbeddingRetriever, MultiQueryTextRetriever, and MultiQueryEmbeddingRetriever.
Added async test coverage (including AsyncPipeline integration tests) for the new async behavior and sync fallbacks.
Added release notes describing the new coroutine execution and fallback behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`haystack/components/retrievers/text_embedding_retriever.py`	Adds `run_async` with async-first execution and thread fallback for sync embedders/retrievers.
`haystack/components/retrievers/multi_query_text_retriever.py`	Adds concurrent `run_async` for multi-query text retrieval with per-query async execution and dedup/sort.
`haystack/components/retrievers/multi_query_embedding_retriever.py`	Adds concurrent `run_async` for multi-query embedding retrieval, including async embedder + retriever execution.
`test/components/retrievers/test_text_embedding_retriever_async.py`	New tests for `TextEmbeddingRetriever.run_async` + pipeline integration + sync fallback.
`test/components/retrievers/test_multi_query_text_retriever_async.py`	New tests for `MultiQueryTextRetriever.run_async`, dedup, ordering, fallback, and pipeline integration.
`test/components/retrievers/test_multi_query_embedding_retriever_async.py`	New tests for `MultiQueryEmbeddingRetriever.run_async`, dedup, ordering, fallback, and pipeline integration.
`releasenotes/notes/add-run-async-to-retrievers-a265779e909abc2c.yaml`	Documents the new `run_async` support and fallback behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    @component.output_types(documents=list[Document])
+    async def run_async(
+        self, queries: list[str], retriever_kwargs: dict[str, Any] | None = None
+    ) -> dict[str, list[Document]]:


+        retriever_kwargs = retriever_kwargs or {}
+
+        if not self._is_warmed_up:
+            self.warm_up()
+
+        results = await asyncio.gather(*[self._run_one_async(q, retriever_kwargs) for q in queries])
+        docs: list[Document] = [doc for result in results if result for doc in result]
+        docs = _deduplicate_documents(docs)
+        docs.sort(key=lambda x: x.score or 0.0, reverse=True)
+        return {"documents": docs}


+    @component.output_types(documents=list[Document])
+    async def run_async(
+        self, queries: list[str], retriever_kwargs: dict[str, Any] | None = None
+    ) -> dict[str, list[Document]]:


+        retriever_kwargs = retriever_kwargs or {}
+
+        if not self._is_warmed_up:
+            self.warm_up()
+
+        results = await asyncio.gather(*[self._run_one_async(q, retriever_kwargs) for q in queries])
+        docs: list[Document] = [doc for result in results if result for doc in result]
+        docs = _deduplicate_documents(docs)
+        docs.sort(key=lambda x: x.score or 0.0, reverse=True)
+        return {"documents": docs}


+        loop = asyncio.get_running_loop()
+
+        if hasattr(self.text_embedder, "run_async") and callable(self.text_embedder.run_async):
+            embedding_result = await self.text_embedder.run_async(text=query)
+        else:
+            embedding_result = await loop.run_in_executor(None, lambda: self.text_embedder.run(text=query))
+
+        if hasattr(self.retriever, "run_async") and callable(self.retriever.run_async):
+            result = await self.retriever.run_async(
+                query_embedding=embedding_result["embedding"], filters=filters, top_k=top_k
+            )
+        else:
+            result = await loop.run_in_executor(
+                None,
+                lambda: self.retriever.run(query_embedding=embedding_result["embedding"], filters=filters, top_k=top_k),
+            )


+    @pytest.mark.asyncio
+    async def test_run_async_deduplication(self):
+        doc2 = Document(content="Wind energy is clean", id="doc2", score=0.8)
+        # doc3 shares the same id as doc1 — simulates the same doc retrieved by different queries


sachinn854 added 2 commits May 22, 2026 00:20

style: apply ruff formatting fixes

cb37749

feat: add run_async to TextEmbeddingRetriever, MultiQueryEmbeddingRet…

2271fdd

…riever, and MultiQueryTextRetriever

Copilot AI review requested due to automatic review settings May 21, 2026 19:03

sachinn854 requested a review from a team as a code owner May 21, 2026 19:03

sachinn854 requested review from davidsbatista and removed request for a team May 21, 2026 19:03

github-actions Bot added topic:tests type:documentation Improvements on the docs labels May 21, 2026

Copilot AI reviewed May 21, 2026

View reviewed changes

style: fix stale comment in deduplication tests

df49c1b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add run_async to TextEmbeddingRetriever, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever#11367

feat: add run_async to TextEmbeddingRetriever, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever#11367
sachinn854 wants to merge 3 commits into
deepset-ai:mainfrom
sachinn854:feat/add-run-async-to-retrievers

sachinn854 commented May 21, 2026

Uh oh!

vercel Bot commented May 21, 2026

Uh oh!

CLAassistant commented May 21, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sachinn854 commented May 21, 2026

Related Issues

Proposed Changes:

How did you test it?

Notes for the reviewer

Checklist

Uh oh!

vercel Bot commented May 21, 2026

Uh oh!

CLAassistant commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

CLAassistant commented May 21, 2026 •

edited

Loading