feat(llm): Promote TwelveLabs video RAG components into xpacks core by iapoorv01 · Pull Request #256 · pathwaycom/pathway

iapoorv01 · 2026-06-30T20:05:29Z

Introduction

This PR resolves #255 by officially promoting the TwelveLabsVideoParser and MarengoEmbedder out of the llm-app examples template and integrating them natively into the pathway.xpacks.llm core library.

This enables native, first-class Video RAG capabilities (parsing raw video bytes into Pegasus text descriptions and retrieving them using a shared multimodal embedding space) directly out of the box in Pathway.

Context

Pathway's core LLM xpack currently handles PDFs, DOCX, and slides exceptionally well, but lacks native multimodal video handling. The TwelveLabs components were first introduced as an app template in PR #129, where they proved stable and highly valuable. Promoting them to the core xpacks library eliminates the need for developers to maintain boilerplate API scaffolding when building multimodal search applications.

Design Decisions & Approach:

Modular Component Folding: Instead of creating a monolithic twelvelabs.py file, we strictly adhered to Pathway's architectural patterns by natively integrating MarengoEmbedder into embedders.py and TwelveLabsVideoParser into parsers.py.
Zero-Bloat Dependency Architecture: We opted for a strict lazy-import approach. The twelvelabs SDK is not a hard dependency; it is declared in a new [twelvelabs] optional extra in pyproject.toml. The components use a try/except ImportError guard that prompts users to run pip install pathway[twelvelabs] only if they try to instantiate the classes without the SDK installed.
Asynchronous Scalability & Resource Management: We preserved the crucial asyncio.gather pipeline for the Embedder to prevent thread-blocking under load. Furthermore, the Parser correctly utilizes a try/finally block to proactively delete TwelveLabs assets (delete_assets=True by default) so users do not accidentally flood their cloud workspace with zombie assets during repeated pipeline runs.
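The asset cleanup described in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in parsers.py: `FakeAssetClient`, its `upload`/`describe`/`delete` methods, and `parse_video` are hypothetical names standing in for the TwelveLabs SDK calls, which this description does not show.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAssetClient:
    """Hypothetical stand-in for an SDK client; not the TwelveLabs API."""

    fail_on_describe: bool = False
    live_assets: set = field(default_factory=set)
    _next_id: int = 0

    def upload(self, video: bytes) -> str:
        # Pretend to upload the bytes and remember the remote asset.
        self._next_id += 1
        asset_id = f"asset-{self._next_id}"
        self.live_assets.add(asset_id)
        return asset_id

    def describe(self, asset_id: str) -> str:
        if self.fail_on_describe:
            raise RuntimeError("simulated API failure")
        return f"description of {asset_id}"

    def delete(self, asset_id: str) -> None:
        self.live_assets.discard(asset_id)


def parse_video(client, video: bytes, delete_assets: bool = True) -> str:
    """Upload a video, describe it, and clean up the remote asset if asked."""
    asset_id = client.upload(video)
    try:
        return client.describe(asset_id)
    finally:
        # Runs on success and on failure, so a crashed run leaves nothing behind.
        if delete_assets:
            client.delete(asset_id)
```

Because the deletion sits in the `finally` clause, a failure in the describe step still removes the uploaded asset, and passing `delete_assets=False` keeps it for inspection.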

How has this been tested?

All tests have been ported from the template environment into the native test suite at python/pathway/xpacks/llm/tests/test_twelvelabs.py.

No-Network Unit Tests: Verified the __wrapped__ async concurrency behaviors, asset cleanup lifecycle, and fallback failure modes using the strict _FakeClient stubbing pattern.
Type-Checker Alignment: Refactored legacy IDE/Mypy workarounds from the template test suite (such as redundant sys.modules stubs) to align cleanly with the core [tests] environment runtime guarantees.
Live Smoke Testing: Maintained the live API dimension probe, successfully gating it behind @pytest.mark.skipif(not os.environ.get("TWELVELABS_API_KEY")) to guarantee zero disruptions to the standard CI pipeline.
All files have been successfully formatted against the strict isort and black hooks.
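As an illustration of the no-network approach, the sketch below checks that embedding calls dispatched through asyncio.gather really overlap. It is not the test code from test_twelvelabs.py: `FakeAsyncClient`, `embed_all`, and `check_concurrency` are hypothetical names, and the stub only mimics the shape of an async SDK call.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical no-network stub that records how many calls overlap."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak_in_flight = 0

    async def embed(self, text: str) -> list[float]:
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        await asyncio.sleep(0)  # yield so other pending calls can start
        self.in_flight -= 1
        return [float(len(text))]


async def embed_all(client, texts: list[str]) -> list[list[float]]:
    """Embed all texts concurrently; gather returns results in input order."""
    return await asyncio.gather(*(client.embed(t) for t in texts))


def check_concurrency() -> int:
    """Run the pipeline against the stub and report the peak overlap."""
    client = FakeAsyncClient()
    vectors = asyncio.run(embed_all(client, ["a", "bb", "ccc"]))
    assert vectors == [[1.0], [2.0], [3.0]]
    return client.peak_in_flight
```

A sequential implementation would report a peak of 1, so asserting on the recorded peak distinguishes concurrent dispatch from a loop that awaits each call in turn.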

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature or improvement (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Related issue(s):

Resolves #255 (Promote TwelveLabsVideoParser and MarengoEmbedder from the Video RAG template into pathway.xpacks.llm)

Checklist:

My code follows the code style of this project,
My change requires a change to the documentation,
I described the modification in the CHANGELOG.md file.

This commit promotes the TwelveLabs integration from the template directory into the native pathway.xpacks.llm core library, as proposed in pathwaycom#255.

Key changes:
- pyproject.toml: Introduced a new [twelvelabs] optional dependency extra.
- embedders.py: Promoted MarengoEmbedder into pathway.xpacks.llm.embedders. It inherits from BaseEmbedder and uses an async code path (_aembed_one) dispatched via asyncio.gather for concurrent embedding generation, preventing thread blocking. It implements the lazy ImportError pattern.
- parsers.py: Promoted TwelveLabsVideoParser into pathway.xpacks.llm.parsers. Implemented resource cleanup using try/finally blocks that explicitly delete TwelveLabs assets (when delete_assets=True), preventing asset flooding on the API. It also implements the lazy ImportError pattern.
- test_twelvelabs.py: Ported the unit tests into python/pathway/xpacks/llm/tests/. Contains no-network test coverage via stubbed SDK mocks. The live smoke test is kept but gated behind a check that TWELVELABS_API_KEY is set.
- CHANGELOG.md: Added an entry under [Unreleased] -> Added documenting the promotion.

This extends Pathway's live-sync multimodal indexing capabilities to handle full video ingestion natively through the TwelveLabs API.

iapoorv01 · 2026-07-04T05:36:22Z

Thanks for the detailed review, @zxqfd555. I've pushed an update addressing all your points:

Dependency Floor Fixed: Updated pyproject.toml to bump the twelvelabs floor to >= 1.2.8 to ensure pip fetches the newer version required for AsyncTwelveLabs and the assets client, preventing the runtime ModuleNotFoundError.
Helper Consolidation & Refactoring:
- Moved _resolve_twelvelabs_api_key, _build_twelvelabs_client, and _build_async_twelvelabs_client into python/pathway/xpacks/llm/_utils.py so they share a single home.
- Updated both parsers.py and embedders.py to import from _utils.py.
- Replaced the hand-rolled try/except ImportError blocks with Pathway's standard with optional_imports("twelvelabs"): inside the helpers.
Stdlib Imports Cleaned Up: Moved import time, import os, and import asyncio out of the function bodies and made them top-level module imports.
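The helper refactoring above can be pictured with the sketch below. It is a self-contained stand-in, not Pathway's implementation: `optional_imports_sketch` only imitates the general idea of a context manager that converts a failed import into install guidance, and `build_client` with its `module_name` parameter is a hypothetical helper added so the failure path can be exercised without the SDK. The `TwelveLabs(api_key=...)` constructor call is an assumption about the SDK.

```python
import importlib
from contextlib import contextmanager


@contextmanager
def optional_imports_sketch(extra: str):
    """Hypothetical stand-in: turn a failed import into install guidance."""
    try:
        yield
    except ImportError as exc:
        raise ImportError(
            f"{exc}. Consider installing it with: pip install pathway[{extra}]"
        ) from exc


def build_client(api_key: str, module_name: str = "twelvelabs"):
    """Import the SDK lazily, only when a client is actually requested."""
    with optional_imports_sketch("twelvelabs"):
        sdk = importlib.import_module(module_name)
    # Assumption: the SDK exposes a `TwelveLabs(api_key=...)` constructor.
    return sdk.TwelveLabs(api_key=api_key)
```

Keeping the import inside a single shared helper means the parser and the embedder report a missing SDK with the same message, and importing the xpack itself never requires the optional dependency.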

iapoorv01 force-pushed the feature/twelvelabs-video-rag branch from a526624 to f60073e on June 30, 2026 20:09

zxqfd555 reviewed Jul 3, 2026

Comment thread pyproject.toml Outdated

zxqfd555 reviewed Jul 3, 2026

Comment thread python/pathway/xpacks/llm/parsers.py Outdated

zxqfd555 reviewed Jul 3, 2026

Comment thread python/pathway/xpacks/llm/embedders.py Outdated

zxqfd555 requested changes Jul 3, 2026

iapoorv01 force-pushed the feature/twelvelabs-video-rag branch 2 times, most recently from 8dce7b9 to bb3e6c0 on July 4, 2026 05:25

iapoorv01 force-pushed the feature/twelvelabs-video-rag branch from bb3e6c0 to e863cdd on July 4, 2026 05:29

refactor(llm): address reviewer feedback for twelvelabs integration

e863cdd

iapoorv01 requested a review from zxqfd555 July 4, 2026 05:35

iapoorv01 wants to merge 2 commits into pathwaycom:main from iapoorv01:feature/twelvelabs-video-rag
