Add SEC 10-K financial analysis example DAG for LlamaIndex #67615
Draft
vikramkoka wants to merge 6 commits into
- Adds `example_llamaindex_rag.py` with three example DAGs demonstrating RAG patterns using the new LlamaIndex operators:
  - **Full RAG pipeline**: `DocumentLoaderOperator` → `EmbeddingOperator` → `RetrievalOperator` → `LLMOperator`
  - **Separate index/query DAGs**: weekly PDF indexing DAG + on-demand parameterized query DAG (production pattern)
  - **Multi-source RAG**: combines CSV and text files with metadata tagging, merges them via `@task`, then embeds

## Dependencies

Requires PR #67120 (`DocumentLoaderOperator`) and PR #67121 (LlamaIndex operators) to merge first.

## Test plan

- [ ] Verify the DAG file parses without errors after the dependency PRs merge
- [ ] Verify all three DAGs appear in the Airflow UI
- [ ] Test the full RAG pipeline end-to-end with sample text files and an OpenAI connection
- [ ] Test the parameterized query DAG with a custom `question` parameter
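The multi-source pattern (tag every document with where it came from, then merge the batches before embedding) can be sketched in plain Python. This is a minimal sketch, not the code in `example_llamaindex_rag.py`: the dict-based document shape and the helpers `load_csv_rows`, `load_text` and `merge_documents` are hypothetical stand-ins for the output of `DocumentLoaderOperator` and for the merging `@task`.

```python
import csv
import io


def load_csv_rows(text: str, source: str) -> list[dict]:
    """Turn each CSV row into one document tagged with its source (hypothetical shape)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "text": ", ".join(f"{key}: {value}" for key, value in row.items()),
            "metadata": {"source": source, "format": "csv"},
        }
        for row in reader
    ]


def load_text(text: str, source: str) -> list[dict]:
    """Treat a whole plain-text file as a single tagged document."""
    return [{"text": text.strip(), "metadata": {"source": source, "format": "txt"}}]


def merge_documents(*batches: list[dict]) -> list[dict]:
    """Flatten the per-loader batches into one list for the embedding step."""
    return [doc for batch in batches for doc in batch]


csv_docs = load_csv_rows("ticker,sector\nACME,Industrials\n", source="companies.csv")
txt_docs = load_text("ACME reported higher revenue.\n", source="notes.txt")
merged = merge_documents(csv_docs, txt_docs)

for doc in merged:
    print(doc["metadata"]["format"], "|", doc["text"])
```

Keeping the origin in `metadata` rather than in the text means a retriever can later filter or cite by source without re-parsing the content.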
Two example Dags demonstrating multi-company financial research
with LlamaIndex RAG and Dynamic Task Mapping:
- Indexing DAG (weekly): builds per-company vector indexes via
  LlamaIndexEmbeddingOperator.partial().expand_kwargs()
- Analysis DAG (manual): HITLEntryOperator -> @task.llm with structured
  output (DecomposedQuestion) -> LlamaIndexRetrievalOperator.partial().expand_kwargs()
  fanned out per sub-question -> LLMOperator with UsageLimits + AnalysisReport
  schema -> ApprovalOperator
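Both Dags rely on the same mechanism: `expand_kwargs()` turns a list of dicts that is only known at runtime into one mapped task instance per dict, with each dict combined with the shared `partial()` arguments. The plain-Python simulation here mimics that shape without calling Airflow. The `DecomposedQuestion` dataclass is a stdlib stand-in for the PR's Pydantic model, and its `sub_questions` field, the `expand_kwargs` helper, and the `similarity_top_k`, `query`, `chunk_size` and `company` keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DecomposedQuestion:
    """Stdlib stand-in for the PR's Pydantic model; field names are hypothetical."""

    original: str
    sub_questions: list[str] = field(default_factory=list)


def expand_kwargs(partial_kwargs: dict, mapped: list[dict]) -> list[dict]:
    """Simulate fan-out: return one merged kwargs dict per mapped entry."""
    calls = []
    for entry in mapped:
        overlap = partial_kwargs.keys() & entry.keys()
        if overlap:
            # This sketch rejects keys supplied both ways so merges stay unambiguous.
            raise ValueError(f"keys given in both partial and mapped: {sorted(overlap)}")
        calls.append({**partial_kwargs, **entry})
    return calls


# Indexing Dag: one mapped embedding task per company (N fixed by the data).
index_calls = expand_kwargs(
    {"chunk_size": 512},  # hypothetical shared setting
    [{"company": name} for name in ["Company A", "Company B"]],
)

# Analysis Dag: hypothetical structured LLM output; N is only known at runtime.
decomposed = DecomposedQuestion(
    original="How do Company A and Company B compare on liquidity risk?",
    sub_questions=[
        "What liquidity risks does Company A disclose?",
        "What liquidity risks does Company B disclose?",
        "How do their debt maturities compare?",
    ],
)
retrieval_calls = expand_kwargs(
    {"similarity_top_k": 5},  # hypothetical shared retrieval setting
    [{"query": q} for q in decomposed.sub_questions],
)

for call in index_calls + retrieval_calls:
    print(call)
```

The design point is that the Dag author never hard-codes N: the length of the list returned upstream (companies, or the LLM's sub-questions) decides how many mapped tasks run.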
Showcases Dynamic Task Mapping where the number of mapped tasks (N) is
decided by the LLM at runtime, human-in-the-loop (HITL) steps at both ends
of the analysis, structured output via Pydantic models, and token budget
caps. Includes five fictional companies with inline 10-K text so the Dags
run self-contained.
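A token budget cap bounds the cost of a run in which the LLM, not the Dag author, decides how much work to fan out. The sketch here is a hypothetical, stdlib-only illustration of the idea (a running total checked against a ceiling); it is not the `UsageLimits` API the PR uses, and the `TokenBudget` class, `TokenBudgetExceeded` exception and their field names are made up for this example.

```python
from dataclasses import dataclass


class TokenBudgetExceeded(RuntimeError):
    """Raised when cumulative token usage passes the cap."""


@dataclass
class TokenBudget:
    # Hypothetical ceiling; the PR configures its limits through UsageLimits instead.
    total_tokens_limit: int
    used: int = 0

    def charge(self, input_tokens: int, output_tokens: int) -> int:
        """Record one LLM call's usage and return the tokens still available."""
        self.used += input_tokens + output_tokens
        if self.used > self.total_tokens_limit:
            raise TokenBudgetExceeded(
                f"used {self.used} tokens, limit is {self.total_tokens_limit}"
            )
        return self.total_tokens_limit - self.used


budget = TokenBudget(total_tokens_limit=10_000)
remaining = budget.charge(input_tokens=3_000, output_tokens=1_200)
print(remaining)  # 5800
```

In this sketch a call that pushes the total past the ceiling raises instead of returning, so a run fails fast rather than silently overspending.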
…airflow into aip99-llamaindex-example
reverting changes from last commit
Was generative AI tooling used to co-author this PR?
Generated-by: [Claude] following the guidelines