Add SEC 10-K financial analysis example DAG for LlamaIndex #67615

Draft
vikramkoka wants to merge 6 commits into main from aip99-llamaindex-example

@vikramkoka
Contributor

Two example Dags demonstrating multi-company financial research
with LlamaIndex RAG and Dynamic Task Mapping:

  • Indexing DAG (weekly): builds per-company vector indexes via
    LlamaIndexEmbeddingOperator.partial().expand_kwargs()
  • Analysis DAG (manual): HITLEntryOperator -> @task.llm with structured
    output (DecomposedQuestion) -> LlamaIndexRetrievalOperator.partial().expand_kwargs()
    fanned out per sub-question -> LLMOperator with UsageLimits + AnalysisReport
    schema -> ApprovalOperator

Showcases Dynamic Task Mapping (DTM) where the number of mapped tasks (N)
is decided by the LLM at runtime, human-in-the-loop (HITL) steps at the
start and end of the pipeline, structured output via Pydantic models, and
token budget caps. Includes five fictional companies with inline 10-K text
for self-contained execution.
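
For reviewers less familiar with the `.partial().expand_kwargs()` shape, here is a rough plain-Python sketch of the fan-out idea. It is not the code in this PR and does not use Airflow or LlamaIndex. The `decompose` and `retrieve` functions, the `sub_questions` field, the `top_k` parameter, and the company names are all hypothetical stand-ins. A dataclass stands in for the Pydantic model, and `functools.partial` stands in for the operator's fixed arguments.

```python
from dataclasses import dataclass
from functools import partial


@dataclass
class DecomposedQuestion:
    """Stand-in for the structured-output model; the field name is assumed."""
    sub_questions: list[dict[str, str]]


def decompose(question: str, companies: list[str]) -> DecomposedQuestion:
    """Hypothetical stand-in for the LLM step: emits one sub-question per company."""
    return DecomposedQuestion(
        sub_questions=[{"company": c, "query": f"{question} ({c})"} for c in companies]
    )


def retrieve(*, top_k: int, company: str, query: str) -> str:
    """Hypothetical stand-in for a single mapped retrieval task instance."""
    return f"top {top_k} chunks from the {company} index for: {query}"


def expand_kwargs(task, kwargs_list):
    """Mimic the mapping semantics: one call per dict, each dict supplying per-instance kwargs."""
    return [task(**kwargs) for kwargs in kwargs_list]


# Placeholder company names, not the fictional companies shipped with the example.
plan = decompose("How did revenue change?", ["Acme", "Globex", "Initech"])

# Fixed arguments are bound once; the per-instance arguments come from the plan,
# so the number of calls is only known after decompose() has run.
results = expand_kwargs(partial(retrieve, top_k=3), plan.sub_questions)

for line in results:
    print(line)
```

The point of the sketch is that the length of `plan.sub_questions` sets the number of retrieval calls, which is the property the analysis Dag relies on when the LLM decides N.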


Was generative AI tooling used to co-author this PR?
  • [x] Yes (please specify the tool below)
    Generated-by: [Claude] following the guidelines

vikramkoka and others added 6 commits May 19, 2026 16:21
- Adds `example_llamaindex_rag.py` with three example DAGs demonstrating RAG patterns using the new LlamaIndex operators
  - **Full RAG pipeline**: DocumentLoaderOperator → EmbeddingOperator → RetrievalOperator → LLMOperator
  - **Separate index/query DAGs**: weekly PDF indexing DAG + on-demand parameterized query DAG (production pattern)
  - **Multi-source RAG**: combines CSV and text files with metadata tagging, merges via @task, then embeds

  ## Dependencies

  Requires PR #67120 (DocumentLoaderOperator) and PR #67121 (LlamaIndex operators) to merge first.

  ## Test plan

  - [ ] Verify DAG file parses without errors after dependency PRs merge
  - [ ] Verify all three DAGs appear in the Airflow UI
  - [ ] Test full RAG pipeline end-to-end with sample text files and an OpenAI connection
  - [ ] Test parameterized query DAG with custom `question` parameter
Two example Dags demonstrating multi-company financial research
  with LlamaIndex RAG and Dynamic Task Mapping:

  - Indexing DAG (weekly): builds per-company vector indexes via
    LlamaIndexEmbeddingOperator.partial().expand_kwargs()
  - Analysis DAG (manual): HITLEntryOperator -> @task.llm with structured
    output (DecomposedQuestion) -> LlamaIndexRetrievalOperator.partial().expand_kwargs()
    fanned out per sub-question -> LLMOperator with UsageLimits + AnalysisReport
    schema -> ApprovalOperator

  Showcases DTM where N is decided by the LLM at runtime, HITL bookends,
  structured output via Pydantic models, and token budget caps. Includes
  five fictional companies with inline 10-K text for self-contained execution.
reverting changes from last commit