CSALLAHBAKASH/enterprise-data-agent

Enterprise Data Pipeline AI Agent

AI-powered agent for database schema discovery, natural language querying, pipeline generation, and data quality monitoring — built as a Forward Deployed Engineer portfolio project.

What This Does

This is the kind of tool a Forward Deployed Engineer (FDE) would build on day one at a customer site. Connect it to a PostgreSQL database and it will:

  1. Discover Schema — Introspect tables, columns, relationships, and data volumes. Generate a stakeholder-ready summary with data quality assessment.
  2. Ask Your Data — Ask questions in plain English. The agent writes SQL, executes it safely, and returns a human-readable answer.
  3. Generate Pipelines — Describe what data pipeline you need in natural language. The agent generates a production-quality Apache Airflow DAG with error handling, retries, and quality checks.
  4. Monitor Data Quality — Run comprehensive checks for nulls, duplicates, empty tables, and schema drift. Get severity-rated reports.
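The "executes it safely" step in item 2 could be approached in several ways. The following is a minimal sketch of one of them, a conservative read-only check applied to generated SQL before execution. The function name `is_read_only` and the keyword deny-list are assumptions for illustration, not the project's actual rules.

```python
import re

# Keywords that modify data or schema. An assumed deny-list for illustration,
# not the rule set used by the project.
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter", "truncate",
    "create", "grant", "revoke", "copy", "merge", "call", "into",
}


def is_read_only(sql: str) -> bool:
    """Return True only if `sql` looks like a single SELECT/WITH statement.

    The check is deliberately conservative: a forbidden keyword anywhere in
    the text (even inside a string literal or comment) rejects the query.
    """
    text = sql.strip().rstrip(";").strip()
    if not text or ";" in text:
        return False  # empty input, or more than one statement
    words = re.findall(r"[a-z_][a-z0-9_]*", text.lower())
    if not words or words[0] not in {"select", "with"}:
        return False
    return FORBIDDEN.isdisjoint(words)


print(is_read_only("SELECT name FROM customers LIMIT 5;"))
print(is_read_only("SELECT 1; DELETE FROM orders"))
```

A keyword filter like this cannot detect functions with side effects, so it works best as a second line of defense alongside a database role that has only SELECT privileges.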

Architecture

┌─────────────────────────────────────────────────────┐
│                  Streamlit Frontend                   │
│            (Schema | Query | Pipeline | Quality)      │
└──────────────────────┬──────────────────────────────┘
                       │ HTTP
┌──────────────────────▼──────────────────────────────┐
│                   FastAPI Backend                      │
│                  /api/v1/schema                        │
│                  /api/v1/query                         │
│                  /api/v1/pipeline                      │
│                  /api/v1/quality                       │
│                  /metrics (Prometheus)                 │
└──────┬───────────────┬───────────────┬──────────────┘
       │               │               │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│  LangGraph  │ │ PostgreSQL  │ │ Anthropic   │
│  Agents     │ │ Database    │ │ Claude API  │
└─────────────┘ └─────────────┘ └─────────────┘

Tech Stack

  • Python 3.11+ — Primary language
  • FastAPI — REST API framework
  • LangGraph — Agent orchestration (4 agents, each a multi-step graph)
  • Anthropic Claude — LLM for SQL generation, summaries, and pipeline code
  • PostgreSQL — Sample enterprise database
  • Streamlit — Demo frontend
  • Prometheus + Grafana — Monitoring and dashboards
  • Docker — Containerized deployment

Quick Start

1. Clone and configure

cd enterprise-data-agent
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY

2. Run with Docker Compose

docker compose up --build

This starts:

  • PostgreSQL (port 5432) — seeded with sample e-commerce data
  • FastAPI API (port 8000) — agent endpoints + Swagger docs at /docs
  • Streamlit (port 8501) — interactive demo frontend
  • Prometheus (port 9090) — metrics collection
  • Grafana (port 3000) — dashboards (admin/admin)
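To confirm that the stack came up, a quick TCP check against the ports listed above can help. This is a small sketch using only the standard library; the host `localhost` and the helper names are assumptions for a local Docker Compose run.

```python
import socket

# Ports taken from the service list above; the host is assumed to be local.
SERVICES = {
    "PostgreSQL": 5432,
    "FastAPI": 8000,
    "Streamlit": 8501,
    "Prometheus": 9090,
    "Grafana": 3000,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Calling `check_stack()` returns a dictionary such as `{"PostgreSQL": True, ...}`. An open port only shows that something is listening; it does not prove the service is healthy.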

3. Try it

Open the Streamlit frontend (port 8501) and try example queries such as:

  • "What are the top 5 customers by total order value?"
  • "How many orders were placed per region in 2025?"
  • "Which products have never been ordered?"
  • "Show me the monthly revenue trend"
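The same questions could also be sent straight to the API. The sketch below builds a POST request for `/api/v1/query` on port 8000. The HTTP method, the JSON field name `question`, and the `X-API-Key` header are assumptions; the Swagger docs at `/docs` describe the actual request schema.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://localhost:8000/api/v1"  # port and path prefix from this README


def build_query_request(question: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for the query endpoint.

    The payload field and the auth header name are hypothetical.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # hypothetical header name
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/query", data=body, headers=headers, method="POST"
    )


def ask(question: str, api_key: Optional[str] = None, timeout: float = 60.0):
    """Send the question and decode the JSON response (its shape is not specified here)."""
    request = build_query_request(question, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, `ask("Which products have never been ordered?")` would return whatever JSON the endpoint produces.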

Project Structure

enterprise-data-agent/
├── src/
│   ├── agents/                    # LangGraph agent implementations
│   │   ├── schema_discovery.py    # Schema introspection + quality assessment
│   │   ├── nl_to_sql.py          # Natural language → SQL → answer
│   │   ├── pipeline_generator.py  # Requirement → Airflow DAG
│   │   └── data_quality.py       # Null/duplicate/drift checks
│   ├── api/
│   │   ├── main.py               # FastAPI app
│   │   └── routes.py             # All API endpoints + Prometheus metrics
│   ├── core/
│   │   ├── config.py             # Pydantic settings from env vars
│   │   ├── database.py           # SQLAlchemy connection + introspection
│   │   └── llm.py                # Anthropic Claude client
│   └── evaluation/
│       ├── evals.py              # Agent accuracy evaluation framework
│       └── metrics.py            # Prometheus metric definitions
├── frontend/
│   └── app.py                    # Streamlit UI (4 tabs)
├── scripts/
│   └── seed.sql                  # Sample e-commerce database
├── monitoring/
│   └── prometheus.yml            # Prometheus scrape config
├── docker-compose.yml            # Full stack: DB + API + UI + monitoring
├── Dockerfile
└── pyproject.toml
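To give a feel for what the null and duplicate checks in `data_quality.py` might involve, here is a self-contained sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the function name, table, and severity thresholds are illustrative assumptions. Schema drift detection is not covered.

```python
import sqlite3


def check_table(conn: sqlite3.Connection, table: str, key: str, columns: list[str]) -> list[dict]:
    """Run empty-table, null, and duplicate-key checks and return the findings.

    Identifiers are interpolated into the SQL, so pass only trusted names.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if total == 0:
        return [{"check": "empty_table", "target": table, "severity": "high"}]

    findings = []
    for col in columns:
        nulls = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        ).fetchone()[0]
        if nulls:
            rate = nulls / total
            # Illustrative thresholds, not the project's.
            severity = "high" if rate > 0.5 else "medium" if rate > 0.1 else "low"
            findings.append({"check": "nulls", "target": f"{table}.{col}",
                             "count": nulls, "severity": severity})

    # Count key values that occur more than once.
    dupes = conn.execute(
        f"SELECT COUNT(*) FROM "
        f"(SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1) AS d"
    ).fetchone()[0]
    if dupes:
        findings.append({"check": "duplicates", "target": f"{table}.{key}",
                         "count": dupes, "severity": "high"})
    return findings


# Small demo table with one null per column and one duplicated id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "a@example.com", "EU"),
    (2, None, "US"),
    (2, "c@example.com", None),
    (3, "d@example.com", "US"),
])
report = check_table(conn, "customers", key="id", columns=["email", "region"])
for finding in report:
    print(finding)
```

The same queries should carry over to PostgreSQL largely unchanged, since they rely only on `COUNT`, `GROUP BY`, and `HAVING`.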

Why This Project Demonstrates FDE Skills

FDE Skill                  How This Project Shows It
Customer-facing tool       Streamlit UI that a customer could use directly
Data pipeline expertise    Schema introspection, Airflow DAG generation
AI agent orchestration     4 LangGraph agents with multi-step workflows
Production patterns        API auth, error handling, Prometheus metrics
Enterprise integration     Database connectivity, SQL safety, quality gates
Observability              Prometheus + Grafana monitoring stack
Documentation              Architecture docs, demo script, clean README

Author

Allah Bakash C S — Forward Deployed Engineer | Cloud Data & AI
