AI-powered agent for database schema discovery, natural language querying, pipeline generation, and data quality monitoring, built as a Forward Deployed Engineer (FDE) portfolio project.
This is the tool an FDE would build on Day 1 at a customer site. Connect it to any PostgreSQL database and it will:
- Discover Schema — Introspect tables, columns, relationships, and data volumes. Generate a stakeholder-ready summary with data quality assessment.
- Ask Your Data — Ask questions in plain English. The agent writes SQL, executes it safely, and returns a human-readable answer.
- Generate Pipelines — Describe what data pipeline you need in natural language. The agent generates a production-quality Apache Airflow DAG with error handling, retries, and quality checks.
- Monitor Data Quality — Run comprehensive checks for nulls, duplicates, empty tables, and schema drift. Get severity-rated reports.
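The null and duplicate checks mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's `data_quality.py`: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the function names, the `customers` table, and the severity thresholds are all assumptions made for the example.

```python
import sqlite3


def null_rate(conn, table, column):
    """Fraction of rows where `column` is NULL (0.0 for an empty table)."""
    # Identifiers are interpolated into the SQL, so only pass trusted names
    # (for example, ones obtained from schema introspection).
    total, nulls = conn.execute(
        f'SELECT COUNT(*), COUNT(*) - COUNT("{column}") FROM "{table}"'
    ).fetchone()
    return nulls / total if total else 0.0


def duplicate_count(conn, table, column):
    """Number of rows that repeat an earlier non-NULL value in `column`."""
    non_null, distinct = conn.execute(
        f'SELECT COUNT("{column}"), COUNT(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()
    return non_null - distinct


def severity(rate, warn=0.05, critical=0.25):
    """Map a null rate to a severity label (thresholds are hypothetical)."""
    if rate >= critical:
        return "critical"
    if rate >= warn:
        return "warning"
    return "ok"


# Tiny in-memory table standing in for a real PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "a@example.com"), (4, "b@example.com")],
)

email_null_rate = null_rate(conn, "customers", "email")
report = {
    "null_rate": email_null_rate,
    "duplicates": duplicate_count(conn, "customers", "email"),
    "severity": severity(email_null_rate),
}
print(report)
```

The two queries use only `COUNT`, so the same SQL should also run on PostgreSQL; the connection setup is the part that would differ.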
┌─────────────────────────────────────────────────────┐
│                 Streamlit Frontend                  │
│        (Schema | Query | Pipeline | Quality)        │
└──────────────────────┬──────────────────────────────┘
                       │ HTTP
┌──────────────────────▼──────────────────────────────┐
│                   FastAPI Backend                   │
│                  /api/v1/schema                     │
│                  /api/v1/query                      │
│                  /api/v1/pipeline                   │
│                  /api/v1/quality                    │
│                  /metrics (Prometheus)              │
└──────┬───────────────┬───────────────┬──────────────┘
       │               │               │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│  LangGraph  │ │ PostgreSQL  │ │  Anthropic  │
│   Agents    │ │  Database   │ │ Claude API  │
└─────────────┘ └─────────────┘ └─────────────┘
- Python 3.11+ — Primary language
- FastAPI — REST API framework
- LangGraph — Agent orchestration (4 agents, each a multi-step graph)
- Anthropic Claude — LLM for SQL generation, summaries, and pipeline code
- PostgreSQL — Sample enterprise database
- Streamlit — Demo frontend
- Prometheus + Grafana — Monitoring and dashboards
- Docker — Containerized deployment
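To make "each a multi-step graph" concrete, here is a plain-Python sketch of the general pattern: named steps that each read and update a shared state, connected by edges. It deliberately does not use the LangGraph API, and the step names, state keys, and the stubbed SQL generator are hypothetical; the project's actual graphs may be structured differently.

```python
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, object]
Step = Callable[[State], State]


def generate_sql(state: State) -> State:
    # Stub standing in for an LLM call that turns the question into SQL.
    return {**state, "sql": "SELECT COUNT(*) FROM orders"}


def validate_sql(state: State) -> State:
    # Very rough guard: only statements starting with SELECT count as valid.
    sql = str(state["sql"]).strip().lower()
    return {**state, "valid": sql.startswith("select")}


def summarize(state: State) -> State:
    status = "ready to execute" if state["valid"] else "rejected"
    return {**state, "answer": f"Query {status}: {state['sql']}"}


# Each node maps to (step to run, name of the next node); None means stop.
GRAPH: Dict[str, Tuple[Step, Optional[str]]] = {
    "generate_sql": (generate_sql, "validate_sql"),
    "validate_sql": (validate_sql, "summarize"),
    "summarize": (summarize, None),
}


def run_graph(graph, entry: str, state: State) -> State:
    """Walk the graph from `entry`, threading the state through each step."""
    node: Optional[str] = entry
    while node is not None:
        step, node = graph[node]
        state = step(state)
    return state


final_state = run_graph(GRAPH, "generate_sql", {"question": "How many orders?"})
print(final_state["answer"])
```

Keeping each step a pure function of the state makes individual steps easy to test in isolation, which is one reason graph-style orchestration is a common choice for agents.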
cd enterprise-data-agent
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY
docker compose up --build
This starts:
- PostgreSQL (port 5432) — seeded with sample e-commerce data
- FastAPI API (port 8000) — agent endpoints + Swagger docs at /docs
- Streamlit (port 8501) — interactive demo frontend
- Prometheus (port 9090) — metrics collection
- Grafana (port 3000) — dashboards (admin/admin)
- Open http://localhost:8501 for the Streamlit UI
- Open http://localhost:8000/docs for the API docs
Example questions to try:
- "What are the top 5 customers by total order value?"
- "How many orders were placed per region in 2025?"
- "Which products have never been ordered?"
- "Show me the monthly revenue trend"
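Questions like these end up as generated SQL, so executing it safely usually implies some guard before the query runs. The following is a minimal sketch of one such guard, assuming a read-only policy; the function name and keyword list are hypothetical, and this is not the project's actual check.

```python
import re

# Keywords that indicate a write or DDL statement (illustrative, not exhaustive).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True if `sql` looks like a single SELECT/WITH statement."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False  # must start with SELECT or WITH
    return FORBIDDEN.search(statement) is None


print(is_read_only("SELECT name FROM customers ORDER BY total DESC LIMIT 5;"))
print(is_read_only("DROP TABLE customers"))
print(is_read_only("SELECT 1; DELETE FROM orders"))
```

A keyword filter like this is deliberately conservative (it would also reject a harmless query containing the string literal 'delete'), and it is a heuristic rather than a guarantee. In practice it would typically be paired with a database role that only has read permissions.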
enterprise-data-agent/
├── src/
│ ├── agents/ # LangGraph agent implementations
│ │ ├── schema_discovery.py # Schema introspection + quality assessment
│ │ ├── nl_to_sql.py # Natural language → SQL → answer
│ │ ├── pipeline_generator.py # Requirement → Airflow DAG
│ │ └── data_quality.py # Null/duplicate/drift checks
│ ├── api/
│ │ ├── main.py # FastAPI app
│ │ └── routes.py # All API endpoints + Prometheus metrics
│ ├── core/
│ │ ├── config.py # Pydantic settings from env vars
│ │ ├── database.py # SQLAlchemy connection + introspection
│ │ └── llm.py # Anthropic Claude client
│ └── evaluation/
│ ├── evals.py # Agent accuracy evaluation framework
│ └── metrics.py # Prometheus metric definitions
├── frontend/
│ └── app.py # Streamlit UI (4 tabs)
├── scripts/
│ └── seed.sql # Sample e-commerce database
├── monitoring/
│ └── prometheus.yml # Prometheus scrape config
├── docker-compose.yml # Full stack: DB + API + UI + monitoring
├── Dockerfile
└── pyproject.toml
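As an illustration of the "settings from env vars" idea noted for `config.py`, the sketch below loads configuration from the environment using only the standard library. The project itself uses Pydantic for this; the dataclass here is a stand-in, and every name other than `ANTHROPIC_API_KEY` (including `DATABASE_URL` and its default value) is hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    database_url: str  # hypothetical setting name


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, failing fast if the API key is missing."""
    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; add it to .env")
    return Settings(
        anthropic_api_key=api_key,
        # Placeholder default; the real connection string is not given here.
        database_url=env.get("DATABASE_URL", "postgresql://localhost:5432/postgres"),
    )


# Passing a dict instead of os.environ keeps the example self-contained.
settings = load_settings({"ANTHROPIC_API_KEY": "sk-example"})
print(settings.database_url)
```

Failing at startup when the key is missing tends to surface configuration mistakes earlier than letting the first LLM call fail.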
| FDE Skill | How This Project Shows It |
|---|---|
| Customer-facing tool | Streamlit UI that a customer could use directly |
| Data pipeline expertise | Schema introspection, Airflow DAG generation |
| AI agent orchestration | 4 LangGraph agents with multi-step workflows |
| Production patterns | API auth, error handling, Prometheus metrics |
| Enterprise integration | Database connectivity, SQL safety, quality gates |
| Observability | Prometheus + Grafana monitoring stack |
| Documentation | Architecture docs, demo script, clean README |
Allah Bakash C S — Forward Deployed Engineer | Cloud Data & AI