A production-grade Retrieval Augmented Generation (RAG) chatbot built with FastAPI, ChromaDB, and Claude AI. This system answers questions based on your custom knowledge base with lightning-fast responses (1-3 seconds).
- ⚡ Lightning Fast: 1-3 second responses using Claude Haiku
- 🎯 Accurate Answers: RAG architecture grounds responses in your documents, reducing hallucinations
- 📚 Custom Knowledge Base: Add your own documents
- 🐳 Fully Dockerized: One-command deployment
- 🔍 Semantic Search: ChromaDB vector database
- 🎨 Beautiful UI: Real-time chat interface
- 💰 Cost Efficient: ~$0.0004 per query
```
┌────────────────────────────────────────────────┐
│                 User Interface                 │
│             (React/HTML Frontend)              │
└───────────────────────┬────────────────────────┘
                        │
┌───────────────────────▼────────────────────────┐
│                FastAPI Backend                 │
│          (RAG Logic & Orchestration)           │
└──────┬──────────────────────────┬──────────────┘
       │                          │
       ▼                          ▼
┌──────────────┐         ┌──────────────────┐
│   ChromaDB   │         │   Claude Haiku   │
│ (Vector DB)  │         │   (Anthropic)    │
└──────────────┘         └──────────────────┘
```
- User asks a question → Frontend sends it to the API
- API queries ChromaDB → Retrieves relevant document chunks
- Context + question sent to Claude → Generates a grounded answer
- Response returned → Displayed in the UI
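The retrieve-then-generate step can be sketched in a few lines. This is illustrative, not the actual `api/main.py` code: `build_prompt` is a hypothetical helper, and the commented client calls show the assumed shape of the ChromaDB and Anthropic SDK usage.

```python
def build_prompt(chunks, question):
    """Combine retrieved document chunks and the user question into a grounded prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# In the running service (assumed shape; see api/main.py for the real code):
#   results = collection.query(query_texts=[question], n_results=3)
#   prompt = build_prompt(results["documents"][0], question)
#   reply = client.messages.create(
#       model="claude-3-haiku-20240307",
#       max_tokens=500,
#       messages=[{"role": "user", "content": prompt}],
#   )
```

Grounding the model in retrieved context this way is what keeps answers tied to your documents rather than the model's general training data.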
- Docker Desktop installed and running
- Anthropic API key (Get one here)
- 8GB+ RAM recommended
- Windows, Mac, or Linux
- Clone the repository

  ```bash
  git clone <your-repo-url>
  cd rag-chatbot-lab
  ```

- Set up environment variables

  Create a `.env` file in the project root:

  ```
  ANTHROPIC_API_KEY=sk-ant-api-YOUR-KEY-HERE
  ```

- Add your documents

  Place `.txt` files in the `data/documents/` folder:

  ```
  data/
  └── documents/
      ├── sample.txt
      ├── your-doc-1.txt
      └── your-doc-2.txt
  ```

- Start the application

  ```bash
  docker-compose up -d
  ```

  Wait for services to initialize (~30 seconds).

- Open your browser

  Navigate to: http://localhost:3000

That's it! Start chatting with your AI assistant! 🚀
```
rag-chatbot-lab/
├── api/                    # FastAPI backend
│   ├── Dockerfile
│   ├── main.py             # Core application logic
│   └── requirements.txt    # Python dependencies
├── frontend/               # Chat interface
│   └── index.html          # Single-page application
├── data/
│   └── documents/          # Your knowledge base (add .txt files here)
│       └── sample.txt
├── docker-compose.yml      # Container orchestration
├── .env                    # Environment variables (create this)
└── README.md               # You are here
```
| Variable | Description | Required | Default |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | Claude API key | Yes | - |
| `CHROMA_HOST` | ChromaDB hostname | No | `chromadb` |
| `CHROMA_PORT` | ChromaDB port | No | `8000` |
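A minimal sketch of how the backend might read these variables with the defaults from the table. The `load_settings` helper is illustrative, not the actual `main.py` code:

```python
import os

def load_settings(env=os.environ):
    """Read configuration, applying the same defaults as the table above."""
    key = env.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is required (set it in .env)")
    return {
        "anthropic_api_key": key,
        "chroma_host": env.get("CHROMA_HOST", "chromadb"),
        "chroma_port": int(env.get("CHROMA_PORT", "8000")),
    }
```

The `chromadb` default hostname matches the service name in `docker-compose.yml`, which is how containers resolve each other on the Compose network.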
By default, the system uses Claude Haiku for optimal speed and cost.
To change models, edit `api/main.py` (line ~180):

```python
# Available models:
model="claude-3-haiku-20240307",   # Fast & cheap (current)
model="claude-3-sonnet-20240229",  # Balanced quality/speed
model="claude-3-opus-20240229",    # Highest quality
```

To adjust response length, edit `api/main.py` (line ~180):

```python
max_tokens=500,   # Default: medium-length responses
max_tokens=200,   # Short, concise answers
max_tokens=1000,  # Detailed, comprehensive answers
```

```
GET http://localhost:8080/health
```

Returns system status (API, Claude, ChromaDB).
```
POST http://localhost:8080/chat
Content-Type: application/json

{
  "message": "What is this lab about?"
}
```

```
POST http://localhost:8080/ingest
Content-Type: application/json

{
  "text": "Your document content here...",
  "metadata": {
    "source": "my-doc",
    "type": "guide"
  }
}
```

```
GET http://localhost:8080/collections
```

```bash
# Health check
curl http://localhost:8080/health

# Chat test
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"Hello, what can you help me with?"}'
```

- Open http://localhost:3000
- Type a question: "What is this innovation lab about?"
- Press Send
- You should get a response in 1-3 seconds ⚡
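The same chat test can be driven from Python with only the standard library. A sketch, assuming the default port above; `build_chat_request` is a hypothetical helper name:

```python
import json
import urllib.request

def build_chat_request(message, base_url="http://localhost:8080"):
    """Prepare a POST /chat request matching the curl example above."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the stack running:
#   with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#       print(json.load(resp))
```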
- Add `.txt` files to `data/documents/`
- Restart the API: `docker-compose restart api`
- Documents are automatically loaded on startup
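The automatic load on startup presumably walks `data/documents/` for `.txt` files. A minimal sketch of that behavior; `load_documents` is an illustrative name, not the actual `main.py` function:

```python
from pathlib import Path

def load_documents(folder="data/documents"):
    """Read every .txt file in the folder, keyed by its filename stem."""
    docs = []
    for path in sorted(Path(folder).glob("*.txt")):
        docs.append({"id": path.stem, "text": path.read_text(encoding="utf-8")})
    return docs
```

Each returned document would then be chunked and embedded into ChromaDB; using the filename stem as the ID keeps re-ingestion idempotent per file.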
```bash
curl -X POST http://localhost:8080/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your content here...",
    "metadata": {"source": "api-upload"}
  }'
```

- Plain text (`.txt`) files
- UTF-8 encoding
- Any length (automatically chunked)
- Technical docs, guides, notes, articles, etc.
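The exact chunking strategy in `api/main.py` isn't shown here, so this is a generic sketch of one common approach: fixed-size character chunks with a small overlap so sentences that straddle a boundary appear in both neighbors.

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into fixed-size character chunks with a small overlap."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

The overlap trades a little storage for better retrieval: a fact split across a chunk boundary is still fully contained in at least one chunk.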
**Containers won't start**

Solution: Make sure Docker Desktop is running.

```bash
# Check Docker status
docker --version
docker ps
```

**API key errors**

Solution: Check that the API key is set correctly.

```bash
# Verify API key in container (use grep instead of findstr on Mac/Linux)
docker exec rag-api printenv | findstr ANTHROPIC_API_KEY

# If blank, check that the .env file exists and restart
docker-compose down
docker-compose up -d
```

**ChromaDB connection errors**

Solution: Wait longer for startup, or restart ChromaDB.

```bash
# Restart ChromaDB
docker-compose restart chromadb
# Wait 20 seconds
docker-compose restart api
```

**Slow responses**

Solution: You might be using the wrong model.

- Verify you're using `claude-3-haiku-20240307`
- Check `api/main.py` line ~180
- Rebuild if changed: `docker-compose build api`
```bash
# View all logs
docker-compose logs

# Follow API logs
docker-compose logs -f api

# Follow ChromaDB logs
docker-compose logs -f chromadb
```

```bash
# Rebuild with latest packages
docker-compose build --no-cache api
docker-compose up -d
```

```bash
# Stop and remove all data
docker-compose down -v

# Start fresh
docker-compose up -d
```

```bash
# Backup documents
cp -r data/documents /path/to/backup/

# Backup ChromaDB data
docker cp chromadb:/chroma/chroma ./chroma-backup
```

Claude Haiku Pricing:
- Input: $0.25 per 1M tokens
- Output: $1.25 per 1M tokens
Average Query Cost:
- ~500 input tokens (context + question)
- ~200 output tokens (answer)
- Cost per query: ~$0.0004 ($0.000375 at these rates)

Example Usage:
- 1,000 queries: ~$0.38
- 10,000 queries: ~$3.75
- 100,000 queries: ~$37.50

Free Tier: Anthropic provides $5 in free credits (~13,000 queries)
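The per-query figure follows directly from the token prices above; a quick check of the arithmetic:

```python
# Claude Haiku prices from the section above, converted to $ per token.
INPUT_PRICE = 0.25 / 1_000_000   # $ per input token
OUTPUT_PRICE = 1.25 / 1_000_000  # $ per output token

def query_cost(input_tokens=500, output_tokens=200):
    """Cost of one query at the average token counts listed above."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
```

With the default averages this works out to $0.000375 per query, so output tokens dominate the cost even though there are fewer of them.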
| Metric | Value |
|---|---|
| Response Time | 1-3 seconds |
| Accuracy | High (RAG-grounded) |
| Concurrent Users | 10-50 (single instance) |
| Document Limit | Unlimited (chunked) |
| Memory Usage | ~2GB (all services) |
- Backend: Python 3.11, FastAPI
- AI Model: Claude Haiku (Anthropic)
- Vector DB: ChromaDB 0.4.24
- Frontend: HTML/JavaScript (vanilla)
- Containerization: Docker & Docker Compose
- Web Server: Nginx (Alpine)
```
fastapi==0.109.0      # Modern Python web framework
anthropic==0.18.1     # Claude API client
chromadb==0.4.24      # Vector database
uvicorn==0.27.0       # ASGI server
numpy==1.26.4         # Numerical computing
```

- ⚠️ API key stored in `.env` (add to `.gitignore`)
- ⚠️ CORS enabled for development (restrict in production)
- ⚠️ No authentication (add before production deployment)
- ✅ API keys never logged or exposed in responses
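To tighten the development-era CORS policy before deploying, FastAPI's `CORSMiddleware` can be limited to known origins. A configuration sketch; the origin, methods, and headers shown are placeholders to adapt, not the project's actual settings:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Replace a development wildcard with an explicit allow-list.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-domain.example"],  # placeholder origin
    allow_methods=["GET", "POST"],
    allow_headers=["Content-Type"],
)
```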
For production use, consider adding:
- User authentication (JWT, OAuth)
- Rate limiting (Redis)
- HTTPS/SSL (Let's Encrypt)
- Monitoring (Prometheus/Grafana)
- Auto-scaling (Kubernetes)
- Backup strategy (automated)
- Error tracking (Sentry)
- Analytics (Mixpanel, Amplitude)
This is a personal innovation lab project, but suggestions are welcome!
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
MIT License - feel free to use for personal or commercial projects.
- Anthropic for Claude AI
- ChromaDB team for the excellent vector database
- FastAPI community for the amazing framework
- Everyone who said "you can't do it" (thanks for the motivation 😄)
Issues? Check the Troubleshooting section first.
Questions? Open an issue or contact via LinkedIn.
```bash
# Start all services
docker-compose up -d

# Stop all services
docker-compose down

# View logs
docker-compose logs -f api

# Restart API only
docker-compose restart api

# Rebuild after code changes
docker-compose build api
docker-compose up -d

# Complete reset
docker-compose down -v
docker-compose up -d

# Check service status
docker-compose ps

# Access API container shell
docker exec -it rag-api bash
```

Built with ❤️ and 20+ years of IT experience
2026 Innovation Lab Project
If you found this useful, give it a ⭐ on GitHub!