Understand any codebase in minutes, not days.
AI-powered codebase understanding, onboarding, and hands-on learning for developers.
CodebaseQA is built for the moment you open an unfamiliar repository and need answers fast.
It gives you:
- Chat Q&A over real code context (RAG + source citations)
- Learning paths tailored by persona
- Interactive lessons with file-linked references and Mermaid diagrams
- Quizzes and coding challenges (bug hunt, code trace, fill-in-the-blank)
- Gamification (XP, levels, streaks, achievements, activity heatmap)
- Full-workspace dependency graph visualization with adaptive module-first overview, scoped drill-down, and PNG export
Use it from the web UI or from the CLI, depending on your workflow.
Typical workflow:
- Add a GitHub repository and let CodebaseQA index it.
- Ask natural-language questions and get answers with source-backed citations.
- Generate a persona-based curriculum and open any lesson.
- Practice with quizzes/challenges and track progress with XP, streaks, and achievements.
- Explore system structure in the dependency graph and export lesson tours for VS Code.
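Lesson tours use VS Code's CodeTour format: a JSON .tour file with a title and a list of steps. The sketch below builds a minimal tour from file/line references to show what an export contains. The lesson_to_tour helper, its input tuples, and the file paths are hypothetical and are not CodebaseQA's exporter. The field names follow the public CodeTour schema.

```python
import json
from typing import Iterable, Tuple


def lesson_to_tour(title: str, references: Iterable[Tuple[str, int, str]]) -> dict:
    """Build a minimal CodeTour document.

    `references` is a hypothetical list of (file_path, line, note) tuples;
    CodebaseQA's real exporter may derive steps differently.
    """
    steps = [
        # CodeTour line numbers are 1-based.
        {"file": path, "line": line, "description": note}
        for path, line, note in references
    ]
    return {
        "$schema": "https://aka.ms/codetour-schema",
        "title": title,
        "steps": steps,
    }


tour = lesson_to_tour(
    "Request lifecycle",
    [("src/main.py", 1, "App entry point"), ("src/routes.py", 12, "Route table")],
)
print(json.dumps(tour, indent=2))
```

CodeTour picks up .tour files from a .tours/ folder at the workspace root, so saving JSON like this there is enough for VS Code to list the tour.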
Who it's for:
- New team members onboarding into large codebases
- Engineering managers/leads accelerating ramp-up
- Developers trying to understand architecture and key execution paths
- Interview prep / self-learning on open-source repositories
Media pack guide: docs/media/README.md
The screenshot gallery uses the 10 numbered screenshots in docs/media/screenshots.
| Feature | Description |
|---|---|
| Repository Indexing | Clone/index GitHub repos with progress states (pending, cloning, parsing, embedding, completed, failed) |
| RAG Chat | Streaming responses with query expansion, hybrid retrieval, and source snippets |
| Semantic Search | Vector + keyword hybrid search with language/file filters |
| Learning Personas | New Hire, Security Auditor, Full Stack Dev, Archaeologist |
| Lesson Generation | AI-generated lesson markdown, code references, optional Mermaid diagram |
| Quiz Generation | Lesson-based multiple-choice quizzes |
| Challenges | Bug Hunt, Code Trace, Fill-in-the-Blank generation + validation |
| Gamification | XP rewards, 6 levels, streak tracking, achievements, dashboard analytics |
| Dependency Graph | Full-workspace intelligent graph with adaptive module overview, focus mode, progressive edge reveal, deterministic extraction, and PNG export |
| CodeTour Export | Export lesson content as VS Code CodeTour (.tour) |
| CLI Tooling | Index, ask, search, list, lessons, and CodeTour export from terminal |
| Demo Bootstrap | Seed a demo repository via API/UI (/api/repos/demo/seed) |
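Indexing moves through the states listed in the Repository Indexing row above, and a client that polls or streams progress mostly needs to know which states are final. This sketch assumes the API reports status as these exact lowercase strings; the IndexStatus enum and is_finished helper are illustrative and not part of CodebaseQA.

```python
from enum import Enum


class IndexStatus(str, Enum):
    # States listed in the Repository Indexing row of the feature table.
    PENDING = "pending"
    CLONING = "cloning"
    PARSING = "parsing"
    EMBEDDING = "embedding"
    COMPLETED = "completed"
    FAILED = "failed"


TERMINAL_STATES = frozenset({IndexStatus.COMPLETED, IndexStatus.FAILED})


def is_finished(status: str) -> bool:
    """True once indexing has stopped, successfully or not.

    Raises ValueError for a status string outside the known set.
    """
    return IndexStatus(status) in TERMINAL_STATES
```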
Prerequisites:
- Docker Desktop (recommended for the fastest setup)
- Or local dev: Node.js 20+ and Python 3.11+
- At least one supported LLM provider (OpenAI, Anthropic, or Ollama)
Quick start with Docker:
git clone https://github.com/ShreeBohara/codebaseqa.git
cd codebaseqa
cp .env.example .env
# Edit .env and add provider credentials (typically OPENAI_API_KEY)
./scripts/start-docker.sh
# Optional demo seed:
# ./scripts/start-docker.sh --with-demo
Endpoints after startup:
- Web UI: http://localhost:3000
- API docs: http://localhost:8000/docs
- Health check: http://localhost:8000/health
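Containers can take a while to come up, so a small polling loop against the health endpoint is one way to wait before opening the UI or running the CLI. This is a standard-library sketch: wait_for_api and fetch_json are hypothetical helpers, and the only assumption about /health is that it returns JSON (no particular fields). A non-2xx response makes urlopen raise, which the loop treats as "not ready yet".

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_api(
    base_url: str = "http://localhost:8000",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    fetch: Callable[[str], dict] = fetch_json,
) -> dict:
    """Poll /health until it answers, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    last_error: Optional[Exception] = None
    while time.monotonic() < deadline:
        try:
            return fetch(f"{base_url.rstrip('/')}/health")
        except Exception as exc:  # e.g. connection refused while containers start
            last_error = exc
            time.sleep(interval_s)
    raise TimeoutError(f"API not healthy after {timeout_s}s: {last_error}")
```

The fetch parameter exists so the loop can be exercised without a running server.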
Local development (without Docker):
# Install JS workspace deps once (from repo root)
pnpm install
# Terminal 1: API
cd apps/api
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn src.main:app --reload
# Terminal 2: Web (from repo root)
pnpm web:dev
# Optional (direct package command)
# cd apps/web
# pnpm dev
If the UI looks unstyled (plain text or a stacked layout), clear the Next.js build artifacts and restart:
rm -rf apps/web/.next
pnpm web:dev
Use pnpm web:dev as the canonical frontend start command.
CodebaseQA supports a public-safe runtime mode for hosted demos.
- Set DEMO_MODE=true to pin the deployment to one featured repository.
- The default featured repository is vercel/nextjs-subscription-payments (configurable with the DEMO_REPO_* env vars).
- In demo mode, repository import/delete can be disabled, and chat/learning endpoints apply soft rate limits.
- The frontend automatically shows a demo banner and hides destructive repository actions.
Key env vars:
- DEMO_MODE, SEED_DEMO
- DEMO_REPO_URL, DEMO_REPO_OWNER, DEMO_REPO_NAME, DEMO_REPO_BRANCH
- DEMO_ALLOW_PUBLIC_IMPORTS, DEMO_BANNER_TEXT, DEMO_BUSY_MODE
- DEMO_RATE_LIMIT_* knobs for soft guardrails
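To make the demo variables concrete, here is a standard-library sketch of reading a few of them into one settings object. Only the variable names and the default featured repository come from this README; the boolean parsing rule, the False default for DEMO_ALLOW_PUBLIC_IMPORTS, and the env_flag and DemoSettings names are assumptions, not the behavior of apps/api/src/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed spelling of "enabled"


def env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUTHY


@dataclass(frozen=True)
class DemoSettings:
    demo_mode: bool
    repo_owner: str
    repo_name: str
    repo_branch: Optional[str]
    allow_public_imports: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DemoSettings":
        return cls(
            demo_mode=env_flag(env, "DEMO_MODE"),
            # Default featured repo: vercel/nextjs-subscription-payments.
            repo_owner=env.get("DEMO_REPO_OWNER", "vercel"),
            repo_name=env.get("DEMO_REPO_NAME", "nextjs-subscription-payments"),
            repo_branch=env.get("DEMO_REPO_BRANCH"),  # None: leave it to the backend
            allow_public_imports=env_flag(env, "DEMO_ALLOW_PUBLIC_IMPORTS"),
        )
```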
For local validation:
pnpm web:dev
# in another terminal
cd apps/api && uvicorn src.main:app --reload
For a production prewarm after the first deploy:
pnpm demo:prewarm
CodebaseQA reads settings from environment variables (via apps/api/src/config.py).
Core settings:
| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | SQLAlchemy DB URL | sqlite:///./data/codebaseqa.db |
| CHROMA_PERSIST_DIR | Chroma storage path | ./data/chroma |
| REPOS_DIR | Cloned repository cache path | ./data/repos |
| GITHUB_TOKEN | Needed for private repos / higher API limits | unset |
| MAX_FILES_PER_REPO | Indexing cap per repository | 5000 |
| MAX_FILE_SIZE_KB | Skip files larger than this (KB) | 500 |
| DEBUG | API debug mode | false |
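As a rough picture of how the table maps to code, this standard-library sketch reads the same variables with the same defaults. It is not apps/api/src/config.py; the CoreSettings and exceeds_size_limit names, the boolean parsing for DEBUG, and treating 1 KB as 1024 bytes are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CoreSettings:
    # Defaults copied from the core settings table.
    database_url: str = "sqlite:///./data/codebaseqa.db"
    chroma_persist_dir: str = "./data/chroma"
    repos_dir: str = "./data/repos"
    github_token: Optional[str] = None
    max_files_per_repo: int = 5000
    max_file_size_kb: int = 500
    debug: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "CoreSettings":
        d = cls()  # instance holding the defaults
        return cls(
            database_url=env.get("DATABASE_URL", d.database_url),
            chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", d.chroma_persist_dir),
            repos_dir=env.get("REPOS_DIR", d.repos_dir),
            github_token=env.get("GITHUB_TOKEN") or None,
            max_files_per_repo=int(env.get("MAX_FILES_PER_REPO", d.max_files_per_repo)),
            max_file_size_kb=int(env.get("MAX_FILE_SIZE_KB", d.max_file_size_kb)),
            debug=env.get("DEBUG", "false").strip().lower() in TRUTHY,
        )


def exceeds_size_limit(size_bytes: int, settings: CoreSettings) -> bool:
    # Assumes 1 KB = 1024 bytes; the real indexer may round differently.
    return size_bytes > settings.max_file_size_kb * 1024
```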
LLM and embedding providers:
| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | LLM provider: openai, anthropic, or ollama | openai |
| EMBEDDING_PROVIDER | Embedding provider: openai or ollama | openai |
| OPENAI_API_KEY | OpenAI API key | unset |
| OPENAI_MODEL | OpenAI chat model | gpt-4o |
| OPENAI_EMBEDDING_MODEL | OpenAI embedding model | text-embedding-3-small |
| OPENAI_BASE_URL | OpenAI-compatible endpoint override | unset |
| OPENAI_EMBEDDING_MAX_TOKENS_PER_REQUEST | Max total tokens per embedding request batch | 250000 |
| OPENAI_EMBEDDING_MAX_TEXTS_PER_REQUEST | Max chunk count per embedding request batch | 128 |
| OPENAI_EMBEDDING_REQUEST_CONCURRENCY | Max concurrent embedding requests | 1 |
| OPENAI_EMBEDDING_MIN_SECONDS_BETWEEN_REQUESTS | Minimum delay between embedding requests (seconds) | 0.0 |
| OPENAI_EMBEDDING_RATE_LIMIT_MAX_RETRIES | Retry attempts for embedding HTTP 429 responses | 6 |
| OPENAI_EMBEDDING_RATE_LIMIT_BASE_BACKOFF_SECONDS | Base delay for exponential backoff on HTTP 429 (seconds) | 1.0 |
| OPENAI_EMBEDDING_RATE_LIMIT_MAX_BACKOFF_SECONDS | Maximum wait per retry on HTTP 429 (seconds) | 30.0 |
| ANTHROPIC_API_KEY | Anthropic API key | unset |
| ANTHROPIC_MODEL | Anthropic model | claude-sonnet-4-20250514 |
| OLLAMA_BASE_URL | Ollama host URL | http://localhost:11434 |
| OLLAMA_MODEL | Ollama generation model | llama3.1 |
| LOCAL_EMBEDDING_MODEL | Ollama embedding model name | nomic-ai/nomic-embed-text-v1.5 |
| LEARNING_V2_ENABLED | Enable the Learning V2 syllabus/lesson cache pipeline | false |
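The two batch caps (OPENAI_EMBEDDING_MAX_TEXTS_PER_REQUEST and OPENAI_EMBEDDING_MAX_TOKENS_PER_REQUEST) interact: a request has to close as soon as it would exceed either one. Below is a greedy, order-preserving sketch of that rule, assuming token counts are computed beforehand with a tokenizer matching the embedding model. batch_chunks is a hypothetical helper, not the indexer's code.

```python
from typing import List, Sequence


def batch_chunks(
    token_counts: Sequence[int],
    max_tokens: int = 250_000,
    max_texts: int = 128,
) -> List[List[int]]:
    """Group chunk indices into batches that respect both per-request caps.

    A new batch starts as soon as adding the next chunk would exceed the
    chunk-count cap or the token budget.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for idx, n in enumerate(token_counts):
        if n > max_tokens:
            raise ValueError(f"chunk {idx} has {n} tokens, above max_tokens={max_tokens}")
        if current and (len(current) >= max_texts or current_tokens + n > max_tokens):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n
    if current:
        batches.append(current)
    return batches
```

Raising on an oversized chunk is a design choice for the sketch; a real pipeline might split or truncate such chunks instead.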
Notes:
- Docker Compose currently passes OPENAI_API_KEY by default; to use Anthropic or Ollama in Docker, add those env vars in docker/docker-compose.yml.
- For local development, all variables above can be set directly in your shell or in .env.
- Large repositories can trigger temporary embedding rate limits (HTTP 429). Indexing retries automatically; if needed, tune the OPENAI_EMBEDDING_* controls above (batch size, pacing, and retry backoff).
- VOYAGE_API_KEY exists in config for future provider support, but EMBEDDING_PROVIDER currently supports only openai and ollama.
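With the defaults above (6 retries, 1.0 s base, 30.0 s cap), a plain doubling schedule waits 1, 2, 4, 8, 16, then 30 seconds. The sketch below shows that schedule and a retry wrapper around it. It assumes pure doubling without jitter, which CodebaseQA may or may not add; RateLimited is a hypothetical stand-in for whatever 429 error the HTTP client raises.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the embedding API."""


def backoff_delays(max_retries: int = 6, base: float = 1.0, cap: float = 30.0) -> List[float]:
    # Delay before retry n (0-based) doubles each time and is clamped to `cap`.
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 6,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return fn()
        except RateLimited:
            sleep(delay)
    return fn()  # final attempt: a further RateLimited propagates to the caller
```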
Repositories API:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/repos/ | Add a repository and start background indexing |
| GET | /api/repos/ | List repositories |
| GET | /api/repos/{repo_id} | Get repository details |
| GET | /api/repos/{repo_id}/progress | Stream indexing progress (SSE) |
| DELETE | /api/repos/{repo_id} | Delete a repository and its indexed data |
| GET | /api/repos/{repo_id}/files/content | Fetch file content by path (passed as a query parameter) |
| POST | /api/repos/demo/seed | Seed the demo repository |
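The progress endpoint streams Server-Sent Events, so a client needs a small SSE reader. The sketch below follows the standard event-stream rules (blank line dispatches, lines starting with ":" are comments, "event:" and "data:" fields). It assumes each data payload is JSON, which this README does not confirm; parse_sse and stream_progress are hypothetical names.

```python
import json
import urllib.request
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"event", "data"} dicts from Server-Sent Events text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the buffered event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.
    # Per the SSE rules, an event not terminated by a blank line is discarded.


def stream_progress(base_url: str, repo_id: str) -> Iterator[dict]:
    """Yield decoded progress payloads; assumes each data field is JSON."""
    url = f"{base_url.rstrip('/')}/api/repos/{repo_id}/progress"
    with urllib.request.urlopen(url) as resp:
        lines = (chunk.decode("utf-8") for chunk in resp)
        for ev in parse_sse(lines):
            yield json.loads(ev["data"])
```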
Chat and search API:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/chat/sessions | Create chat session |
| GET | /api/chat/sessions/{session_id} | Get session + messages |
| POST | /api/chat/sessions/{session_id}/messages | Stream assistant response (SSE) |
| POST | /api/search/ | Hybrid semantic code search |
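A chat exchange takes two calls: create a session, then post a message to it and read the streamed reply. The sketch only builds the requests with urllib so nothing is sent. The endpoint paths come from the table above, but the JSON body fields (repo_id, content, query) are hypothetical placeholders; check the interactive docs at http://localhost:8000/docs for the actual request schemas.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

API_URL = "http://localhost:8000"


def json_request(method: str, path: str, payload: Optional[dict] = None,
                 base_url: str = API_URL) -> urllib.request.Request:
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{path}",
        data=body,
        method=method,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def create_session_request(repo_id: str) -> urllib.request.Request:
    # Hypothetical body field; see /docs for the real schema.
    return json_request("POST", "/api/chat/sessions", {"repo_id": repo_id})


def send_message_request(session_id: str, question: str) -> urllib.request.Request:
    # The reply is streamed as SSE; "content" is a hypothetical field name.
    path = f"/api/chat/sessions/{quote(session_id, safe='')}/messages"
    req = json_request("POST", path, {"content": question})
    req.add_header("Accept", "text/event-stream")
    return req


def search_request(repo_id: str, query: str) -> urllib.request.Request:
    # Hypothetical body fields for the hybrid search endpoint.
    return json_request("POST", "/api/search/", {"repo_id": repo_id, "query": query})
```

Pass any of these to urllib.request.urlopen to send it. The message endpoint replies with a Server-Sent Events stream rather than a single JSON document.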
Learning API:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/learning/personas | List available personas |
| POST | /api/learning/{repo_id}/curriculum | Generate syllabus |
| POST | /api/learning/{repo_id}/lessons/{lesson_id} | Generate lesson content |
| POST | /api/learning/{repo_id}/lessons/{lesson_id}/quiz | Generate quiz |
| GET | /api/learning/{repo_id}/lessons/{lesson_id}/export/codetour | Export lesson as CodeTour |
| GET | /api/learning/{repo_id}/graph | Generate dependency graph |
| GET | /api/learning/{repo_id}/stats | User XP/level/streak stats |
| GET | /api/learning/{repo_id}/activity | Activity heatmap data |
| GET | /api/learning/{repo_id}/achievements | Achievement list + unlock status |
| GET | /api/learning/{repo_id}/progress | Completed lessons |
| POST | /api/learning/{repo_id}/lessons/{lesson_id}/complete | Mark lesson complete + award XP |
| POST | /api/learning/{repo_id}/lessons/{lesson_id}/quiz/result | Submit quiz result + award XP |
| POST | /api/learning/{repo_id}/challenges/complete | Record challenge completion + award XP |
| POST | /api/learning/{repo_id}/graph/viewed | Record graph view event |
| POST | /api/learning/{repo_id}/graph/nodes/viewed | Record unique graph node exploration events |
| POST | /api/learning/{repo_id}/lessons/{lesson_id}/challenge | Generate challenge |
| POST | /api/learning/{repo_id}/challenges/validate/bug_hunt | Validate bug hunt answer |
| POST | /api/learning/{repo_id}/challenges/validate/code_trace | Validate code trace answer |
| POST | /api/learning/{repo_id}/challenges/validate/fill_blank | Validate fill-blank answer |
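Most learning endpoints hang off the same /api/learning/{repo_id} prefix, so a client can derive them from one helper. This sketch builds the per-lesson URLs and the challenge validation URLs from the table, percent-encoding each path segment. The helper names are hypothetical, and it covers URL construction only, not the request bodies.

```python
from typing import Dict
from urllib.parse import quote

CHALLENGE_TYPES = ("bug_hunt", "code_trace", "fill_blank")


def learning_url(base_url: str, repo_id: str, *parts: str) -> str:
    # Percent-encode each path segment so IDs with "/" or spaces stay intact.
    segments = [quote(str(p), safe="") for p in (repo_id, *parts)]
    return f"{base_url.rstrip('/')}/api/learning/" + "/".join(segments)


def lesson_urls(base_url: str, repo_id: str, lesson_id: str) -> Dict[str, str]:
    lesson = ("lessons", lesson_id)
    return {
        "generate": learning_url(base_url, repo_id, *lesson),
        "quiz": learning_url(base_url, repo_id, *lesson, "quiz"),
        "quiz_result": learning_url(base_url, repo_id, *lesson, "quiz", "result"),
        "challenge": learning_url(base_url, repo_id, *lesson, "challenge"),
        "complete": learning_url(base_url, repo_id, *lesson, "complete"),
        "codetour": learning_url(base_url, repo_id, *lesson, "export", "codetour"),
    }


def validate_url(base_url: str, repo_id: str, challenge_type: str) -> str:
    if challenge_type not in CHALLENGE_TYPES:
        raise ValueError(f"unknown challenge type: {challenge_type!r}")
    return learning_url(base_url, repo_id, "challenges", "validate", challenge_type)
```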
Platform and operations API:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/platform/config | Runtime demo flags for frontend behavior |
| GET | /health | Service health + dependency checks |
| GET | /openapi.json | OpenAPI JSON |
| GET | /openapi.yaml | OpenAPI YAML |
| GET | /api/cache/stats | LLM cache statistics |
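Because the full schema is served at /openapi.json, the endpoint list of a running instance can be checked programmatically. The sketch walks the standard OpenAPI "paths" object and returns (method, path) pairs; list_operations and fetch_spec are hypothetical helper names, and only the standard OpenAPI document structure is assumed.

```python
import json
import urllib.request
from typing import List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_operations(spec: dict) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI document."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key in HTTP_METHODS:
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


def fetch_spec(base_url: str = "http://localhost:8000") -> dict:
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

For example, list_operations(fetch_spec()) against a local instance should include ("GET", "/health").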
Install the CLI:
cd cli
pip install -e .
Optional: set CODEBASEQA_API_URL if your API is not at http://localhost:8000.
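To drive the CLI from a script (for example in CI), one option is a thin subprocess wrapper that pins CODEBASEQA_API_URL for the child process. The command names come from the list below; the cli_command, cli_env, and run_cli helpers are hypothetical, and the sketch assumes the codebaseqa executable is on PATH after installation.

```python
import os
import subprocess
from typing import Dict, List, Optional

DEFAULT_API_URL = "http://localhost:8000"


def cli_command(*args: str) -> List[str]:
    # e.g. cli_command("ask", repo_id, "What is the main entry point?")
    return ["codebaseqa", *args]


def cli_env(api_url: Optional[str] = None) -> Dict[str, str]:
    env = dict(os.environ)
    env["CODEBASEQA_API_URL"] = api_url or env.get("CODEBASEQA_API_URL", DEFAULT_API_URL)
    return env


def run_cli(*args: str, api_url: Optional[str] = None) -> str:
    """Run a codebaseqa command and return its stdout; raises on non-zero exit."""
    result = subprocess.run(
        cli_command(*args),
        env=cli_env(api_url),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing arguments as a list avoids shell quoting issues with questions that contain spaces or quotes.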
Commands:
# Index repository
codebaseqa index https://github.com/expressjs/express
# List repositories
codebaseqa list
# Ask a question
codebaseqa ask <repo_id> "What is the main entry point?"
# Search code
codebaseqa search <repo_id> "authentication middleware"
# List generated lessons (default persona: new_hire)
codebaseqa lessons <repo_id>
# Export lesson as VS Code CodeTour
codebaseqa export-tour <repo_id> <lesson_id>
Project structure:
codebaseqa/
├── apps/
│ ├── api/ # FastAPI backend (RAG, indexing, learning, gamification)
│ └── web/ # Next.js frontend UI
├── cli/ # Python CLI client
├── docker/ # Dockerfiles + compose + entrypoint
├── docs/ # Architecture and design notes
└── scripts/ # Local helper scripts
Backend highlights:
- Tree-sitter semantic parsing for Python, JavaScript, TypeScript, Java, Go, Rust, C#, C++, Ruby
- Hybrid retrieval (vector + keyword) and query expansion
- LLM-based reranking for improved relevance
- SQLite metadata + Chroma vector persistence
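This README does not say how vector and keyword results are combined. Reciprocal rank fusion (RRF) is one common technique for merging two ranked lists, shown here only as an illustration and not as CodebaseQA's actual method. Each item scores the sum of 1/(k + rank) over the lists it appears in, with k = 60 as the conventional constant.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(*rankings: Sequence[str], k: int = 60) -> List[str]:
    """Merge ranked result lists; items ranked high in several lists win."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

RRF only needs ranks, not comparable scores, which is why it is a popular default when vector similarities and keyword scores live on different scales.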
Tests and linting (backend):
cd apps/api
# after activating your virtualenv
python -m pytest tests/unit tests/integration
ruff check src tests
Frontend:
cd apps/web
pnpm lint
pnpm type-check
pnpm test
pnpm build
Workspace shortcuts (from repo root):
pnpm lint
pnpm test
pnpm type-check
pnpm web:build
pnpm web:verify-css
Known limitations:
- Very large repositories can still be slow and expensive to index, depending on the provider and model, and may hit temporary embedding rate limits before retries succeed.
- Lesson/challenge/graph generation quality depends on model capability and retrieved context.
- Docker setup is optimized for OpenAI defaults unless extra provider vars are explicitly wired.
Author: Shree Bohara
- GitHub: @ShreeBohara
- LinkedIn: ShreeBohara
License: MIT









