CodebaseQA

Understand any codebase in minutes, not days.

AI-powered codebase understanding, onboarding, and hands-on learning for developers.

Quick Start • Demo Video • Architecture Diagram • Screenshots • Features • Live Demo Mode • API • CLI

Star Repo • Open Issues • Contributing

Why CodebaseQA?

CodebaseQA is built for the moment you open an unfamiliar repository and need answers fast.

It gives you:

Chat Q&A over real code context (RAG + source citations)
Learning paths tailored by persona
Interactive lessons with file-linked references and Mermaid diagrams
Quizzes and coding challenges (bug hunt, code trace, fill-in-the-blank)
Gamification (XP, levels, streaks, achievements, activity heatmap)
Full-workspace dependency graph visualization with adaptive module-first overview, scoped drill-down, and PNG export

Use it from the web UI or from the CLI, depending on your workflow.

90-Second Product Flow

Add a GitHub repository and let CodebaseQA index it.
Ask natural-language questions and get answers with source-backed citations.
Generate a persona-based curriculum and open any lesson.
Practice with quizzes/challenges and track progress with XP, streaks, and achievements.
Explore system structure in the dependency graph and export lesson tours for VS Code.

Best For

New team members onboarding into large codebases
Engineering managers/leads accelerating ramp-up
Developers trying to understand architecture and key execution paths
Interview prep / self-learning on open-source repositories

Demo Video

Click the thumbnail or button above to play the full demo video.

Architecture Diagram

Screenshots

Media pack guide: docs/media/README.md

The gallery below uses the 10 numbered screenshots in docs/media/screenshots.

Web App Flow

1) Landing page hero

2) Landing page feature section

3) Repository import and indexing

4) Chat home with starter prompts

5) Chat answer with citations and code snippets

6) Dependency graph overview

7) Dependency graph deep inspection panel

8) Learning role selection

9) Full-stack learning track map

10) Lesson workspace with practice tools

Features

Feature	Description
Repository Indexing	Clone/index GitHub repos with progress states (pending, cloning, parsing, embedding, completed, failed)
RAG Chat	Streaming responses with query expansion, hybrid retrieval, and source snippets
Semantic Search	Vector + keyword hybrid search with language/file filters
Learning Personas	New Hire, Security Auditor, Full Stack Dev, Archaeologist
Lesson Generation	AI-generated lesson markdown, code references, optional Mermaid diagram
Quiz Generation	Lesson-based multiple-choice quizzes
Challenges	Bug Hunt, Code Trace, Fill-in-the-Blank generation + validation
Gamification	XP rewards, 6 levels, streak tracking, achievements, dashboard analytics
Dependency Graph	Full-workspace intelligent graph with adaptive module overview, focus mode, progressive edge reveal, deterministic extraction, and PNG export
CodeTour Export	Export lesson content as VS Code CodeTour (`.tour`)
CLI Tooling	Index, ask, search, list, lessons, and CodeTour export from terminal
Demo Bootstrap	Seed a demo repository via API/UI (`/api/repos/demo/seed`)

Quick Start

Prerequisites

Docker Desktop (recommended for fastest setup)
Or, for local development: Node.js 20+ with pnpm, and Python 3.11+
At least one supported LLM provider (OpenAI, Anthropic, or Ollama)

Docker Setup

git clone https://github.com/ShreeBohara/codebaseqa.git
cd codebaseqa
cp .env.example .env
# Edit .env and add provider credentials (typically OPENAI_API_KEY)

./scripts/start-docker.sh
# Optional demo seed:
# ./scripts/start-docker.sh --with-demo

Endpoints after startup:

Web UI: http://localhost:3000
API docs: http://localhost:8000/docs
Health check: http://localhost:8000/health
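
A minimal sketch of waiting for the API to come up before opening the UI. It assumes only that `/health` answers with HTTP 200 once the service is ready; the response body is not inspected, and the helper names `http_status` and `wait_for_health` are hypothetical, not part of CodebaseQA.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure, ...


def wait_for_health(
    base_url: str = "http://localhost:8000",
    attempts: int = 30,
    delay_seconds: float = 1.0,
    get_status: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll <base_url>/health until it returns 200 or attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if get_status(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no sleep after the final attempt
    return False
```

Calling `wait_for_health()` with the defaults makes up to 30 attempts, one second apart, against the health check URL listed above.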

Local Development

# Install JS workspace deps once (from repo root)
pnpm install

# Terminal 1: API
cd apps/api
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn src.main:app --reload

# Terminal 2: Web
pnpm web:dev

# Optional (direct package command)
# cd apps/web
# pnpm dev

Frontend Troubleshooting

If the UI looks unstyled (plain text/stacked layout), clear Next.js build artifacts and restart:

rm -rf apps/web/.next
pnpm web:dev

Use pnpm web:dev as the canonical frontend start command.

Live Demo Mode

CodebaseQA supports a public-safe runtime mode for hosted demos.

Set DEMO_MODE=true to pin the deployment to one featured repository.
The default featured repository is vercel/nextjs-subscription-payments (configurable with the DEMO_REPO_* env vars).
In demo mode, repo import/delete can be disabled, and chat/learning endpoints apply soft rate limits.
The frontend automatically shows a demo banner and hides destructive repo actions.

Key env vars:

DEMO_MODE, SEED_DEMO
DEMO_REPO_URL, DEMO_REPO_OWNER, DEMO_REPO_NAME, DEMO_REPO_BRANCH
DEMO_ALLOW_PUBLIC_IMPORTS, DEMO_BANNER_TEXT, DEMO_BUSY_MODE
DEMO_RATE_LIMIT_* knobs for soft guardrails
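
To illustrate how these variables fit together, here is a small sketch that reads a few of them from an environment mapping. The `DemoSettings` class, the `load_demo_settings` helper, and the set of accepted truthy strings are assumptions made for this example; the actual parsing lives in apps/api/src/config.py and may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which strings count as "true" is not documented here.
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class DemoSettings:
    demo_mode: bool
    repo_owner: str
    repo_name: str
    repo_branch: Optional[str]
    allow_public_imports: bool


def load_demo_settings(env: Mapping[str, str] = os.environ) -> DemoSettings:
    """Read demo-related flags from an environment mapping."""

    def flag(name: str) -> bool:
        return env.get(name, "").strip().lower() in TRUTHY

    return DemoSettings(
        demo_mode=flag("DEMO_MODE"),
        # Defaults mirror the documented featured repository.
        repo_owner=env.get("DEMO_REPO_OWNER", "vercel"),
        repo_name=env.get("DEMO_REPO_NAME", "nextjs-subscription-payments"),
        repo_branch=env.get("DEMO_REPO_BRANCH") or None,
        allow_public_imports=flag("DEMO_ALLOW_PUBLIC_IMPORTS"),
    )
```

Passing a plain dict, for example `load_demo_settings({"DEMO_MODE": "true"})`, makes the behaviour easy to check without touching the real environment.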

For local validation:

pnpm web:dev
# in another terminal
cd apps/api && uvicorn src.main:app --reload

For production prewarm after first deploy:

pnpm demo:prewarm

Configuration

CodebaseQA reads settings from environment variables (via apps/api/src/config.py).

Core Settings

Variable	Description	Default
`DATABASE_URL`	SQLAlchemy DB URL	`sqlite:///./data/codebaseqa.db`
`CHROMA_PERSIST_DIR`	Chroma storage path	`./data/chroma`
`REPOS_DIR`	Cloned repository cache path	`./data/repos`
`GITHUB_TOKEN`	Needed for private repos / higher API limits	unset
`MAX_FILES_PER_REPO`	Indexing cap per repository	`5000`
`MAX_FILE_SIZE_KB`	Skip files larger than this	`500`
`DEBUG`	API debug mode	`false`
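
The two indexing caps can be pictured with a short sketch. It assumes files are considered in the order given, that 1 KB means 1024 bytes, and that oversized files do not count toward the per-repository cap; `select_indexable_files` is a hypothetical helper, not the indexer's real code.

```python
from typing import Iterable, List, Tuple


def select_indexable_files(
    files: Iterable[Tuple[str, int]],
    max_files: int = 5000,       # MAX_FILES_PER_REPO default
    max_file_size_kb: int = 500,  # MAX_FILE_SIZE_KB default
) -> List[str]:
    """Return paths within the size limit, stopping at the file cap.

    `files` is an iterable of (path, size_in_bytes) pairs.
    """
    limit_bytes = max_file_size_kb * 1024
    selected: List[str] = []
    for path, size_bytes in files:
        if size_bytes > limit_bytes:
            continue  # skip files larger than the size limit
        if len(selected) >= max_files:
            break  # indexing cap reached
        selected.append(path)
    return selected
```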

LLM & Embeddings

Variable	Description	Default
`LLM_PROVIDER`	`openai`, `anthropic`, or `ollama`	`openai`
`EMBEDDING_PROVIDER`	`openai` or `ollama`	`openai`
`OPENAI_API_KEY`	OpenAI API key	unset
`OPENAI_MODEL`	OpenAI chat model	`gpt-4o`
`OPENAI_EMBEDDING_MODEL`	OpenAI embedding model	`text-embedding-3-small`
`OPENAI_BASE_URL`	OpenAI-compatible endpoint override	unset
`OPENAI_EMBEDDING_MAX_TOKENS_PER_REQUEST`	Max total tokens per embedding request batch	`250000`
`OPENAI_EMBEDDING_MAX_TEXTS_PER_REQUEST`	Max chunk count per embedding request batch	`128`
`OPENAI_EMBEDDING_REQUEST_CONCURRENCY`	Max concurrent embedding requests	`1`
`OPENAI_EMBEDDING_MIN_SECONDS_BETWEEN_REQUESTS`	Minimum delay between embedding requests	`0.0`
`OPENAI_EMBEDDING_RATE_LIMIT_MAX_RETRIES`	Retry attempts for embedding `HTTP 429` responses	`6`
`OPENAI_EMBEDDING_RATE_LIMIT_BASE_BACKOFF_SECONDS`	Base seconds for exponential backoff on `HTTP 429`	`1.0`
`OPENAI_EMBEDDING_RATE_LIMIT_MAX_BACKOFF_SECONDS`	Maximum wait per retry on `HTTP 429`	`30.0`
`ANTHROPIC_API_KEY`	Anthropic API key	unset
`ANTHROPIC_MODEL`	Anthropic model	`claude-sonnet-4-20250514`
`OLLAMA_BASE_URL`	Ollama host URL	`http://localhost:11434`
`OLLAMA_MODEL`	Ollama generation model	`llama3.1`
`LOCAL_EMBEDDING_MODEL`	Ollama embedding model name	`nomic-ai/nomic-embed-text-v1.5`
`LEARNING_V2_ENABLED`	Enable Learning V2 syllabus/lesson cache pipeline	`false`
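
The two embedding batch limits interact: a batch closes when either the token budget or the text count would be exceeded. The sketch below shows one way to pack chunks under both limits, assuming the caller already knows each chunk's token count; `batch_chunks` is a hypothetical name and the real batching logic may differ.

```python
from typing import Iterator, List, Sequence, Tuple


def batch_chunks(
    chunks: Sequence[Tuple[str, int]],
    max_tokens: int = 250_000,  # OPENAI_EMBEDDING_MAX_TOKENS_PER_REQUEST default
    max_texts: int = 128,       # OPENAI_EMBEDDING_MAX_TEXTS_PER_REQUEST default
) -> Iterator[List[str]]:
    """Yield lists of texts that respect both per-request limits.

    `chunks` holds (text, token_count) pairs. A single chunk larger than
    `max_tokens` is emitted alone rather than dropped.
    """
    batch: List[str] = []
    batch_tokens = 0
    for text, tokens in chunks:
        would_overflow = (
            batch_tokens + tokens > max_tokens or len(batch) >= max_texts
        )
        if batch and would_overflow:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += tokens
    if batch:
        yield batch
```

Lowering either limit produces more, smaller requests, which is the usual lever when a provider account has tight rate limits.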

Notes:

Docker Compose currently passes OPENAI_API_KEY by default; if you want Anthropic/Ollama in Docker, add those env vars in docker/docker-compose.yml.
For local development, all variables above can be set directly in your shell or .env.
Large repositories can trigger temporary embedding rate limits (HTTP 429). Indexing retries automatically; tune the OPENAI_EMBEDDING_* controls above (batch size, pacing, and retry backoff) if needed.
VOYAGE_API_KEY exists in config for future provider support, but EMBEDDING_PROVIDER currently supports only openai and ollama.
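
To get a feel for the retry settings, the sketch below computes the wait before each retry from the three `OPENAI_EMBEDDING_RATE_LIMIT_*` defaults. It assumes plain doubling with no jitter; the actual indexer may add jitter or honor a server-provided retry hint, and `backoff_schedule` is a hypothetical helper.

```python
from typing import List


def backoff_schedule(
    max_retries: int = 6,       # OPENAI_EMBEDDING_RATE_LIMIT_MAX_RETRIES
    base_seconds: float = 1.0,  # ..._BASE_BACKOFF_SECONDS
    max_seconds: float = 30.0,  # ..._MAX_BACKOFF_SECONDS
) -> List[float]:
    """Return the wait before each retry: base * 2**attempt, capped."""
    return [
        min(base_seconds * (2 ** attempt), max_seconds)
        for attempt in range(max_retries)
    ]


print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Under these assumptions the defaults allow roughly a minute of cumulative waiting (61 seconds) per request before indexing gives up on it.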

API Endpoints

Repository & Indexing

Method	Endpoint	Description
`POST`	`/api/repos/`	Add repository and start background indexing
`GET`	`/api/repos/`	List repositories
`GET`	`/api/repos/{repo_id}`	Get repository details
`GET`	`/api/repos/{repo_id}/progress`	Stream indexing progress (SSE)
`DELETE`	`/api/repos/{repo_id}`	Delete repository and indexed data
`GET`	`/api/repos/{repo_id}/files/content`	Fetch file content by `path` query param
`POST`	`/api/repos/demo/seed`	Seed demo repository
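
The file content endpoint takes the file path as a query parameter, so the path needs URL encoding. A minimal sketch, assuming the default local API address; `file_content_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def file_content_url(base_url: str, repo_id: str, path: str) -> str:
    """Build the URL for GET /api/repos/{repo_id}/files/content?path=..."""
    query = urlencode({"path": path})  # encodes "/" and spaces in the path
    repo = quote(repo_id, safe="")
    return f"{base_url.rstrip('/')}/api/repos/{repo}/files/content?{query}"


print(file_content_url("http://localhost:8000", "abc123", "src/main.py"))
# http://localhost:8000/api/repos/abc123/files/content?path=src%2Fmain.py
```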

Chat & Search

Method	Endpoint	Description
`POST`	`/api/chat/sessions`	Create chat session
`GET`	`/api/chat/sessions/{session_id}`	Get session + messages
`POST`	`/api/chat/sessions/{session_id}/messages`	Stream assistant response (SSE)
`POST`	`/api/search/`	Hybrid semantic code search
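
Both the indexing progress endpoint and the chat messages endpoint stream Server-Sent Events. The sketch below extracts the `data:` payload of each event from a stream of text lines. The payload format of these endpoints is not described here, so the payload is returned as opaque text; `iter_sse_data` is a hypothetical helper and it ignores `event:` and `id:` fields.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a line stream."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Lenient: also flush an event that was not terminated by a blank line.
        yield "\n".join(buffer)
```

Any source of decoded text lines works as input, for example an HTTP response read line by line with a streaming client.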

Learning, Graph, Gamification, Challenges

Method	Endpoint	Description
`GET`	`/api/learning/personas`	List available personas
`POST`	`/api/learning/{repo_id}/curriculum`	Generate syllabus
`POST`	`/api/learning/{repo_id}/lessons/{lesson_id}`	Generate lesson content
`POST`	`/api/learning/{repo_id}/lessons/{lesson_id}/quiz`	Generate quiz
`GET`	`/api/learning/{repo_id}/lessons/{lesson_id}/export/codetour`	Export lesson as CodeTour
`GET`	`/api/learning/{repo_id}/graph`	Generate dependency graph
`GET`	`/api/learning/{repo_id}/stats`	User XP/level/streak stats
`GET`	`/api/learning/{repo_id}/activity`	Activity heatmap data
`GET`	`/api/learning/{repo_id}/achievements`	Achievement list + unlock status
`GET`	`/api/learning/{repo_id}/progress`	Completed lessons
`POST`	`/api/learning/{repo_id}/lessons/{lesson_id}/complete`	Mark lesson complete + award XP
`POST`	`/api/learning/{repo_id}/lessons/{lesson_id}/quiz/result`	Submit quiz result + award XP
`POST`	`/api/learning/{repo_id}/challenges/complete`	Record challenge completion + award XP
`POST`	`/api/learning/{repo_id}/graph/viewed`	Record graph view event
`POST`	`/api/learning/{repo_id}/graph/nodes/viewed`	Record unique graph node exploration events
`POST`	`/api/learning/{repo_id}/lessons/{lesson_id}/challenge`	Generate challenge
`POST`	`/api/learning/{repo_id}/challenges/validate/bug_hunt`	Validate bug hunt answer
`POST`	`/api/learning/{repo_id}/challenges/validate/code_trace`	Validate code trace answer
`POST`	`/api/learning/{repo_id}/challenges/validate/fill_blank`	Validate fill-blank answer
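
Each challenge type has its own validation endpoint. A small sketch of mapping the challenge names used in the Features table to those paths; the `validation_path` helper and the `DISPLAY_TO_SLUG` mapping are illustrative, and request bodies are omitted because their fields are not documented here.

```python
DISPLAY_TO_SLUG = {
    "Bug Hunt": "bug_hunt",
    "Code Trace": "code_trace",
    "Fill-in-the-Blank": "fill_blank",
}


def validation_path(repo_id: str, challenge_type: str) -> str:
    """Return the validation endpoint path for a challenge type.

    Accepts either a display name ("Bug Hunt") or a slug ("bug_hunt").
    """
    slug = DISPLAY_TO_SLUG.get(challenge_type, challenge_type)
    if slug not in DISPLAY_TO_SLUG.values():
        raise ValueError(f"unknown challenge type: {challenge_type!r}")
    return f"/api/learning/{repo_id}/challenges/validate/{slug}"
```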

Platform Endpoints

Method	Endpoint	Description
`GET`	`/api/platform/config`	Runtime demo flags for frontend behavior
`GET`	`/health`	Service health + dependency checks
`GET`	`/openapi.json`	OpenAPI JSON
`GET`	`/openapi.yaml`	OpenAPI YAML
`GET`	`/api/cache/stats`	LLM cache statistics

CLI Usage

Install:

cd cli
pip install -e .

Optional: set CODEBASEQA_API_URL if your API is not at http://localhost:8000.
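
For scripts that talk to the same API as the CLI, the override can be resolved the same way. This is a sketch of the described behaviour, not the CLI's actual code; `resolve_api_url` is a hypothetical helper.

```python
import os
from typing import Mapping

DEFAULT_API_URL = "http://localhost:8000"


def resolve_api_url(env: Mapping[str, str] = os.environ) -> str:
    """Return CODEBASEQA_API_URL if set and non-empty, else the default."""
    value = env.get("CODEBASEQA_API_URL", "").strip()
    return (value or DEFAULT_API_URL).rstrip("/")
```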

Commands:

# Index repository
codebaseqa index https://github.com/expressjs/express

# List repositories
codebaseqa list

# Ask a question
codebaseqa ask <repo_id> "What is the main entry point?"

# Search code
codebaseqa search <repo_id> "authentication middleware"

# List generated lessons (default persona: new_hire)
codebaseqa lessons <repo_id>

# Export lesson as VS Code CodeTour
codebaseqa export-tour <repo_id> <lesson_id>

Architecture (Monorepo)

codebaseqa/
├── apps/
│   ├── api/        # FastAPI backend (RAG, indexing, learning, gamification)
│   └── web/        # Next.js frontend UI
├── cli/            # Python CLI client
├── docker/         # Dockerfiles + compose + entrypoint
├── docs/           # Architecture and design notes
└── scripts/        # Local helper scripts

Backend highlights:

Tree-sitter semantic parsing for Python, JavaScript, TypeScript, Java, Go, Rust, C#, C++, Ruby
Hybrid retrieval (vector + keyword) and query expansion
LLM-based reranking for improved relevance
SQLite metadata + Chroma vector persistence
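
Hybrid retrieval has to merge a vector ranking and a keyword ranking into one list. The sketch below uses reciprocal rank fusion (RRF), a common way to do this; it is shown only as an illustration, and the backend may combine results differently. The function name and the constant `k = 60` are assumptions for the example.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Merge ranked lists of document ids into one fused ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank is
    1-based); ties are broken alphabetically for determinism.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a.py", "b.py", "c.py"]
keyword_hits = ["b.py", "d.py", "a.py"]
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['b.py', 'a.py', 'd.py', 'c.py']
```

Files that appear near the top of both lists rise above files that only one retriever found, which is the behaviour a hybrid search is after.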

Testing & Checks

Backend:

cd apps/api
# after activating your virtualenv
python -m pytest tests/unit tests/integration
ruff check src tests

Frontend:

cd apps/web
pnpm lint
pnpm type-check
pnpm test
pnpm build

Workspace shortcuts:

pnpm lint
pnpm test
pnpm type-check
pnpm web:build
pnpm web:verify-css

Current Limitations

Very large repositories can still be slow/expensive to index depending on provider/model choice, and may hit temporary embedding rate limits before retries succeed.
Lesson/challenge/graph generation quality depends on model capability and retrieved context.
Docker setup is optimized for OpenAI defaults unless extra provider vars are explicitly wired.

Author

Shree Bohara

GitHub: @ShreeBohara
LinkedIn: ShreeBohara

License

MIT
