An agentic RAG system that clones a public Python GitHub repository, parses and chunks the source code using code-aware logic, stores embeddings in FAISS, and answers natural language questions through a conversational CLI with source file references.
Default repository: https://github.com/pallets/flask
Flask was chosen because it is a real-world Python codebase with meaningful architecture, routing, CLI, configuration, blueprints, request context, sessions, and testing utilities.
- Clones or fetches a public GitHub repository locally.
- Skips binary files and ignored directories.
- Parses Python files with the ast module and avoids arbitrary splitting inside functions/classes.
- Chunks Markdown, TOML, YAML, INI, and text files by structural sections where possible.
- Stores local embeddings in FAISS.
- Uses LangGraph with tool calling.
- Provides required tools:
  - search_code(query)
  - read_file(path)
  - list_directory(path)
  - summarize_module(module_name)
- Answers with file path and line ranges where available.
- Handles unsupported or out-of-scope questions without hallucinating.
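The code-aware chunking described in the list above can be sketched with the standard-library ast module. This is a minimal illustration, not the project's actual chunker: the Chunk record and the chunk_python_source function are hypothetical names, and only top-level functions and classes are handled.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record; the project's real schema may differ."""
    name: str
    kind: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_python_source(source: str) -> list[Chunk]:
    """Emit one chunk per top-level function or class, never cutting inside one."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            text = "\n".join(lines[start - 1:end])
            chunks.append(Chunk(node.name, kind, start, end, text))
    return chunks


example = '''import os

@decorator
def route(rule):
    return rule

class App:
    def run(self):
        pass
'''

for chunk in chunk_python_source(example):
    print(chunk.kind, chunk.name, f"{chunk.start_line}-{chunk.end_line}")
```

Because each chunk carries its own start and end line, the same record can later back a citation of the form path:start-end.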
python -m venv .venv
# Windows Git Bash
source .venv/Scripts/activate
# macOS/Linux
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
Add your OpenAI key to .env:
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
DEFAULT_REPO_URL=https://github.com/pallets/flask
DEFAULT_REPO_NAME=flask
TOP_K=6

Then index the repository:
python scripts/index_repo.py --repo-url https://github.com/pallets/flask --repo-name flask --force
This creates:
data/repos/flask/
data/indexes/flask/faiss.index
data/indexes/flask/chunks.jsonl
data/indexes/flask/files.json
data/indexes/flask/repo_info.json
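As a rough sketch of how a JSONL file such as chunks.jsonl could be read back, the snippet below loads one JSON object per line. The field names (path, start_line, end_line, text) and the sample records are assumptions for illustration only; the file actually written by the indexer may use a different schema.

```python
import json
import tempfile
from pathlib import Path


def load_chunks(jsonl_path):
    """Read one JSON object per non-empty line of a JSONL file."""
    records = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo against a throwaway file; field names and line ranges are illustrative only.
sample = [
    {"path": "src/flask/app.py", "start_line": 1, "end_line": 20, "text": "..."},
    {"path": "src/flask/cli.py", "start_line": 5, "end_line": 40, "text": "..."},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "chunks.jsonl"
    demo_path.write_text(
        "\n".join(json.dumps(record) for record in sample) + "\n",
        encoding="utf-8",
    )
    loaded = load_chunks(demo_path)

print(len(loaded), loaded[0]["path"])
```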

Ask a single question:
python scripts/ask.py "How does Flask register routes?" --repo-name flask
Start an interactive chat:
PYTHONPATH=src python -m codebase_qa.interface.cli chat --repo-name flask
or:
python scripts/ask.py --chat --repo-name flask
Sample questions:
What is the overall architecture of Flask?
How does Flask register routes?
Where is the request context implemented?
Explain the Blueprint class and how it relates to the Flask app.
What happens when a request is dispatched?
Which files are responsible for CLI commands?
Is there any payment integration in this repo?
Answer:
Flask registers routes through a decorator that eventually delegates to add_url_rule. The route decorator prepares options and returns a wrapper around the view function.
Sources:
- src/flask/sansio/scaffold.py:335-366
- src/flask/sansio/scaffold.py:402-456
Confidence: High
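The source references in the sample answer follow a path:start-end pattern. A small helper for parsing and re-emitting such references might look like the sketch below; SourceRef, parse_source_ref, and format_source_ref are hypothetical names, and treating the line range as optional is an assumption based on the "where available" wording in the feature list.

```python
import re
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    path: str
    start_line: Optional[int]
    end_line: Optional[int]


# Path, optionally followed by ":start" or ":start-end".
_REF = re.compile(r"^(?P<path>.+?)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def parse_source_ref(ref: str) -> SourceRef:
    """Split 'path:start-end' into its parts; the line range may be absent."""
    match = _REF.match(ref.strip())
    if match is None:
        raise ValueError(f"not a source reference: {ref!r}")
    start = match.group("start")
    end = match.group("end")
    if start is None:
        return SourceRef(match.group("path"), None, None)
    # A single line number is treated as a one-line range.
    return SourceRef(match.group("path"), int(start), int(end or start))


def format_source_ref(ref: SourceRef) -> str:
    """Render a reference back into the 'path:start-end' form."""
    if ref.start_line is None:
        return ref.path
    return f"{ref.path}:{ref.start_line}-{ref.end_line}"


parsed = parse_source_ref("src/flask/sansio/scaffold.py:335-366")
print(parsed.path, parsed.start_line, parsed.end_line)
print(format_source_ref(parsed))
```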
Add screenshots into the screenshots/ folder after running sample queries.
Recommended screenshots:
screenshots/question1.png
screenshots/question2.png
screenshots/question3.png
AI assistance (ChatGPT) was used to help generate and debug parts of this implementation. The final submission should be reviewed and understood by the developer before delivery. No hardcoded answers or fake retrieval results are included.
- The agent does not connect to GitHub at answer time.
- The repo is cloned locally during indexing.
- The runtime agent only uses the local cloned repository and local FAISS index.
- No hardcoded demo answers are used.
- Out-of-scope questions are answered conservatively when the indexed codebase does not support an answer.
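One possible way to keep out-of-scope answers conservative is to gate the answer on retrieval quality. The sketch below assumes retrieval returns (similarity, source) pairs where a higher similarity means a closer match; the gate_retrieval function, the 0.35 threshold, and the scores in the demo are all invented for illustration and are not taken from this project.

```python
from typing import List, Tuple

DECLINE_MESSAGE = (
    "The indexed codebase does not appear to support an answer to this question."
)


def gate_retrieval(hits: List[Tuple[float, str]], min_score: float = 0.35) -> dict:
    """Decline when no retrieved chunk clears the (assumed) similarity threshold."""
    supported = sorted(
        (hit for hit in hits if hit[0] >= min_score),
        key=lambda hit: hit[0],
        reverse=True,
    )
    if not supported:
        return {"in_scope": False, "message": DECLINE_MESSAGE, "sources": []}
    # Only sources that cleared the threshold are passed on for answering.
    return {
        "in_scope": True,
        "message": None,
        "sources": [source for _, source in supported],
    }


# Illustrative scores only: one strong hit versus uniformly weak hits.
routing = gate_retrieval(
    [(0.72, "src/flask/sansio/scaffold.py:335-366"), (0.21, "docs/cli.rst")]
)
payments = gate_retrieval([(0.12, "src/flask/app.py"), (0.08, "README.md")])

print(routing["in_scope"], routing["sources"])
print(payments["in_scope"], payments["message"])
```

A fixed threshold is the simplest gate; the right value depends on the embedding model and the similarity metric, so it would need tuning against real queries.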


