MdRimon29/Codebase-QA-Agent

Codebase Q&A Agent

An agentic RAG system that clones a public Python GitHub repository, parses and chunks the source code using code-aware logic, stores embeddings in FAISS, and answers natural language questions through a conversational CLI with source file references.

Chosen Repository

Default repository: https://github.com/pallets/flask

Flask was chosen because it is a real-world Python codebase with a non-trivial architecture that covers routing, a CLI, configuration, blueprints, request context, sessions, and testing utilities.

Features

  • Clones or fetches a public GitHub repository locally.
  • Skips binary files and ignored directories.
  • Parses Python files with ast and avoids arbitrary splitting inside functions/classes.
  • Chunks Markdown, TOML, YAML, INI, and text files by structural sections where possible.
  • Stores local embeddings in FAISS.
  • Uses LangGraph with tool calling.
  • Provides required tools:
    • search_code(query)
    • read_file(path)
    • list_directory(path)
    • summarize_module(module_name)
  • Answers with file path and line ranges where available.
  • Handles unsupported or out-of-scope questions without hallucinating.
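
The code-aware chunking described above can be illustrated with a minimal sketch. It is not the project's actual chunker: the function name chunk_python_source and the chunk fields are hypothetical, and it assumes one chunk per top-level function or class, using only the standard-library ast module.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Return one chunk per top-level function or class (hypothetical schema)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # node.lineno points at the def/class line, so start at the first
            # decorator when there is one to keep the definition intact.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            chunks.append(
                {
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start_line": start,
                    "end_line": end,
                    "text": "\n".join(lines[start - 1:end]),
                }
            )
    return chunks


# The sample is only parsed, never executed, so undefined names are fine.
SAMPLE = '''import os

@decorator
def route(rule):
    return rule

class Blueprint:
    def register(self):
        pass
'''

for chunk in chunk_python_source(SAMPLE):
    print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Because chunk boundaries follow AST nodes, a function or class body is never cut in the middle, and the recorded line range can be reused for source references in answers.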

Setup & Installation

python -m venv .venv

# Windows Git Bash
source .venv/Scripts/activate

# macOS/Linux
source .venv/bin/activate

pip install -r requirements.txt
cp .env.example .env

Add your OpenAI API key to .env and adjust the other settings as needed:

OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
DEFAULT_REPO_URL=https://github.com/pallets/flask
DEFAULT_REPO_NAME=flask
TOP_K=6
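
As a rough illustration of how these settings might be consumed, the sketch below reads them from environment variables. The Settings class and load_settings function are hypothetical and are not taken from the repository; the fallback values simply mirror the sample .env above, and the sketch assumes the variables have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values listed in .env."""
    openai_model: str
    embedding_model: str
    default_repo_url: str
    default_repo_name: str
    top_k: int


def load_settings(env=None) -> Settings:
    # Accept any mapping so the function can be exercised without touching os.environ.
    env = os.environ if env is None else env
    return Settings(
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        embedding_model=env.get(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        default_repo_url=env.get(
            "DEFAULT_REPO_URL", "https://github.com/pallets/flask"
        ),
        default_repo_name=env.get("DEFAULT_REPO_NAME", "flask"),
        # TOP_K arrives as a string and is converted to an integer here.
        top_k=int(env.get("TOP_K", "6")),
    )


settings = load_settings({"TOP_K": "3"})
print(settings.top_k, settings.openai_model)
```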

Usage

1. Clone and index the repository

python scripts/index_repo.py --repo-url https://github.com/pallets/flask --repo-name flask --force

This creates:

data/repos/flask/
data/indexes/flask/faiss.index
data/indexes/flask/chunks.jsonl
data/indexes/flask/files.json
data/indexes/flask/repo_info.json
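
A quick way to confirm that indexing finished is to check that the index files listed above exist. The helper below is a hypothetical sketch, not part of the repository; it assumes the data/indexes/<repo-name>/ layout shown above and does not inspect file contents.

```python
from pathlib import Path

# File names taken from the listing above.
EXPECTED_INDEX_FILES = ("faiss.index", "chunks.jsonl", "files.json", "repo_info.json")


def missing_index_files(repo_name: str, data_dir: str = "data") -> list[str]:
    """Return the expected index files that are not present on disk."""
    index_dir = Path(data_dir) / "indexes" / repo_name
    return [name for name in EXPECTED_INDEX_FILES if not (index_dir / name).is_file()]


missing = missing_index_files("flask")
if missing:
    print("Index incomplete, missing:", ", ".join(missing))
else:
    print("Index looks complete.")
```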

2. Ask one question

python scripts/ask.py "How does Flask register routes?" --repo-name flask

3. Start the conversational CLI

PYTHONPATH=src python -m codebase_qa.interface.cli chat --repo-name flask

or:

python scripts/ask.py --chat --repo-name flask

Example Questions

What is the overall architecture of Flask?
How does Flask register routes?
Where is the request context implemented?
Explain the Blueprint class and how it relates to the Flask app.
What happens when a request is dispatched?
Which files are responsible for CLI commands?
Is there any payment integration in this repo?

Example Output Shape

Answer:
Flask registers routes through a decorator that eventually delegates to add_url_rule. The route decorator prepares options and returns a wrapper around the view function.

Sources:
- src/flask/sansio/scaffold.py:335-366
- src/flask/sansio/scaffold.py:402-456

Confidence: High
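
The source lines in this output follow a path:start-end pattern. If you want to post-process answers, a small parser along the following lines could split them apart. This is a sketch under that assumption: SourceRef and parse_source_ref are hypothetical names, and the repository may format or parse references differently.

```python
import re
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    path: str
    start_line: Optional[int]
    end_line: Optional[int]


# A path, optionally followed by ":start" or ":start-end".
_REF = re.compile(r"^(?P<path>.+?)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def parse_source_ref(text: str) -> SourceRef:
    # Drop the leading "- " bullet used in the Sources list.
    cleaned = text.strip().lstrip("-").strip()
    match = _REF.match(cleaned)
    if match is None:
        raise ValueError(f"not a source reference: {text!r}")
    start = int(match["start"]) if match["start"] else None
    # A single line number is treated as a one-line range.
    end = int(match["end"]) if match["end"] else start
    return SourceRef(match["path"], start, end)


print(parse_source_ref("- src/flask/sansio/scaffold.py:335-366"))
```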

Screenshots

Add screenshots into the screenshots/ folder after running sample queries.

Recommended screenshots:

screenshots/question1.png
screenshots/question2.png
screenshots/question3.png


AI Usage Disclosure

AI assistance (ChatGPT) was used to help generate and debug parts of this implementation. The final submission should be reviewed and understood by the developer before delivery. No hardcoded answers or fake retrieval results are included.

Integrity Notes

  • The agent does not connect to GitHub at answer time.
  • The repo is cloned locally during indexing.
  • The runtime agent only uses the local cloned repository and local FAISS index.
  • No hardcoded demo answers are used.
  • When the indexed codebase does not support an answer, out-of-scope questions get a conservative response rather than a guess.
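
One way a tool such as read_file(path) could stay confined to the local clone is to resolve every requested path against the repository root and reject anything that escapes it. The sketch below is an assumption about how that might be done, not the repository's implementation; resolve_inside_repo is a hypothetical helper.

```python
from pathlib import Path


def resolve_inside_repo(repo_root: str, relative_path: str) -> Path:
    """Resolve a requested path and refuse anything outside the repo root."""
    root = Path(repo_root).resolve()
    candidate = (root / relative_path).resolve()
    # Absolute paths and ".." segments both end up outside root and are rejected.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes repository root: {relative_path!r}")
    return candidate


print(resolve_inside_repo("data/repos/flask", "src/flask/app.py").name)
```

A file-reading or directory-listing tool could call this helper before touching the filesystem, so a prompt that asks for ../../.env or an absolute path fails early instead of leaking files from outside the indexed repository.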

