From 17fdbf283e272c7be06a21c6cd1de30e0d28719c Mon Sep 17 00:00:00 2001 From: Zain Ali Date: Mon, 17 Nov 2025 18:09:08 +0500 Subject: [PATCH 1/3] Add Claude Code project instructions and remove env example MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added CLAUDE.md with comprehensive guidance for working with this RAG chatbot codebase, including architecture overview, data flow, and component documentation. Removed .env.example as environment setup is now documented in CLAUDE.md. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .claude/settings.local.json | 10 ++ .env.example | 2 - CLAUDE.md | 199 ++++++++++++++++++++++++++++++++++++ 3 files changed, 209 insertions(+), 2 deletions(-) create mode 100644 .claude/settings.local.json delete mode 100644 .env.example create mode 100644 CLAUDE.md diff --git a/.claude/settings.local.json b/.claude/settings.local.json new file mode 100644 index 000000000..2998dc66c --- /dev/null +++ b/.claude/settings.local.json @@ -0,0 +1,10 @@ +{ + "permissions": { + "allow": [ + "Bash(git add:*)", + "Bash(git commit:*)" + ], + "deny": [], + "ask": [] + } +} diff --git a/.env.example b/.env.example deleted file mode 100644 index 18b34cb7e..000000000 --- a/.env.example +++ /dev/null @@ -1,2 +0,0 @@ -# Copy this file to .env and add your actual API key -ANTHROPIC_API_KEY=your-anthropic-api-key-here \ No newline at end of file diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 000000000..086c2da7f --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,199 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. 
+ +## Running the Application + +### Quick Start +```bash +./run.sh +``` + +### Manual Start +```bash +cd backend +uv run uvicorn app:app --reload --port 8000 +``` + +### Dependencies Installation +```bash +uv sync +``` + +The application will be available at: +- Web Interface: `http://localhost:8000` +- API Documentation: `http://localhost:8000/docs` + +### Environment Setup +Requires `.env` file in root directory with: +``` +ANTHROPIC_API_KEY=your_api_key_here +``` + +## Architecture Overview + +This is a **Retrieval-Augmented Generation (RAG) system** for querying course materials using semantic search and Claude AI. + +### Data Flow Architecture + +The system follows a **tool-based RAG pattern** where Claude decides when to search: + +``` +User Query → FastAPI → RAG System → AI Generator → Claude API + ↓ + Tool Decision? + ↓ + [Search] ← Tool Manager → Search Tool + ↓ + Vector Store (ChromaDB) + ↓ + Search Results + ↓ + Claude + Context → Final Response +``` + +### Core Components (Backend) + +**1. app.py (Entry Point)** +- Defines FastAPI endpoints: `POST /api/query` and `GET /api/courses` +- Initializes RAGSystem on startup +- Auto-loads documents from `../docs` folder +- Serves frontend static files from `../frontend` + +**2. rag_system.py (Orchestrator)** +- Central coordinator for all RAG operations +- Manages lifecycle of queries from input to response +- Bridges between AI Generator, Vector Store, and Tool Manager +- Handles conversation context via SessionManager +- Key method: `query(query, session_id)` returns `(answer, sources)` + +**3. ai_generator.py (Claude Integration)** +- Wraps Anthropic Claude API with tool-calling support +- Two-phase response generation: + - Phase 1: Initial API call with tools available + - Phase 2: If tool used, collect results and make final call without tools +- Method `_handle_tool_execution()` manages tool execution loop +- Static system prompt instructs Claude on tool usage patterns + +**4. 
vector_store.py (Semantic Search)** +- ChromaDB wrapper with two collections: + - `course_catalog`: Course metadata (titles, instructors, links) + - `course_content`: Chunked course material with embeddings +- Uses SentenceTransformer (all-MiniLM-L6-v2) for embeddings +- Key method: `search(query, course_name, lesson_number)` with smart course name resolution +- Course name resolution uses semantic search on catalog to handle fuzzy matching + +**5. search_tools.py (Tool System)** +- `Tool` abstract base class for creating Claude-callable tools +- `CourseSearchTool`: Implements search with Anthropic tool definition schema +- `ToolManager`: Registry for tools, handles execution and tracks sources +- Tool definition includes input schema with optional parameters (course_name, lesson_number) + +**6. document_processor.py (Content Pipeline)** +- Parses course documents with expected format: + ``` + Course Title: [title] + Course Link: [url] + Course Instructor: [name] + + Lesson 0: [title] + Lesson Link: [url] + [content] + + Lesson 1: [title] + ... + ``` +- Sentence-based chunking with overlap (default: 800 chars, 100 overlap) +- Creates `Course` and `CourseChunk` objects for vector storage +- Adds lesson context to chunks: "Course X Lesson Y content: ..." + +**7. session_manager.py (Conversation History)** +- Manages user conversation sessions with configurable history depth +- Stores message pairs (user/assistant) in memory +- Returns formatted history string for context injection +- Default: Keeps last 2 exchanges (4 messages) + +**8. 
models.py (Data Models)** +- Pydantic models: `Course`, `Lesson`, `CourseChunk` +- Course title serves as unique identifier +- Chunks include metadata: course_title, lesson_number, chunk_index + +### Frontend (Vanilla JavaScript) + +**script.js** handles: +- API calls to `/api/query` and `/api/courses` +- Session management (session_id persistence) +- Message rendering with markdown support (marked.js) +- Loading states and error handling + +### Key Configuration (config.py) + +```python +ANTHROPIC_MODEL: "claude-sonnet-4-20250514" +EMBEDDING_MODEL: "all-MiniLM-L6-v2" +CHUNK_SIZE: 800 +CHUNK_OVERLAP: 100 +MAX_RESULTS: 5 +MAX_HISTORY: 2 # conversation exchanges +CHROMA_PATH: "./chroma_db" +``` + +## Tool-Based RAG Pattern + +**Critical Design Decision**: This system uses Claude's tool-calling capability rather than always searching: + +1. User query reaches `ai_generator.py` +2. Claude receives query + system prompt + tool definitions +3. Claude decides: "Do I need to search course content?" + - **Yes**: Invokes `search_course_content` tool → retrieves context → generates answer + - **No**: Answers directly from general knowledge +4. Tool results are injected as a new user message +5. Final response generated without tools + +This allows Claude to: +- Answer general questions without searching +- Use semantic search only for course-specific queries +- Make intelligent decisions about course/lesson filtering + +## Document Format Requirements + +Course documents in `docs/` must follow this structure: + +``` +Course Title: [Required - First line] +Course Link: [Optional - URL] +Course Instructor: [Optional - Name] + +Lesson 0: [Lesson title] +Lesson Link: [Optional - URL] +[Lesson content...] + +Lesson 1: [Next lesson title] +... +``` + +Missing metadata will use defaults. Documents without lesson markers are treated as single documents. 
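The document structure above can be illustrated with a small parser. This is a minimal sketch, not the repository's `document_processor.py`: the function name `parse_course_text`, the returned dictionary layout, the regular expression, and the use of `None` as the default for missing metadata are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for lesson markers such as "Lesson 0: Introduction".
LESSON_RE = re.compile(r"^Lesson\s+(\d+):\s*(.*)$")

# Header prefixes from the documented format, mapped to assumed output keys.
COURSE_FIELDS = {
    "Course Title:": "title",
    "Course Link:": "link",
    "Course Instructor:": "instructor",
}


def parse_course_text(text):
    """Split a course document into metadata and lessons (illustrative only)."""
    # Assumption: missing metadata defaults to None.
    course = {"title": None, "link": None, "instructor": None, "lessons": []}
    current = None  # lesson currently being filled, if any
    preamble = []   # non-header lines seen before any lesson marker

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LESSON_RE.match(line)
        if match:
            current = {
                "number": int(match.group(1)),
                "title": match.group(2).strip(),
                "link": None,
                "content": [],
            }
            course["lessons"].append(current)
            continue
        if current is None:
            prefix = next((p for p in COURSE_FIELDS if line.startswith(p)), None)
            if prefix:
                course[COURSE_FIELDS[prefix]] = line[len(prefix):].strip()
            else:
                preamble.append(line)
        elif line.startswith("Lesson Link:"):
            current["link"] = line[len("Lesson Link:"):].strip()
        else:
            current["content"].append(line)

    # No lesson markers: keep the body as one lesson-less document.
    # (If lessons exist, stray preamble lines are dropped in this sketch.)
    if not course["lessons"] and preamble:
        course["lessons"].append(
            {"number": None, "title": course["title"], "link": None, "content": preamble}
        )

    for lesson in course["lessons"]:
        lesson["content"] = "\n".join(lesson["content"])
    return course
```

Calling `parse_course_text(open(path).read())` on a file in `docs/` would return one dictionary per course. The sketch accepts the header fields anywhere before the first lesson marker, which is looser than the "first line" requirement stated above.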
+ +## ChromaDB Persistence + +- Database stored in `./chroma_db` (relative to backend directory) +- Persistent across restarts +- Startup checks for existing courses before adding +- Duplicate courses (by title) are skipped +- Use `clear_existing=True` in `add_course_folder()` to rebuild from scratch + +## Common Issues + +**API Key**: Ensure `.env` exists in root directory (not backend/) with valid Anthropic API key. + +**Port Conflicts**: Default port 8000. Change with `--port` flag in uvicorn command. + +**ChromaDB Warnings**: Resource tracker warnings are suppressed via warnings filter in app.py. + +**Course Not Loading**: Check `docs/` path is relative to backend directory (`../docs`). Startup logs show load status. + +**Session Context**: Frontend manages session_id. Backend creates new session if missing. Sessions are in-memory only. +- Always use uv to run the server; do not use pip directly. +- Use uv to manage all dependencies. +- Add new dependencies with `uv add`. +- Use uv to run Python files. \ No newline at end of file From 6ea28146da3b6f7254cf7c652bbc3e5624f970ac Mon Sep 17 00:00:00 2001 From: Zain Ali Date: Mon, 17 Nov 2025 18:37:46 +0500 Subject: [PATCH 2/3] Add guide for understanding Claude Code configuration files MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This document explains the three types of CLAUDE.md files: - CLAUDE.md (team-wide project settings) - CLAUDE.local.md (personal project settings) - ~/.claude/CLAUDE.md (global user settings) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- ...tanding_claude_code_configuration_files.md | 35 +++++++++++++++++++ 1 file changed, 35 insertions(+) create mode 100644 understanding_claude_code_configuration_files.md diff --git a/understanding_claude_code_configuration_files.md b/understanding_claude_code_configuration_files.md new file mode 100644 index 000000000..f9c2682b2 --- /dev/null
+++ b/understanding_claude_code_configuration_files.md @@ -0,0 +1,35 @@ +# Understanding Claude Code Configuration Files + +## **1. CLAUDE.md** +- **What is it:** This is your project's main file +- **When is it created:** When you run the `/init` command +- **Who can see it:** You commit this file to Git, so your **entire team** can see it +- **Where is it located:** In your project folder +- **Example:** If you're working with a team, everyone will have access to this file + +--- + +## **2. CLAUDE.local.md** +- **What is it:** This is your **personal file** +- **Who can see it:** Only **you** - no other engineers +- **What's it for:** You write your personal instructions and customizations for Claude here +- **Where is it located:** In your project folder +- **Example:** If you want Claude to work in a specific way just for you + +--- + +## **3. ~/.claude/CLAUDE.md** +- **What is it:** This is your **global settings** file +- **Where does it work:** In **every project** on your machine +- **What's it for:** Contains instructions you want Claude to follow in **all projects** +- **Where is it located:** Inside the `.claude` folder in your home directory (~) +- **Example:** If you want Claude to always write code in the same style, in every project + +--- + +**In simple words:** +- **CLAUDE.md** = For the team +- **CLAUDE.local.md** = Just for you (in that project) +- **~/.claude/CLAUDE.md** = For all your projects + + From b3f817c2c0400151ad8c5b6c45f3b6fda5dca22f Mon Sep 17 00:00:00 2001 From: Zain Ali Date: Mon, 17 Nov 2025 18:42:43 +0500 Subject: [PATCH 3/3] Rename configuration guide file MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Renamed understanding_claude_code_configuration_files.md to Claude_Code_Configuration_files.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- ...s.md => Claude_Code_Configuration_files.md | 0 claude_files.md | 35 +++++++++++++++++++ 2 files 
changed, 35 insertions(+) rename understanding_claude_code_configuration_files.md => Claude_Code_Configuration_files.md (100%) create mode 100644 claude_files.md diff --git a/understanding_claude_code_configuration_files.md b/Claude_Code_Configuration_files.md similarity index 100% rename from understanding_claude_code_configuration_files.md rename to Claude_Code_Configuration_files.md diff --git a/claude_files.md b/claude_files.md new file mode 100644 index 000000000..9ff730c1e --- /dev/null +++ b/claude_files.md @@ -0,0 +1,35 @@ +Sure! Let me explain the locations of these three CLAUDE.md files in a simple way: + +## **1. CLAUDE.md** (red box) +- **What is it:** This is your project's main file +- **When is it created:** When you run the `/init` command +- **Who can see it:** You commit this file to Git, so your **entire team** can see it +- **Where is it located:** In your project folder +- **Example:** If you're working with a team, everyone will have access to this file + +--- + +## **2. CLAUDE.local.md** (pink box) +- **What is it:** This is your **personal file** +- **Who can see it:** Only **you** - no other engineers +- **What's it for:** You write your personal instructions and customizations for Claude here +- **Where is it located:** In your project folder +- **Example:** If you want Claude to work in a specific way just for you + +--- + +## **3.
~/.claude/CLAUDE.md** (purple box) +- **What is it:** This is your **global settings** file +- **Where does it work:** In **every project** on your machine +- **What's it for:** Contains instructions you want Claude to follow in **all projects** +- **Where is it located:** Inside the `.claude` folder in your home directory +- **Example:** If you want Claude to always write code in the same style, in every project + +--- + +**In simple words:** +- **CLAUDE.md** = For the team +- **CLAUDE.local.md** = Just for you (in that project) +- **~/.claude/CLAUDE.md** = For all your projects + +---------
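The three file locations described in this guide can be checked with a short script. This is a minimal sketch that only uses the file names given above; the helper name `claude_config_paths` and its scope labels are hypothetical, and it makes no claim about how Claude Code itself discovers or loads these files.

```python
from pathlib import Path


def claude_config_paths(project_dir, home_dir=None):
    """Return the three instruction-file locations named in the guide."""
    project = Path(project_dir)
    # Fall back to the current user's home directory when none is given.
    home = Path(home_dir) if home_dir is not None else Path.home()
    return {
        "team": project / "CLAUDE.md",            # committed to Git
        "personal": project / "CLAUDE.local.md",  # only for you, in this project
        "global": home / ".claude" / "CLAUDE.md", # applies to every project
    }


if __name__ == "__main__":
    # Report which of the three files exist for the current directory.
    for scope, path in claude_config_paths(".").items():
        status = "found" if path.is_file() else "missing"
        print(f"{scope:9} {path} ({status})")
```

Running the script from a project root prints one line per scope, which is a quick way to see which of the three files are present on a given machine.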