A modular, high-performance Retrieval-Augmented Generation framework with multi-path retrieval, graph extraction, and fusion ranking
📘 中文文档 • ⭐ Key Features • 🏗️ Architecture • 🚀 Quick Start
RAG-ARC is a modular Retrieval-Augmented Generation (RAG) framework for building efficient, scalable architectures that support multi-path retrieval, graph structure extraction, and fusion ranking. It addresses key challenges traditional RAG systems face when processing unstructured documents (PDF, PPT, Excel, etc.): information loss, low retrieval accuracy, and difficulty recognizing multimodal content.
🧩 Full RAG Pipeline Support: Covers the complete pipeline—from document parsing, text chunking, and embedding generation to multi-path retrieval, graph extraction, reranking, and knowledge graph management.
📚 Knowledge-Intensive Tasks: Ideal for question answering, reasoning, and content generation tasks that rely on large-scale structured and unstructured knowledge, ensuring high recall and semantic consistency.
🌐 Cross-Domain Applications: Supports both Standard RAG and GraphRAG modes, making it adaptable for academic research, personal knowledge bases, and enterprise-level knowledge management systems.
RAG-ARC introduces several key innovations that together form a sophisticated, integrated framework:
- Support for DOCX, PDF, PPT, Excel, HTML, and other file types
- Flexible parsing strategies with OCR and layout-aware PDF parsing (via dots_ocr module)
- Native and VLM-based OCR capabilities
- Multiple chunking strategies (token-based, semantic, recursive, markdown headers)
- Integration with HuggingFace embedding models for vector representation
- Configurable chunk size and overlap parameters
- Combined BM25 (sparse retrieval), Dense retrieval (Faiss-GPU), and Tantivy full-text search
- Reciprocal Rank Fusion (RRF) for result merging
- Configurable weights and fusion methods
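RRF scores each document by summing reciprocal ranks across the retrieval paths, so documents returned by several paths float to the top. A minimal sketch (function name is illustrative; k = 60 is the constant commonly used for RRF):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each path contributes 1 / (k + rank) for every doc it returns
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d2", "d4"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # → ['d2', 'd1', 'd4', 'd3']
```

Note how "d2", which appears in both lists, outranks "d1" even though "d1" tops the BM25 list.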
- Extracts entities and relations from facts to build structured knowledge graphs
- Seamlessly integrates with Neo4j graph database
- Enables knowledge-graph-driven reasoning and QA
- Lightweight, incrementally updatable graph construction suitable for enterprise deployment
- Incorporates Subgraph PPR (Personalized PageRank): Compared to HippoRAG2's full-graph PPR, subgraph PPR achieves higher reasoning precision and efficiency
- Qwen3 model for precise result re-ranking
- LLM-based and listwise re-ranking strategies
- Score normalization and metadata enrichment
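The normalization step can be as simple as min-max scaling reranker scores into [0, 1] before fusing or thresholding. A hedged sketch (function name and tuple layout are illustrative, not RAG-ARC's actual API):

```python
def normalize_scores(results):
    """Min-max normalize reranker scores; results is a list of (doc_id, score)."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: map everything to 1.0
        return [(d, 1.0) for d, _ in results]
    return [(d, (s - lo) / (hi - lo)) for d, s in results]

print(normalize_scores([("d1", 7.2), ("d2", 3.1), ("d3", 5.0)]))
```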
- Factory pattern for LLM, Embedding, Retriever component creation
- Layered architecture: config, core, encapsulation, application, api
- Singleton pattern for tokenizer management and database connections
- Shared mechanism for retriever and embedding model instance reuse to improve system performance
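The factory-plus-shared-instance idea can be sketched as a registry whose `create` caches instances keyed by name and config, so repeated requests for the same retriever or embedding model reuse one object (class and method names here are hypothetical, not RAG-ARC's actual API):

```python
class ComponentFactory:
    """Registry-based factory with a shared-instance cache (illustrative)."""
    _registry = {}
    _instances = {}

    @classmethod
    def register(cls, name):
        def decorator(component_cls):
            cls._registry[name] = component_cls
            return component_cls
        return decorator

    @classmethod
    def create(cls, name, **kwargs):
        # Same name + same config -> same shared instance
        key = (name, tuple(sorted(kwargs.items())))
        if key not in cls._instances:
            cls._instances[key] = cls._registry[name](**kwargs)
        return cls._instances[key]

@ComponentFactory.register("bm25")
class BM25Retriever:
    def __init__(self, index_path="index"):
        self.index_path = index_path

a = ComponentFactory.create("bm25", index_path="idx")
b = ComponentFactory.create("bm25", index_path="idx")
assert a is b  # the instance is reused, not rebuilt
```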
Evolved from HippoRAG2, RAG-ARC delivers significant improvements in both efficiency and recall:
- 🚀 22.9% Token Cost Reduction: Optimized prompt strategies reduce token consumption without sacrificing accuracy.
- 🎯 5.3% Recall Rate Increase: Pruning-based optimizations yield more comprehensive and relevant retrieval.
- 🔁 Incremental Knowledge Graph Updates: The graph can be updated without full reconstruction, reducing computational and maintenance overhead.
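The incremental-update idea reduces to merging new triples into the existing graph without rebuilding it. A toy illustration over a plain adjacency dict (RAG-ARC actually stores its graph in Neo4j; names here are illustrative):

```python
def merge_triples(graph, new_triples):
    """Incrementally add (head, relation, tail) triples; existing edges are untouched."""
    for head, rel, tail in new_triples:
        graph.setdefault(head, set()).add((rel, tail))
    return graph

g = {"RAG-ARC": {("uses", "Neo4j")}}
merge_triples(g, [("RAG-ARC", "uses", "FAISS"), ("FAISS", "supports", "GPU")])
# g now holds old and new edges; nothing was recomputed from scratch
```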
```
RAG-ARC/
├── 📁 api/                  # API layer (FastAPI routes/MCP integration)
│   ├── routers/             # API route definitions
│   ├── config_examples/     # Configuration examples
│   └── mcp/                 # MCP server implementation
│
├── 📁 application/          # Business logic layer
│   ├── rag_inference/       # RAG inference module
│   ├── knowledge/           # Knowledge management
│   └── account/             # User account management
│
├── 📁 core/                 # Core capabilities
│   ├── file_management/     # File parsing and chunking
│   ├── retrieval/           # Retrieval strategies
│   ├── rerank/              # Re-ranking algorithms
│   ├── query_rewrite/       # Query rewriting
│   └── prompts/             # Prompt templates
│
├── 📁 config/               # Configuration system
│   ├── application/         # Application configs
│   ├── core/                # Core module configs
│   └── encapsulation/       # Encapsulation configs
│
├── 📁 encapsulation/        # Encapsulation layer
│   ├── database/            # Database interfaces
│   ├── llm/                 # LLM interfaces
│   └── data_model/          # Data models and schemas
│
├── 📁 framework/            # Framework core
│   ├── module.py            # Base module class
│   ├── register.py          # Component registry
│   └── config.py            # Configuration system
│
├── 📁 test/                 # Test suite
│
├── main.py                  # 🎯 Main application entry point
├── app_registration.py      # Component initialization
├── pyproject.toml           # Project dependencies
└── README.md                # Project documentation
```
Three-step deployment:
```bash
# 1. Clone the repository
git clone https://github.com/DataArcTech/RAG-ARC.git
cd RAG-ARC

# 2. Build Docker images (one-time setup)
./build.sh

# 3. Start all services
./start.sh
```

The deployment includes:
- ✅ PostgreSQL 16: Metadata storage
- ✅ Redis 7: Caching layer
- ✅ Neo4j: Knowledge graph database
- ✅ RAG-ARC App: FastAPI application with GPU support
What the scripts do:
build.sh:
- Checks Docker environment
- Creates .env configuration
- Selects CPU/GPU mode (auto-detect NVIDIA GPU)
- Pulls base images (PostgreSQL, Redis, Neo4j)
- Builds RAG-ARC application image
start.sh:
- Creates Docker network
- Starts all 4 containers
- Waits for services to be ready
- Verifies deployment
stop.sh:
- Stops all running containers (keeps data)
cleanup.sh:
- Removes all containers and Docker volumes
- Removes Docker network
- Keeps local data directories (`./data`, `./local`, `./models`)
- Use when you want to clean Docker resources but keep your data
clean-docker-data.sh:
- Removes all containers and Docker volumes
- Also removes local data directories (`./data/postgresql`, `./data/neo4j`, `./data/redis`, `./data/graph_index_neo4j`)
- Use when you want a complete cleanup (⚠️ this will delete all data!)
Access the service:
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
📖 See Docker Deployment Guide (English) or Docker部署指南(中文) for detailed instructions and troubleshooting
```bash
# 1. Clone the repository
git clone https://github.com/DataArcTech/RAG-ARC.git
cd RAG-ARC

# 2. Install uv if not already installed
# Recommended: China mirror (faster in mainland China)
curl -LsSf https://astral.ac.cn/uv/install.sh | sh
# Alternative: official installer
# curl -LsSf https://astral.sh/uv/install.sh | sh
# If needed, add uv to PATH: export PATH="$HOME/.local/bin:$PATH"

# 3. Install dependencies (uv automatically creates a virtual environment)
# The Tsinghua mirror is configured in pyproject.toml
uv sync

# 4. Copy and configure environment variables
cp .env.example .env
# Edit .env to configure your settings
```

RAG-ARC uses a modular configuration system. Key configuration files are located in config/json_configs/, where you can control which GPU each model uses, which models each business process uses, and other parameters:

- rag_inference.json: RAG retrieval configuration
- knowledge.json: Knowledge management configuration
- account.json: User account configuration
```bash
# Start the FastAPI server (uv run automatically manages the virtual environment)
uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

```bash
# Upload a document
curl -X POST "http://localhost:8000/knowledge" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -F "file=@/path/to/your/document.pdf"

# Chat with the RAG system
curl -X POST "http://localhost:8000/rag_inference/chat" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is RAG-ARC?"}'

# Get a token (login)
curl -X POST "http://localhost:8000/auth/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=YOUR_USERNAME&password=YOUR_PASSWORD"

# Register a new user
curl -X POST "http://localhost:8000/auth/register" \
  -H "Content-Type: application/json" \
  -d '{"name": "New User", "user_name": "YOUR_USERNAME", "password": "YOUR_PASSWORD"}'

# Create a new chat session
curl -X POST "http://localhost:8000/session" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

# List messages in a session
curl -X GET "http://localhost:8000/session/YOUR_SESSION_ID/messages" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN"
```

```python
import asyncio
import websockets

async def chat():
    uri = "ws://localhost:8000/rag_inference/stream_chat/YOUR_SESSION_ID"
    async with websockets.connect(
        uri, additional_headers=[("Cookie", "auth_token=YOUR_ACCESS_TOKEN")]
    ) as ws:
        await ws.send("Hello, RAG-ARC!")
        print(await ws.recv())

asyncio.run(chat())
```

- Backend: Python 3.11+
- Framework: FastAPI
- Vector Database: FAISS (GPU/CPU)
- Graph Database: Neo4j
- Full-text Search: Tantivy
- ML Frameworks: HuggingFace Transformers, PyTorch
- Data Validation: Pydantic v2
- Serialization: Dill
- LLM Support: Qwen3, OpenAI API, HuggingFace models
RAG-ARC supports configurable multi-path retrieval with the following components:
- Dense Retrieval: Uses FAISS for vector similarity search
- Sparse Retrieval: BM25 implementation via Tantivy
- Graph Retrieval: Neo4j-based knowledge graph retrieval with Pruned HippoRAG
The fusion method can be configured to use:
- Reciprocal Rank Fusion (RRF): Default method for combining results
- Weighted Sum: Custom weights for each retrieval path
- Rank Fusion: Rank-based combination approach
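As an illustration of the weighted-sum method, each path's (already normalized) scores are combined with configurable per-path weights; names and data shapes below are hypothetical:

```python
def weighted_sum_fusion(path_scores, weights):
    """path_scores: {path: {doc_id: score}}; weights: {path: float}.
    Returns doc ids ordered by the weighted sum of per-path scores."""
    fused = {}
    for path, scores in path_scores.items():
        w = weights.get(path, 0.0)
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * score
    return sorted(fused, key=fused.get, reverse=True)

order = weighted_sum_fusion(
    {"dense": {"d1": 0.9, "d2": 0.4}, "sparse": {"d2": 0.8, "d3": 0.5}},
    {"dense": 0.7, "sparse": 0.3},
)
print(order)  # → ['d1', 'd2', 'd3']
```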
RAG-ARC implements an enhanced GraphRAG approach based on HippoRAG2 with key improvements:
- Subgraph PPR: Instead of computing Personalized PageRank on the entire graph, RAG-ARC computes it on relevant subgraphs for better efficiency and accuracy
- Query-Aware Pruning: Dynamically adjusts the number of neighbors retained during graph expansion based on entity relevance to the query
- Incremental Updates: Supports updating the knowledge graph without full reconstruction
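A toy version of the PPR step, restricted to a query-relevant subgraph, can be sketched with power iteration: restart mass is concentrated on seed entities linked to the query, and rank diffuses only over the extracted subgraph (a simplified illustration, not RAG-ARC's implementation):

```python
def personalized_pagerank(adj, seeds, alpha=0.85, iters=50):
    """Power-iteration PPR over an adjacency dict {node: [neighbors]}.
    seeds: query-linked entities that receive all restart probability."""
    nodes = list(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = adj[n]
            if not out:
                # Dangling node: return its mass to the seeds
                for s in seeds:
                    nxt[s] += alpha * rank[n] / len(seeds)
                continue
            share = alpha * rank[n] / len(out)
            for m in out:
                nxt[m] += share
        rank = nxt
    return rank

# Tiny subgraph around a query entity "q"
rank = personalized_pagerank({"q": ["a", "b"], "a": ["b"], "b": ["a"]}, seeds={"q"})
```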
The document processing pipeline consists of several stages:
- File Storage: Documents are stored in a configurable storage backend (local filesystem or cloud storage)
- Parsing: Multiple parsers support different document types:
- Native parsers for standard formats (PDF, DOCX, PPTX, etc.)
- OCR parsers for scanned documents (using DOTS-OCR or VLM-based approaches)
- Chunking: Text is split into chunks using configurable strategies:
- Token-based chunking
- Semantic chunking
- Recursive chunking
- Markdown header-based chunking
- Indexing: Chunks are indexed in multiple systems:
- FAISS for dense retrieval
- Tantivy for sparse retrieval
- Neo4j for graph-based retrieval
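The configurable chunk-size and overlap parameters from the chunking stage boil down to a sliding window over the token sequence; a minimal token-based sketch (names are illustrative):

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size chunks, each sharing `overlap`
    trailing tokens with the next chunk to preserve context across boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

print(chunk_tokens(list(range(10)), chunk_size=4, overlap=1))
# → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```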
RAG-ARC provides a comprehensive REST API with the following key endpoints:
- POST /knowledge: Upload documents
- GET /knowledge/list_files: List user documents
- GET /knowledge/{doc_id}/download: Download documents
- DELETE /knowledge/{doc_id}: Delete documents
- POST /rag_inference/chat: Chat with the RAG system
- WebSocket /rag_inference/stream_chat/{session_id}: WebSocket-based streaming chat
- POST /auth/register: User registration
- POST /auth/token: User authentication (login)
- POST /session: Create chat sessions
- GET /session: List user sessions
- GET /session/{session_id}: Get session details
- DELETE /session/{session_id}: Delete sessions
RAG-ARC implements JWT-based authentication with the following features:
- User registration and login
- Role-based access control
- Document-level permissions (VIEW/EDIT)
- Secure password hashing with bcrypt
- Token refresh mechanism
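The issue/verify flow can be illustrated with a stdlib-only HMAC-signed token, a simplified stand-in for the JWTs the API actually issues (secret, claim names, and expiry handling here are assumptions for the sketch):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # hypothetical; load from .env in a real deployment

def issue_token(username, ttl=3600):
    """Sign a JWT-like payload with HMAC-SHA256."""
    payload = json.dumps({"sub": username, "exp": time.time() + ttl}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_token(token):
    """Return the claims if the signature is valid and unexpired, else None."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims if claims["exp"] > time.time() else None
```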
RAG-ARC includes built-in monitoring capabilities:
- Logging with configurable levels
- Performance metrics collection
- Health check endpoints
- Indexing status monitoring
We welcome contributions from the community! Here's how you can help:
- 🍴 Fork the repository
- 🌿 Create a feature branch (git checkout -b feature/AmazingFeature)
- 💾 Commit your changes (git commit -m 'Add some AmazingFeature')
- 📤 Push to the branch (git push origin feature/AmazingFeature)
- 🔄 Open a Pull Request
- New Parsing Strategies: Implement custom document parsing logic
- Retrieval Algorithms: Add new retrieval methods and fusion techniques
- Reranking Models: Integrate additional reranking models
- Chunking Methods: Implement novel text chunking approaches
For questions, issues, or feedback, please open an issue on GitHub or contact the maintainers.
This project is licensed under the MIT License - see the LICENSE file for details.