TeaRAG is a token‑efficient, agentic Retrieval‑Augmented Generation framework that solves complex queries with fewer tokens and faster reasoning. By compressing both retrieval content and reasoning steps, TeaRAG delivers +4% / +2% EM gains on Llama3‑8B‑Instruct and Qwen2.5‑14B‑Instruct while cutting token usage by ~60%. Built on FlashRAG, it integrates graph‑based knowledge retrieval and a novel Iterative Process‑aware DPO to achieve better results and higher efficiency in agentic RAG.
Our models and the Wiki corpus–based knowledge graph will be open-sourced after company review.
```bash
# Clone TeaRAG (built on top of FlashRAG)
git clone https://github.com/Applied-Machine-Learning-Lab/TeaRAG.git
cd TeaRAG
pip install -e .

# FlashRAG dependencies (quote the spec so the shell does not treat >= as a redirect)
pip install "vllm>=0.4.1"
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

# Training dependencies
pip3 install flash-attn --no-build-isolation
pip install accelerate==0.34.2
pip install trl==0.17.0
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
    --index-url https://download.pytorch.org/whl/cu126
```
```bash
# Redis setup
pip install redis
sudo apt install redis-server -y
redis-server --dir path/redis \
    --appendonly yes \
    --appendfilename appendonly.aof \
    --daemonize yes \
    --port 6379

# Test Redis (note: dataset loading may take some time)
redis-cli GET t:77899428
```

TeaRAG's files should be organized as follows to minimize modifications to the code:
```
Root Path
├── TeaRAG      # Source code for TeaRAG
├── model       # Saved pre-trained models
├── data        # Datasets and corpora
├── train_log   # Intermediate outputs and trained models
├── log         # Inference logs (config, intermediate results, final results)
├── redis       # Redis-based knowledge graph storage
└── index       # Pre-built retrieval indexes
```
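As a convenience, the sibling directories in the layout above can be created in one step. This is only a sketch: `ROOT` is a placeholder for your chosen root path (a temporary directory is used here for illustration), and the `TeaRAG` directory itself comes from the `git clone` step.

```shell
# Create the directories expected next to the TeaRAG checkout.
# ROOT is a placeholder; point it at your actual root path.
ROOT="${ROOT:-$(mktemp -d)}"
mkdir -p "$ROOT/model" "$ROOT/data" "$ROOT/train_log" \
         "$ROOT/log" "$ROOT/redis" "$ROOT/index"
ls "$ROOT"
```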
This repository builds on FlashRAG and follows a similar structure, except for the `alg` directory, which contains our customized scripts. (This part of the code is still under company review and will be released once it passes review.)
```
alg
├── config                  # Fixed inference hyper-parameter configs
├── data                    # Dataset preparation scripts
├── download                # Model + dataset download scripts
├── ds_config               # DeepSpeed config
├── index                   # Index construction scripts
├── infer_script            # Inference scripts
├── method                  # Method entry points
├── train                   # Training code
├── train_script            # Training scripts
├── prepare.sh              # Full preparation pipeline
├── run_pipeline_medium.sh  # Full pipeline for Llama3-8B-Instruct
└── run_pipeline_qwen.sh    # Full pipeline for Qwen2.5-14B-Instruct
```
Prepare (download data, build KG + index, prepare training set):

```bash
cd alg
bash prepare.sh
```

Train & evaluate (Llama-3-8B-Instruct):

```bash
cd alg
bash run_pipeline_medium.sh
```

Train & evaluate (Qwen2.5-14B-Instruct):

```bash
cd alg
bash run_pipeline_qwen.sh
```

Run inference for all datasets/models/baselines:

```bash
cd alg/infer_script
bash run_all.sh
```

For questions, suggestions, or bug reports, please reach out:
📧 zclfe00@gmail.com
We welcome contributions and feedback to make TeaRAG even better!
- FlashRAG – TeaRAG is built upon the overall framework of FlashRAG.
- Xiaohongshu – This research was supported by computational resources from Xiaohongshu's Search group.
If TeaRAG is helpful in your research or applications, please consider citing our work:
```bibtex
@article{zhang2025tearag,
  title={TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework},
  author={Zhang, Chao and Wang, Yuhao and Xu, Derong and Zhang, Haoxin and Lyu, Yuanjie and Chen, Yuhao and Liu, Shuochen and Xu, Tong and Zhao, Xiangyu and Gao, Yan and others},
  journal={arXiv preprint arXiv:2511.05385},
  year={2025}
}
```

