AI-powered academic literature review generator with human-in-the-loop workflow.
Chinese Documentation | English
Auto-Scholar helps researchers quickly generate structured literature reviews. Enter a research topic, review candidate papers, and get a well-cited academic review in minutes.
Key Features:
- Smart Paper Search: Automatically generates search keywords and finds relevant papers from Semantic Scholar, arXiv, and PubMed
- AI Runtime Layer: Task-aware model routing automatically selects the optimal model for each task, with multi-layer fallback support
- Multi-Model Support: Supports OpenAI, DeepSeek, Ollama (local), and other LLM providers
- YAML Model Configuration: Flexibly configure model capability scores and cost tiers via config/models.yaml
- Real-Time Cost Tracking: Break down LLM usage costs by task type with a real-time USD display
- Human-in-the-Loop: Review and approve papers before they're included in your review
- Anti-Hallucination QA: Validates all citations exist and are properly referenced
- Bilingual Support: Generate reviews in English or Chinese, with UI in both languages
- Real-time Progress: Watch the AI work with live streaming logs
- Python 3.11+
- uv
- bun
- OpenAI API key (or compatible endpoint like DeepSeek/Zhipu)
git clone https://github.com/CAICAIIs/Auto-Scholar.git
cd Auto-Scholar
# Backend
uv sync --extra dev
# Frontend
cd frontend && bun install && cd ..

Create a .env file in the project root:
# Required
LLM_API_KEY=your-openai-api-key
# Optional - for compatible APIs (DeepSeek, Zhipu, etc.)
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-4o
# Optional - increases Semantic Scholar rate limits
SEMANTIC_SCHOLAR_API_KEY=your-key
# Optional - LLM concurrency for parallel operations
# Default: 2 (safe for free/low-tier API keys)
# Recommended: 2-4 for free tier, 4-8 for paid tier
# Higher values improve performance but may trigger rate limits
LLM_CONCURRENCY=2
# Optional - claim verification concurrency
# Default: 2 (safe for free/low-tier API keys)
# Recommended: 2-4 for free tier, 4-8 for paid tier
CLAIM_VERIFICATION_CONCURRENCY=2
# Optional - disable claim verification for time-sensitive scenarios
# Default: true (maintains 97.3% citation accuracy)
# Set to "false" to disable and reduce workflow time
CLAIM_VERIFICATION_ENABLED=true
# === AI Runtime Layer Configuration ===
# Optional - Path to YAML model configuration file
# If set, takes priority over MODEL_REGISTRY and auto-detection
# Default: "" (auto-detect from environment variables)
MODEL_CONFIG_PATH=config/models.yaml
# Optional - JSON string defining available models (alternative to YAML)
# Example: MODEL_REGISTRY=[{"id":"openai:gpt-4o","provider":"openai",...}]
# Default: "" (auto-detect from environment variables)
MODEL_REGISTRY=
# Optional - Default model ID for per-request routing
# Format: "provider:model_name" (e.g., "openai:gpt-4o", "deepseek:deepseek-chat")
# If empty, uses legacy LLM_BASE_URL + LLM_MODEL as default
LLM_MODEL_ID=
# Optional - DeepSeek API configuration (auto-detected if set)
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_BASE_URL=https://api.deepseek.com/v1
# Optional - Ollama local models (auto-detected if set)
OLLAMA_BASE_URL=http://localhost:11434/v1
OLLAMA_MODELS=llama3.1:8b,mistral:7b

Terminal 1 - Backend:
uv run uvicorn backend.main:app --reload --port 8000

Terminal 2 - Frontend:
cd frontend && bun run dev

Visit http://localhost:3000 and start researching!
Type your research topic in the query input. Examples:
- "transformer architecture in natural language processing"
- "deep learning for medical image analysis"
- "reinforcement learning in robotics"
The system will:
- Generate 3-5 search keywords from your topic
- Search Semantic Scholar, arXiv, and PubMed for relevant papers
- Present candidate papers for your review
Select the papers you want included in your literature review.
After approval, the system will:
- Extract core contributions from each paper
- Generate a structured literature review with proper citations
- Validate all citations (auto-retry if issues found)
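The validate-and-retry step can be pictured as a simple loop. This is a minimal sketch, not the project's actual code: `generate` and `validate` are hypothetical stand-ins for the workflow's writer and citation checker, and the cap of 3 mirrors the configurable MAX_RETRY_COUNT noted in the performance section.

```python
MAX_RETRY_COUNT = 3  # mirrors the configurable limit in workflow.py


def generate_validated_review(generate, validate):
    """Regenerate the draft until citation validation passes (illustrative)."""
    draft = generate()
    for _ in range(MAX_RETRY_COUNT):
        issues = validate(draft)
        if not issues:
            return draft
        # Feed the validation issues back into the next generation attempt.
        draft = generate(feedback=issues)
    return draft  # best effort after exhausting retries
```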
The agent console shows real-time progress with live streaming logs:
- Console Title: "控制台" (Chinese) or "Console" (English)
- Collapse/Expand: Click the collapse button to minimize the console to a sidebar
- Model Selection: Choose the LLM model for generation (default: gpt-4o)
- Displays cost tier (Low/Medium/High)
- Shows local model indicator [Local] for Ollama models
- Shows fallback chain indicators
- Cost Tracking: Real-time cost display in USD
- Aggregated from all LLM calls
- Persists in localStorage
- Shows per-task breakdown in status bar
- Language Controls: Unified interface for language switching:
- 界面 (Interface): Toggle [中|EN] to switch the UI language
- 综述 (Review): Toggle [中|EN] to choose the review generation language
State Preservation:
- Switching UI locale preserves all session data (threadId, draft, messages, logs)
- Uses sessionStorage to maintain state across page reloads
- Model selection persists via localStorage
Auto-Regeneration:
- When changing the output language, the review automatically regenerates in the new language
- State is preserved during regeneration with duplicate-request prevention
- FastAPI - Async web framework
- LangGraph - Workflow orchestration with checkpointing
- AI Runtime Layer - Task-aware model routing with fallback chains
- OpenAI - LLM for keyword generation and review writing
- aiohttp - Async HTTP client for Semantic Scholar, arXiv, and PubMed
- Pydantic - Data validation and serialization
- tenacity - Retry logic for API calls
- Next.js 16 - React framework
- Zustand - State management
- next-intl - Internationalization
- Tailwind CSS - Styling
- react-markdown - Review rendering
- Radix UI - Accessible components
Auto-Scholar includes a task-aware AI runtime layer that optimizes model selection based on the specific task requirements.
The system automatically selects the most appropriate model for each workflow task:
| Task Type | Requirements | Model Selection Criteria |
|---|---|---|
| Planning | High reasoning, structured output | Prioritizes reasoning_score, requires supports_structured_output |
| Extraction | Structured output, cost-effective | Balances cost_tier and latency_score |
| Writing | Long context, creative | Requires supports_long_context, prioritizes creativity_score |
| QA | Structured output, low latency | Favors low cost_tier, high latency_score |
| Reflection | High reasoning, structured output | Similar to planning but with cost considerations |
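The selection criteria above can be illustrated with a small scoring function. This is a sketch, not the runtime's actual implementation: `ModelSpec` and `pick_model` are hypothetical names, the fields mirror the YAML schema described below, and the real weighting may differ.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    id: str
    reasoning_score: int       # 1-10
    creativity_score: int      # 1-10
    latency_score: int         # 1-10 (10 = fastest)
    cost_tier: int             # 1 = LOW, 2 = MEDIUM, 3 = HIGH
    supports_structured_output: bool
    supports_long_context: bool


def pick_model(task: str, models: list[ModelSpec]) -> ModelSpec:
    """Select a model per the task-routing table (illustrative weighting)."""
    if task in ("planning", "reflection"):
        # Require structured output; prioritize reasoning, then lower cost.
        pool = [m for m in models if m.supports_structured_output]
        key = lambda m: (m.reasoning_score, -m.cost_tier)
    elif task in ("extraction", "qa"):
        # Require structured output; favor low cost, then speed.
        pool = [m for m in models if m.supports_structured_output]
        key = lambda m: (-m.cost_tier, m.latency_score)
    elif task == "writing":
        # Require long context; prioritize creativity.
        pool = [m for m in models if m.supports_long_context]
        key = lambda m: (m.creativity_score,)
    else:
        raise ValueError(f"unknown task: {task}")
    return max(pool, key=key)
```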
Auto-Scholar supports multiple LLM providers and models through a flexible configuration system:
Supported Providers:
- OpenAI (GPT-4o, GPT-4o-mini)
- DeepSeek (DeepSeek Chat, DeepSeek Reasoner)
- Ollama (Local models)
- Custom providers (Any OpenAI-compatible endpoint)
Model Configuration:
Models are configured via YAML at config/models.yaml. Each model defines:
- id: Canonical identifier (e.g., openai:gpt-4o)
- display_name: Human-readable name for the UI
- provider: Provider type
- api_base / api_key_env: Connection details
- Capability scores (1-10):
  - reasoning_score: Reasoning ability
  - creativity_score: Creative writing ability
  - latency_score: Speed (1 = slow, 10 = fast)
- Flags:
  - supports_json_mode: JSON response format
  - supports_structured_output: Reliable JSON generation
  - supports_long_context: ≥32K context window
- cost_tier: LOW, MEDIUM, or HIGH classification
- max_output_tokens: Maximum generation tokens
Example Configuration:
models:
- id: "openai:gpt-4o"
provider: "openai"
model_name: "gpt-4o"
display_name: "GPT-4o (OpenAI)"
api_base: "${LLM_BASE_URL:-https://api.openai.com/v1}"
api_key_env: "LLM_API_KEY"
supports_json_mode: true
supports_structured_output: true
max_output_tokens: 8192
is_local: false
max_context_tokens: 128000
supports_long_context: true
cost_tier: 3 # HIGH
reasoning_score: 8
creativity_score: 8
    latency_score: 6

Environment Variables:
# YAML model configuration (takes priority)
MODEL_CONFIG_PATH=config/models.yaml
# Fallback: JSON-based registry
MODEL_REGISTRY=[{"id":"custom:model","provider":"custom",...}]
# Default model for requests without model_id
LLM_MODEL_ID=openai:gpt-4o
# Provider-specific keys (auto-detected if set)
DEEPSEEK_API_KEY=your-deepseek-key
OLLAMA_BASE_URL=http://localhost:11434/v1
OLLAMA_MODELS=llama3.1:8b,mistral:7b

Fallback Chains:
When a model fails (rate limit, timeout, error), the runtime automatically tries the next best model in the fallback chain. Chains are generated based on task requirements and model capabilities.
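At its core, a fallback chain just tries candidates in order until one succeeds. A minimal sketch, using plain callables in place of real model clients (`call_with_fallback` is a hypothetical name, not the runtime's API):

```python
def call_with_fallback(chain, prompt):
    """Try each model in the chain; raise only if all of them fail."""
    last_err = None
    for model in chain:
        try:
            return model(prompt)
        except Exception as err:  # e.g. rate limit, timeout, provider error
            last_err = err
    raise RuntimeError("all models in fallback chain failed") from last_err
```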
Client Caching:
LLM clients are cached by (base_url, api_key) to avoid connection overhead across multiple requests.
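That caching pattern can be reproduced with `functools.lru_cache` keyed on the two values. A sketch only: the returned object here is a placeholder for a real OpenAI-compatible client.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_client(base_url: str, api_key: str):
    """Return one client per (base_url, api_key) pair, reused across requests."""
    # Placeholder; the real runtime would construct an OpenAI-compatible
    # client here and reuse its connection pool.
    return object()
```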
The system tracks LLM usage costs in real-time:
Per-Task Breakdown: Costs are aggregated by task type (planning, extraction, writing, QA, reflection) for detailed analysis.
Real-Time Updates:
- SSE event: {"event":"cost_update","node":"extraction","total_cost_usd":0.045}
- Frontend displays cost in the agent console status bar
- Stored in Zustand state with localStorage persistence
Cost Estimation: Uses provider-specific pricing models to convert token counts to USD. OpenAI and DeepSeek pricing is built-in; custom providers use fallback rates.
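The token-to-USD conversion reduces to a rate lookup with a fallback. The rates below are illustrative placeholders, not the project's actual pricing table, and `estimate_cost_usd` is a hypothetical name:

```python
# Illustrative (input, output) USD rates per 1M tokens -- NOT actual pricing.
PRICING = {
    "openai:gpt-4o": (2.50, 10.00),
    "deepseek:deepseek-chat": (0.27, 1.10),
}
FALLBACK_RATE = (1.00, 2.00)  # used for custom/unknown providers


def estimate_cost_usd(model_id, prompt_tokens, completion_tokens):
    """Convert token counts to USD using per-model rates."""
    rate_in, rate_out = PRICING.get(model_id, FALLBACK_RATE)
    return (prompt_tokens * rate_in + completion_tokens * rate_out) / 1_000_000
```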
Access:
from backend.evaluation.cost_tracker import get_total_cost_usd
total_cost = get_total_cost_usd()  # Returns float USD

| Metric | Baseline | Target | Status |
|---|---|---|---|
| 10-paper workflow time | 50-95s | 35-65s | Implemented |
| LLM call count (10 papers) | ~26-36 | ~20-28 | Achieved |
| Citation accuracy | 97.3% | ≥97.0% | Maintained |
| Metric | Value | Validation Method |
|---|---|---|
| SSE Network Reduction | 92% | Benchmark test with 263 tokens → 21 flushes |
| Citation Accuracy | 97.3% | Manual validation of 37 citations across 3 topics |
| Max QA Retries | 3 | Configurable in workflow.py (MAX_RETRY_COUNT) |
The following environment variables allow performance tuning for users with higher-tier API keys:
| Variable | Default | Recommended Values | Description |
|---|---|---|---|
| LLM_CONCURRENCY | 2 | 2-4 (free tier), 4-8 (paid tier) | Concurrent LLM calls during extraction |
| CLAIM_VERIFICATION_CONCURRENCY | 2 | 2-4 (free tier), 4-8 (paid tier) | Concurrent claim verification calls |
| CLAIM_VERIFICATION_ENABLED | true | true (recommended), false (time-sensitive) | Enable/disable claim verification (maintains 97.3% accuracy when true) |
Expected improvements with increased concurrency:
- LLM_CONCURRENCY=4: ~50% reduction in extraction time (25-40s → 13-20s)
- LLM_CONCURRENCY=4 + Phase 1.2 + Phase 2.1: 10-paper workflow 50-95s → 35-65s
- CLAIM_VERIFICATION_ENABLED=false: ~8-20s reduction in critic_agent time (trade-off: lower citation accuracy)
Note:
- Increasing concurrency may trigger rate limits on lower-tier API plans. Start with default values and increase gradually.
- Disabling claim verification (CLAIM_VERIFICATION_ENABLED=false) reduces workflow time but may decrease citation accuracy below 97.3%. Only use it for time-sensitive scenarios where speed is more important than accuracy.
Workflow Benchmark (tests/benchmark_workflow.py):
- End-to-end performance measurement
- Concurrency comparison (baseline vs optimized)
- Per-node timing breakdown
Citation Validation (tests/validate_citations.py):
- Regression testing for citation accuracy
- Manual validation across multiple topics
- Batch validation support for continuous testing
SSE Debouncing (tests/benchmark_sse.py):
- Raw messages: 263 tokens
- After debounce: 21 network requests
- Compression ratio: 12.5x
- Mechanism: 200ms time window + semantic boundary detection (。!?.!?) + newline
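The mechanism amounts to: buffer incoming tokens, and flush when a sentence boundary (or newline) appears or the 200ms window elapses. A minimal sketch under those assumptions (`Debouncer` is a hypothetical name, not the project's class):

```python
import re
import time

# Semantic boundaries: CJK and ASCII sentence punctuation, plus newline.
BOUNDARY = re.compile(r"[。!?.!?\n]")


class Debouncer:
    """Buffer streamed tokens; flush on a boundary or after the time window."""

    def __init__(self, window: float = 0.2):
        self.window = window
        self.buf = ""
        self.last_flush = time.monotonic()
        self.flushes = []  # stands in for SSE network sends

    def feed(self, token: str) -> None:
        self.buf += token
        now = time.monotonic()
        if BOUNDARY.search(token) or now - self.last_flush >= self.window:
            self.flushes.append(self.buf)
            self.buf = ""
            self.last_flush = now
```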
# Backend compile check
find backend -name '*.py' -exec python -m py_compile {} +
# Frontend type check
cd frontend && bun x tsc --noEmit
# Run tests
uv run pytest tests/ -v

- Documentation Index
- Architecture
- Development Guide
- API Reference
- Contributing
- Code of Conduct
- Security Policy
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT License - see LICENSE for details.