Turn your codebase into a searchable knowledge graph powered by embeddings and LLMs
CodeGraph indexes your source code to a graph database, creates semantic embeddings, and exposes a Model Context Protocol (MCP) server that AI tools (Claude Desktop, LM Studio, etc.) can query for project-aware context.
What you get:
- Semantic code search across your entire codebase
- LLM-powered code intelligence and analysis
- Automatic dependency graphs and code relationships
- Fast vector search with FAISS or cloud SurrealDB HNSW (2-5ms query latency)
- MCP server for AI tool integration (stdio and streamable HTTP)
- Easy-to-use CLI interface
- NEW: Jina AI cloud embeddings with configurable models and dimensions, plus reranking
- NEW: SurrealDB HNSW backend for cloud-native and local vector search
- NEW: Node.js NAPI bindings for zero-overhead TypeScript integration
- NEW: Agentic code-agent tools with tier-aware multi-step reasoning
- EXPERIMENTAL: AutoAgents framework integration for improved agent orchestration
FAISS+RocksDB support in MCP server is deprecated in favor of SurrealDB-based architecture.
- MCP server no longer uses FAISS vector search or RocksDB graph storage
- CLI and SDK continue to support FAISS/RocksDB for local operations
- NAPI bindings still provide TypeScript access to all features
- MCP code-agent tools now require SurrealDB for graph analysis
The new agentic MCP tools (agentic_code_search, agentic_dependency_analysis, etc.) require SurrealDB:
Option 1: Free Cloud Instance (Recommended)
- Sign up at Surreal Cloud
- Get 1GB FREE instance - perfect for testing and small projects
- Configure connection details in environment variables
Option 2: Local Installation
# Install SurrealDB
curl -sSf https://install.surrealdb.com | sh
# Run locally
surreal start --bind 127.0.0.1:3004 --user root --pass root memory
Free Cloud Resources:
- SurrealDB Cloud: 1GB free instance at surrealdb.com/cloud
- Jina AI: 10 million free API tokens at jina.ai for embeddings and reranking
- Native graph capabilities: SurrealDB provides built-in graph database features
- Unified storage: A single database holds both vectors and graph relationships, and can be extended to relational and document use cases
- Cloud-native: Better support for distributed deployments
- Reduced complexity: Eliminates custom RocksDB integration layer
See CHANGELOG.md for detailed migration guide.
CodeGraph now supports the AutoAgents framework for agentic orchestration as an experimental feature.
- Modern Rust-based agent framework with ReAct (Reasoning + Acting) pattern
- Replaces ~1,200 lines of custom orchestration code
- Maintains compatibility with all 7 existing agentic MCP tools
- Same tier-aware prompting system (Small/Medium/Large/Massive)
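For readers unfamiliar with the ReAct pattern mentioned above, here is a minimal sketch of the loop: the model alternates between choosing an action and reading the resulting observation until it decides to finish. Every name in it (`Step`, `Model`, `Tools`, `runReAct`, the stub model) is hypothetical and is not taken from CodeGraph or the AutoAgents framework.

```typescript
// Hypothetical types for illustration only; not CodeGraph or AutoAgents APIs.
type Step =
  | { kind: "act"; tool: string; input: string }
  | { kind: "finish"; answer: string };

type Model = (transcript: string[]) => Step;
type Tools = Record<string, (input: string) => string>;

// Alternate "reason" (ask the model for the next step) and "act" (run a tool),
// feeding each observation back into the transcript, up to a step budget.
function runReAct(model: Model, tools: Tools, question: string, maxSteps = 5): string {
  const transcript: string[] = [`Question: ${question}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(transcript);
    if (step.kind === "finish") return step.answer;
    const tool = tools[step.tool];
    const observation = tool ? tool(step.input) : `unknown tool: ${step.tool}`;
    transcript.push(`Action: ${step.tool}(${step.input})`);
    transcript.push(`Observation: ${observation}`);
  }
  return "step budget exhausted";
}

// Stub model: search once, then answer with the last observation.
const stubModel: Model = (t) =>
  t.some((line) => line.startsWith("Observation:"))
    ? { kind: "finish", answer: t[t.length - 1].replace("Observation: ", "") }
    : { kind: "act", tool: "search", input: "auth" };

const reactAnswer = runReAct(
  stubModel,
  { search: (q) => `3 matches for ${q}` },
  "Where is auth handled?",
);
console.log(reactAnswer); // "3 matches for auth"
```

The step budget is the important design choice: without it, a model that never emits a finish step would loop forever.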
Build with experimental feature:
# Using Makefile
make build-mcp-autoagents
# Or directly with cargo
cargo build --release -p codegraph-mcp --features "ai-enhanced,autoagents-experimental,faiss,ollama"
Without AutoAgents (default):
cargo build --release -p codegraph-mcp --features "ai-enhanced,faiss,ollama"
- Core implementation complete
- Testing and validation in progress
- Feedback welcome via GitHub issues
- Legacy orchestrator remains as stable fallback
The experimental feature is opt-in via build flag and does not affect existing functionality when disabled.
- Choose Your Setup
- Installation
- Configuration
- Usage
- Feature Flags Reference
- Performance
- Troubleshooting
- Advanced Features
Pick the setup that matches your needs:
Best for: Privacy-conscious users, offline work, no API costs
Providers:
- Embeddings: ONNX or Ollama
- LLM: Ollama (Qwen2.5-Coder, CodeLlama, etc.)
Pros: Free, Private, No internet required after setup
Cons: Slower, Requires local GPU/CPU resources
Jump to Local Setup Instructions
Best for: Mac users (Apple Silicon), best local performance
Providers:
- Embeddings: LM Studio (Jina embeddings)
- LLM: LM Studio (DeepSeek Coder, etc.)
Pros: 120 embeddings/sec, MLX + Flash Attention 2, Free
Cons: Mac only, Requires LM Studio app
Jump to LM Studio Setup Instructions
Best for: Production use, best quality, don't want to manage local models
Providers:
- Embeddings: Jina (10 million free tokens when you create an account)
- LLM: Anthropic Claude or OpenAI GPT-5-*
- Backend: SurrealDB graph database (free cloud instance up to 1 GB, or run it entirely locally)
Pros: Best quality, Fast, 1M context (sonnet[1m])
Cons: API costs, Requires internet, Data sent to cloud
Jump to Cloud Setup Instructions
Best for: Balancing cost and quality
Example combinations:
- Local embeddings (ONNX) + Cloud LLM (OpenAI, Claude, x.ai)
- LMStudio embeddings + Cloud LLM (OpenAI, Claude, x.ai)
- Jina AI embeddings + Local LLM (Ollama, LMStudio)
Jump to Hybrid Setup Instructions
# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# 2. Install FAISS (vector search library)
# macOS:
brew install faiss
# Ubuntu/Debian:
sudo apt-get install libfaiss-dev
# Arch Linux:
sudo pacman -S faiss
Step 1: Install Ollama
# macOS/Linux:
curl -fsSL https://ollama.com/install.sh | sh
# Or download from: https://ollama.com/download
# ONNX Runtime for local ONNX embeddings (macOS, via Homebrew)
brew install onnx-runtime
Step 2: Pull models
# Pull embedding model (uses the Hugging Face CLI)
hf download qdrant/all-minillm-onnx
# Pull LLM for code intelligence (optional)
ollama pull qwen2.5-coder:14b
Step 3: Build CodeGraph
cd codegraph-rust
# Build with ONNX embeddings and Ollama support
cargo build --release --features "onnx,ollama,faiss"
Step 4: Configure
Create ~/.codegraph/config.toml:
[embedding]
provider = "onnx" # or "ollama" if you prefer
model = "qdrant/all-minillm-onnx"
dimension = 384
[llm]
enabled = true
provider = "ollama"
model = "qwen2.5-coder:14b"
ollama_url = "http://localhost:11434"
Step 5: Index and run
# Index your project
./target/release/codegraph index /path/to/your/project
# Start MCP server
./target/release/codegraph start stdio
Done! Your local setup is ready.
Step 1: Install LM Studio
- Download from lmstudio.ai
- Install and launch the app
Step 2: Download models in LM Studio
- Embedding model: jinaai/jina-embeddings-v4
- LLM model (optional): lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF
Step 3: Start LM Studio server
- In LM Studio, go to "Local Server" tab
- Click "Start Server" (runs on http://localhost:1234)
Step 4: Build CodeGraph
cd codegraph-rust
# Build with OpenAI-compatible support (for LM Studio)
cargo build --release --features "openai-compatible,faiss"
Step 5: Configure
Create ~/.codegraph/config.toml:
[embedding]
provider = "lmstudio"
model = "jinaai/jina-embeddings-v4"
lmstudio_url = "http://localhost:1234"
dimension = 2048
[llm]
enabled = true
provider = "lmstudio"
model = "lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF"
lmstudio_url = "http://localhost:1234"
Step 6: Index and run
# Index your project
./target/release/codegraph index /path/to/your/project
# Start MCP server
./target/release/codegraph start stdio
Done! LM Studio setup complete.
Step 1: Get API keys
- Anthropic: console.anthropic.com (Claude 4.5 models, 1M/200k context)
- OpenAI: platform.openai.com (GPT-5 models, 400k/200k context)
- xAI: x.ai (Grok-4-fast with 2M context, $0.50-$1.50/M tokens)
- Jina AI: jina.ai (for SOTA embeddings and reranking)
- SurrealDB: https://www.surrealdb.com (graph database backend, local or cloud)
Step 2: Build CodeGraph with cloud features
cd codegraph-rust
# Build with all cloud providers
cargo build --release --features "anthropic,openai-llm,openai,faiss"
# Or with Jina AI cloud embeddings (Matryoshka dimensions + reranking)
cargo build --release --features "cloud-jina,anthropic,faiss"
# Or with SurrealDB HNSW cloud/local vector backend
cargo build --release --features "cloud-surrealdb,openai,faiss"
Step 3: Run setup wizard (easiest)
./target/release/codegraph-setup
The wizard will guide you through configuration.
Or manually configure ~/.codegraph/config.toml:
For Anthropic Claude:
[embedding]
provider = "jina" # or openai
model = "jina-embeddings-v4"
openai_api_key = "sk-..." # or set OPENAI_API_KEY env var
dimension = 2048
[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"
anthropic_api_key = "sk-ant-..." # or set ANTHROPIC_API_KEY env var
context_window = 200000
For OpenAI (with reasoning models):
[embedding]
provider = "jina" # or openai
model = "jina-embeddings-v4"
openai_api_key = "sk-..."
dimension = 2048
[llm]
enabled = true
provider = "openai"
model = "gpt-5-codex-mini"
openai_api_key = "sk-..."
max_completion_token = 128000
reasoning_effort = "medium" # reasoning models: "minimal", "medium", "high"
For Jina AI (cloud embeddings with reranking):
[embedding]
provider = "jina"
model = "jina-embeddings-v4"
jina_api_key = "jina_..." # or set JINA_API_KEY env var
# Matryoshka sizes 1024, 512, or 256 are also possible; adjust the HNSW vector index
# in schemas/*.surql to match the dimension of your embedding model
dimension = 2048
jina_enable_reranking = true # Optional two-stage retrieval
jina_reranking_model = "jina-reranker-v3"
[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"
anthropic_api_key = "sk-ant-..."
For xAI Grok (2M context window, $0.50-$1.50/M tokens):
[embedding]
provider = "openai" # or "jina"
model = "text-embedding-3-small"
openai_api_key = "sk-..."
dimension = 1536 # text-embedding-3-small outputs at most 1536 dimensions; 2048 applies to jina-embeddings-v4
[llm]
enabled = true
provider = "xai"
model = "grok-4-fast" # or "grok-4-turbo"
xai_api_key = "xai-..." # or set XAI_API_KEY env var
xai_base_url = "https://api.x.ai/v1" # default, can be omitted
reasoning_effort = "medium" # Options: "minimal", "medium", "high"
context_window = 2000000 # 2M tokens!
For SurrealDB HNSW (graph database backend with advanced features):
[embedding]
provider = "jina" # or "openai"
model = "jina-embeddings-v4"
openai_api_key = "sk-..."
dimension = 2048
[vector_store]
backend = "surrealdb" # Instead of "faiss"
surrealdb_url = "ws://localhost:8000" # or cloud instance
surrealdb_namespace = "codegraph"
surrealdb_database = "production"
[llm]
enabled = true
provider = "anthropic"
model = "claude-haiku"
Step 4: Index and run
# Index your project
./target/release/codegraph index /path/to/your/project
# Start MCP server
./target/release/codegraph start stdio
Done! Cloud setup complete.
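The Jina configuration above mentions Matryoshka dimensions (1024, 512, 256). The idea behind Matryoshka-style embeddings is that a prefix of the full vector is itself a usable embedding. The sketch below shows one common client-side way to shorten a vector: keep the first `dim` components and re-normalize to unit length. This README does not say whether CodeGraph or the Jina API performs this step for you, so treat `truncateEmbedding` as a hypothetical helper that only illustrates the concept.

```typescript
// Hypothetical helper: shorten a Matryoshka-style embedding to `dim` components
// and rescale it to unit length so cosine similarity stays meaningful.
function truncateEmbedding(vec: number[], dim: number): number[] {
  if (!Number.isInteger(dim) || dim <= 0 || dim > vec.length) {
    throw new RangeError(`dim must be an integer in 1..${vec.length}`);
  }
  const head = vec.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Toy 4-dimensional vector standing in for a 2048-dimensional embedding.
const fullEmbedding = [3, 4, 12, 84];
const shortEmbedding = truncateEmbedding(fullEmbedding, 2);
console.log(shortEmbedding.length); // 2
```

Whatever dimension you settle on, the vector index has to be defined with the same size, which is why the configuration comment points at the schemas/*.surql files.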
Mix local and cloud providers to balance cost and quality:
Example: Local embeddings + Cloud LLM
[embedding]
provider = "onnx" # Free, local
model = "sentence-transformers/all-MiniLM-L6-v2"
dimension = 384
[llm]
enabled = true
provider = "anthropic" # Best quality for analysis
model = "sonnet[1m]"
anthropic_api_key = "sk-ant-..."
Build with required features:
cargo build --release --features "onnx,anthropic,faiss"
Use the interactive wizard:
cargo build --release --bin codegraph-setup --features all-cloud-providers
./target/release/codegraph-setup
Configuration directory: ~/.codegraph/
All configuration files are stored in ~/.codegraph/ in TOML format.
Configuration is loaded from (in order):
1. ~/.codegraph/default.toml (base configuration)
2. ~/.codegraph/{environment}.toml (e.g., development.toml, production.toml)
3. ~/.codegraph/local.toml (local overrides, machine-specific)
4. ./config/ (fallback for backward compatibility)
5. Environment variables (CODEGRAPH__* prefix)
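To make the layering concrete, here is a minimal sketch of how such a load order is commonly resolved. It rests on two assumptions that this README does not confirm: that later sources override earlier ones, and that the double underscore in `CODEGRAPH__*` separates nested keys (so `CODEGRAPH__LLM__PROVIDER` would map to `llm.provider`). The function names are hypothetical, and all values are kept as strings for simplicity.

```typescript
type Config = { [key: string]: string | Config };

// Recursively merge `override` into a copy of `base`; override wins on conflicts.
function deepMerge(base: Config, override: Config): Config {
  const out: Config = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = out[key];
    out[key] =
      typeof value === "object" && typeof existing === "object"
        ? deepMerge(existing, value)
        : value;
  }
  return out;
}

// ASSUMPTION: CODEGRAPH__SECTION__KEY maps to config.section.key.
function envToConfig(env: Record<string, string | undefined>, prefix = "CODEGRAPH__"): Config {
  const out: Config = {};
  for (const [name, value] of Object.entries(env)) {
    if (!name.startsWith(prefix) || value === undefined) continue;
    const path = name.slice(prefix.length).toLowerCase().split("__");
    let node = out;
    for (const part of path.slice(0, -1)) {
      let next = node[part];
      if (typeof next !== "object") {
        next = {};
        node[part] = next;
      }
      node = next;
    }
    node[path[path.length - 1]] = value;
  }
  return out;
}

// Apply file layers in order, then environment variables last.
function loadLayers(layers: Config[], env: Record<string, string | undefined>): Config {
  return [...layers, envToConfig(env)].reduce<Config>(
    (acc, layer) => deepMerge(acc, layer),
    {},
  );
}

const mergedConfig = loadLayers(
  [
    { embedding: { provider: "onnx", dimension: "384" } }, // stands in for default.toml
    { embedding: { provider: "ollama" } },                 // stands in for local.toml
  ],
  { CODEGRAPH__LLM__PROVIDER: "anthropic" },
);
console.log(JSON.stringify(mergedConfig));
// {"embedding":{"provider":"ollama","dimension":"384"},"llm":{"provider":"anthropic"}}
```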
See Configuration Guide for complete documentation.
Full configuration example:
[embedding]
provider = "lmstudio" # or "onnx", "ollama", "openai"
model = "jinaai/jina-embeddings-v4"
dimension = 2048
batch_size = 64
[llm]
enabled = true
provider = "anthropic" # or "openai", "ollama", "lmstudio" or "xai"
model = "haiku"
anthropic_api_key = "sk-ant-..."
context_window = 200000
temperature = 0.1
max_completion_token = 4096
[performance]
num_threads = 0 # 0 = auto-detect
cache_size_mb = 512
max_concurrent_requests = 4
[logging]
level = "warn" # trace, debug, info, warn, error
format = "pretty" # pretty, json, compact
See .codegraph.toml.example for all options.
# Index a project
codegraph index -r /path/to/project
# Start MCP server (for Claude Desktop, LM Studio, etc.)
codegraph start stdio
# List available MCP tools
codegraph tools list
Note: HTTP transport is not yet implemented with the official rmcp SDK. Use STDIO transport for all MCP integrations.
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on Mac):
{
"mcpServers": {
"codegraph": {
"command": "/path/to/codegraph",
"args": ["start", "stdio"],
"env": {
"RUST_LOG": "warn"
}
}
}
}
- Start CodeGraph MCP server: codegraph start stdio
- In LM Studio, enable MCP support in settings
- CodeGraph tools will appear in LM Studio's tool palette
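Clients such as Claude Desktop and LM Studio handle the protocol for you, but it can help to see what travels over stdio. The sketch below builds an MCP `tools/call` request as a JSON-RPC 2.0 message and frames it as a single line, which is how the MCP stdio transport delimits messages. The tool name `vector_search` is one of the tools this README lists; the argument names `query` and `limit` are assumptions, so check the output of `codegraph tools list` for the real input schema.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport sends one JSON message per line, with no embedded newlines.
function frameMessage(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

// ASSUMPTION: `query` and `limit` are illustrative argument names.
const toolCallLine = frameMessage(
  buildToolCall(1, "vector_search", { query: "authentication", limit: 5 }),
);
process.stdout.write(toolCallLine);
```

A real session starts with an `initialize` handshake before any tool call, which this sketch leaves out.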
CodeGraph provides powerful code intelligence tools via the Model Context Protocol (MCP).
After indexing your codebase, AI agents can use these tools:
- enhanced_search: Semantic code search with AI insights (2-5s)
- pattern_detection: Analyze coding patterns and conventions (1-3s)
- vector_search: Fast similarity-based code search (0.5s)
- graph_neighbors: Find code dependencies for an element (0.3s)
- graph_traverse: Follow dependency chains through code (0.5-2s)
- codebase_qa: Ask questions with RAG [feature-gated] (5-30s)
- code_documentation: Generate AI docs [feature-gated] (10-45s)
# 1. Index your codebase
codegraph index /path/to/your/project
# 2. Start MCP server
codegraph start stdio
# 3. Use tools from your AI agent
enhanced_search("how does authentication work?")
graph_neighbors("node-uuid-from-search-results")
When building, include features for the providers you want to use:
| Feature | Providers Enabled | Use Case |
|---|---|---|
| onnx | ONNX embeddings | Local CPU/GPU embeddings |
| ollama | Ollama embeddings + LLM | Local models via Ollama |
| openai | OpenAI embeddings | Cloud embeddings (text-embedding-3-large/small) |
| openai-llm | OpenAI | Cloud LLM (gpt-5, gpt-5-codex, gpt-5-codex-mini) |
| anthropic | Anthropic Claude | Cloud LLM (Claude 4.5, Haiku 4.5) |
| openai-compatible | LMStudio, custom providers | OpenAI Responses API compatible |
| cloud-jina | Jina AI embeddings + reranking | Cloud embeddings with a free tier (SOTA, variable dimensions) |
| cloud-surrealdb | SurrealDB HNSW | Graph database backend, local or free cloud instance (up to 1 GB) |
| cloud | Jina AI + SurrealDB | All cloud vector & graph features |
| faiss | FAISS vector search | Local vector search graph backend (RocksDB persisted) |
| all-cloud-providers | All cloud LLM providers | Shortcut for Jina + Surreal + Anthropic + OpenAI |
# Local only (ONNX + Ollama)
cargo build --release --features "onnx,ollama,faiss"
# LM Studio
cargo build --release --features "openai-compatible,faiss"
# Cloud only (Anthropic + OpenAI)
cargo build --release --features "anthropic,openai-llm,openai,faiss"
# Jina AI cloud embeddings + local FAISS
cargo build --release --features "cloud-jina,faiss"
# SurrealDB cloud vector backend
cargo build --release --features "cloud-surrealdb,openai,faiss"
# Full cloud (Jina + SurrealDB + Anthropic)
cargo build --release --features "cloud,anthropic,faiss"
# Everything (local + cloud)
cargo build --release --features "all-cloud-providers,onnx,ollama,cloud,faiss"
| Operation | Performance | Notes |
|---|---|---|
| Embedding generation | 120 embeddings/sec | LM Studio with MLX |
| Vector search (local) | 2-5ms latency | FAISS with index caching |
| Vector search (cloud) | 2-5ms latency | SurrealDB HNSW |
| Jina AI embeddings | 50-150ms per query | Cloud API call overhead |
| Jina reranking | 80-200ms for top-K | Two-stage retrieval |
| Ollama embeddings | ~60 embeddings/sec | About half LM Studio speed |
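The reranking row refers to two-stage retrieval: a cheap vector similarity pass produces a shortlist, and a slower, more accurate scorer reorders only that shortlist. Below is a minimal sketch of the idea. It uses brute-force cosine similarity in place of FAISS or HNSW, and a stub keyword counter in place of a real reranking model; the `Doc` type and `twoStageSearch` function are hypothetical.

```typescript
interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function twoStageSearch(
  query: number[],
  docs: Doc[],
  candidates: number,
  topK: number,
  rerank: (doc: Doc) => number,
): Doc[] {
  // Stage 1: shortlist by vector similarity (brute force here, an index in practice).
  const shortlist = [...docs]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, candidates);
  // Stage 2: apply the expensive scorer only to the shortlist.
  return shortlist
    .map((doc) => ({ doc, score: rerank(doc) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((entry) => entry.doc);
}

const corpus: Doc[] = [
  { id: "a", text: "login handler", embedding: [1, 0] },
  { id: "b", text: "auth token auth", embedding: [0.9, 0.1] },
  { id: "c", text: "auth auth auth", embedding: [0, 1] },
];
// Stub reranker: counts occurrences of "auth" in the text.
const countAuth = (doc: Doc): number => doc.text.split("auth").length - 1;

const reranked = twoStageSearch([1, 0], corpus, 2, 1, countAuth);
console.log(reranked.map((d) => d.id)); // ["b"]
```

Document "c" would score highest with the reranker but never reaches it, because stage 1 filters it out. That trade-off is why the shortlist size matters: too small and good results are lost, too large and the reranking latency grows.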
| Optimization | Speedup | Memory Cost |
|---|---|---|
| FAISS index cache | 10-50× | 300-600 MB |
| Embedding cache | 10-100× | ~90 MB |
| Query cache | 100× | ~10 MB |
| Parallel search | 2-3× | Minimal |
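The caches in this table trade memory for speed by keeping recent results around. As an illustration of how a bounded query cache can work, here is a small least-recently-used (LRU) cache. The README does not state which eviction policy CodeGraph uses, so LRU is an assumption, and `LruCache` is a hypothetical class rather than part of the codebase.

```typescript
// Bounded cache that evicts the least recently used entry when full.
// Relies on Map preserving insertion order: the first key is the oldest.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }
}

const queryCache = new LruCache<string, string[]>(2);
queryCache.set("auth flow", ["auth.rs"]);
queryCache.set("db pool", ["pool.rs"]);
queryCache.get("auth flow");               // touch: "db pool" is now the oldest
queryCache.set("retry logic", ["retry.rs"]); // evicts "db pool"
console.log(queryCache.has("db pool"));    // false
```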
"Could not find library faiss"
# Install FAISS first
brew install faiss # macOS
sudo apt-get install libfaiss-dev # Ubuntu
"Feature X is not enabled"
- Make sure you included the feature flag when building
- Example: cargo build --release --features "anthropic,faiss"
"API key not found"
- Set environment variable: export ANTHROPIC_API_KEY="sk-ant-..."
- Or add to config file: anthropic_api_key = "sk-ant-..."
"Model not found"
- For Ollama: Run ollama pull <model-name> first
- For LM Studio: Download the model in LM Studio app
- For cloud: Check your model name matches available models
"Connection refused"
- LM Studio: Make sure the local server is running
- Ollama: Check Ollama is running with ollama list
- Cloud: Check your internet connection
- Check docs/CLOUD_PROVIDERS.md for detailed provider setup
- See LMSTUDIO_SETUP.md for LM Studio specifics
- Open an issue on GitHub with your error message
CodeGraph provides native Node.js bindings through NAPI-RS for seamless TypeScript/JavaScript integration:
Key Features:
- Native Performance: Direct Rust-to-Node.js bindings with zero serialization overhead
- Auto-Generated Types: TypeScript definitions generated directly from Rust code
- Async Runtime: Full tokio async support integrated with Node.js event loop
- Hot-Reload Config: Update configuration without restarting your Node.js process
- Dual-Mode Search: Automatic routing between local FAISS and cloud SurrealDB
Option 1: Direct Install (Recommended)
# Build the addon
cd crates/codegraph-napi
npm install
npm run build
# Install in your project
cd /path/to/your-project
npm install /path/to/codegraph-rust/crates/codegraph-napi
Option 2: Pack and Install
# Build and pack
cd crates/codegraph-napi
npm install
npm run build
npm pack # Creates codegraph-napi-1.0.0.tgz
# Install in your project
cd /path/to/your-project
npm install /path/to/codegraph-rust/crates/codegraph-napi/codegraph-napi-1.0.0.tgz
Semantic Search:
import { semanticSearch } from 'codegraph-napi';
const results = await semanticSearch('find authentication code', {
limit: 10,
useCloud: true, // Use cloud search with automatic fallback
reranking: true // Enable Jina reranking (if configured)
});
console.log(`Found ${results.totalCount} results in ${results.searchTimeMs}ms`);
console.log(`Search mode: ${results.modeUsed}`); // "local" or "cloud"
Configuration Management:
import { getCloudConfig, reloadConfig } from 'codegraph-napi';
// Check cloud feature availability
const config = await getCloudConfig();
console.log('Jina AI enabled:', config.jina_enabled);
console.log('SurrealDB enabled:', config.surrealdb_enabled);
// Hot-reload configuration without restart
await reloadConfig();
Embedding Operations:
import { getEmbeddingStats, countTokens } from 'codegraph-napi';
// Get embedding provider stats
const stats = await getEmbeddingStats();
console.log(`Provider: ${stats.provider}, Dimension: ${stats.dimension}`);
// Count tokens for cost estimation (Jina AI)
const tokens = await countTokens("query text");
console.log(`Token count: ${tokens}`);
Graph Navigation:
import { getNeighbors, getGraphStats } from 'codegraph-napi';
// Get connected nodes
const neighbors = await getNeighbors(nodeId);
// Get graph statistics
const stats = await getGraphStats();
console.log(`Nodes: ${stats.node_count}, Edges: ${stats.edge_count}`);
Feature flags for selective compilation:
# Local-only (FAISS, no cloud)
npm run build # Uses default = ["local"]
# Cloud-only (no FAISS)
npm run build -- --features cloud
# Full build (local + cloud)
npm run build -- --features full
See NAPI README for complete documentation.
We welcome contributions!
# Format code
cargo fmt --all
# Run linter
cargo clippy --workspace --all-targets
# Run tests
cargo test --workspace
Open an issue to discuss large changes before starting.
Dual-licensed under MIT and Apache 2.0. See LICENSE-MIT and LICENSE-APACHE for details.
- NAPI Bindings Guide - Complete TypeScript integration documentation
- Cloud Providers Guide - Detailed cloud provider setup
- Configuration Reference - All configuration options
- Changelog - Version history and release notes
- Legacy Docs - Historical experiments and architecture notes