CodeXRay

Code search and knowledge engine for the AI era: hybrid semantic + full-text search, real-time indexing, and a call graph plus code, commit, and knowledge vectors, unified in one native MCP server.

Built natively for Claude Code and Codex CLI: no daemon, zero configuration overhead.

📖 Chinese documentation (README.zh.md)

Highlights

🧠 Hybrid Search Engine — Semantic + Full-Text Dual Channel

Goes beyond keyword matching. Dense vector search captures code intent ("login logic" → authenticateUser), while BM25 full-text search pins down exact matches. Results are fused with Reciprocal Rank Fusion (RRF) and re-ranked by a Cross-Encoder for precision. When the embedding API is unavailable, search falls back to graph-based search instead of failing.
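To see why BM25 pins down exact identifier matches, here is a minimal standalone sketch of Okapi BM25 scoring. It is not Tantivy's implementation: the tokenizer (a lowercase split that keeps `_` inside identifiers) and the common default parameters k1 = 1.2 and b = 0.75 are assumptions for illustration.

```rust
use std::collections::HashMap;

// Common Okapi BM25 defaults (assumed; Tantivy's settings may differ).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Naive tokenizer (assumption): lowercase, split on anything that is not
/// alphanumeric or '_', so identifiers like `authenticate_user` stay whole.
fn tokenize(s: &str) -> Vec<String> {
    s.split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Returns one BM25 score per document in `docs` for the given `query`.
pub fn bm25_scores(docs: &[&str], query: &str) -> Vec<f64> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|&d| tokenize(d)).collect();
    let n = tokenized.len() as f64;
    let avgdl = tokenized.iter().map(Vec::len).sum::<usize>() as f64 / n.max(1.0);

    // Document frequency: how many documents contain each term.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for toks in &tokenized {
        let mut uniq: Vec<&str> = toks.iter().map(String::as_str).collect();
        uniq.sort_unstable();
        uniq.dedup();
        for t in uniq {
            *df.entry(t).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|toks| {
            let dl = toks.len() as f64;
            query_terms
                .iter()
                .map(|q| {
                    let tf = toks.iter().filter(|t| *t == q).count() as f64;
                    if tf == 0.0 {
                        return 0.0;
                    }
                    let nq = df.get(q.as_str()).copied().unwrap_or(0) as f64;
                    // Lucene-style IDF, which never goes negative.
                    let idf = ((n - nq + 0.5) / (nq + 0.5) + 1.0).ln();
                    // Term-frequency saturation (K1) and length normalisation (B).
                    idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * dl / avgdl))
                })
                .sum::<f64>()
        })
        .collect()
}
```

A query such as `authenticate_user` scores only snippets containing that exact token; a paraphrase like "login logic" scores zero against code that never uses those words, which is the gap dense search fills.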

🔗 4D Knowledge Graph

Call graph + code vectors + commit vectors + knowledge vectors: four dimensions of codebase awareness. Tree-sitter parses 7 languages into ASTs, from which CodeXRay builds function/class/method relationships to support queries like:

- Who calls this function?
- What does this function depend on?
- Find code by describing what it does
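The first two queries are plain graph traversals. The sketch below uses a `HashMap` of caller/callee sets as a stand-in for the petgraph-based PetCodeGraph; the `CallGraph` type and its methods are hypothetical and only show the shape of the lookups.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical call graph: one `caller -> callee` edge per call site found in the AST.
#[derive(Default)]
pub struct CallGraph {
    callees: HashMap<String, BTreeSet<String>>,
    callers: HashMap<String, BTreeSet<String>>,
}

impl CallGraph {
    /// Records a call edge in both directions so either query is a single lookup.
    pub fn add_call(&mut self, caller: &str, callee: &str) {
        self.callees.entry(caller.to_string()).or_default().insert(callee.to_string());
        self.callers.entry(callee.to_string()).or_default().insert(caller.to_string());
    }

    /// "Who calls this function?" (sorted, direct callers only)
    pub fn callers_of(&self, f: &str) -> Vec<String> {
        self.callers.get(f).map(|s| s.iter().cloned().collect::<Vec<_>>()).unwrap_or_default()
    }

    /// "What does this function depend on?" (sorted, direct callees only)
    pub fn callees_of(&self, f: &str) -> Vec<String> {
        self.callees.get(f).map(|s| s.iter().cloned().collect::<Vec<_>>()).unwrap_or_default()
    }

    /// Everything `f` reaches through calls, via depth-first search.
    pub fn transitive_callees(&self, f: &str) -> BTreeSet<String> {
        let mut seen = BTreeSet::new();
        let mut stack = vec![f.to_string()];
        while let Some(cur) = stack.pop() {
            for next in self.callees_of(&cur) {
                // `insert` returns false for already-visited nodes, which stops cycles.
                if seen.insert(next.clone()) {
                    stack.push(next);
                }
            }
        }
        seen
    }
}
```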

⚡ Real-time Incremental Indexing

The first run performs a full build; after that, only files whose MD5 hash has changed are re-processed. Indexing runs automatically when the MCP server starts, and file changes are picked up while it runs. Orphaned embeddings are cleaned up automatically, so the index does not bloat over time.
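The change detection can be sketched as a diff of two `path → hash` maps. This assumes `file_hashes.json` holds such a map (its real format is not documented here) and leaves MD5 hashing out so the example needs only the Rust standard library; `IndexPlan` and `plan_incremental` are hypothetical names.

```rust
use std::collections::HashMap;

/// Work for one incremental run (hypothetical structure).
#[derive(Debug, Default, PartialEq)]
pub struct IndexPlan {
    /// New or modified files: re-parse and re-embed (their old embeddings become orphans).
    pub to_index: Vec<String>,
    /// Deleted files: drop their graph nodes and orphaned embeddings.
    pub to_purge: Vec<String>,
}

/// Compares the previous and current `path -> content hash` maps.
pub fn plan_incremental(old: &HashMap<String, String>, new: &HashMap<String, String>) -> IndexPlan {
    let mut plan = IndexPlan::default();
    for (path, hash) in new {
        // Missing in `old` (new file) or different hash (modified file).
        if old.get(path) != Some(hash) {
            plan.to_index.push(path.clone());
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            plan.to_purge.push(path.clone());
        }
    }
    // HashMap iteration order is unspecified; sort for deterministic output.
    plan.to_index.sort();
    plan.to_purge.sort();
    plan
}
```

Unchanged files never reach the parser or the embedding API, which is what keeps re-runs cheap.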

🔌 Native MCP, Local-First

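Built specifically for the MCP stdio transport used by Claude Code and Codex CLI. `codexray install` registers the MCP server automatically, with no manual configuration and no persistent daemon. The server starts and exits with Claude Code and leaves nothing running in the background. The index lives on your machine and no SaaS account is required; code leaves the machine only through the embedding and reranker API calls you configure.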

Quick Start

Recommended — one curl command

curl -fsSL https://raw.githubusercontent.com/iohub/codexray/main/install.sh | sh

The script auto-detects OS, architecture, and libc, downloads the matching release, installs it, and registers the MCP server. Restart Claude Code afterwards and you're done.

On first run, `codexray install` launches an interactive setup wizard for the embedding API. Graph search works without any configuration.

Manual download (Linux musl example)

curl -L -o codexray.tar.gz https://github.com/iohub/codexray/releases/latest/download/codexray-linux-x64-musl.tar.gz
tar -xzf codexray.tar.gz
./codexray install && rm codexray.tar.gz

For other platforms, replace linux-x64-musl in the file name with darwin-arm64, darwin-x64, or linux-x64 (see the latest release).
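The platform detection the installer performs can be sketched as a lookup from OS and architecture to the asset names listed above. This is a hypothetical Rust rendering, not the logic of `install.sh`; in particular, choosing the musl build only when the target libc is musl is an assumption.

```rust
// Download prefix taken from the manual-download command above.
const RELEASE_BASE: &str = "https://github.com/iohub/codexray/releases/latest/download";

/// Maps (OS, arch, libc) to a release asset name, or None if no prebuilt binary is listed.
pub fn release_asset(os: &str, arch: &str, musl: bool) -> Option<String> {
    let platform = match (os, arch) {
        ("macos", "aarch64") => "darwin-arm64",
        ("macos", "x86_64") => "darwin-x64",
        ("linux", "x86_64") if musl => "linux-x64-musl",
        ("linux", "x86_64") => "linux-x64",
        _ => return None,
    };
    Some(format!("codexray-{platform}.tar.gz"))
}

/// Full download URL for a platform.
pub fn release_url(os: &str, arch: &str, musl: bool) -> Option<String> {
    release_asset(os, arch, musl).map(|asset| format!("{RELEASE_BASE}/{asset}"))
}

/// Uses the compile-time target of the running binary
/// (`std::env::consts::OS` is "linux"/"macos", `ARCH` is "x86_64"/"aarch64").
pub fn current_target_url() -> Option<String> {
    release_url(std::env::consts::OS, std::env::consts::ARCH, cfg!(target_env = "musl"))
}
```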

From source

git clone https://github.com/iohub/codexray.git && cd codexray
cargo build --release --manifest-path rust-core/Cargo.toml && ./rust-core/target/release/codexray install

How It Works

Index Building

Source files
  → Tree-sitter AST parse (7 languages)
  → Extract functions / classes / methods
  → Build call graph (PetCodeGraph)
  → Batch embed via API (SQLite cache)
  → Store vectors in LanceDB
  → Build BM25 index in Tantivy
  → Save to ~/.codexray/<project_hash>/

Incremental: the first run is a full build; subsequent runs compare MD5 hashes and re-process only changed files.
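The "Batch embed via API (SQLite cache)" step can be sketched as: look up each chunk in a cache, collect the misses, and send them to the API in fixed-size batches. Here a `HashMap` keyed by chunk text stands in for the SQLite cache, the `embed_batch` closure stands in for one embedding API request, and the batch size is a caller-chosen assumption; `embed_with_cache` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Returns one vector per chunk, calling `embed_batch` only for uncached texts.
/// Panics if `embed_batch` returns fewer vectors than texts it was given.
pub fn embed_with_cache<F>(
    chunks: &[String],
    cache: &mut HashMap<String, Vec<f32>>,
    batch_size: usize,
    mut embed_batch: F,
) -> Vec<Vec<f32>>
where
    F: FnMut(&[String]) -> Vec<Vec<f32>>,
{
    // Unique cache misses, in first-seen order.
    let mut misses: Vec<String> = Vec::new();
    for c in chunks {
        if !cache.contains_key(c) && !misses.contains(c) {
            misses.push(c.clone());
        }
    }
    // One API request per batch of misses; results go straight into the cache.
    for batch in misses.chunks(batch_size.max(1)) {
        let vectors = embed_batch(batch);
        for (text, v) in batch.iter().zip(vectors) {
            cache.insert(text.clone(), v);
        }
    }
    // Every chunk is now cached.
    chunks.iter().map(|c| cache[c].clone()).collect()
}
```

On a second run over the same chunks, every lookup hits the cache and no request is sent.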

Hybrid Search Pipeline (`codexray search`)

                       ┌───────────────────┐
User query ───────────→│  Embedding Model  │──→ Query vector
                       └─────────┬─────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
  ┌────────────────┐     ┌────────────────┐     ┌────────────────┐
  │  Dense Search  │     │ Sparse Search  │     │  Graph Search  │
  │ (LanceDB ANN)  │     │ (Tantivy BM25) │     │ (PetCodeGraph) │
  └───────┬────────┘     └───────┬────────┘     └───────┬────────┘
          │                      │                      │
          └──────────────────────┼──────────────────────┘
                                 ▼
                      ┌─────────────────────┐
                      │     RRF Fusion      │  ← Reciprocal Rank Fusion
                      │ (Top-20 candidates) │
                      └──────────┬──────────┘
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │      Reranker       │  ← Cross-Encoder fine re-ranking
                      │  (Qwen3-Reranker)   │    scores each (query, code) pair
                      └──────────┬──────────┘
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │    Final Results    │  ← Top-5 (or Top-N)
                      └─────────────────────┘
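RRF merges ranked lists without comparing their raw scores: each result earns 1 / (k + rank) from every list it appears in, with 1-based ranks. A minimal sketch, where `k` and `top_k` correspond to the `rrf_k` (60) and `rrf_top_k` (20) keys in `config.json`; the function itself is illustrative, not CodeXRay's code.

```rust
use std::collections::HashMap;

/// Fuses several ranked lists of result ids (best first) with Reciprocal Rank Fusion.
pub fn rrf_fuse(lists: &[Vec<&str>], k: f64, top_k: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in lists {
        for (i, &id) in list.iter().enumerate() {
            // Rank is 1-based, so the top hit contributes 1 / (k + 1).
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> =
        scores.into_iter().map(|(id, s)| (id.to_string(), s)).collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

With k = 60, a snippet ranked second by both BM25 and dense search (2/62 ≈ 0.032) outranks one ranked first by only one channel (1/61 ≈ 0.016), so agreement between channels is rewarded.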

Stage	Technology	Role
Dense Search	LanceDB + Embedding Model	Semantic vector similarity
Sparse Search	Tantivy BM25	Keyword & token matching
RRF Fusion	Reciprocal Rank Fusion	Merge heterogeneous scores fairly
Reranker	Cross-Encoder (Qwen3-Reranker-4B)	Full-interaction precision scoring
Fallback	PetCodeGraph	Graph-based name search (no API needed)
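Dense search ranks stored code vectors by similarity to the query vector. LanceDB does this with an approximate nearest-neighbour index; the exact brute-force scan below is only a sketch of what it computes, and it assumes cosine similarity as the metric (the metric CodeXRay configures is not stated here). `dense_top_k` is a hypothetical name whose `top_k` plays the role of `vector_top_k`.

```rust
/// Cosine similarity; both vectors must have the configured dimension
/// (2560 for Qwen3-Embedding-4B in the sample config).
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Indices of the `top_k` stored vectors most similar to `query`, best first.
/// Exact O(n) scan; an ANN index trades a little recall for far less work.
pub fn dense_top_k(query: &[f32], vectors: &[Vec<f32>], top_k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> =
        vectors.iter().enumerate().map(|(i, v)| (i, cosine(query, v))).collect();
    // Assumes finite scores (no NaN), otherwise `unwrap` panics.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(top_k).map(|(i, _)| i).collect()
}
```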

If the embedding or reranker API is unavailable, the pipeline degrades gracefully to graph-based name search and BM25-only mode.
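The degradation order can be summarised as a small decision function. This is a hypothetical sketch of the behaviour described above; the `SearchMode` names and the exact conditions are assumptions, not CodeXRay's actual control flow.

```rust
/// Which search channels run for a query (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum SearchMode {
    /// Dense (+ BM25 when enabled) fused with RRF, then reranked.
    HybridReranked,
    /// Same fusion, but the reranker is unavailable so RRF order is final.
    Hybrid,
    /// Embedding API unavailable: BM25 plus graph-based name search.
    Bm25AndGraph,
    /// No embeddings and BM25 disabled: name matching in the call graph only.
    GraphOnly,
}

pub fn select_mode(embedding_ok: bool, reranker_ok: bool, bm25_enabled: bool) -> SearchMode {
    match (embedding_ok, reranker_ok, bm25_enabled) {
        (true, true, _) => SearchMode::HybridReranked,
        (true, false, _) => SearchMode::Hybrid,
        (false, _, true) => SearchMode::Bm25AndGraph,
        (false, _, false) => SearchMode::GraphOnly,
    }
}
```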

Auto-Indexing Modes

Mode	When	Trigger
MCP server	On startup + file changes	`codexray install` + restart Claude Code

The MCP server automatically indexes on startup, watches file changes during runtime, and injects CLAUDE.md for tool discovery.
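File watching usually needs some debouncing, because a single save or `git checkout` can emit a burst of events. The README does not say whether or how CodeXRay coalesces events, so the following is an assumed approach: group events into batches separated by a quiet period and de-duplicate paths within each batch. `coalesce` is a hypothetical helper.

```rust
use std::collections::BTreeSet;
use std::time::Duration;

/// Groups `(time since watcher start, path)` events, sorted by time, into
/// re-index batches. A new batch starts when the gap to the previous event
/// exceeds `quiet`; repeated paths inside a batch collapse into one job.
pub fn coalesce(events: &[(Duration, &str)], quiet: Duration) -> Vec<BTreeSet<String>> {
    let mut batches: Vec<BTreeSet<String>> = Vec::new();
    let mut last: Option<Duration> = None;
    for &(t, path) in events {
        let new_batch = match last {
            Some(prev) => t.saturating_sub(prev) > quiet,
            None => true,
        };
        if new_batch {
            batches.push(BTreeSet::new());
        }
        // A batch always exists here: the first event opens one.
        batches.last_mut().unwrap().insert(path.to_string());
        last = Some(t);
    }
    batches
}
```

Each batch can then go through the same MD5 diff as a normal incremental run.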

Storage

Config: ~/.codexray/config.json (global, shared across all projects)
Index: ~/.codexray/<md5(project_root)>/
- project.json — Project metadata
- graph.bin — Serialized call graph
- embeddings.lance/ — LanceDB vector data
- tantivy_bm25/ — BM25 full-text index
- file_hashes.json — MD5 incremental tracking
- embedding_hashes.json — Embedding incremental tracking

No daemon, no HTTP server. Every CLI command is a standalone process.

Supported Languages

Language	Functions	Structs/Classes	Call Graph
Rust	✅	✅	✅
Python	✅	✅	✅
JavaScript	✅	✅	✅
TypeScript	✅	✅	✅
Go	✅	✅	✅
C/C++	✅	✅	✅
Java	✅	✅	✅

Configuration

~/.codexray/config.json:

{
  "embedding": {
    "provider": "openai-compatible",
    "model": "Qwen/Qwen3-Embedding-4B",
    "api_token": "sk-...",
    "api_base_url": "https://api.siliconflow.cn/v1",
    "dimensions": 2560
  },
  "index": {
    "min_code_block_length": 16,
    "enable_reranker": true,
    "hybrid": {
      "enable_bm25": true,
      "bm25_top_k": 100,
      "vector_top_k": 100,
      "rrf_k": 60,
      "rrf_top_k": 20,
      "short_code_threshold": 30,
      "short_code_penalty": 0.5
    },
    "reranker": {
      "enabled": true,
      "model": "Qwen/Qwen3-Reranker-4B",
      "api_token": "sk-...",
      "api_base_url": "https://api.siliconflow.cn/v1/rerank",
      "top_n": 5,
      "candidate_multiplier": 5,
      "timeout_secs": 60
    }
  },
  "installed_hooks": {}
}
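For illustration, the `index.hybrid` section maps naturally onto a typed struct whose defaults are the sample values above. The struct is hypothetical, and so is the reading of `short_code_threshold` / `short_code_penalty` as "multiply the score of snippets shorter than 30 characters by 0.5"; the README does not define either key's unit or effect. Parsing the actual file would normally use a JSON library such as serde_json, omitted here to stay std-only.

```rust
/// Hypothetical mirror of `index.hybrid`, defaulting to the sample values.
#[derive(Debug, Clone, PartialEq)]
pub struct HybridConfig {
    pub enable_bm25: bool,
    pub bm25_top_k: usize,
    pub vector_top_k: usize,
    pub rrf_k: f64,
    pub rrf_top_k: usize,
    pub short_code_threshold: usize,
    pub short_code_penalty: f64,
}

impl Default for HybridConfig {
    fn default() -> Self {
        Self {
            enable_bm25: true,
            bm25_top_k: 100,
            vector_top_k: 100,
            rrf_k: 60.0,
            rrf_top_k: 20,
            short_code_threshold: 30,
            short_code_penalty: 0.5,
        }
    }
}

impl HybridConfig {
    /// Assumed semantics: down-weight snippets shorter than the threshold
    /// (counted in characters) so trivial one-liners do not crowd out real matches.
    pub fn adjust_score(&self, score: f64, code: &str) -> f64 {
        if code.chars().count() < self.short_code_threshold {
            score * self.short_code_penalty
        } else {
            score
        }
    }
}
```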

Model Roles

Model	Role	When
`Qwen/Qwen3-Embedding-4B`	Converts code → vectors for dense search	Index building
`Qwen/Qwen3-Reranker-4B`	Scores (query, code) pairs for precision	Search time

Configure these via the interactive wizard on first run, or create the file manually. If the embedding API is unavailable, graph-based search still works.
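At search time the reranker re-scores each fused candidate together with the query and keeps the best `top_n` (5 in the sample config). In this sketch the `score_pair` closure stands in for the Cross-Encoder request to the rerank endpoint, whose request and response format is not covered here; `rerank` is a hypothetical name.

```rust
/// Scores each (query, candidate) pair and returns the `top_n` best candidates.
pub fn rerank<'a, F>(query: &str, candidates: &[&'a str], top_n: usize, score_pair: F) -> Vec<&'a str>
where
    F: Fn(&str, &str) -> f64,
{
    let mut scored: Vec<(&'a str, f64)> =
        candidates.iter().map(|&c| (c, score_pair(query, c))).collect();
    // Stable sort: equally scored candidates keep their RRF order.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(top_n).map(|(c, _)| c).collect()
}
```

Because a cross-encoder reads the query and the code together, each call is far more expensive than a vector lookup, which is why it only runs on the short RRF candidate list.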

Development

cd rust-core

# Build
cargo build

# Build release
cargo build --release

# Run tests
cargo test

# Run specific test
cargo test test_build_graph_functionality -- --nocapture

License

MIT

Built with: Tree-sitter · Petgraph · LanceDB · Tantivy · Tokio · Clap · Axum
