VectorCode — Semantic Code Search MCP Server

Target audience: SDD orchestrator + subagents. This document is the single source of truth for building the entire tool.

1. Project Identity

Field	Value
Name	`vectorcode`
Language	Rust (2021 edition, MSRV 1.75+)
Binary	Single statically-linked binary
Protocol	MCP (Model Context Protocol) over stdio
Storage	SQLite + `sqlite-vec` extension (single file)
License	MIT
Platforms	macOS (arm64, x86_64), Linux (x86_64, arm64), Windows (x86_64)

2. Problem Statement

AI coding agents (OpenCode, Claude Code, Cursor, Gemini CLI) navigate codebases using two strategies:

Exact match — grep, ripgrep, literal string search
Structural — CodeGraph provides symbol-level knowledge graphs (callers, callees, impact)

Neither supports semantic search: finding code by concept when the developer doesn't know the exact symbol name, pattern, or terminology used in the codebase.

Example queries that cannot be answered today:

"code that handles payment retries"
"where do we validate user permissions"
"functions similar to createUser"
"error recovery logic"

VectorCode fills this gap by embedding code chunks as vectors and ranking them by cosine similarity against the embedding of a natural-language query.

3. Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     vectorcode (Rust binary)                │
│                                                             │
│  ┌──────────┐   ┌──────────────┐   ┌─────────────────────┐ │
│  │ CLI      │   │ MCP Server   │   │ File Watcher        │ │
│  │ (clap)   │   │ (stdio JSON- │   │ (notify crate,      │ │
│  │          │   │  RPC)        │   │  debounced)          │ │
│  └────┬─────┘   └──────┬───────┘   └──────────┬──────────┘ │
│       │                │                       │            │
│       └────────┬───────┴───────────────────────┘            │
│                │                                            │
│       ┌────────▼────────┐                                   │
│       │   Core Engine   │                                   │
│       │                 │                                   │
│       │  ┌───────────┐  │                                   │
│       │  │ Chunker   │  │  Tree-sitter AST parsing          │
│       │  └─────┬─────┘  │                                   │
│       │        │        │                                   │
│       │  ┌─────▼─────┐  │                                   │
│       │  │ Embedder  │  │  Provider trait (ONNX/Gemini/     │
│       │  │ (trait)   │  │  Ollama/OpenAI)                   │
│       │  └─────┬─────┘  │                                   │
│       │        │        │                                   │
│       │  ┌─────▼─────┐  │                                   │
│       │  │ Store     │  │  SQLite + sqlite-vec              │
│       │  │           │  │  (.vectorcode/index.db)           │
│       │  └───────────┘  │                                   │
│       └─────────────────┘                                   │
└─────────────────────────────────────────────────────────────┘

4. Directory Structure (Project Layout)

vectorcode/
├── Cargo.toml
├── Cargo.lock
├── README.md
├── LICENSE
├── install.sh                    # macOS/Linux installer
├── install.ps1                   # Windows installer
├── build.rs                      # Build script (ONNX model bundling)
│
├── models/                       # Embedded ONNX models
│   └── minilm-l6-v2-q8/
│       ├── model.onnx            # INT8 quantized (~23MB)
│       ├── tokenizer.json        # HuggingFace tokenizer
│       └── config.json           # Model metadata
│
├── grammars/                     # Tree-sitter grammar .so/.dylib (built at compile time)
│
├── skills/                       # Distributable Skill files
│   └── semantic-search/
│       └── SKILL.md
│
├── src/
│   ├── main.rs                   # Entry point, CLI dispatch
│   ├── cli/
│   │   ├── mod.rs
│   │   ├── init.rs               # `vectorcode init`
│   │   ├── index.rs              # `vectorcode index`
│   │   ├── search.rs             # `vectorcode search`
│   │   ├── status.rs             # `vectorcode status`
│   │   ├── serve.rs              # `vectorcode serve --mcp`
│   │   ├── install.rs            # `vectorcode install`
│   │   └── upgrade.rs            # `vectorcode upgrade`
│   │
│   ├── mcp/
│   │   ├── mod.rs                # MCP server (stdio transport)
│   │   ├── transport.rs          # JSON-RPC stdio reader/writer
│   │   ├── handler.rs            # Tool dispatch
│   │   └── schema.rs             # Tool definitions (JSON Schema)
│   │
│   ├── engine/
│   │   ├── mod.rs
│   │   ├── chunker.rs            # AST-aware chunking logic
│   │   ├── languages.rs          # Tree-sitter language registry
│   │   ├── indexer.rs            # Orchestrates chunk → embed → store
│   │   └── searcher.rs           # Query embedding + similarity search
│   │
│   ├── embedder/
│   │   ├── mod.rs                # Embedder trait definition
│   │   ├── onnx.rs               # ONNX Runtime provider (bundled model)
│   │   ├── gemini.rs             # Google Gemini API provider
│   │   ├── ollama.rs             # Ollama local API provider
│   │   └── openai.rs             # OpenAI API provider
│   │
│   ├── store/
│   │   ├── mod.rs
│   │   ├── db.rs                 # SQLite connection, migrations
│   │   ├── chunks.rs             # Chunk CRUD operations
│   │   └── vectors.rs            # sqlite-vec operations
│   │
│   ├── watcher/
│   │   ├── mod.rs                # File watcher with debounce
│   │   └── gitignore.rs          # .gitignore-aware filtering
│   │
│   └── config/
│       ├── mod.rs                # Configuration loading
│       └── schema.rs             # Config struct definitions
│
└── tests/
    ├── integration/
    │   ├── indexing_test.rs
    │   ├── search_test.rs
    │   └── mcp_test.rs
    └── fixtures/
        ├── sample_ts/            # TypeScript sample project
        ├── sample_py/            # Python sample project
        └── sample_rs/            # Rust sample project

5. Data Models

5.1 Chunk

A chunk is the atomic unit of indexed code. Each chunk maps to one semantically meaningful block of source code.

pub struct Chunk {
    /// Deterministic ID: blake3(file_path + ":" + byte_start + ":" + byte_end)
    pub id: String,

    /// Absolute path to the source file
    pub file_path: String,

    /// Line range in the source file (1-indexed, inclusive)
    pub start_line: u32,
    pub end_line: u32,

    /// Byte offset range in the source file (0-indexed)
    pub byte_start: u32,
    pub byte_end: u32,

    /// Symbol name if available (e.g., "UserService.authenticate")
    pub symbol: Option<String>,

    /// AST node kind (e.g., "function_declaration", "class_declaration", "impl_item")
    pub kind: String,

    /// The source code content of this chunk
    pub content: String,

    /// Parent context for retrieval enrichment
    /// e.g., "class UserService" or "mod auth::handlers"
    pub parent_context: Option<String>,

    /// Language identifier (e.g., "typescript", "python", "rust")
    pub language: String,

    /// File modification time at indexing (Unix timestamp seconds)
    pub file_mtime: i64,

    /// Content hash for change detection: blake3(content)
    pub content_hash: String,
}

5.2 Index Metadata

pub struct IndexMeta {
    /// Embedding provider used to create this index
    pub provider: String,          // "onnx" | "gemini" | "ollama" | "openai"

    /// Specific model identifier
    pub model: String,             // e.g., "all-MiniLM-L6-v2", "gemini-embedding-001"

    /// Vector dimensions (FIXED at index creation time)
    pub dimensions: u32,           // e.g., 384, 768, 3072

    /// Timestamp of index creation
    pub created_at: String,        // ISO 8601

    /// Timestamp of last completed sync
    pub last_sync_at: Option<String>,

    /// Total files indexed
    pub files_indexed: u32,

    /// Total chunks stored
    pub chunks_stored: u32,

    /// VectorCode version that created this index
    pub vectorcode_version: String,
}

5.3 Search Result

pub struct SearchResult {
    /// Chunk metadata
    pub file_path: String,
    pub start_line: u32,
    pub end_line: u32,
    pub symbol: Option<String>,
    pub kind: String,
    pub language: String,
    pub parent_context: Option<String>,

    /// The source code content
    pub content: String,

    /// Similarity score: 1.0 - cosine distance (higher = more relevant;
    /// typically 0.0 to 1.0, negative only for vectors pointing in opposing directions)
    pub score: f32,
}

6. SQLite Schema

All data lives in a single file: .vectorcode/index.db

-- Index metadata (key/value pairs, one row per IndexMeta field)
CREATE TABLE meta (
    key   TEXT PRIMARY KEY,
    value TEXT NOT NULL
);

-- Chunk metadata
CREATE TABLE chunks (
    id             TEXT PRIMARY KEY,
    file_path      TEXT NOT NULL,
    start_line     INTEGER NOT NULL,
    end_line       INTEGER NOT NULL,
    byte_start     INTEGER NOT NULL,
    byte_end       INTEGER NOT NULL,
    symbol         TEXT,
    kind           TEXT NOT NULL,
    content        TEXT NOT NULL,
    parent_context TEXT,
    language       TEXT NOT NULL,
    file_mtime     INTEGER NOT NULL,
    content_hash   TEXT NOT NULL
);

CREATE INDEX idx_chunks_file_path ON chunks(file_path);
CREATE INDEX idx_chunks_symbol ON chunks(symbol) WHERE symbol IS NOT NULL;
CREATE INDEX idx_chunks_language ON chunks(language);
CREATE INDEX idx_chunks_content_hash ON chunks(content_hash);

-- Vector storage (sqlite-vec virtual table)
-- Dimensions are set at creation time based on the embedding provider.
-- The placeholder {DIMS} MUST be replaced with the actual integer value
-- during `vectorcode init` (e.g., 384 for ONNX MiniLM, 768 for Gemini).
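-- distance_metric=cosine is needed because the query pipeline computes
-- score = 1.0 - distance; without it, vec0 reports L2 distance.
CREATE VIRTUAL TABLE vec_chunks USING vec0(
    chunk_id TEXT PRIMARY KEY,
    embedding float[{DIMS}] distance_metric=cosine
);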

-- File tracking for incremental sync
CREATE TABLE files (
    path       TEXT PRIMARY KEY,
    mtime      INTEGER NOT NULL,
    size       INTEGER NOT NULL,
    hash       TEXT NOT NULL,
    indexed_at INTEGER NOT NULL
);
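
The `{DIMS}` substitution can be sketched as a small helper that refuses to emit invalid SQL. This is a minimal sketch, not the project's implementation: the function name `render_vec_ddl` and its error strings are assumptions made for illustration.

```rust
/// Substitute the `{DIMS}` placeholder in a DDL template with a concrete
/// dimension count. Returns an error instead of producing invalid SQL.
/// (Hypothetical helper; the name and error messages are illustrative.)
pub fn render_vec_ddl(template: &str, dims: u32) -> Result<String, String> {
    if dims == 0 {
        return Err("dimensions must be greater than zero".to_string());
    }
    if !template.contains("{DIMS}") {
        return Err("template has no {DIMS} placeholder".to_string());
    }
    // `dims` is an integer, so the substitution cannot inject SQL text.
    Ok(template.replace("{DIMS}", &dims.to_string()))
}
```

Because the dimension count is typed as `u32`, building the statement with string substitution is safe here; the value would come from `Embedder::dimensions()` during `vectorcode init`.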

7. Embedding Provider System

7.1 Trait Definition

#[async_trait]
pub trait Embedder: Send + Sync {
    /// Generate embedding for a single text
    async fn embed(&self, text: &str) -> Result<Vec<f32>>;

    /// Generate embeddings for a batch of texts
    /// Default implementation calls embed() in sequence;
    /// providers with native batch support should override.
    async fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>> {
        let mut results = Vec::with_capacity(texts.len());
        for text in texts {
            results.push(self.embed(text).await?);
        }
        Ok(results)
    }

    /// Number of dimensions in the output vectors
    fn dimensions(&self) -> u32;

    /// Provider name for metadata
    fn provider_name(&self) -> &str;

    /// Model identifier for metadata
    fn model_name(&self) -> &str;

    /// Maximum input token length supported
    fn max_tokens(&self) -> u32;
}

7.2 Provider Specifications

ONNX (default, offline)

Field	Value
Provider name	`onnx`
Default model	`all-MiniLM-L6-v2` (INT8 quantized)
Dimensions	384
Max tokens	512
Dependencies	`ort` crate (ONNX Runtime bindings), `tokenizers` crate (HuggingFace)
Model delivery	Bundled in binary via `include_bytes!` or downloaded on first `init`
Batch support	Native (multiple inputs per session run)
Requires internet	No
Requires API key	No

Implementation notes:

Use tokenizers crate from HuggingFace for WordPiece tokenization
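Load the model from memory with the ort session builder (e.g., ort::Session::builder()?.commit_from_memory(bytes) in ort 2.x; the 1.x API names this step with_model_from_memory)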
Input tensors: input_ids, attention_mask, token_type_ids (all i64)
Output: take last_hidden_state, apply mean pooling over the token dimension weighted by attention_mask (so padding tokens are ignored), then L2 normalize
For chunks longer than 512 tokens: truncate (the chunk should already be sized appropriately by the chunker)
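
The pooling step above can be sketched in plain Rust, independent of the `ort` tensor types. This is an illustrative sketch: the function name `mean_pool_normalize` and the nested-`Vec` layout are assumptions, and a real implementation would operate on the runtime's output tensor directly.

```rust
/// Mean-pool token embeddings, counting only positions where the attention
/// mask is non-zero, then scale the result to unit L2 length.
///
/// `hidden` is `[tokens][dims]` for one input; `mask` has one entry per token.
/// (Hypothetical helper operating on plain vectors for clarity.)
pub fn mean_pool_normalize(hidden: &[Vec<f32>], mask: &[i64]) -> Vec<f32> {
    let dims = hidden.first().map_or(0, |row| row.len());
    let mut pooled = vec![0.0f32; dims];
    let mut kept = 0.0f32;

    for (row, &m) in hidden.iter().zip(mask) {
        if m == 0 {
            continue; // padding token: excluded from the mean
        }
        for (acc, value) in pooled.iter_mut().zip(row) {
            *acc += *value;
        }
        kept += 1.0;
    }
    if kept > 0.0 {
        for v in pooled.iter_mut() {
            *v /= kept;
        }
    }
    // L2 normalization, so cosine similarity reduces to a dot product.
    let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for v in pooled.iter_mut() {
            *v /= norm;
        }
    }
    pooled
}
```

Skipping masked positions matters for batched inference, where shorter inputs are padded to the longest sequence in the batch.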

Gemini

Field	Value
Provider name	`gemini`
Default model	`gemini-embedding-001`
Dimensions	768 (configurable: 256, 512, 768, 1024, 3072 via Matryoshka)
Max tokens	2048
API endpoint	`https://generativelanguage.googleapis.com/v1beta/models/{model}:embedContent`
Auth	API key via `GEMINI_API_KEY` env var or config
Batch support	Yes — `batchEmbedContents` endpoint, up to 100 items per request
Rate limits	Free tier: 1500 req/min
Requires internet	Yes

Request format:

POST /v1beta/models/gemini-embedding-001:embedContent
{
  "content": {
    "parts": [{ "text": "function handlePayment()..." }]
  },
  "outputDimensionality": 768
}

Batch request format:

POST /v1beta/models/gemini-embedding-001:batchEmbedContents
{
  "requests": [
    {
      "model": "models/gemini-embedding-001",
      "content": { "parts": [{ "text": "chunk 1..." }] },
      "outputDimensionality": 768
    },
    {
      "model": "models/gemini-embedding-001",
      "content": { "parts": [{ "text": "chunk 2..." }] },
      "outputDimensionality": 768
    }
  ]
}

Implementation notes:

Use reqwest for HTTP
Implement exponential backoff with jitter for rate limiting (429 responses)
Batch size: 100 items per request (API max)
The outputDimensionality parameter controls Matryoshka truncation
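
One way to compute the retry delay for 429 responses is the "full jitter" variant of exponential backoff. The sketch below is an assumption-laden illustration: the name `backoff_delay`, the base and cap values, and the choice of full jitter are not specified by this document. The random fraction is passed in by the caller so the function stays deterministic.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based) using full jitter:
/// a random fraction of an exponentially growing, capped ceiling.
///
/// `unit_random` should be drawn uniformly from `[0.0, 1.0)` by the caller.
/// (Hypothetical helper; base/cap values are chosen by the caller.)
pub fn backoff_delay(attempt: u32, base_ms: u64, cap_ms: u64, unit_random: f64) -> Duration {
    // 2^attempt, saturating so large attempt counts cannot overflow.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let ceiling = base_ms.saturating_mul(factor).min(cap_ms);
    let jittered = (ceiling as f64 * unit_random.clamp(0.0, 1.0)) as u64;
    Duration::from_millis(jittered)
}
```

Splitting the inputs into API-sized requests needs no helper: `texts.chunks(100)` on a slice yields groups of at most 100 items.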

Ollama

Field	Value
Provider name	`ollama`
Default model	`embeddinggemma:latest`
Dimensions	768
Max tokens	2048
API endpoint	`http://localhost:11434/api/embed` (configurable)
Auth	None
Batch support	Yes — `input` field accepts array of strings
Requires internet	No (after model pull)

Request format:

POST /api/embed
{
  "model": "embeddinggemma:latest",
  "input": ["chunk 1...", "chunk 2..."]
}

Response:

{
  "model": "embeddinggemma:latest",
  "embeddings": [[0.123, -0.456, ...], [0.789, -0.012, ...]]
}

Implementation notes:

Verify Ollama is running and model is available before indexing starts
If Ollama is not reachable or the model is missing, emit a clear error with instructions: start Ollama (ollama serve), then run ollama pull embeddinggemma:latest

OpenAI

Field	Value
Provider name	`openai`
Default model	`text-embedding-3-small`
Dimensions	1536
Max tokens	8191
API endpoint	`https://api.openai.com/v1/embeddings`
Auth	API key via `OPENAI_API_KEY` env var or config
Batch support	Yes — `input` field accepts array, up to 2048 items
Requires internet	Yes

Request format:

POST /v1/embeddings
{
  "model": "text-embedding-3-small",
  "input": ["chunk 1...", "chunk 2..."]
}

Implementation notes:

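Plain JSON POST via reqwest with an Authorization: Bearer <key> header; embeddings come back in data[i].embedding, with data[i].index giving the input position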
Implement retry with exponential backoff for 429/500/503

8. AST-Aware Chunking System

8.1 Supported Languages

Tree-sitter grammars to include (start with these, expand later):

Language	Tree-sitter crate	Priority	File extensions
TypeScript/TSX	`tree-sitter-typescript`	P0	`.ts`, `.tsx`
JavaScript/JSX	`tree-sitter-javascript`	P0	`.js`, `.jsx`, `.mjs`, `.cjs`
Python	`tree-sitter-python`	P0	`.py`
Rust	`tree-sitter-rust`	P0	`.rs`
Go	`tree-sitter-go`	P1	`.go`
Java	`tree-sitter-java`	P1	`.java`
C#	`tree-sitter-c-sharp`	P1	`.cs`
C/C++	`tree-sitter-c`, `tree-sitter-cpp`	P1	`.c`, `.h`, `.cpp`, `.hpp`, `.cc`
Ruby	`tree-sitter-ruby`	P2	`.rb`
Swift	`tree-sitter-swift`	P2	`.swift`
Kotlin	`tree-sitter-kotlin`	P2	`.kt`, `.kts`

Files with unrecognized extensions fall back to a line-based chunker (sliding window, 50 lines per chunk, 10 lines overlap).
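
The line-based fallback can be sketched as follows. The `LineChunk` struct and the decision to stop once a window reaches the last line are assumptions for illustration; the real chunker would produce full `Chunk` values.

```rust
/// One fallback chunk: 1-indexed inclusive line range plus its text.
/// (Hypothetical type; the real pipeline would emit `Chunk`.)
#[derive(Debug, PartialEq)]
pub struct LineChunk {
    pub start_line: usize,
    pub end_line: usize,
    pub content: String,
}

/// Split `source` into windows of `window` lines, each overlapping the
/// previous one by `overlap` lines. Requires `overlap < window`.
pub fn line_based_chunks(source: &str, window: usize, overlap: usize) -> Vec<LineChunk> {
    assert!(window > 0 && overlap < window, "overlap must be smaller than window");
    let lines: Vec<&str> = source.lines().collect();
    let step = window - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < lines.len() {
        let end = (start + window).min(lines.len());
        chunks.push(LineChunk {
            start_line: start + 1,
            end_line: end,
            content: lines[start..end].join("\n"),
        });
        if end == lines.len() {
            break; // this window already reached the end of the file
        }
        start += step;
    }
    chunks
}
```

With the defaults (`window = 50`, `overlap = 10`), a 120-line file yields the ranges 1-50, 41-90, and 81-120.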

8.2 Chunking Strategy

Target AST Node Types (per language)

Each language defines which AST node types constitute a "chunkable" unit:

TypeScript/JavaScript:

function_declaration
arrow_function (only when assigned to a variable/export)
method_definition
class_declaration
interface_declaration
type_alias_declaration
enum_declaration
export_statement (wrapping any of the above)

Python:

function_definition
class_definition
decorated_definition

Rust:

function_item
impl_item
struct_item
enum_item
trait_item
mod_item (only top-level, not inline)

Go:

function_declaration
method_declaration
type_declaration

(Other languages follow the same pattern: extract top-level declarations and methods.)

Chunking Algorithm

FUNCTION chunk_file(source: &str, language: Language) -> Vec<Chunk>:
    tree = tree_sitter_parse(source, language)
    chunks = []

    FOR each top-level node in tree.root_node().children():
        IF node.kind() is in CHUNKABLE_TYPES[language]:
            text = source[node.byte_range()]
            size = text.len()

            IF size < 100:
                // Too small to be useful alone — skip or merge with neighbors
                CONTINUE

            ELSE IF size <= 2000:
                // Ideal size — emit as single chunk
                chunks.push(make_chunk(node, text))

            ELSE:
                // Too large — recursively split by children
                sub_chunks = split_large_node(node, source, language)
                chunks.extend(sub_chunks)

    IF chunks.is_empty():
        // Fallback: line-based sliding window
        chunks = line_based_chunks(source, window=50, overlap=10)

    RETURN chunks


FUNCTION split_large_node(node, source, language) -> Vec<Chunk>:
    children = node.named_children()
        .filter(|c| c.kind() is in CHUNKABLE_TYPES[language])

    IF children.is_empty():
        // No meaningful children — split by statements with overlap
        RETURN statement_split(node, source, max_size=1500, overlap=100)

    chunks = []
    FOR child in children:
        text = source[child.byte_range()]
        IF text.len() <= 2000:
            // Prepend parent signature as context
            chunk = make_chunk(child, text)
            chunk.parent_context = extract_signature(node, source)
            chunks.push(chunk)
        ELSE:
            chunks.extend(split_large_node(child, source, language))

    RETURN chunks

Chunk Metadata Extraction

For each chunk, extract:

symbol: full qualified name when possible (e.g., ClassName.methodName)
kind: the AST node type string
parent_context: the signature of the enclosing scope (e.g., impl UserService or class PaymentHandler)
start_line / end_line: from node.start_position().row + 1 and node.end_position().row + 1

Content Enrichment Before Embedding

Before sending a chunk to the embedder, prepend contextual metadata to improve retrieval quality:

// Format sent to the embedder (NOT stored in `content` field)
"{language} | {file_path} | {parent_context} | {symbol}\n{content}"

Example:

"typescript | src/payment/retry.ts | class PaymentRetryHandler | handleRetry\nasync handleRetry(attempt: number): Promise<PaymentResult> {\n  ..."

This enrichment helps the embedding model understand that handleRetry is a TypeScript method inside a payment retry handler, even if the code alone doesn't mention "payment" explicitly.
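
A minimal sketch of the enrichment format follows. The function name `enrich_for_embedding` is hypothetical, and rendering a missing `parent_context` or `symbol` as an empty slot is an assumption: the format above does not say how absent values should appear.

```rust
/// Build the text sent to the embedder: one metadata header line followed by
/// the chunk source. Missing optional fields become empty slots
/// (an assumption; the spec does not define this case).
pub fn enrich_for_embedding(
    language: &str,
    file_path: &str,
    parent_context: Option<&str>,
    symbol: Option<&str>,
    content: &str,
) -> String {
    format!(
        "{} | {} | {} | {}\n{}",
        language,
        file_path,
        parent_context.unwrap_or(""),
        symbol.unwrap_or(""),
        content
    )
}
```

The returned string is passed to `Embedder::embed` only; the `content` column keeps the unmodified source.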

9. Indexing Pipeline

9.1 Full Index (`vectorcode index`)

1. Load config (provider, dimensions, languages)
2. Validate index exists (.vectorcode/index.db) or fail with "run vectorcode init first"
3. Discover files:
   a. Walk project directory recursively
   b. Filter by supported extensions (§8.1)
   c. Respect .gitignore via `ignore` crate (same library ripgrep uses)
   d. Skip .vectorcode/, .git/, node_modules/, target/, __pycache__/, vendor/
   e. Skip files > 1MB (configurable via max_file_size)
4. For each file:
   a. Read file content
   b. Compute content hash (blake3)
   c. Check files table: if mtime and size are unchanged AND the hash matches → skip
   d. Parse with tree-sitter → extract chunks
   e. For each chunk:
      - Compute chunk ID (blake3 of file_path + byte_range)
      - Check if chunk with same ID + content_hash exists → skip
   f. Collect all new/changed chunks
5. Batch embed all new chunks:
   a. Group into batches (size depends on provider: 100 for Gemini, 2048 for OpenAI, etc.)
   b. Call embedder.embed_batch() for each batch
   c. Store vectors in vec_chunks table
   d. Store metadata in chunks table
   e. Update files table with new mtime/size/hash
6. Clean stale data:
   a. Remove chunks for files that no longer exist
   b. Remove chunks for file regions that changed
   c. Remove corresponding vectors
7. Update meta table with last_sync_at, files_indexed, chunks_stored
8. Report: "Indexed {N} files, {M} chunks, {T} new embeddings in {D}s"
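
The per-file skip decision in step 4 can be sketched as a pure comparison against the stored `files` row. The `FileRecord` and `FileState` types and the function `classify_file` are hypothetical names introduced for this illustration.

```rust
/// A row of the `files` table, reduced to the fields used for change checks.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub struct FileRecord {
    pub mtime: i64,
    pub size: u64,
    pub hash: String,
}

/// Outcome of comparing a file on disk against its stored record.
#[derive(Debug, PartialEq)]
pub enum FileState {
    /// No row in `files`: chunk and embed everything.
    New,
    /// mtime, size, and hash all match: skip the file.
    Unchanged,
    /// Any field differs: re-chunk, then skip chunks whose ID and hash match.
    Changed,
}

pub fn classify_file(stored: Option<&FileRecord>, current: &FileRecord) -> FileState {
    match stored {
        None => FileState::New,
        Some(old) if old == current => FileState::Unchanged,
        Some(_) => FileState::Changed,
    }
}
```

A file that was only touched (new mtime, same hash) is classified as `Changed` here, but it produces no new embeddings because every chunk still matches on ID and content hash.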

9.2 Incremental Sync (`vectorcode sync`)

Same as full index but:

Only processes files where mtime or size differ from files table
Used by the file watcher after debounce

9.3 Progress Reporting

During indexing, emit progress to stderr (not stdout, which is reserved for MCP):

[1/3] Discovering files... 2,515 files found
[2/3] Chunking... 8,432 chunks (2,108 new, 6,324 unchanged)
[3/3] Embedding... 2,108 chunks [████████████████████] 100% (42.3s)
Indexed 2,515 files, 8,432 chunks, 2,108 new embeddings in 45.1s

9.4 Concurrency

File I/O: use tokio async runtime with bounded concurrency (default: 8 files in parallel)
Embedding API calls: bounded concurrency matching provider rate limits
- ONNX: single session, batch internally (CPU-bound)
- Gemini: 4 concurrent requests (to stay within 1500 req/min)
- Ollama: 1 concurrent request (local, sequential)
- OpenAI: 4 concurrent requests
SQLite writes: single writer, use WAL mode for concurrent reads during MCP serving

10. Query Pipeline

10.1 Search Flow

1. Receive query string from MCP tool call
2. Enrich query (optional): if query is very short (<3 words), prepend "code that"
3. Embed query using same provider/model as index
4. Execute vector similarity search:
   SELECT c.*, v.distance
   FROM vec_chunks v
   JOIN chunks c ON c.id = v.chunk_id
   WHERE v.embedding MATCH :query_vec
     AND k = :limit
   ORDER BY v.distance ASC
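5. Convert distance to score: score = 1.0 - distance (valid only for cosine distance; sqlite-vec reports L2 distance unless the vector column is declared with distance_metric=cosine)
6. Discard results with score < threshold (default: 0.3) and results that fail the optional language, path, or kind filters (§10.2)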
7. Format and return results
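
Steps 2, 5, and 6 can be sketched without touching the database. The function names `enrich_query` and `score_and_filter` are hypothetical, and the input is assumed to be `(chunk_id, cosine_distance)` pairs as returned by the query in step 4.

```rust
/// Prepend "code that " to queries shorter than three words (step 2).
pub fn enrich_query(query: &str) -> String {
    let trimmed = query.trim();
    if trimmed.split_whitespace().count() < 3 {
        format!("code that {trimmed}")
    } else {
        trimmed.to_string()
    }
}

/// Convert `(chunk_id, cosine_distance)` rows into `(chunk_id, score)` pairs,
/// drop anything below `threshold`, and order best-first (steps 5 and 6).
pub fn score_and_filter(rows: &[(String, f32)], threshold: f32) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = rows
        .iter()
        .map(|(id, distance)| (id.clone(), 1.0 - distance))
        .filter(|(_, score)| *score >= threshold)
        .collect();
    // total_cmp gives a total order on f32, so NaN cannot panic the sort.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Because the threshold is applied after the KNN query, a search can return fewer than `limit` results even when the index holds more chunks.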

10.2 Search Options

Parameter	Type	Default	Description
`query`	string	required	Natural language search query
`limit`	integer	10	Maximum number of results
`threshold`	float	0.3	Minimum similarity score (0.0–1.0)
`language`	string?	null	Filter by language (e.g., "typescript")
`path`	string?	null	Filter by file path prefix (e.g., "src/auth/")
`kind`	string?	null	Filter by chunk kind (e.g., "function_declaration")

11. MCP Server Specification

11.1 Transport

Protocol: MCP over stdio (stdin/stdout)
Format: JSON-RPC 2.0
Launch: vectorcode serve --mcp
The server MUST NOT write anything to stdout except valid JSON-RPC messages
Diagnostic/log output goes to stderr

11.2 Server Capabilities

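Returned in the result of the MCP initialize request (protocolVersion is the MCP revision the server implements; the value below is one published revision):

{
  "protocolVersion": "2024-11-05",
  "capabilities": {
    "tools": {}
  },
  "serverInfo": {
    "name": "vectorcode",
    "version": "0.1.0"
  }
}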

11.3 Tool Definitions

`vec_search`

Primary tool. Semantic search over the indexed codebase.

{
  "name": "vec_search",
  "description": "Semantic code search — find code by meaning, not just by name. Use when you need to find code related to a concept (e.g., 'payment retry logic', 'user authentication', 'error handling for database connections') and you don't know the exact symbol names or file locations. Returns ranked code chunks with file paths, line numbers, and similarity scores. Complements grep (exact match) and codegraph (structural). Use grep when you know the exact string; use codegraph when you know the symbol name; use vec_search when you know the concept but not the code.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Natural language description of the code you're looking for. Be specific about the domain and behavior (e.g., 'retry logic with exponential backoff' is better than 'retry')."
      },
      "limit": {
        "type": "integer",
        "description": "Maximum number of results to return.",
        "default": 10
      },
      "threshold": {
        "type": "number",
        "description": "Minimum similarity score (0.0–1.0). Lower values return more results with less relevance.",
        "default": 0.3
      },
      "language": {
        "type": "string",
        "description": "Filter results by programming language (e.g., 'typescript', 'python', 'rust')."
      },
      "path": {
        "type": "string",
        "description": "Filter results by file path prefix (e.g., 'src/auth/' to search only in the auth module)."
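      },
      "kind": {
        "type": "string",
        "description": "Filter results by chunk kind (e.g., 'function_declaration')."
      }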
    },
    "required": ["query"]
  }
}

Response format (returned as text content in MCP tool result):

Found 5 results for "payment retry logic" (threshold: 0.30)

[1] src/payment/retry.ts:45-92 (score: 0.87)
    Symbol: PaymentRetryHandler.handleRetry
    Kind: method_definition

    async handleRetry(attempt: number): Promise<PaymentResult> {
      const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
      const jitter = Math.random() * 1000;
      await sleep(delay + jitter);
      ...
    }

[2] src/payment/processor.ts:120-145 (score: 0.72)
    Symbol: processPaymentWithRetry
    Kind: function_declaration
    ...

(3 more results)

`vec_status`

Report index health and statistics.

{
  "name": "vec_status",
  "description": "Check the status of the VectorCode index — provider, model, dimensions, number of indexed files and chunks, last sync time, and any pending file changes.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "projectPath": {
        "type": "string",
        "description": "Path to a project with .vectorcode/ initialized. Defaults to current directory."
      }
    }
  }
}

Response format:

VectorCode Index Status
═══════════════════════
Provider:    gemini
Model:       gemini-embedding-001
Dimensions:  768
Version:     0.1.0

Files:       2,515 indexed
Chunks:      8,432 stored
Last sync:   2026-06-10T20:00:00Z (3 minutes ago)

Pending sync:
  src/payment/retry.ts (modified 5s ago)
  src/auth/handler.ts (modified 12s ago)

`vec_reindex`

Force a full or partial re-index.

{
  "name": "vec_reindex",
  "description": "Force re-indexing of the codebase or specific files. Use after changing the embedding provider, or when the index seems stale or corrupted.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "path": {
        "type": "string",
        "description": "Specific file or directory to reindex. If omitted, reindexes the entire project."
      },
      "full": {
        "type": "boolean",
        "description": "If true, drops all existing data and rebuilds from scratch. If false, only reindexes changed files.",
        "default": false
      }
    }
  }
}

12. CLI Commands

12.1 Command Reference

vectorcode <COMMAND>

Commands:
  init       Initialize VectorCode in a project directory
  index      Build or update the embedding index
  search     Search the index from the command line
  status     Show index status and health
  serve      Start the MCP server
  install    Auto-configure agents (OpenCode, Claude Code, Cursor, etc.)
  uninstall  Remove VectorCode from agent configurations
  upgrade    Self-update the binary
  help       Print help

Global options:
  --project-path <PATH>   Path to project (default: current directory)
  --verbose               Enable verbose logging to stderr
  --quiet                 Suppress progress output

12.2 `vectorcode init`

vectorcode init [OPTIONS]

Options:
  --provider <PROVIDER>   Embedding provider [default: onnx]
                          [possible values: onnx, gemini, ollama, openai]
  --model <MODEL>         Model name (provider-specific default if omitted)
  --dims <DIMS>           Embedding dimensions (provider-specific default if omitted)
  --index                 Also run initial indexing after init (like codegraph init -i)

Behavior:

Create .vectorcode/ directory
Create index.db with schema (§6)
Write meta table with provider, model, dimensions
Create .vectorcode/.gitignore containing index.db (the DB should not be committed)
Create .vectorcode/config.toml with chosen provider settings
If --index flag: run full indexing pipeline

12.3 `vectorcode index`

vectorcode index [OPTIONS]

Options:
  --full            Drop all data and rebuild from scratch
  --file <PATH>     Index only a specific file
  --concurrency <N> Max concurrent file processing [default: 8]

12.4 `vectorcode search`

vectorcode search <QUERY> [OPTIONS]

Options:
  --limit <N>          Max results [default: 10]
  --threshold <F>      Min similarity score [default: 0.3]
  --language <LANG>    Filter by language
  --path <PREFIX>      Filter by path prefix
  --json               Output results as JSON

12.5 `vectorcode serve`

vectorcode serve [OPTIONS]

Options:
  --mcp              Start as MCP server (stdio transport)
  --watch            Enable file watcher for auto-sync [default: true]
  --debounce <MS>    File watcher debounce interval [default: 2000]

12.6 `vectorcode install`

vectorcode install [OPTIONS]

Options:
  --target <AGENT>   Install for specific agent only
                     [possible values: opencode, claude-code, cursor, gemini-cli, antigravity]

Behavior:

Detect installed agents by checking known config file locations
For each detected agent, add the VectorCode MCP server entry to its config
Patterns per agent:
- OpenCode: modify opencode.json → mcp section
- Claude Code: modify ~/.claude.json (user scope) or .mcp.json (project scope) → mcpServers
- Cursor: modify .cursor/mcp.json
- Gemini CLI: modify ~/.gemini/settings.json → mcpServers
- Antigravity: modify ~/.gemini/antigravity/settings.json → mcpServers

13. Configuration

13.1 Config File Location

.vectorcode/config.toml (per-project)

13.2 Config Schema

# .vectorcode/config.toml

[provider]
# Which embedding provider to use
# Values: "onnx", "gemini", "ollama", "openai"
name = "onnx"

[provider.onnx]
# Model bundled with the binary — no config needed
# model = "all-MiniLM-L6-v2"  (default, currently the only bundled option)

[provider.gemini]
# API key: reads from this field OR from GEMINI_API_KEY env var
api_key = ""
model = "gemini-embedding-001"
dimensions = 768  # Matryoshka: 256, 512, 768, 1024, 3072

[provider.ollama]
url = "http://localhost:11434"
model = "embeddinggemma:latest"

[provider.openai]
# API key: reads from this field OR from OPENAI_API_KEY env var
api_key = ""
model = "text-embedding-3-small"

[indexing]
# Maximum file size to index (bytes). Files larger than this are skipped.
max_file_size = 1_048_576  # 1MB

# Directories to always exclude (in addition to .gitignore)
exclude_dirs = [
    ".vectorcode",
    ".git",
    "node_modules",
    "target",
    "__pycache__",
    "vendor",
    "dist",
    "build",
    ".next",
]

# File extensions to always exclude
exclude_extensions = [
    ".min.js",
    ".map",
    ".lock",
    ".svg",
    ".png",
    ".jpg",
    ".ico",
    ".woff",
    ".woff2",
    ".ttf",
]

# Max concurrent file processing
concurrency = 8

[watcher]
# File watcher debounce in milliseconds
debounce_ms = 2000

# Disable file watcher entirely
disabled = false

[search]
# Default result limit
default_limit = 10

# Default similarity threshold
default_threshold = 0.3

13.3 Environment Variable Overrides

Env var	Overrides
`GEMINI_API_KEY`	`provider.gemini.api_key`
`OPENAI_API_KEY`	`provider.openai.api_key`
`VECTORCODE_PROVIDER`	`provider.name`
`VECTORCODE_NO_WATCH`	`watcher.disabled` (set to `1` to disable)
`VECTORCODE_DEBOUNCE_MS`	`watcher.debounce_ms`
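
The override precedence can be sketched as pure functions that take the environment value as a parameter. The function names are hypothetical, and treating an empty environment value as "not set" is an assumption this document does not state.

```rust
/// Pick the effective API key: a non-empty environment value wins over the
/// config file value. `None` means no key is configured anywhere.
/// (Assumption: an empty env var falls through to the config value.)
pub fn resolve_api_key(config_value: &str, env_value: Option<&str>) -> Option<String> {
    env_value
        .filter(|v| !v.is_empty())
        .or(Some(config_value).filter(|v| !v.is_empty()))
        .map(str::to_string)
}

/// `VECTORCODE_NO_WATCH=1` disables the watcher; any other value leaves the
/// config file setting in effect.
pub fn resolve_watcher_disabled(config_value: bool, env_value: Option<&str>) -> bool {
    match env_value {
        Some("1") => true,
        _ => config_value,
    }
}
```

A caller would pass the real environment in, for example `resolve_api_key(&cfg_key, std::env::var("GEMINI_API_KEY").ok().as_deref())`, which keeps the functions themselves easy to test.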

14. File Watcher

14.1 Behavior

When the MCP server runs (vectorcode serve --mcp), it starts a file watcher on the project directory.

Watcher flow:

Use notify crate with native OS events (FSEvents on macOS, inotify on Linux, ReadDirectoryChangesW on Windows)
Filter events through .gitignore rules (using ignore crate)
Filter by supported file extensions
Debounce: collect all changed file paths over a configurable window (default 2000ms)
After debounce: run incremental sync on changed files only
Track pending files (changed but not yet re-indexed) for staleness reporting
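
The debounce and pending-file tracking can be sketched independently of the `notify` crate. The `Debouncer` type below is hypothetical, and it assumes a trailing quiet period (the batch is released once no event has arrived for the full window); a fixed window measured from the first event would also fit the description above.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Collects changed paths and releases them after a quiet period.
/// (Hypothetical type; timestamps are passed in to keep it testable.)
pub struct Debouncer {
    window: Duration,
    pending: BTreeSet<PathBuf>,
    last_event: Option<Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, pending: BTreeSet::new(), last_event: None }
    }

    /// Record a change; repeated events for one path collapse into one entry.
    pub fn record(&mut self, path: PathBuf, now: Instant) {
        self.pending.insert(path);
        self.last_event = Some(now);
    }

    /// Paths changed but not yet handed to the indexer (for staleness reports).
    pub fn pending_paths(&self) -> impl Iterator<Item = &PathBuf> {
        self.pending.iter()
    }

    /// Return the batch once `window` has elapsed since the last event.
    pub fn take_if_quiet(&mut self, now: Instant) -> Option<Vec<PathBuf>> {
        let last = self.last_event?;
        if now.duration_since(last) < self.window {
            return None;
        }
        self.last_event = None;
        Some(std::mem::take(&mut self.pending).into_iter().collect())
    }
}
```

The watcher loop would call `record` for each filtered event and poll `take_if_quiet` on a timer, passing the returned paths to incremental sync.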

14.2 Staleness Banner

When a vec_search result references a file that has pending changes (modified after last sync but before debounce completes), prepend a banner:

⚠️ Some files referenced below were modified since the last index sync
and may not reflect the latest content:
  - src/payment/retry.ts (modified 1s ago)
Use grep or read these files directly for accurate content.

Found 5 results for "payment retry logic" (threshold: 0.30)
...

14.3 Connect-Time Catch-Up

When the MCP server starts, before answering the first query:

Run a fast (mtime, size) reconciliation against the files table
If any files changed since last sync: run incremental sync
This catches changes made while no MCP server was running (git pull, editor, etc.)
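The reconciliation step can be sketched as a comparison of two maps: the `(mtime, size)` pairs stored in the files table and the pairs observed on disk. The type and function names below (`FileStamp`, `Reconciliation`, `reconcile`) are assumptions for illustration, and reporting removed files separately is an assumption as well.

```rust
use std::collections::HashMap;

/// Cheap change signature for a file: modification time and size.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileStamp {
    pub mtime_secs: i64,
    pub size: u64,
}

/// Paths that need re-indexing and paths whose chunks should be dropped.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Reconciliation {
    pub changed: Vec<String>,
    pub removed: Vec<String>,
}

/// Compares stamps from the index with stamps observed on disk.
pub fn reconcile(
    indexed: &HashMap<String, FileStamp>,
    on_disk: &HashMap<String, FileStamp>,
) -> Reconciliation {
    let mut out = Reconciliation::default();
    for (path, stamp) in on_disk {
        // New files and files whose stamp differs both count as changed.
        if indexed.get(path) != Some(stamp) {
            out.changed.push(path.clone());
        }
    }
    for path in indexed.keys() {
        // Present in the index but gone from disk.
        if !on_disk.contains_key(path) {
            out.removed.push(path.clone());
        }
    }
    // Sort for deterministic output; HashMap iteration order is unspecified.
    out.changed.sort();
    out.removed.sort();
    out
}
```

If both lists come back empty, the server can answer the first query without running a sync.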

15. Skill File

15.1 Location

Distributed with the binary and installable via vectorcode install:

Per-project: .agents/skills/semantic-search/SKILL.md
Global: ~/.agents/skills/semantic-search/SKILL.md

15.2 Content

---
name: semantic-search
description: >
  Use when searching for code by concept, meaning, or behavior — not by exact
  symbol name or literal string. Ideal for queries like "payment retry logic",
  "user authentication flow", "error handling for database connections", or
  "functions similar to createUser". Do NOT use for exact string matches (use
  grep) or known symbol lookups (use codegraph_explore).
---

## Semantic Code Search Protocol

### Tool: `vec_search`

Performs cosine-similarity search over embedded code chunks. Returns ranked
results with file paths, line numbers, symbols, and source code.

### When to use `vec_search`

- You need to find code related to a **concept** but don't know the symbol names
- `grep` returned no results because the code uses different terminology
- You want to find **similar** code patterns across the codebase
- You're exploring an unfamiliar area of the codebase by topic

### When NOT to use `vec_search`

- You know the exact function/class name → use `codegraph_explore`
- You know an exact string in the code → use `grep`
- You're looking for past decisions or history → use `mem_search` (Engram)

### Recommended flow: Semantic → Structural → Historical

For comprehensive code discovery, combine all three tools:

1. **`vec_search("payment error handling")`**
   → Finds code chunks semantically related to payment errors
   → Returns file paths, line ranges, and ranked source snippets

2. **`codegraph_explore("PaymentError handlePaymentFailure")`**
   → Takes symbol names found in step 1
   → Returns full source code + call graph + blast radius

3. **`mem_search("payment error handling")`**
   → Checks Engram for prior team decisions about this topic
   → Returns architectural context and history

### Query tips

- Be specific: "retry with exponential backoff" > "retry"
- Include domain terms: "payment validation" > "validation"
- Describe behavior: "function that sends email notifications" > "email"
- Use `--language` filter when you know the target language
- Use `--path` filter to scope to a specific module

### Example

vec_search("middleware that validates JWT tokens and extracts user info")

16. MCP `instructions.md`

This file is placed alongside the MCP tool schemas and is automatically loaded by agents.

16.1 Location

~/.gemini/antigravity/mcp/vectorcode/instructions.md (written by vectorcode install)

16.2 Content

# VectorCode — semantic code search over embedded vectors

VectorCode indexes the codebase into vector embeddings and enables
semantic similarity search. It finds code by meaning, not by name.

## Tool selection

- **"Find code about X concept / behavior / domain"** → `vec_search`
- **"Check if index is healthy / current"** → `vec_status`
- **"Force re-index after major changes"** → `vec_reindex`

## When to use vec_search vs other tools

- **Know the exact string** → grep (exact match, faster)
- **Know the symbol name** → codegraph_explore (structural, precise)
- **Know the concept but not the name** → vec_search (semantic, fuzzy)
- **Looking for past decisions** → mem_search / Engram (memory)

## Anti-patterns

- Don't use vec_search to find a symbol you already know the name of —
  codegraph_explore is faster and returns structural context.
- Don't re-verify vec_search results with grep — the source code in the
  result IS the current indexed content. Check the staleness banner if present.
- Don't ignore the score — results below 0.4 are usually noise.

## Staleness

The file watcher keeps the index current (2-second debounce after edits).
If a result has a ⚠️ staleness banner, read those specific files directly.
All files NOT in the banner are fresh.

17. Rust Crate Dependencies

[dependencies]
# CLI
clap = { version = "4", features = ["derive"] }

# Async runtime
tokio = { version = "1", features = ["full"] }

# SQLite
rusqlite = { version = "0.32", features = ["bundled", "vtab"] }

# sqlite-vec (loaded as extension)
# Build sqlite-vec from source via build.rs or load as shared library
# (loading a shared library at runtime also requires rusqlite's "load_extension" feature)

# Tree-sitter
tree-sitter = "0.24"
tree-sitter-typescript = "0.23"
tree-sitter-javascript = "0.23"
tree-sitter-python = "0.23"
tree-sitter-rust = "0.23"
tree-sitter-go = "0.23"
tree-sitter-java = "0.23"
# Add more languages as needed

# ONNX Runtime
ort = { version = "2", features = ["load-dynamic"] }

# Tokenizer (for ONNX provider)
tokenizers = { version = "0.20", features = ["http"] }

# HTTP client (for API providers)
reqwest = { version = "0.12", features = ["json", "rustls-tls"] }

# Serialization
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# Config
toml = "0.8"

# File watching
notify = "7"
notify-debouncer-full = "0.4"

# .gitignore support
ignore = "0.4"

# Hashing
blake3 = "1"

# Error handling
anyhow = "1"
thiserror = "2"

# Logging
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

# Async trait
async-trait = "0.1"

[dev-dependencies]
tempfile = "3"
assert_cmd = "2"
predicates = "3"

Important

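Pin exact versions of the tree-sitter grammar crates (for example with `=` version requirements and a committed Cargo.lock) to avoid breaking changes; the caret-style versions listed above are starting points. The ort crate version must match the ONNX Runtime version shipped with the binary. rusqlite must use features = ["bundled"] to avoid system SQLite version conflicts.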

18. Error Handling

18.1 Error Categories

#[derive(thiserror::Error, Debug)]
pub enum VectorCodeError {
    #[error("Index not initialized. Run `vectorcode init` first.")]
    NotInitialized,

    #[error("Index was created with provider '{expected}' ({expected_dims}d) but current config uses '{actual}' ({actual_dims}d). Run `vectorcode index --full` to rebuild.")]
    ProviderMismatch {
        expected: String,
        expected_dims: u32,
        actual: String,
        actual_dims: u32,
    },

    #[error("Embedding provider error: {message}")]
    EmbedderError { message: String },

    #[error("API rate limited. Retrying in {retry_after_secs}s...")]
    RateLimited { retry_after_secs: u64 },

    #[error("Ollama not reachable at {url}. Is it running? Try: ollama serve")]
    OllamaUnavailable { url: String },

    #[error("Model '{model}' not found in Ollama. Try: ollama pull {model}")]
    OllamaModelNotFound { model: String },

    #[error("API key not set. Set {env_var} or configure in .vectorcode/config.toml")]
    ApiKeyMissing { env_var: String },

    #[error("Tree-sitter parse error for {file_path}: {message}")]
    ParseError { file_path: String, message: String },

    #[error("Database error: {0}")]
    Database(#[from] rusqlite::Error),

    #[error("IO error: {0}")]
    Io(#[from] std::io::Error),
}

18.2 MCP Error Responses

MCP tool errors MUST be returned as JSON-RPC error objects, never as panics or crashes:

{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32000,
    "message": "Index not initialized. Run `vectorcode init` first."
  }
}
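To illustrate the response shape above, the following sketch builds the error object by hand using only the standard library. In the actual server, `serde_json` (already listed in the dependencies) would be the natural choice; the function names `json_escape` and `error_response` are hypothetical.

```rust
/// Escapes a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats a JSON-RPC 2.0 error response with the code used in the example.
pub fn error_response(id: u64, message: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"error":{{"code":-32000,"message":"{}"}}}}"#,
        id,
        json_escape(message)
    )
}
```

A tool handler that returns `Result` can pass the `Display` text of its error to `error_response`, so every failure path ends in a well-formed reply instead of a panic.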

19. Testing Strategy

19.1 Unit Tests

Module	What to test
`chunker`	Correct AST node extraction per language; size-based splitting; fallback to line-based
`embedder::onnx`	Model loading, tokenization, embedding output dimensions, L2 normalization
`store`	CRUD operations, vector insertion/search, incremental updates, stale cleanup
`config`	TOML parsing, env var overrides, default values
`watcher::gitignore`	Pattern matching against .gitignore rules

19.2 Integration Tests

Test	Description
`full_index_cycle`	Init → index a fixture project → verify chunk count and metadata
`incremental_sync`	Index → modify a file → sync → verify only changed chunks updated
`search_relevance`	Index a fixture project → run known queries → verify expected files appear in top-3
`provider_switch`	Init and index with ONNX → switch config to Gemini → attempt search → verify a ProviderMismatch error reporting the dimension mismatch
`mcp_protocol`	Spawn MCP server → send JSON-RPC requests → verify correct responses
`large_file_handling`	Index a file > 1MB → verify it's skipped with max_file_size config
`gitignore_respect`	Create project with .gitignore → index → verify ignored files excluded

19.3 Fixture Projects

Provide small, self-contained projects in tests/fixtures/:

sample_ts/: TypeScript project (~20 files) with classes, functions, interfaces
sample_py/: Python project (~15 files) with classes, decorators, modules
sample_rs/: Rust project (~10 files) with structs, impls, traits, mods

Each fixture should have a queries.json with expected search results:

[
  {
    "query": "user authentication with password hashing",
    "expected_files": ["src/auth/password.ts", "src/auth/service.ts"],
    "min_score": 0.5
  }
]
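A `search_relevance` test could evaluate each entry of `queries.json` roughly as follows. This is a sketch under stated assumptions: every expected file must appear within the top-k hits, `min_score` is applied to the matching hit, and hits arrive sorted by descending score. The names `QueryCase`, `Hit`, and `case_passes` are hypothetical, and JSON parsing is left out.

```rust
/// One entry of a fixture's queries.json, already parsed.
pub struct QueryCase {
    pub query: String,
    pub expected_files: Vec<String>,
    pub min_score: f32,
}

/// Simplified search hit: file path and similarity score.
pub struct Hit {
    pub file: String,
    pub score: f32,
}

/// Returns true if every expected file appears among the first `top_k`
/// hits with a score of at least `min_score`.
/// Assumes `hits` is sorted by descending score.
pub fn case_passes(case: &QueryCase, hits: &[Hit], top_k: usize) -> bool {
    let top = &hits[..hits.len().min(top_k)];
    case.expected_files.iter().all(|expected| {
        top.iter()
            .any(|h| &h.file == expected && h.score >= case.min_score)
    })
}
```

A looser variant that accepts any one expected file in the top-k would replace `all` with `any`; which reading is intended is a decision for the test author.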

20. Distribution & Installation

20.1 Binary Distribution

Build with cargo build --release --target <triple> for each target triple
Strip symbols: strip target/<triple>/release/vectorcode
Targets:
- x86_64-apple-darwin
- aarch64-apple-darwin
- x86_64-unknown-linux-gnu
- aarch64-unknown-linux-gnu
- x86_64-pc-windows-msvc

20.2 Install Script

macOS/Linux (install.sh):

#!/bin/sh
set -e
REPO="alejandro-technology/vectorcode"
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
ARCH=$(uname -m)
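# Map arch names; abort on anything the release assets do not cover
case "$ARCH" in
  x86_64) ARCH="x86_64" ;;
  arm64|aarch64) ARCH="aarch64" ;;
  *) echo "Unsupported architecture: $ARCH" >&2; exit 1 ;;
esac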
URL="https://github.com/$REPO/releases/latest/download/vectorcode-$OS-$ARCH.tar.gz"
curl -fsSL "$URL" | tar xz -C /usr/local/bin vectorcode
echo "vectorcode installed to /usr/local/bin/vectorcode"

20.3 Homebrew (future)

class Vectorcode < Formula
  desc "Semantic code search MCP server using embeddings"
  homepage "https://github.com/alejandro-technology/vectorcode"
  # ...
end

21. Non-Functional Requirements

Requirement	Target
Cold search latency	< 200ms for 10K chunks (query embed + vector search + result formatting)
Index throughput (ONNX)	> 300 chunks/sec on Apple M-series
Index throughput (API)	Limited by provider rate limits, not by VectorCode
Memory usage (serving)	< 100MB RSS for 50K chunk index
Disk usage	≥ 3KB per chunk (a 768d float32 vector alone is 3,072 bytes, plus metadata) → ≥ 150MB for 50K chunks
Binary size	< 50MB (including bundled ONNX model)
Startup time	< 500ms to first MCP response (excluding catch-up sync)
Crash recovery	WAL mode SQLite — no corruption on unexpected termination
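The disk usage target can be sanity-checked with simple arithmetic: an uncompressed float32 vector costs 4 bytes per dimension. The sketch below makes that calculation explicit; the per-chunk metadata figure is a free parameter because the spec does not fix it, and the function names are illustrative.

```rust
/// Bytes needed to store one uncompressed float32 embedding.
pub fn vector_bytes(dims: usize) -> usize {
    dims * std::mem::size_of::<f32>()
}

/// Estimated index size: vector plus an assumed per-chunk metadata cost.
pub fn index_bytes(chunks: usize, dims: usize, metadata_bytes_per_chunk: usize) -> usize {
    chunks * (vector_bytes(dims) + metadata_bytes_per_chunk)
}
```

For example, `vector_bytes(768)` is 3,072 bytes, and with a hypothetical 1,024 bytes of metadata per chunk, `index_bytes(50_000, 768, 1024)` comes to 204,800,000 bytes.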

22. Future Considerations (Out of Scope for v0.1)

These are NOT part of the initial build but should be considered in the architecture:

Hybrid search: combine vector similarity with FTS5 keyword search for better precision
Cross-project search: query multiple .vectorcode/ indices in one call
Code-to-code search: "find code similar to this snippet" (embed the snippet, not a query)
Custom model support: allow users to bring their own ONNX model
GPU acceleration: CUDA/Metal execution providers for ONNX Runtime
Index compression: quantize stored vectors from float32 to int8 for 4x storage reduction
Shared index server: HTTP transport for multi-user/CI environments
Engram integration: automatically save search patterns and findings as Engram memories
CodeGraph integration: enrich chunks with call graph metadata before embedding
