18 changes: 11 additions & 7 deletions README.md
@@ -29,17 +29,17 @@ Use Codegraph when you need fast structural answers about a repo without relying
- Export graph data as JSON, Mermaid, DOT, or SQLite, then inspect it from scripts, Markdown renderers, Graphviz, or SQL tools.
- Keep one workflow across source languages, monorepos, and graph-first document and template formats instead of stitching together separate tools.

For unfamiliar repos, start with `orient --root . --budget small --pretty`, then use `search` and `explain` to land on one concrete code anchor.
For unfamiliar repos with a concrete question, start with `explore "how does auth reach db?" --root . --pretty`; use `orient --root . --budget small --pretty` when you need a map before asking a question.
For daily change work, start with `review --base HEAD --head WORKTREE --summary`; use `impact --base HEAD --head WORKTREE --pretty` as the broader blast-radius map when needed.
Search is code-first by default in hybrid mode, and search, explain, and review packets now include analysis labels so reduced-mode or mixed-semantics runs stay visible.
Search is code-first by default in hybrid mode, and explore, search, explain, and review packets include analysis labels so reduced-mode or mixed-semantics runs stay visible.
Detailed command contracts and JSON shapes live in [docs/cli.md](./docs/cli.md).

## Features

- Multi-language dependency graphs, including imports, re-exports, `require()`, dynamic imports, workspace resolution, document links, stylesheet imports, and SFC script dependencies.
- Per-file symbol indexes with locals, exports, docstrings, line spans, and lightweight complexity metadata.
- Cross-file go-to-definition and find-references support across the shared source-language pipeline.
- Deterministic agent orientation, packet retrieval, search, bounded explanations, portable artifact bundles, and MCP tools across files, symbols, chunks, SQL objects, graph neighborhoods, and review ranges with stable follow-up targets.
- Deterministic agent exploration, orientation, packet retrieval, search, bounded explanations, portable artifact bundles, and MCP tools across files, symbols, chunks, SQL objects, graph neighborhoods, and review ranges with stable follow-up targets.
- Semantic chunking for code and text files, including Vue and Svelte single-file component block splitting.
- Duplicate and near-duplicate detection over indexed symbols, semantic chunks, text chunks, token fingerprints, and AST shape hashes when parser context is available.
- AST grep, public API summaries, unresolved import reports, hotspot analysis, cycle detection, and shortest dependency paths.
@@ -86,6 +86,9 @@ node ./dist/cli.js review --base HEAD --head WORKTREE --summary
# broader blast-radius map when the review packet needs expansion
node ./dist/cli.js impact --base HEAD --head WORKTREE --pretty

# one-call answer for a concrete repo question
node ./dist/cli.js explore "how does auth reach db?" --root . --pretty

# bounded repo orientation with next-step suggestions
node ./dist/cli.js orient --root . --budget small --pretty

@@ -131,7 +134,8 @@ Use these as starting points, then see [docs/cli.md](./docs/cli.md) for all flag
codegraph review --base HEAD --head WORKTREE --summary
codegraph impact --base HEAD --head WORKTREE --pretty

# repo orientation and bounded follow-up
# repo question, orientation, and bounded follow-up
codegraph explore "how does auth reach db?" --root . --pretty
codegraph orient --root . --budget small --pretty
codegraph search "build review report" --json
codegraph explain src/review.ts
@@ -340,7 +344,7 @@ For a custom location, use `codegraph skill install --target <path>/skills/codeg

## Using as a library

Use the TypeScript API when another program needs deterministic file packs, review packets, or model prompts. CLI `--pretty` and `--summary` output is also useful for model-readable triage, but library callers should keep structured fields until the final UI or prompt boundary. For repeated calls, prefer one warm `createCodeReviewSession()` or one agent/MCP session over rebuilding ad hoc indexes.
Use the TypeScript API when another program needs deterministic explore responses, file packs, review packets, or model prompts. CLI `--pretty` and `--summary` output is also useful for model-readable triage, but library callers should keep structured fields until the final UI or prompt boundary. For repeated calls, prefer one warm `createCodeReviewSession()` or one agent/MCP session over rebuilding ad hoc indexes.

```ts
import {
@@ -430,8 +434,8 @@ For the full capability matrix, limitations, and fixture coverage, see [docs/lan

- [docs/installation.md](./docs/installation.md): source checkout, scoped registry, release tarball, native runtime modes, and reduced-mode behavior
- [docs/cli.md](./docs/cli.md): command reference, output formats, SQLite schema, review bundles, and graph export usage
- [docs/library-api.md](./docs/library-api.md): agent orientation/packet/search/explain/artifacts, semantic chunking, indexing, graph APIs, read-only SQL, impact examples, and programmatic review output
- [docs/agent-workflows.md](./docs/agent-workflows.md): orientation packets, search anchors, MCP, sessions, streaming, tool wrappers, review bundles, and agent-oriented review recipes
- [docs/library-api.md](./docs/library-api.md): agent explore/orientation/packet/search/explain/artifacts, semantic chunking, indexing, graph APIs, read-only SQL, impact examples, and programmatic review output
- [docs/agent-workflows.md](./docs/agent-workflows.md): explore, orientation packets, search anchors, MCP, sessions, streaming, tool wrappers, review bundles, and agent-oriented review recipes
- [docs/mcp.md](./docs/mcp.md): MCP server setup, tool list, safety model, and client configuration examples
- [docs/how-it-works.md](./docs/how-it-works.md): performance, caching, native runtime behavior, architecture, and testing guidance
- [docs/language-parity.md](./docs/language-parity.md): per-language capability matrix
8 changes: 5 additions & 3 deletions codegraph-skill/codegraph/SKILL.md
@@ -11,7 +11,7 @@ Use Codegraph for structure-aware repo questions:
- symbol navigation with definitions, references, dependencies, and paths
- PR or worktree impact review with candidate tests and risk signals
- duplicate cleanup and refactor-risk triage
- bounded agent context through orientation, search, packets, explain, and MCP
- bounded agent context through explore, orientation, search, packets, explain, and MCP

Prefer plain text search for raw strings, logs, config keys, secrets, and exact literals.
Do not use Codegraph as the only evidence for runtime behavior; pair it with tests or execution.
@@ -24,10 +24,11 @@ For PR, worktree, or sweeping review tasks, start with the compact reviewer hand
codegraph review --base HEAD --head WORKTREE --summary
```

Use `codegraph impact --base HEAD --head WORKTREE --pretty` when you need the broader blast-radius map. For unfamiliar repos without a diff, start bounded with `codegraph orient --root . --budget small --pretty`.
Use `codegraph impact --base HEAD --head WORKTREE --pretty` when you need the broader blast-radius map. For unfamiliar repos without a diff, start bounded with `codegraph explore "how does auth reach db?" --root . --pretty` when you have a concrete question, or with `codegraph orient --root . --budget small --pretty` when you do not.
Use `doctor` only when install, native-runtime, or artifact health is the task.
Then choose the smallest useful follow-up:

- explore: `codegraph explore "how does auth reach db?" --pretty`
- packet: `codegraph packet get <file|symbol|sql-object|handle> --pretty`
- search: `codegraph search "auth user" --json`
- explain: `codegraph explain <file|symbol|sql-object|handle>`
@@ -54,6 +55,7 @@ Hybrid search is code-first by default, and search/explain packets include analy

Current high-value surfaces:

- `explore --pretty`: one-call question answer with anchors, packets, paths, blast radius, candidate tests, and follow-ups
- `orient --pretty`: ranked first-turn focus targets with copyable follow-ups
- `impact --pretty`: ranked "what could this break?" map
- `review --summary`: compact reviewer handoff
@@ -65,7 +67,7 @@ Treat duplicate leads and call-compatibility hints as review leads, not proof.
## MCP

If MCP tools are available, prefer them over repeated CLI invocations.
Use MCP `orient`, `search`, `packet_get`, `goto`, `refs`, `deps`, `rdeps`, `path`, `impact`, `review`, and `query_sqlite` first.
Use MCP `explore`, `orient`, `search`, `packet_get`, `goto`, `refs`, `deps`, `rdeps`, `path`, `impact`, `review`, and `query_sqlite` first.
After edits, check MCP response `freshness`: `refreshed` means Codegraph rebuilt before answering, and `stale` includes a reason plus bounded changed-file metadata before indexed context is trusted.
Fall back to CLI when MCP is unavailable.
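
The freshness check above could be wrapped in a small guard before an agent reuses indexed context. The sketch below is illustrative only: the text names the `freshness` field and the `refreshed` and `stale` states, while the field names `status`, `reason`, `changedFileCount`, and `changedFiles`, and the helper `decideTrust`, are assumptions rather than the documented MCP response contract.

```ts
// Hypothetical response shape. Only `freshness`, "refreshed", and "stale"
// come from the text; every other name here is an assumption.
interface Freshness {
  status: string; // "refreshed" and "stale" are named; other values may exist
  reason?: string;
  changedFileCount?: number;
  changedFiles?: string[]; // assumed name for the bounded changed-file sample
}

interface McpResponseSketch<T> {
  freshness?: Freshness; // assumed to be absent on responses that are not index-backed
  result: T;
}

type TrustDecision = { trust: true } | { trust: false; why: string };

// Trust the indexed context unless the response is explicitly marked stale.
function decideTrust<T>(response: McpResponseSketch<T>): TrustDecision {
  const freshness = response.freshness;
  if (freshness === undefined || freshness.status !== "stale") {
    return { trust: true };
  }
  const count = freshness.changedFileCount ?? 0;
  const sample = (freshness.changedFiles ?? []).slice(0, 3).join(", ");
  const reason = freshness.reason ?? "no reason given";
  return { trust: false, why: `${reason} (${count} changed: ${sample})` };
}
```

A caller might rebuild or fall back to the CLI when `trust` is `false`, and surface `why` to the user instead of silently reusing old context.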

15 changes: 14 additions & 1 deletion docs/agent-workflows.md
@@ -21,6 +21,7 @@ codegraph impact --base HEAD --head WORKTREE --pretty
For an unfamiliar repo, keep the first loop bounded and actionable:

```bash
codegraph explore "how does auth reach db?" --root . --pretty
codegraph orient --root . --budget small --pretty
codegraph search "auth user" --json
codegraph explain <file-from-search-or-orient> --json
@@ -29,7 +30,7 @@ codegraph explain <file-from-search-or-orient> --json
For PR, worktree, or sweeping review tasks, prefer `review` first; use `impact` when you need the broader blast radius map instead of the reviewer handoff.

Use `doctor` only when package/runtime state or an existing artifact path is the question.
Use `search` when the agent has a query but no handle, `explain` when it already knows a file/symbol/SQL object/handle, and `inspect` for a human-readable architecture summary.
Use `explore` when the agent has a broad question and needs search anchors, packets, paths, blast radius, candidate tests, and follow-ups in one bounded response. Use `search` when it only needs anchors, `explain` when it already knows a file/symbol/SQL object/handle, and `inspect` for a human-readable architecture summary.
Use `artifact build` for durable handoff directories and `mcp serve` when repeated follow-up calls should share one warm repo session.

Choose output by the next consumer:
@@ -42,6 +43,18 @@ For durable repo-local scan scope, add `codegraph.config.json` at the project ro

For raw command flags and output contracts, see [docs/cli.md](./cli.md). For library types and wrappers, see [docs/library-api.md](./library-api.md).

## Explore facade

Start with `explore` when an agent can ask a concrete repo question:

```bash
codegraph explore "how does auth reach db?" --root . --pretty
codegraph explore src/auth.ts --json --limit 5 --max-packets 3
```

Explore orchestrates existing search, packet, path, reverse-dependency, and candidate-test surfaces. It returns `schemaVersion: 1`, the query, analysis metadata, summary bullets, anchors, bounded packets, dependency paths, blast radius, candidate tests, follow-ups, flat limits, and omission counts.
Use `--no-source` when the caller only needs anchors, paths, and follow-up commands.

## Orientation packets

Start with `orient` when an agent needs compact repo context without flooding the first prompt:
10 changes: 8 additions & 2 deletions docs/cli.md
@@ -14,7 +14,8 @@ Default workflow:

- code review: `codegraph review --base HEAD --head WORKTREE --summary`
- blast-radius follow-up: `codegraph impact --base HEAD --head WORKTREE --pretty`
- unfamiliar repo: `codegraph orient --root . --budget small --pretty`
- unfamiliar repo: `codegraph explore "how does auth reach db?" --root . --pretty`
- first-turn map: `codegraph orient --root . --budget small --pretty`
- targeted follow-up: `codegraph search "<query>" --json` then `codegraph explain <handle|file|symbol>`

## Runtime selection
@@ -128,6 +129,8 @@ codegraph index --workers --threads 8 --cache disk
# Search for agent-ready anchors across symbols, paths, chunks, SQL objects, and graph context
codegraph orient --root . --budget small --pretty
codegraph orient --root . ./src --budget medium --json
codegraph explore "how does auth reach db?" --root . --pretty
codegraph explore src/auth.ts --json
codegraph search "build review report" --json
codegraph explain src/review.ts --json
codegraph packet get src/cli.ts --pretty
@@ -240,6 +243,7 @@ Short JSON shape:

#### Agent orientation and packets

- Use `explore --pretty` for a one-call repo question that combines search anchors, bounded packets, dependency paths, reverse dependencies, candidate tests, limits, omissions, and follow-ups. Use `--limit`, `--max-packets`, `--max-paths`, or `--no-source` to keep output small.
- Use `orient --pretty` as the compact first-turn reading surface for people or models; it prints the ranked `focus` targets and their follow-up commands before the scope sketch.
- Use `orient --json` when follow-up tools need exact focus reasons, limits, and omitted counts. Orient suppresses index rebuild warnings so stdout stays parseable.
- Small orientation budgets default to `--health skip`. Medium and large default to `--health summary`, which counts cycles and unresolved imports while omitting duplicate health; use `--health full` when exhaustive duplicate counts matter.
@@ -248,6 +252,8 @@ Short JSON shape:

`search` is deterministic and vectorless. Hybrid search is code-first by default: source symbols and implementation files outrank docs unless `--mode text` is explicit or docs are the strongest remaining evidence. Search JSON now includes top-level `analysis` metadata plus per-result `provenance` so mixed or reduced runs stay visible. `explain` resolves file paths, symbol names, SQL object names, and search handles into bounded packets with symbols, graph context, references, snippets, duplicate context, SQL facts, review tasks, candidate tests, analysis metadata, limits, omissions, and follow-ups. Use `--max-duplicates` to tune duplicate context in `explain` and `packet get`; duplicate context also uses an internal pair budget and reports skipped duplicate work through omission counts.

`explore` is a facade over existing primitives, not a second search engine. It returns `schemaVersion: 1`, the original query, `analysis`, summary bullets, anchors, packets, dependency paths, blast radius, candidate tests, follow-ups, flat limits, and omission counts.

For SQL, prefer handles or schema-qualified names when basenames may be ambiguous. Reference and snippet omission counts are lower bounds after bounded navigation reaches its cap.

#### Artifact bundles
@@ -260,7 +266,7 @@ For SQL, prefer handles or schema-qualified names when basenames may be ambiguou

#### MCP server

- `mcp serve` exposes navigation, search, impact, review, SQLite query, session refresh, and artifact-build tools.
- `mcp serve` exposes explore, navigation, search, impact, review, SQLite query, session refresh, and artifact-build tools.
- MCP uses stdio by default or Streamable HTTP with `--port <number>`.
- Startup is lazy by default; `--warmup` builds the base session cache before serving requests, and `--warmup-symbols` also builds the detailed symbol graph.
- Index-backed responses include `freshness`; small file changes auto-refresh, while stale responses include a reason, total changed-file count, and a bounded changed-file sample.
16 changes: 14 additions & 2 deletions docs/library-api.md
@@ -115,7 +115,7 @@ Small orientation budgets default to `health: "skip"` and set health fields to `
`searchCodegraph()` builds a project snapshot and returns deterministic, agent-ready anchors across files, symbols, chunks, SQL objects, and optional graph neighborhoods. Hybrid search is code-first by default, so implementation files and symbols outrank docs unless `mode: "text"` is explicit or docs are the strongest remaining evidence. Identifier-like queries stay symbol-first. Pure `path` and `text` searches skip detailed symbol graph construction; hybrid, symbol, SQL, and graph searches keep symbol-aware ranking and neighbors. Handles are project-relative and explainable; result packets include top-level `analysis`, per-result `provenance`, `resultCount`, `totalCandidates`, `limits`, and `omittedCounts`.

```ts
import { buildCodegraphArtifact, explainCodegraphTarget, searchCodegraph } from "@lzehrung/codegraph";
import { buildCodegraphArtifact, explainCodegraphTarget, exploreCodegraph, searchCodegraph } from "@lzehrung/codegraph";

const response = await searchCodegraph({
root: process.cwd(),
@@ -126,8 +126,20 @@ const response = await searchCodegraph({

const first = response.results[0];
console.log(first?.handle, first?.rankReasons, first?.omittedCounts, first?.followUps);

const explored = await exploreCodegraph({
root: process.cwd(),
query: "how does auth reach db?",
limit: 5,
maxPackets: 3,
maxPaths: 3,
});

console.log(explored.summary, explored.paths, explored.followUps);
```

Use `exploreCodegraph()` when the caller has a broad question and needs one bounded response over the existing search, packet, path, reverse-dependency, and candidate-test surfaces. The response has `schemaVersion: 1`, the original query, `analysis`, summary bullets, anchors, packets, paths, blast radius, candidate tests, follow-ups, flat limits, and omission counts.
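
A caller that receives explore output as JSON, for example from `explore --json`, may want to check the schema version before reading fields. The guard below is a minimal sketch and not the package's exported type: `schemaVersion: 1` and the `summary`, `paths`, and `followUps` properties appear in this document, while the `query` property name and the assumption that those three fields are arrays are guesses.

```ts
// Hypothetical partial view of an explore response. The real exported type
// may differ; only the fields this sketch reads are modelled.
interface ExploreResponseSketch {
  schemaVersion: number;
  query: string; // assumed property name for "the original query"
  summary: unknown[]; // assumed to be an array of summary bullets
  paths: unknown[];
  followUps: unknown[];
}

// Narrow an unknown JSON value to the fields this sketch relies on.
function isExploreV1(value: unknown): value is ExploreResponseSketch {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    record["schemaVersion"] === 1 &&
    typeof record["query"] === "string" &&
    Array.isArray(record["summary"]) &&
    Array.isArray(record["paths"]) &&
    Array.isArray(record["followUps"])
  );
}

// Produce a one-line digest, or a fallback message for unknown shapes.
function summarizeExplore(json: string): string {
  const parsed: unknown = JSON.parse(json);
  if (!isExploreV1(parsed)) return "unrecognized explore response";
  return `${parsed.query}: ${parsed.summary.length} summary bullets, ${parsed.paths.length} paths, ${parsed.followUps.length} follow-ups`;
}
```

Checking `schemaVersion` first lets a host reject a future, incompatible shape instead of reading missing fields.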

Use `mode: "sql"` for SQL objects, or pass `from` plus `depth` with `mode: "graph"` to boost matches near a file path, file/chunk/graph handle, symbol handle, SQL handle, or symbol name.

`explainCodegraphTarget()` resolves a file path, symbol name, SQL object name, or search handle into a bounded packet for follow-up agent work. Explanations include the same top-level `analysis` label as search so reduced or mixed runs stay visible. SQL object names resolve by exact name first; unqualified basenames resolve only when unique. File and symbol explanations also include bounded medium-or-higher duplicate context that touches the target, with stable handles and conservative repair hints. SQL related objects include a `relation` such as `incoming:reads_from`, `outgoing:writes_to`, or `same_file`. With changed context enabled, the packet includes compact review tasks and candidate tests:
@@ -162,7 +174,7 @@ console.log(artifact.manifestPath, artifact.artifacts);

The `graph.json` artifact is self-describing (`schemaVersion: 1`, `format: "codegraph.graph-json"`) and uses project-relative file paths and portable symbol handles. `questions.json` uses the same stable handles for follow-up commands. With `force: true`, stale known Codegraph artifact files are removed before the selected outputs are written; unrelated files in the directory are preserved.

`createAgentSession()` keeps one in-process project snapshot warm for repeated orient, search, explain, packet, artifact, and MCP calls. It uses incremental indexing with disk cache by default, auto-enables native workers for large cold builds, and carries forward top-level analysis metadata from the build report.
`createAgentSession()` keeps one in-process project snapshot warm for repeated explore, orient, search, explain, packet, artifact, and MCP calls. It uses incremental indexing with disk cache by default, auto-enables native workers for large cold builds, and carries forward top-level analysis metadata from the build report.
Session callers can use `freshness: { policy: "check" | "auto" | "manual" }` plus `checkFreshness()` to detect file edits before reusing a warm snapshot. `check` reports stale state without invalidating, `auto` invalidates for bounded changes, and stale results include `changedFileCount`, `omittedChangedFileCount`, `reason`, and a bounded changed-file sample.
Set `buildOptions.useNativeWorkers` to `false` to opt out. Use `buildCodegraphArtifactWithSession()` when a host already has a session and wants SQLite, graph JSON, report, questions, and manifest outputs from the same snapshot. `createCodegraphMcpHandlers()` exposes the same primitives without starting stdio, which is useful for tests or host applications:
