diff --git a/.claude/skills/opencodehub-guide/SKILL.md b/.claude/skills/opencodehub-guide/SKILL.md
index d47419dc..dd420fa2 100644
--- a/.claude/skills/opencodehub-guide/SKILL.md
+++ b/.claude/skills/opencodehub-guide/SKILL.md
@@ -5,7 +5,7 @@ description: "Use when the user asks about OpenCodeHub itself — available MCP
 
 # OpenCodeHub Guide
 
-Quick reference for every OpenCodeHub MCP tool, MCP resource, and the DuckDB-backed graph schema.
+Quick reference for every OpenCodeHub MCP tool, MCP resource, and the graph + temporal store schema.
 
 ## Always Start Here
 
@@ -40,6 +40,7 @@ for the scope rationale.
 | Draft a PR description from the current diff | `codehub-pr-description` | "write the PR description", "summarize this branch" |
 | Write an onboarding guide with reading order | `codehub-onboarding` | "write ONBOARDING.md", "what should a new hire read first" |
 | Map inter-repo contracts for a group | `codehub-contract-map` | "map the contracts", "show the contract matrix for " |
+| Build a deterministic 9-item code-pack BOM | `codehub-code-pack` | "pack this repo for an LLM", "deterministic code pack", "pack the platform group" |
 | Draft an ADR (P1 — not yet shipped) | `codehub-adr` *(P1 backlog)* | — |
 
 Fire these directly; do not nest them inside analysis skills. Each is a
@@ -57,7 +58,7 @@ standalone artifact producer with its own preconditions and output path.
 | `mcp__opencodehub__impact` | Blast radius with risk tier + `confidenceBreakdown` |
 | `mcp__opencodehub__detect_changes` | Map an uncommitted or committed diff to affected symbols and flows |
 | `mcp__opencodehub__rename` | Graph-assisted multi-file rename; dry-run by default |
-| `mcp__opencodehub__sql` | Read-only DuckDB SQL against the graph (5 s timeout) |
+| `mcp__opencodehub__sql` | Read-only query: `sql` arg → temporal DuckDB (cochanges/summaries); `cypher` arg → lbug graph (5 s timeout) |
 | `mcp__opencodehub__signature` | Function signature lookup for a target symbol |
 
 ### HTTP / RPC surface
@@ -111,63 +112,97 @@ Lightweight reads for navigation (every URI uses the `codehub://` scheme):
 | `codehub://repo/{name}/context` | Stats + staleness envelope |
 | `codehub://repo/{name}/schema` | Live node kinds / relation types for `sql` |
 
-> Cluster and process navigation resources (`codehub://repo/{name}/clusters`, `codehub://repo/{name}/processes`, etc.) are slated for a later wave. Use `sql` against the `nodes` table filtered to `kind = 'Community'` or `kind = 'Process'` in the meantime.
+> Cluster and process navigation resources (`codehub://repo/{name}/clusters`, `codehub://repo/{name}/processes`, etc.) are slated for a later wave. Until then, use the typed tools or Cypher (below) filtered to `kind = 'Community'` / `kind = 'Process'`.
 
-## Graph schema
+## Where the graph lives (ADR 0016)
 
-The graph is a DuckDB-backed store. One unified `nodes` table, one `relations` table, an `embeddings` table, a `cochanges` side table, and `store_meta`.
+There are **two stores**, and they are queried differently:
 
-**Node kinds** (load-bearing order — new kinds are appended):
-File, Folder, Function, Class, Method, Interface, Constructor, Struct, Enum, Macro, Typedef, Union, Namespace, Trait, Impl, TypeAlias, Const, Static, Variable, Property, Record, Delegate, Annotation, Template, Module, CodeElement, Community, Process, Route, Tool.
+- **Graph tier — `graph.lbug`** (ladybug, Cypher dialect). Holds nodes, edges,
+  and embeddings. Query it via the typed tools (`query` / `context` / `impact` /
+  `route_map` / …) or, for bespoke questions, **Cypher** via the MCP `sql`
+  tool's `cypher` argument. There is NO `nodes` or `relations` SQL table.
+- **Temporal tier — `temporal.duckdb`** (DuckDB SQL). Holds only the
+  `cochanges` and `symbol_summaries` tables. The `sql` argument of the MCP
+  `sql` tool (and `codehub sql` on the CLI) targets THIS store.
 
-**Relation types** (append-only):
-CONTAINS, DEFINES, IMPORTS, CALLS, EXTENDS, IMPLEMENTS, HAS_METHOD, HAS_PROPERTY, ACCESSES, METHOD_OVERRIDES, OVERRIDES, METHOD_IMPLEMENTS, MEMBER_OF, PROCESS_STEP, HANDLES_ROUTE, FETCHES, HANDLES_TOOL, ENTRY_POINT_OF, WRAPS, QUERIES, REFERENCES, FOUND_IN, DEPENDS_ON, OWNED_BY.
+Pass exactly one of `sql` (temporal DuckDB) or `cypher` (lbug graph) to the MCP
+`sql` tool.
 
-Cochange edges live in a **separate `cochanges` table**, NOT in `relations`. Do not query `relations` for them.
+### Graph schema (lbug / Cypher)
 
-## SQL cheat-sheet (use `mcp__opencodehub__sql`)
+One node label `CodeNode` carrying `kind` as a **property** (NOT a per-kind
+label). One relationship table per relation type. Properties are **snake_case**
+(`file_path`, `start_line`, `inferred_label`, `step_count`, `entry_point_id`);
+a camelCase RETURN alias comes back as the alias you give it, but the stored
+property names are snake_case.
+
+**Node kinds** (`n.kind` values): File, Folder, Function, Class, Method,
+Interface, Constructor, Struct, Enum, Macro, Typedef, Union, Namespace, Trait,
+Impl, TypeAlias, Const, Static, Variable, Property, Record, Delegate,
+Annotation, Template, Module, CodeElement, Community, Process, Route, Tool,
+Finding, Dependency, Contributor, Repo, ProjectProfile, Section.
+
+**Relationship types** (each is its own edge label): CONTAINS, DEFINES, IMPORTS,
+CALLS, EXTENDS, IMPLEMENTS, HAS_METHOD, HAS_PROPERTY, ACCESSES, METHOD_OVERRIDES,
+OVERRIDES, METHOD_IMPLEMENTS, MEMBER_OF, PROCESS_STEP, HANDLES_ROUTE, FETCHES,
+HANDLES_TOOL, ENTRY_POINT_OF, WRAPS, QUERIES, REFERENCES, FOUND_IN, DEPENDS_ON,
+OWNED_BY.
+
+Cochanges live only in the **temporal** `cochanges` table (DuckDB SQL), never as
+graph edges.
+
+## Cypher cheat-sheet (MCP `sql` tool, `cypher` arg)
 
 All inbound callers of a function by name:
 
-```sql
-SELECT caller.name, caller.file_path, caller.start_line, r.confidence, r.reason
-FROM relations r
-JOIN nodes caller ON caller.id = r.from_id
-JOIN nodes callee ON callee.id = r.to_id
-WHERE r.type = 'CALLS'
-  AND callee.name = 'validateUser'
-  AND callee.kind = 'Function'
+```cypher
+MATCH (caller:CodeNode)-[r:CALLS]->(callee:CodeNode)
+WHERE callee.name = 'validateUser' AND callee.kind = 'Function'
+RETURN caller.name AS name, caller.file_path AS file, caller.start_line AS line,
+       r.confidence AS confidence, r.reason AS reason
 ORDER BY r.confidence DESC
-LIMIT 50;
+LIMIT 50
 ```
 
 Top communities by cohesion:
 
-```sql
-SELECT name, inferred_label, cohesion, symbol_count, keywords
-FROM nodes
-WHERE kind = 'Community'
-ORDER BY cohesion DESC
-LIMIT 20;
+```cypher
+MATCH (n:CodeNode)
+WHERE n.kind = 'Community'
+RETURN n.name AS name, n.inferred_label AS label, n.cohesion AS cohesion,
+       n.symbol_count AS symbols
+ORDER BY n.cohesion DESC
+LIMIT 20
 ```
 
 Process entry points:
 
-```sql
-SELECT n.name, n.inferred_label, n.step_count, entry.name AS entry_point
-FROM nodes n
-LEFT JOIN nodes entry ON entry.id = n.entry_point_id
+```cypher
+MATCH (n:CodeNode)
 WHERE n.kind = 'Process'
-ORDER BY n.step_count DESC;
+RETURN n.name AS name, n.inferred_label AS label, n.step_count AS steps,
+       n.entry_point_id AS entry_point
+ORDER BY n.step_count DESC
+```
+
+SCIP-confirmed CALLS edges only (strict impact):
+
+```cypher
+MATCH ()-[r:CALLS]->()
+WHERE r.confidence >= 0.95 AND r.reason STARTS WITH 'scip:'
+RETURN r
 ```
 
-SCIP-confirmed edges only (for strict impact queries):
+### Temporal SQL cheat-sheet (MCP `sql` tool, `sql` arg)
+
+Tightest co-change pairs (DuckDB SQL — temporal store):
 
 ```sql
-SELECT from_id, to_id, type, reason
-FROM relations
-WHERE confidence >= 0.95
-  AND reason LIKE 'scip:%';
+SELECT source_file, target_file, lift, cocommit_count
+FROM cochanges
+ORDER BY lift DESC
+LIMIT 20;
 ```
 
 ## Invariants agents must respect
diff --git a/AGENTS.md b/AGENTS.md
index b01398bf..bc86bf5b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -14,7 +14,7 @@ tiers.
 - `impact` — dependents of a target up to a configurable depth, with a risk tier.
 - `detect_changes` — map an uncommitted or committed diff to affected symbols.
 - `rename` — graph-assisted multi-file rename; dry-run is the default.
-- `sql` — read-only SQL against the local graph store with a 5 s timeout.
+- `sql` — read-only SQL against the local temporal store (the `cochanges` and `symbol_summaries` tables), 5 s timeout. The node/edge graph lives in `graph.lbug` (ADR 0016) and is reached via the typed tools (`query`/`context`/`impact`) or Cypher via the MCP `sql` tool's `cypher` arg — NOT via this SQL path.
 
 Run `codehub analyze` after pulling new commits so the index stays aligned with the working tree. `codehub status` reports staleness.
 
diff --git a/CLAUDE.md b/CLAUDE.md
index 73db824c..e781f027 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -11,7 +11,7 @@ tiers.
 - `impact` — dependents of a target up to a configurable depth, with a risk tier.
 - `detect_changes` — map an uncommitted or committed diff to affected symbols.
 - `rename` — graph-assisted multi-file rename; dry-run is the default.
-- `sql` — read-only SQL against the local graph store with a 5 s timeout.
+- `sql` — read-only SQL against the local temporal store (the `cochanges` and `symbol_summaries` tables), 5 s timeout. The node/edge graph lives in `graph.lbug` (ADR 0016) and is reached via the typed tools (`query`/`context`/`impact`) or Cypher via the MCP `sql` tool's `cypher` arg — NOT via this SQL path.
 
 Run `codehub analyze` after pulling new commits so the index stays aligned with the working tree. `codehub status` reports staleness.
 
diff --git a/packages/cli/README.md b/packages/cli/README.md
index 90094782..d02719bf 100644
--- a/packages/cli/README.md
+++ b/packages/cli/README.md
@@ -63,7 +63,7 @@ top-level subcommands by phase of the workflow.
 | `doctor` | Probe the local environment and print actionable hints |
 | `ci-init` | Emit GitHub Actions / GitLab CI workflow scaffolds |
 | `augment` | Fast-path BM25 enrichment for editor PreToolUse hooks |
-| `sql` | Read-only SQL against the graph store with a 5 s timeout |
+| `sql` | Read-only SQL against the temporal store (cochanges + symbol_summaries) |
 | `group ` | Cross-repo groups: `create`, `list`, `delete`, `status`, `query`, `sync` |
 
 ## Design
diff --git a/packages/cli/src/agent-context.ts b/packages/cli/src/agent-context.ts
index 20fe28cd..1db8fb95 100644
--- a/packages/cli/src/agent-context.ts
+++ b/packages/cli/src/agent-context.ts
@@ -38,7 +38,7 @@ tiers.
 - \`impact\` — dependents of a target up to a configurable depth, with a risk tier.
 - \`detect_changes\` — map an uncommitted or committed diff to affected symbols.
 - \`rename\` — graph-assisted multi-file rename; dry-run is the default.
-- \`sql\` — read-only SQL against the local graph store with a 5 s timeout.
+- \`sql\` — read-only SQL against the local temporal store (cochanges + symbol_summaries), 5 s timeout; the node/edge graph is queried via the typed tools or Cypher via the MCP \`sql\` tool.
 
 Run \`codehub analyze\` after pulling new commits so the index stays aligned with the working tree. \`codehub status\` reports staleness.
 
diff --git a/packages/cli/src/index.ts b/packages/cli/src/index.ts
index 26bc9c90..3f996d3e 100644
--- a/packages/cli/src/index.ts
+++ b/packages/cli/src/index.ts
@@ -771,7 +771,9 @@ program
 
 program
   .command("sql ")
-  .description("Run a read-only SQL query against the graph store")
+  .description(
+    "Run a read-only SQL query against the temporal store (cochanges + symbol_summaries); the node/edge graph is queried via the typed tools or Cypher",
+  )
   .option("--repo ", "Registered repo name (default: current directory)")
   .option("--timeout ", "Per-query timeout in ms", (v) => Number.parseInt(v, 10), 5_000)
   .option("--json", "Emit JSON on stdout")
diff --git a/packages/docs/src/content/docs/agents/tool-decision-matrix.mdx b/packages/docs/src/content/docs/agents/tool-decision-matrix.mdx
index 738b6f2a..61b6b09d 100644
--- a/packages/docs/src/content/docs/agents/tool-decision-matrix.mdx
+++ b/packages/docs/src/content/docs/agents/tool-decision-matrix.mdx
@@ -36,7 +36,7 @@ anti-pattern column says what _not_ to reach for first.
 | "What's the license tier of my deps?" | `license_audit` | Tiers each transitive dep: permissive / weak-copyleft / strong-copyleft / proprietary / unknown. | `license-checker` raw output. |
 | "Which areas are getting riskier?" | `risk_trends` | Per-community trend lines + 30-day projection from temporal data. | One-off risk snapshots. |
 | "Who is changing what most, and where" | `risk_trends` + `owners` | Trends point to communities; `owners` names the people. | Either alone. |
-| "Bespoke graph query I can't express above" | `sql` | Read-only SQL against the local graph store, 5s timeout. | When a typed tool covers it — typed tools return `next_steps`. |
+| "Bespoke temporal query (cochanges / summaries)" | `sql` | Read-only SQL against the temporal store (cochanges + symbol_summaries), 5s timeout. NOT the node/edge graph. | When a typed tool covers it, or when you need the graph (use Cypher via the MCP `sql` tool's `cypher` arg instead). |
 
 ## Cross-repo group intents
 
diff --git a/packages/docs/src/content/docs/reference/cli.md b/packages/docs/src/content/docs/reference/cli.md
index 0409a7ec..d89928de 100644
--- a/packages/docs/src/content/docs/reference/cli.md
+++ b/packages/docs/src/content/docs/reference/cli.md
@@ -380,7 +380,11 @@ codehub augment
 
 ## `sql`
 
-Read-only SQL against the graph store. 5-second timeout by default.
+Read-only SQL against the **temporal store** — the DuckDB-backed `cochanges` and
+`symbol_summaries` tables. 5-second timeout by default. The node/edge graph lives
+in `graph.lbug` (see ADR 0016) and is **not** reachable from this SQL path; query
+it via the typed tools (`query` / `context` / `impact`) or Cypher via the MCP `sql`
+tool.
 
 ```bash title="usage"
 codehub sql
diff --git a/packages/mcp/src/tools/dependencies.ts b/packages/mcp/src/tools/dependencies.ts
index f4ea7204..b5f42843 100644
--- a/packages/mcp/src/tools/dependencies.ts
+++ b/packages/mcp/src/tools/dependencies.ts
@@ -122,7 +122,7 @@ export async function runDependencies(
         ]
       : [
           "call `query` with one of the names above to find import sites",
-          "call `sql` with 'SELECT * FROM relations WHERE type = ''DEPENDS_ON''' for the raw edges",
+          "call `sql` with cypher 'MATCH ()-[r:DEPENDS_ON]->() RETURN r' for the raw edges",
         ];
 
   return withNextSteps(
diff --git a/packages/mcp/src/tools/list-findings.ts b/packages/mcp/src/tools/list-findings.ts
index a4bbe117..bd85ce08 100644
--- a/packages/mcp/src/tools/list-findings.ts
+++ b/packages/mcp/src/tools/list-findings.ts
@@ -149,7 +149,7 @@ export async function runListFindings(
         ]
       : [
           "call `context` with a finding's file path for caller/callee neighbours",
-          "call `sql` with 'SELECT * FROM relations WHERE type = ''FOUND_IN''' for raw edges",
+          "call `sql` with cypher 'MATCH ()-[r:FOUND_IN]->() RETURN r' for raw edges",
         ];
 
   return withNextSteps(
diff --git a/plugins/opencodehub/skills/opencodehub-guide/SKILL.md b/plugins/opencodehub/skills/opencodehub-guide/SKILL.md
index 466713ae..dd420fa2 100644
--- a/plugins/opencodehub/skills/opencodehub-guide/SKILL.md
+++ b/plugins/opencodehub/skills/opencodehub-guide/SKILL.md
@@ -5,7 +5,7 @@ description: "Use when the user asks about OpenCodeHub itself — available MCP
 
 # OpenCodeHub Guide
 
-Quick reference for every OpenCodeHub MCP tool, MCP resource, and the DuckDB-backed graph schema.
+Quick reference for every OpenCodeHub MCP tool, MCP resource, and the graph + temporal store schema.
 
 ## Always Start Here
 
@@ -58,7 +58,7 @@ standalone artifact producer with its own preconditions and output path.
 | `mcp__opencodehub__impact` | Blast radius with risk tier + `confidenceBreakdown` |
 | `mcp__opencodehub__detect_changes` | Map an uncommitted or committed diff to affected symbols and flows |
 | `mcp__opencodehub__rename` | Graph-assisted multi-file rename; dry-run by default |
-| `mcp__opencodehub__sql` | Read-only DuckDB SQL against the graph (5 s timeout) |
+| `mcp__opencodehub__sql` | Read-only query: `sql` arg → temporal DuckDB (cochanges/summaries); `cypher` arg → lbug graph (5 s timeout) |
 | `mcp__opencodehub__signature` | Function signature lookup for a target symbol |
 
 ### HTTP / RPC surface
@@ -112,63 +112,97 @@ Lightweight reads for navigation (every URI uses the `codehub://` scheme):
 | `codehub://repo/{name}/context` | Stats + staleness envelope |
 | `codehub://repo/{name}/schema` | Live node kinds / relation types for `sql` |
 
-> Cluster and process navigation resources (`codehub://repo/{name}/clusters`, `codehub://repo/{name}/processes`, etc.) are slated for a later wave. Use `sql` against the `nodes` table filtered to `kind = 'Community'` or `kind = 'Process'` in the meantime.
+> Cluster and process navigation resources (`codehub://repo/{name}/clusters`, `codehub://repo/{name}/processes`, etc.) are slated for a later wave. Until then, use the typed tools or Cypher (below) filtered to `kind = 'Community'` / `kind = 'Process'`.
 
-## Graph schema
+## Where the graph lives (ADR 0016)
 
-The graph is a DuckDB-backed store. One unified `nodes` table, one `relations` table, an `embeddings` table, a `cochanges` side table, and `store_meta`.
+There are **two stores**, and they are queried differently:
 
-**Node kinds** (load-bearing order — new kinds are appended):
-File, Folder, Function, Class, Method, Interface, Constructor, Struct, Enum, Macro, Typedef, Union, Namespace, Trait, Impl, TypeAlias, Const, Static, Variable, Property, Record, Delegate, Annotation, Template, Module, CodeElement, Community, Process, Route, Tool.
+- **Graph tier — `graph.lbug`** (ladybug, Cypher dialect). Holds nodes, edges,
+  and embeddings. Query it via the typed tools (`query` / `context` / `impact` /
+  `route_map` / …) or, for bespoke questions, **Cypher** via the MCP `sql`
+  tool's `cypher` argument. There is NO `nodes` or `relations` SQL table.
+- **Temporal tier — `temporal.duckdb`** (DuckDB SQL). Holds only the
+  `cochanges` and `symbol_summaries` tables. The `sql` argument of the MCP
+  `sql` tool (and `codehub sql` on the CLI) targets THIS store.
 
-**Relation types** (append-only):
-CONTAINS, DEFINES, IMPORTS, CALLS, EXTENDS, IMPLEMENTS, HAS_METHOD, HAS_PROPERTY, ACCESSES, METHOD_OVERRIDES, OVERRIDES, METHOD_IMPLEMENTS, MEMBER_OF, PROCESS_STEP, HANDLES_ROUTE, FETCHES, HANDLES_TOOL, ENTRY_POINT_OF, WRAPS, QUERIES, REFERENCES, FOUND_IN, DEPENDS_ON, OWNED_BY.
+Pass exactly one of `sql` (temporal DuckDB) or `cypher` (lbug graph) to the MCP
+`sql` tool.
 
-Cochange edges live in a **separate `cochanges` table**, NOT in `relations`. Do not query `relations` for them.
+### Graph schema (lbug / Cypher)
 
-## SQL cheat-sheet (use `mcp__opencodehub__sql`)
+One node label `CodeNode` carrying `kind` as a **property** (NOT a per-kind
+label). One relationship table per relation type. Properties are **snake_case**
+(`file_path`, `start_line`, `inferred_label`, `step_count`, `entry_point_id`);
+a camelCase RETURN alias comes back as the alias you give it, but the stored
+property names are snake_case.
+
+**Node kinds** (`n.kind` values): File, Folder, Function, Class, Method,
+Interface, Constructor, Struct, Enum, Macro, Typedef, Union, Namespace, Trait,
+Impl, TypeAlias, Const, Static, Variable, Property, Record, Delegate,
+Annotation, Template, Module, CodeElement, Community, Process, Route, Tool,
+Finding, Dependency, Contributor, Repo, ProjectProfile, Section.
+
+**Relationship types** (each is its own edge label): CONTAINS, DEFINES, IMPORTS,
+CALLS, EXTENDS, IMPLEMENTS, HAS_METHOD, HAS_PROPERTY, ACCESSES, METHOD_OVERRIDES,
+OVERRIDES, METHOD_IMPLEMENTS, MEMBER_OF, PROCESS_STEP, HANDLES_ROUTE, FETCHES,
+HANDLES_TOOL, ENTRY_POINT_OF, WRAPS, QUERIES, REFERENCES, FOUND_IN, DEPENDS_ON,
+OWNED_BY.
+
+Cochanges live only in the **temporal** `cochanges` table (DuckDB SQL), never as
+graph edges.
+
+## Cypher cheat-sheet (MCP `sql` tool, `cypher` arg)
 
 All inbound callers of a function by name:
 
-```sql
-SELECT caller.name, caller.file_path, caller.start_line, r.confidence, r.reason
-FROM relations r
-JOIN nodes caller ON caller.id = r.from_id
-JOIN nodes callee ON callee.id = r.to_id
-WHERE r.type = 'CALLS'
-  AND callee.name = 'validateUser'
-  AND callee.kind = 'Function'
+```cypher
+MATCH (caller:CodeNode)-[r:CALLS]->(callee:CodeNode)
+WHERE callee.name = 'validateUser' AND callee.kind = 'Function'
+RETURN caller.name AS name, caller.file_path AS file, caller.start_line AS line,
+       r.confidence AS confidence, r.reason AS reason
 ORDER BY r.confidence DESC
-LIMIT 50;
+LIMIT 50
 ```
 
 Top communities by cohesion:
 
-```sql
-SELECT name, inferred_label, cohesion, symbol_count, keywords
-FROM nodes
-WHERE kind = 'Community'
-ORDER BY cohesion DESC
-LIMIT 20;
+```cypher
+MATCH (n:CodeNode)
+WHERE n.kind = 'Community'
+RETURN n.name AS name, n.inferred_label AS label, n.cohesion AS cohesion,
+       n.symbol_count AS symbols
+ORDER BY n.cohesion DESC
+LIMIT 20
 ```
 
 Process entry points:
 
-```sql
-SELECT n.name, n.inferred_label, n.step_count, entry.name AS entry_point
-FROM nodes n
-LEFT JOIN nodes entry ON entry.id = n.entry_point_id
+```cypher
+MATCH (n:CodeNode)
 WHERE n.kind = 'Process'
-ORDER BY n.step_count DESC;
+RETURN n.name AS name, n.inferred_label AS label, n.step_count AS steps,
+       n.entry_point_id AS entry_point
+ORDER BY n.step_count DESC
+```
+
+SCIP-confirmed CALLS edges only (strict impact):
+
+```cypher
+MATCH ()-[r:CALLS]->()
+WHERE r.confidence >= 0.95 AND r.reason STARTS WITH 'scip:'
+RETURN r
 ```
 
-SCIP-confirmed edges only (for strict impact queries):
+### Temporal SQL cheat-sheet (MCP `sql` tool, `sql` arg)
+
+Tightest co-change pairs (DuckDB SQL — temporal store):
 
 ```sql
-SELECT from_id, to_id, type, reason
-FROM relations
-WHERE confidence >= 0.95
-  AND reason LIKE 'scip:%';
+SELECT source_file, target_file, lift, cocommit_count
+FROM cochanges
+ORDER BY lift DESC
+LIMIT 20;
 ```
 
 ## Invariants agents must respect
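The patch's central rule is that the MCP `sql` tool takes exactly one of `sql` (routed to `temporal.duckdb`) or `cypher` (routed to `graph.lbug`). The following is a minimal TypeScript sketch of that argument check. It is an illustration under assumptions, not the actual handler in `packages/mcp`: the `SqlToolArgs` shape, the `routeSqlToolArgs` name, and the `store` labels are hypothetical.

```typescript
// Hypothetical argument shape for the MCP `sql` tool (assumption, not the real schema).
interface SqlToolArgs {
  sql?: string; // DuckDB SQL against temporal.duckdb (cochanges, symbol_summaries)
  cypher?: string; // Cypher against graph.lbug (CodeNode + one edge table per relation type)
}

// Hypothetical result: which store a validated request should be sent to.
type SqlToolRoute =
  | { store: "temporal"; query: string }
  | { store: "graph"; query: string };

// Hypothetical helper: enforce "exactly one of `sql` or `cypher`" and pick the store.
function routeSqlToolArgs(args: SqlToolArgs): SqlToolRoute {
  const sql = args.sql?.trim() ?? "";
  const cypher = args.cypher?.trim() ?? "";
  if ((sql === "") === (cypher === "")) {
    // Both empty or both set: ambiguous, so refuse rather than guess a store.
    throw new Error("pass exactly one of `sql` (temporal DuckDB) or `cypher` (lbug graph)");
  }
  return sql !== "" ? { store: "temporal", query: sql } : { store: "graph", query: cypher };
}

// Example: the raw DEPENDS_ON edges now come from the graph tier, not a SQL table.
const route = routeSqlToolArgs({ cypher: "MATCH ()-[r:DEPENDS_ON]->() RETURN r" });
console.log(route.store); // prints: graph
```

Rejecting both-or-neither up front keeps the target store explicit: a query passed as `sql` always goes to the temporal DuckDB file, and one passed as `cypher` always goes to the lbug graph.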