diff --git a/.agents/rules/verify-after-each-step.md b/.agents/rules/verify-after-each-step.md index 0807aec1..b19c48e1 100644 --- a/.agents/rules/verify-after-each-step.md +++ b/.agents/rules/verify-after-each-step.md @@ -19,12 +19,14 @@ AI agents tend to chain many edits across files and only discover breakage at co ## Current Per-File Checks (from `lint-staged.config.js`) -| File pattern | Checks | -| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `*.{js,jsx,ts,tsx,mjs,mts,cjs,cts}` | `bun run format:check`, `bun run lint` | -| `*.{css,json,md,mdc,html,yaml,yml}` | `bun run format:check` | -| `*.{ts,tsx}` | `bun run typecheck` with a temporary `tsconfig.lint-staged.json` that includes only **staged files under `src/`** (project-wide types still interconnect — use `bun run typecheck` if you need full-project certainty) | -| `*.test.ts` | `bun test` (on changed test files) | +| File pattern | Checks | +| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `*.{js,jsx,ts,tsx,mjs,mts,cjs,cts}` | `bun run format:check`, `bun run lint` | +| `*.{css,json,md,mdc,html,yaml,yml}` | `bun run format:check` | +| `*.{ts,tsx}` | `bun run typecheck` with a temporary `tsconfig.lint-staged.json` that includes only **staged files under `src/`** (project-wide types still interconnect — use `bun run typecheck` if you need full-project certainty); **`bun test`** on co-located `*.test.{ts,tsx}` pairs when a staged `src/` source file is present but its test 
file is not | +| `*.test.ts` | `bun test` (on changed test files) | +| `*.test.tsx` | `bun test` (on changed test files) | +| `scripts/**/*.test.mjs` | `bun test` (on changed test files) | ## What Counts as a Step @@ -41,6 +43,6 @@ A "step" is any self-contained unit of work where you've finished editing and ar 1. **Verify after every step** — Run the matching checks on every file you touched during that step before moving to the next one. 2. **Fix before moving on** — If any check fails, fix it immediately while context is fresh. Never carry forward known failures. 3. **Use the right scope** — Run `bun run lint` and `bun run format:check` on specific files when possible. Prefer `bun run typecheck` project-wide when types may depend on unstaged files. -4. **Run affected tests** — If you modified or created `*.test.ts` files, run `bun test ` on them. +4. **Run affected tests** — If you modified or created `*.test.ts` / `*.test.tsx` / `scripts/**/*.test.mjs` files, run `bun test ` on them; if you changed a `src/` source file with a co-located test, run that pair even when the test file itself is unstaged. 5. **Re-index before querying Codemap** — If you changed indexed source and plan to run SQL against the structural index next, run `bun src/index.ts --files ` with paths **relative to the indexed project root** (set `CODEMAP_TEST_BENCH` / `CODEMAP_ROOT` or `--root` so that root is correct — see [docs/benchmark.md § Indexing another project](../../docs/benchmark.md#indexing-another-project)). 6. **Don't duplicate the hook's job** — You don't need to re-verify at commit time; the pre-commit hook (`lint-staged`) handles that automatically when AI/agent env vars trigger it. Your job is to stay green _between_ commits. 
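Note on rule 4 above: the co-located pairing it describes (a touched `src/` source file pulls in its `*.test.ts` / `*.test.tsx` sibling unless that test file was touched too) can be sketched roughly as follows. This is a minimal illustration, not the hook's actual code: the `testsToRun` name and the injected `exists` callback are assumptions made so the example runs without touching the filesystem, and the file names are invented.

```javascript
// Sketch only: helper names are illustrative, and `exists` is injected
// (hypothetical) so the example does not depend on real files.

/** Map a source path to its co-located test: foo.ts -> foo.test.ts, foo.tsx -> foo.test.tsx. */
function pairedTestPath(relPath) {
  return relPath.replace(/\.tsx$/, ".test.tsx").replace(/\.ts$/, ".test.ts");
}

/** True for `src/` TypeScript sources that are not themselves test files. */
function isSourceTsFile(relPath) {
  return (
    relPath.startsWith("src/") &&
    /\.tsx?$/.test(relPath) &&
    !/\.test\.tsx?$/.test(relPath)
  );
}

/**
 * Given the touched files, return the paired tests that still need an
 * explicit run: tests that exist but were not touched themselves.
 */
function testsToRun(touched, exists) {
  const touchedSet = new Set(touched);
  const paired = touched
    .filter(isSourceTsFile)
    .map(pairedTestPath)
    .filter((t) => exists(t) && !touchedSet.has(t));
  return [...new Set(paired)];
}

// Assumed on-disk layout for the demonstration (invented file names).
const onDisk = new Set(["src/hash.test.ts", "src/ui/Button.test.tsx"]);
const result = testsToRun(
  ["src/hash.ts", "src/ui/Button.tsx", "src/db.ts", "src/ui/Button.test.tsx"],
  (p) => onDisk.has(p),
);
console.log(result.join(" "));
```

In this invented layout only `src/hash.test.ts` needs an explicit run: the `Button` test was touched directly, so the `*.test.tsx` check already covers it, and `src/db.ts` has no co-located test.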
diff --git a/.changeset/tiered-lookup-fast-paths.md b/.changeset/tiered-lookup-fast-paths.md new file mode 100644 index 00000000..827c9c59 --- /dev/null +++ b/.changeset/tiered-lookup-fast-paths.md @@ -0,0 +1,5 @@ +--- +"@stainless-code/codemap": patch +--- + +`show` and `snippet` now use fast equality lookup for exact `name` and lone `name:Token` queries (no wildcards); substring, multi-field, and FTS paths stay on the broader slow tier. CLI help, MCP tool descriptions, and bundled agent guidance document the two tiers. diff --git a/docs/architecture.md b/docs/architecture.md index adac4929..5ea9cb29 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -190,7 +190,7 @@ Three **mutually exclusive** CLI entry shapes; all converge on `applyDiffPayload **Backlog (not rejected):** `organize-imports` diff-shape recipe; `codemap-to-tsmorph` Path B adapter — tracked in [roadmap.md § Backlog](./roadmap.md#backlog). **Tracked elsewhere:** C.9 entry-point integration — [`plans/c9-plugin-layer.md`](./plans/c9-plugin-layer.md). -**Show / snippet wiring:** **`src/cli/show-snippet-args.ts`** (shared argv parser) + **`src/cli/show-snippet-render.ts`** (shared terminal/JSON error helpers) + **`src/cli/cmd-show.ts`** + **`src/cli/cmd-snippet.ts`** — sibling CLI verbs sharing the same parser shape (`` or **`--query ''`** + **`--with-fts`** + `--kind` + `--in ` + `--json`; show adds **`--print-sql`**) and the pure engines **`src/application/show-engine.ts`** (exact lookup + envelope builders), **`src/application/search-query-parser.ts`** + **`src/application/search-engine.ts`** (field-qualified search → parameterized SQL on `symbols`, optional `source_fts` join), and **`src/application/show-search-mode.ts`** (shared parse/normalize + FTS resolution + **`executeShowLookup`** + **`formatShowSearchSqlForQuery`** for CLI/MCP/HTTP). Exact lookup: `findSymbolsByName({db, name, kind?, inPath?})`. Query lookup: `searchSymbols({db, parsed, withFts?})`. 
Snippet FS read: `readSymbolSource({match, projectRoot, indexedContentHash?})` + `getIndexedContentHash(db, filePath)`. **`buildShowResult`** + **`buildSnippetResult`** envelope builders — same engines the MCP show/snippet tools call. Both verbs return the same `{matches, disambiguation?, warning?}` envelope — single match → `{matches: [{...}]}`; multi-match adds `{n, by_kind, files, hint}`; optional **`warning`** when FTS was requested but `source_fts` is empty. Snippet matches add `source` / `stale` / `missing` fields (additive — no shape divergence). **`--in `** and **`path:`** inside **`--query`** normalize through `toProjectRelative(projectRoot, p)` (from **`src/application/validate-engine.ts`**). Stale-file behavior on `snippet`: `hashContent` (from **`src/hash.ts`**) compares on-disk content against `files.content_hash`; mismatch sets `stale: true` but source IS still returned. MCP tools `show` and `snippet` register parallel to the CLI surface (see [§ MCP wiring](#cli-usage)). +**Show / snippet wiring:** **`src/cli/show-snippet-args.ts`** (shared argv parser) + **`src/cli/show-snippet-render.ts`** (shared terminal/JSON error helpers) + **`src/cli/cmd-show.ts`** + **`src/cli/cmd-snippet.ts`** — sibling CLI verbs sharing the same parser shape (`` or **`--query ''`** + **`--with-fts`** + `--kind` + `--in ` + `--json`; show adds **`--print-sql`**) and the pure engines **`src/application/show-engine.ts`** (exact lookup + envelope builders), **`src/application/search-query-parser.ts`** + **`src/application/search-engine.ts`** (field-qualified search → parameterized SQL on `symbols`, optional `source_fts` join), and **`src/application/show-search-mode.ts`** (shared parse/normalize + FTS resolution + tiered routing via **`resolveExactNameFromParsedQuery`** / **`isExactNamePattern`** + **`executeShowLookup`** + **`formatShowSearchSqlForQuery`** for CLI/MCP/HTTP). 
**Fast tier:** positional `` or lone `name:Token` without `%`/`_` wildcards (no `kind`/`path`/`in`/free text) → `findSymbolsByName` (`name = ?`, **`idx_symbols_name_covering`**). **Slow tier:** `name LIKE` substring, multi-field query, or FTS free-text → `searchSymbols`. Exact lookup with filters: `findSymbolsByName({db, name, kind?, inPath?})`. Snippet FS read: `readSymbolSource({match, projectRoot, indexedContentHash?})` + `getIndexedContentHash(db, filePath)`. **`buildShowResult`** + **`buildSnippetResult`** envelope builders — same engines the MCP show/snippet tools call. Both verbs return the same `{matches, disambiguation?, warning?}` envelope — single match → `{matches: [{...}]}`; multi-match adds `{n, by_kind, files, hint}`; optional **`warning`** when FTS was requested but `source_fts` is empty. Snippet matches add `source` / `stale` / `missing` fields (additive — no shape divergence). **`--in `** and **`path:`** inside **`--query`** normalize through `toProjectRelative(projectRoot, p)` (from **`src/application/validate-engine.ts`**). Stale-file behavior on `snippet`: `hashContent` (from **`src/hash.ts`**) compares on-disk content against `files.content_hash`; mismatch sets `stale: true` but source IS still returned. MCP tools `show` and `snippet` register parallel to the CLI surface (see [§ MCP wiring](#cli-usage)). **Evidence columns (high-judgment recipes):** Some bundled recipes add optional **`reason`** and **`evidence_json`** TEXT columns on each result row — factual detection path for agents, not pass/fail verdicts. Contract: [golden-queries.md § Evidence columns](./golden-queries.md#evidence-columns-high-judgment-recipes). 
@@ -1053,16 +1053,17 @@ A covering index includes all columns needed by a query, so SQLite never touches Key covering indexes: -| Index | Columns | Covers | -| ------------------------ | --------------------------------------------------------------------- | -------------------------- | -| `idx_symbols_name` | `name, kind, file_path, line_start, line_end, signature, is_exported` | Symbol lookup by name | -| `idx_imports_source` | `source, file_path` | "Who imports X?" queries | -| `idx_imports_resolved` | `resolved_path, file_path` | Resolved path lookups | -| `idx_exports_name` | `name, file_path, kind, is_default` | Export lookup by name | -| `idx_components_name` | `name, file_path, props_type, hooks_used` | Component search by name | -| `idx_components_file` | `file_path, name` | Components in a directory | -| `idx_dependencies_to` | `to_path, from_path` | Reverse dependency lookups | -| `idx_markers_kind` | `kind, file_path, line_number, content` | Marker listing by kind | -| `idx_css_variables_name` | `name, value, scope, file_path` | CSS token lookup by name | -| `idx_css_classes_name` | `name, file_path, is_module` | CSS class lookup | -| `idx_css_keyframes_name` | `name, file_path` | Keyframe lookup | +| Index | Columns | Covers | +| --------------------------- | ---------------------------------------------------------------------------------------------- | ---------------------------------------------------- | +| `idx_symbols_name` | `name, kind, file_path, line_start, line_end, signature, is_exported` | Symbol lookup by name | +| `idx_symbols_name_covering` | `name, kind, file_path, line_start, line_end, signature, is_exported, parent_name, visibility` | Full `findSymbolsByName` SELECT without table lookup | +| `idx_imports_source` | `source, file_path` | "Who imports X?" 
queries | +| `idx_imports_resolved` | `resolved_path, file_path` | Resolved path lookups | +| `idx_exports_name` | `name, file_path, kind, is_default` | Export lookup by name | +| `idx_components_name` | `name, file_path, props_type, hooks_used` | Component search by name | +| `idx_components_file` | `file_path, name` | Components in a directory | +| `idx_dependencies_to` | `to_path, from_path` | Reverse dependency lookups | +| `idx_markers_kind` | `kind, file_path, line_number, content` | Marker listing by kind | +| `idx_css_variables_name` | `name, value, scope, file_path` | CSS token lookup by name | +| `idx_css_classes_name` | `name, file_path, is_module` | CSS class lookup | +| `idx_css_keyframes_name` | `name, file_path` | Keyframe lookup | diff --git a/docs/glossary.md b/docs/glossary.md index 8687c6b4..b5b2c13c 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -194,7 +194,7 @@ TS shape for the JSON emitted by `codemap context`. Stable contract; agents can ### covering index -A SQLite index that includes every column needed by a query, so SQLite reads everything from the index B-tree without touching the main table. The query plan shows `USING COVERING INDEX`. Used heavily for AI agent query patterns — see [architecture § Covering indexes](./architecture.md#covering-indexes). +A SQLite index that includes every column needed by a query, so SQLite reads everything from the index B-tree without touching the main table. The query plan shows `USING COVERING INDEX`. Used heavily for AI agent query patterns — e.g. `idx_symbols_name_covering` for `codemap show` equality lookups. See [architecture § Covering indexes](./architecture.md#covering-indexes). ### `css_classes` (table) @@ -543,11 +543,11 @@ Long-running transport shutdown rules in `src/application/session-lifecycle.ts`. ### show -`codemap show ` — one-step lookup that returns metadata (`file_path:line_start-line_end` + `signature` + `kind`) for symbol(s). 
**Exact mode:** `` is case-sensitive; flags `--kind`, `--in`. **Field-qualified mode:** `--query 'kind:… name:… path:… in:…'` with optional free text; `--with-fts` (or `fts5: true` when indexed) searches file bodies via `source_fts` and returns every symbol in matching files; `--print-sql` prints Moat-A equivalent SQL. Output: `{matches, disambiguation?, warning?}` (`warning` when FTS was requested but `source_fts` is empty). MCP: `show` with `{name}` or `{query, with_fts?}`. Distinct from **snippet** (adds source text) and from hand-composed `query` SQL. See [`architecture.md` § Show / snippet wiring](./architecture.md#cli-usage). +`codemap show ` — one-step lookup that returns metadata (`file_path:line_start-line_end` + `signature` + `kind`) for symbol(s). **Fast tier (equality index):** positional `` or lone `name:Token` with no `%`/`_` wildcards and no other query fields — same rows as exact ``. **Slow tier:** `name:%pat%` substring LIKE, multi-field `--query`, or free text (`name LIKE` / `source_fts` with `--with-fts` or `fts5: true`). **Exact mode flags:** `--kind`, `--in` (positional only). `--print-sql` prints Moat-A equivalent SQL. Output: `{matches, disambiguation?, warning?}` (`warning` when FTS was requested but `source_fts` is empty). MCP: `show` with `{name}` or `{query, with_fts?}`. Distinct from **snippet** (adds source text) and from hand-composed `query` SQL. See [`architecture.md` § Show / snippet wiring](./architecture.md#cli-usage). ### snippet -`codemap snippet ` — same lookup modes as **show** (`` + `--kind` / `--in`, or `--query` + `--with-fts`), but each match also carries `source` (file lines from disk at `line_start..line_end`), `stale` (true when content_hash drifted since last index), and `missing` (true when file is gone). Envelope: `{matches, disambiguation?, warning?}` with the same additive fields on each row. 
Stale-file behavior: `source` is always returned when the file exists; `stale: true` is metadata the agent reads (no auto-reindex). MCP: `snippet` with `{name}` or `{query, with_fts?}`. See [`architecture.md` § Show / snippet wiring](./architecture.md#cli-usage). +`codemap snippet ` — same fast/slow lookup tiers as **show** (`` + `--kind` / `--in`, or `--query` + `--with-fts`), but each match also carries `source` (file lines from disk at `line_start..line_end`), `stale` (true when content_hash drifted since last index), and `missing` (true when file is gone). Envelope: `{matches, disambiguation?, warning?}` with the same additive fields on each row. Stale-file behavior: `source` is always returned when the file exists; `stale: true` is metadata the agent reads (no auto-reindex). MCP: `snippet` with `{name}` or `{query, with_fts?}`. See [`architecture.md` § Show / snippet wiring](./architecture.md#cli-usage). ### `suppressions` (table) / `// codemap-ignore-next-line` / `// codemap-ignore-file` diff --git a/docs/plans/tiered-lookup-fast-paths.md b/docs/plans/tiered-lookup-fast-paths.md deleted file mode 100644 index 974473ee..00000000 --- a/docs/plans/tiered-lookup-fast-paths.md +++ /dev/null @@ -1,109 +0,0 @@ -# Tiered lookup fast paths — plan - -> **Status:** open · **Priority:** P2 (agent session quality) · **Effort:** S–M (~3–5 days) -> -> **Motivator:** Every agent `show` / `find-symbol-definitions` lookup should hit the **`name = ?`** index path first. Today `findSymbolsByName` already uses equality, but the covering index omits columns the SELECT returns; query-mode `name:Foo` still routes through `name LIKE '%Foo%'`; MCP tool descriptions document no fast-vs-slow tier. Roadmap item is unchecked at [`roadmap.md:80`](../roadmap.md#recipe--audit-enrichment). 
-> -> **Roadmap:** [§ Recipe & audit enrichment — Tiered lookup fast paths](../roadmap.md#recipe--audit-enrichment) - ---- - -## Agent start here - -Read [`show-search-mode.ts`](../../src/application/show-search-mode.ts) routing first, then [`show-engine.ts`](../../src/application/show-engine.ts) and [`db.ts` `createIndexes`](../../src/db.ts). **No new MCP tools** — optimize existing `show` / `snippet` / `query_recipe` paths and document tiers in tool descriptions. - -### Current behavior (fact-checked) - -| Path | Entry | SQL shape | Index use | -| ------------------- | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | -| **Exact** | CLI/MCP `name` arg (no `query`) | `WHERE name = ?` ([`show-engine.ts:52`](../../src/application/show-engine.ts)) | `idx_symbols_name` leading column `name` ([`db.ts:604`](../../src/db.ts)) | -| **Query `name:`** | `--query 'name:Foo'` | `name LIKE '%Foo%' ESCAPE '\'` ([`search-engine.ts:48-50`](../../src/application/search-engine.ts)) | Leading `name` index **not** used for substring | -| **Query free-text** | `--query 'Auth'` | FTS `source_fts MATCH` when enabled, else `name LIKE '%Auth%'` ([`search-engine.ts:33-37,53-57`](../../src/application/search-engine.ts)) | Slow tier by design | -| **Recipe** | `find-symbol-definitions` | `WHERE name = ?` ([`templates/recipes/find-symbol-definitions.sql:4`](../../templates/recipes/find-symbol-definitions.sql)) | Same as exact show | - -**Covering-index gap:** `findSymbolsByName` SELECT includes `parent_name`, `visibility` ([`show-engine.ts:75-76`](../../src/application/show-engine.ts)); `idx_symbols_name` stops at `is_exported` ([`db.ts:604`](../../src/db.ts)) — SQLite may still table-lookup for those two columns. 
- -**No tiered routing code exists** in `show-engine.ts`, `search-engine.ts`, or `show-search-mode.ts` (grep: no `fast path` / `tiered`). - -### Key touchpoints - -| File | Role | -| ------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------- | -| [`src/application/show-engine.ts`](../../src/application/show-engine.ts) | `findSymbolsByName` | -| [`src/application/search-engine.ts`](../../src/application/search-engine.ts) | `buildSymbolSearchSql`, `searchSymbols` | -| [`src/application/show-search-mode.ts`](../../src/application/show-search-mode.ts) | `resolveShowLookupMode`, `executeShowLookup`, `resolveSearchWithFts` | -| [`src/application/tool-handlers.ts`](../../src/application/tool-handlers.ts) | `handleShow`, `handleSnippet` | -| [`src/cli/cmd-show.ts`](../../src/cli/cmd-show.ts) | CLI twin | -| [`src/application/resource-handlers.ts`](../../src/application/resource-handlers.ts) | `codemap://symbols/{name}` uses `findSymbolsByName` directly | -| [`src/db.ts`](../../src/db.ts) | `createIndexes` — add/replace covering index | -| [`src/application/mcp-server.ts`](../../src/application/mcp-server.ts) | `registerShowTool`, `registerQueryRecipeTool` descriptions (no latency text today) | -| [`templates/recipes/find-symbol-definitions.sql`](../../templates/recipes/find-symbol-definitions.sql) | Exact-name recipe | -| **Tests** | `show-engine.test.ts`, `search-engine.test.ts`, `show-search-mode.test.ts`, `cmd-show.test.ts`, `tool-handlers.test.ts`, `mcp-server.test.ts` | - ---- - -## Pre-locked decisions - -| # | Decision | Source | -| --- | 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------- | -| L.1 | **Fast tier** = equality on `symbols.name` via `findSymbolsByName` or recipe `name = ?`. **Slow tier** = `name LIKE` substring, FTS, or multi-field query. | Roadmap wording; existing exact path | -| L.2 | **Query fast-path:** when `parseSearchQuery` yields exactly one `namePatterns[]` entry, **no** `freeText`, **no** `kind`/`path`/`inGlob`, and the pattern contains **no** unescaped `%` or `_`, route to `findSymbolsByName({ name: pattern })` instead of `searchSymbols`. Case-sensitive — same as exact `show` ([`show-engine.ts:26-27`](../../src/application/show-engine.ts)). | Converts `show --query 'name:Foo'` to index-friendly `=` | -| L.3 | **Covering index fix:** add additive index `idx_symbols_name_covering ON symbols(name, kind, file_path, line_start, line_end, signature, is_exported, parent_name, visibility)` via `CREATE INDEX IF NOT EXISTS`. **Do not bump `SCHEMA_VERSION`** — additive index only ([`architecture.md` § Schema Versioning](../architecture.md#schema-versioning)). Keep existing `idx_symbols_name` (other queries may depend on it) unless EXPLAIN shows redundancy; if dropped, do so in a separate commit with benchmark note. 
| Close parent_name/visibility gap | -| L.4 | **MCP/CLI descriptions:** document tiers in prose — “exact `name` or `name:Token` without wildcards uses equality index; substring / FTS / multi-field scans are broader.” **No invented millisecond budgets** unless measured in `docs/benchmark.md` first. | Roadmap “document latency expectations” | -| L.5 | **FTS remains explicit slow tier** — never auto-enable for `show` exact path. | [`resolveSearchWithFts`](../../src/application/show-search-mode.ts) unchanged for free-text | -| L.6 | **No change** to `query_recipe` SQL execution engine — only `find-symbol-definitions` docs cross-ref fast tier. | Moat A — recipes stay SQL | - ---- - -## Implementation slices - -| Slice | Work | Ship gate | -| ----- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------- | -| **1** | `isExactNamePattern(pattern)` helper + fast-path branch in `executeShowLookup` / `buildSymbolSearchSql` caller; tests proving `name:Foo` uses `name = ?` SQL | Tracer bullet | -| **2** | Add `idx_symbols_name_covering` in `createIndexes`; `db.test.ts` asserts index exists; optional EXPLAIN test in `show-engine.test.ts` | After Slice 1 | -| **3** | Update MCP `show`/`snippet` descriptions ([`mcp-server.ts:341+`](../../src/application/mcp-server.ts)), `printShowCmdHelp`, [`architecture.md` § Show wiring](../architecture.md), [`glossary.md` § covering index](../glossary.md) row for new index | Same PR or docs follow-up | - -### Tracer bullet (Slice 1) - -1. In `show-search-mode.ts`, before `searchSymbols`, detect L.2 shape → call `findSymbolsByName`. -2. Add tests in `show-search-mode.test.ts`: `name:Exact` → same rows as exact `name: Exact`; `name:%foo%` stays LIKE. -3. 
Run `bun test src/application/show-search-mode.test.ts src/application/show-engine.test.ts`. - -### Out of scope - -- New covering indexes for FTS / `source_fts` (separate backlog) -- `show` fuzzy / case-insensitive match (would break equality semantics) -- Caching query plans in MCP server process (perf-triangulation deferral **6.1**) -- Changing `find-symbol-by-kind` or other recipes to parametrised fast paths - ---- - -## Acceptance - -- [ ] `show --query 'name:MySymbol'` generates SQL with `name = ?`, not `LIKE` (test or `--print-sql`) -- [ ] `show MySymbol` and `show --query 'name:MySymbol'` return identical rows for a fixture symbol -- [ ] `show --query 'name:%Sym%'` still uses `LIKE` (slow tier) -- [ ] `idx_symbols_name_covering` exists after index boot (`db.test.ts`) -- [ ] MCP `show` tool description mentions fast (equality) vs slow (substring/FTS) tiers without numeric latency claims -- [ ] `bun test src/application/show-search-mode.test.ts src/application/show-engine.test.ts src/application/search-engine.test.ts` - ---- - -## Verification - -```bash -bun test src/application/show-search-mode.test.ts src/application/show-engine.test.ts src/db.test.ts -bun src/index.ts show --query 'name:' --print-sql # expect name = ? -bun src/index.ts show --json -bun src/index.ts show --query 'name:' --json # row parity with above -``` - -Replace `` with a symbol known to exist in the indexed project (e.g. `hashContent` in this repo). 
- ---- - -## Dependencies - -- Shipped: `idx_symbols_name`, field-qualified `show --query`, optional FTS (`fts5` config) -- Synergy with: [`fts-default-on-evaluation.md`](./fts-default-on-evaluation.md) (FTS stays slow tier even when default-on) -- Independent of: codebase map bootstrap, C.9 entry points diff --git a/docs/roadmap.md b/docs/roadmap.md index 99b55a33..20cddef3 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -77,7 +77,7 @@ Long-running MCP / HTTP sessions dominate agent workflows; one-shot CLI keeps th Predicate-as-API only — enrich row shape and audit deltas; no standalone pass/fail verdict primitive ([Moat A](./roadmap.md#moats-load-bearing)). -- [ ] **Tiered lookup fast paths** — `show` / exact-name recipe paths hit covering indexes first; document latency expectations in MCP tool descriptions. FTS and broad scans remain explicit fallbacks. Plan: [`plans/tiered-lookup-fast-paths.md`](./plans/tiered-lookup-fast-paths.md). Effort: S–M. +- [x] **Tiered lookup fast paths** — `show` / `name:Token` (no wildcards) route to equality index (`idx_symbols_name_covering`); substring / FTS / multi-field stay slow tier. MCP + CLI help document tiers. See [architecture.md § CLI usage](./architecture.md#cli-usage). ### Distribution & evaluation depth diff --git a/lint-staged.config.js b/lint-staged.config.js index 8ea59455..21e62f57 100644 --- a/lint-staged.config.js +++ b/lint-staged.config.js @@ -25,11 +25,50 @@ function typecheckStagedFiles(filenames) { return `bun run typecheck -p ${TEMP_TSCONFIG}`; } +function toPosixRel(file) { + return path.relative(process.cwd(), file).replace(/\\/g, "/"); +} + +/** Staged `src/` source `.ts` / `.tsx` (not `*.test.*`). */ +function isSourceTsFile(file) { + const rel = toPosixRel(file); + if (!rel.startsWith("src/")) return false; + if (/\.test\.tsx?$/.test(rel)) return false; + return /\.tsx?$/.test(rel); +} + +/** Co-located pair: `foo.ts` → `foo.test.ts`, `foo.tsx` → `foo.test.tsx`. 
*/ +function pairedTestPath(file) { + const rel = toPosixRel(file); + return rel.replace(/\.tsx$/, ".test.tsx").replace(/\.ts$/, ".test.ts"); +} + +/** + * Run paired unit tests when a source file is staged without its test file. + * Skips pairs already in the staged set (the `*.test.{ts,tsx}` globs run those). + */ +function relatedTests(filenames) { + const staged = new Set(filenames.map(toPosixRel)); + const tests = [ + ...new Set( + filenames + .filter(isSourceTsFile) + .map(pairedTestPath) + .filter((t) => fs.existsSync(t) && !staged.has(t)), + ), + ]; + if (tests.length === 0) { + return "true"; + } + return `bun test ${tests.join(" ")}`; +} + /** @type {import('lint-staged').Configuration} */ export default { "*.{js,jsx,ts,tsx,mjs,mts,cjs,cts}": ["bun run format:check", "bun run lint"], "*.{css,json,md,mdc,html,yaml,yml}": "bun run format:check", - "*.{ts,tsx}": typecheckStagedFiles, + "*.{ts,tsx}": [typecheckStagedFiles, relatedTests], "*.test.ts": "bun test", + "*.test.tsx": "bun test", "scripts/**/*.test.mjs": "bun test", }; diff --git a/src/application/http-server.test.ts b/src/application/http-server.test.ts index 4d3d0abe..4164d64b 100644 --- a/src/application/http-server.test.ts +++ b/src/application/http-server.test.ts @@ -469,7 +469,7 @@ describe("http-server — POST /tool/{other tools}", () => { expect(r.json.matches[0]).toHaveProperty("missing"); }); - it("show with query returns field-qualified matches", async () => { + it("show with query fast tier returns exact name matches", async () => { const db = openDb(); try { db.run( @@ -480,7 +480,28 @@ describe("http-server — POST /tool/{other tools}", () => { closeDb(db); } serverHandle = await startServer(); - const r = await postTool(serverHandle.port, "show", { query: "name:Auth" }); + const r = await postTool(serverHandle.port, "show", { + query: "name:AuthService", + }); + expect(r.status).toBe(200); + expect(r.json.matches).toHaveLength(1); + expect(r.json.matches[0].name).toBe("AuthService"); + 
}); + + it("show with query returns field-qualified substring matches", async () => { + const db = openDb(); + try { + db.run( + `INSERT INTO symbols (file_path, name, kind, line_start, line_end, signature, doc_comment) + VALUES ('src/a.ts', 'AuthService', 'class', 1, 1, 'class AuthService', NULL)`, + ); + } finally { + closeDb(db); + } + serverHandle = await startServer(); + const r = await postTool(serverHandle.port, "show", { + query: "kind:class name:Auth", + }); expect(r.status).toBe(200); expect(r.json.matches).toHaveLength(1); expect(r.json.matches[0].name).toBe("AuthService"); @@ -518,7 +539,7 @@ describe("http-server — POST /tool/{other tools}", () => { } serverHandle = await startServer(); const r = await postTool(serverHandle.port, "snippet", { - query: "name:Auth", + query: "kind:class name:Auth", }); expect(r.status).toBe(200); expect(r.json.matches).toHaveLength(1); diff --git a/src/application/mcp-server.test.ts b/src/application/mcp-server.test.ts index 1147a5f9..bfea5438 100644 --- a/src/application/mcp-server.test.ts +++ b/src/application/mcp-server.test.ts @@ -1607,6 +1607,23 @@ describe("MCP server — show + snippet tools", () => { } }); + it("show with query fast tier returns exact name matches", async () => { + seedSymbol({ file: "src/a.ts", name: "AuthService", kind: "class" }); + seedSymbol({ file: "src/b.ts", name: "Other", kind: "class" }); + const { client, server } = await makeClient(); + try { + const r = await client.callTool({ + name: "show", + arguments: { query: "name:AuthService" }, + }); + const json = readJson(r); + expect(json.matches).toHaveLength(1); + expect(json.matches[0].name).toBe("AuthService"); + } finally { + await server.close(); + } + }); + it("show with query field search returns substring matches", async () => { seedSymbol({ file: "src/a.ts", name: "AuthService", kind: "class" }); seedSymbol({ file: "src/b.ts", name: "Other", kind: "class" }); @@ -1614,7 +1631,7 @@ describe("MCP server — show + snippet tools", () => 
{ try { const r = await client.callTool({ name: "show", - arguments: { query: "name:Auth" }, + arguments: { query: "kind:class name:Auth" }, }); const json = readJson(r); expect(json.matches).toHaveLength(1); @@ -1637,7 +1654,7 @@ describe("MCP server — show + snippet tools", () => { try { const r = await client.callTool({ name: "snippet", - arguments: { query: "name:Auth" }, + arguments: { query: "kind:class name:Auth" }, }); const json = readJson(r); expect(json.matches).toHaveLength(1); diff --git a/src/application/mcp-server.ts b/src/application/mcp-server.ts index 9b878a80..b16e089c 100644 --- a/src/application/mcp-server.ts +++ b/src/application/mcp-server.ts @@ -342,7 +342,7 @@ function registerShowTool(server: McpServer, opts: ServerOpts): void { "show", withToolAnnotations("show", { description: - "Look up symbol(s) by exact name or field-qualified `query` search; returns {matches: [{name, kind, file_path, line_start, line_end, signature, ...}], disambiguation?, warning?}. Query syntax: kind:, name:, path:, in: fields plus optional free text (name LIKE, or source_fts with with_fts when indexed — FTS matches file bodies and returns every symbol in matching files). Use `snippet` for source text; use `query` tool for arbitrary SQL.", + "Look up symbol(s) by exact name or field-qualified `query` search; returns {matches: [{name, kind, file_path, line_start, line_end, signature, ...}], disambiguation?, warning?}. Fast tier: exact `name` (optional `kind`, `in` filters) or `query` with lone `name:Token` (no %/_ wildcards, no other query fields) uses equality index (`name = ?`). Slow tier: `name:%pat%` substring LIKE, multi-field query, or free text (name LIKE or source_fts with with_fts when indexed — FTS matches file bodies and returns every symbol in matching files). 
Use `snippet` for source text; use `query` tool for arbitrary SQL.", inputSchema: showArgsSchema, }), (args) => wrapToolResult(handleShow(args, opts.root)), @@ -354,7 +354,7 @@ function registerSnippetTool(server: McpServer, opts: ServerOpts): void { "snippet", withToolAnnotations("snippet", { description: - "Same lookup as `show` (exact `{name}` or field-qualified `{query}` with kind:/name:/path:/in: tokens + optional `with_fts` for free text — FTS matches file bodies and returns every symbol in matching files) but each match carries `source` (file lines from disk at line_start..line_end) plus `stale` (true when content_hash drifted since indexing — line range may have shifted; agent decides whether to act or re-index) and `missing` (true when file is gone). Returns `{matches, disambiguation?, warning?}`; source/stale/missing are additive fields on each match.", + "Same lookup tiers as `show` (fast: exact `name` with optional `kind`/`in`, or lone `name:Token` without wildcards → equality index; slow: substring LIKE, multi-field query, or free text with optional `with_fts` — FTS matches file bodies and returns every symbol in matching files) but each match carries `source` (file lines from disk at line_start..line_end) plus `stale` (true when content_hash drifted since indexing — line range may have shifted; agent decides whether to act or re-index) and `missing` (true when file is gone). 
Returns `{matches, disambiguation?, warning?}`; source/stale/missing are additive fields on each match.", inputSchema: snippetArgsSchema, }), (args) => wrapToolResult(handleSnippet(args, opts.root)), diff --git a/src/application/show-engine.test.ts b/src/application/show-engine.test.ts index f2b53339..91210eff 100644 --- a/src/application/show-engine.test.ts +++ b/src/application/show-engine.test.ts @@ -3,7 +3,7 @@ import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs"; import { tmpdir } from "node:os"; import { join } from "node:path"; -import { createTables } from "../db"; +import { createIndexes, createTables } from "../db"; import type { CodemapDatabase } from "../db"; import { hashContent } from "../hash"; import { openCodemapDatabase } from "../sqlite-db"; @@ -79,6 +79,22 @@ describe("findSymbolsByName", () => { expect(findSymbolsByName(db, { name: "no-such-symbol" })).toEqual([]); }); + it("EXPLAIN uses idx_symbols_name_covering for equality lookup", () => { + createIndexes(db); + const plan = db + .query( + `EXPLAIN QUERY PLAN SELECT name, kind, file_path, line_start, line_end, signature, + is_exported, parent_name, visibility + FROM symbols + WHERE name = ? 
+ ORDER BY file_path ASC, line_start ASC`, + ) + .all("foo") as Array<{ detail: string }>; + const text = plan.map((r) => r.detail).join("\n"); + expect(text).toContain("idx_symbols_name_covering"); + expect(text).toMatch(/USING COVERING INDEX/i); + }); + it("returns all matches for an ambiguous name (deterministic order)", () => { const r = findSymbolsByName(db, { name: "foo" }); expect(r).toHaveLength(3); diff --git a/src/application/show-search-mode.test.ts b/src/application/show-search-mode.test.ts index 733f488b..4420a12d 100644 --- a/src/application/show-search-mode.test.ts +++ b/src/application/show-search-mode.test.ts @@ -6,8 +6,10 @@ import { openCodemapDatabase } from "../sqlite-db"; import { executeShowLookup, formatShowSearchSqlForQuery, + isExactNamePattern, normalizeSearchInGlob, parseAndNormalizeSearchQuery, + resolveExactNameFromParsedQuery, resolveSearchWithFts, resolveShowLookupMode, validateShowSnippetLookupArgs, @@ -99,6 +101,64 @@ describe("formatShowSearchSqlForQuery", () => { expect(result.sql).toContain("kind = 'function'"); expect(result.sql).toContain("name LIKE '%entry%'"); }); + + it("returns equality SQL for lone name:Token fast path", () => { + const result = formatShowSearchSqlForQuery("name:hashContent", "/tmp", { + withFtsCli: false, + db, + }); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.sql).toContain("name = 'hashContent'"); + expect(result.sql).not.toContain("LIKE"); + }); + + it("keeps LIKE SQL for name:%wild% slow tier", () => { + const result = formatShowSearchSqlForQuery("name:%Sym%", "/tmp", { + withFtsCli: false, + db, + }); + expect(result.ok).toBe(true); + if (!result.ok) return; + expect(result.sql).toContain("name LIKE '%\\%Sym\\%%'"); + }); +}); + +describe("isExactNamePattern", () => { + it("accepts literal tokens", () => { + expect(isExactNamePattern("hashContent")).toBe(true); + expect(isExactNamePattern("runShowCmd")).toBe(true); + }); + + it("rejects unescaped wildcards", () => { + 
expect(isExactNamePattern("%foo%")).toBe(false); + expect(isExactNamePattern("foo_bar")).toBe(false); + }); + + it("allows escaped wildcards", () => { + expect(isExactNamePattern("foo\\%bar")).toBe(true); + }); +}); + +describe("resolveExactNameFromParsedQuery", () => { + it("resolves lone name pattern", () => { + expect( + resolveExactNameFromParsedQuery({ + namePatterns: ["MySymbol"], + freeText: [], + }), + ).toBe("MySymbol"); + }); + + it("returns undefined when kind is set", () => { + expect( + resolveExactNameFromParsedQuery({ + kind: "function", + namePatterns: ["MySymbol"], + freeText: [], + }), + ).toBeUndefined(); + }); }); describe("resolveShowLookupMode", () => { @@ -222,6 +282,66 @@ describe("executeShowLookup", () => { expect(r.warning).toBeUndefined(); }); + it("name:Token fast path matches exact lookup rows", () => { + const exact = executeShowLookup( + db, + { ok: true, kind: "exact", name: "foo", inPath: undefined }, + { withFtsCli: false }, + ); + const query = executeShowLookup( + db, + { + ok: true, + kind: "query", + parsed: { namePatterns: ["foo"], freeText: [] }, + }, + { withFtsCli: false }, + ); + expect(query.matches).toEqual(exact.matches); + }); + + it("kind + name uses slow LIKE tier even for literal name token", () => { + const exact = executeShowLookup( + db, + { ok: true, kind: "exact", name: "foo", inPath: undefined }, + { withFtsCli: false }, + ); + const slow = executeShowLookup( + db, + { + ok: true, + kind: "query", + parsed: { + kind: "function", + namePatterns: ["foo"], + freeText: [], + }, + }, + { withFtsCli: false }, + ); + expect(slow.matches.length).toBeLessThanOrEqual(exact.matches.length); + expect(slow.matches.every((m) => m.kind === "function")).toBe(true); + }); + + it("name:%Sym% stays on slow LIKE tier", () => { + const slow = executeShowLookup( + db, + { + ok: true, + kind: "query", + parsed: { namePatterns: ["%Sym%"], freeText: [] }, + }, + { withFtsCli: false }, + ); + expect( + 
resolveExactNameFromParsedQuery({ + namePatterns: ["%Sym%"], + freeText: [], + }), + ).toBeUndefined(); + expect(slow.matches).toEqual([]); + }); + it("query mode returns empty matches without error envelope", () => { const r = executeShowLookup( db, diff --git a/src/application/show-search-mode.ts b/src/application/show-search-mode.ts index e7eb941d..a5188f33 100644 --- a/src/application/show-search-mode.ts +++ b/src/application/show-search-mode.ts @@ -17,6 +17,48 @@ import { findSymbolsByName } from "./show-engine"; import type { SymbolMatch } from "./show-engine"; import { toProjectRelative } from "./validate-engine"; +/** True when a `name:` pattern has no unescaped LIKE metacharacters (`%`, `_`). */ +export function isExactNamePattern(pattern: string): boolean { + for (let i = 0; i < pattern.length; i++) { + const c = pattern[i]; + if (c === "\\" && i + 1 < pattern.length) { + i++; + continue; + } + if (c === "%" || c === "_") return false; + } + return true; +} + +/** + * Fast tier: single `name:` token, no wildcards, no kind/path/in/free-text — + * route to equality lookup (`name = ?`) instead of `name LIKE`. + */ +export function resolveExactNameFromParsedQuery( + parsed: ParsedSearchQuery, +): string | undefined { + if (parsed.freeText.length > 0) return undefined; + if (parsed.kind !== undefined) return undefined; + if (parsed.path !== undefined) return undefined; + if (parsed.inGlob !== undefined) return undefined; + if (parsed.namePatterns.length !== 1) return undefined; + const pattern = parsed.namePatterns[0]!; + if (!isExactNamePattern(pattern)) return undefined; + return pattern; +} + +const EXACT_NAME_LOOKUP_SQL = `SELECT name, kind, file_path, line_start, line_end, signature, + is_exported, parent_name, visibility + FROM symbols + WHERE name = ? + ORDER BY file_path ASC, line_start ASC`; + +/** Moat-A SQL preview for fast-tier `name:Token` lookups. 
*/ +export function formatExactNameLookupSqlForDisplay(name: string): string { + const escaped = name.replace(/'/g, "''"); + return EXACT_NAME_LOOKUP_SQL.replace("?", `'${escaped}'`).trim(); +} + export type ShowLookupMode = | { ok: true; kind: "exact"; name: string; inPath: string | undefined } | { ok: true; kind: "query"; parsed: ParsedSearchQuery } @@ -70,6 +112,14 @@ export function formatShowSearchSqlForQuery( let useFts = false; let warning: string | undefined; + const exactName = resolveExactNameFromParsedQuery(parsedQuery.parsed); + if (exactName !== undefined) { + return { + ok: true, + sql: formatExactNameLookupSqlForDisplay(exactName), + }; + } + if (parsedQuery.parsed.freeText.length > 0 && opts.db !== undefined) { const fts = resolveSearchWithFts(opts.db, { withFtsCli: opts.withFtsCli, @@ -117,6 +167,13 @@ export function executeShowLookup( }; } + const exactName = resolveExactNameFromParsedQuery(mode.parsed); + if (exactName !== undefined) { + return { + matches: findSymbolsByName(db, { name: exactName }), + }; + } + const fts = resolveSearchWithFts(db, { withFtsCli: opts.withFtsCli, freeTextCount: mode.parsed.freeText.length, diff --git a/src/cli/cmd-show.ts b/src/cli/cmd-show.ts index 0d7d9ec6..13c7b390 100644 --- a/src/cli/cmd-show.ts +++ b/src/cli/cmd-show.ts @@ -38,13 +38,19 @@ Look up symbol(s) by exact name and return file_path:line_start-line_end + signature. One-step lookup that beats composing \`SELECT … FROM symbols WHERE name = ?\` by hand. +Lookup tiers: + Fast (equality index) — positional , or \`name:\` with no + wildcards (% / _) and no other query fields (same rows as exact ). + Slow (broader scan) — \`name:%pat%\` substring LIKE, multi-field query, + free-text tokens (name LIKE or source_fts when --with-fts / fts5: true). + Field-qualified search (--query): kind: Exact symbols.kind (function, class, const, …). - name: Case-sensitive substring on symbols.name (LIKE). 
+ name: Case-sensitive; equality when pattern has no wildcards, + otherwise substring LIKE on symbols.name. path: File scope — directory prefix or exact file path. in: SQLite GLOB on file_path (e.g. in:src/**/*.ts). - Free text Unqualified tokens → name LIKE, or source_fts phrase - search when FTS5 is indexed (--with-fts or fts5: true). + Free text Unqualified tokens → slow tier (name LIKE or source_fts). With FTS, matches file bodies — returns all symbols in matching files (not symbol-level body hits). diff --git a/src/cli/cmd-snippet.ts b/src/cli/cmd-snippet.ts index 8f67d454..af1267a1 100644 --- a/src/cli/cmd-snippet.ts +++ b/src/cli/cmd-snippet.ts @@ -37,6 +37,12 @@ Look up symbol(s) by exact name and return the source text from disk as \`show\`; difference is the response carries the actual code body sliced from disk at line_start..line_end. +Lookup tiers (same as \`codemap show\`): + Fast (equality index) — positional , or \`name:\` with no + wildcards (% / _) and no other query fields (same rows as exact ). + Slow (broader scan) — \`name:%pat%\` substring LIKE, multi-field query, + free-text tokens (name LIKE or source_fts when --with-fts / fts5: true). + Args: Exact symbol name (case-sensitive). Omit when using --query. 
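The fast/slow tier split described in the help text above can be exercised in isolation. The sketch below restates `isExactNamePattern` and `resolveExactNameFromParsedQuery` from the `show-search-mode.ts` hunk of this diff. The `ParsedSearchQuery` shape is an assumption inferred from the tests (the real type is defined elsewhere in the repo), and `lookupTier` is a hypothetical helper added only for illustration.

```typescript
// Assumed shape, inferred from the tests in this diff; the real
// ParsedSearchQuery type is defined elsewhere in the repo.
interface ParsedSearchQuery {
  kind?: string;
  path?: string;
  inGlob?: string;
  namePatterns: string[];
  freeText: string[];
}

// True when the pattern has no unescaped LIKE metacharacters (% or _).
function isExactNamePattern(pattern: string): boolean {
  for (let i = 0; i < pattern.length; i++) {
    const c = pattern[i];
    if (c === "\\" && i + 1 < pattern.length) {
      i++; // skip the escaped character
      continue;
    }
    if (c === "%" || c === "_") return false;
  }
  return true;
}

// Returns the literal name only for a lone, wildcard-free name: token.
function resolveExactNameFromParsedQuery(
  parsed: ParsedSearchQuery,
): string | undefined {
  if (parsed.freeText.length > 0) return undefined;
  if (parsed.kind !== undefined) return undefined;
  if (parsed.path !== undefined) return undefined;
  if (parsed.inGlob !== undefined) return undefined;
  if (parsed.namePatterns.length !== 1) return undefined;
  const pattern = parsed.namePatterns[0]!;
  return isExactNamePattern(pattern) ? pattern : undefined;
}

// Hypothetical helper (not part of the diff): names the tier a query lands on.
function lookupTier(parsed: ParsedSearchQuery): "fast" | "slow" {
  return resolveExactNameFromParsedQuery(parsed) !== undefined ? "fast" : "slow";
}

console.log(lookupTier({ namePatterns: ["hashContent"], freeText: [] })); // fast
console.log(lookupTier({ kind: "function", namePatterns: ["MySymbol"], freeText: [] })); // slow
console.log(lookupTier({ namePatterns: ["%Sym%"], freeText: [] })); // slow
console.log(lookupTier({ namePatterns: ["foo_bar"], freeText: [] })); // slow
```

The last call is the case most likely to surprise: `_` counts as a LIKE wildcard, so a `name:` token for a snake_case identifier falls to the slow tier under this check.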
diff --git a/src/db.test.ts b/src/db.test.ts index 1e3b2627..7f9da866 100644 --- a/src/db.test.ts +++ b/src/db.test.ts @@ -100,6 +100,22 @@ describe("SQLite layer (in-memory)", () => { } }); + it("createIndexes adds idx_symbols_name_covering", () => { + const db = openCodemapDatabase(":memory:"); + try { + createTables(db); + createIndexes(db); + const row = db + .query( + "SELECT name FROM sqlite_master WHERE type = 'index' AND name = 'idx_symbols_name_covering'", + ) + .get() as { name: string } | null; + expect(row?.name).toBe("idx_symbols_name_covering"); + } finally { + closeDb(db); + } + }); + it("symbols.visibility round-trips with index hit on WHERE visibility = ?", () => { const db = openCodemapDatabase(":memory:"); try { diff --git a/src/db.ts b/src/db.ts index c889a609..ab94cfb8 100644 --- a/src/db.ts +++ b/src/db.ts @@ -602,6 +602,7 @@ export function createIndexes(db: CodemapDatabase) { db.run(` -- Covering indexes: include columns returned by common queries to avoid table lookups CREATE INDEX IF NOT EXISTS idx_symbols_name ON symbols(name, kind, file_path, line_start, line_end, signature, is_exported); + CREATE INDEX IF NOT EXISTS idx_symbols_name_covering ON symbols(name, kind, file_path, line_start, line_end, signature, is_exported, parent_name, visibility); CREATE INDEX IF NOT EXISTS idx_symbols_kind ON symbols(kind, is_exported, name, file_path); CREATE INDEX IF NOT EXISTS idx_symbols_file ON symbols(file_path); diff --git a/templates/agent-content/mcp-instructions.md b/templates/agent-content/mcp-instructions.md index 756a5428..115ad38c 100644 --- a/templates/agent-content/mcp-instructions.md +++ b/templates/agent-content/mcp-instructions.md @@ -29,35 +29,35 @@ Key fields: `pending_sync` (watcher debounce queue or in-flight reindex), `commi ## Common tasks -| Goal | MCP tool | Recipe twin (`query_recipe`) | -| ----------------------------------- | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Exact symbol lookup | **`show`** (`name`, optional `in`) | `find-symbol-definitions` | -| Field-qualified symbol discovery | **`show`** or **`snippet`** (`query` with `kind:` / `name:` / `path:` / `in:` + free text) | `find-symbol-by-kind` for kind-heavy patterns; CLI `codemap show --query '…' --print-sql` to inspect generated SQL (no MCP `print_sql` arg) | -| Kind / pattern lookup | **`query_recipe`** | `find-symbol-by-kind` | -| Source at symbol | **`snippet`** | same rows as `show` + disk text | -| Blast radius | **`impact`** (`target`, `direction`, `via`, `depth`, `in?`) — homonym symbols: unscoped unions per-defining-file graphs; `in` scopes one definition | `fan-in` for file hubs; symbol call graph via SQL or `impact` | -| Call path + snippets | **`trace`** (`from`, `to`, `via?`, `max_depth?`, `budget_chars?`) — adaptive snippet caps 15k/10k/6k when omitted | `call-path` | -| Type extends / implements chain | **`query_recipe`** | `type-ancestors`, `type-descendants` (`file_path` when homonyms; on `type-descendants` also scopes output to that file) | -| Multi-symbol survey | **`explore`** (`names`, `depth?`, `kind?`, `budget_chars?`) — row cap always adaptive (500/250/125); snippets 15k/10k/6k when `budget_chars` omitted | `symbol-neighborhood` (once per name) | -| One-hop symbol card | **`node`** (`name`, `kind?`, `in?`, `include_snippets?`, `budget_chars?`) — adaptive snippet caps when snippets enabled | `show` + `symbol-neighborhood` with `depth=1` | -| Affected tests | **`affected`** (`paths?`, `changed_since?`, 
`test_glob?`, `max_depth?`) | `affected-tests` (RS-delimit multiple paths in `query_recipe` params) | -| CI / SARIF | **`query_recipe`** + `format: "sarif"` | `deprecated-symbols`, `boundary-violations`, … | -| GitLab Code Quality | **`query_recipe`** + `format: "codeclimate"` | `boundary-violations`, … — locatable rows only; flat `minor` severity | -| CI badge / issue count | **`query_recipe`** + `format: "badge"` (+ `badge_style: "json"` for gates) | presentation only — triage via JSON rows / `--summary` | -| Ad-hoc SQL | **`query`** | — | -| N statements / one round-trip | **`query_batch`** | **`codemap query batch`** | -| Index freshness (index-level) | **`context`** (`index_freshness`) + tool metadata above | — | -| Per-file staleness | **`validate`** | — | -| Drift vs baseline | **`audit`** (`baseline_prefix` and/or per-delta `baselines`) or **`query`** / **`query_recipe`** + `baseline` (one-shot row diff vs `query_baselines`; incompatible with non-`json` `format` / `group_by`) | save via **`save_baseline`**; `summary: true` → count-only diff | -| PR merge-base drift | **`audit`** `base: ` (git committish; sha-keyed cache) | `attribution: introduced` (branch-new) \| `inherited` (pre-existing at merge base) on each `added` row; `jq '.deltas.deprecated.added[] \| select(.attribution == "introduced")'`; `summary: true` → `added_introduced` / `added_inherited` | -| Load coverage data | **`ingest_coverage`** (`path`, optional `runtime` for V8 dirs; auto-detects Istanbul `.json` / LCOV `.info`) | enables `worst-covered-exports`, `files-by-coverage`, `untested-and-dead`, **`coverage-confirmed-dead`** (`confidence: high`); **`high-crap-score`** measured override | -| Load churn data (non-git / fixture) | **`ingest_churn`** (`path`) | precomputed `file_churn` JSON — CLI twin `codemap ingest-churn`; enables `churn-complexity-hotspots` when git history is unavailable | -| Complex + undertested (no ingest) | **`query_recipe`** `high-crap-score` | graph-estimated 85/40/0% 
tiers — parse `coverage_source` before CI gates; prefer after **`ingest_coverage`** when possible | -| Churn × complexity refactor targets | **`query_recipe`** `churn-complexity-hotspots` (`by_symbol?`, `min_complexity?`, `row_limit?`, `path_prefix?`) | git `file_churn` by default each index; config **`churn.file`** replaces git when set; fixtures: **`ingest_churn`** / `ingest-churn` — **not** the `hotspots` alias (`fan-in`); empty `file_churn` → `context` `churn_hint` | -| High-judgment recipe triage | **`query_recipe`** `unimported-exports`, `boundary-violations`, `deprecated-symbols` | rows include `reason` / `evidence_json` — cite before **`apply`** or deletion | -| Apply recipe diff rows | **`apply`** (`recipe`, `params?`, `dry_run?`, `yes?`, `force?`, `until_empty?`, `max_passes?`, `commit_message?`) | recipe must emit `{file_path, line_start, before_pattern, after_pattern}` rows; `yes: true` required for writes; non-`auto_fixable` recipes need `force: true` | -| Apply agent/codemod rows | **`apply_rows`** (`rows`, `dry_run?`, `yes?`) | same row contract; bypasses recipe `auto_fixable` / allowlist gates | -| Apply unified diff text | **`apply_diff_input`** (`diff_text`, `dry_run?`, `yes?`, `commit_message?`) | parses git-style hunks; same executor as `apply_rows` | +| Goal | MCP tool | Recipe twin (`query_recipe`) | +| -------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Exact symbol lookup (fast tier) | **`show`** / **`snippet`** (`name`, optional `kind`, `in`) — equality index; same rows as lone `query: 'name:Token'` when 
Token has no `%`/`_` wildcards and no other query fields | `find-symbol-definitions` (`name = ?`) | +| Field-qualified symbol discovery (slow tier) | **`show`** or **`snippet`** (`query` with `kind:` / `name:` / `path:` / `in:` + free text; `name:%pat%` substring LIKE; `with_fts` for file-body search) | `find-symbol-by-kind` for kind-heavy patterns; CLI `codemap show --query '…' --print-sql` to inspect generated SQL (no MCP `print_sql` arg) | +| Kind / pattern lookup | **`query_recipe`** | `find-symbol-by-kind` | +| Source at symbol | **`snippet`** | same rows as `show` + disk text | +| Blast radius | **`impact`** (`target`, `direction`, `via`, `depth`, `in?`) — homonym symbols: unscoped unions per-defining-file graphs; `in` scopes one definition | `fan-in` for file hubs; symbol call graph via SQL or `impact` | +| Call path + snippets | **`trace`** (`from`, `to`, `via?`, `max_depth?`, `budget_chars?`) — adaptive snippet caps 15k/10k/6k when omitted | `call-path` | +| Type extends / implements chain | **`query_recipe`** | `type-ancestors`, `type-descendants` (`file_path` when homonyms; on `type-descendants` also scopes output to that file) | +| Multi-symbol survey | **`explore`** (`names`, `depth?`, `kind?`, `budget_chars?`) — row cap always adaptive (500/250/125); snippets 15k/10k/6k when `budget_chars` omitted | `symbol-neighborhood` (once per name) | +| One-hop symbol card | **`node`** (`name`, `kind?`, `in?`, `include_snippets?`, `budget_chars?`) — adaptive snippet caps when snippets enabled | `show` + `symbol-neighborhood` with `depth=1` | +| Affected tests | **`affected`** (`paths?`, `changed_since?`, `test_glob?`, `max_depth?`) | `affected-tests` (RS-delimit multiple paths in `query_recipe` params) | +| CI / SARIF | **`query_recipe`** + `format: "sarif"` | `deprecated-symbols`, `boundary-violations`, … | +| GitLab Code Quality | **`query_recipe`** + `format: "codeclimate"` | `boundary-violations`, … — locatable rows only; flat `minor` severity | +| CI badge / 
issue count | **`query_recipe`** + `format: "badge"` (+ `badge_style: "json"` for gates) | presentation only — triage via JSON rows / `--summary` | +| Ad-hoc SQL | **`query`** | — | +| N statements / one round-trip | **`query_batch`** | **`codemap query batch`** | +| Index freshness (index-level) | **`context`** (`index_freshness`) + tool metadata above | — | +| Per-file staleness | **`validate`** | — | +| Drift vs baseline | **`audit`** (`baseline_prefix` and/or per-delta `baselines`) or **`query`** / **`query_recipe`** + `baseline` (one-shot row diff vs `query_baselines`; incompatible with non-`json` `format` / `group_by`) | save via **`save_baseline`**; `summary: true` → count-only diff | +| PR merge-base drift | **`audit`** `base: ` (git committish; sha-keyed cache) | `attribution: introduced` (branch-new) \| `inherited` (pre-existing at merge base) on each `added` row; `jq '.deltas.deprecated.added[] \| select(.attribution == "introduced")'`; `summary: true` → `added_introduced` / `added_inherited` | +| Load coverage data | **`ingest_coverage`** (`path`, optional `runtime` for V8 dirs; auto-detects Istanbul `.json` / LCOV `.info`) | enables `worst-covered-exports`, `files-by-coverage`, `untested-and-dead`, **`coverage-confirmed-dead`** (`confidence: high`); **`high-crap-score`** measured override | +| Load churn data (non-git / fixture) | **`ingest_churn`** (`path`) | precomputed `file_churn` JSON — CLI twin `codemap ingest-churn`; enables `churn-complexity-hotspots` when git history is unavailable | +| Complex + undertested (no ingest) | **`query_recipe`** `high-crap-score` | graph-estimated 85/40/0% tiers — parse `coverage_source` before CI gates; prefer after **`ingest_coverage`** when possible | +| Churn × complexity refactor targets | **`query_recipe`** `churn-complexity-hotspots` (`by_symbol?`, `min_complexity?`, `row_limit?`, `path_prefix?`) | git `file_churn` by default each index; config **`churn.file`** replaces git when set; fixtures: 
**`ingest_churn`** / `ingest-churn` — **not** the `hotspots` alias (`fan-in`); empty `file_churn` → `context` `churn_hint` | +| High-judgment recipe triage | **`query_recipe`** `unimported-exports`, `boundary-violations`, `deprecated-symbols` | rows include `reason` / `evidence_json` — cite before **`apply`** or deletion | +| Apply recipe diff rows | **`apply`** (`recipe`, `params?`, `dry_run?`, `yes?`, `force?`, `until_empty?`, `max_passes?`, `commit_message?`) | recipe must emit `{file_path, line_start, before_pattern, after_pattern}` rows; `yes: true` required for writes; non-`auto_fixable` recipes need `force: true` | +| Apply agent/codemod rows | **`apply_rows`** (`rows`, `dry_run?`, `yes?`) | same row contract; bypasses recipe `auto_fixable` / allowlist gates | +| Apply unified diff text | **`apply_diff_input`** (`diff_text`, `dry_run?`, `yes?`, `commit_message?`) | parses git-style hunks; same executor as `apply_rows` | ## Chains @@ -70,7 +70,7 @@ Key fields: `pending_sync` (watcher debounce queue or in-flight reindex), `commi ## Anti-patterns -- Don't grep for "where is X defined" — **`show`** (exact `name` or `{query: …}`) or **`query_recipe`**. +- Don't grep for "where is X defined" — **`show`** / **`snippet`** fast tier (`name` with optional `kind`/`in`, or lone `query: 'name:Token'` without wildcards) or **`query_recipe`** `find-symbol-definitions`; slow tier for `name:%pat%`, multi-field `query`, or free text. - Don't hand-roll `WITH RECURSIVE` for impact — **`impact`**. - Convenience tools are thin composers — fall back to **`query_recipe`** / **`query`** when unsure. - Don't skip **`context`** at session start. 
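The `--print-sql` inspection mentioned in the table above shows SQL with the name inlined as a literal. Below is a minimal sketch of that rendering for the fast tier, modeled on `formatExactNameLookupSqlForDisplay` from this diff. `renderExactNameSql` is a hypothetical name, and the function-valued replacer is a swapped-in variation, not what the diff does. It is used here because a string replacement passed to `String.prototype.replace` interprets sequences such as `$$` and `$&`, which matters for identifiers that contain `$`.

```typescript
// Same statement text as EXACT_NAME_LOOKUP_SQL in the diff.
const EXACT_NAME_LOOKUP_SQL = `SELECT name, kind, file_path, line_start, line_end, signature,
         is_exported, parent_name, visibility
  FROM symbols
  WHERE name = ?
  ORDER BY file_path ASC, line_start ASC`;

// Hypothetical variant of the display formatter: doubles single quotes for
// the SQL literal, then substitutes the placeholder via a function so that
// `$` sequences in the name are inserted verbatim.
function renderExactNameSql(name: string): string {
  const escaped = name.replace(/'/g, "''");
  return EXACT_NAME_LOOKUP_SQL.replace("?", () => `'${escaped}'`).trim();
}

console.log(renderExactNameSql("hashContent"));
console.log(renderExactNameSql("$$typeof"));
```

With a plain string as the second argument, `$$typeof` would be rendered as `$typeof`, because `$$` is the escape for a single dollar sign in replacement strings. This affects only the displayed preview; the executed lookup in the diff binds the name as a parameter.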
diff --git a/templates/agent-content/rule/00-full.md b/templates/agent-content/rule/00-full.md index 3b903742..df55f886 100644 --- a/templates/agent-content/rule/00-full.md +++ b/templates/agent-content/rule/00-full.md @@ -77,23 +77,23 @@ If the question matches any of these, use the index instead of grepping: ## Quick reference queries -| I need to... | Query | -| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | -| Find a symbol | `SELECT name, kind, file_path, line_start, signature FROM symbols WHERE name = '...'` | -| Field-qualified search | `codemap show --query 'kind:function name:Auth path:src/'` (MCP: `show` / `snippet` with `query`) | -| Call path / neighborhood | `codemap trace` / `explore` / `node` (or MCP twins; recipes: `call-path`, `symbol-neighborhood`) | -| Affected tests | `codemap affected --json` or MCP `affected` (recipe: `affected-tests`) | -| Find a symbol (fuzzy) | `SELECT name, kind, file_path, line_start FROM symbols WHERE name LIKE '%...%'` | -| Symbol docs | `SELECT name, signature, doc_comment FROM symbols WHERE name = '...'` | -| Who imports this file? | `SELECT DISTINCT from_path FROM dependencies WHERE to_path LIKE '%...'` | -| What does this depend on? | `SELECT DISTINCT to_path FROM dependencies WHERE from_path LIKE '%...'` | -| Who calls X? | `SELECT DISTINCT caller_name, file_path FROM calls WHERE callee_name = '...' AND (provenance IS NULL OR provenance = 'ast')` | -| Component info | `SELECT name, props_type, hooks_used FROM components WHERE name = '...'` | -| TODOs in a file | `SELECT line_number, content FROM markers WHERE file_path LIKE '%...' 
AND kind = 'TODO'` | -| Deprecated symbols | `SELECT name, kind, file_path FROM symbols WHERE doc_comment LIKE '%@deprecated%'` | -| Symbol coverage | `SELECT name, hit_statements, total_statements, coverage_pct FROM coverage WHERE file_path = '...'` | -| Untested + dead exports | `codemap query --json --recipe untested-and-dead` | -| Coverage-confirmed dead | `codemap query --json --recipe coverage-confirmed-dead` (sort by `confidence`) | +| I need to... | Query | +| ---------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Find a symbol (fast tier) | `codemap show ` or lone `codemap show --query 'name:Token'` (no `%`/`_` wildcards; optional `--kind` / `--in` stay fast); MCP: `show` / `snippet` (`name`, optional `kind`, `in`); recipe: `find-symbol-definitions` | +| Field-qualified search (slow tier) | `codemap show --query 'kind:function name:Auth path:src/'` or `name:%pat%` substring; MCP: `show` / `snippet` with `query` (+ `with_fts` for free text) | +| Call path / neighborhood | `codemap trace` / `explore` / `node` (or MCP twins; recipes: `call-path`, `symbol-neighborhood`) | +| Affected tests | `codemap affected --json` or MCP `affected` (recipe: `affected-tests`) | +| Find a symbol (fuzzy) | `SELECT name, kind, file_path, line_start FROM symbols WHERE name LIKE '%...%'` | +| Symbol docs | `SELECT name, signature, doc_comment FROM symbols WHERE name = '...'` | +| Who imports this file? | `SELECT DISTINCT from_path FROM dependencies WHERE to_path LIKE '%...'` | +| What does this depend on? | `SELECT DISTINCT to_path FROM dependencies WHERE from_path LIKE '%...'` | +| Who calls X? | `SELECT DISTINCT caller_name, file_path FROM calls WHERE callee_name = '...' 
AND (provenance IS NULL OR provenance = 'ast')` | +| Component info | `SELECT name, props_type, hooks_used FROM components WHERE name = '...'` | +| TODOs in a file | `SELECT line_number, content FROM markers WHERE file_path LIKE '%...' AND kind = 'TODO'` | +| Deprecated symbols | `SELECT name, kind, file_path FROM symbols WHERE doc_comment LIKE '%@deprecated%'` | +| Symbol coverage | `SELECT name, hit_statements, total_statements, coverage_pct FROM coverage WHERE file_path = '...'` | +| Untested + dead exports | `codemap query --json --recipe untested-and-dead` | +| Coverage-confirmed dead | `codemap query --json --recipe coverage-confirmed-dead` (sort by `confidence`) | ## When Grep / Read IS appropriate diff --git a/templates/agent-content/skill/10-recipes-context.md b/templates/agent-content/skill/10-recipes-context.md index 74de1cdc..792a43b6 100644 --- a/templates/agent-content/skill/10-recipes-context.md +++ b/templates/agent-content/skill/10-recipes-context.md @@ -48,8 +48,8 @@ Each emitted delta carries its own `base` metadata so mixed-baseline audits are - **`drop_baseline`** — `{name}` → `{dropped}` on success; structured `{error}` on unknown name (MCP sets `isError: true`). - **`context`** — `{compact?, intent?, include_snippets?, include_codebase_map?}`. CLI: `codemap context [--include-snippets] [--no-codebase-map]`. Session-start project envelope with `start_here` shortcuts (one call replaces 4-5 `query`s). Non-compact responses also ship **`map_id`** (hash-stable fingerprint) and **`codebase_map`** (hub paths + codemap CLI/MCP routing hints); omit with `compact: true`, `include_codebase_map: false`, or CLI `--no-codebase-map`. `index_summary.file_churn` row count; **`churn_hint`** when empty (steers to index, **`ingest-churn`**, or **`churn.file`**). `include_snippets` adds one-line export previews on hub leaders (capped to adaptive `signature_max_chars`; may set `stale`/`missing`); no-op when `compact: true`. 
Whitespace-only `intent` is treated as no intent. Prefer `start_here.hub_leaders` over legacy `hubs` for signatures — `hubs` keeps the full bundled `fan-in` recipe limit for backward compatibility. `sample_markers` count scales down on repos >500 / >5000 files. - **`validate`** — `{paths?: string[]}`. SHA-256 vs `files.content_hash`; returns only out-of-sync rows (`stale` / `missing` / `unindexed` / `rejected` — fresh paths are omitted; `rejected` includes optional `reason`: `path escapes project root` | `path escapes via symlink` | `path resolves outside project root`). Output `path` keys are project-relative POSIX paths. -- **`show`** — `{name, kind?, in?}` or `{query, with_fts?}`. Exact symbol lookup or field-qualified search (`kind:`, `name:`, `path:`, `in:` + free text) → `{matches, disambiguation?, warning?}`. CLI: `codemap show --query '…' [--print-sql]`. -- **`snippet`** — same as `show` (`{name, kind?, in?}` or `{query, with_fts?}`) but each match also carries `source` (file text) + `stale` / `missing` flags → `{matches, disambiguation?, warning?}`. No reindex side-effects. +- **`show`** — `{name, kind?, in?}` or `{query, with_fts?}`. Fast tier: exact `name` (optional `kind`/`in` filters) or lone `name:Token` (no `%`/`_` wildcards, no other query fields) → equality index (`name = ?`). Slow tier: `name:%pat%` LIKE, multi-field query, or free text (name LIKE / `with_fts` source_fts). CLI: `codemap show --query '…' [--print-sql]`. +- **`snippet`** — same lookup tiers as `show` (fast: exact `name` with optional `kind`/`in`, or lone `name:Token`; slow: `query` / `with_fts`) but each match also carries `source` (file text) + `stale` / `missing` flags → `{matches, disambiguation?, warning?}`. No reindex side-effects. - **`impact`** — `{target, direction?, via?, depth?, limit?, in?, summary?}`. Symbol/file blast-radius walker (replaces hand-composed `WITH RECURSIVE`). Auto-resolves symbol vs file target; `via` defaults to every backend compatible with the kind. 
`in` disambiguates symbol homonyms (prefix/exact like `show`); unscoped homonyms union per-defining-file call graphs; mismatch → empty `matches` + `skipped_scope`.
 - **`trace`** — `{from, to, max_depth?, via?, budget_chars?}`. CLI: `codemap trace --from … --to …`. Shortest call path + budget-capped snippets (`call-path` recipe twin). Omitted `budget_chars` scales with indexed file count (15k / 10k / 6k). `truncated` when snippet budget hit (`truncation.snippets`); dependency hops set `snippets_skipped_reason` instead of auto-snippets.
 - **`explore`** — `{names, depth?, kind?, budget_chars?}`. CLI: `codemap explore …`. Multi-name neighborhood survey + snippets (`symbol-neighborhood` per deduped name). Explore row cap is always adaptive (500 / 250 / 125 by repo size); snippet budget is adaptive (15k / 10k / 6k) when `budget_chars` omitted. `truncated` when row cap and/or snippet budget hit (`truncation.rows` / `truncation.snippets`).
diff --git a/templates/agent-content/skill/40-query-patterns.md b/templates/agent-content/skill/40-query-patterns.md
index 31c36e65..9310cd0d 100644
--- a/templates/agent-content/skill/40-query-patterns.md
+++ b/templates/agent-content/skill/40-query-patterns.md
@@ -12,7 +12,9 @@ SELECT name, kind, file_path, line_start FROM symbols WHERE name LIKE '%Config%' ORDER BY name;
 -- Field-qualified search — CLI: `codemap show --query '…'`; MCP/HTTP `show` / `snippet` with `{query: …}`:
--- `codemap show --query 'kind:function name:Auth path:src/' --print-sql`
+-- Fast tier (equality index): positional `codemap show hashContent` or lone `name:Token` (no %/_ wildcards, no kind/path/in/free text) → `name = ?`
+-- Slow tier (broader scan): `name:%pat%` substring LIKE; multi-field `kind:… name:… path:…`; free text → name LIKE or source_fts with --with-fts
+-- Examples: `codemap show --query 'name:hashContent' --print-sql` (fast); `codemap show --query 'kind:function name:Auth path:src/' --print-sql` (slow)
 SELECT name, kind, file_path, line_start,
line_end, signature, is_exported, parent_name, visibility FROM symbols
diff --git a/templates/recipes/find-symbol-definitions.md b/templates/recipes/find-symbol-definitions.md
index 870602c4..5920fff0 100644
--- a/templates/recipes/find-symbol-definitions.md
+++ b/templates/recipes/find-symbol-definitions.md
@@ -14,7 +14,7 @@ actions:
 # find-symbol-definitions
-Locate every definition of a named symbol with column-precise positions. Foundation for `rename-preview`'s definition-row CTE (Tier 6 will extend to call sites + re-export aliases via [`find-call-sites`](./find-call-sites.md) + [`find-export-sites`](./find-export-sites.md)).
+Locate every definition of a named symbol with column-precise positions. **Fast tier** — `WHERE name = ?` equality (same as `codemap show ` / lone `show --query 'name:Token'` without wildcards). For substring discovery use `find-symbol-by-kind` or `show --query 'name:%pat%'`. Foundation for `rename-preview`'s definition-row CTE (Tier 6 will extend to call sites + re-export aliases via [`find-call-sites`](./find-call-sites.md) + [`find-export-sites`](./find-export-sites.md)).
 ```bash
 codemap query --recipe find-symbol-definitions --params name=usePermissions