From 328b5fd35d1cbe9177095c134a4e4610731ee386 Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Thu, 11 Jun 2026 07:37:10 +0300 Subject: [PATCH 1/7] feat(security): harden query, serve, and validate read surfaces Close the formatted-query query_only gap, require --token on non-loopback serve binds, and reject validate paths that escape the project root or symlink outside it. --- docs/architecture.md | 6 +- docs/plans/impact-inpath-homonyms.md | 81 ++++++++++++ docs/plans/runtime-test-isolation.md | 75 +++++++++++ docs/plans/security-hardening-orchestrator.md | 87 +++++++++++++ docs/plans/security-hardening-wave1.md | 85 ++++++++++++ docs/roadmap.md | 1 + src/application/path-containment.test.ts | 58 ++++++++- src/application/path-containment.ts | 53 +++++++- src/application/validate-engine.ts | 27 +++- src/cli/cmd-query-formatted.test.ts | 121 ++++++++++++++++++ src/cli/cmd-query.ts | 9 +- src/cli/cmd-serve.test.ts | 23 +++- src/cli/cmd-serve.ts | 19 ++- src/cli/cmd-validate.test.ts | 41 +++++- 14 files changed, 666 insertions(+), 20 deletions(-) create mode 100644 docs/plans/impact-inpath-homonyms.md create mode 100644 docs/plans/runtime-test-isolation.md create mode 100644 docs/plans/security-hardening-orchestrator.md create mode 100644 docs/plans/security-hardening-wave1.md create mode 100644 src/cli/cmd-query-formatted.test.ts diff --git a/docs/architecture.md b/docs/architecture.md index 30bfc20a..65aaa653 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -120,11 +120,11 @@ A local SQLite database (`.codemap/index.db`) indexes the project tree and store **Commands and flags** (index, query, **`codemap agents init`**, **`--root`**, **`--config`**, environment): [../README.md § CLI](../README.md#cli) — **do not duplicate** flag lists here; this section only adds implementation notes. From this repository: **`bun run dev`** or **`bun src/index.ts`** (same flags). -**Query wiring:** Ad-hoc and recipe CLI SQL runs through **`printQueryResult`** in **`src/application/index-engine.ts`**, which sets **`PRAGMA query_only = 1`** before execute (parity with **`queryRows`** / **`executeQuery`**). **`src/cli/cmd-query.ts`** (argv, `--recipe` / `-r` alias, **`--summary`**, **`--changed-since`**, **`--group-by`**, **`--save-baseline`** / **`--baseline`** / **`--baselines`** / **`--drop-baseline`**, **`--ci`** (aliases `--format sarif` + non-zero exit on findings + quiet)), **`src/application/query-recipes.ts`** (**`QUERY_RECIPES`** — recipe registry proxy over bundled + project-local recipes; optional **`actions: RecipeAction[]`** per recipe), **`src/cli/main.ts`** (**`--recipes-json`** / **`--print-sql`** exit before config/DB). With **`--json`**, errors use **`{"error":"…"}`** on stdout for SQL failures, DB open, and bootstrap (same shape); **`runQueryCmd`** sets **`process.exitCode`** instead of **`process.exit`**. Friendlier "no `.codemap/index.db`" — `no such table: ` and `no such column: ` errors are rewritten in **`enrichQueryError`** to point at `codemap` / `codemap --full`. **`--summary`** filters output only — the SQL still executes against the index; output collapses to `{"count": N}` (with `--json`) or `count: N`. **`--changed-since `** post-filters result rows by `path` / `file_path` / `from_path` / `to_path` / `resolved_path` against `git diff --name-only ...HEAD ∪ git status --porcelain` (helper: **`src/git-changed.ts`** — `getFilesChangedSince`, `filterRowsByChangedFiles`, `PATH_COLUMNS`); rows with no recognised path column pass through. **`--group-by `** (`owner` | `directory` | `package`) routes through **`runGroupedQuery`** in `cmd-query.ts` and emits `{"group_by": "", "groups": [{key, count, rows}]}` (or `[{key, count}]` with `--summary`); helpers in **`src/group-by.ts`** (`groupRowsBy`, `firstDirectory`, `loadCodeowners`, `discoverWorkspaceRoots`, `makePackageBucketizer`, `codeownersGlobToRegex`). CODEOWNERS lookup is last-match-wins (GitHub semantics); workspace discovery reads `package.json` `workspaces` and `pnpm-workspace.yaml` `packages:`. **`--save-baseline[=]`** snapshots the result to the **`query_baselines`** table inside `/index.db` (default `.codemap/index.db`; no parallel JSON files; survives `--full` / SCHEMA bumps because the table is intentionally absent from `dropAll()`); name defaults to `--recipe` id, ad-hoc SQL needs an explicit name. **`--baseline[=]`** replays the SQL, fetches the saved row set, and emits `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (or `{baseline:{...}, current_row_count, added: N, removed: N}` with `--summary`); identity is per-row multiset equality (canonical `JSON.stringify` keyed frequency map — duplicate rows are tracked, not collapsed). No fuzzy "changed" category in v1. **`--group-by` is mutually exclusive** with both `--save-baseline` and `--baseline` (different output shapes). **`--baselines`** (read-only list) and **`--drop-baseline `** complete the surface; helpers in **`src/db.ts`** (`upsertQueryBaseline`, `getQueryBaseline`, `listQueryBaselines`, `deleteQueryBaseline`). **Per-row recipe `actions`** are appended only when the user runs **`--recipe `** with **`--json`** AND the recipe defines an `actions` template — programmatic `cm.query(sql)` and ad-hoc CLI SQL never carry actions; under `--baseline`, actions attach to `added` rows only (the rows the agent should act on). The **`components-by-hooks`** recipe ranks by hook count with a **comma-based tally** on **`hooks_used`** (no SQLite JSON1). Shipped **`templates/agents/`** documents **`codemap query --json`** as the primary agent example ([README § CLI](../README.md#cli)). +**Query wiring:** Ad-hoc and recipe CLI SQL runs through **`printQueryResult`** in **`src/application/index-engine.ts`**, which sets **`PRAGMA query_only = 1`** before execute (parity with **`queryRows`** / **`executeQuery`**). **`--format`** outputs (SARIF, badge, …) route through **`printFormattedQuery`** → **`queryRows`** — same read-only guard. **`src/cli/cmd-query.ts`** (argv, `--recipe` / `-r` alias, **`--summary`**, **`--changed-since`**, **`--group-by`**, **`--save-baseline`** / **`--baseline`** / **`--baselines`** / **`--drop-baseline`**, **`--ci`** (aliases `--format sarif` + non-zero exit on findings + quiet)), **`src/application/query-recipes.ts`** (**`QUERY_RECIPES`** — recipe registry proxy over bundled + project-local recipes; optional **`actions: RecipeAction[]`** per recipe), **`src/cli/main.ts`** (**`--recipes-json`** / **`--print-sql`** exit before config/DB). With **`--json`**, errors use **`{"error":"…"}`** on stdout for SQL failures, DB open, and bootstrap (same shape); **`runQueryCmd`** sets **`process.exitCode`** instead of **`process.exit`**. Friendlier "no `.codemap/index.db`" — `no such table: ` and `no such column: ` errors are rewritten in **`enrichQueryError`** to point at `codemap` / `codemap --full`. **`--summary`** filters output only — the SQL still executes against the index; output collapses to `{"count": N}` (with `--json`) or `count: N`. **`--changed-since `** post-filters result rows by `path` / `file_path` / `from_path` / `to_path` / `resolved_path` against `git diff --name-only ...HEAD ∪ git status --porcelain` (helper: **`src/git-changed.ts`** — `getFilesChangedSince`, `filterRowsByChangedFiles`, `PATH_COLUMNS`); rows with no recognised path column pass through. **`--group-by `** (`owner` | `directory` | `package`) routes through **`runGroupedQuery`** in `cmd-query.ts` and emits `{"group_by": "", "groups": [{key, count, rows}]}` (or `[{key, count}]` with `--summary`); helpers in **`src/group-by.ts`** (`groupRowsBy`, `firstDirectory`, `loadCodeowners`, `discoverWorkspaceRoots`, `makePackageBucketizer`, `codeownersGlobToRegex`). CODEOWNERS lookup is last-match-wins (GitHub semantics); workspace discovery reads `package.json` `workspaces` and `pnpm-workspace.yaml` `packages:`. **`--save-baseline[=]`** snapshots the result to the **`query_baselines`** table inside `/index.db` (default `.codemap/index.db`; no parallel JSON files; survives `--full` / SCHEMA bumps because the table is intentionally absent from `dropAll()`); name defaults to `--recipe` id, ad-hoc SQL needs an explicit name. **`--baseline[=]`** replays the SQL, fetches the saved row set, and emits `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (or `{baseline:{...}, current_row_count, added: N, removed: N}` with `--summary`); identity is per-row multiset equality (canonical `JSON.stringify` keyed frequency map — duplicate rows are tracked, not collapsed). No fuzzy "changed" category in v1. **`--group-by` is mutually exclusive** with both `--save-baseline` and `--baseline` (different output shapes). **`--baselines`** (read-only list) and **`--drop-baseline `** complete the surface; helpers in **`src/db.ts`** (`upsertQueryBaseline`, `getQueryBaseline`, `listQueryBaselines`, `deleteQueryBaseline`). **Per-row recipe `actions`** are appended only when the user runs **`--recipe `** with **`--json`** AND the recipe defines an `actions` template — programmatic `cm.query(sql)` and ad-hoc CLI SQL never carry actions; under `--baseline`, actions attach to `added` rows only (the rows the agent should act on). The **`components-by-hooks`** recipe ranks by hook count with a **comma-based tally** on **`hooks_used`** (no SQLite JSON1). Shipped **`templates/agents/`** documents **`codemap query --json`** as the primary agent example ([README § CLI](../README.md#cli)). **Output formatters:** **`src/application/output-formatters.ts`** — pure transport-agnostic; **`formatSarif`** emits SARIF 2.1.0 (auto-detected location columns: `file_path` / `path` / `to_path` / `from_path` priority + optional `line_start` / `line_end` region; `rule.id = codemap.` for `--recipe`, `codemap.adhoc` for ad-hoc SQL; aggregate recipes without locations → `results: []` + stderr warning); **`formatAuditSarif`** emits the audit-shaped variant — one rule per delta key (`codemap.audit.-added`), one result per `added` row at severity `warning`; `removed` rows excluded (SARIF surfaces findings, not cleanups); location-only rows fall back to `"new : "` messages; **`formatAnnotations`** emits `::notice file=…,line=…::msg` GitHub Actions workflow commands (one line per locatable row; messages collapsed to a single line because the GH parser stops at the first newline); **`formatCodeClimate`** emits a GitLab Code Quality JSON array (`severity: minor` flat in v1; stable SHA-256 fingerprints from recipe id + path + line + check name + row message (`lines.begin` falls back to `1` when `line_start` absent)); **`formatBadge`** / **`formatBadgeJson`** emit a single-line markdown summary (`codemap: N issues` / `codemap: clean`) or `codemap-badge/v1` JSON (`--badge-style json` / MCP `badge_style`) from locatable-row count — agents triage via JSON rows, not badge severity; **`formatMermaid`** emits a `flowchart LR` from `{from, to, label?, kind?}` rows with a hard `MERMAID_MAX_EDGES = 50` ceiling — unbounded inputs reject with a scope-suggestion error naming the recipe + count + `LIMIT` / `--via` / `WHERE` knobs (auto-truncation deliberately out of scope; would be a verdict masquerading as output mode); **`formatDiff`** emits read-only unified diff text from `{file_path, line_start, before_pattern, after_pattern}` rows; **`formatDiffJson`** emits structured `{files, warnings, summary}` hunks for agents. Diff formatters read source files at format time and surface `stale` / `missing` flags when the indexed line no longer matches. Wired into both **`src/cli/cmd-query.ts`** (`--format `; `--format` overrides `--json`; formatted outputs reject `--summary` / `--group-by` / baseline at parse time) and the MCP **`query`** / **`query_recipe`** tools (`format: "sarif" | "annotations" | "mermaid" | "diff" | "diff-json" | "codeclimate" | "badge"` with the same incompatibility guard). Per-recipe `sarifLevel` / `sarifMessage` / `sarifRuleId` overrides via frontmatter on `.md` deferred to v1.x. -**Validate wiring:** **`src/cli/cmd-validate.ts`** (argv + render) + **`src/application/validate-engine.ts`** (engine — **`computeValidateRows`** + **`toProjectRelative`**). `computeValidateRows` is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed`. CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. Also reused by `cmd-show.ts` / `cmd-snippet.ts` and the MCP show/snippet handlers — single canonical implementation. +**Validate wiring:** **`src/cli/cmd-validate.ts`** (argv + render) + **`src/application/validate-engine.ts`** (engine — **`computeValidateRows`** + **`toProjectRelative`**). `computeValidateRows` is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed | rejected` (`rejected` + optional `reason` when a path escapes the project root or resolves outside via symlink — same containment helpers as apply). CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. Also reused by `cmd-show.ts` / `cmd-snippet.ts` and the MCP show/snippet handlers — single canonical implementation. #### Audit wiring @@ -204,7 +204,7 @@ Three **mutually exclusive** CLI entry shapes; all converge on `applyDiffPayload **MCP wiring:** **`src/cli/cmd-mcp.ts`** (argv — `--watch` / `--no-watch` / `--debounce` + `--help`; bootstrap absorbs `--root`/`--config`) + **`src/application/mcp-server.ts`** (transport — tool / resource registry, SDK glue). Mirrors the `cmd-audit.ts ↔ audit-engine.ts` seam — CLI parses + lifecycle; engine owns the SDK. **`runMcpServer`** bootstraps codemap once at server boot (config + resolver + DB access become module-level state), instantiates `McpServer` from **`@modelcontextprotocol/sdk`**, attaches a **`StdioServerTransport`**, and resolves on client disconnect via **`src/application/session-lifecycle.ts`** (`createStdioDisconnectMonitor` — stdin EOF, stdout EPIPE, parent-PID poll — plus SDK `transport.onclose` and SIGINT/SIGTERM). With `--watch`, **`createManagedWatchSession`** holds one client for the stdio session and **`forceStop`** drains the watcher on exit. Tool handlers reuse the existing engine entry-points: **`query`** / **`query_recipe`** call **`executeQuery`** in **`src/application/query-engine.ts`** (same `[...rows]` / `{count}` / `{group_by, groups}` envelope `--json` would print) unless **`baseline`** is set — then **`compareQueryBaseline`** in **`src/application/query-baseline.ts`** (incompatible with non-`json` **`format`** / **`group_by`**); **`ingest_coverage`** calls **`runIngestCoverageOnDb`** in **`src/application/ingest-coverage-run.ts`** (CLI twin: `codemap ingest-coverage --json`); **`query_batch`** loops per statement via **`handleQueryBatch`** → **`executeQuery`** (batch-wide defaults + per-item overrides; items are `string | {sql, summary?, changed_since?, group_by?}`); **`audit`** runs `resolveAuditBaselines` + `runAudit` from PR #33 unchanged; **`context`** / **`validate`** call `buildContextEnvelope` / `computeValidateRows` from **`src/application/context-engine.ts`** + **`src/application/validate-engine.ts`** (lifted out of `src/cli/cmd-*.ts` in PR #41 — see § Tool / resource handlers above). **`save_baseline`** is one polymorphic tool (`{name, sql? | recipe?}`) with a runtime exclusivity check — mirrors the CLI's single `--save-baseline=` verb. **Tool naming**: snake_case throughout — Codemap convention matching the patterns in MCP spec examples and reference servers (GitHub MCP, Cursor built-ins); the spec itself doesn't mandate it. CLI stays kebab — translation lives at the MCP-arg layer. **Resources** split by freshness contract: `codemap://schema`, `codemap://skill`, `codemap://rule`, and `codemap://mcp-instructions` use **lazy memoisation** — first `read_resource` populates a per-server-instance cache; constant for the server-process lifetime so eager-vs-lazy produce identical observable behavior. `codemap://recipes`, `codemap://recipes/{id}`, `codemap://files/{+path}`, and `codemap://symbols/{name}` are **live read-per-call** (no cache) so inline recency fields and index mutations under `--watch` don't freeze at first-read. `codemap://schema` queries `sqlite_schema` live (on first read, then cached); `codemap://skill` / `codemap://rule` / `codemap://mcp-instructions` call `assembleAgentContent(kind)` from `application/agent-content.ts`, which concatenates section files under `templates/agent-content//` and dispatches `*.gen.md` files through `RENDERERS` (live recipe catalog, live `createTables()` DDL) — see [agents.md § Section assembler](./agents.md#section-assembler-and-genmd). Output shape: each tool returns the JSON payload its CLI counterpart would print (`query batch`, `trace`, `explore`, `node`, `file`, `schema`, `context --include-snippets`, `ingest-coverage`); MCP wraps via `content: [{type: "text", text: JSON.stringify(payload)}]`. **`tools/list` ToolAnnotations** — advisory `readOnlyHint` / `destructiveHint` / `idempotentHint` per tool from **`src/application/mcp-tool-annotations.ts`** (central map beside **`mcp-tool-allowlist.ts`**); read paths (`query`, `show`, `audit`, …) → `readOnlyHint: true`; disk-write apply tools → `destructiveHint: true` (writes still require `yes: true`); index user-data mutators (`save_baseline`, `drop_baseline`, `ingest_coverage`) → `readOnlyHint: false` without `destructiveHint`. Omitted when an older `@modelcontextprotocol/sdk` lacks annotation fields (M.6 guard). `--changed-since` git lookups are memoised per `(root, ref)` pair across batch items so a `query_batch` of N items sharing the same ref does one git invocation, not N. Per-statement errors in `query_batch` are isolated — failed statements return `{error}` in their slot while siblings still execute. -**HTTP wiring:** **`src/cli/cmd-serve.ts`** (argv — `--host` / `--port` / `--token`; bootstrap absorbs `--root`/`--config`) + **`src/application/http-server.ts`** (transport — bare `node:http`; routes `POST /tool/{name}` to `tool-handlers`, `GET /resources/{encoded-uri}` to `resource-handlers`, plus `GET /health` / `GET /tools` / `GET /resources`). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`). Optional **`--token `** requires `Authorization: Bearer ` on every request; `GET /health` is auth-exempt so liveness probes work without leaking the token. **CSRF + DNS-rebinding guard** (`csrfCheck`) runs before every route — rejects `Sec-Fetch-Site: cross-site` / `same-site` (modern-browser CSRF), any present `Origin` header (including the opaque string `null`; older-browser CSRF fallback), and `Host` header mismatch on loopback bind (DNS rebinding). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) don't send those headers and pass through. The guard runs even on `/health` so a malicious local webpage can't probe for liveness. Output shape: HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper — HTTP doesn't need that transport artifact); `query` / `query_recipe` match `codemap query --json` row arrays (or `{count}` / `{group_by,groups}` when `summary` / `group_by` is set, or baseline diff when `baseline` is set — incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); other tools match their CLI `--json` envelopes; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` / `"badge"` (markdown) as `text/plain; charset=utf-8`, `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` as `application/json; charset=utf-8`, JSON otherwise. Per-request DB lifecycle: open / `PRAGMA query_only = 1` / close per call (SQLite reader concurrency); 1 MiB request-body cap rejects trivial DoS. **`GET /tools`** returns the same advisory hint fields as MCP `tools/list` (`readOnlyHint` / `destructiveHint` / `idempotentHint` per entry via **`buildHttpToolCatalogEntry`**). SIGINT / SIGTERM → graceful drain via `server.close()`. Every response carries **`X-Codemap-Version: `** so consumers can pin / detect upgrades. +**HTTP wiring:** **`src/cli/cmd-serve.ts`** (argv — `--host` / `--port` / `--token`; bootstrap absorbs `--root`/`--config`) + **`src/application/http-server.ts`** (transport — bare `node:http`; routes `POST /tool/{name}` to `tool-handlers`, `GET /resources/{encoded-uri}` to `resource-handlers`, plus `GET /health` / `GET /tools` / `GET /resources`). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`). **`--token `** is optional on loopback; **mandatory** when binding a non-loopback address. When set, requires `Authorization: Bearer ` on every request; `GET /health` is auth-exempt so liveness probes work without leaking the token. **CSRF + DNS-rebinding guard** (`csrfCheck`) runs before every route — rejects `Sec-Fetch-Site: cross-site` / `same-site` (modern-browser CSRF), any present `Origin` header (including the opaque string `null`; older-browser CSRF fallback), and `Host` header mismatch on loopback bind (DNS rebinding). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) don't send those headers and pass through. The guard runs even on `/health` so a malicious local webpage can't probe for liveness. Output shape: HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper — HTTP doesn't need that transport artifact); `query` / `query_recipe` match `codemap query --json` row arrays (or `{count}` / `{group_by,groups}` when `summary` / `group_by` is set, or baseline diff when `baseline` is set — incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); other tools match their CLI `--json` envelopes; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` / `"badge"` (markdown) as `text/plain; charset=utf-8`, `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` as `application/json; charset=utf-8`, JSON otherwise. Per-request DB lifecycle: open / `PRAGMA query_only = 1` / close per call (SQLite reader concurrency); 1 MiB request-body cap rejects trivial DoS. **`GET /tools`** returns the same advisory hint fields as MCP `tools/list` (`readOnlyHint` / `destructiveHint` / `idempotentHint` per entry via **`buildHttpToolCatalogEntry`**). SIGINT / SIGTERM → graceful drain via `server.close()`. Every response carries **`X-Codemap-Version: `** so consumers can pin / detect upgrades. **Watch wiring:** **`src/cli/cmd-watch.ts`** (argv — `--debounce ` / `--quiet`; bootstrap absorbs `--root`/`--config`) + **`src/application/watcher.ts`** (engine — pure debouncer + glob filter + injectable backend; production wires [chokidar v5](https://github.com/paulmillr/chokidar) selected via the 6-watcher audit in PR #46 — pure JS, runs identically on Bun + Node, ~30M repos use it). On every change/add/unlink event chokidar emits, the engine filters via `shouldIndexPath` (same indexed extensions as the indexer + project-local recipes; skips `node_modules` / `.git` / `dist`), debounces with a sliding window (default 250 ms), then calls `createReindexOnChange` which opens a DB, runs `runCodemapIndex({mode: 'files', files: [...changed]})`, closes the DB, and logs `reindex N file(s) in Mms` to stderr unless `--quiet`. SIGINT / SIGTERM drains pending edits via `flushNow()` before the watcher closes. **Default-ON for `mcp` / `serve` since 2026-05:** both transports embed the watcher via **`createManagedWatchSession`** in **`session-lifecycle.ts`** — MCP holds one client for the stdio session; HTTP acquires per request (excluding `/health`) and stops the watcher after the last client plus a 5s release grace (not an MCP idle shutdown). Opt out with `--no-watch`, `CODEMAP_WATCH=0`, or `CODEMAP_NO_WATCH=1`. **`src/application/watch-policy.ts`** disables the watcher on WSL2 Windows drive mounts (`/mnt/*`) unless `CODEMAP_FORCE_WATCH=1`; stderr points at `codemap agents init --git-hooks` for git-triggered freshness. Standalone `codemap watch` runs the watcher decoupled from a transport for users wiring it next to a separate MCP / HTTP process. **Audit prelude optimization:** module-level `watchActive` flag; `handleAudit` skips its incremental-index prelude when active (and marks the close as readonly to avoid a wasted checkpoint). Explicit `no_index: false` still forces the prelude. diff --git a/docs/plans/impact-inpath-homonyms.md b/docs/plans/impact-inpath-homonyms.md new file mode 100644 index 00000000..b075ad74 --- /dev/null +++ b/docs/plans/impact-inpath-homonyms.md @@ -0,0 +1,81 @@ +# PR 2 — impact `inPath` homonym scoping + +> **Status:** open (not started) · **PR:** 2 of 3 · **Effort:** S–M +> +> **Orchestrator:** [`security-hardening-orchestrator.md`](./security-hardening-orchestrator.md) +> +> **Motivator:** `findImpact` resolves homonym symbols but walks call graph by name only — wrong blast-radius. Align with shipped `define_in` (#165) and existing `show`/`trace` `inPath` patterns. Moat B substrate fidelity. + +--- + +## Agent start here + +**Blocked until PR 1 merges.** + +### Key touchpoints + +| File | What | +| -------------------------------------- | ---------------------------------------------------------- | +| `src/application/impact-engine.ts` | `inPath` on `FindImpactOpts`, per-file walks, `scopeFiles` | +| `src/cli/cmd-impact.ts` | `--in ` flag | +| `src/cli/cmd-composers.ts` | MCP/CLI composer wiring | +| `src/application/tool-handlers.ts` | HTTP/MCP `impact` handler | +| `src/application/mcp-server.ts` | Tool schema `in` param | +| `src/application/trace-engine.test.ts` | Homonym test patterns to mirror | + +### Architecture + +```text +findImpact({ target, inPath? }) + → resolveTarget → matched_in[] + → if inPath set and ∉ matched_in → empty + skip reason + → if homonym (|matched_in| > 1) and no inPath → walk per defining file, merge/dedup + → walkCalls: scopeFiles filters call-site file_path +``` + +--- + +## Task list + +| ID | Task | Status | Verify | +| --- | ---------------------------------------------------- | ------- | ------------------------------------------------ | +| 4.1 | `inPath?: string` on `FindImpactOpts` / `findImpact` | pending | `bun test src/application/impact-engine.test.ts` | +| 4.2 | Multi `matched_in` → per-file walks; merge/dedup | pending | homonym fixture | +| 4.3 | `inPath` ∉ `matched_in` → empty + skip reason | pending | test | +| 4.4 | Walkers: `scopeFiles` on call-site file | pending | test | +| 4.5 | CLI `codemap impact --in ` | pending | `bun test src/cli/cmd-impact.test.ts` | +| 4.6 | MCP/HTTP `impact` `in` param | pending | MCP tests | +| 4.7 | Doc lift (architecture § impact) | pending | format check | +| 4.s | Commit + PR + CI | pending | `bun run check` | + +--- + +## Pre-locked decisions + +| # | Decision | +| ---- | ------------------------------------------------------------------------------------------------------- | +| P2.1 | `inPath` semantics match `show-engine` prefix/exact rules (not `define_in` — that's write-side anchor). | +| P2.2 | Unscoped homonym → union per-file walks, not silent name-level merge. | +| P2.3 | Moat A safe — still composable graph envelope, not a verdict primitive. | + +--- + +## Acceptance + +- [ ] Homonym: unscoped walk unions per-defining-file graphs +- [ ] `inPath` outside `matched_in` → empty matches + skip reason +- [ ] CLI `--in` and MCP `in` wired +- [ ] PR merged to `main` + +### Verify + +```bash +bun test src/application/impact-engine.test.ts src/cli/cmd-impact.test.ts +bun run check +``` + +--- + +## Lifecycle + +**Close when:** PR merged. Delete this file; lift to `docs/architecture.md` § impact; update orchestrator session log. diff --git a/docs/plans/runtime-test-isolation.md b/docs/plans/runtime-test-isolation.md new file mode 100644 index 00000000..068150a5 --- /dev/null +++ b/docs/plans/runtime-test-isolation.md @@ -0,0 +1,75 @@ +# PR 3 — runtime guards & test isolation + +> **Status:** open (not started) · **PR:** 3 of 3 · **Effort:** S–M +> +> **Orchestrator:** [`security-hardening-orchestrator.md`](./security-hardening-orchestrator.md) +> +> **Motivator:** Codify one-root-per-process constraint; stop silent `initCodemap` root bleed in tests; fail-fast invalid config at load. Maintainer-heavy; small user-visible API change (`createCodemap` second root throws). + +--- + +## Agent start here + +**Blocked until PR 1 merges** (PR 2 optional beforehand). + +### Key touchpoints + +| File | What | +| ----------------------------------- | --------------------------------------------------- | +| `src/runtime-swap.ts` | Audit worktree root bracket (new) | +| `src/runtime.ts` | Throw on root switch | +| `src/resolver.ts` | Resolver reset / guard | +| `src/test-helpers/runtime-reset.ts` | `resetCodemapForTest`, `installCodemapTestTeardown` | +| `src/application/audit-engine.ts` | `makeWorktreeReindex` bracket | +| `src/config.ts` / `state-config.ts` | `loadUserConfig` validation | +| `src/api.ts` | Doc: throws vs last-wins | + +### Suites needing teardown rollout (grep `initCodemap`) + +`churn-ingest.test.ts`, `context-engine.test.ts`, `trace-engine.test.ts`, `worker-pool.dist.test.ts`, `cmd-affected` tests, `recipe-recency.test.ts`, `benchmark-config.test.ts`, `agents-init.test.ts`, … — complete list in PR diff. + +--- + +## Task list + +| ID | Task | Status | Verify | +| --- | -------------------------------------------------------- | ------- | ------------------------------ | +| 5.1 | `runtime-swap.ts` + audit worktree bracket | pending | `bun test src/runtime.test.ts` | +| 5.2 | `initCodemap` / `configureResolver` throw on root switch | pending | runtime tests | +| 5.3 | `resetCodemapForTest` + `installCodemapTestTeardown` | pending | — | +| 5.4 | Teardown rollout on `initCodemap` test suites | pending | affected `*.test.ts` | +| 5.5 | `loadUserConfig` → `parseCodemapUserConfig` at load | pending | `bun test src/config.test.ts` | +| 5.6 | `api.ts` + architecture: throws-on-root-switch | pending | — | +| 5.s | Commit + PR + CI | pending | `bun run check` | + +--- + +## Pre-locked decisions + +| # | Decision | +| ---- | ---------------------------------------------------------------------------------- | +| P3.1 | Audit `--base` worktree reindex is the **only** exempt root switch (swap bracket). | +| P3.2 | `createCodemap({ root: B })` after root A **throws** — document breaking tighten. | +| P3.3 | Teardown helper is maintainer-only; not a consumer surface. | + +--- + +## Acceptance + +- [ ] Second `initCodemap` with different root throws (audit exempt) +- [ ] Invalid explicit config fails at `loadUserConfig` +- [ ] Teardown on all `initCodemap` suites touched in PR +- [ ] PR merged to `main` + +### Verify + +```bash +bun test src/runtime.test.ts src/config.test.ts +bun run check +``` + +--- + +## Lifecycle + +**Close when:** PR merged. Delete this file; lift to `docs/architecture.md`; update orchestrator session log. diff --git a/docs/plans/security-hardening-orchestrator.md b/docs/plans/security-hardening-orchestrator.md new file mode 100644 index 00000000..175a8082 --- /dev/null +++ b/docs/plans/security-hardening-orchestrator.md @@ -0,0 +1,87 @@ +# Security hardening — task orchestrator + +> **Status:** open · **Priority:** P2 +> +> **Roadmap:** [§ Core substrate & platform](../roadmap.md#core-substrate--platform) +> +> **Motivator:** Unmerged hardening (query safety, HTTP bind, validate containment, impact homonyms, runtime isolation) split into **3 tracer-bullet PRs** with **one plan file each**. + +--- + +## Agent start here + +1. Read **§ PR schedule** below for status. +2. Open the **plan file** for the PR you are working on (implementation detail lives there). +3. Work **one PR at a time** from current `main`. +4. After merge: update **PR schedule** + **§ Session log** here; check acceptance in that PR's plan; close/delete that plan per its lifecycle. + +**Program non-goals:** atomic `state-config` writes, golden `schema.test.ts` hardening (unless touching query-golden), `SCHEMA_VERSION` 40 debate, streaming git log parser. + +--- + +## PR schedule + +| PR | Plan | Status | Blocks | +| ----- | -------------------------------------------------------------- | ---------------------------- | ----------------------------------- | +| **1** | [`security-hardening-wave1.md`](./security-hardening-wave1.md) | **implemented, uncommitted** | — | +| **2** | [`impact-inpath-homonyms.md`](./impact-inpath-homonyms.md) | **pending** | PR **1** merged | +| **3** | [`runtime-test-isolation.md`](./runtime-test-isolation.md) | **pending** | PR **1** merged (PR **2** optional) | + +| — | — | **deferred** | golden `schema.test.ts` + path guards | +| — | — | **skip** | atomic `ensureStateConfig` writes | + +**Exit rule:** one PR = one tracer bullet; `bun run check` green; update this file before starting the next PR. + +--- + +## ROI triage (program-level) + +Evaluated 2026-06 against [roadmap § Floors](../roadmap.md#floors-v1-product-shape) and [Moat A](../roadmap.md#moats-load-bearing). + +| Slice | Roadmap | ROI | PR | Verdict | +| --------------------------------------- | ------------------------ | ------------ | ----- | ------- | +| `printFormattedQuery` → `queryRows` | ✅ safety floor | **Good** S | **1** | Ship | +| Non-loopback `serve` requires `--token` | ✅ HTTP floor | **Good** S | **1** | Ship | +| `validate` `rejected` + symlink guard | ✅ safety floor | **Good** S | **1** | Ship | +| `impact` `inPath` homonym scoping | ✅ Moat B read primitive | **Good** S–M | **2** | Ship | +| Runtime root-switch guards + teardown | ✅ one-root architecture | **Good** S–M | **3** | Ship | +| Golden `schema.test.ts` | ⚠️ harness | **Medium** | — | Defer | +| Atomic `state-config` writes | ❌ | **Bad** | — | Skip | + +**Already on `main` (do not re-ship):** bug batch `54ad25a`, apply path containment `#112`, `query_only` on `queryRows`/`printQueryResult`, recipe CTE deny-list. + +--- + +## Program decisions (all PRs) + +| # | Decision | +| --- | ------------------------------------------------------------------------- | +| O.1 | Three PRs — security / impact / runtime independently reviewable. | +| O.2 | No new verdict primitives — `rejected` is a validate row status (Moat A). | +| O.3 | Each PR ships only its plan scope — no drive-by refactors. | + +--- + +## Session log + +| Date | Event | Notes | +| ---------- | ---------- | ----------------------------------------------------------------------------- | +| 2026-06-10 | Triage | ROI on 7 slices; 3-PR program adopted. | +| 2026-06-10 | PR 1 impl | PR **1** code complete; **not committed**. Docs: orchestrator + per-PR plans. | +| — | PR 1 merge | _PR URL · merge SHA · update plan status → closed_ | +| — | PR 2 start | _from `main`_ | +| — | PR 2 merge | _fill_ | +| — | PR 3 start | _from `main`_ | +| — | PR 3 merge | _fill · close orchestrator_ | + +--- + +## Program lifecycle + +**Close when:** PRs **1–3** merged (or remaining PRs explicitly deferred in roadmap with reason). + +**On close:** + +1. Delete this orchestrator + all three PR plan files. +2. Lift durable contracts to `docs/architecture.md`. +3. Remove or check off roadmap backlog bullet. diff --git a/docs/plans/security-hardening-wave1.md b/docs/plans/security-hardening-wave1.md new file mode 100644 index 00000000..dd76912b --- /dev/null +++ b/docs/plans/security-hardening-wave1.md @@ -0,0 +1,85 @@ +# PR 1 — query / serve / validate hardening + +> **Status:** open (implemented, uncommitted) · **PR:** 1 of 3 · **Effort:** S +> +> **Orchestrator:** [`security-hardening-orchestrator.md`](./security-hardening-orchestrator.md) +> +> **Motivator:** Close last CLI `query_only` gap on formatted query output; require auth on non-loopback `serve`; reject unsafe paths in `validate`. Roadmap safety floors — not new product verbs. + +--- + +## Agent start here + +### Key touchpoints + +| File | What | +| ------------------------------------- | ----------------------------------- | +| `src/cli/cmd-query.ts` | `printFormattedQuery` → `queryRows` | +| `src/cli/cmd-query-formatted.test.ts` | SARIF DML guard E2E | +| `src/cli/cmd-serve.ts` | `isLoopbackHost`, token gate | +| `src/application/path-containment.ts` | `pathTraversesSymlinkOutsideRoot` | +| `src/application/validate-engine.ts` | `rejected` status | +| `docs/architecture.md` | Query / HTTP / validate wiring | + +### Architecture + +```text +codemap query --format sarif|… + → printFormattedQuery → queryRows (PRAGMA query_only = 1) + +codemap serve --host 0.0.0.0 + → parseServeRest: require --token when !isLoopbackHost(host) + +codemap validate + → computeValidateRows → rejectValidatePath + → pathEscapesProjectRoot | pathTraversesSymlinkOutsideRoot → status rejected +``` + +--- + +## Task list + +| ID | Task | Status | Verify | +| --- | ---------------------------------------------------------- | ----------- | --------------------------------------------------- | +| 1.1 | `printFormattedQuery` → `queryRows` | **done** | `bun test src/cli/cmd-query-formatted.test.ts` | +| 1.2 | `cmd-query-formatted.test.ts` | **done** | same | +| 2.1 | `isLoopbackHost` + non-loopback `--token` required | **done** | `bun test src/cli/cmd-serve.test.ts` | +| 2.2 | Serve tests updated | **done** | same | +| 3.1 | `pathTraversesSymlinkOutsideRoot`, `resolvePathWithinRoot` | **done** | `bun test src/application/path-containment.test.ts` | +| 3.2 | `validate` `rejected` + `rejectValidatePath` | **done** | `bun test src/cli/cmd-validate.test.ts` | +| 3.3 | Test `../../../etc/passwd` → `rejected` | **done** | same | +| 1.x | `docs/architecture.md` lift | **done** | `bun run format:check` | +| 1.s | Commit + PR + CI | **pending** | `bun run check` | + +--- + +## Pre-locked decisions + +| # | Decision | +| ---- | ------------------------------------------------------------------------- | +| P1.1 | Formatted CLI paths use `queryRows` — completes CHANGELOG #107 hardening. | +| P1.2 | Loopback bind: token optional; non-loopback: token mandatory. | +| P1.3 | `rejected` is per-row status with `reason` — not a global verdict. | + +--- + +## Acceptance + +- [x] `codemap query --format sarif "DELETE FROM …"` errors without mutating `files` count +- [x] `codemap serve --host 0.0.0.0` errors without `--token` +- [x] `computeValidateRows(..., ["../../../etc/passwd"])` → `rejected` + reason +- [x] Architecture docs updated +- [ ] PR merged to `main` + +### Verify + +```bash +bun test src/cli/cmd-query-formatted.test.ts src/cli/cmd-serve.test.ts src/cli/cmd-validate.test.ts src/application/path-containment.test.ts +bun run check +``` + +--- + +## Lifecycle + +**Close when:** PR merged. Delete this file; lift contracts to `docs/architecture.md`; update [`security-hardening-orchestrator.md`](./security-hardening-orchestrator.md) session log. diff --git a/docs/roadmap.md b/docs/roadmap.md index 7513e951..ca0f814b 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -108,6 +108,7 @@ Predicate-as-API only — enrich row shape and audit deltas; no standalone pass/ - [ ] **`organize-imports` diff-shape recipe** — deterministic single-file import sort/group; `imports.line_number` + `source` substrate sufficient. Review-first (`auto_fixable: false`). Effort: S. - [ ] **`codemap-to-tsmorph` Path B adapter** — separate package experiment: `query_recipe` discovery → `ts-morph` / `jscodeshift` transforms for AST-shape edits codemap's substring executor defers (see [architecture § Rejected apply-path alternatives](./architecture.md#apply--input-modes-transport-and-policy)). Not an in-tree AST writer (Path A rejected). Effort: M. - [ ] **Apply write-safety hardening** — close apply TOCTOU: SHA-256 `hashContent` at phase-1 read, recheck disk hash immediately before phase-2 write (`file content changed` conflict); `fsync` temp file before `rename`; skip files with mixed CRLF/LF (`mixed line endings`). Preserves all-or-nothing on any conflict. Plan: [`plans/apply-write-safety.md`](./plans/apply-write-safety.md). Effort: L. +- [ ] **Read-surface hardening (3 PRs)** — query/HTTP/validate safety, `impact` `inPath` homonyms, runtime guards + test teardown. **Orchestrator:** [`plans/security-hardening-orchestrator.md`](./plans/security-hardening-orchestrator.md). Plans: [PR1](./plans/security-hardening-wave1.md) · [PR2](./plans/impact-inpath-homonyms.md) · [PR3](./plans/runtime-test-isolation.md). Effort: S–M. - [ ] **`history` table** (deferred — revisit-triggered) — temporal queries: "when did symbol X get `@deprecated`?", "coverage trend over last 50 commits", "files that became dead this week". `audit --base ` covers the most-common temporal question (PR-scoped diff) without schema growth, so the table earns its place only when bigger questions emerge. Two shapes (per-commit snapshots ~N × DB size; append-only event log heavier CTE walks); both pay an N-reindexes backfill cost (~30s per reindex). **Revisit triggers:** two consumers ship `jq`-based "audit-runs-over-time" workflows, OR `query_baselines` evolution becomes a recurring agent need. - [ ] **`codemap audit` verdict + thresholds** (v1.x) — `verdict: "pass" | "warn" | "fail"` driven by an `audit.deltas[].{added_max, action}` field on the config object (`.codemap/config.{ts,js,json}`). Triggers: two consumers ship `jq`-based threshold scripts with similar shapes, OR one consumer asks with a concrete config sketch. Until then, raw deltas + consumer-side `jq` is the CI exit-code idiom. **Likely accelerant:** the Marketplace Action (next item) shipping is the most plausible path to firing the trigger — once `- uses: stainless-code/codemap@v1` is the dominant CI path, real `jq` threshold scripts will surface. - [ ] **GitHub Marketplace Action — publish + listing finish** — core Action implementation is in-tree: root `action.yml`, `query --ci`, `audit --format sarif` / `--ci`, package-manager detection, dogfood smoke, and opt-in `pr-comment` summary renderer have shipped. Remaining work is the release/listing slice: `MARKETPLACE.md`, `v1.0.0` / floating `v1` tags, Marketplace setup, sacrificial-repo smoke, and making `action-smoke` blocking once the Action tag exists. Action version stream is independent of CLI version (`package.json` currently drives CLI/npm version; Action publishes at its own `v1.0.0`). Plan: [`plans/github-marketplace-action.md`](./plans/github-marketplace-action.md). Effort: S. diff --git a/src/application/path-containment.test.ts b/src/application/path-containment.test.ts index afeb850e..8960cbd6 100644 --- a/src/application/path-containment.test.ts +++ b/src/application/path-containment.test.ts @@ -1,9 +1,13 @@ import { describe, expect, it } from "bun:test"; -import { mkdtempSync } from "node:fs"; +import { mkdirSync, mkdtempSync, symlinkSync, writeFileSync } from "node:fs"; import { tmpdir } from "node:os"; import { join } from "node:path"; -import { projectRelativePathFromResolved } from "./path-containment"; +import { + pathTraversesSymlinkOutsideRoot, + projectRelativePathFromResolved, + resolvePathWithinRoot, +} from "./path-containment"; describe("projectRelativePathFromResolved", () => { it("rejects sibling paths that only share a string prefix with the root", () => { @@ -19,3 +23,53 @@ describe("projectRelativePathFromResolved", () => { expect(projectRelativePathFromResolved(root, inside)).toBe("src/a.ts"); }); }); + +describe("pathTraversesSymlinkOutsideRoot", () => { + it("returns false for symlinks that stay inside the project root", () => { + const root = mkdtempSync(join(tmpdir(), "codemap-symlink-in-")); + writeFileSync(join(root, "real.ts"), "export const x = 1;\n"); + symlinkSync(join(root, "real.ts"), join(root, "link.ts")); + expect(pathTraversesSymlinkOutsideRoot(root, join(root, "link.ts"))).toBe( + false, + ); + }); + + it("returns true when a path component symlinks outside the project root", () => { + const base = mkdtempSync(join(tmpdir(), "codemap-symlink-out-")); + const root = join(base, "proj"); + const outside = join(base, "outside"); + mkdirSync(root, { recursive: true }); + mkdirSync(outside, { recursive: true }); + writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); + symlinkSync(join(outside, "secret.ts"), join(root, "escape.ts")); + expect(pathTraversesSymlinkOutsideRoot(root, join(root, "escape.ts"))).toBe( + true, + ); + }); +}); + +describe("resolvePathWithinRoot", () => { + it("returns the resolved absolute path for safe relative paths", () => { + const root = mkdtempSync(join(tmpdir(), "codemap-resolve-")); + mkdirSync(join(root, "src"), { recursive: true }); + expect(resolvePathWithinRoot(root, "src/a.ts")).toBe( + join(root, "src", "a.ts"), + ); + }); + + it("returns null when a relative path escapes via ..", () => { + const root = mkdtempSync(join(tmpdir(), "codemap-resolve-escape-")); + expect(resolvePathWithinRoot(root, "../../../etc/passwd")).toBeNull(); + }); + + it("returns null when a symlink target resolves outside the root", () => { + const base = mkdtempSync(join(tmpdir(), "codemap-resolve-symlink-")); + const root = join(base, "proj"); + const outside = join(base, "outside"); + mkdirSync(root, { recursive: true }); + mkdirSync(outside, { recursive: true }); + writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); + symlinkSync(join(outside, "secret.ts"), join(root, "escape.ts")); + expect(resolvePathWithinRoot(root, "escape.ts")).toBeNull(); + }); +}); diff --git a/src/application/path-containment.ts b/src/application/path-containment.ts index c258be0d..f889e77d 100644 --- a/src/application/path-containment.ts +++ b/src/application/path-containment.ts @@ -1,4 +1,5 @@ -import { isAbsolute, resolve, sep } from "node:path"; +import { lstatSync, realpathSync } from "node:fs"; +import { isAbsolute, join, relative, resolve, sep } from "node:path"; /** `true` iff `resolve(resolvedRoot, candidate)` lands inside `resolvedRoot`. */ export function isWithinProjectRoot( @@ -48,3 +49,53 @@ export function projectRelativePathFromResolved( if (!isWithinProjectRoot(resolvedRoot, abs)) return null; return canonicalizeProjectFilePath(resolvedRoot, abs); } + +/** + * `true` when any path component under `projectRoot` is a symlink whose target + * resolves outside the project root (apply/diff/validate follow symlinks on read). + */ +export function pathTraversesSymlinkOutsideRoot( + projectRoot: string, + absPath: string, +): boolean { + const resolvedRoot = resolve(projectRoot); + const resolvedTarget = resolve(absPath); + if (!isWithinProjectRoot(resolvedRoot, resolvedTarget)) return true; + + let rootReal: string; + try { + rootReal = realpathSync(resolvedRoot); + } catch { + rootReal = resolvedRoot; + } + + let current = resolvedRoot; + const relParts = relative(resolvedRoot, resolvedTarget) + .split(sep) + .filter(Boolean); + for (const part of relParts) { + current = join(current, part); + try { + if (lstatSync(current).isSymbolicLink()) { + const linkReal = realpathSync(current); + if (!isWithinProjectRoot(rootReal, linkReal)) return true; + current = linkReal; + } + } catch { + // Missing tail — string containment already checked. + return false; + } + } + return false; +} + +/** Resolve `relativePath` under `root`; `null` when it escapes the root. */ +export function resolvePathWithinRoot( + root: string, + relativePath: string, +): string | null { + if (pathEscapesProjectRoot(root, relativePath)) return null; + const abs = resolve(root, relativePath); + if (pathTraversesSymlinkOutsideRoot(root, abs)) return null; + return abs; +} diff --git a/src/application/validate-engine.ts b/src/application/validate-engine.ts index 26b7f7fd..19c1ac31 100644 --- a/src/application/validate-engine.ts +++ b/src/application/validate-engine.ts @@ -3,6 +3,10 @@ import { isAbsolute, relative, resolve, sep } from "node:path"; import type { CodemapDatabase } from "../db"; import { hashContent } from "../hash"; +import { + pathEscapesProjectRoot, + pathTraversesSymlinkOutsideRoot, +} from "./path-containment"; /** * One row in the staleness report. `status` distinguishes the three cases an @@ -10,7 +14,9 @@ import { hashContent } from "../hash"; */ export interface ValidateRow { path: string; - status: "stale" | "missing" | "unindexed"; + status: "stale" | "missing" | "unindexed" | "rejected"; + /** Present when `status` is `rejected` (path could not be checked safely). */ + reason?: string; } /** @@ -41,6 +47,12 @@ export function computeValidateRows( if (seen.has(rel)) continue; seen.add(rel); + const rejectReason = rejectValidatePath(projectRoot, rel); + if (rejectReason !== undefined) { + rows.push({ path: rel, status: "rejected", reason: rejectReason }); + continue; + } + const indexedHash = indexByPath.get(rel); const abs = resolve(projectRoot, rel); let source: string | undefined; @@ -73,6 +85,19 @@ export function computeValidateRows( * slashes (tinyglobby / Bun.Glob / git diff all emit POSIX), so we normalize * here to make `indexByPath.get(rel)` succeed cross-platform. */ +function rejectValidatePath( + projectRoot: string, + rel: string, +): string | undefined { + if (pathEscapesProjectRoot(projectRoot, rel)) { + return "path escapes project root"; + } + if (pathTraversesSymlinkOutsideRoot(projectRoot, resolve(projectRoot, rel))) { + return "path escapes via symlink"; + } + return undefined; +} + export function toProjectRelative(projectRoot: string, p: string): string { const rel = isAbsolute(p) ? relative(projectRoot, p) : p; return sep === "/" ? rel : rel.split(sep).join("/"); diff --git a/src/cli/cmd-query-formatted.test.ts b/src/cli/cmd-query-formatted.test.ts new file mode 100644 index 00000000..43a002f9 --- /dev/null +++ b/src/cli/cmd-query-formatted.test.ts @@ -0,0 +1,121 @@ +/** + * End-to-end coverage for formatted query output (`--format sarif|…`). + * Ensures DML is blocked via `queryRows` / PRAGMA query_only on the sarif path. + */ + +import { + afterEach, + beforeAll, + beforeEach, + describe, + expect, + it, +} from "bun:test"; +import { + existsSync, + mkdirSync, + mkdtempSync, + rmSync, + writeFileSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; + +const repoRoot = join(import.meta.dir, "..", ".."); +const indexTs = join(repoRoot, "src", "index.ts"); +let bunBin: string | null = null; + +async function runCli( + args: string[], + envOverride: Record = {}, +): Promise<{ exitCode: number; out: string; err: string }> { + if (bunBin === null) { + throw new Error( + "cmd-query-formatted: bunBin not initialised by beforeAll.", + ); + } + const proc = Bun.spawn([bunBin, indexTs, ...args], { + cwd: repoRoot, + stdout: "pipe", + stderr: "pipe", + env: { ...process.env, ...envOverride }, + }); + const exitCode = await proc.exited; + const out = await new Response(proc.stdout).text(); + const err = await new Response(proc.stderr).text(); + return { exitCode, out, err }; +} + +let projectRoot: string; + +beforeAll(() => { + bunBin = Bun.which("bun"); + if (!bunBin || !existsSync(indexTs)) { + throw new Error( + `cmd-query-formatted: cannot locate Bun (${bunBin}) or src entry (${indexTs}).`, + ); + } +}); + +beforeEach(async () => { + projectRoot = mkdtempSync(join(tmpdir(), "codemap-cli-formatted-")); + mkdirSync(join(projectRoot, "src"), { recursive: true }); + writeFileSync( + join(projectRoot, "src", "keep.ts"), + "export const KEEP = 1;\n", + "utf8", + ); + writeFileSync( + join(projectRoot, "src", "drop.ts"), + "export const DROP = 2;\n", + "utf8", + ); + writeFileSync(join(projectRoot, "package.json"), "{}\n", "utf8"); + const idx = await runCli(["--full"], { CODEMAP_ROOT: projectRoot }); + expect(idx.exitCode).toBe(0); +}); + +afterEach(() => { + rmSync(projectRoot, { recursive: true, force: true }); +}); + +async function fileCount(): Promise { + const r = await runCli( + ["query", "--json", "SELECT COUNT(*) AS n FROM files"], + { CODEMAP_ROOT: projectRoot }, + ); + expect(r.exitCode).toBe(0); + const rows = JSON.parse(r.out) as Array<{ n: number }>; + return rows[0]?.n ?? 0; +} + +describe("runQueryCmd — formatted output DML guard", () => { + it("rejects DELETE via --format sarif without mutating the index", async () => { + const before = await fileCount(); + + const r = await runCli( + [ + "query", + "--format", + "sarif", + "DELETE FROM files WHERE path = 'src/drop.ts'", + ], + { CODEMAP_ROOT: projectRoot }, + ); + expect(r.exitCode).toBe(1); + expect(JSON.parse(r.out)).toMatchObject({ error: expect.any(String) }); + + expect(await fileCount()).toBe(before); + + const drop = await runCli( + [ + "query", + "--json", + "SELECT COUNT(*) AS n FROM files WHERE path = 'src/drop.ts'", + ], + { CODEMAP_ROOT: projectRoot }, + ); + expect(drop.exitCode).toBe(0); + expect(JSON.parse(drop.out)).toEqual([{ n: 1 }]); + }); +}); diff --git a/src/cli/cmd-query.ts b/src/cli/cmd-query.ts index 4e0c8514..c1796106 100644 --- a/src/cli/cmd-query.ts +++ b/src/cli/cmd-query.ts @@ -1118,13 +1118,8 @@ function printFormattedQuery( badgeStyle: BadgeStyle; }, ): FormattedQueryResult { - let db: Awaited> | undefined; try { - db = openDb(); - let rows = db.query(sql).all(...(opts.bindValues ?? [])) as Record< - string, - unknown - >[]; + let rows = queryRows(sql, opts.bindValues) as Record[]; if (opts.changedFiles !== undefined) { rows = filterRowsByChangedFiles(rows, opts.changedFiles) as Record< string, @@ -1201,8 +1196,6 @@ function printFormattedQuery( const msg = err instanceof Error ? err.message : String(err); console.log(JSON.stringify({ error: msg })); return { ok: false }; - } finally { - if (db !== undefined) closeDb(db, { readonly: true }); } } diff --git a/src/cli/cmd-serve.test.ts b/src/cli/cmd-serve.test.ts index c645206c..23328b5e 100644 --- a/src/cli/cmd-serve.test.ts +++ b/src/cli/cmd-serve.test.ts @@ -63,10 +63,29 @@ describe("parseServeRest", () => { expect(rOver.kind).toBe("error"); }); - it("parses --host ", () => { - const r = parseServeRest(["serve", "--host", "0.0.0.0"]); + it("parses --host with --token on non-loopback", () => { + const r = parseServeRest([ + "serve", + "--host", + "0.0.0.0", + "--token", + "secret", + ]); if (r.kind !== "run") throw new Error("expected run"); expect(r.host).toBe("0.0.0.0"); + expect(r.token).toBe("secret"); + }); + + it("rejects non-loopback bind without --token", () => { + const r = parseServeRest(["serve", "--host", "0.0.0.0"]); + expect(r.kind).toBe("error"); + if (r.kind === "error") expect(r.message).toContain("--token"); + }); + + it("parses --host loopback without token", () => { + const r = parseServeRest(["serve", "--host", "127.0.0.1"]); + if (r.kind !== "run") throw new Error("expected run"); + expect(r.token).toBeUndefined(); }); it("parses --token ", () => { diff --git a/src/cli/cmd-serve.ts b/src/cli/cmd-serve.ts index 8e85157b..ff168c99 100644 --- a/src/cli/cmd-serve.ts +++ b/src/cli/cmd-serve.ts @@ -13,6 +13,12 @@ import { CODEMAP_VERSION } from "../version"; export const DEFAULT_HOST = "127.0.0.1"; export const DEFAULT_PORT = 7878; +/** Hosts safe to bind without mandatory auth (loopback only). */ +export function isLoopbackHost(host: string): boolean { + const h = host.toLowerCase(); + return h === "127.0.0.1" || h === "localhost" || h === "::1" || h === "[::1]"; +} + export interface ServeRunOpts { host: string; port: number; @@ -151,6 +157,14 @@ export function parseServeRest(rest: string[]): }; } + if (!isLoopbackHost(host) && (token === undefined || token.length === 0)) { + return { + kind: "error", + message: + "codemap serve: non-loopback bind requires --token (use a long random secret). Example: codemap serve --host 0.0.0.0 --token $(openssl rand -hex 32)", + }; + } + return { kind: "run", host, port, token, watch, debounceMs }; } @@ -170,8 +184,9 @@ Flags: --port Bind port (default: ${DEFAULT_PORT}). --token Require Authorization: Bearer on every request. - GET /health is exempt so liveness probes work without - leaking the token. Use a long random string. + Mandatory when binding a non-loopback address (e.g. + --host 0.0.0.0). GET /health is exempt so liveness probes + work without leaking the token. Use a long random string. --watch [default ON] Boot an in-process file watcher so every tool reads a live index — eliminates the per-request reindex prelude. Default-ON since 2026-05; explicit diff --git a/src/cli/cmd-validate.test.ts b/src/cli/cmd-validate.test.ts index 668e9ed5..1537a1da 100644 --- a/src/cli/cmd-validate.test.ts +++ b/src/cli/cmd-validate.test.ts @@ -1,5 +1,11 @@ import { afterEach, beforeEach, describe, expect, it } from "bun:test"; -import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs"; +import { + mkdirSync, + mkdtempSync, + rmSync, + symlinkSync, + writeFileSync, +} from "node:fs"; import { tmpdir } from "node:os"; import { join } from "node:path"; @@ -110,6 +116,39 @@ describe("computeValidateRows", () => { expect(rows).toEqual([{ path: "src/new.ts", status: "unindexed" }]); }); + it("rejects explicit paths that escape project root", () => { + const rows = withDb((db) => + computeValidateRows(db, tmpRoot, ["../../../etc/passwd"]), + ); + expect(rows).toEqual([ + { + path: "../../../etc/passwd", + status: "rejected", + reason: "path escapes project root", + }, + ]); + }); + + it("rejects explicit paths that escape via symlink", () => { + const base = mkdtempSync(join(tmpdir(), "codemap-validate-symlink-")); + const outside = join(base, "outside"); + mkdirSync(outside, { recursive: true }); + writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); + symlinkSync(join(outside, "secret.ts"), join(tmpRoot, "escape.ts")); + + const rows = withDb((db) => + computeValidateRows(db, tmpRoot, ["escape.ts"]), + ); + expect(rows).toEqual([ + { + path: "escape.ts", + status: "rejected", + reason: "path escapes via symlink", + }, + ]); + rmSync(base, { recursive: true, force: true }); + }); + it("dedupes paths and sorts by path", () => { writeFileSync(join(tmpRoot, "src/a.ts"), "v2\n"); writeFileSync(join(tmpRoot, "src/b.ts"), "v2\n"); From b261df8e2ae574ca3639d783e7ee6e1faa23196e Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Thu, 11 Jun 2026 07:42:48 +0300 Subject: [PATCH 2/7] harden: read-surface PR parity, bind policy, and ship docs Add changeset and serve-bind-policy with runHttpServer enforcement; align validate/serve consumer surfaces (CLI help, MCP, glossary, README, agent-content); extend symlink containment tests. --- .changeset/read-surface-hardening.md | 5 +++ README.md | 6 +-- docs/glossary.md | 4 +- docs/plans/security-hardening-orchestrator.md | 28 ++++++------ docs/plans/security-hardening-wave1.md | 4 +- src/application/http-server.ts | 7 ++- src/application/mcp-server.ts | 2 +- src/application/path-containment.test.ts | 19 ++++++++ src/application/serve-bind-policy.test.ts | 45 +++++++++++++++++++ src/application/serve-bind-policy.ts | 28 ++++++++++++ src/application/validate-engine.ts | 2 +- src/cli/bootstrap.ts | 2 +- src/cli/cmd-serve.test.ts | 12 +++++ src/cli/cmd-serve.ts | 16 ++----- src/cli/cmd-validate.ts | 5 ++- .../agent-content/skill/10-recipes-context.md | 4 +- .../agent-content/skill/50-maintenance.md | 2 +- 17 files changed, 150 insertions(+), 41 deletions(-) create mode 100644 .changeset/read-surface-hardening.md create mode 100644 src/application/serve-bind-policy.test.ts create mode 100644 src/application/serve-bind-policy.ts diff --git a/.changeset/read-surface-hardening.md b/.changeset/read-surface-hardening.md new file mode 100644 index 00000000..c4246dfa --- /dev/null +++ b/.changeset/read-surface-hardening.md @@ -0,0 +1,5 @@ +--- +"@stainless-code/codemap": patch +--- + +Harden read surfaces: `codemap query --format …` blocks index mutations via the same read-only guard as `--json`; `codemap serve` requires `--token` when `--host` is not loopback; `codemap validate` (and MCP/HTTP `validate`) can return `rejected` rows with a `reason` when a path escapes the project root or resolves outside via symlink. diff --git a/README.md b/README.md index 317b8bf7..702fa304 100644 --- a/README.md +++ b/README.md @@ -50,7 +50,7 @@ codemap dead-code --json # outcome alias → codemap query --json --recipe fan-out # recipe SQL by id (alias: -r) codemap query --json "SELECT name, file_path FROM symbols WHERE name = 'foo'" # ad-hoc SQL codemap --files src/a.ts src/b.tsx # targeted re-index after edits -codemap validate --json # detect stale / missing / unindexed files +codemap validate --json # detect stale / missing / unindexed / rejected files codemap context --compact --for "refactor auth" # JSON envelope + intent-matched recipes codemap ingest-coverage coverage/coverage-final.json --json # Istanbul / LCOV (auto-detected) → coverage table; joins with symbols NODE_V8_COVERAGE=.cov bun test && codemap ingest-coverage .cov --runtime --json # V8 protocol (per-process dumps); local-only @@ -162,9 +162,9 @@ codemap query --format diff-json 'SELECT "README.md" AS file_path, 1 AS line_sta codemap --with-fts --full codemap query --recipe text-in-deprecated-functions # demonstrates FTS5 ⨯ symbols ⨯ coverage JOIN # HTTP API — same tool taxonomy as `codemap mcp`, exposed over POST /tool/{name} for -# non-MCP consumers (CI scripts, curl, IDE plugins). Loopback default; optional --token. +# non-MCP consumers (CI scripts, curl, IDE plugins). Loopback default; --token required on non-loopback. TOKEN=$(openssl rand -hex 32) -codemap serve --port 7878 --token "$TOKEN" & +codemap serve --port 7878 --token "$TOKEN" & # --token required when --host is not loopback curl -s -X POST http://127.0.0.1:7878/tool/query \ -H 'Content-Type: application/json' \ -H "Authorization: Bearer $TOKEN" \ diff --git a/docs/glossary.md b/docs/glossary.md index dc92d645..c29958c3 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -125,7 +125,7 @@ CI-aggregate flag on `codemap query` and `codemap audit`. Aliases `--format sari ### `codemap validate` -CLI subcommand comparing on-disk SHA-256 against `files.content_hash`. Statuses: `stale | missing | unindexed`. Exits `1` on any drift. +CLI subcommand comparing on-disk SHA-256 against `files.content_hash`. Statuses: `stale | missing | unindexed | rejected` (`rejected` carries optional `reason` when a path escapes the project root or resolves outside via symlink). Exits `1` on any drift. ### `module_cycles` (table) / circular imports @@ -565,7 +565,7 @@ Long-running process that subscribes to filesystem changes via [chokidar v5](htt ### `codemap serve` / HTTP server -Long-running HTTP server exposing the same tool taxonomy as `codemap mcp` over `POST /tool/{name}` for non-MCP consumers (CI scripts, simple `curl`, IDE plugins that don't speak MCP). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`); optional `--token ` requires `Authorization: Bearer ` on every request. HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper); `query` / `query_recipe` match `codemap query --json` row arrays unless `summary` / `group_by` reshape the envelope, or `baseline` returns a diff envelope (incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); parity twins (`query batch`, `trace`, `explore`, `node`, `file`, `schema`, `symbols`, `context`, `ingest-coverage`, `ingest-churn`) always emit JSON on CLI without `--json`; other tools match their CLI `--json` payloads when that flag is set; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` / `"badge"` (markdown) as `text/plain; charset=utf-8`, `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` as `application/json; charset=utf-8`. Routes: `POST /tool/{name}` (every MCP tool), `GET /resources/{encoded-uri}` (resource handler for `codemap://recipes`, `codemap://recipes/{id}`, `codemap://schema`, `codemap://skill`, `codemap://rule`, `codemap://mcp-instructions`, `codemap://files/{path}`, and `codemap://symbols/{name}`), `GET /health` (auth-exempt liveness probe — does not start the watcher), `GET /tools` / `GET /resources` (catalogs). With `--watch`, chokidar is refcount-gated per request and stops 5s after the last client (`HTTP_WATCH_RELEASE_GRACE_MS`) — distinct from MCP idle shutdown; the HTTP process keeps listening. Pure transport — same `tool-handlers.ts` / `resource-handlers.ts` MCP uses; no engine duplication. Errors → `{"error": "..."}` with HTTP status 400 / 401 / 403 / 404 / 500. SIGINT / SIGTERM → graceful drain. Every response carries `X-Codemap-Version: `. **CSRF + DNS-rebinding guard:** every request (including auth-exempt `/health`) is evaluated against `Sec-Fetch-Site` / `Origin` / `Host` when present — modern browsers send `Sec-Fetch-Site` and `Origin` on cross-origin fetches (header presence varies by request type, browser, and privacy settings), so the guard rejects browser-driven cross-origin requests like a malicious local webpage `fetch`-ing `http://127.0.0.1:7878/tool/save_baseline` to mutate `.codemap/index.db`. `Host` mismatch on a loopback bind blocks DNS rebinding (an attacker resolving `evil.com` to `127.0.0.1` post-load). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) typically omit these headers and pass through. Implementation: `src/cli/cmd-serve.ts` (CLI shell) + `src/application/http-server.ts` (transport). See [`architecture.md` § HTTP wiring](./architecture.md#cli-usage). +Long-running HTTP server exposing the same tool taxonomy as `codemap mcp` over `POST /tool/{name}` for non-MCP consumers (CI scripts, simple `curl`, IDE plugins that don't speak MCP). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`); `--token ` is optional on loopback binds and **required** on non-loopback binds — when set, every request needs `Authorization: Bearer `. HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper); `query` / `query_recipe` match `codemap query --json` row arrays unless `summary` / `group_by` reshape the envelope, or `baseline` returns a diff envelope (incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); parity twins (`query batch`, `trace`, `explore`, `node`, `file`, `schema`, `symbols`, `context`, `ingest-coverage`, `ingest-churn`) always emit JSON on CLI without `--json`; other tools match their CLI `--json` payloads when that flag is set; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` / `"badge"` (markdown) as `text/plain; charset=utf-8`, `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` as `application/json; charset=utf-8`. Routes: `POST /tool/{name}` (every MCP tool), `GET /resources/{encoded-uri}` (resource handler for `codemap://recipes`, `codemap://recipes/{id}`, `codemap://schema`, `codemap://skill`, `codemap://rule`, `codemap://mcp-instructions`, `codemap://files/{path}`, and `codemap://symbols/{name}`), `GET /health` (auth-exempt liveness probe — does not start the watcher), `GET /tools` / `GET /resources` (catalogs). With `--watch`, chokidar is refcount-gated per request and stops 5s after the last client (`HTTP_WATCH_RELEASE_GRACE_MS`) — distinct from MCP idle shutdown; the HTTP process keeps listening. Pure transport — same `tool-handlers.ts` / `resource-handlers.ts` MCP uses; no engine duplication. Errors → `{"error": "..."}` with HTTP status 400 / 401 / 403 / 404 / 500. SIGINT / SIGTERM → graceful drain. Every response carries `X-Codemap-Version: `. **CSRF + DNS-rebinding guard:** every request (including auth-exempt `/health`) is evaluated against `Sec-Fetch-Site` / `Origin` / `Host` when present — modern browsers send `Sec-Fetch-Site` and `Origin` on cross-origin fetches (header presence varies by request type, browser, and privacy settings), so the guard rejects browser-driven cross-origin requests like a malicious local webpage `fetch`-ing `http://127.0.0.1:7878/tool/save_baseline` to mutate `.codemap/index.db`. `Host` mismatch on a loopback bind blocks DNS rebinding (an attacker resolving `evil.com` to `127.0.0.1` post-load). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) typically omit these headers and pass through. Implementation: `src/cli/cmd-serve.ts` (CLI shell) + `src/application/http-server.ts` (transport). See [`architecture.md` § HTTP wiring](./architecture.md#cli-usage). ### Code Climate format (`codeclimate`) diff --git a/docs/plans/security-hardening-orchestrator.md b/docs/plans/security-hardening-orchestrator.md index 175a8082..b9d021a6 100644 --- a/docs/plans/security-hardening-orchestrator.md +++ b/docs/plans/security-hardening-orchestrator.md @@ -21,11 +21,11 @@ ## PR schedule -| PR | Plan | Status | Blocks | -| ----- | -------------------------------------------------------------- | ---------------------------- | ----------------------------------- | -| **1** | [`security-hardening-wave1.md`](./security-hardening-wave1.md) | **implemented, uncommitted** | — | -| **2** | [`impact-inpath-homonyms.md`](./impact-inpath-homonyms.md) | **pending** | PR **1** merged | -| **3** | [`runtime-test-isolation.md`](./runtime-test-isolation.md) | **pending** | PR **1** merged (PR **2** optional) | +| PR | Plan | Status | Blocks | +| ----- | -------------------------------------------------------------- | ------------------------- | ----------------------------------- | +| **1** | [`security-hardening-wave1.md`](./security-hardening-wave1.md) | **committed, PR pending** | — | +| **2** | [`impact-inpath-homonyms.md`](./impact-inpath-homonyms.md) | **pending** | PR **1** merged | +| **3** | [`runtime-test-isolation.md`](./runtime-test-isolation.md) | **pending** | PR **1** merged (PR **2** optional) | | — | — | **deferred** | golden `schema.test.ts` + path guards | | — | — | **skip** | atomic `ensureStateConfig` writes | @@ -64,15 +64,15 @@ Evaluated 2026-06 against [roadmap § Floors](../roadmap.md#floors-v1-product-sh ## Session log -| Date | Event | Notes | -| ---------- | ---------- | ----------------------------------------------------------------------------- | -| 2026-06-10 | Triage | ROI on 7 slices; 3-PR program adopted. | -| 2026-06-10 | PR 1 impl | PR **1** code complete; **not committed**. Docs: orchestrator + per-PR plans. | -| — | PR 1 merge | _PR URL · merge SHA · update plan status → closed_ | -| — | PR 2 start | _from `main`_ | -| — | PR 2 merge | _fill_ | -| — | PR 3 start | _from `main`_ | -| — | PR 3 merge | _fill · close orchestrator_ | +| Date | Event | Notes | +| ---------- | ---------- | ---------------------------------------------------------------------------- | +| 2026-06-10 | Triage | ROI on 7 slices; 3-PR program adopted. | +| 2026-06-10 | PR 1 impl | PR **1** committed on `fix/security-hardening-wave1`; harden pass in flight. | +| — | PR 1 merge | _PR URL · merge SHA · update plan status → closed_ | +| — | PR 2 start | _from `main`_ | +| — | PR 2 merge | _fill_ | +| — | PR 3 start | _from `main`_ | +| — | PR 3 merge | _fill · close orchestrator_ | --- diff --git a/docs/plans/security-hardening-wave1.md b/docs/plans/security-hardening-wave1.md index dd76912b..c37a1859 100644 --- a/docs/plans/security-hardening-wave1.md +++ b/docs/plans/security-hardening-wave1.md @@ -1,6 +1,6 @@ # PR 1 — query / serve / validate hardening -> **Status:** open (implemented, uncommitted) · **PR:** 1 of 3 · **Effort:** S +> **Status:** open (committed, PR pending) · **PR:** 1 of 3 · **Effort:** S > > **Orchestrator:** [`security-hardening-orchestrator.md`](./security-hardening-orchestrator.md) > @@ -49,7 +49,7 @@ codemap validate | 3.2 | `validate` `rejected` + `rejectValidatePath` | **done** | `bun test src/cli/cmd-validate.test.ts` | | 3.3 | Test `../../../etc/passwd` → `rejected` | **done** | same | | 1.x | `docs/architecture.md` lift | **done** | `bun run format:check` | -| 1.s | Commit + PR + CI | **pending** | `bun run check` | +| 1.s | Commit + PR + CI | **pending** | `bun run check` (committed; PR + CI open) | --- diff --git a/src/application/http-server.ts b/src/application/http-server.ts index 5948902a..0311017e 100644 --- a/src/application/http-server.ts +++ b/src/application/http-server.ts @@ -21,6 +21,10 @@ import type { IndexFreshness } from "./index-freshness"; import { MCP_TOOL_NAMES } from "./mcp-tool-allowlist"; import { buildHttpToolCatalogEntry } from "./mcp-tool-annotations"; import { listResources, readResource } from "./resource-handlers"; +import { + assertServeBindRequiresToken, + isLoopbackHost, +} from "./serve-bind-policy"; import { bindWatchClientRelease, createManagedWatchSession, @@ -119,6 +123,7 @@ export interface HttpServerOpts { * to JSON `{"error": "..."}` with appropriate status codes. */ export async function runHttpServer(opts: HttpServerOpts): Promise { + assertServeBindRequiresToken(opts.host, opts.token); await bootstrapForServe(opts); let managedWatchSession: ManagedWatchSession | undefined; @@ -692,7 +697,7 @@ function csrfCheck( return `cross-origin request rejected (Sec-Fetch-Site: ${String(fetchSite)}). codemap serve does not accept browser-driven cross-origin requests.`; } - if (host === "127.0.0.1" || host === "localhost" || host === "::1") { + if (isLoopbackHost(host)) { const hostHeader = req.headers.host; if (hostHeader !== undefined) { const allowed = new Set([ diff --git a/src/application/mcp-server.ts b/src/application/mcp-server.ts index 7a25e093..cc3e7c3c 100644 --- a/src/application/mcp-server.ts +++ b/src/application/mcp-server.ts @@ -267,7 +267,7 @@ function registerValidateTool(server: McpServer): void { "validate", withToolAnnotations("validate", { description: - "Compare on-disk SHA-256 of indexed files to the indexed `files.content_hash` column. Returns only out-of-sync rows with status `stale` / `missing` / `unindexed` (fresh paths omitted). Empty `paths` validates every indexed file. Useful for 'codemap doctor' agents that diagnose a stale index before issuing structural queries.", + "Compare on-disk SHA-256 of indexed files to the indexed `files.content_hash` column. Returns only out-of-sync rows with status `stale` / `missing` / `unindexed` / `rejected` (fresh paths omitted; `rejected` includes optional `reason` when a path escapes the project root or resolves outside via symlink). Empty `paths` validates every indexed file. Useful for 'codemap doctor' agents that diagnose a stale index before issuing structural queries.", inputSchema: validateArgsSchema, }), (args) => wrapToolResult(handleValidate(args)), diff --git a/src/application/path-containment.test.ts b/src/application/path-containment.test.ts index 8960cbd6..171caddd 100644 --- a/src/application/path-containment.test.ts +++ b/src/application/path-containment.test.ts @@ -46,6 +46,25 @@ describe("pathTraversesSymlinkOutsideRoot", () => { true, ); }); + + it("returns true when an intermediate directory symlinks outside the root", () => { + const base = mkdtempSync(join(tmpdir(), "codemap-symlink-dir-")); + const root = join(base, "proj"); + const outside = join(base, "outside"); + mkdirSync(root, { recursive: true }); + mkdirSync(join(outside, "nested"), { recursive: true }); + writeFileSync( + join(outside, "nested", "secret.ts"), + "export const s = 1;\n", + ); + symlinkSync(outside, join(root, "linked-dir")); + expect( + pathTraversesSymlinkOutsideRoot( + root, + join(root, "linked-dir", "nested", "secret.ts"), + ), + ).toBe(true); + }); }); describe("resolvePathWithinRoot", () => { diff --git a/src/application/serve-bind-policy.test.ts b/src/application/serve-bind-policy.test.ts new file mode 100644 index 00000000..ffc68947 --- /dev/null +++ b/src/application/serve-bind-policy.test.ts @@ -0,0 +1,45 @@ +import { describe, expect, it } from "bun:test"; + +import { + assertServeBindRequiresToken, + isLoopbackHost, + serveBindTokenRequiredMessage, +} from "./serve-bind-policy"; + +describe("isLoopbackHost", () => { + it.each([ + ["127.0.0.1", true], + ["localhost", true], + ["::1", true], + ["[::1]", true], + ["0.0.0.0", false], + ["::", false], + ["192.168.1.1", false], + ] as const)("host %s → %s", (host, expected) => { + expect(isLoopbackHost(host)).toBe(expected); + }); +}); + +describe("serveBindTokenRequiredMessage", () => { + it("requires token on non-loopback binds", () => { + expect(serveBindTokenRequiredMessage("0.0.0.0", undefined)).toContain( + "non-loopback bind requires --token", + ); + expect(serveBindTokenRequiredMessage("::", undefined)).toContain( + "non-loopback bind requires --token", + ); + }); + + it("allows loopback binds without token", () => { + expect( + serveBindTokenRequiredMessage("127.0.0.1", undefined), + ).toBeUndefined(); + expect(serveBindTokenRequiredMessage("[::1]", undefined)).toBeUndefined(); + }); + + it("assertServeBindRequiresToken throws with the same message", () => { + expect(() => assertServeBindRequiresToken("0.0.0.0", undefined)).toThrow( + /non-loopback bind requires --token/, + ); + }); +}); diff --git a/src/application/serve-bind-policy.ts b/src/application/serve-bind-policy.ts new file mode 100644 index 00000000..f86c137c --- /dev/null +++ b/src/application/serve-bind-policy.ts @@ -0,0 +1,28 @@ +/** Hosts safe to bind without mandatory auth (loopback only). */ +export function isLoopbackHost(host: string): boolean { + const h = host.toLowerCase(); + return h === "127.0.0.1" || h === "localhost" || h === "::1" || h === "[::1]"; +} + +/** Error message when a non-loopback bind lacks a token; `undefined` when OK. */ +export function serveBindTokenRequiredMessage( + host: string, + token: string | undefined, +): string | undefined { + if (!isLoopbackHost(host) && (token === undefined || token.length === 0)) { + return ( + "codemap serve: non-loopback bind requires --token (use a long random secret). " + + "Example: codemap serve --host 0.0.0.0 --token $(openssl rand -hex 32)" + ); + } + return undefined; +} + +/** Fail fast before `listen` when bind policy is violated. */ +export function assertServeBindRequiresToken( + host: string, + token: string | undefined, +): void { + const message = serveBindTokenRequiredMessage(host, token); + if (message !== undefined) throw new Error(message); +} diff --git a/src/application/validate-engine.ts b/src/application/validate-engine.ts index 19c1ac31..d04972ed 100644 --- a/src/application/validate-engine.ts +++ b/src/application/validate-engine.ts @@ -9,7 +9,7 @@ import { } from "./path-containment"; /** - * One row in the staleness report. `status` distinguishes the three cases an + * One row in the staleness report. `status` distinguishes the cases an * agent might want to act on differently. */ export interface ValidateRow { diff --git a/src/cli/bootstrap.ts b/src/cli/bootstrap.ts index 8ece0483..8612b7ed 100644 --- a/src/cli/bootstrap.ts +++ b/src/cli/bootstrap.ts @@ -45,7 +45,7 @@ MCP server (Model Context Protocol — for agent hosts): # CLI parity: query batch, trace, explore, node, file, schema, symbols, context --include-snippets HTTP server (for non-MCP consumers — CI scripts, curl, IDE plugins): - codemap serve [--host 127.0.0.1] [--port 7878] [--token ] # watcher default-ON + codemap serve [--host 127.0.0.1] [--port 7878] [--token ] # token required on non-loopback; watcher default-ON Watch mode (long-running; keeps .codemap/index.db fresh on file edits): codemap watch [--debounce 250] [--quiet] diff --git a/src/cli/cmd-serve.test.ts b/src/cli/cmd-serve.test.ts index 23328b5e..79be14be 100644 --- a/src/cli/cmd-serve.test.ts +++ b/src/cli/cmd-serve.test.ts @@ -88,6 +88,18 @@ describe("parseServeRest", () => { expect(r.token).toBeUndefined(); }); + it("parses bracketed IPv6 loopback without token", () => { + const r = parseServeRest(["serve", "--host", "[::1]"]); + if (r.kind !== "run") throw new Error("expected run"); + expect(r.token).toBeUndefined(); + }); + + it("rejects all-interfaces IPv6 bind without --token", () => { + const r = parseServeRest(["serve", "--host", "::"]); + expect(r.kind).toBe("error"); + if (r.kind === "error") expect(r.message).toContain("--token"); + }); + it("parses --token ", () => { const r = parseServeRest(["serve", "--token", "abc123"]); if (r.kind !== "run") throw new Error("expected run"); diff --git a/src/cli/cmd-serve.ts b/src/cli/cmd-serve.ts index ff168c99..db276c0f 100644 --- a/src/cli/cmd-serve.ts +++ b/src/cli/cmd-serve.ts @@ -1,4 +1,5 @@ import { runHttpServer } from "../application/http-server"; +import { serveBindTokenRequiredMessage } from "../application/serve-bind-policy"; import { applyWatchPolicy, envWatchDefaultOn, @@ -13,12 +14,6 @@ import { CODEMAP_VERSION } from "../version"; export const DEFAULT_HOST = "127.0.0.1"; export const DEFAULT_PORT = 7878; -/** Hosts safe to bind without mandatory auth (loopback only). */ -export function isLoopbackHost(host: string): boolean { - const h = host.toLowerCase(); - return h === "127.0.0.1" || h === "localhost" || h === "::1" || h === "[::1]"; -} - export interface ServeRunOpts { host: string; port: number; @@ -157,12 +152,9 @@ export function parseServeRest(rest: string[]): }; } - if (!isLoopbackHost(host) && (token === undefined || token.length === 0)) { - return { - kind: "error", - message: - "codemap serve: non-loopback bind requires --token (use a long random secret). Example: codemap serve --host 0.0.0.0 --token $(openssl rand -hex 32)", - }; + const bindMessage = serveBindTokenRequiredMessage(host, token); + if (bindMessage !== undefined) { + return { kind: "error", message: bindMessage }; } return { kind: "run", host, port, token, watch, debounceMs }; diff --git a/src/cli/cmd-validate.ts b/src/cli/cmd-validate.ts index 5dc99374..214b7dec 100644 --- a/src/cli/cmd-validate.ts +++ b/src/cli/cmd-validate.ts @@ -30,9 +30,12 @@ Statuses: missing The file is in the index but has been deleted on disk. unindexed The file exists on disk but is not present in the index (only when explicit paths are passed). + rejected The path could not be checked safely (escapes the project root + or resolves outside via symlink). Includes a reason field. Flags: - --json Emit a JSON array of {path, status} objects (for agents). + --json Emit a JSON array of {path, status[, reason]} objects (for agents). + Exits 1 when any row is returned (including rejected). --help, -h Show this help. Examples: diff --git a/templates/agent-content/skill/10-recipes-context.md b/templates/agent-content/skill/10-recipes-context.md index 62231299..ee312c42 100644 --- a/templates/agent-content/skill/10-recipes-context.md +++ b/templates/agent-content/skill/10-recipes-context.md @@ -33,7 +33,7 @@ Each emitted delta carries its own `base` metadata so mixed-baseline audits are **MCP server (`codemap mcp [--no-watch] [--debounce ]`)** — separate top-level command exposing the structural-query surface (21 JSON-RPC tools — list below) to agent hosts (Claude Code, Cursor, Codex, generic MCP clients) over stdio. Eliminates the bash round-trip on every agent call. Bootstrap once at server boot; each tool returns the same JSON payload its CLI `--json` would print (including `query batch`, `trace`, `explore`, `node`, `file`, `schema`, `symbols`, `context --include-snippets`, `ingest-coverage`, and `ingest-churn`). MCP wraps payloads in `{content: [{type: "text", text: …}]}`. **`tools/list` ToolAnnotations** — advisory `readOnlyHint` / `destructiveHint` / `idempotentHint` per tool: read paths (`query`, `show`, `audit`, …) → `readOnlyHint: true`; apply tools (`apply`, `apply_rows`, `apply_diff_input`) → `destructiveHint: true` (writes still require `yes: true`); index mutators (`save_baseline`, `drop_baseline`, `ingest_coverage`, `ingest_churn`) → `readOnlyHint: false` without `destructiveHint`. HTTP `GET /tools` exposes the same hints. **`initialize` instructions** + resource `codemap://mcp-instructions` carry the tool-selection playbook. **Watcher default-ON since 2026-05** — every tool reads a live index, `audit`'s incremental-index prelude becomes a no-op. Pass `--no-watch` (or `CODEMAP_WATCH=0`) for one-shot fire-and-forget calls without the in-process chokidar loop. -**HTTP server (`codemap serve [--host 127.0.0.1] [--port 7878] [--token ] [--no-watch] [--debounce ]`)** — same tool taxonomy as MCP, exposed over `POST /tool/{name}` for non-MCP consumers (CI scripts, simple `curl`, IDE plugins that don't speak MCP). Loopback-default; optional Bearer-token auth. HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper); SARIF / annotations / mermaid / diff payloads ship with `application/sarif+json` or `text/plain` Content-Type; `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` use `application/json`; badge markdown uses `text/plain`. Resources mirrored at `GET /resources/{encoded-uri}`. `GET /health` is auth-exempt; `GET /tools` / `GET /resources` are catalogs. **Watcher default-ON since 2026-05** — same `--no-watch` / `CODEMAP_WATCH=0` opt-out as `mcp`. +**HTTP server (`codemap serve [--host 127.0.0.1] [--port 7878] [--token ] [--no-watch] [--debounce ]`)** — same tool taxonomy as MCP, exposed over `POST /tool/{name}` for non-MCP consumers (CI scripts, simple `curl`, IDE plugins that don't speak MCP). Loopback-default; Bearer-token auth optional on loopback binds and **required** on non-loopback binds (`--host 0.0.0.0`, etc.). HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper); SARIF / annotations / mermaid / diff payloads ship with `application/sarif+json` or `text/plain` Content-Type; `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` use `application/json`; badge markdown uses `text/plain`. Resources mirrored at `GET /resources/{encoded-uri}`. `GET /health` is auth-exempt; `GET /tools` / `GET /resources` are catalogs. **Watcher default-ON since 2026-05** — same `--no-watch` / `CODEMAP_WATCH=0` opt-out as `mcp`. **Watch mode (`codemap watch [--debounce 250] [--quiet]`)** — standalone long-running process that debounces file changes and re-indexes only the changed paths. SIGINT/SIGTERM drains pending edits before exit. `mcp` / `serve` boot the watcher in-process by default since 2026-05; use `codemap watch` standalone when you want the watcher decoupled from a transport (e.g. running alongside an editor that already speaks MCP via a different process). @@ -47,7 +47,7 @@ Each emitted delta carries its own `base` metadata so mixed-baseline audits are - **`list_baselines`** — no args; returns the array `codemap query --baselines --json` would print. - **`drop_baseline`** — `{name}` → `{dropped}` on success; structured `{error}` on unknown name (MCP sets `isError: true`). - **`context`** — `{compact?, intent?, include_snippets?}`. CLI: `codemap context [--include-snippets]`. Session-start project envelope with `start_here` shortcuts (one call replaces 4-5 `query`s). `index_summary.file_churn` row count; **`churn_hint`** when empty (steers to index, **`ingest-churn`**, or **`churn.file`**). `include_snippets` adds one-line export previews on hub leaders (capped to adaptive `signature_max_chars`; may set `stale`/`missing`); no-op when `compact: true`. Whitespace-only `intent` is treated as no intent. Prefer `start_here.hub_leaders` over legacy `hubs` for signatures — `hubs` keeps the full bundled `fan-in` recipe limit for backward compatibility. `sample_markers` count scales down on repos >500 / >5000 files. -- **`validate`** — `{paths?: string[]}`. SHA-256 vs `files.content_hash`; returns only out-of-sync rows (`stale` / `missing` / `unindexed` — fresh paths are omitted). +- **`validate`** — `{paths?: string[]}`. SHA-256 vs `files.content_hash`; returns only out-of-sync rows (`stale` / `missing` / `unindexed` / `rejected` — fresh paths are omitted; `rejected` includes optional `reason` when a path escapes the project root or resolves outside via symlink). - **`show`** — `{name, kind?, in?}` or `{query, with_fts?}`. Exact symbol lookup or field-qualified search (`kind:`, `name:`, `path:`, `in:` + free text) → `{matches, disambiguation?, warning?}`. CLI: `codemap show --query '…' [--print-sql]`. - **`snippet`** — same as `show` (`{name, kind?, in?}` or `{query, with_fts?}`) but each match also carries `source` (file text) + `stale` / `missing` flags → `{matches, disambiguation?, warning?}`. No reindex side-effects. - **`impact`** — `{target, direction?, via?, depth?, limit?, summary?}`. Symbol/file blast-radius walker (replaces hand-composed `WITH RECURSIVE`). Auto-resolves symbol vs file target; `via` defaults to every backend compatible with the kind. diff --git a/templates/agent-content/skill/50-maintenance.md b/templates/agent-content/skill/50-maintenance.md index f92651d4..c27e22e2 100644 --- a/templates/agent-content/skill/50-maintenance.md +++ b/templates/agent-content/skill/50-maintenance.md @@ -15,7 +15,7 @@ codemap --full # Check index freshness (index-level — HEAD drift, pending sync, disk-ahead) codemap context --compact | jq '.index_freshness' -# Per-file staleness (content_hash drift) +# Per-file staleness (content_hash drift; includes rejected paths with reason) codemap validate --json ``` From d4589db3475ff0a1fbda7305f0570add0edf39ea Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Thu, 11 Jun 2026 07:49:14 +0300 Subject: [PATCH 3/7] harden: validate containment depth, loopback range, and test coverage Add realpath-safe validate reads, broken-symlink rejection, 127.0.0.0/8 loopback token policy, expanded DML/format tests, and consumer-surface parity. --- .changeset/read-surface-hardening.md | 2 +- docs/architecture.md | 4 +- docs/glossary.md | 4 +- docs/plans/security-hardening-wave1.md | 6 +- src/application/mcp-server.ts | 2 +- src/application/path-containment.test.ts | 174 +++++++++++++----- src/application/path-containment.ts | 117 +++++++++++- src/application/serve-bind-policy.test.ts | 10 + src/application/serve-bind-policy.ts | 18 +- src/application/validate-engine.ts | 37 ++-- src/cli/cmd-query-formatted.test.ts | 53 +++--- src/cli/cmd-serve.test.ts | 6 + src/cli/cmd-serve.ts | 5 +- src/cli/cmd-validate.test.ts | 92 +++++++-- src/cli/cmd-validate.ts | 8 +- src/test/symlink-capable.ts | 21 +++ .../agent-content/skill/10-recipes-context.md | 4 +- 17 files changed, 433 insertions(+), 130 deletions(-) create mode 100644 src/test/symlink-capable.ts diff --git a/.changeset/read-surface-hardening.md b/.changeset/read-surface-hardening.md index c4246dfa..ab0e5b86 100644 --- a/.changeset/read-surface-hardening.md +++ b/.changeset/read-surface-hardening.md @@ -2,4 +2,4 @@ "@stainless-code/codemap": patch --- -Harden read surfaces: `codemap query --format …` blocks index mutations via the same read-only guard as `--json`; `codemap serve` requires `--token` when `--host` is not loopback; `codemap validate` (and MCP/HTTP `validate`) can return `rejected` rows with a `reason` when a path escapes the project root or resolves outside via symlink. +Harden read surfaces: `codemap query --format …` blocks index mutations via the same read-only guard as `--json`; `codemap serve` requires `--token` when `--host` is not loopback (any `127.0.0.0/8` address counts as loopback, so `--token` stays optional on `127.0.0.2` and similar); `codemap validate` (and MCP/HTTP `validate`) can return `rejected` rows with a `reason` when a path escapes the project root, resolves outside via symlink, or is a broken symlink — output `path` keys are always project-relative POSIX paths. diff --git a/docs/architecture.md b/docs/architecture.md index 65aaa653..c6bb73fe 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -124,7 +124,7 @@ A local SQLite database (`.codemap/index.db`) indexes the project tree and store **Output formatters:** **`src/application/output-formatters.ts`** — pure transport-agnostic; **`formatSarif`** emits SARIF 2.1.0 (auto-detected location columns: `file_path` / `path` / `to_path` / `from_path` priority + optional `line_start` / `line_end` region; `rule.id = codemap.` for `--recipe`, `codemap.adhoc` for ad-hoc SQL; aggregate recipes without locations → `results: []` + stderr warning); **`formatAuditSarif`** emits the audit-shaped variant — one rule per delta key (`codemap.audit.-added`), one result per `added` row at severity `warning`; `removed` rows excluded (SARIF surfaces findings, not cleanups); location-only rows fall back to `"new : "` messages; **`formatAnnotations`** emits `::notice file=…,line=…::msg` GitHub Actions workflow commands (one line per locatable row; messages collapsed to a single line because the GH parser stops at the first newline); **`formatCodeClimate`** emits a GitLab Code Quality JSON array (`severity: minor` flat in v1; stable SHA-256 fingerprints from recipe id + path + line + check name + row message (`lines.begin` falls back to `1` when `line_start` absent)); **`formatBadge`** / **`formatBadgeJson`** emit a single-line markdown summary (`codemap: N issues` / `codemap: clean`) or `codemap-badge/v1` JSON (`--badge-style json` / MCP `badge_style`) from locatable-row count — agents triage via JSON rows, not badge severity; **`formatMermaid`** emits a `flowchart LR` from `{from, to, label?, kind?}` rows with a hard `MERMAID_MAX_EDGES = 50` ceiling — unbounded inputs reject with a scope-suggestion error naming the recipe + count + `LIMIT` / `--via` / `WHERE` knobs (auto-truncation deliberately out of scope; would be a verdict masquerading as output mode); **`formatDiff`** emits read-only unified diff text from `{file_path, line_start, before_pattern, after_pattern}` rows; **`formatDiffJson`** emits structured `{files, warnings, summary}` hunks for agents. Diff formatters read source files at format time and surface `stale` / `missing` flags when the indexed line no longer matches. Wired into both **`src/cli/cmd-query.ts`** (`--format `; `--format` overrides `--json`; formatted outputs reject `--summary` / `--group-by` / baseline at parse time) and the MCP **`query`** / **`query_recipe`** tools (`format: "sarif" | "annotations" | "mermaid" | "diff" | "diff-json" | "codeclimate" | "badge"` with the same incompatibility guard). Per-recipe `sarifLevel` / `sarifMessage` / `sarifRuleId` overrides via frontmatter on `.md` deferred to v1.x. -**Validate wiring:** **`src/cli/cmd-validate.ts`** (argv + render) + **`src/application/validate-engine.ts`** (engine — **`computeValidateRows`** + **`toProjectRelative`**). `computeValidateRows` is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed | rejected` (`rejected` + optional `reason` when a path escapes the project root or resolves outside via symlink — same containment helpers as apply). CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. Also reused by `cmd-show.ts` / `cmd-snippet.ts` and the MCP show/snippet handlers — single canonical implementation. +**Validate wiring:** **`src/cli/cmd-validate.ts`** (argv + render) + **`src/application/validate-engine.ts`** (engine — **`computeValidateRows`** + **`toProjectRelative`**). `computeValidateRows` is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed | rejected` (`rejected` + optional `reason` when a path escapes the project root, resolves outside via symlink, or has a broken symlink — `readUtf8WithinProjectRoot` re-checks via `realpath` immediately before read; hardlinks to outside files keep an in-root pathname and are a documented local-trust boundary). Path keys are always project-relative POSIX paths (`toProjectRelative`). CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. Also reused by `cmd-show.ts` / `cmd-snippet.ts` and the MCP show/snippet handlers — single canonical implementation. #### Audit wiring @@ -204,7 +204,7 @@ Three **mutually exclusive** CLI entry shapes; all converge on `applyDiffPayload **MCP wiring:** **`src/cli/cmd-mcp.ts`** (argv — `--watch` / `--no-watch` / `--debounce` + `--help`; bootstrap absorbs `--root`/`--config`) + **`src/application/mcp-server.ts`** (transport — tool / resource registry, SDK glue). Mirrors the `cmd-audit.ts ↔ audit-engine.ts` seam — CLI parses + lifecycle; engine owns the SDK. **`runMcpServer`** bootstraps codemap once at server boot (config + resolver + DB access become module-level state), instantiates `McpServer` from **`@modelcontextprotocol/sdk`**, attaches a **`StdioServerTransport`**, and resolves on client disconnect via **`src/application/session-lifecycle.ts`** (`createStdioDisconnectMonitor` — stdin EOF, stdout EPIPE, parent-PID poll — plus SDK `transport.onclose` and SIGINT/SIGTERM). With `--watch`, **`createManagedWatchSession`** holds one client for the stdio session and **`forceStop`** drains the watcher on exit. Tool handlers reuse the existing engine entry-points: **`query`** / **`query_recipe`** call **`executeQuery`** in **`src/application/query-engine.ts`** (same `[...rows]` / `{count}` / `{group_by, groups}` envelope `--json` would print) unless **`baseline`** is set — then **`compareQueryBaseline`** in **`src/application/query-baseline.ts`** (incompatible with non-`json` **`format`** / **`group_by`**); **`ingest_coverage`** calls **`runIngestCoverageOnDb`** in **`src/application/ingest-coverage-run.ts`** (CLI twin: `codemap ingest-coverage --json`); **`query_batch`** loops per statement via **`handleQueryBatch`** → **`executeQuery`** (batch-wide defaults + per-item overrides; items are `string | {sql, summary?, changed_since?, group_by?}`); **`audit`** runs `resolveAuditBaselines` + `runAudit` from PR #33 unchanged; **`context`** / **`validate`** call `buildContextEnvelope` / `computeValidateRows` from **`src/application/context-engine.ts`** + **`src/application/validate-engine.ts`** (lifted out of `src/cli/cmd-*.ts` in PR #41 — see § Tool / resource handlers above). **`save_baseline`** is one polymorphic tool (`{name, sql? | recipe?}`) with a runtime exclusivity check — mirrors the CLI's single `--save-baseline=` verb. **Tool naming**: snake_case throughout — Codemap convention matching the patterns in MCP spec examples and reference servers (GitHub MCP, Cursor built-ins); the spec itself doesn't mandate it. CLI stays kebab — translation lives at the MCP-arg layer. **Resources** split by freshness contract: `codemap://schema`, `codemap://skill`, `codemap://rule`, and `codemap://mcp-instructions` use **lazy memoisation** — first `read_resource` populates a per-server-instance cache; constant for the server-process lifetime so eager-vs-lazy produce identical observable behavior. `codemap://recipes`, `codemap://recipes/{id}`, `codemap://files/{+path}`, and `codemap://symbols/{name}` are **live read-per-call** (no cache) so inline recency fields and index mutations under `--watch` don't freeze at first-read. `codemap://schema` queries `sqlite_schema` live (on first read, then cached); `codemap://skill` / `codemap://rule` / `codemap://mcp-instructions` call `assembleAgentContent(kind)` from `application/agent-content.ts`, which concatenates section files under `templates/agent-content//` and dispatches `*.gen.md` files through `RENDERERS` (live recipe catalog, live `createTables()` DDL) — see [agents.md § Section assembler](./agents.md#section-assembler-and-genmd). Output shape: each tool returns the JSON payload its CLI counterpart would print (`query batch`, `trace`, `explore`, `node`, `file`, `schema`, `context --include-snippets`, `ingest-coverage`); MCP wraps via `content: [{type: "text", text: JSON.stringify(payload)}]`. **`tools/list` ToolAnnotations** — advisory `readOnlyHint` / `destructiveHint` / `idempotentHint` per tool from **`src/application/mcp-tool-annotations.ts`** (central map beside **`mcp-tool-allowlist.ts`**); read paths (`query`, `show`, `audit`, …) → `readOnlyHint: true`; disk-write apply tools → `destructiveHint: true` (writes still require `yes: true`); index user-data mutators (`save_baseline`, `drop_baseline`, `ingest_coverage`) → `readOnlyHint: false` without `destructiveHint`. Omitted when an older `@modelcontextprotocol/sdk` lacks annotation fields (M.6 guard). `--changed-since` git lookups are memoised per `(root, ref)` pair across batch items so a `query_batch` of N items sharing the same ref does one git invocation, not N. Per-statement errors in `query_batch` are isolated — failed statements return `{error}` in their slot while siblings still execute. -**HTTP wiring:** **`src/cli/cmd-serve.ts`** (argv — `--host` / `--port` / `--token`; bootstrap absorbs `--root`/`--config`) + **`src/application/http-server.ts`** (transport — bare `node:http`; routes `POST /tool/{name}` to `tool-handlers`, `GET /resources/{encoded-uri}` to `resource-handlers`, plus `GET /health` / `GET /tools` / `GET /resources`). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`). **`--token `** is optional on loopback; **mandatory** when binding a non-loopback address. When set, requires `Authorization: Bearer ` on every request; `GET /health` is auth-exempt so liveness probes work without leaking the token. **CSRF + DNS-rebinding guard** (`csrfCheck`) runs before every route — rejects `Sec-Fetch-Site: cross-site` / `same-site` (modern-browser CSRF), any present `Origin` header (including the opaque string `null`; older-browser CSRF fallback), and `Host` header mismatch on loopback bind (DNS rebinding). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) don't send those headers and pass through. The guard runs even on `/health` so a malicious local webpage can't probe for liveness. Output shape: HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper — HTTP doesn't need that transport artifact); `query` / `query_recipe` match `codemap query --json` row arrays (or `{count}` / `{group_by,groups}` when `summary` / `group_by` is set, or baseline diff when `baseline` is set — incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); other tools match their CLI `--json` envelopes; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` / `"badge"` (markdown) as `text/plain; charset=utf-8`, `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` as `application/json; charset=utf-8`, JSON otherwise. Per-request DB lifecycle: open / `PRAGMA query_only = 1` / close per call (SQLite reader concurrency); 1 MiB request-body cap rejects trivial DoS. **`GET /tools`** returns the same advisory hint fields as MCP `tools/list` (`readOnlyHint` / `destructiveHint` / `idempotentHint` per entry via **`buildHttpToolCatalogEntry`**). SIGINT / SIGTERM → graceful drain via `server.close()`. Every response carries **`X-Codemap-Version: `** so consumers can pin / detect upgrades. +**HTTP wiring:** **`src/cli/cmd-serve.ts`** (argv — `--host` / `--port` / `--token`; bootstrap absorbs `--root`/`--config`) + **`src/application/http-server.ts`** (transport — bare `node:http`; routes `POST /tool/{name}` to `tool-handlers`, `GET /resources/{encoded-uri}` to `resource-handlers`, plus `GET /health` / `GET /tools` / `GET /resources`). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`; any **`127.0.0.0/8`** address counts as loopback for the token rule). **`--token `** is optional on loopback; **mandatory** when binding a non-loopback address. When set, requires `Authorization: Bearer ` on every request; `GET /health` is auth-exempt so liveness probes work without leaking the token. **CSRF + DNS-rebinding guard** (`csrfCheck`) runs before every route — rejects `Sec-Fetch-Site: cross-site` / `same-site` (modern-browser CSRF), any present `Origin` header (including the opaque string `null`; older-browser CSRF fallback), and `Host` header mismatch on loopback bind (DNS rebinding). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) don't send those headers and pass through. The guard runs even on `/health` so a malicious local webpage can't probe for liveness. Output shape: HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper — HTTP doesn't need that transport artifact); `query` / `query_recipe` match `codemap query --json` row arrays (or `{count}` / `{group_by,groups}` when `summary` / `group_by` is set, or baseline diff when `baseline` is set — incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); other tools match their CLI `--json` envelopes; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` / `"badge"` (markdown) as `text/plain; charset=utf-8`, `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` as `application/json; charset=utf-8`, JSON otherwise. Per-request DB lifecycle: open / `PRAGMA query_only = 1` / close per call (SQLite reader concurrency); 1 MiB request-body cap rejects trivial DoS. **`GET /tools`** returns the same advisory hint fields as MCP `tools/list` (`readOnlyHint` / `destructiveHint` / `idempotentHint` per entry via **`buildHttpToolCatalogEntry`**). SIGINT / SIGTERM → graceful drain via `server.close()`. Every response carries **`X-Codemap-Version: `** so consumers can pin / detect upgrades. **Watch wiring:** **`src/cli/cmd-watch.ts`** (argv — `--debounce ` / `--quiet`; bootstrap absorbs `--root`/`--config`) + **`src/application/watcher.ts`** (engine — pure debouncer + glob filter + injectable backend; production wires [chokidar v5](https://github.com/paulmillr/chokidar) selected via the 6-watcher audit in PR #46 — pure JS, runs identically on Bun + Node, ~30M repos use it). On every change/add/unlink event chokidar emits, the engine filters via `shouldIndexPath` (same indexed extensions as the indexer + project-local recipes; skips `node_modules` / `.git` / `dist`), debounces with a sliding window (default 250 ms), then calls `createReindexOnChange` which opens a DB, runs `runCodemapIndex({mode: 'files', files: [...changed]})`, closes the DB, and logs `reindex N file(s) in Mms` to stderr unless `--quiet`. SIGINT / SIGTERM drains pending edits via `flushNow()` before the watcher closes. **Default-ON for `mcp` / `serve` since 2026-05:** both transports embed the watcher via **`createManagedWatchSession`** in **`session-lifecycle.ts`** — MCP holds one client for the stdio session; HTTP acquires per request (excluding `/health`) and stops the watcher after the last client plus a 5s release grace (not an MCP idle shutdown). Opt out with `--no-watch`, `CODEMAP_WATCH=0`, or `CODEMAP_NO_WATCH=1`. **`src/application/watch-policy.ts`** disables the watcher on WSL2 Windows drive mounts (`/mnt/*`) unless `CODEMAP_FORCE_WATCH=1`; stderr points at `codemap agents init --git-hooks` for git-triggered freshness. Standalone `codemap watch` runs the watcher decoupled from a transport for users wiring it next to a separate MCP / HTTP process. **Audit prelude optimization:** module-level `watchActive` flag; `handleAudit` skips its incremental-index prelude when active (and marks the close as readonly to avoid a wasted checkpoint). Explicit `no_index: false` still forces the prelude. diff --git a/docs/glossary.md b/docs/glossary.md index c29958c3..99ac059a 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -125,7 +125,7 @@ CI-aggregate flag on `codemap query` and `codemap audit`. Aliases `--format sari ### `codemap validate` -CLI subcommand comparing on-disk SHA-256 against `files.content_hash`. Statuses: `stale | missing | unindexed | rejected` (`rejected` carries optional `reason` when a path escapes the project root or resolves outside via symlink). Exits `1` on any drift. +CLI subcommand comparing on-disk SHA-256 against `files.content_hash`. Statuses: `stale | missing | unindexed | rejected` (`rejected` carries optional `reason` when a path escapes the project root, resolves outside via symlink, or has a broken symlink; output `path` keys are always project-relative POSIX paths). Exits `1` on any drift. ### `module_cycles` (table) / circular imports @@ -565,7 +565,7 @@ Long-running process that subscribes to filesystem changes via [chokidar v5](htt ### `codemap serve` / HTTP server -Long-running HTTP server exposing the same tool taxonomy as `codemap mcp` over `POST /tool/{name}` for non-MCP consumers (CI scripts, simple `curl`, IDE plugins that don't speak MCP). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`); `--token ` is optional on loopback binds and **required** on non-loopback binds — when set, every request needs `Authorization: Bearer `. HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper); `query` / `query_recipe` match `codemap query --json` row arrays unless `summary` / `group_by` reshape the envelope, or `baseline` returns a diff envelope (incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); parity twins (`query batch`, `trace`, `explore`, `node`, `file`, `schema`, `symbols`, `context`, `ingest-coverage`, `ingest-churn`) always emit JSON on CLI without `--json`; other tools match their CLI `--json` payloads when that flag is set; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` / `"badge"` (markdown) as `text/plain; charset=utf-8`, `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` as `application/json; charset=utf-8`. Routes: `POST /tool/{name}` (every MCP tool), `GET /resources/{encoded-uri}` (resource handler for `codemap://recipes`, `codemap://recipes/{id}`, `codemap://schema`, `codemap://skill`, `codemap://rule`, `codemap://mcp-instructions`, `codemap://files/{path}`, and `codemap://symbols/{name}`), `GET /health` (auth-exempt liveness probe — does not start the watcher), `GET /tools` / `GET /resources` (catalogs). With `--watch`, chokidar is refcount-gated per request and stops 5s after the last client (`HTTP_WATCH_RELEASE_GRACE_MS`) — distinct from MCP idle shutdown; the HTTP process keeps listening. Pure transport — same `tool-handlers.ts` / `resource-handlers.ts` MCP uses; no engine duplication. Errors → `{"error": "..."}` with HTTP status 400 / 401 / 403 / 404 / 500. SIGINT / SIGTERM → graceful drain. Every response carries `X-Codemap-Version: `. **CSRF + DNS-rebinding guard:** every request (including auth-exempt `/health`) is evaluated against `Sec-Fetch-Site` / `Origin` / `Host` when present — modern browsers send `Sec-Fetch-Site` and `Origin` on cross-origin fetches (header presence varies by request type, browser, and privacy settings), so the guard rejects browser-driven cross-origin requests like a malicious local webpage `fetch`-ing `http://127.0.0.1:7878/tool/save_baseline` to mutate `.codemap/index.db`. `Host` mismatch on a loopback bind blocks DNS rebinding (an attacker resolving `evil.com` to `127.0.0.1` post-load). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) typically omit these headers and pass through. Implementation: `src/cli/cmd-serve.ts` (CLI shell) + `src/application/http-server.ts` (transport). See [`architecture.md` § HTTP wiring](./architecture.md#cli-usage). +Long-running HTTP server exposing the same tool taxonomy as `codemap mcp` over `POST /tool/{name}` for non-MCP consumers (CI scripts, simple `curl`, IDE plugins that don't speak MCP). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`; any **`127.0.0.0/8`** address counts as loopback for the token rule); `--token ` is optional on loopback binds and **required** on non-loopback binds — when set, every request needs `Authorization: Bearer `. HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper); `query` / `query_recipe` match `codemap query --json` row arrays unless `summary` / `group_by` reshape the envelope, or `baseline` returns a diff envelope (incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); parity twins (`query batch`, `trace`, `explore`, `node`, `file`, `schema`, `symbols`, `context`, `ingest-coverage`, `ingest-churn`) always emit JSON on CLI without `--json`; other tools match their CLI `--json` payloads when that flag is set; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` / `"badge"` (markdown) as `text/plain; charset=utf-8`, `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` as `application/json; charset=utf-8`. Routes: `POST /tool/{name}` (every MCP tool), `GET /resources/{encoded-uri}` (resource handler for `codemap://recipes`, `codemap://recipes/{id}`, `codemap://schema`, `codemap://skill`, `codemap://rule`, `codemap://mcp-instructions`, `codemap://files/{path}`, and `codemap://symbols/{name}`), `GET /health` (auth-exempt liveness probe — does not start the watcher), `GET /tools` / `GET /resources` (catalogs). With `--watch`, chokidar is refcount-gated per request and stops 5s after the last client (`HTTP_WATCH_RELEASE_GRACE_MS`) — distinct from MCP idle shutdown; the HTTP process keeps listening. Pure transport — same `tool-handlers.ts` / `resource-handlers.ts` MCP uses; no engine duplication. Errors → `{"error": "..."}` with HTTP status 400 / 401 / 403 / 404 / 500. SIGINT / SIGTERM → graceful drain. Every response carries `X-Codemap-Version: `. **CSRF + DNS-rebinding guard:** every request (including auth-exempt `/health`) is evaluated against `Sec-Fetch-Site` / `Origin` / `Host` when present — modern browsers send `Sec-Fetch-Site` and `Origin` on cross-origin fetches (header presence varies by request type, browser, and privacy settings), so the guard rejects browser-driven cross-origin requests like a malicious local webpage `fetch`-ing `http://127.0.0.1:7878/tool/save_baseline` to mutate `.codemap/index.db`. `Host` mismatch on a loopback bind blocks DNS rebinding (an attacker resolving `evil.com` to `127.0.0.1` post-load). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) typically omit these headers and pass through. Implementation: `src/cli/cmd-serve.ts` (CLI shell) + `src/application/http-server.ts` (transport). See [`architecture.md` § HTTP wiring](./architecture.md#cli-usage). ### Code Climate format (`codeclimate`) diff --git a/docs/plans/security-hardening-wave1.md b/docs/plans/security-hardening-wave1.md index c37a1859..3ec31495 100644 --- a/docs/plans/security-hardening-wave1.md +++ b/docs/plans/security-hardening-wave1.md @@ -31,8 +31,8 @@ codemap serve --host 0.0.0.0 → parseServeRest: require --token when !isLoopbackHost(host) codemap validate - → computeValidateRows → rejectValidatePath - → pathEscapesProjectRoot | pathTraversesSymlinkOutsideRoot → status rejected + → computeValidateRows → rejectUnsafeProjectRelativePath / readUtf8WithinProjectRoot + → pathEscapesProjectRoot | pathTraversesSymlinkOutsideRoot | realpath → status rejected ``` --- @@ -46,7 +46,7 @@ codemap validate | 2.1 | `isLoopbackHost` + non-loopback `--token` required | **done** | `bun test src/cli/cmd-serve.test.ts` | | 2.2 | Serve tests updated | **done** | same | | 3.1 | `pathTraversesSymlinkOutsideRoot`, `resolvePathWithinRoot` | **done** | `bun test src/application/path-containment.test.ts` | -| 3.2 | `validate` `rejected` + `rejectValidatePath` | **done** | `bun test src/cli/cmd-validate.test.ts` | +| 3.2 | `validate` `rejected` + safe read containment | **done** | `bun test src/cli/cmd-validate.test.ts` | | 3.3 | Test `../../../etc/passwd` → `rejected` | **done** | same | | 1.x | `docs/architecture.md` lift | **done** | `bun run format:check` | | 1.s | Commit + PR + CI | **pending** | `bun run check` (committed; PR + CI open) | diff --git a/src/application/mcp-server.ts b/src/application/mcp-server.ts index cc3e7c3c..c89abb36 100644 --- a/src/application/mcp-server.ts +++ b/src/application/mcp-server.ts @@ -267,7 +267,7 @@ function registerValidateTool(server: McpServer): void { "validate", withToolAnnotations("validate", { description: - "Compare on-disk SHA-256 of indexed files to the indexed `files.content_hash` column. Returns only out-of-sync rows with status `stale` / `missing` / `unindexed` / `rejected` (fresh paths omitted; `rejected` includes optional `reason` when a path escapes the project root or resolves outside via symlink). Empty `paths` validates every indexed file. Useful for 'codemap doctor' agents that diagnose a stale index before issuing structural queries.", + "Compare on-disk SHA-256 of indexed files to the indexed `files.content_hash` column. Returns only out-of-sync rows with status `stale` / `missing` / `unindexed` / `rejected` (fresh paths omitted; `rejected` includes optional `reason`: path escapes project root | path escapes via symlink | path resolves outside project root). Output `path` keys are project-relative POSIX paths. Empty `paths` validates every indexed file. Useful for 'codemap doctor' agents that diagnose a stale index before issuing structural queries.", inputSchema: validateArgsSchema, }), (args) => wrapToolResult(handleValidate(args)), diff --git a/src/application/path-containment.test.ts b/src/application/path-containment.test.ts index 171caddd..3edc95c8 100644 --- a/src/application/path-containment.test.ts +++ b/src/application/path-containment.test.ts @@ -3,9 +3,13 @@ import { mkdirSync, mkdtempSync, symlinkSync, writeFileSync } from "node:fs"; import { tmpdir } from "node:os"; import { join } from "node:path"; +import { canCreateSymlinks } from "../test/symlink-capable"; import { + pathRealpathEscapesProjectRoot, pathTraversesSymlinkOutsideRoot, projectRelativePathFromResolved, + readUtf8WithinProjectRoot, + rejectUnsafeProjectRelativePath, resolvePathWithinRoot, } from "./path-containment"; @@ -24,46 +28,131 @@ describe("projectRelativePathFromResolved", () => { }); }); +const symlinkCapable = canCreateSymlinks(); + describe("pathTraversesSymlinkOutsideRoot", () => { - it("returns false for symlinks that stay inside the project root", () => { - const root = mkdtempSync(join(tmpdir(), "codemap-symlink-in-")); - writeFileSync(join(root, "real.ts"), "export const x = 1;\n"); - symlinkSync(join(root, "real.ts"), join(root, "link.ts")); - expect(pathTraversesSymlinkOutsideRoot(root, join(root, "link.ts"))).toBe( - false, - ); + it.skipIf(!symlinkCapable)( + "returns false for symlinks that stay inside the project root", + () => { + const root = mkdtempSync(join(tmpdir(), "codemap-symlink-in-")); + writeFileSync(join(root, "real.ts"), "export const x = 1;\n"); + symlinkSync(join(root, "real.ts"), join(root, "link.ts")); + expect(pathTraversesSymlinkOutsideRoot(root, join(root, "link.ts"))).toBe( + false, + ); + }, + ); + + it.skipIf(!symlinkCapable)( + "returns true when a path component symlinks outside the project root", + () => { + const base = mkdtempSync(join(tmpdir(), "codemap-symlink-out-")); + const root = join(base, "proj"); + const outside = join(base, "outside"); + mkdirSync(root, { recursive: true }); + mkdirSync(outside, { recursive: true }); + writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); + symlinkSync(join(outside, "secret.ts"), join(root, "escape.ts")); + expect( + pathTraversesSymlinkOutsideRoot(root, join(root, "escape.ts")), + ).toBe(true); + }, + ); + + it.skipIf(!symlinkCapable)( + "returns true when an intermediate directory symlinks outside the root", + () => { + const base = mkdtempSync(join(tmpdir(), "codemap-symlink-dir-")); + const root = join(base, "proj"); + const outside = join(base, "outside"); + mkdirSync(root, { recursive: true }); + mkdirSync(join(outside, "nested"), { recursive: true }); + writeFileSync( + join(outside, "nested", "secret.ts"), + "export const s = 1;\n", + ); + symlinkSync(outside, join(root, "linked-dir")); + expect( + pathTraversesSymlinkOutsideRoot( + root, + join(root, "linked-dir", "nested", "secret.ts"), + ), + ).toBe(true); + }, + ); + + it.skipIf(!symlinkCapable)( + "returns true for a broken symlink in the path chain", + () => { + const root = mkdtempSync(join(tmpdir(), "codemap-symlink-broken-")); + symlinkSync(join(root, "missing-target.ts"), join(root, "broken.ts")); + expect( + pathTraversesSymlinkOutsideRoot(root, join(root, "broken.ts")), + ).toBe(true); + }, + ); +}); + +describe("pathRealpathEscapesProjectRoot", () => { + it.skipIf(!symlinkCapable)( + "returns true when realpath follows a symlink outside the project root", + () => { + const base = mkdtempSync(join(tmpdir(), "codemap-realpath-out-")); + const root = join(base, "proj"); + const outside = join(base, "outside"); + mkdirSync(root, { recursive: true }); + mkdirSync(outside, { recursive: true }); + writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); + symlinkSync(join(outside, "secret.ts"), join(root, "escape.ts")); + expect( + pathRealpathEscapesProjectRoot(root, join(root, "escape.ts")), + ).toBe(true); + expect(rejectUnsafeProjectRelativePath(root, "escape.ts")).toBe( + "path escapes via symlink", + ); + }, + ); +}); + +describe("readUtf8WithinProjectRoot", () => { + it("reads safe files via realpath", () => { + const root = mkdtempSync(join(tmpdir(), "codemap-read-safe-")); + writeFileSync(join(root, "a.ts"), "export const a = 1;\n"); + const result = readUtf8WithinProjectRoot(root, "a.ts"); + expect(result).toEqual({ ok: true, content: "export const a = 1;\n" }); + }); + + it("returns missing for paths that do not exist", () => { + const root = mkdtempSync(join(tmpdir(), "codemap-read-missing-")); + expect(readUtf8WithinProjectRoot(root, "gone.ts")).toEqual({ + ok: false, + status: "missing", + }); }); - it("returns true when a path component symlinks outside the project root", () => { - const base = mkdtempSync(join(tmpdir(), "codemap-symlink-out-")); + it.skipIf(!symlinkCapable)("rejects broken symlinks before read", () => { + const root = mkdtempSync(join(tmpdir(), "codemap-read-broken-")); + symlinkSync(join(root, "missing.ts"), join(root, "broken.ts")); + expect(readUtf8WithinProjectRoot(root, "broken.ts")).toEqual({ + ok: false, + status: "rejected", + reason: "path escapes via symlink", + }); + }); + + it.skipIf(!symlinkCapable)("rejects symlink escapes before read", () => { + const base = mkdtempSync(join(tmpdir(), "codemap-read-escape-")); const root = join(base, "proj"); const outside = join(base, "outside"); mkdirSync(root, { recursive: true }); mkdirSync(outside, { recursive: true }); writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); symlinkSync(join(outside, "secret.ts"), join(root, "escape.ts")); - expect(pathTraversesSymlinkOutsideRoot(root, join(root, "escape.ts"))).toBe( - true, - ); - }); - - it("returns true when an intermediate directory symlinks outside the root", () => { - const base = mkdtempSync(join(tmpdir(), "codemap-symlink-dir-")); - const root = join(base, "proj"); - const outside = join(base, "outside"); - mkdirSync(root, { recursive: true }); - mkdirSync(join(outside, "nested"), { recursive: true }); - writeFileSync( - join(outside, "nested", "secret.ts"), - "export const s = 1;\n", - ); - symlinkSync(outside, join(root, "linked-dir")); - expect( - pathTraversesSymlinkOutsideRoot( - root, - join(root, "linked-dir", "nested", "secret.ts"), - ), - ).toBe(true); + expect(readUtf8WithinProjectRoot(root, "escape.ts")).toEqual({ + ok: false, + status: "rejected", + reason: "path escapes via symlink", + }); }); }); @@ -81,14 +170,17 @@ describe("resolvePathWithinRoot", () => { expect(resolvePathWithinRoot(root, "../../../etc/passwd")).toBeNull(); }); - it("returns null when a symlink target resolves outside the root", () => { - const base = mkdtempSync(join(tmpdir(), "codemap-resolve-symlink-")); - const root = join(base, "proj"); - const outside = join(base, "outside"); - mkdirSync(root, { recursive: true }); - mkdirSync(outside, { recursive: true }); - writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); - symlinkSync(join(outside, "secret.ts"), join(root, "escape.ts")); - expect(resolvePathWithinRoot(root, "escape.ts")).toBeNull(); - }); + it.skipIf(!symlinkCapable)( + "returns null when a symlink target resolves outside the root", + () => { + const base = mkdtempSync(join(tmpdir(), "codemap-resolve-symlink-")); + const root = join(base, "proj"); + const outside = join(base, "outside"); + mkdirSync(root, { recursive: true }); + mkdirSync(outside, { recursive: true }); + writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); + symlinkSync(join(outside, "secret.ts"), join(root, "escape.ts")); + expect(resolvePathWithinRoot(root, "escape.ts")).toBeNull(); + }, + ); }); diff --git a/src/application/path-containment.ts b/src/application/path-containment.ts index f889e77d..b4f1e498 100644 --- a/src/application/path-containment.ts +++ b/src/application/path-containment.ts @@ -1,4 +1,4 @@ -import { lstatSync, realpathSync } from "node:fs"; +import { lstatSync, readFileSync, realpathSync } from "node:fs"; import { isAbsolute, join, relative, resolve, sep } from "node:path"; /** `true` iff `resolve(resolvedRoot, candidate)` lands inside `resolvedRoot`. */ @@ -77,9 +77,14 @@ export function pathTraversesSymlinkOutsideRoot( current = join(current, part); try { if (lstatSync(current).isSymbolicLink()) { - const linkReal = realpathSync(current); - if (!isWithinProjectRoot(rootReal, linkReal)) return true; - current = linkReal; + try { + const linkReal = realpathSync(current); + if (!isWithinProjectRoot(rootReal, linkReal)) return true; + current = linkReal; + } catch { + // Broken symlink — cannot verify containment. + return true; + } } } catch { // Missing tail — string containment already checked. @@ -89,13 +94,109 @@ export function pathTraversesSymlinkOutsideRoot( return false; } +/** + * `true` when `realpathSync(absPath)` resolves outside `projectRoot` (symlink + * targets or TOCTOU swaps). `false` when the path is missing. Hardlinks to + * outside files keep an in-root pathname — not detectable via `realpath` alone + * (local-trust boundary; same class as apply reads). + */ +export function pathRealpathEscapesProjectRoot( + projectRoot: string, + absPath: string, +): boolean { + const resolvedRoot = resolve(projectRoot); + let rootReal: string; + try { + rootReal = realpathSync(resolvedRoot); + } catch { + rootReal = resolvedRoot; + } + try { + const targetReal = realpathSync(absPath); + return !isWithinProjectRoot(rootReal, targetReal); + } catch { + return false; + } +} + +export type UnsafeProjectPathReason = + | "path escapes project root" + | "path escapes via symlink" + | "path resolves outside project root"; + +/** Containment checks shared by validate reads and `resolvePathWithinRoot`. */ +export function rejectUnsafeProjectRelativePath( + projectRoot: string, + relativePath: string, +): UnsafeProjectPathReason | undefined { + if (pathEscapesProjectRoot(projectRoot, relativePath)) { + return "path escapes project root"; + } + const abs = resolve(projectRoot, relativePath); + if (pathTraversesSymlinkOutsideRoot(projectRoot, abs)) { + return "path escapes via symlink"; + } + if (pathRealpathEscapesProjectRoot(projectRoot, abs)) { + return "path resolves outside project root"; + } + return undefined; +} + +export type SafeProjectReadResult = + | { ok: true; content: string } + | { ok: false; status: "missing" } + | { ok: false; status: "rejected"; reason: UnsafeProjectPathReason }; + +/** Read UTF-8 text after realpath containment (re-check immediately before read). */ +export function readUtf8WithinProjectRoot( + projectRoot: string, + relativePath: string, +): SafeProjectReadResult { + const rejectReason = rejectUnsafeProjectRelativePath( + projectRoot, + relativePath, + ); + if (rejectReason !== undefined) { + return { ok: false, status: "rejected", reason: rejectReason }; + } + + const resolvedRoot = resolve(projectRoot); + let rootReal: string; + try { + rootReal = realpathSync(resolvedRoot); + } catch { + rootReal = resolvedRoot; + } + + let targetReal: string; + try { + targetReal = realpathSync(resolve(projectRoot, relativePath)); + } catch { + return { ok: false, status: "missing" }; + } + + if (!isWithinProjectRoot(rootReal, targetReal)) { + return { + ok: false, + status: "rejected", + reason: "path resolves outside project root", + }; + } + + try { + return { ok: true, content: readFileSync(targetReal, "utf8") }; + } catch { + return { ok: false, status: "missing" }; + } +} + /** Resolve `relativePath` under `root`; `null` when it escapes the root. */ export function resolvePathWithinRoot( root: string, relativePath: string, ): string | null { - if (pathEscapesProjectRoot(root, relativePath)) return null; - const abs = resolve(root, relativePath); - if (pathTraversesSymlinkOutsideRoot(root, abs)) return null; - return abs; + if (rejectUnsafeProjectRelativePath(root, relativePath) !== undefined) { + return null; + } + return resolve(root, relativePath); } diff --git a/src/application/serve-bind-policy.test.ts b/src/application/serve-bind-policy.test.ts index ffc68947..fee5b4c5 100644 --- a/src/application/serve-bind-policy.test.ts +++ b/src/application/serve-bind-policy.test.ts @@ -9,6 +9,7 @@ import { describe("isLoopbackHost", () => { it.each([ ["127.0.0.1", true], + ["127.0.0.2", true], ["localhost", true], ["::1", true], ["[::1]", true], @@ -34,6 +35,9 @@ describe("serveBindTokenRequiredMessage", () => { expect( serveBindTokenRequiredMessage("127.0.0.1", undefined), ).toBeUndefined(); + expect( + serveBindTokenRequiredMessage("127.0.0.2", undefined), + ).toBeUndefined(); expect(serveBindTokenRequiredMessage("[::1]", undefined)).toBeUndefined(); }); @@ -42,4 +46,10 @@ describe("serveBindTokenRequiredMessage", () => { /non-loopback bind requires --token/, ); }); + + it("treats whitespace-only token as missing on non-loopback binds", () => { + expect(serveBindTokenRequiredMessage("0.0.0.0", " ")).toContain( + "non-loopback bind requires --token", + ); + }); }); diff --git a/src/application/serve-bind-policy.ts b/src/application/serve-bind-policy.ts index f86c137c..047a5c39 100644 --- a/src/application/serve-bind-policy.ts +++ b/src/application/serve-bind-policy.ts @@ -1,7 +1,18 @@ +function isIpv4Loopback(host: string): boolean { + if (!host.startsWith("127.")) return false; + const parts = host.split("."); + if (parts.length !== 4) return false; + return parts.every((part) => { + if (!/^\d+$/.test(part)) return false; + const n = Number(part); + return n >= 0 && n <= 255; + }); +} + /** Hosts safe to bind without mandatory auth (loopback only). */ export function isLoopbackHost(host: string): boolean { const h = host.toLowerCase(); - return h === "127.0.0.1" || h === "localhost" || h === "::1" || h === "[::1]"; + return h === "localhost" || h === "::1" || h === "[::1]" || isIpv4Loopback(h); } /** Error message when a non-loopback bind lacks a token; `undefined` when OK. */ @@ -9,7 +20,10 @@ export function serveBindTokenRequiredMessage( host: string, token: string | undefined, ): string | undefined { - if (!isLoopbackHost(host) && (token === undefined || token.length === 0)) { + if ( + !isLoopbackHost(host) && + (token === undefined || token.trim().length === 0) + ) { return ( "codemap serve: non-loopback bind requires --token (use a long random secret). " + "Example: codemap serve --host 0.0.0.0 --token $(openssl rand -hex 32)" diff --git a/src/application/validate-engine.ts b/src/application/validate-engine.ts index d04972ed..95866662 100644 --- a/src/application/validate-engine.ts +++ b/src/application/validate-engine.ts @@ -1,11 +1,10 @@ -import { readFileSync } from "node:fs"; -import { isAbsolute, relative, resolve, sep } from "node:path"; +import { isAbsolute, relative, sep } from "node:path"; import type { CodemapDatabase } from "../db"; import { hashContent } from "../hash"; import { - pathEscapesProjectRoot, - pathTraversesSymlinkOutsideRoot, + readUtf8WithinProjectRoot, + rejectUnsafeProjectRelativePath, } from "./path-containment"; /** @@ -47,20 +46,23 @@ export function computeValidateRows( if (seen.has(rel)) continue; seen.add(rel); - const rejectReason = rejectValidatePath(projectRoot, rel); + const rejectReason = rejectUnsafeProjectRelativePath(projectRoot, rel); if (rejectReason !== undefined) { rows.push({ path: rel, status: "rejected", reason: rejectReason }); continue; } const indexedHash = indexByPath.get(rel); - const abs = resolve(projectRoot, rel); - let source: string | undefined; - try { - source = readFileSync(abs, "utf8"); - } catch { - source = undefined; + const readResult = readUtf8WithinProjectRoot(projectRoot, rel); + if (readResult.ok === false && readResult.status === "rejected") { + rows.push({ + path: rel, + status: "rejected", + reason: readResult.reason, + }); + continue; } + const source = readResult.ok === true ? readResult.content : undefined; if (indexedHash === undefined) { if (source !== undefined) rows.push({ path: rel, status: "unindexed" }); @@ -85,19 +87,6 @@ export function computeValidateRows( * slashes (tinyglobby / Bun.Glob / git diff all emit POSIX), so we normalize * here to make `indexByPath.get(rel)` succeed cross-platform. */ -function rejectValidatePath( - projectRoot: string, - rel: string, -): string | undefined { - if (pathEscapesProjectRoot(projectRoot, rel)) { - return "path escapes project root"; - } - if (pathTraversesSymlinkOutsideRoot(projectRoot, resolve(projectRoot, rel))) { - return "path escapes via symlink"; - } - return undefined; -} - export function toProjectRelative(projectRoot: string, p: string): string { const rel = isAbsolute(p) ? relative(projectRoot, p) : p; return sep === "/" ? rel : rel.split(sep).join("/"); diff --git a/src/cli/cmd-query-formatted.test.ts b/src/cli/cmd-query-formatted.test.ts index 43a002f9..43ae51b1 100644 --- a/src/cli/cmd-query-formatted.test.ts +++ b/src/cli/cmd-query-formatted.test.ts @@ -90,32 +90,35 @@ async function fileCount(): Promise { } describe("runQueryCmd — formatted output DML guard", () => { - it("rejects DELETE via --format sarif without mutating the index", async () => { - const before = await fileCount(); + it.each(["sarif", "badge", "mermaid", "annotations", "codeclimate"] as const)( + "rejects DELETE via --format %s without mutating the index", + async (format) => { + const before = await fileCount(); - const r = await runCli( - [ - "query", - "--format", - "sarif", - "DELETE FROM files WHERE path = 'src/drop.ts'", - ], - { CODEMAP_ROOT: projectRoot }, - ); - expect(r.exitCode).toBe(1); - expect(JSON.parse(r.out)).toMatchObject({ error: expect.any(String) }); + const r = await runCli( + [ + "query", + "--format", + format, + "DELETE FROM files WHERE path = 'src/drop.ts'", + ], + { CODEMAP_ROOT: projectRoot }, + ); + expect(r.exitCode).toBe(1); + expect(JSON.parse(r.out)).toMatchObject({ error: expect.any(String) }); - expect(await fileCount()).toBe(before); + expect(await fileCount()).toBe(before); - const drop = await runCli( - [ - "query", - "--json", - "SELECT COUNT(*) AS n FROM files WHERE path = 'src/drop.ts'", - ], - { CODEMAP_ROOT: projectRoot }, - ); - expect(drop.exitCode).toBe(0); - expect(JSON.parse(drop.out)).toEqual([{ n: 1 }]); - }); + const drop = await runCli( + [ + "query", + "--json", + "SELECT COUNT(*) AS n FROM files WHERE path = 'src/drop.ts'", + ], + { CODEMAP_ROOT: projectRoot }, + ); + expect(drop.exitCode).toBe(0); + expect(JSON.parse(drop.out)).toEqual([{ n: 1 }]); + }, + ); }); diff --git a/src/cli/cmd-serve.test.ts b/src/cli/cmd-serve.test.ts index 79be14be..c57f89a6 100644 --- a/src/cli/cmd-serve.test.ts +++ b/src/cli/cmd-serve.test.ts @@ -88,6 +88,12 @@ describe("parseServeRest", () => { expect(r.token).toBeUndefined(); }); + it("parses 127.0.0.0/8 loopback without token", () => { + const r = parseServeRest(["serve", "--host", "127.0.0.2"]); + if (r.kind !== "run") throw new Error("expected run"); + expect(r.token).toBeUndefined(); + }); + it("parses bracketed IPv6 loopback without token", () => { const r = parseServeRest(["serve", "--host", "[::1]"]); if (r.kind !== "run") throw new Error("expected run"); diff --git a/src/cli/cmd-serve.ts b/src/cli/cmd-serve.ts index db276c0f..de396df4 100644 --- a/src/cli/cmd-serve.ts +++ b/src/cli/cmd-serve.ts @@ -169,10 +169,11 @@ plugins that don't speak MCP). Single project root per server (set via --root / CODEMAP_ROOT). Default bind: 127.0.0.1:${DEFAULT_PORT} (loopback only — refuse 0.0.0.0 unless -explicitly opted in via --host 0.0.0.0). +explicitly opted in via --host 0.0.0.0). Any 127.0.0.0/8 address (e.g. +127.0.0.2) is treated as loopback for the --token requirement. Flags: - --host Bind address (default: ${DEFAULT_HOST}). + --host Bind address (default: ${DEFAULT_HOST}; 127.* is loopback). --port Bind port (default: ${DEFAULT_PORT}). --token Require Authorization: Bearer on every request. diff --git a/src/cli/cmd-validate.test.ts b/src/cli/cmd-validate.test.ts index 1537a1da..fa7fc936 100644 --- a/src/cli/cmd-validate.test.ts +++ b/src/cli/cmd-validate.test.ts @@ -15,9 +15,11 @@ import { resolveCodemapConfig } from "../config"; import { closeDb, openDb } from "../db"; import { hashContent } from "../hash"; import { initCodemap } from "../runtime"; +import { canCreateSymlinks } from "../test/symlink-capable"; import { parseValidateRest } from "./cmd-validate"; let tmpRoot = ""; +const symlinkCapable = canCreateSymlinks(); beforeEach(() => { tmpRoot = mkdtempSync(join(tmpdir(), "codemap-validate-")); @@ -129,24 +131,86 @@ describe("computeValidateRows", () => { ]); }); - it("rejects explicit paths that escape via symlink", () => { - const base = mkdtempSync(join(tmpdir(), "codemap-validate-symlink-")); - const outside = join(base, "outside"); + it.skipIf(!symlinkCapable)( + "rejects explicit paths that escape via symlink", + () => { + const base = mkdtempSync(join(tmpdir(), "codemap-validate-symlink-")); + const outside = join(base, "outside"); + mkdirSync(outside, { recursive: true }); + writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); + symlinkSync(join(outside, "secret.ts"), join(tmpRoot, "escape.ts")); + + const rows = withDb((db) => + computeValidateRows(db, tmpRoot, ["escape.ts"]), + ); + expect(rows).toEqual([ + { + path: "escape.ts", + status: "rejected", + reason: "path escapes via symlink", + }, + ]); + rmSync(base, { recursive: true, force: true }); + }, + ); + + it.skipIf(!symlinkCapable)( + "rejects indexed escape paths on full-scan validate", + () => { + const base = mkdtempSync(join(tmpdir(), "codemap-validate-scan-")); + const outside = join(base, "outside"); + mkdirSync(outside, { recursive: true }); + writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); + symlinkSync(join(outside, "secret.ts"), join(tmpRoot, "escape.ts")); + seedIndex([ + { + path: "escape.ts", + content_hash: hashContent("export const s = 1;\n"), + }, + ]); + + const rows = withDb((db) => computeValidateRows(db, tmpRoot, [])); + expect(rows).toEqual([ + { + path: "escape.ts", + status: "rejected", + reason: "path escapes via symlink", + }, + ]); + rmSync(base, { recursive: true, force: true }); + }, + ); + + it.skipIf(!symlinkCapable)( + "rejects indexed broken symlinks on full-scan validate", + () => { + symlinkSync(join(tmpRoot, "missing.ts"), join(tmpRoot, "broken.ts")); + seedIndex([{ path: "broken.ts", content_hash: hashContent("stale\n") }]); + + const rows = withDb((db) => computeValidateRows(db, tmpRoot, [])); + expect(rows).toEqual([ + { + path: "broken.ts", + status: "rejected", + reason: "path escapes via symlink", + }, + ]); + }, + ); + + it("rejects absolute paths outside the project root", () => { + const outside = join(tmpdir(), "codemap-validate-abs-outside-"); mkdirSync(outside, { recursive: true }); - writeFileSync(join(outside, "secret.ts"), "export const s = 1;\n"); - symlinkSync(join(outside, "secret.ts"), join(tmpRoot, "escape.ts")); + const outsideFile = join(outside, "secret.ts"); + writeFileSync(outsideFile, "export const s = 1;\n"); const rows = withDb((db) => - computeValidateRows(db, tmpRoot, ["escape.ts"]), + computeValidateRows(db, tmpRoot, [outsideFile]), ); - expect(rows).toEqual([ - { - path: "escape.ts", - status: "rejected", - reason: "path escapes via symlink", - }, - ]); - rmSync(base, { recursive: true, force: true }); + expect(rows).toHaveLength(1); + expect(rows[0]?.status).toBe("rejected"); + expect(rows[0]?.reason).toBe("path escapes project root"); + rmSync(outside, { recursive: true, force: true }); }); it("dedupes paths and sorts by path", () => { diff --git a/src/cli/cmd-validate.ts b/src/cli/cmd-validate.ts index 214b7dec..777a08bb 100644 --- a/src/cli/cmd-validate.ts +++ b/src/cli/cmd-validate.ts @@ -23,15 +23,17 @@ file. Prints rows for entries that are out of sync — without the agent paying to re-read every file. paths Project-relative or absolute file paths to check. If omitted, - all indexed files are checked. + all indexed files are checked. Output path keys are always + project-relative POSIX paths (absolute argv is normalized). Statuses: stale The file exists but its content_hash differs from the index. missing The file is in the index but has been deleted on disk. unindexed The file exists on disk but is not present in the index (only when explicit paths are passed). - rejected The path could not be checked safely (escapes the project root - or resolves outside via symlink). Includes a reason field. + rejected The path could not be checked safely. reason is one of: + path escapes project root | path escapes via symlink | + path resolves outside project root. Flags: --json Emit a JSON array of {path, status[, reason]} objects (for agents). diff --git a/src/test/symlink-capable.ts b/src/test/symlink-capable.ts new file mode 100644 index 00000000..6cd24c9c --- /dev/null +++ b/src/test/symlink-capable.ts @@ -0,0 +1,21 @@ +import { mkdtempSync, rmSync, symlinkSync, writeFileSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; + +let cached: boolean | undefined; + +/** `false` when the environment cannot create symlinks (e.g. Windows without privilege). */ +export function canCreateSymlinks(): boolean { + if (cached !== undefined) return cached; + const dir = mkdtempSync(join(tmpdir(), "codemap-symlink-cap-")); + try { + writeFileSync(join(dir, "target.txt"), "x\n"); + symlinkSync(join(dir, "target.txt"), join(dir, "link.txt")); + cached = true; + } catch { + cached = false; + } finally { + rmSync(dir, { recursive: true, force: true }); + } + return cached; +} diff --git a/templates/agent-content/skill/10-recipes-context.md b/templates/agent-content/skill/10-recipes-context.md index ee312c42..da71edde 100644 --- a/templates/agent-content/skill/10-recipes-context.md +++ b/templates/agent-content/skill/10-recipes-context.md @@ -33,7 +33,7 @@ Each emitted delta carries its own `base` metadata so mixed-baseline audits are **MCP server (`codemap mcp [--no-watch] [--debounce ]`)** — separate top-level command exposing the structural-query surface (21 JSON-RPC tools — list below) to agent hosts (Claude Code, Cursor, Codex, generic MCP clients) over stdio. Eliminates the bash round-trip on every agent call. Bootstrap once at server boot; each tool returns the same JSON payload its CLI `--json` would print (including `query batch`, `trace`, `explore`, `node`, `file`, `schema`, `symbols`, `context --include-snippets`, `ingest-coverage`, and `ingest-churn`). MCP wraps payloads in `{content: [{type: "text", text: …}]}`. **`tools/list` ToolAnnotations** — advisory `readOnlyHint` / `destructiveHint` / `idempotentHint` per tool: read paths (`query`, `show`, `audit`, …) → `readOnlyHint: true`; apply tools (`apply`, `apply_rows`, `apply_diff_input`) → `destructiveHint: true` (writes still require `yes: true`); index mutators (`save_baseline`, `drop_baseline`, `ingest_coverage`, `ingest_churn`) → `readOnlyHint: false` without `destructiveHint`. HTTP `GET /tools` exposes the same hints. **`initialize` instructions** + resource `codemap://mcp-instructions` carry the tool-selection playbook. **Watcher default-ON since 2026-05** — every tool reads a live index, `audit`'s incremental-index prelude becomes a no-op. Pass `--no-watch` (or `CODEMAP_WATCH=0`) for one-shot fire-and-forget calls without the in-process chokidar loop. -**HTTP server (`codemap serve [--host 127.0.0.1] [--port 7878] [--token ] [--no-watch] [--debounce ]`)** — same tool taxonomy as MCP, exposed over `POST /tool/{name}` for non-MCP consumers (CI scripts, simple `curl`, IDE plugins that don't speak MCP). Loopback-default; Bearer-token auth optional on loopback binds and **required** on non-loopback binds (`--host 0.0.0.0`, etc.). HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper); SARIF / annotations / mermaid / diff payloads ship with `application/sarif+json` or `text/plain` Content-Type; `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` use `application/json`; badge markdown uses `text/plain`. Resources mirrored at `GET /resources/{encoded-uri}`. `GET /health` is auth-exempt; `GET /tools` / `GET /resources` are catalogs. **Watcher default-ON since 2026-05** — same `--no-watch` / `CODEMAP_WATCH=0` opt-out as `mcp`. +**HTTP server (`codemap serve [--host 127.0.0.1] [--port 7878] [--token ] [--no-watch] [--debounce ]`)** — same tool taxonomy as MCP, exposed over `POST /tool/{name}` for non-MCP consumers (CI scripts, simple `curl`, IDE plugins that don't speak MCP). Loopback-default; any `127.0.0.0/8` bind counts as loopback for the token rule. Bearer-token auth optional on loopback binds and **required** on non-loopback binds (`--host 0.0.0.0`, etc.). HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper); SARIF / annotations / mermaid / diff payloads ship with `application/sarif+json` or `text/plain` Content-Type; `format: "diff-json"` / `"codeclimate"` / `"badge"` + `badge_style: "json"` use `application/json`; badge markdown uses `text/plain`. Resources mirrored at `GET /resources/{encoded-uri}`. `GET /health` is auth-exempt; `GET /tools` / `GET /resources` are catalogs. **Watcher default-ON since 2026-05** — same `--no-watch` / `CODEMAP_WATCH=0` opt-out as `mcp`. **Watch mode (`codemap watch [--debounce 250] [--quiet]`)** — standalone long-running process that debounces file changes and re-indexes only the changed paths. SIGINT/SIGTERM drains pending edits before exit. `mcp` / `serve` boot the watcher in-process by default since 2026-05; use `codemap watch` standalone when you want the watcher decoupled from a transport (e.g. running alongside an editor that already speaks MCP via a different process). @@ -47,7 +47,7 @@ Each emitted delta carries its own `base` metadata so mixed-baseline audits are - **`list_baselines`** — no args; returns the array `codemap query --baselines --json` would print. - **`drop_baseline`** — `{name}` → `{dropped}` on success; structured `{error}` on unknown name (MCP sets `isError: true`). - **`context`** — `{compact?, intent?, include_snippets?}`. CLI: `codemap context [--include-snippets]`. Session-start project envelope with `start_here` shortcuts (one call replaces 4-5 `query`s). `index_summary.file_churn` row count; **`churn_hint`** when empty (steers to index, **`ingest-churn`**, or **`churn.file`**). `include_snippets` adds one-line export previews on hub leaders (capped to adaptive `signature_max_chars`; may set `stale`/`missing`); no-op when `compact: true`. Whitespace-only `intent` is treated as no intent. Prefer `start_here.hub_leaders` over legacy `hubs` for signatures — `hubs` keeps the full bundled `fan-in` recipe limit for backward compatibility. `sample_markers` count scales down on repos >500 / >5000 files. -- **`validate`** — `{paths?: string[]}`. SHA-256 vs `files.content_hash`; returns only out-of-sync rows (`stale` / `missing` / `unindexed` / `rejected` — fresh paths are omitted; `rejected` includes optional `reason` when a path escapes the project root or resolves outside via symlink). +- **`validate`** — `{paths?: string[]}`. SHA-256 vs `files.content_hash`; returns only out-of-sync rows (`stale` / `missing` / `unindexed` / `rejected` — fresh paths are omitted; `rejected` includes optional `reason` when a path escapes the project root, resolves outside via symlink, or has a broken symlink). Output `path` keys are project-relative POSIX paths. - **`show`** — `{name, kind?, in?}` or `{query, with_fts?}`. Exact symbol lookup or field-qualified search (`kind:`, `name:`, `path:`, `in:` + free text) → `{matches, disambiguation?, warning?}`. CLI: `codemap show --query '…' [--print-sql]`. - **`snippet`** — same as `show` (`{name, kind?, in?}` or `{query, with_fts?}`) but each match also carries `source` (file text) + `stale` / `missing` flags → `{matches, disambiguation?, warning?}`. No reindex side-effects. - **`impact`** — `{target, direction?, via?, depth?, limit?, summary?}`. Symbol/file blast-radius walker (replaces hand-composed `WITH RECURSIVE`). Auto-resolves symbol vs file target; `via` defaults to every backend compatible with the kind. From 22b35a1a087fad27844fa7b75f69326fa6121f7c Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Thu, 11 Jun 2026 08:18:46 +0300 Subject: [PATCH 4/7] feat(agents): harden-pr probes, ledger, and recall benchmark harness Adds missing-test eval fixture, LEDGER.md, vet/reconcile/quick modes, score-probe recall scorer, and test:harden-probes smoke. Live probe run: recall 1.0 on golden finding. --- .agents/skills/harden-pr/LEDGER.md | 23 +++++ .agents/skills/harden-pr/SKILL.md | 94 ++++++++++++++----- docs/benchmark.md | 42 +++++++++ docs/testing-coverage.md | 13 +++ fixtures/harden-probes/README.md | 11 +++ fixtures/harden-probes/missing-test/README.md | 26 +++++ .../harden-probes/missing-test/acceptance.sh | 18 ++++ .../missing-test/expected-findings.json | 12 +++ .../harden-probes/missing-test/package.json | 6 ++ .../missing-test/src/formatWidget.ts | 8 ++ package.json | 1 + scripts/agent-eval/harden-scenarios.json | 15 +++ scripts/harden-probes/run-smoke.sh | 44 +++++++++ scripts/harden-probes/score-probe.mjs | 57 +++++++++++ scripts/harden-probes/score-probe.test.mjs | 36 +++++++ .../harden-probes/validate-fixtures.test.mjs | 55 +++++++++++ 16 files changed, 438 insertions(+), 23 deletions(-) create mode 100644 .agents/skills/harden-pr/LEDGER.md create mode 100644 fixtures/harden-probes/README.md create mode 100644 fixtures/harden-probes/missing-test/README.md create mode 100755 fixtures/harden-probes/missing-test/acceptance.sh create mode 100644 fixtures/harden-probes/missing-test/expected-findings.json create mode 100644 fixtures/harden-probes/missing-test/package.json create mode 100644 fixtures/harden-probes/missing-test/src/formatWidget.ts create mode 100644 scripts/agent-eval/harden-scenarios.json create mode 100755 scripts/harden-probes/run-smoke.sh create mode 100755 scripts/harden-probes/score-probe.mjs create mode 100644 scripts/harden-probes/score-probe.test.mjs create mode 100644 scripts/harden-probes/validate-fixtures.test.mjs diff --git a/.agents/skills/harden-pr/LEDGER.md b/.agents/skills/harden-pr/LEDGER.md new file mode 100644 index 00000000..6cdeb4b5 --- /dev/null +++ b/.agents/skills/harden-pr/LEDGER.md @@ -0,0 +1,23 @@ +# Harden-pr ledger + +Single durable backlog for [`harden-pr`](./SKILL.md). Parent reads **§ Rejections** at vet step; **§ Deferred** on cap and on `/harden-pr reconcile`. + +## Rejections + +By-design or false-positive findings — do not re-raise. + +```markdown +- **[category]** `file:line` — label: reason +``` + + + +## Deferred + +Capped or out-of-scope-for-now — reconcile re-vets; remove lines when fixed. + +```markdown +- **[severity]** `file:line` — finding (deferred: out of scope | cap | blocked) +``` diff --git a/.agents/skills/harden-pr/SKILL.md b/.agents/skills/harden-pr/SKILL.md index ed13f731..da9b5d30 100644 --- a/.agents/skills/harden-pr/SKILL.md +++ b/.agents/skills/harden-pr/SKILL.md @@ -4,8 +4,8 @@ description: >- Bring a branch to pristine, maximum production readiness without changing PR intent — spawn parallel Task subagents (never inline review), fix in-bounds findings, loop autonomously until clean or pass cap, then report once. Use after a tracer-bullet commit (lite), before PR - is done (full), or on "harden", "harden-pr", "pristine", "review until clean", - "production-ready pass". Invoking this skill authorizes one harden commit at cycle end. + is done (full), on "harden", "harden-pr", "pristine", "review until clean", + "production-ready pass", or "harden-pr reconcile". Invoking this skill authorizes one harden commit at cycle end. NEVER stop mid-loop to ask about commits, babysit, or the next pass. NEVER redesign the feature or change observable runtime behavior. --- @@ -16,9 +16,9 @@ description: >- Local loop: parallel reviewer subagents → merge findings → fix in-bounds → re-verify → repeat until clean or cap → **one final report**. -**Invoking this skill (`/harden-pr`, `harden-pr lite`, `harden-pr full`) is a run-to-completion command.** The agent executes the full loop before ending the turn. +**Invoking this skill (`/harden-pr`, `harden-pr lite`, `harden-pr full`, `harden-pr quick`, `harden-pr reconcile`) is a run-to-completion command.** The agent executes the full loop before ending the turn. -Sister skills: [`audit-pr-architecture`](../audit-pr-architecture/SKILL.md) (extended structural reviewer). Mention **`babysit`** only in the final report (full mode) — never mid-loop. +Sister skills: [`audit-pr-architecture`](../audit-pr-architecture/SKILL.md) (extended structural reviewer). **Ledger:** [LEDGER.md](./LEDGER.md) (rejections + deferred — one file). Mention **`babysit`** only in the final report (full mode) — never mid-loop. ## Run-to-completion (read first) @@ -42,12 +42,14 @@ Otherwise: resolve anchor → run all passes → fix → verify → next pass ## Modes -| Mode | When | Scope | Max passes | -| -------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------- | ---------- | -| **Lite** | After each tracer-bullet slice commit ([`tracer-bullets`](../../rules/tracer-bullets.md) cadence) | Files in the slice diff | 2 | -| **Full** | User intent ("full harden", "PR done", "production-ready pass") **or** offer when an in-flight `docs/plans/.md` checklist is complete | `origin/main...HEAD` | 3 | +| Mode | When | Scope | Max passes | +| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------- | ---------- | +| **Lite** | After each tracer-bullet slice commit ([`tracer-bullets`](../../rules/tracer-bullets.md) cadence) | Files in the slice diff | 2 | +| **Quick** | Cheap uncertainty pass ("quick harden") | Last commit or slice diff | 1 | +| **Full** | User intent ("full harden", "PR done", "production-ready pass") **or** offer when an in-flight `docs/plans/.md` checklist is complete | `origin/main...HEAD` | 3 | +| **Reconcile** | `/harden-pr reconcile` — process [LEDGER.md § Deferred](./LEDGER.md#deferred), then run **full** if branch still open | `origin/main...HEAD` | 3 | -Default to **lite** when invoked immediately after a slice commit. Default to **full** when the user signals branch completion. +Default to **lite** when invoked immediately after a slice commit. Default to **full** when the user signals branch completion. **Quick** = core 3 reviewers only (no extended roster). ## Production bar (what "pristine" means) @@ -76,6 +78,27 @@ Resolve in order; stop at the first hit: Reviewers treat the anchor as contract. Findings that would violate it → **report, do not apply**. +Record `HEAD` at loop start (`git rev-parse HEAD`) in the final report. If `HEAD` changes mid-loop from unrelated work, re-resolve the anchor before the next pass. + +## Vet step (parent, after merge — before fix) + +Subagents over-report. After merge + dedupe: + +1. Read [LEDGER.md § Rejections](./LEDGER.md#rejections) — drop findings matching a rejection entry. +2. For each remaining finding: **re-read** `file` at `line` (or the cited region). Drop if the claim is false or by-design. +3. New by-design drops → append one bullet to **§ Rejections** in [LEDGER.md](./LEDGER.md). +4. Sort survivors by leverage: `severity` first, then `confidence` desc, then `effort` asc (`S` before `L`). + +**Anti-pattern:** applying a fix without re-reading the cited location. + +## Reconcile mode + +Run-to-completion like other modes: + +1. Read [LEDGER.md § Deferred](./LEDGER.md#deferred). Re-vet each row (same vet step). Fix in-bounds items; remove fixed lines. +2. Run **full** harden on `origin/main...HEAD` (same loop as full mode). +3. On cap: append still-deferred items to **§ Deferred** in [LEDGER.md](./LEDGER.md). Report what was reconciled vs still open. + ## In-bounds vs out-of-bounds **Fix:** bugs, missing tests, docs/changeset drift, lint/type/format, error-handling gaps, edge cases, **behavior-preserving refactors in touched files**, in-scope nits (naming, comment hygiene, cheap lint fixes). @@ -106,10 +129,16 @@ Each reviewer returns **only** a JSON array (no prose wrapper). Parent parses ar "finding": "One-sentence claim about a gap vs production bar", "severity": "blocker | major | minor | nit | info", "file": "repo-relative/path or \"multiple\"", - "fixable_in_bounds": true + "line": 42, + "confidence": "high | medium | low", + "effort": "S | M | L", + "fixable_in_bounds": true, + "production_bar": "Tests | Docs | Structure | …" } ``` +Use `line: null` when the gap is file-level (e.g. missing test file). + **Severity → action** | Severity | Parent action | @@ -124,8 +153,9 @@ Each reviewer returns **only** a JSON array (no prose wrapper). Parent parses ar 1. Concatenate all reviewer arrays. 2. Drop `info` unless it blocks ship shape. 3. Dedupe: same `file` + same root cause → keep highest severity, merge `finding` text. -4. Sort actionable: `blocker` → `major` → `minor` → `nit`. -5. If merged list is empty → pass succeeds; skip fix phase. +4. Sort actionable: `blocker` → `major` → `minor` → `nit`; within tier → `confidence` desc → `effort` asc. +5. **Vet** (§ Vet step). +6. If vetted list is empty → pass succeeds; skip fix phase. **Example merged queue (pass 1)** @@ -135,19 +165,31 @@ Each reviewer returns **only** a JSON array (no prose wrapper). Parent parses ar "finding": "CLI --help documents summary counts but not per-row attribution on --base JSON rows.", "severity": "major", "file": "src/cli/cmd-audit.ts", - "fixable_in_bounds": true + "line": 120, + "confidence": "high", + "effort": "S", + "fixable_in_bounds": true, + "production_bar": "Docs" }, { "finding": "Skill shard leaks requiredColumns when describing attribution.", "severity": "major", "file": "templates/agent-content/skill/10-recipes-context.md", - "fixable_in_bounds": true + "line": null, + "confidence": "high", + "effort": "M", + "fixable_in_bounds": true, + "production_bar": "Surfaces" }, { "finding": "No e2e test for attribution: inherited on deprecated delta.", "severity": "nit", "file": "src/application/audit-worktree.test.ts", - "fixable_in_bounds": true + "line": null, + "confidence": "medium", + "effort": "S", + "fixable_in_bounds": true, + "production_bar": "Tests" } ] ``` @@ -170,7 +212,7 @@ You are the **{ROLE}** reviewer for `/harden-pr` on `{REPO}`. **Task:** {EXTRA} **Return ONLY** a JSON array of findings: -[{ "finding": "...", "severity": "blocker|major|minor|nit|info", "file": "...", "fixable_in_bounds": true|false }] +[{ "finding": "...", "severity": "blocker|major|minor|nit|info", "file": "...", "line": N|null, "confidence": "high|medium|low", "effort": "S|M|L", "fixable_in_bounds": true|false, "production_bar": "..." }] If clean: [] Readonly — do not edit files. @@ -216,23 +258,25 @@ Re-derive layer globs from `docs/architecture.md` § Layering — don't hardcode Execute **without pausing for user input** until exit condition: ``` -resolve intent anchor +resolve intent anchor; stamp HEAD pass = 1 loop: Task-batch all applicable reviewers (parallel, readonly) parent: merge + dedupe JSON findings (§ Finding schema) + parent: vet findings (§ Vet step) if none actionable → goto done fix in-bounds (pass 1: all; passes 2+: blockers first, then in-scope nits) - run project checks on touched files + per fix: run verification gate from verify-after-each-step on touched files if clean and no new findings → goto done if pass >= max_passes → goto capped pass += 1 goto loop capped: + append deferred rows to LEDGER.md § Deferred emit deferred-nits list (each nit must cite plan Out of scope or cross-PR blocker — not "optional") done: if uncommitted fixes → git commit -m "harden: …" - emit final report (include babysit one-liner if full mode) + emit final report (include babysit one-liner if full mode; include anchor HEAD stamp) ``` **Pass cap behavior:** after cap, stop auto-fixing; list deferred nits. Do not block the next tracer slice. @@ -243,9 +287,13 @@ Skill invocation **is** the commit authorization. After the loop: if fixes exist ## Quick invoke -| Intent | Say | -| ----------- | ------------------------------------------------------ | -| Post-slice | `/harden-pr lite` or `/harden-pr` after a slice commit | -| Branch done | `/harden-pr full` or "production-ready pass" | +| Intent | Say | +| ---------------- | ------------------------------------------------------ | +| Post-slice | `/harden-pr lite` or `/harden-pr` after a slice commit | +| Cheap pass | `/harden-pr quick` | +| Branch done | `/harden-pr full` or "production-ready pass" | +| Deferred backlog | `/harden-pr reconcile` | + +**Eval probes:** [`fixtures/harden-probes/`](../../../fixtures/harden-probes/) — manual workflow scoring; schema enforced by `scripts/harden-probes/validate-fixtures.test.mjs`. Replaces the old copy-paste: _"spawn subagents → fix → loop until clean"_ — this skill **is** that loop. diff --git a/docs/benchmark.md b/docs/benchmark.md index 2a8890a2..c8ad8414 100644 --- a/docs/benchmark.md +++ b/docs/benchmark.md @@ -367,6 +367,48 @@ Environment overrides: `AGENT_EVAL_OUTPUT`, `AGENT_EVAL_FIXTURE_ROOT`, `AGENT_EV Numbers are stable for a given fixture + schema; re-run locally after intentional schema or probe changes. +### Harden-pr workflow eval + +Manual A/B harness for [`.agents/skills/harden-pr/`](../.agents/skills/harden-pr/SKILL.md) — complements codemap **query** probes above. Measures whether an agent **detects, fixes, and verifies** production-bar gaps on a branch, not SQL vs grep cost. + +| Artifact | Role | +| --------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- | +| [`fixtures/harden-probes/`](../fixtures/harden-probes/) | Mini corpora with injected gaps + `expected-findings.json` oracle | +| [`scripts/agent-eval/harden-scenarios.json`](../scripts/agent-eval/harden-scenarios.json) | Scenario index (probe dir, mode, acceptance script) | +| [`.agents/skills/harden-pr/LEDGER.md`](../.agents/skills/harden-pr/LEDGER.md) | Rejections (vet) + deferred backlog (reconcile) — one file | +| [`scripts/harden-probes/validate-fixtures.test.mjs`](../scripts/harden-probes/validate-fixtures.test.mjs) | Golden finding schema guard (`test:scripts`) | + +**Run one probe:** + +```bash +cd fixtures/harden-probes/missing-test +# Cursor: attach harden-pr → `/harden-pr lite` +bash acceptance.sh # after harden — must exit 0 +``` + +**Mechanical smoke (CI via `test:scripts`):** + +```bash +bun run test:harden-probes +# schema guard + pre-fix acceptance fails + score-probe oracle + +# Score agent findings JSON against golden: +bun scripts/harden-probes/score-probe.mjs fixtures/harden-probes/missing-test .agent-eval/my-findings.json +``` + +**Score each run** (spreadsheet or research note — not CI-gated yet): + +| Metric | Pass | +| -------- | ------------------------------------------------------------------------------ | +| Recall | Finds ≥1 golden row in `expected-findings.json` (same `file` + production bar) | +| Fix | `acceptance.sh` green; intent anchor unchanged | +| Autonomy | Zero mid-loop commit/babysit/next-pass prompts | +| Passes | ≤ mode cap (lite 2, full 3) | + +**A/B skill versions:** same probe dir, two sessions (before/after skill edit); compare recall, false fixes, passes, autonomy. + +Future: log parser for `Task` reviewer JSON (same spirit as `parse-agent-log.ts`) to automate recall scoring. + #### Dual-agent study (codemap self-index, provisional) Exploratory runs on the **codemap repo** index (not `fixtures/minimal`) — four structural tasks, MCP-on vs MCP-forbidden (grep/read/shell only). Not pinned in CI; methodology caveats apply. diff --git a/docs/testing-coverage.md b/docs/testing-coverage.md index 423c40b1..cca1d77c 100644 --- a/docs/testing-coverage.md +++ b/docs/testing-coverage.md @@ -14,6 +14,7 @@ | **In-repo test bench** | `bun run test:golden` | Index `fixtures/minimal/` (bench corpus) → compare `fixtures/golden/minimal/*.json`. Map: `fixtures/CAPABILITIES.json`. | | **Golden guard** | `bun run test:scripts` | `scripts/query-golden-coverage-matrix.test.mjs` — every bundled recipe + substrate table has a scenario. | | **Agent eval** | `bun run test:agent-eval` | Probe arms vs golden ids (MCP-on vs glob/read). | +| **Harden-pr probes** | `bun run test:harden-probes` | Schema + pre-fix fail + score oracle; manual agent run + `acceptance.sh` after fix. | | **Integration (git)** | `src/application/run-index.test.ts` | `runCodemapIndex` incremental paths: heritage + calls re-resolution, delete/reindex. | | **CLI e2e** | `src/cli/cmd-test-bench-e2e.test.ts`, `src/cli/cmd-cli-parity-e2e.test.ts` | Spawned CLI on `fixtures/minimal` (bench smoke + resource parity). | | **Apply CLI e2e** | `src/cli/cmd-apply.test.ts` | Temp project + full index: recipe dry-run/apply, `--rows`, second recipe disk apply. | @@ -102,6 +103,18 @@ Codemap development uses **only** the committed bench ([fixtures/README.md](../f `scripts/agent-eval/scenarios.json` — one probe per `CAPABILITIES.json` capability id (groups with `unitTests` or `enforcedBy` instead: `recipes.bundled`, `cli.bench-smoke`, `cli.mcp.http`). Each probe’s `goldenId` must appear in that group’s `goldenScenarios`; multiple substrate ids per group are fine — one matching probe is enough. Enforced by `scripts/agent-eval/capability-probes.test.mjs` in `test:scripts`. +### Harden-pr workflow probes + +Manual eval for [`.agents/skills/harden-pr/`](../.agents/skills/harden-pr/SKILL.md) — not automated agent runs in CI yet. + +| Artifact | Role | +| --------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- | +| [`fixtures/harden-probes/`](../fixtures/harden-probes/) | Injected production-bar gaps + `expected-findings.json` oracle | +| [`scripts/agent-eval/harden-scenarios.json`](../scripts/agent-eval/harden-scenarios.json) | Scenario index (probe dir, mode, acceptance script) | +| [`scripts/harden-probes/validate-fixtures.test.mjs`](../scripts/harden-probes/validate-fixtures.test.mjs) | Schema guard in `test:scripts` | + +Run a probe: open the probe dir in Cursor → `/harden-pr lite` → score findings with `score-probe.mjs` → `bash acceptance.sh`. Mechanical smoke: `bun run test:harden-probes`. Full protocol: [benchmark.md § Harden-pr workflow eval](./benchmark.md#harden-pr-workflow-eval). + --- ## Fixture corpus diff --git a/fixtures/harden-probes/README.md b/fixtures/harden-probes/README.md new file mode 100644 index 00000000..2ecd9c66 --- /dev/null +++ b/fixtures/harden-probes/README.md @@ -0,0 +1,11 @@ +# Harden-pr eval probes + +Manual workflow eval fixtures for [`.agents/skills/harden-pr/SKILL.md`](../../.agents/skills/harden-pr/SKILL.md). Not shipped in npm. + +Each probe is a small corpus with **injected production-bar gaps** and a golden finding oracle. Agents run `/harden-pr` against the probe; humans score recall, precision, and autonomy. + +| Probe | Injected gap | Production bar | +| --------------- | ------------------------ | -------------- | +| `missing-test/` | New export, no unit test | Tests | + +See [benchmark.md § Harden-pr workflow eval](../../docs/benchmark.md#harden-pr-workflow-eval). diff --git a/fixtures/harden-probes/missing-test/README.md b/fixtures/harden-probes/missing-test/README.md new file mode 100644 index 00000000..b6fe362f --- /dev/null +++ b/fixtures/harden-probes/missing-test/README.md @@ -0,0 +1,26 @@ +# Probe: missing-test + +**Intent anchor:** Add `formatWidget` — a pure string formatter. Do not change its signature or return shape. + +**Injected gap:** `src/formatWidget.ts` ships without a test file. + +## Run + +```bash +cd fixtures/harden-probes/missing-test +# In Cursor on this directory: attach harden-pr → `/harden-pr lite` or `/harden-pr full` +``` + +## Score + +| Metric | Pass criteria | +| -------- | ---------------------------------------------------------- | +| Recall | Detects missing test for `formatWidget` (see golden below) | +| Fix | Adds test; `bash acceptance.sh` exits 0 | +| Autonomy | No mid-loop commit/babysit prompts | +| Intent | `formatWidget` signature unchanged | + +## Oracle + +- [`expected-findings.json`](./expected-findings.json) — golden findings before fix +- [`acceptance.sh`](./acceptance.sh) — run after harden (tests exist + pass) diff --git a/fixtures/harden-probes/missing-test/acceptance.sh b/fixtures/harden-probes/missing-test/acceptance.sh new file mode 100755 index 00000000..df029843 --- /dev/null +++ b/fixtures/harden-probes/missing-test/acceptance.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash +set -euo pipefail +ROOT="$(cd "$(dirname "$0")" && pwd)" +cd "$ROOT" + +if ! compgen -G "src/**/*.test.ts" > /dev/null && [[ ! -f src/formatWidget.test.ts ]]; then + echo "acceptance: missing test file for formatWidget" >&2 + exit 1 +fi + +if ! grep -rq "formatWidget" src/*.test.ts src/**/*.test.ts 2>/dev/null; then + echo "acceptance: no test references formatWidget" >&2 + exit 1 +fi + +bun test src/ + +echo "acceptance: ok" diff --git a/fixtures/harden-probes/missing-test/expected-findings.json b/fixtures/harden-probes/missing-test/expected-findings.json new file mode 100644 index 00000000..4a56ba34 --- /dev/null +++ b/fixtures/harden-probes/missing-test/expected-findings.json @@ -0,0 +1,12 @@ +[ + { + "finding": "New export formatWidget has no unit test covering empty and non-empty names", + "severity": "major", + "file": "src/formatWidget.ts", + "line": 2, + "confidence": "high", + "effort": "S", + "fixable_in_bounds": true, + "production_bar": "Tests" + } +] diff --git a/fixtures/harden-probes/missing-test/package.json b/fixtures/harden-probes/missing-test/package.json new file mode 100644 index 00000000..7fa54470 --- /dev/null +++ b/fixtures/harden-probes/missing-test/package.json @@ -0,0 +1,6 @@ +{ + "name": "codemap-harden-probe-missing-test", + "version": "0.0.0", + "private": true, + "description": "Harden-pr eval probe — intentional missing test." +} diff --git a/fixtures/harden-probes/missing-test/src/formatWidget.ts b/fixtures/harden-probes/missing-test/src/formatWidget.ts new file mode 100644 index 00000000..4eb538eb --- /dev/null +++ b/fixtures/harden-probes/missing-test/src/formatWidget.ts @@ -0,0 +1,8 @@ +/** Format a widget label for display. Probe intent: do not change signature or return shape. */ +export function formatWidget(name: string): string { + const trimmed = name.trim(); + if (trimmed.length === 0) { + return "Widget"; + } + return `Widget: ${trimmed}`; +} diff --git a/package.json b/package.json index d959b8e3..796197eb 100644 --- a/package.json +++ b/package.json @@ -70,6 +70,7 @@ "test:coverage": "bun test --coverage ./src", "test:golden": "bun scripts/query-golden.ts", "test:golden:external": "bun scripts/query-golden.ts --corpus external", + "test:harden-probes": "bash scripts/harden-probes/run-smoke.sh", "test:scripts": "bash -c 'files=$(find scripts -name \"*.test.mjs\"); if [ -z \"$files\" ]; then echo \"no scripts test files found\" >&2; exit 1; fi; exec bun test $files'", "typecheck": "tsgo --noEmit", "version": "changeset version && bun run format CHANGELOG.md" diff --git a/scripts/agent-eval/harden-scenarios.json b/scripts/agent-eval/harden-scenarios.json new file mode 100644 index 00000000..bbbc2efa --- /dev/null +++ b/scripts/agent-eval/harden-scenarios.json @@ -0,0 +1,15 @@ +{ + "version": 1, + "description": "Harden-pr workflow eval scenarios (manual agent runs; not CI-automated yet).", + "scenarios": [ + { + "id": "harden-missing-test", + "probeDir": "fixtures/harden-probes/missing-test", + "skill": "harden-pr", + "mode": "lite", + "goldenFindings": "fixtures/harden-probes/missing-test/expected-findings.json", + "acceptance": "fixtures/harden-probes/missing-test/acceptance.sh", + "intentAnchor": "Add formatWidget — pure string formatter; do not change signature or return shape." + } + ] +} diff --git a/scripts/harden-probes/run-smoke.sh b/scripts/harden-probes/run-smoke.sh new file mode 100755 index 00000000..6863ee6f --- /dev/null +++ b/scripts/harden-probes/run-smoke.sh @@ -0,0 +1,44 @@ +#!/usr/bin/env bash +# Run mechanical harden-probe checks (schema + pre-fix acceptance fails). +set -euo pipefail +ROOT="$(cd "$(dirname "$0")/../.." && pwd)" +cd "$ROOT" + +echo "== harden-probes: validate fixtures ==" +bun test scripts/harden-probes/validate-fixtures.test.mjs + +echo "" +echo "== harden-probes: pre-fix acceptance (expect fail) ==" +PROBE="$ROOT/fixtures/harden-probes/missing-test" +if git -C "$PROBE" diff --quiet src/formatWidget.test.ts 2>/dev/null || [[ ! -f "$PROBE/src/formatWidget.test.ts" ]]; then + # Ensure probe is in broken state (no test file) + rm -f "$PROBE/src/formatWidget.test.ts" +fi +if bash "$PROBE/acceptance.sh" 2>/dev/null; then + echo "FAIL: acceptance should fail before harden (missing test)" >&2 + exit 1 +fi +echo "ok: acceptance correctly fails pre-harden" + +echo "" +echo "== harden-probes: score sample findings ==" +SAMPLE="$ROOT/.agent-eval/harden-missing-test-findings.json" +mkdir -p "$(dirname "$SAMPLE")" +cat > "$SAMPLE" <<'EOF' +[ + { + "finding": "New export formatWidget has no unit test covering empty and non-empty names", + "severity": "major", + "file": "src/formatWidget.ts", + "line": 2, + "confidence": "high", + "effort": "S", + "fixable_in_bounds": true, + "production_bar": "Tests" + } +] +EOF +bun scripts/harden-probes/score-probe.mjs "$PROBE" "$SAMPLE" + +echo "" +echo "Manual: cd fixtures/harden-probes/missing-test && /harden-pr lite → bash acceptance.sh" diff --git a/scripts/harden-probes/score-probe.mjs b/scripts/harden-probes/score-probe.mjs new file mode 100755 index 00000000..216bf077 --- /dev/null +++ b/scripts/harden-probes/score-probe.mjs @@ -0,0 +1,57 @@ +#!/usr/bin/env bun +/** + * Score harden-pr probe findings against expected-findings.json oracle. + * + * Usage: + * bun scripts/harden-probes/score-probe.mjs + * bun scripts/harden-probes/score-probe.mjs fixtures/harden-probes/missing-test findings.json + * + * Recall: golden rows matched when same `file` + `production_bar` appear in actual. + */ +import { readFileSync } from "node:fs"; +import { join, resolve } from "node:path"; + +const [, , probeDirArg, findingsArg] = process.argv; +if (!probeDirArg || !findingsArg) { + console.error("Usage: score-probe.mjs "); + process.exit(2); +} + +const probeDir = resolve(probeDirArg); +const golden = JSON.parse( + readFileSync(join(probeDir, "expected-findings.json"), "utf8"), +); +const actual = JSON.parse(readFileSync(resolve(findingsArg), "utf8")); + +function key(row) { + return `${row.file}\0${row.production_bar}`; +} + +const goldenKeys = new Set(golden.map(key)); +const actualKeys = new Set(actual.map(key)); + +const matched = [...goldenKeys].filter((k) => actualKeys.has(k)); +const missed = [...goldenKeys].filter((k) => !actualKeys.has(k)); +const extra = [...actualKeys].filter((k) => !goldenKeys.has(k)); + +const recall = golden.length === 0 ? 1 : matched.length / golden.length; + +const report = { + probeDir, + goldenCount: golden.length, + actualCount: actual.length, + matched: matched.length, + recall, + missed: missed.map((k) => { + const [file, production_bar] = k.split("\0"); + return { file, production_bar }; + }), + extra: extra.map((k) => { + const [file, production_bar] = k.split("\0"); + return { file, production_bar }; + }), + pass: recall >= 1 && missed.length === 0, +}; + +console.log(JSON.stringify(report, null, 2)); +process.exit(report.pass ? 0 : 1); diff --git a/scripts/harden-probes/score-probe.test.mjs b/scripts/harden-probes/score-probe.test.mjs new file mode 100644 index 00000000..e6a864c7 --- /dev/null +++ b/scripts/harden-probes/score-probe.test.mjs @@ -0,0 +1,36 @@ +import { describe, expect, it } from "bun:test"; +import { spawnSync } from "node:child_process"; +import { mkdtempSync, readFileSync, writeFileSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; + +const ROOT = join(import.meta.dir, "../.."); +const SCORE = join(import.meta.dir, "score-probe.mjs"); +const PROBE = join(ROOT, "fixtures/harden-probes/missing-test"); + +describe("score-probe.mjs", () => { + it("passes when actual matches golden file+production_bar", () => { + const dir = mkdtempSync(join(tmpdir(), "harden-score-")); + const findings = join(dir, "findings.json"); + writeFileSync( + findings, + readFileSync(join(PROBE, "expected-findings.json")), + ); + const r = spawnSync("bun", [SCORE, PROBE, findings], { encoding: "utf8" }); + expect(r.status).toBe(0); + const report = JSON.parse(r.stdout); + expect(report.pass).toBe(true); + expect(report.recall).toBe(1); + }); + + it("fails when golden row missing from actual", () => { + const dir = mkdtempSync(join(tmpdir(), "harden-score-")); + const findings = join(dir, "findings.json"); + writeFileSync(findings, "[]"); + const r = spawnSync("bun", [SCORE, PROBE, findings], { encoding: "utf8" }); + expect(r.status).toBe(1); + const report = JSON.parse(r.stdout); + expect(report.pass).toBe(false); + expect(report.missed.length).toBeGreaterThan(0); + }); +}); diff --git a/scripts/harden-probes/validate-fixtures.test.mjs b/scripts/harden-probes/validate-fixtures.test.mjs new file mode 100644 index 00000000..9f87041f --- /dev/null +++ b/scripts/harden-probes/validate-fixtures.test.mjs @@ -0,0 +1,55 @@ +import { describe, expect, it } from "bun:test"; +import { existsSync, readFileSync, readdirSync } from "node:fs"; +import { join } from "node:path"; + +const ROOT = join(import.meta.dir, "../.."); +const PROBES_ROOT = join(ROOT, "fixtures/harden-probes"); +const SCENARIOS = join(ROOT, "scripts/agent-eval/harden-scenarios.json"); + +const FINDING_KEYS = [ + "finding", + "severity", + "file", + "line", + "confidence", + "effort", + "fixable_in_bounds", + "production_bar", +]; + +describe("harden-probes fixtures", () => { + it("harden-scenarios.json references valid probe dirs", () => { + const { scenarios } = JSON.parse(readFileSync(SCENARIOS, "utf8")); + for (const s of scenarios) { + const dir = join(ROOT, s.probeDir); + expect(existsSync(dir)).toBe(true); + expect(existsSync(join(dir, "expected-findings.json"))).toBe(true); + expect(existsSync(join(dir, "acceptance.sh"))).toBe(true); + } + }); + + it("each probe expected-findings.json matches schema", () => { + const probeDirs = readdirSync(PROBES_ROOT, { withFileTypes: true }) + .filter((d) => d.isDirectory() && d.name !== "node_modules") + .map((d) => join(PROBES_ROOT, d.name)); + + expect(probeDirs.length).toBeGreaterThan(0); + + for (const dir of probeDirs) { + const goldenPath = join(dir, "expected-findings.json"); + if (!existsSync(goldenPath)) continue; + const findings = JSON.parse(readFileSync(goldenPath, "utf8")); + expect(Array.isArray(findings)).toBe(true); + for (const row of findings) { + for (const key of FINDING_KEYS) { + expect(row).toHaveProperty(key); + } + expect(["blocker", "major", "minor", "nit", "info"]).toContain( + row.severity, + ); + expect(["high", "medium", "low"]).toContain(row.confidence); + expect(["S", "M", "L"]).toContain(row.effort); + } + } + }); +}); From 8c6943ce0eaafc1abcd28fe3f3502b32f489fe2b Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Thu, 11 Jun 2026 08:21:10 +0300 Subject: [PATCH 5/7] chore(agents): drop harden-pr eval probes after validation Keep harden-pr skill, LEDGER.md, and tracer-bullets wiring; remove fixtures/harden-probes, score-probe harness, and related docs. --- .agents/skills/harden-pr/SKILL.md | 2 - docs/benchmark.md | 42 -------------- docs/testing-coverage.md | 13 ----- fixtures/harden-probes/README.md | 11 ---- fixtures/harden-probes/missing-test/README.md | 26 --------- .../harden-probes/missing-test/acceptance.sh | 18 ------ .../missing-test/expected-findings.json | 12 ---- .../harden-probes/missing-test/package.json | 6 -- .../missing-test/src/formatWidget.ts | 8 --- package.json | 1 - scripts/agent-eval/harden-scenarios.json | 15 ----- scripts/harden-probes/run-smoke.sh | 44 -------------- scripts/harden-probes/score-probe.mjs | 57 ------------------- scripts/harden-probes/score-probe.test.mjs | 36 ------------ .../harden-probes/validate-fixtures.test.mjs | 55 ------------------ 15 files changed, 346 deletions(-) delete mode 100644 fixtures/harden-probes/README.md delete mode 100644 fixtures/harden-probes/missing-test/README.md delete mode 100755 fixtures/harden-probes/missing-test/acceptance.sh delete mode 100644 fixtures/harden-probes/missing-test/expected-findings.json delete mode 100644 fixtures/harden-probes/missing-test/package.json delete mode 100644 fixtures/harden-probes/missing-test/src/formatWidget.ts delete mode 100644 scripts/agent-eval/harden-scenarios.json delete mode 100755 scripts/harden-probes/run-smoke.sh delete mode 100755 scripts/harden-probes/score-probe.mjs delete mode 100644 scripts/harden-probes/score-probe.test.mjs delete mode 100644 scripts/harden-probes/validate-fixtures.test.mjs diff --git a/.agents/skills/harden-pr/SKILL.md b/.agents/skills/harden-pr/SKILL.md index da9b5d30..c53fe4a0 100644 --- a/.agents/skills/harden-pr/SKILL.md +++ b/.agents/skills/harden-pr/SKILL.md @@ -294,6 +294,4 @@ Skill invocation **is** the commit authorization. After the loop: if fixes exist | Branch done | `/harden-pr full` or "production-ready pass" | | Deferred backlog | `/harden-pr reconcile` | -**Eval probes:** [`fixtures/harden-probes/`](../../../fixtures/harden-probes/) — manual workflow scoring; schema enforced by `scripts/harden-probes/validate-fixtures.test.mjs`. - Replaces the old copy-paste: _"spawn subagents → fix → loop until clean"_ — this skill **is** that loop. diff --git a/docs/benchmark.md b/docs/benchmark.md index c8ad8414..2a8890a2 100644 --- a/docs/benchmark.md +++ b/docs/benchmark.md @@ -367,48 +367,6 @@ Environment overrides: `AGENT_EVAL_OUTPUT`, `AGENT_EVAL_FIXTURE_ROOT`, `AGENT_EV Numbers are stable for a given fixture + schema; re-run locally after intentional schema or probe changes. -### Harden-pr workflow eval - -Manual A/B harness for [`.agents/skills/harden-pr/`](../.agents/skills/harden-pr/SKILL.md) — complements codemap **query** probes above. Measures whether an agent **detects, fixes, and verifies** production-bar gaps on a branch, not SQL vs grep cost. - -| Artifact | Role | -| --------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- | -| [`fixtures/harden-probes/`](../fixtures/harden-probes/) | Mini corpora with injected gaps + `expected-findings.json` oracle | -| [`scripts/agent-eval/harden-scenarios.json`](../scripts/agent-eval/harden-scenarios.json) | Scenario index (probe dir, mode, acceptance script) | -| [`.agents/skills/harden-pr/LEDGER.md`](../.agents/skills/harden-pr/LEDGER.md) | Rejections (vet) + deferred backlog (reconcile) — one file | -| [`scripts/harden-probes/validate-fixtures.test.mjs`](../scripts/harden-probes/validate-fixtures.test.mjs) | Golden finding schema guard (`test:scripts`) | - -**Run one probe:** - -```bash -cd fixtures/harden-probes/missing-test -# Cursor: attach harden-pr → `/harden-pr lite` -bash acceptance.sh # after harden — must exit 0 -``` - -**Mechanical smoke (CI via `test:scripts`):** - -```bash -bun run test:harden-probes -# schema guard + pre-fix acceptance fails + score-probe oracle - -# Score agent findings JSON against golden: -bun scripts/harden-probes/score-probe.mjs fixtures/harden-probes/missing-test .agent-eval/my-findings.json -``` - -**Score each run** (spreadsheet or research note — not CI-gated yet): - -| Metric | Pass | -| -------- | ------------------------------------------------------------------------------ | -| Recall | Finds ≥1 golden row in `expected-findings.json` (same `file` + production bar) | -| Fix | `acceptance.sh` green; intent anchor unchanged | -| Autonomy | Zero mid-loop commit/babysit/next-pass prompts | -| Passes | ≤ mode cap (lite 2, full 3) | - -**A/B skill versions:** same probe dir, two sessions (before/after skill edit); compare recall, false fixes, passes, autonomy. - -Future: log parser for `Task` reviewer JSON (same spirit as `parse-agent-log.ts`) to automate recall scoring. - #### Dual-agent study (codemap self-index, provisional) Exploratory runs on the **codemap repo** index (not `fixtures/minimal`) — four structural tasks, MCP-on vs MCP-forbidden (grep/read/shell only). Not pinned in CI; methodology caveats apply. diff --git a/docs/testing-coverage.md b/docs/testing-coverage.md index cca1d77c..423c40b1 100644 --- a/docs/testing-coverage.md +++ b/docs/testing-coverage.md @@ -14,7 +14,6 @@ | **In-repo test bench** | `bun run test:golden` | Index `fixtures/minimal/` (bench corpus) → compare `fixtures/golden/minimal/*.json`. Map: `fixtures/CAPABILITIES.json`. | | **Golden guard** | `bun run test:scripts` | `scripts/query-golden-coverage-matrix.test.mjs` — every bundled recipe + substrate table has a scenario. | | **Agent eval** | `bun run test:agent-eval` | Probe arms vs golden ids (MCP-on vs glob/read). | -| **Harden-pr probes** | `bun run test:harden-probes` | Schema + pre-fix fail + score oracle; manual agent run + `acceptance.sh` after fix. | | **Integration (git)** | `src/application/run-index.test.ts` | `runCodemapIndex` incremental paths: heritage + calls re-resolution, delete/reindex. | | **CLI e2e** | `src/cli/cmd-test-bench-e2e.test.ts`, `src/cli/cmd-cli-parity-e2e.test.ts` | Spawned CLI on `fixtures/minimal` (bench smoke + resource parity). | | **Apply CLI e2e** | `src/cli/cmd-apply.test.ts` | Temp project + full index: recipe dry-run/apply, `--rows`, second recipe disk apply. | @@ -103,18 +102,6 @@ Codemap development uses **only** the committed bench ([fixtures/README.md](../f `scripts/agent-eval/scenarios.json` — one probe per `CAPABILITIES.json` capability id (groups with `unitTests` or `enforcedBy` instead: `recipes.bundled`, `cli.bench-smoke`, `cli.mcp.http`). Each probe’s `goldenId` must appear in that group’s `goldenScenarios`; multiple substrate ids per group are fine — one matching probe is enough. Enforced by `scripts/agent-eval/capability-probes.test.mjs` in `test:scripts`. -### Harden-pr workflow probes - -Manual eval for [`.agents/skills/harden-pr/`](../.agents/skills/harden-pr/SKILL.md) — not automated agent runs in CI yet. - -| Artifact | Role | -| --------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- | -| [`fixtures/harden-probes/`](../fixtures/harden-probes/) | Injected production-bar gaps + `expected-findings.json` oracle | -| [`scripts/agent-eval/harden-scenarios.json`](../scripts/agent-eval/harden-scenarios.json) | Scenario index (probe dir, mode, acceptance script) | -| [`scripts/harden-probes/validate-fixtures.test.mjs`](../scripts/harden-probes/validate-fixtures.test.mjs) | Schema guard in `test:scripts` | - -Run a probe: open the probe dir in Cursor → `/harden-pr lite` → score findings with `score-probe.mjs` → `bash acceptance.sh`. Mechanical smoke: `bun run test:harden-probes`. Full protocol: [benchmark.md § Harden-pr workflow eval](./benchmark.md#harden-pr-workflow-eval). - --- ## Fixture corpus diff --git a/fixtures/harden-probes/README.md b/fixtures/harden-probes/README.md deleted file mode 100644 index 2ecd9c66..00000000 --- a/fixtures/harden-probes/README.md +++ /dev/null @@ -1,11 +0,0 @@ -# Harden-pr eval probes - -Manual workflow eval fixtures for [`.agents/skills/harden-pr/SKILL.md`](../../.agents/skills/harden-pr/SKILL.md). Not shipped in npm. - -Each probe is a small corpus with **injected production-bar gaps** and a golden finding oracle. Agents run `/harden-pr` against the probe; humans score recall, precision, and autonomy. - -| Probe | Injected gap | Production bar | -| --------------- | ------------------------ | -------------- | -| `missing-test/` | New export, no unit test | Tests | - -See [benchmark.md § Harden-pr workflow eval](../../docs/benchmark.md#harden-pr-workflow-eval). diff --git a/fixtures/harden-probes/missing-test/README.md b/fixtures/harden-probes/missing-test/README.md deleted file mode 100644 index b6fe362f..00000000 --- a/fixtures/harden-probes/missing-test/README.md +++ /dev/null @@ -1,26 +0,0 @@ -# Probe: missing-test - -**Intent anchor:** Add `formatWidget` — a pure string formatter. Do not change its signature or return shape. - -**Injected gap:** `src/formatWidget.ts` ships without a test file. - -## Run - -```bash -cd fixtures/harden-probes/missing-test -# In Cursor on this directory: attach harden-pr → `/harden-pr lite` or `/harden-pr full` -``` - -## Score - -| Metric | Pass criteria | -| -------- | ---------------------------------------------------------- | -| Recall | Detects missing test for `formatWidget` (see golden below) | -| Fix | Adds test; `bash acceptance.sh` exits 0 | -| Autonomy | No mid-loop commit/babysit prompts | -| Intent | `formatWidget` signature unchanged | - -## Oracle - -- [`expected-findings.json`](./expected-findings.json) — golden findings before fix -- [`acceptance.sh`](./acceptance.sh) — run after harden (tests exist + pass) diff --git a/fixtures/harden-probes/missing-test/acceptance.sh b/fixtures/harden-probes/missing-test/acceptance.sh deleted file mode 100755 index df029843..00000000 --- a/fixtures/harden-probes/missing-test/acceptance.sh +++ /dev/null @@ -1,18 +0,0 @@ -#!/usr/bin/env bash -set -euo pipefail -ROOT="$(cd "$(dirname "$0")" && pwd)" -cd "$ROOT" - -if ! compgen -G "src/**/*.test.ts" > /dev/null && [[ ! -f src/formatWidget.test.ts ]]; then - echo "acceptance: missing test file for formatWidget" >&2 - exit 1 -fi - -if ! grep -rq "formatWidget" src/*.test.ts src/**/*.test.ts 2>/dev/null; then - echo "acceptance: no test references formatWidget" >&2 - exit 1 -fi - -bun test src/ - -echo "acceptance: ok" diff --git a/fixtures/harden-probes/missing-test/expected-findings.json b/fixtures/harden-probes/missing-test/expected-findings.json deleted file mode 100644 index 4a56ba34..00000000 --- a/fixtures/harden-probes/missing-test/expected-findings.json +++ /dev/null @@ -1,12 +0,0 @@ -[ - { - "finding": "New export formatWidget has no unit test covering empty and non-empty names", - "severity": "major", - "file": "src/formatWidget.ts", - "line": 2, - "confidence": "high", - "effort": "S", - "fixable_in_bounds": true, - "production_bar": "Tests" - } -] diff --git a/fixtures/harden-probes/missing-test/package.json b/fixtures/harden-probes/missing-test/package.json deleted file mode 100644 index 7fa54470..00000000 --- a/fixtures/harden-probes/missing-test/package.json +++ /dev/null @@ -1,6 +0,0 @@ -{ - "name": "codemap-harden-probe-missing-test", - "version": "0.0.0", - "private": true, - "description": "Harden-pr eval probe — intentional missing test." -} diff --git a/fixtures/harden-probes/missing-test/src/formatWidget.ts b/fixtures/harden-probes/missing-test/src/formatWidget.ts deleted file mode 100644 index 4eb538eb..00000000 --- a/fixtures/harden-probes/missing-test/src/formatWidget.ts +++ /dev/null @@ -1,8 +0,0 @@ -/** Format a widget label for display. Probe intent: do not change signature or return shape. */ -export function formatWidget(name: string): string { - const trimmed = name.trim(); - if (trimmed.length === 0) { - return "Widget"; - } - return `Widget: ${trimmed}`; -} diff --git a/package.json b/package.json index 796197eb..d959b8e3 100644 --- a/package.json +++ b/package.json @@ -70,7 +70,6 @@ "test:coverage": "bun test --coverage ./src", "test:golden": "bun scripts/query-golden.ts", "test:golden:external": "bun scripts/query-golden.ts --corpus external", - "test:harden-probes": "bash scripts/harden-probes/run-smoke.sh", "test:scripts": "bash -c 'files=$(find scripts -name \"*.test.mjs\"); if [ -z \"$files\" ]; then echo \"no scripts test files found\" >&2; exit 1; fi; exec bun test $files'", "typecheck": "tsgo --noEmit", "version": "changeset version && bun run format CHANGELOG.md" diff --git a/scripts/agent-eval/harden-scenarios.json b/scripts/agent-eval/harden-scenarios.json deleted file mode 100644 index bbbc2efa..00000000 --- a/scripts/agent-eval/harden-scenarios.json +++ /dev/null @@ -1,15 +0,0 @@ -{ - "version": 1, - "description": "Harden-pr workflow eval scenarios (manual agent runs; not CI-automated yet).", - "scenarios": [ - { - "id": "harden-missing-test", - "probeDir": "fixtures/harden-probes/missing-test", - "skill": "harden-pr", - "mode": "lite", - "goldenFindings": "fixtures/harden-probes/missing-test/expected-findings.json", - "acceptance": "fixtures/harden-probes/missing-test/acceptance.sh", - "intentAnchor": "Add formatWidget — pure string formatter; do not change signature or return shape." - } - ] -} diff --git a/scripts/harden-probes/run-smoke.sh b/scripts/harden-probes/run-smoke.sh deleted file mode 100755 index 6863ee6f..00000000 --- a/scripts/harden-probes/run-smoke.sh +++ /dev/null @@ -1,44 +0,0 @@ -#!/usr/bin/env bash -# Run mechanical harden-probe checks (schema + pre-fix acceptance fails). -set -euo pipefail -ROOT="$(cd "$(dirname "$0")/../.." && pwd)" -cd "$ROOT" - -echo "== harden-probes: validate fixtures ==" -bun test scripts/harden-probes/validate-fixtures.test.mjs - -echo "" -echo "== harden-probes: pre-fix acceptance (expect fail) ==" -PROBE="$ROOT/fixtures/harden-probes/missing-test" -if git -C "$PROBE" diff --quiet src/formatWidget.test.ts 2>/dev/null || [[ ! -f "$PROBE/src/formatWidget.test.ts" ]]; then - # Ensure probe is in broken state (no test file) - rm -f "$PROBE/src/formatWidget.test.ts" -fi -if bash "$PROBE/acceptance.sh" 2>/dev/null; then - echo "FAIL: acceptance should fail before harden (missing test)" >&2 - exit 1 -fi -echo "ok: acceptance correctly fails pre-harden" - -echo "" -echo "== harden-probes: score sample findings ==" -SAMPLE="$ROOT/.agent-eval/harden-missing-test-findings.json" -mkdir -p "$(dirname "$SAMPLE")" -cat > "$SAMPLE" <<'EOF' -[ - { - "finding": "New export formatWidget has no unit test covering empty and non-empty names", - "severity": "major", - "file": "src/formatWidget.ts", - "line": 2, - "confidence": "high", - "effort": "S", - "fixable_in_bounds": true, - "production_bar": "Tests" - } -] -EOF -bun scripts/harden-probes/score-probe.mjs "$PROBE" "$SAMPLE" - -echo "" -echo "Manual: cd fixtures/harden-probes/missing-test && /harden-pr lite → bash acceptance.sh" diff --git a/scripts/harden-probes/score-probe.mjs b/scripts/harden-probes/score-probe.mjs deleted file mode 100755 index 216bf077..00000000 --- a/scripts/harden-probes/score-probe.mjs +++ /dev/null @@ -1,57 +0,0 @@ -#!/usr/bin/env bun -/** - * Score harden-pr probe findings against expected-findings.json oracle. - * - * Usage: - * bun scripts/harden-probes/score-probe.mjs - * bun scripts/harden-probes/score-probe.mjs fixtures/harden-probes/missing-test findings.json - * - * Recall: golden rows matched when same `file` + `production_bar` appear in actual. - */ -import { readFileSync } from "node:fs"; -import { join, resolve } from "node:path"; - -const [, , probeDirArg, findingsArg] = process.argv; -if (!probeDirArg || !findingsArg) { - console.error("Usage: score-probe.mjs "); - process.exit(2); -} - -const probeDir = resolve(probeDirArg); -const golden = JSON.parse( - readFileSync(join(probeDir, "expected-findings.json"), "utf8"), -); -const actual = JSON.parse(readFileSync(resolve(findingsArg), "utf8")); - -function key(row) { - return `${row.file}\0${row.production_bar}`; -} - -const goldenKeys = new Set(golden.map(key)); -const actualKeys = new Set(actual.map(key)); - -const matched = [...goldenKeys].filter((k) => actualKeys.has(k)); -const missed = [...goldenKeys].filter((k) => !actualKeys.has(k)); -const extra = [...actualKeys].filter((k) => !goldenKeys.has(k)); - -const recall = golden.length === 0 ? 1 : matched.length / golden.length; - -const report = { - probeDir, - goldenCount: golden.length, - actualCount: actual.length, - matched: matched.length, - recall, - missed: missed.map((k) => { - const [file, production_bar] = k.split("\0"); - return { file, production_bar }; - }), - extra: extra.map((k) => { - const [file, production_bar] = k.split("\0"); - return { file, production_bar }; - }), - pass: recall >= 1 && missed.length === 0, -}; - -console.log(JSON.stringify(report, null, 2)); -process.exit(report.pass ? 0 : 1); diff --git a/scripts/harden-probes/score-probe.test.mjs b/scripts/harden-probes/score-probe.test.mjs deleted file mode 100644 index e6a864c7..00000000 --- a/scripts/harden-probes/score-probe.test.mjs +++ /dev/null @@ -1,36 +0,0 @@ -import { describe, expect, it } from "bun:test"; -import { spawnSync } from "node:child_process"; -import { mkdtempSync, readFileSync, writeFileSync } from "node:fs"; -import { tmpdir } from "node:os"; -import { join } from "node:path"; - -const ROOT = join(import.meta.dir, "../.."); -const SCORE = join(import.meta.dir, "score-probe.mjs"); -const PROBE = join(ROOT, "fixtures/harden-probes/missing-test"); - -describe("score-probe.mjs", () => { - it("passes when actual matches golden file+production_bar", () => { - const dir = mkdtempSync(join(tmpdir(), "harden-score-")); - const findings = join(dir, "findings.json"); - writeFileSync( - findings, - readFileSync(join(PROBE, "expected-findings.json")), - ); - const r = spawnSync("bun", [SCORE, PROBE, findings], { encoding: "utf8" }); - expect(r.status).toBe(0); - const report = JSON.parse(r.stdout); - expect(report.pass).toBe(true); - expect(report.recall).toBe(1); - }); - - it("fails when golden row missing from actual", () => { - const dir = mkdtempSync(join(tmpdir(), "harden-score-")); - const findings = join(dir, "findings.json"); - writeFileSync(findings, "[]"); - const r = spawnSync("bun", [SCORE, PROBE, findings], { encoding: "utf8" }); - expect(r.status).toBe(1); - const report = JSON.parse(r.stdout); - expect(report.pass).toBe(false); - expect(report.missed.length).toBeGreaterThan(0); - }); -}); diff --git a/scripts/harden-probes/validate-fixtures.test.mjs b/scripts/harden-probes/validate-fixtures.test.mjs deleted file mode 100644 index 9f87041f..00000000 --- a/scripts/harden-probes/validate-fixtures.test.mjs +++ /dev/null @@ -1,55 +0,0 @@ -import { describe, expect, it } from "bun:test"; -import { existsSync, readFileSync, readdirSync } from "node:fs"; -import { join } from "node:path"; - -const ROOT = join(import.meta.dir, "../.."); -const PROBES_ROOT = join(ROOT, "fixtures/harden-probes"); -const SCENARIOS = join(ROOT, "scripts/agent-eval/harden-scenarios.json"); - -const FINDING_KEYS = [ - "finding", - "severity", - "file", - "line", - "confidence", - "effort", - "fixable_in_bounds", - "production_bar", -]; - -describe("harden-probes fixtures", () => { - it("harden-scenarios.json references valid probe dirs", () => { - const { scenarios } = JSON.parse(readFileSync(SCENARIOS, "utf8")); - for (const s of scenarios) { - const dir = join(ROOT, s.probeDir); - expect(existsSync(dir)).toBe(true); - expect(existsSync(join(dir, "expected-findings.json"))).toBe(true); - expect(existsSync(join(dir, "acceptance.sh"))).toBe(true); - } - }); - - it("each probe expected-findings.json matches schema", () => { - const probeDirs = readdirSync(PROBES_ROOT, { withFileTypes: true }) - .filter((d) => d.isDirectory() && d.name !== "node_modules") - .map((d) => join(PROBES_ROOT, d.name)); - - expect(probeDirs.length).toBeGreaterThan(0); - - for (const dir of probeDirs) { - const goldenPath = join(dir, "expected-findings.json"); - if (!existsSync(goldenPath)) continue; - const findings = JSON.parse(readFileSync(goldenPath, "utf8")); - expect(Array.isArray(findings)).toBe(true); - for (const row of findings) { - for (const key of FINDING_KEYS) { - expect(row).toHaveProperty(key); - } - expect(["blocker", "major", "minor", "nit", "info"]).toContain( - row.severity, - ); - expect(["high", "medium", "low"]).toContain(row.confidence); - expect(["S", "M", "L"]).toContain(row.effort); - } - } - }); -}); From c83ec5b46b6deea7e2df95966a7c08fb26f6c076 Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Thu, 11 Jun 2026 08:30:05 +0300 Subject: [PATCH 6/7] fix(serve): normalize bracketed IPv6 bind host for Node listen Strip [::1] to ::1 in parseServeRest and runHttpServer; tighten test hygiene and formatted-query test header per review. --- src/application/http-server.ts | 9 +++++--- src/application/serve-bind-policy.test.ts | 9 ++++++++ src/application/serve-bind-policy.ts | 11 ++++++++++ src/cli/cmd-query-formatted.test.ts | 4 ++-- src/cli/cmd-serve.test.ts | 1 + src/cli/cmd-serve.ts | 7 +++++- src/cli/cmd-validate.test.ts | 26 +++++++++++++---------- 7 files changed, 50 insertions(+), 17 deletions(-) diff --git a/src/application/http-server.ts b/src/application/http-server.ts index 0311017e..7463df8b 100644 --- a/src/application/http-server.ts +++ b/src/application/http-server.ts @@ -24,6 +24,7 @@ import { listResources, readResource } from "./resource-handlers"; import { assertServeBindRequiresToken, isLoopbackHost, + normalizeServeBindHost, } from "./serve-bind-policy"; import { bindWatchClientRelease, @@ -123,7 +124,8 @@ export interface HttpServerOpts { * to JSON `{"error": "..."}` with appropriate status codes. */ export async function runHttpServer(opts: HttpServerOpts): Promise { - assertServeBindRequiresToken(opts.host, opts.token); + const bindHost = normalizeServeBindHost(opts.host); + assertServeBindRequiresToken(bindHost, opts.token); await bootstrapForServe(opts); let managedWatchSession: ManagedWatchSession | undefined; @@ -152,6 +154,7 @@ export async function runHttpServer(opts: HttpServerOpts): Promise { const serveOpts: HttpServerOpts = { ...opts, + host: bindHost, managedWatchSession, }; @@ -164,10 +167,10 @@ export async function runHttpServer(opts: HttpServerOpts): Promise { await new Promise((resolve, reject) => { server.once("error", reject); - server.listen(opts.port, opts.host, () => { + server.listen(opts.port, bindHost, () => { // eslint-disable-next-line no-console -- intentional bootstrap log on stderr console.error( - `codemap serve: listening on http://${opts.host}:${opts.port}` + + `codemap serve: listening on http://${bindHost}:${opts.port}` + (opts.token !== undefined ? " (auth: Bearer)" : "") + (opts.watch === true ? " (watch: on)" : ""), ); diff --git a/src/application/serve-bind-policy.test.ts b/src/application/serve-bind-policy.test.ts index fee5b4c5..875497c7 100644 --- a/src/application/serve-bind-policy.test.ts +++ b/src/application/serve-bind-policy.test.ts @@ -3,6 +3,7 @@ import { describe, expect, it } from "bun:test"; import { assertServeBindRequiresToken, isLoopbackHost, + normalizeServeBindHost, serveBindTokenRequiredMessage, } from "./serve-bind-policy"; @@ -21,6 +22,14 @@ describe("isLoopbackHost", () => { }); }); +describe("normalizeServeBindHost", () => { + it("strips brackets from IPv6 URL literals", () => { + expect(normalizeServeBindHost("[::1]")).toBe("::1"); + expect(normalizeServeBindHost("::1")).toBe("::1"); + expect(normalizeServeBindHost("127.0.0.1")).toBe("127.0.0.1"); + }); +}); + describe("serveBindTokenRequiredMessage", () => { it("requires token on non-loopback binds", () => { expect(serveBindTokenRequiredMessage("0.0.0.0", undefined)).toContain( diff --git a/src/application/serve-bind-policy.ts b/src/application/serve-bind-policy.ts index 047a5c39..0ff1d7cb 100644 --- a/src/application/serve-bind-policy.ts +++ b/src/application/serve-bind-policy.ts @@ -15,6 +15,17 @@ export function isLoopbackHost(host: string): boolean { return h === "localhost" || h === "::1" || h === "[::1]" || isIpv4Loopback(h); } +/** + * `server.listen` expects unbracketed IPv6 literals (`::1`, not `[::1]` — + * Node rejects the bracketed form with ENOTFOUND). + */ +export function normalizeServeBindHost(host: string): string { + if (host.startsWith("[") && host.endsWith("]")) { + return host.slice(1, -1); + } + return host; +} + /** Error message when a non-loopback bind lacks a token; `undefined` when OK. */ export function serveBindTokenRequiredMessage( host: string, diff --git a/src/cli/cmd-query-formatted.test.ts b/src/cli/cmd-query-formatted.test.ts index 43ae51b1..3364be9f 100644 --- a/src/cli/cmd-query-formatted.test.ts +++ b/src/cli/cmd-query-formatted.test.ts @@ -1,6 +1,6 @@ /** - * End-to-end coverage for formatted query output (`--format sarif|…`). - * Ensures DML is blocked via `queryRows` / PRAGMA query_only on the sarif path. + * End-to-end DML guard for formatted query output (`--format sarif|badge|…`). + * Asserts `queryRows` / PRAGMA query_only on every non-json/text format path. */ import { diff --git a/src/cli/cmd-serve.test.ts b/src/cli/cmd-serve.test.ts index c57f89a6..5c0685c1 100644 --- a/src/cli/cmd-serve.test.ts +++ b/src/cli/cmd-serve.test.ts @@ -97,6 +97,7 @@ describe("parseServeRest", () => { it("parses bracketed IPv6 loopback without token", () => { const r = parseServeRest(["serve", "--host", "[::1]"]); if (r.kind !== "run") throw new Error("expected run"); + expect(r.host).toBe("::1"); expect(r.token).toBeUndefined(); }); diff --git a/src/cli/cmd-serve.ts b/src/cli/cmd-serve.ts index de396df4..2f374193 100644 --- a/src/cli/cmd-serve.ts +++ b/src/cli/cmd-serve.ts @@ -1,5 +1,8 @@ import { runHttpServer } from "../application/http-server"; -import { serveBindTokenRequiredMessage } from "../application/serve-bind-policy"; +import { + normalizeServeBindHost, + serveBindTokenRequiredMessage, +} from "../application/serve-bind-policy"; import { applyWatchPolicy, envWatchDefaultOn, @@ -152,6 +155,8 @@ export function parseServeRest(rest: string[]): }; } + host = normalizeServeBindHost(host); + const bindMessage = serveBindTokenRequiredMessage(host, token); if (bindMessage !== undefined) { return { kind: "error", message: bindMessage }; diff --git a/src/cli/cmd-validate.test.ts b/src/cli/cmd-validate.test.ts index fa7fc936..7d549cdc 100644 --- a/src/cli/cmd-validate.test.ts +++ b/src/cli/cmd-validate.test.ts @@ -199,18 +199,22 @@ describe("computeValidateRows", () => { ); it("rejects absolute paths outside the project root", () => { - const outside = join(tmpdir(), "codemap-validate-abs-outside-"); - mkdirSync(outside, { recursive: true }); - const outsideFile = join(outside, "secret.ts"); - writeFileSync(outsideFile, "export const s = 1;\n"); - - const rows = withDb((db) => - computeValidateRows(db, tmpRoot, [outsideFile]), + const outside = mkdtempSync( + join(tmpdir(), "codemap-validate-abs-outside-"), ); - expect(rows).toHaveLength(1); - expect(rows[0]?.status).toBe("rejected"); - expect(rows[0]?.reason).toBe("path escapes project root"); - rmSync(outside, { recursive: true, force: true }); + try { + const outsideFile = join(outside, "secret.ts"); + writeFileSync(outsideFile, "export const s = 1;\n"); + + const rows = withDb((db) => + computeValidateRows(db, tmpRoot, [outsideFile]), + ); + expect(rows).toHaveLength(1); + expect(rows[0]?.status).toBe("rejected"); + expect(rows[0]?.reason).toBe("path escapes project root"); + } finally { + rmSync(outside, { recursive: true, force: true }); + } }); it("dedupes paths and sorts by path", () => { From 2e4cdfc482ce9701662b875058709c3f62a1c96b Mon Sep 17 00:00:00 2001 From: Sutu Sebastian Date: Thu, 11 Jun 2026 08:37:49 +0300 Subject: [PATCH 7/7] harden: read-surface PR parity, tests, and wave1 plan retirement Align validate reason docs across consumer surfaces, extend formatted-query DML coverage to diff formats, retire security-hardening-wave1 plan with orchestrator/roadmap updates. --- .changeset/read-surface-hardening.md | 2 +- docs/architecture.md | 2 +- docs/glossary.md | 2 +- docs/plans/security-hardening-orchestrator.md | 29 ++++--- docs/plans/security-hardening-wave1.md | 85 ------------------- docs/roadmap.md | 2 +- src/cli/cmd-mcp.ts | 2 +- src/cli/cmd-query-formatted.test.ts | 10 ++- src/cli/cmd-validate.test.ts | 10 +++ .../agent-content/skill/10-recipes-context.md | 2 +- 10 files changed, 40 insertions(+), 106 deletions(-) delete mode 100644 docs/plans/security-hardening-wave1.md diff --git a/.changeset/read-surface-hardening.md b/.changeset/read-surface-hardening.md index ab0e5b86..71a03971 100644 --- a/.changeset/read-surface-hardening.md +++ b/.changeset/read-surface-hardening.md @@ -2,4 +2,4 @@ "@stainless-code/codemap": patch --- -Harden read surfaces: `codemap query --format …` blocks index mutations via the same read-only guard as `--json`; `codemap serve` requires `--token` when `--host` is not loopback (any `127.0.0.0/8` address counts as loopback, so `--token` stays optional on `127.0.0.2` and similar); `codemap validate` (and MCP/HTTP `validate`) can return `rejected` rows with a `reason` when a path escapes the project root, resolves outside via symlink, or is a broken symlink — output `path` keys are always project-relative POSIX paths. +Harden read surfaces: `codemap query --format …` blocks index mutations via the same read-only guard as `--json`; `codemap serve` requires `--token` when `--host` is not loopback (any `127.0.0.0/8` address counts as loopback, so `--token` stays optional on `127.0.0.2` and similar); `codemap validate` (and MCP/HTTP `validate`) can return `rejected` rows with optional `reason` (`path escapes project root` | `path escapes via symlink` | `path resolves outside project root`) — output `path` keys are always project-relative POSIX paths. diff --git a/docs/architecture.md b/docs/architecture.md index c6bb73fe..3a3835a8 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -124,7 +124,7 @@ A local SQLite database (`.codemap/index.db`) indexes the project tree and store **Output formatters:** **`src/application/output-formatters.ts`** — pure transport-agnostic; **`formatSarif`** emits SARIF 2.1.0 (auto-detected location columns: `file_path` / `path` / `to_path` / `from_path` priority + optional `line_start` / `line_end` region; `rule.id = codemap.` for `--recipe`, `codemap.adhoc` for ad-hoc SQL; aggregate recipes without locations → `results: []` + stderr warning); **`formatAuditSarif`** emits the audit-shaped variant — one rule per delta key (`codemap.audit.-added`), one result per `added` row at severity `warning`; `removed` rows excluded (SARIF surfaces findings, not cleanups); location-only rows fall back to `"new : "` messages; **`formatAnnotations`** emits `::notice file=…,line=…::msg` GitHub Actions workflow commands (one line per locatable row; messages collapsed to a single line because the GH parser stops at the first newline); **`formatCodeClimate`** emits a GitLab Code Quality JSON array (`severity: minor` flat in v1; stable SHA-256 fingerprints from recipe id + path + line + check name + row message (`lines.begin` falls back to `1` when `line_start` absent)); **`formatBadge`** / **`formatBadgeJson`** emit a single-line markdown summary (`codemap: N issues` / `codemap: clean`) or `codemap-badge/v1` JSON (`--badge-style json` / MCP `badge_style`) from locatable-row count — agents triage via JSON rows, not badge severity; **`formatMermaid`** emits a `flowchart LR` from `{from, to, label?, kind?}` rows with a hard `MERMAID_MAX_EDGES = 50` ceiling — unbounded inputs reject with a scope-suggestion error naming the recipe + count + `LIMIT` / `--via` / `WHERE` knobs (auto-truncation deliberately out of scope; would be a verdict masquerading as output mode); **`formatDiff`** emits read-only unified diff text from `{file_path, line_start, before_pattern, after_pattern}` rows; **`formatDiffJson`** emits structured `{files, warnings, summary}` hunks for agents. Diff formatters read source files at format time and surface `stale` / `missing` flags when the indexed line no longer matches. Wired into both **`src/cli/cmd-query.ts`** (`--format `; `--format` overrides `--json`; formatted outputs reject `--summary` / `--group-by` / baseline at parse time) and the MCP **`query`** / **`query_recipe`** tools (`format: "sarif" | "annotations" | "mermaid" | "diff" | "diff-json" | "codeclimate" | "badge"` with the same incompatibility guard). Per-recipe `sarifLevel` / `sarifMessage` / `sarifRuleId` overrides via frontmatter on `.md` deferred to v1.x. -**Validate wiring:** **`src/cli/cmd-validate.ts`** (argv + render) + **`src/application/validate-engine.ts`** (engine — **`computeValidateRows`** + **`toProjectRelative`**). `computeValidateRows` is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed | rejected` (`rejected` + optional `reason` when a path escapes the project root, resolves outside via symlink, or has a broken symlink — `readUtf8WithinProjectRoot` re-checks via `realpath` immediately before read; hardlinks to outside files keep an in-root pathname and are a documented local-trust boundary). Path keys are always project-relative POSIX paths (`toProjectRelative`). CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. Also reused by `cmd-show.ts` / `cmd-snippet.ts` and the MCP show/snippet handlers — single canonical implementation. +**Validate wiring:** **`src/cli/cmd-validate.ts`** (argv + render) + **`src/application/validate-engine.ts`** (engine — **`computeValidateRows`** + **`toProjectRelative`**). `computeValidateRows` is a pure function over `(db, projectRoot, paths)` returning `{path, status}` rows where `status ∈ stale | missing | unindexed | rejected` (`rejected` + optional `reason`: `path escapes project root` | `path escapes via symlink` | `path resolves outside project root` — `readUtf8WithinProjectRoot` re-checks via `realpath` immediately before read; hardlinks to outside files keep an in-root pathname and are a documented local-trust boundary). Path keys are always project-relative POSIX paths (`toProjectRelative`). CLI wraps it with read-once-and-print + exits **1** on any drift (git-status semantics). Path normalization: **`toProjectRelative`** converts CLI input to POSIX-style relative keys matching the `files.path` storage format (Windows backslash → forward slash); same convention as `lint-staged.config.js`. Also reused by `cmd-show.ts` / `cmd-snippet.ts` and the MCP show/snippet handlers — single canonical implementation. #### Audit wiring diff --git a/docs/glossary.md b/docs/glossary.md index 99ac059a..e0c96f6b 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -125,7 +125,7 @@ CI-aggregate flag on `codemap query` and `codemap audit`. Aliases `--format sari ### `codemap validate` -CLI subcommand comparing on-disk SHA-256 against `files.content_hash`. Statuses: `stale | missing | unindexed | rejected` (`rejected` carries optional `reason` when a path escapes the project root, resolves outside via symlink, or has a broken symlink; output `path` keys are always project-relative POSIX paths). Exits `1` on any drift. +CLI subcommand comparing on-disk SHA-256 against `files.content_hash`. Statuses: `stale | missing | unindexed | rejected` (`rejected` carries optional `reason`: `path escapes project root` | `path escapes via symlink` | `path resolves outside project root`; output `path` keys are always project-relative POSIX paths). Exits `1` on any drift. ### `module_cycles` (table) / circular imports diff --git a/docs/plans/security-hardening-orchestrator.md b/docs/plans/security-hardening-orchestrator.md index b9d021a6..5b0937cc 100644 --- a/docs/plans/security-hardening-orchestrator.md +++ b/docs/plans/security-hardening-orchestrator.md @@ -21,11 +21,11 @@ ## PR schedule -| PR | Plan | Status | Blocks | -| ----- | -------------------------------------------------------------- | ------------------------- | ----------------------------------- | -| **1** | [`security-hardening-wave1.md`](./security-hardening-wave1.md) | **committed, PR pending** | — | -| **2** | [`impact-inpath-homonyms.md`](./impact-inpath-homonyms.md) | **pending** | PR **1** merged | -| **3** | [`runtime-test-isolation.md`](./runtime-test-isolation.md) | **pending** | PR **1** merged (PR **2** optional) | +| PR | Plan | Status | Blocks | +| ----- | --------------------------------------------------------------- | ---------------------------------------------------------------------- | ----------------------------------- | +| **1** | lifted → [`architecture.md`](../architecture.md) (plan retired) | **PR [#180](https://github.com/stainless-code/codemap/pull/180) open** | — | +| **2** | [`impact-inpath-homonyms.md`](./impact-inpath-homonyms.md) | **pending** | PR **1** merged | +| **3** | [`runtime-test-isolation.md`](./runtime-test-isolation.md) | **pending** | PR **1** merged (PR **2** optional) | | — | — | **deferred** | golden `schema.test.ts` + path guards | | — | — | **skip** | atomic `ensureStateConfig` writes | @@ -64,15 +64,16 @@ Evaluated 2026-06 against [roadmap § Floors](../roadmap.md#floors-v1-product-sh ## Session log -| Date | Event | Notes | -| ---------- | ---------- | ---------------------------------------------------------------------------- | -| 2026-06-10 | Triage | ROI on 7 slices; 3-PR program adopted. | -| 2026-06-10 | PR 1 impl | PR **1** committed on `fix/security-hardening-wave1`; harden pass in flight. | -| — | PR 1 merge | _PR URL · merge SHA · update plan status → closed_ | -| — | PR 2 start | _from `main`_ | -| — | PR 2 merge | _fill_ | -| — | PR 3 start | _from `main`_ | -| — | PR 3 merge | _fill · close orchestrator_ | +| Date | Event | Notes | +| ---------- | ----------- | ---------------------------------------------------------------------------- | +| 2026-06-10 | Triage | ROI on 7 slices; 3-PR program adopted. | +| 2026-06-10 | PR 1 impl | PR **1** committed on `fix/security-hardening-wave1`; harden pass in flight. | +| 2026-06-05 | PR 1 harden | `/harden-pr full` — plan retired; contracts in architecture/glossary. | +| — | PR 1 merge | _merge SHA · update status → merged_ | +| — | PR 2 start | _from `main`_ | +| — | PR 2 merge | _fill_ | +| — | PR 3 start | _from `main`_ | +| — | PR 3 merge | _fill · close orchestrator_ | --- diff --git a/docs/plans/security-hardening-wave1.md b/docs/plans/security-hardening-wave1.md deleted file mode 100644 index 3ec31495..00000000 --- a/docs/plans/security-hardening-wave1.md +++ /dev/null @@ -1,85 +0,0 @@ -# PR 1 — query / serve / validate hardening - -> **Status:** open (committed, PR pending) · **PR:** 1 of 3 · **Effort:** S -> -> **Orchestrator:** [`security-hardening-orchestrator.md`](./security-hardening-orchestrator.md) -> -> **Motivator:** Close last CLI `query_only` gap on formatted query output; require auth on non-loopback `serve`; reject unsafe paths in `validate`. Roadmap safety floors — not new product verbs. - ---- - -## Agent start here - -### Key touchpoints - -| File | What | -| ------------------------------------- | ----------------------------------- | -| `src/cli/cmd-query.ts` | `printFormattedQuery` → `queryRows` | -| `src/cli/cmd-query-formatted.test.ts` | SARIF DML guard E2E | -| `src/cli/cmd-serve.ts` | `isLoopbackHost`, token gate | -| `src/application/path-containment.ts` | `pathTraversesSymlinkOutsideRoot` | -| `src/application/validate-engine.ts` | `rejected` status | -| `docs/architecture.md` | Query / HTTP / validate wiring | - -### Architecture - -```text -codemap query --format sarif|… - → printFormattedQuery → queryRows (PRAGMA query_only = 1) - -codemap serve --host 0.0.0.0 - → parseServeRest: require --token when !isLoopbackHost(host) - -codemap validate - → computeValidateRows → rejectUnsafeProjectRelativePath / readUtf8WithinProjectRoot - → pathEscapesProjectRoot | pathTraversesSymlinkOutsideRoot | realpath → status rejected -``` - ---- - -## Task list - -| ID | Task | Status | Verify | -| --- | ---------------------------------------------------------- | ----------- | --------------------------------------------------- | -| 1.1 | `printFormattedQuery` → `queryRows` | **done** | `bun test src/cli/cmd-query-formatted.test.ts` | -| 1.2 | `cmd-query-formatted.test.ts` | **done** | same | -| 2.1 | `isLoopbackHost` + non-loopback `--token` required | **done** | `bun test src/cli/cmd-serve.test.ts` | -| 2.2 | Serve tests updated | **done** | same | -| 3.1 | `pathTraversesSymlinkOutsideRoot`, `resolvePathWithinRoot` | **done** | `bun test src/application/path-containment.test.ts` | -| 3.2 | `validate` `rejected` + safe read containment | **done** | `bun test src/cli/cmd-validate.test.ts` | -| 3.3 | Test `../../../etc/passwd` → `rejected` | **done** | same | -| 1.x | `docs/architecture.md` lift | **done** | `bun run format:check` | -| 1.s | Commit + PR + CI | **pending** | `bun run check` (committed; PR + CI open) | - ---- - -## Pre-locked decisions - -| # | Decision | -| ---- | ------------------------------------------------------------------------- | -| P1.1 | Formatted CLI paths use `queryRows` — completes CHANGELOG #107 hardening. | -| P1.2 | Loopback bind: token optional; non-loopback: token mandatory. | -| P1.3 | `rejected` is per-row status with `reason` — not a global verdict. | - ---- - -## Acceptance - -- [x] `codemap query --format sarif "DELETE FROM …"` errors without mutating `files` count -- [x] `codemap serve --host 0.0.0.0` errors without `--token` -- [x] `computeValidateRows(..., ["../../../etc/passwd"])` → `rejected` + reason -- [x] Architecture docs updated -- [ ] PR merged to `main` - -### Verify - -```bash -bun test src/cli/cmd-query-formatted.test.ts src/cli/cmd-serve.test.ts src/cli/cmd-validate.test.ts src/application/path-containment.test.ts -bun run check -``` - ---- - -## Lifecycle - -**Close when:** PR merged. Delete this file; lift contracts to `docs/architecture.md`; update [`security-hardening-orchestrator.md`](./security-hardening-orchestrator.md) session log. diff --git a/docs/roadmap.md b/docs/roadmap.md index ca0f814b..dbf418a1 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -108,7 +108,7 @@ Predicate-as-API only — enrich row shape and audit deltas; no standalone pass/ - [ ] **`organize-imports` diff-shape recipe** — deterministic single-file import sort/group; `imports.line_number` + `source` substrate sufficient. Review-first (`auto_fixable: false`). Effort: S. - [ ] **`codemap-to-tsmorph` Path B adapter** — separate package experiment: `query_recipe` discovery → `ts-morph` / `jscodeshift` transforms for AST-shape edits codemap's substring executor defers (see [architecture § Rejected apply-path alternatives](./architecture.md#apply--input-modes-transport-and-policy)). Not an in-tree AST writer (Path A rejected). Effort: M. - [ ] **Apply write-safety hardening** — close apply TOCTOU: SHA-256 `hashContent` at phase-1 read, recheck disk hash immediately before phase-2 write (`file content changed` conflict); `fsync` temp file before `rename`; skip files with mixed CRLF/LF (`mixed line endings`). Preserves all-or-nothing on any conflict. Plan: [`plans/apply-write-safety.md`](./plans/apply-write-safety.md). Effort: L. -- [ ] **Read-surface hardening (3 PRs)** — query/HTTP/validate safety, `impact` `inPath` homonyms, runtime guards + test teardown. **Orchestrator:** [`plans/security-hardening-orchestrator.md`](./plans/security-hardening-orchestrator.md). Plans: [PR1](./plans/security-hardening-wave1.md) · [PR2](./plans/impact-inpath-homonyms.md) · [PR3](./plans/runtime-test-isolation.md). Effort: S–M. +- [ ] **Read-surface hardening (3 PRs)** — query/HTTP/validate safety, `impact` `inPath` homonyms, runtime guards + test teardown. **Orchestrator:** [`plans/security-hardening-orchestrator.md`](./plans/security-hardening-orchestrator.md). PR1 ([#180](https://github.com/stainless-code/codemap/pull/180), lifted to [architecture](./architecture.md)) · Plans: [PR2](./plans/impact-inpath-homonyms.md) · [PR3](./plans/runtime-test-isolation.md). Effort: S–M. - [ ] **`history` table** (deferred — revisit-triggered) — temporal queries: "when did symbol X get `@deprecated`?", "coverage trend over last 50 commits", "files that became dead this week". `audit --base ` covers the most-common temporal question (PR-scoped diff) without schema growth, so the table earns its place only when bigger questions emerge. Two shapes (per-commit snapshots ~N × DB size; append-only event log heavier CTE walks); both pay an N-reindexes backfill cost (~30s per reindex). **Revisit triggers:** two consumers ship `jq`-based "audit-runs-over-time" workflows, OR `query_baselines` evolution becomes a recurring agent need. - [ ] **`codemap audit` verdict + thresholds** (v1.x) — `verdict: "pass" | "warn" | "fail"` driven by an `audit.deltas[].{added_max, action}` field on the config object (`.codemap/config.{ts,js,json}`). Triggers: two consumers ship `jq`-based threshold scripts with similar shapes, OR one consumer asks with a concrete config sketch. Until then, raw deltas + consumer-side `jq` is the CI exit-code idiom. **Likely accelerant:** the Marketplace Action (next item) shipping is the most plausible path to firing the trigger — once `- uses: stainless-code/codemap@v1` is the dominant CI path, real `jq` threshold scripts will surface. - [ ] **GitHub Marketplace Action — publish + listing finish** — core Action implementation is in-tree: root `action.yml`, `query --ci`, `audit --format sarif` / `--ci`, package-manager detection, dogfood smoke, and opt-in `pr-comment` summary renderer have shipped. Remaining work is the release/listing slice: `MARKETPLACE.md`, `v1.0.0` / floating `v1` tags, Marketplace setup, sacrificial-repo smoke, and making `action-smoke` blocking once the Action tag exists. Action version stream is independent of CLI version (`package.json` currently drives CLI/npm version; Action publishes at its own `v1.0.0`). Plan: [`plans/github-marketplace-action.md`](./plans/github-marketplace-action.md). Effort: S. diff --git a/src/cli/cmd-mcp.ts b/src/cli/cmd-mcp.ts index 47191b35..32717441 100644 --- a/src/cli/cmd-mcp.ts +++ b/src/cli/cmd-mcp.ts @@ -99,7 +99,7 @@ Tools (21; snake_case — mirrors CLI verbs where a shell twin exists): ingest_coverage Load Istanbul/LCOV/V8 coverage into the index. ingest_churn Load precomputed file_churn JSON into the index. context Project bootstrap envelope. - validate On-disk hash vs indexed hash. + validate Hash drift rows (stale/missing/unindexed/rejected + reason). show Symbol metadata: file:line + signature. snippet Same lookup + source text from disk. impact Symbol/file blast-radius walker (callers, callees, diff --git a/src/cli/cmd-query-formatted.test.ts b/src/cli/cmd-query-formatted.test.ts index 3364be9f..ee4729fe 100644 --- a/src/cli/cmd-query-formatted.test.ts +++ b/src/cli/cmd-query-formatted.test.ts @@ -90,7 +90,15 @@ async function fileCount(): Promise { } describe("runQueryCmd — formatted output DML guard", () => { - it.each(["sarif", "badge", "mermaid", "annotations", "codeclimate"] as const)( + it.each([ + "sarif", + "badge", + "mermaid", + "annotations", + "codeclimate", + "diff", + "diff-json", + ] as const)( "rejects DELETE via --format %s without mutating the index", async (format) => { const before = await fileCount(); diff --git a/src/cli/cmd-validate.test.ts b/src/cli/cmd-validate.test.ts index 7d549cdc..f9275ff2 100644 --- a/src/cli/cmd-validate.test.ts +++ b/src/cli/cmd-validate.test.ts @@ -198,6 +198,16 @@ describe("computeValidateRows", () => { }, ); + it("normalizes absolute paths inside the project root", () => { + const old = "export const a = 1\n"; + const abs = join(tmpRoot, "src/a.ts"); + writeFileSync(abs, "export const a = 2\n"); + seedIndex([{ path: "src/a.ts", content_hash: hashContent(old) }]); + + const rows = withDb((db) => computeValidateRows(db, tmpRoot, [abs])); + expect(rows).toEqual([{ path: "src/a.ts", status: "stale" }]); + }); + it("rejects absolute paths outside the project root", () => { const outside = mkdtempSync( join(tmpdir(), "codemap-validate-abs-outside-"), diff --git a/templates/agent-content/skill/10-recipes-context.md b/templates/agent-content/skill/10-recipes-context.md index da71edde..91e67293 100644 --- a/templates/agent-content/skill/10-recipes-context.md +++ b/templates/agent-content/skill/10-recipes-context.md @@ -47,7 +47,7 @@ Each emitted delta carries its own `base` metadata so mixed-baseline audits are - **`list_baselines`** — no args; returns the array `codemap query --baselines --json` would print. - **`drop_baseline`** — `{name}` → `{dropped}` on success; structured `{error}` on unknown name (MCP sets `isError: true`). - **`context`** — `{compact?, intent?, include_snippets?}`. CLI: `codemap context [--include-snippets]`. Session-start project envelope with `start_here` shortcuts (one call replaces 4-5 `query`s). `index_summary.file_churn` row count; **`churn_hint`** when empty (steers to index, **`ingest-churn`**, or **`churn.file`**). `include_snippets` adds one-line export previews on hub leaders (capped to adaptive `signature_max_chars`; may set `stale`/`missing`); no-op when `compact: true`. Whitespace-only `intent` is treated as no intent. Prefer `start_here.hub_leaders` over legacy `hubs` for signatures — `hubs` keeps the full bundled `fan-in` recipe limit for backward compatibility. `sample_markers` count scales down on repos >500 / >5000 files. -- **`validate`** — `{paths?: string[]}`. SHA-256 vs `files.content_hash`; returns only out-of-sync rows (`stale` / `missing` / `unindexed` / `rejected` — fresh paths are omitted; `rejected` includes optional `reason` when a path escapes the project root, resolves outside via symlink, or has a broken symlink). Output `path` keys are project-relative POSIX paths. +- **`validate`** — `{paths?: string[]}`. SHA-256 vs `files.content_hash`; returns only out-of-sync rows (`stale` / `missing` / `unindexed` / `rejected` — fresh paths are omitted; `rejected` includes optional `reason`: `path escapes project root` | `path escapes via symlink` | `path resolves outside project root`). Output `path` keys are project-relative POSIX paths. - **`show`** — `{name, kind?, in?}` or `{query, with_fts?}`. Exact symbol lookup or field-qualified search (`kind:`, `name:`, `path:`, `in:` + free text) → `{matches, disambiguation?, warning?}`. CLI: `codemap show --query '…' [--print-sql]`. - **`snippet`** — same as `show` (`{name, kind?, in?}` or `{query, with_fts?}`) but each match also carries `source` (file text) + `stale` / `missing` flags → `{matches, disambiguation?, warning?}`. No reindex side-effects. - **`impact`** — `{target, direction?, via?, depth?, limit?, summary?}`. Symbol/file blast-radius walker (replaces hand-composed `WITH RECURSIVE`). Auto-resolves symbol vs file target; `via` defaults to every backend compatible with the kind.