Three-stage scan pipeline + schema simplification by nahiyankhan · Pull Request #68 · block/ghost

nahiyankhan · 2026-04-30T04:03:23Z

Summary

Three-stage scan pipeline: split a target's scan into map.md (topology) → bucket.json (objective values) → expression.md (subjective interpretation), all owned by ghost-expression. ghost-map is folded back in; the four-tool topology is now ghost-expression / ghost-drift / ghost-fleet / ghost-ui.
Schema cleanup: dropped bucket.libraries[] (icon/primitive choices aren't part of the design language; load-bearing library calls now surface as prose evidence) and dropped expression roles[] (slot bindings either fall out of decisions or live in bucket.components[] — neither pattern needs a parallel structure that doesn't scale).
Recipes & dogfood: tightened the survey/profile recipes for exhaustiveness, dogfooded against ghost-ui, fixed an oklch-backfill bug in ghost-drift compare, aligned READMEs and the docs site with the new topology, shipped the ghost-ui expression-fidelity test bundles (arcade + market).

Breaking changes

ghost-expression is bumped major in this PR's changeset because roles[] is no longer a valid frontmatter field — .strict() rejects it. Existing expression.md files carrying roles: must drop the section to migrate. All in-repo fixtures have been updated.
The CLI/library surface for ghost-map is gone — its verbs (inventory, the topology recipe) live under ghost-expression now.
bucket.json no longer accepts a libraries[] section.

Test plan

pnpm build clean
pnpm check clean (biome + typecheck + file-size + cli-manifest)
pnpm test — 291 tests across 30 files, all passing locally
Lint each in-repo expression.md (ghost-ui, fleet members, arcade/market test bundles) against the new schema
Re-run the scan pipeline against ghost-ui end-to-end (map → survey → profile) and confirm it produces a clean expression.md
Verify ghost-drift compare self-distance is 0 on every fixture

🤖 Generated with Claude Code

…st-map Land the bucket pipeline and consolidate to a four-tool topology. A scan now runs in three stages — topology (map.md) → objective (bucket.json) → subjective (expression.md) — all owned by ghost-expression. ghost.bucket/v1: new artifact catalogues every concrete design value with structured specs, occurrence counts, and deterministic content-hashed IDs. Schema, lint, merge, and fix-ids primitives live in @ghost/core. New ghost-expression verbs: - inventory [path] — raw repo signals JSON (migrated from ghost-map). - lint [file] — auto-detects expression.md / map.md / bucket.json. - bucket merge — deterministic concat with id-based dedup, idempotent. - bucket fix-ids — recompute row IDs from content; lets surveyor agents author rows with empty id fields and finalize in one pass. - scan-status [dir] — report per-stage state and recommended next stage. New skill recipes (ghost-expression bundle): - map.md — topology stage (migrated from ghost-map skill recipe). - survey.md — objective stage. LLM-driven extraction with dialect-specific grep strategies, exhaustiveness discipline, saturation predicate. - scan.md — meta-recipe orchestrating topology → survey → profile. Refactored: profile.md is now strictly the subjective interpretation stage (reads bucket.json as ground truth; cannot fabricate values). ghost-map package deleted. ghost.map/v1 schema/types moved to @ghost/core; inventory + lint moved to ghost-expression. ghost-fleet now imports map types from @ghost/core. CLAUDE.md, README.md, and docs IA updated to reflect the four-tool topology. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…veness Self-distance reported 17.5% on freshly authored expressions because loadExpression never backfilled `oklch` on hex-only palette colors. comparePalette then treated every color as fully unmatched and contributed distance 1.0 per color even when comparing an expression to itself. Two-layer fix: - loadExpression now backfills oklch deterministically (parseColorToOklch is pure: same hex → same oklch), so re-parsing produces identical in-memory shapes. - comparePalette resolves oklch on-the-fly when missing and falls back to hex-string equality for non-parseable values (CSS vars, opaque refs). Defensive against third-party producers that emit hex-only. Survey recipe tightened with a load-bearing exhaustiveness rule. Triggered by a dogfood scan that produced ~10% recall on components (6 rows for a 97-component package). The rule is repo-agnostic: the agent identifies the canonical signal in this repo, enumerates exhaustively, and cross-checks counts. Recipe explicitly forbids sampling and placeholder/glob library names. Coverage check (step 8) is now a real gate — exhaustiveness must match independent counts before declaring saturation. Pinned attempt-1 artifacts under dogfood/ghost-ui/attempt-1/ with a structured NOTES.md documenting the failure modes. Future attempts go alongside so we can track improvement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Recall delta vs attempt 1: - values: 22 → 190 (10x) - tokens: 11 → 238 (22x, ~99% of 240 declared in main.css) - components: 6 → 97 (16x, 100% of registry:ui items) - libraries: 6 → 42 (7x, every design-surface dep enumerated) Decision count: 7 → 11. New decisions added — chart-strategy, surface-hierarchy, font-sourcing, density, interactive-patterns — and renames adjusted toward pattern-naming (token-architecture → theming-architecture, shadow-hierarchy → elevation, with the count corrected from 4 tiers to 7). Self-distance: 0.0% (was 17.5%). Confirms the oklch backfill fix. Agent followed the recipe by writing build-bucket.mjs (pinned alongside artifacts) — walks main.css for tokens, registry.json for components, package.json for libraries. Reproducible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

External libraries (icon sets, primitive collections, motion libs, charting) no longer have a top-level bucket section. Whether a system uses Radix or hand-rolls primitives doesn't change what its design language *is*; what matters surfaces elsewhere — font families show up as typography tokens, and load-bearing library choices (icon family, font sourcing) belong in interpreter prose under the relevant decision dimension. Bucket sections are now: values, tokens, components. Removed from @ghost/core exports: LibraryRow, LibraryRowSchema, libraryRowId. Lint, merge, and fix-ids no longer touch a libraries section. Skill recipes (survey.md, profile.md) updated — survey.md no longer instructs the agent to enumerate libraries; profile.md tells the agent that load-bearing library choices land in prose, not as structured rows. zod schema stays non-strict, so historical buckets that still carry a libraries field continue to lint clean (the field is simply ignored). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…pology Six documentation files were carrying stale framing from the five-tool era or hadn't been updated for the bucket pipeline / libraries-cut. - packages/ghost-expression/README.md: full rewrite. Was framed around "four verbs" (lint/describe/diff/emit) with no mention of the scan pipeline. Now leads with the three-stage table (topology → objective → subjective), enumerates all seven verbs (inventory, scan-status, lint, describe, diff, bucket, emit), and points at all four skill recipes (scan, map, survey, profile). - packages/ghost-drift/README.md: dropped "five-tool decomposition" and the ghost-map reference. The "Authoring expression.md?" sidebar now surfaces inventory + scan-status + bucket merge alongside the existing lint/describe/diff/emit verbs. - CLAUDE.md / AGENTS.md (symlinked): bucket description no longer mentions "and libraries", scan-status added to the verbs table, scan recipe added to the workflows list. - apps/docs/src/content/docs/cli-reference.mdx: added scan-status and bucket sections under ghost-expression. Updated overview to seventeen verbs and added the scan recipe to the skill-recipes table. - apps/docs/src/content/docs/getting-started.mdx: tools table grew to include scan-status, added a three-stage diagram, replaced the single-step "Profile Your First System" with a stage-aware "Scan Your First System" walkthrough. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nents Slot → token bindings either fall out of decisions[] (pattern consequences) or live in bucket.json components[] (exhaustive catalog). The hybrid roles[] slot was filling neither cleanly, didn't scale to systems with many components, and the schema never committed on whether it was exemplary or exhaustive. Removes roles[] from the zod schema (.strict() now rejects it), Expression type, lint (broken-role-reference rule, slug-binding propagation, and the references.ts token resolver), profile/scan/schema/verify/review recipes, expression.template.md, the docs site, and every fixture (arcade, market, ghost-ui, fleet members, .scratch). unused-palette is simplified to check decision evidence/prose only. Also: ignore .scratch/ for dogfood scans, and ship the ghost-ui expression-fidelity test bundles (arcade + market) as new fixtures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nahiyankhan and others added 6 commits April 29, 2026 15:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Three-stage scan pipeline + schema simplification#68

Three-stage scan pipeline + schema simplification#68
nahiyankhan wants to merge 6 commits intomainfrom
agent-generation

nahiyankhan commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nahiyankhan commented Apr 30, 2026

Summary

Breaking changes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant