diff --git a/.erpaval/INDEX.md b/.erpaval/INDEX.md index 9ac15d5..74508cf 100644 --- a/.erpaval/INDEX.md +++ b/.erpaval/INDEX.md @@ -52,6 +52,10 @@ development sessions. Solutions are reusable; specs are per-feature. - [Collapse parallel switches into a Record registry](solutions/architecture-patterns/collapse-parallel-switches-into-record-registry.md) — when 2+ functions each switch over the same closed union (one per derived attribute), fold them into `Record`. tsc preserves exhaustiveness, the functions become one-line lookups, honest `| null` replaces placeholder lies, and ONE table-driven test with a `Record` fixture pins every (key, attribute) pair — better coverage than the zero direct tests the switches had. +- [A vendored-artifact dep bump must re-vendor in the same PR](solutions/conventions/vendored-artifact-bump-must-revendor-in-same-pr.md) — bumping a dep that has a committed vendored artifact (`web-tree-sitter` → `vendor/wasms/*.wasm` + manifest) without re-running the vendor script passes ALL CI (the `prepublishOnly` guard isn't a CI step) and detonates at `npm publish`, aborting the dependency-ordered multi-package release mid-stream. Scan Dependabot consolidations for vendored-artifact deps and re-vendor before merge. + +- [npm trusted publisher matches the ENTRY workflow, not the reusable one](solutions/conventions/npm-trusted-publisher-matches-entry-workflow-not-reusable.md) — npm OIDC matches "Workflow filename" against the workflow that INITIATED the run, not the one running `npm publish`. With `release-please.yml` → `workflow_call` → `release.yml`, register `release-please.yml`. Wrong registration silently 404s the token exchange; only `workflow_dispatch` runs (entry = release.yml) ever publish, so npm lags the tags. Config is web-UI-only, passkey-gated, one entry per package (17 here). + ## Specs - [001-scip-replaces-lsp](specs/001-scip-replaces-lsp/spec.md) — rip-and-replace LSP with SCIP for TS/Py/Go/Rust/Java. Task map: [tasks.md](specs/001-scip-replaces-lsp/tasks.md). diff --git a/.erpaval/solutions/conventions/npm-trusted-publisher-matches-entry-workflow-not-reusable.md b/.erpaval/solutions/conventions/npm-trusted-publisher-matches-entry-workflow-not-reusable.md new file mode 100644 index 0000000..a44edf8 --- /dev/null +++ b/.erpaval/solutions/conventions/npm-trusted-publisher-matches-entry-workflow-not-reusable.md @@ -0,0 +1,73 @@ +--- +name: npm-trusted-publisher-matches-entry-workflow-not-reusable +description: npm OIDC trusted publishing matches the "Workflow filename" against the ENTRY workflow that initiated the run, NOT the reusable workflow where `npm publish` actually executes. If release.yml is invoked via `workflow_call` from release-please.yml, you must register release-please.yml as the trusted publisher — registering release.yml silently 404s the OIDC token exchange and falls back to an unauthenticated (failing) publish. +metadata: + type: convention + category: conventions +tags: [release, npm, oidc, trusted-publishing, github-actions, workflow-call, provenance] +discovered: 2026-05-28 +session: session-88b46e +related: + - release-published-event-needs-pat-or-inline + - vendored-artifact-bump-must-revendor-in-same-pr +--- + +# npm trusted publisher matches the ENTRY workflow, not the reusable one + +## The symptom + +npm publish via OIDC trusted publishing fails with: + +``` +WARN Skipped OIDC: ERR_PNPM_AUTH_TOKEN_EXCHANGE: Failed token exchange request ... (status code 404) +📦 @opencodehub/scanners@0.2.0 → https://registry.npmjs.org/ +ERR E404 404 Not Found - PUT https://registry.npmjs.org/@opencodehub%2fscanners - Not found +``` + +The 404 on the **token exchange** is the tell: the OIDC claim npm received didn't match any configured trusted publisher, so pnpm got no token, fell back to an unauthenticated publish, and the `PUT` 404'd. + +## Root cause + +npm matches the trusted-publisher "Workflow filename" against the **workflow that initiated the run** (`workflow_ref` / the entry workflow), NOT the workflow that contains the `npm publish` command. npm's own docs call this out: *"Some workflows use `workflow_call` to invoke other workflows that run `npm publish`... validation checks the calling workflow's name instead of the workflow that actually contains the publish command."* + +This repo's release flow is: + +``` +push:main → release-please.yml (ENTRY workflow — what npm sees in the OIDC claim) + └─ uses: ./.github/workflows/release.yml (workflow_call → reusable) + └─ npm-publish job runs `pnpm -r publish --provenance` +``` + +So the OIDC claim carries `release-please.yml`. The trusted publisher was registered for `release.yml` (where publish *runs*) → no match → 404. + +## Why it was invisible + +The ONLY release.yml runs that ever published successfully were `workflow_dispatch` (manual) — because a manual dispatch makes `release.yml` itself the entry workflow, which matched the registration. Every automated `push → release-please → workflow_call` run silently failed to publish. Result: npm sat on `@opencodehub/cli@0.4.0` while the repo had long since tagged 0.5.x. Check this whenever "the tags exist but npm is behind." + +## The fix + +Register the **entry** workflow as the trusted publisher filename: + +``` +npm package → Settings → Trusted Publisher → Workflow filename: + release.yml ✗ (where publish runs) + release-please.yml ✓ (what triggers the run) +``` + +Also required (npm enforces it): `id-token: write` on BOTH the parent job (the `release` job in release-please.yml that does `uses: ./.github/workflows/release.yml`) AND the child (the npm-publish job in release.yml). This repo already had both. + +## Operational pain to plan for + +- Trusted-publisher config is **web-UI only**, no API/CLI, and **passkey/2FA-gated per save**. With N packages it's N manual saves. This monorepo has **17 publishable packages** (`packages/*` minus `docs`, which is `private: true`), so it's 17 saves. Do them back-to-back to ride the authenticator's warm-credential window. +- **Each package has exactly one trusted publisher** — no org-level or account-level setting applies to all at once. Changing the filename means re-saving all 17. +- Tradeoff: registering `release-please.yml` means a manual `workflow_dispatch` of `release.yml` (entry = release.yml) will STOP matching. The automated flow is the one that matters, so that's the right trade; treat manual dispatch as admin-only. + +## Why not just add a PAT instead? + +A `RELEASE_PLEASE_PAT` would let release-please cut a release that fires `release: published`, making `release.yml` the entry workflow (matching the old registration). But that reintroduces a long-lived token this repo deliberately removed (see [[release-published-event-needs-pat-or-inline]] — the `workflow_call` design exists precisely to avoid the PAT). For an OIDC-only, Sigstore + SLSA-L3 repo, fixing the npm registration is the hardening-preserving choice; the PAT is a regression. + +## Linked + +- [[release-published-event-needs-pat-or-inline]] — the sibling decision: why the flow uses `workflow_call` (no PAT) in the first place. THIS lesson is the npm-side consequence of that choice. +- npm docs: https://docs.npmjs.com/trusted-publishers +- Session 2026-05-28: diagnosed during the v0.6.2 → v0.6.3 release recovery. diff --git a/.erpaval/solutions/conventions/vendored-artifact-bump-must-revendor-in-same-pr.md b/.erpaval/solutions/conventions/vendored-artifact-bump-must-revendor-in-same-pr.md new file mode 100644 index 0000000..383bb78 --- /dev/null +++ b/.erpaval/solutions/conventions/vendored-artifact-bump-must-revendor-in-same-pr.md @@ -0,0 +1,51 @@ +--- +name: vendored-artifact-bump-must-revendor-in-same-pr +description: A Dependabot/manual bump of a dependency that has a committed vendored artifact (a copied .wasm, a generated lockfile-derived blob, a snapshotted schema) must re-run the vendoring script IN THE SAME PR. The prepublishOnly guard that checks artifact-vs-pin drift does NOT run in CI — it only fires at `npm publish`, so the bump passes every gate and detonates at release time, aborting a dependency-ordered multi-package publish mid-stream. +metadata: + type: convention + category: conventions +tags: [release, dependabot, vendored-wasm, prepublish, web-tree-sitter, monorepo-publish] +discovered: 2026-05-28 +session: session-88b46e +related: + - doctor-probe-drift-after-rip-and-replace + - release-published-event-needs-pat-or-inline +--- + +# A vendored-artifact dependency bump must re-vendor in the same PR + +## What bit us + +PR #137 (consolidated Dependabot bumps) moved `web-tree-sitter` 0.26.8 → 0.26.9 in `packages/ingestion/package.json`. It did NOT re-run `scripts/build-vendor-wasms.sh`, so: + +- `packages/ingestion/vendor/wasms/web-tree-sitter.wasm` stayed the 0.26.8 blob (sha `082795b…`, not the 0.26.9 `1ed02fe…`) +- `vendor/wasms/manifest.json` still recorded `"web-tree-sitter": "0.26.8"` + +PR #137 passed **every CI gate** — lint, typecheck, all 1959 tests, banned-strings, CodeQL — and merged clean. The drift only surfaced 4 PRs later at `npm publish` time, where ingestion's `prepublishOnly` guard (`verify-vendor-wasms.mjs`) compares the manifest version string against the package.json pin and `process.exit(1)` on mismatch. + +Because `pnpm -r publish` runs in **dependency order** and ingestion is upstream of cli/mcp/pack, that one guard failure **aborted the entire publish mid-stream** — the v0.6.2 release published 11 leaf packages then died, leaving cli/ingestion/mcp/pack/cobol-proleap unpublished. Recovery took a fix PR (#147) + a second release (0.6.3). + +## Why CI doesn't catch it + +The guard runs as `prepublishOnly`, NOT as a test or a CI step. By design: it's a release-time integrity check, and re-vendoring needs a toolchain (docker/emcc for build-from-source grammars) that CI deliberately doesn't provision. So the guard is invisible until `npm publish` — exactly the same failure-shape as [[doctor-probe-drift-after-rip-and-replace]] (a check that only fires in an environment CI doesn't exercise). + +## The rule + +When bumping a dependency that has a committed vendored artifact, the SAME PR must regenerate the artifact: + +1. Identify vendored-artifact deps before merging a bump. In this repo: `web-tree-sitter` (and the `tree-sitter-*` grammars) → `packages/ingestion/vendor/wasms/`. Any dep whose version is mirrored into a committed blob + a manifest is in this class. +2. After the bump, run the vendor script (`bash scripts/build-vendor-wasms.sh`) — or, if only the runtime blob changed and the build-from-source grammars are unmoved, do the targeted copy the script's tail does: `cp node_modules/.pnpm/@/node_modules//.wasm vendor/wasms/` + bump the manifest string. +3. Run the guard directly as the gate: `node packages/ingestion/scripts/verify-vendor-wasms.mjs` must exit 0. +4. Commit the regenerated artifact + manifest alongside the package.json bump. + +### Make Dependabot bumps of these deps fail loud in CI +The real fix is to stop relying on the publish-time guard. Add the guard (or a content-hash check) to the normal CI test job for `@opencodehub/ingestion`, so a stale vendored artifact turns a PR red instead of a release. A bump that touches a vendored-artifact dep should never be green in CI while the artifact is stale. Until that's wired, treat every `web-tree-sitter` / `tree-sitter-*` Dependabot PR as "must re-vendor before merge" and check the manifest in review. + +## How to apply during a Dependabot consolidation + +When squashing Dependabot bumps (the recurring task in this repo — see the history-rewrite + consolidation playbook), scan the bump list for vendored-artifact deps FIRST. For each one, re-vendor in the consolidation branch before running `pnpm run check`. The 0.26.8→0.26.9 bump shipped in a consolidation PR precisely because the consolidation step didn't have this check. + +## Linked + +- [[doctor-probe-drift-after-rip-and-replace]] — same family: a check that only runs outside CI drifts silently. +- PR #137 (the bump that drifted), #147 (the re-vendor fix), v0.6.3 (the recovery release).