Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .erpaval/INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,10 @@ development sessions. Solutions are reusable; specs are per-feature.

- [Collapse parallel switches into a Record registry](solutions/architecture-patterns/collapse-parallel-switches-into-record-registry.md) — when 2+ functions each switch over the same closed union (one per derived attribute), fold them into `Record<Union, Entry>`. tsc preserves exhaustiveness, the functions become one-line lookups, honest `| null` replaces placeholder lies, and ONE table-driven test with a `Record<Union, Expected>` fixture pins every (key, attribute) pair — better coverage than the zero direct tests the switches had.

- [A vendored-artifact dep bump must re-vendor in the same PR](solutions/conventions/vendored-artifact-bump-must-revendor-in-same-pr.md) — bumping a dep that has a committed vendored artifact (`web-tree-sitter` → `vendor/wasms/*.wasm` + manifest) without re-running the vendor script passes ALL CI (the `prepublishOnly` guard isn't a CI step) and detonates at `npm publish`, aborting the dependency-ordered multi-package release mid-stream. Scan Dependabot consolidations for vendored-artifact deps and re-vendor before merge.

- [npm trusted publisher matches the ENTRY workflow, not the reusable one](solutions/conventions/npm-trusted-publisher-matches-entry-workflow-not-reusable.md) — npm OIDC matches "Workflow filename" against the workflow that INITIATED the run, not the one running `npm publish`. With `release-please.yml` → `workflow_call` → `release.yml`, register `release-please.yml`. Wrong registration silently 404s the token exchange; only `workflow_dispatch` runs (entry = release.yml) ever publish, so npm lags the tags. Config is web-UI-only, passkey-gated, one entry per package (17 here).

## Specs

- [001-scip-replaces-lsp](specs/001-scip-replaces-lsp/spec.md) — rip-and-replace LSP with SCIP for TS/Py/Go/Rust/Java. Task map: [tasks.md](specs/001-scip-replaces-lsp/tasks.md).
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
---
name: npm-trusted-publisher-matches-entry-workflow-not-reusable
description: npm OIDC trusted publishing matches the "Workflow filename" against the ENTRY workflow that initiated the run, NOT the reusable workflow where `npm publish` actually executes. If release.yml is invoked via `workflow_call` from release-please.yml, you must register release-please.yml as the trusted publisher — registering release.yml silently 404s the OIDC token exchange and falls back to an unauthenticated (failing) publish.
metadata:
type: convention
category: conventions
tags: [release, npm, oidc, trusted-publishing, github-actions, workflow-call, provenance]
discovered: 2026-05-28
session: session-88b46e
related:
- release-published-event-needs-pat-or-inline
- vendored-artifact-bump-must-revendor-in-same-pr
---

# npm trusted publisher matches the ENTRY workflow, not the reusable one

## The symptom

npm publish via OIDC trusted publishing fails with:

```
WARN Skipped OIDC: ERR_PNPM_AUTH_TOKEN_EXCHANGE: Failed token exchange request ... (status code 404)
📦 @opencodehub/scanners@0.2.0 → https://registry.npmjs.org/
ERR E404 404 Not Found - PUT https://registry.npmjs.org/@opencodehub%2fscanners - Not found
```

The 404 on the **token exchange** is the tell: the OIDC claim npm received didn't match any configured trusted publisher, so pnpm got no token, fell back to an unauthenticated publish, and the `PUT` 404'd.

## Root cause

npm matches the trusted-publisher "Workflow filename" against the **workflow that initiated the run** (`workflow_ref` / the entry workflow), NOT the workflow that contains the `npm publish` command. npm's own docs call this out: *"Some workflows use `workflow_call` to invoke other workflows that run `npm publish`... validation checks the calling workflow's name instead of the workflow that actually contains the publish command."*

This repo's release flow is:

```
push:main → release-please.yml (ENTRY workflow — what npm sees in the OIDC claim)
└─ uses: ./.github/workflows/release.yml (workflow_call → reusable)
└─ npm-publish job runs `pnpm -r publish --provenance`
```

So the OIDC claim carries `release-please.yml`. The trusted publisher was registered for `release.yml` (where publish *runs*) → no match → 404.

## Why it was invisible

The ONLY release.yml runs that ever published successfully were `workflow_dispatch` (manual) — because a manual dispatch makes `release.yml` itself the entry workflow, which matched the registration. Every automated `push → release-please → workflow_call` run silently failed to publish. Result: npm sat on `@opencodehub/cli@0.4.0` while the repo had long since tagged 0.5.x. Check this whenever "the tags exist but npm is behind."

## The fix

Register the **entry** workflow as the trusted publisher filename:

```
npm package → Settings → Trusted Publisher → Workflow filename:
release.yml ✗ (where publish runs)
release-please.yml ✓ (what triggers the run)
```

Also required (npm enforces it): `id-token: write` on BOTH the parent job (the `release` job in release-please.yml that does `uses: ./.github/workflows/release.yml`) AND the child (the npm-publish job in release.yml). This repo already had both.

## Operational pain to plan for

- Trusted-publisher config is **web-UI only**, no API/CLI, and **passkey/2FA-gated per save**. With N packages it's N manual saves. This monorepo has **17 publishable packages** (`packages/*` minus `docs`, which is `private: true`), so it's 17 saves. Do them back-to-back to ride the authenticator's warm-credential window.
- **Each package has exactly one trusted publisher** — no org-level or account-level setting applies to all at once. Changing the filename means re-saving all 17.
- Tradeoff: registering `release-please.yml` means a manual `workflow_dispatch` of `release.yml` (entry = release.yml) will STOP matching. The automated flow is the one that matters, so that's the right trade; treat manual dispatch as admin-only.

## Why not just add a PAT instead?

A `RELEASE_PLEASE_PAT` would let release-please cut a release that fires `release: published`, making `release.yml` the entry workflow (matching the old registration). But that reintroduces a long-lived token this repo deliberately removed (see [[release-published-event-needs-pat-or-inline]] — the `workflow_call` design exists precisely to avoid the PAT). For an OIDC-only, Sigstore + SLSA-L3 repo, fixing the npm registration is the hardening-preserving choice; the PAT is a regression.

## Linked

- [[release-published-event-needs-pat-or-inline]] — the sibling decision: why the flow uses `workflow_call` (no PAT) in the first place. THIS lesson is the npm-side consequence of that choice.
- npm docs: https://docs.npmjs.com/trusted-publishers
- Session 2026-05-28: diagnosed during the v0.6.2 → v0.6.3 release recovery.
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
---
name: vendored-artifact-bump-must-revendor-in-same-pr
description: A Dependabot/manual bump of a dependency that has a committed vendored artifact (a copied .wasm, a generated lockfile-derived blob, a snapshotted schema) must re-run the vendoring script IN THE SAME PR. The prepublishOnly guard that checks artifact-vs-pin drift does NOT run in CI — it only fires at `npm publish`, so the bump passes every gate and detonates at release time, aborting a dependency-ordered multi-package publish mid-stream.
metadata:
type: convention
category: conventions
tags: [release, dependabot, vendored-wasm, prepublish, web-tree-sitter, monorepo-publish]
discovered: 2026-05-28
session: session-88b46e
related:
- doctor-probe-drift-after-rip-and-replace
- release-published-event-needs-pat-or-inline
---

# A vendored-artifact dependency bump must re-vendor in the same PR

## What bit us

PR #137 (consolidated Dependabot bumps) moved `web-tree-sitter` 0.26.8 → 0.26.9 in `packages/ingestion/package.json`. It did NOT re-run `scripts/build-vendor-wasms.sh`, so:

- `packages/ingestion/vendor/wasms/web-tree-sitter.wasm` stayed the 0.26.8 blob (sha `082795b…`, not the 0.26.9 `1ed02fe…`)
- `vendor/wasms/manifest.json` still recorded `"web-tree-sitter": "0.26.8"`

PR #137 passed **every CI gate** — lint, typecheck, all 1959 tests, banned-strings, CodeQL — and merged clean. The drift only surfaced 4 PRs later at `npm publish` time, where ingestion's `prepublishOnly` guard (`verify-vendor-wasms.mjs`) compares the manifest version string against the package.json pin and `process.exit(1)` on mismatch.

Because `pnpm -r publish` runs in **dependency order** and ingestion is upstream of cli/mcp/pack, that one guard failure **aborted the entire publish mid-stream** — the v0.6.2 release published 11 leaf packages then died, leaving cli/ingestion/mcp/pack/cobol-proleap unpublished. Recovery took a fix PR (#147) + a second release (0.6.3).

## Why CI doesn't catch it

The guard runs as `prepublishOnly`, NOT as a test or a CI step. By design: it's a release-time integrity check, and re-vendoring needs a toolchain (docker/emcc for build-from-source grammars) that CI deliberately doesn't provision. So the guard is invisible until `npm publish` — exactly the same failure-shape as [[doctor-probe-drift-after-rip-and-replace]] (a check that only fires in an environment CI doesn't exercise).

## The rule

When bumping a dependency that has a committed vendored artifact, the SAME PR must regenerate the artifact:

1. Identify vendored-artifact deps before merging a bump. In this repo: `web-tree-sitter` (and the `tree-sitter-*` grammars) → `packages/ingestion/vendor/wasms/`. Any dep whose version is mirrored into a committed blob + a manifest is in this class.
2. After the bump, run the vendor script (`bash scripts/build-vendor-wasms.sh`) — or, if only the runtime blob changed and the build-from-source grammars are unmoved, do the targeted copy the script's tail does: `cp node_modules/.pnpm/<dep>@<ver>/node_modules/<dep>/<artifact>.wasm vendor/wasms/` + bump the manifest string.
3. Run the guard directly as the gate: `node packages/ingestion/scripts/verify-vendor-wasms.mjs` must exit 0.
4. Commit the regenerated artifact + manifest alongside the package.json bump.

### Make Dependabot bumps of these deps fail loud in CI
The real fix is to stop relying on the publish-time guard. Add the guard (or a content-hash check) to the normal CI test job for `@opencodehub/ingestion`, so a stale vendored artifact turns a PR red instead of a release. A bump that touches a vendored-artifact dep should never be green in CI while the artifact is stale. Until that's wired, treat every `web-tree-sitter` / `tree-sitter-*` Dependabot PR as "must re-vendor before merge" and check the manifest in review.

## How to apply during a Dependabot consolidation

When squashing Dependabot bumps (the recurring task in this repo — see the history-rewrite + consolidation playbook), scan the bump list for vendored-artifact deps FIRST. For each one, re-vendor in the consolidation branch before running `pnpm run check`. The 0.26.8→0.26.9 bump shipped in a consolidation PR precisely because the consolidation step didn't have this check.

## Linked

- [[doctor-probe-drift-after-rip-and-replace]] — same family: a check that only runs outside CI drifts silently.
- PR #137 (the bump that drifted), #147 (the re-vendor fix), v0.6.3 (the recovery release).
Loading