Skip to content

docs(AGENTS): mandate pre-merge verification#378

Open
brianmhunt wants to merge 1 commit intomainfrom
docs/pre-merge-verification
Open

docs(AGENTS): mandate pre-merge verification#378
brianmhunt wants to merge 1 commit intomainfrom
docs/pre-merge-verification

Conversation

@brianmhunt
Copy link
Copy Markdown
Member

Summary

Adds a new `## Pre-Merge Verification` section to AGENTS.md mandating three checks before declaring any PR with a runnable artifact ready:

  1. Run it. Local invocation, exit 0, expected artifact. `bun install` to sync deps when verifying a PR is part of the work.
  2. Test it. Existing test exercises the new path, or a new spec / smoke test added in the same PR.
  3. CI parallel. `gh run list --branch ` shows every triggered workflow GREEN.

Why

Surfaced today on PR #364 (coverage harness): pushed without local run; first push lint-failed on a Biome formatting violation, then local invocation failed on a missing dep. Both 60-second pre-checks the agent should have done before declaring the PR ready. User had to prompt with "did we run it?" to surface gaps that should have been mandatory upfront.

The dark-factory thesis (small team + AI agents) only works if agents close the verification loop without prompting. Long PR threads where a maintainer has to chase basic checks defeat the throughput model.

What changed

One new section in AGENTS.md, between "Always Improve" and "Review Your Own Change Adversarially". 10 lines of prose, explicit output convention ("ran locally: ✓; CI green: ✓; smoke test added: ✓").

Test plan

  • Section reads as standalone, doesn't duplicate adjacent guidance
  • Output convention is mechanical enough to verify in PR review
  • Future agents pick this up via the standard AGENTS.md load

Codify the pre-merge checks so future agents do not require the
maintainer to ask "did we run it / did we test it / does CI gate
this?". Three checks, all mandatory, all done before declaring a
PR ready:

1. Run it locally (with `bun install` to sync deps when verifying).
2. Test it (existing test or new smoke test in the same PR).
3. Confirm CI parallel runs green via `gh run list --branch <branch>`.

Surfaced after PR #364 (coverage harness) was pushed without local
run, missed a Biome formatting failure on first push and a missing-
dep failure on local invocation — both 60-second checks the agent
should have done before declaring the PR ready.

Adversarial review: verified the new section reads as one of three
self-contained blocks, doesn't duplicate the adjacent "Always
Improve" or "Review Your Own Change Adversarially" sections, and
states the discipline in actionable terms (numbered, with explicit
✓/✗ output convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 5, 2026 12:12
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

Warning

Rate limit exceeded

@brianmhunt has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 44 minutes and 6 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5a8eb18e-d73d-4731-b5be-b2e6be104fa2

📥 Commits

Reviewing files that changed from the base of the PR and between 681f036 and 1546ee3.

📒 Files selected for processing (1)
  • AGENTS.md
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/pre-merge-verification

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Pre-Merge Verification policy to AGENTS.md so contributors and agents must close the local-run/test/CI loop before marking runnable changes as ready. This fits the repo’s broader “dark factory” workflow by tightening process around scripts, workflows, CLIs, and build targets.

Changes:

  • Adds a ## Pre-Merge Verification section to AGENTS.md.
  • Requires three checks before merge readiness: run the artifact, verify test coverage, and confirm CI is green.
  • Introduces a standard verification-status string for final summaries / PR comments.

Comment thread AGENTS.md

A PR that adds or modifies a runnable artifact (script, workflow, CLI entry point, build target) is not mergeable until **all three** of these are done — without waiting to be asked:

1. **Run it.** Invoke the new entry point locally and confirm exit 0 plus the expected artifact (file, output, exit code). For coverage harnesses this means actually generating a report; for workflow changes, push to a branch and watch the run. `bun install` to sync deps when verifying a PR is part of the work — the "ask before bun install" rule applies to *unrelated* installs.
Comment thread AGENTS.md

1. **Run it.** Invoke the new entry point locally and confirm exit 0 plus the expected artifact (file, output, exit code). For coverage harnesses this means actually generating a report; for workflow changes, push to a branch and watch the run. `bun install` to sync deps when verifying a PR is part of the work — the "ask before bun install" rule applies to *unrelated* installs.
2. **Test it.** Either an existing test exercises the new path, or a new spec / smoke test is added in the same PR. "I trust the author / I trust the tooling" doesn't count.
3. **CI parallel.** After push, run `gh run list --branch <branch>` and confirm every triggered workflow is GREEN. If lint / tests / publish-check fail, fix and re-push *before* declaring ready for merge.
Comment thread AGENTS.md
2. **Test it.** Either an existing test exercises the new path, or a new spec / smoke test is added in the same PR. "I trust the author / I trust the tooling" doesn't count.
3. **CI parallel.** After push, run `gh run list --branch <branch>` and confirm every triggered workflow is GREEN. If lint / tests / publish-check fail, fix and re-push *before* declaring ready for merge.

State the verification result explicitly in the final summary or PR comment: "ran locally: ✓; CI green: ✓; smoke test added: ✓". If anything is ✗, fix it. Don't declare done with known failures.
Comment thread AGENTS.md
2. **Test it.** Either an existing test exercises the new path, or a new spec / smoke test is added in the same PR. "I trust the author / I trust the tooling" doesn't count.
3. **CI parallel.** After push, run `gh run list --branch <branch>` and confirm every triggered workflow is GREEN. If lint / tests / publish-check fail, fix and re-push *before* declaring ready for merge.

State the verification result explicitly in the final summary or PR comment: "ran locally: ✓; CI green: ✓; smoke test added: ✓". If anything is ✗, fix it. Don't declare done with known failures.
@phillipc
Copy link
Copy Markdown
Member

phillipc commented May 5, 2026

@brianmhunt I would like to share at this point some options to save tokens:

  1. I think we need a instruction to build the solution before any validations. I often found ai-loops like this:
    I found why the tests still timed out: these specs import from packages/binding.if/dist, so they were still exercising stale built output. I’m rebuilding binding.if first, then rerunning the scoped tests with the new case. Update: We can also with to import from src.

  2. The “Adversarial review” is good but expensive. We can probably make it risk-dependent or per release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants