From c2f3b71c041e5559be63b2baacc498046dd7b343 Mon Sep 17 00:00:00 2001 From: bgagent <345885+scottschreckengaust@users.noreply.github.com> Date: Tue, 19 May 2026 07:03:15 +0000 Subject: [PATCH 1/4] docs: ADR-005 feedback loop, ADR-008 definition of done, ADR-009 security posture ADR-005: PR review feedback propagates upstream to issues and ADRs. Classification (nit/bug/design/architecture), pause-assess-propagate- resolve-resume protocol, stacked PR chain recovery. ADR-008: Progressive definition of done (Level 1-4). Default levels per issue type. Verification responsibility scales with risk. ADR-009: Development-time agent security. Role separation (planner/ implementor/reviewer/admin), blast radius classification, 2P review for high-risk changes, no self-approval. Refs #136, #139, #140 Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/decisions/005-feedback-loop.md | 68 +++++++++++++++ docs/decisions/008-definition-of-done.md | 82 ++++++++++++++++++ .../009-security-posture-dev-agents.md | 72 ++++++++++++++++ .../docs/decisions/005-feedback-loop.md | 72 ++++++++++++++++ .../docs/decisions/008-definition-of-done.md | 86 +++++++++++++++++++ .../009-security-posture-dev-agents.md | 76 ++++++++++++++++ 6 files changed, 456 insertions(+) create mode 100644 docs/decisions/005-feedback-loop.md create mode 100644 docs/decisions/008-definition-of-done.md create mode 100644 docs/decisions/009-security-posture-dev-agents.md create mode 100644 docs/src/content/docs/decisions/005-feedback-loop.md create mode 100644 docs/src/content/docs/decisions/008-definition-of-done.md create mode 100644 docs/src/content/docs/decisions/009-security-posture-dev-agents.md diff --git a/docs/decisions/005-feedback-loop.md b/docs/decisions/005-feedback-loop.md new file mode 100644 index 0000000..8d2ee83 --- /dev/null +++ b/docs/decisions/005-feedback-loop.md @@ -0,0 +1,68 @@ +# ADR-005: Feedback loop — PR reviews propagate to issues and ADRs + +**Status:** accepted +**Date:** 2026-05-19 + 
+## Context + +PR review comments are addressed locally (fix the code) but systemic issues they reveal are not propagated upstream. A reviewer says "this approach is wrong" but the issue still says "use this approach." ADRs are treated as immutable when they should be living decisions that evolve with implementation experience. + +Without a feedback protocol, review insights are lost, issue bodies rot, and architectural mistakes persist across stacked PR chains. + +## Decision + +### Review comment classification + +| Type | Action | Propagates to | +|------|--------|---------------| +| Nit (style, naming) | Fix in PR | Nothing | +| Bug (logic error) | Fix in PR | Nothing (unless systemic) | +| Design concern | Pause PR; evaluate | Issue body | +| Architecture challenge | Pause PR; escalate | ADR (supersede? amend?) | +| Scope question | Clarify | Issue body | +| Blocker (won't approve as-is) | Pause PR | Issue body | + +### Upstream propagation + +When a review surfaces a design concern or architecture challenge: + +1. **Pause** — Do not force-merge. Do not continue stacked PRs above this one. +2. **Assess** — Does this invalidate the issue's approach? The ADR's decision? +3. **Propagate** — Update the relevant upstream document (issue body, ADR, stacked PR dependents). +4. **Resolve** — Revise the approach, defend with evidence, or cancel the work. +5. **Resume** — Once resolved, unblock the PR and dependents. + +### ADR evolution + +| Trigger | Response | +|---------|----------| +| Implementation reveals the decision doesn't work | New RFC proposing a successor ADR | +| Reviewer challenges the architectural premise | `**UNRESOLVED**` on the issue; pause | +| New information makes the decision obsolete | Successor ADR with `Supersedes: ADR-NNN` | +| Decision works but needs refinement | Amend via PR (minor, no new ADR) | + +Never silently ignore a challenged decision. + +### Stacked PR chain revision + +When feedback on PR N invalidates PRs N+1 through N+M: +1. 
Comment on all affected PRs +2. Do not rebase dependent PRs until the base is stable +3. If architectural: re-evaluate whether the remaining stack is valid +4. If redesign needed: close dependent PRs, revise issue, re-plan + +## Consequences + +- (+) Review insights propagate to architectural decisions +- (+) Issue bodies stay current with implementation learnings +- (+) ADRs evolve rather than silently becoming outdated +- (+) Stacked PR chains have a defined recovery protocol +- (-) Adds process overhead to reviews (classification step) +- (-) Pausing stacked chains delays delivery +- (!) Requires discipline to actually propagate feedback upstream + +## References + +- Issue #136 — full RFC with open questions +- ADR-003 — governance (issue body as source of truth) +- ADR-001 — stacked PRs (chain revision protocol) diff --git a/docs/decisions/008-definition-of-done.md b/docs/decisions/008-definition-of-done.md new file mode 100644 index 0000000..7f637ed --- /dev/null +++ b/docs/decisions/008-definition-of-done.md @@ -0,0 +1,82 @@ +# ADR-008: Definition of Done (progressive maturity) + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +"Done" is implicit and varies by contributor. Some consider a passing build sufficient; others expect documentation, tests, and deployment verification. Agents have no unambiguous checklist to know they have completed work. Over-engineering "done" early blocks velocity; under-defining it ships incomplete work. + +The definition must be progressive — rising as the project matures — so it does not block early momentum but ensures quality at scale. 
+ +## Decision + +### Progressive levels + +**Level 1 — Basic (minimum viable):** +- Code compiles without errors +- Existing tests pass (no regressions) +- New code has tests (unit level minimum) +- Linting passes +- PR description explains what and why +- Linked issue exists + +**Level 2 — Standard (current project default):** +- All of Level 1 +- Pre-commit hooks pass +- CDK synth succeeds (if infrastructure changes) +- Security scans pass (no new HIGH/CRITICAL findings) +- Documentation updated if behavior changes +- Starlight mirrors synced (if docs changed) + +**Level 3 — Rigorous (critical paths):** +- All of Level 2 +- Integration or E2E test covers the happy path +- Error paths tested +- Reviewer approved (human or qualified agent) +- Deployed to ephemeral stack and smoke-tested (if infrastructure) +- ADR written (if architectural decision made) + +**Level 4 — Self-verifying (future target):** +- All of Level 3 +- Tabula rasa agent can replicate the outcome using only docs +- CI includes behavioral verification +- Documentation drift detection passes + +### Default level by issue type + +| Issue type | Default level | +|-----------|---------------| +| Bug fix | Level 2 | +| New feature | Level 2-3 (based on blast radius) | +| Infrastructure/IAM change | Level 3 | +| Documentation only | Level 1 | +| Security fix | Level 3 | +| RFC/ADR implementation | Level 2 + ADR written | + +Issues may override by specifying `Done: Level N` in the body. 
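The default-level table and the `Done: Level N` override could be resolved mechanically. The following is a minimal sketch, not part of the ADR: the issue-type keys, the `high_blast_radius` flag, and the exact override syntax accepted by the regular expression are all assumptions made for illustration.

```python
import re

# Defaults transcribed from the table above. The dictionary keys are
# hypothetical labels, not names defined by the ADR.
DEFAULT_LEVEL = {
    "bug-fix": 2,
    "new-feature": 2,             # raised to 3 for high blast radius (assumption)
    "infrastructure-iam": 3,
    "documentation": 1,
    "security-fix": 3,
    "rfc-adr-implementation": 2,  # the "+ ADR written" requirement is not modelled
}

# Matches a line such as "Done: Level 3". The tolerated whitespace is an assumption.
OVERRIDE = re.compile(r"^\s*Done:\s*Level\s*([1-4])\s*$", re.MULTILINE)


def required_level(issue_type: str, issue_body: str, high_blast_radius: bool = False) -> int:
    """Return the Definition of Done level that applies to an issue."""
    match = OVERRIDE.search(issue_body)
    if match:
        # An explicit override in the issue body wins over the table default.
        return int(match.group(1))
    level = DEFAULT_LEVEL[issue_type]
    if issue_type == "new-feature" and high_blast_radius:
        level = 3
    return level
```

Under these assumptions, a documentation issue whose body contains the line `Done: Level 3` resolves to Level 3 rather than the Level 1 default.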
+ +### Verification responsibility + +| Level | Who verifies | +|-------|-------------| +| 1 | CI (automated) | +| 2 | CI + self-check by implementor | +| 3 | CI + reviewer + implementor | +| 4 | CI + reviewer + independent agent | + +## Consequences + +- (+) Agents have an unambiguous completion checklist +- (+) Quality bar rises as the project matures +- (+) Over-engineering is prevented (Level 1 for simple docs changes) +- (+) Critical paths get rigorous verification (Level 3) +- (-) Requires labeling or explicit level assignment per issue +- (-) Level 4 is aspirational and depends on ADR-007 (knowledge acquisition) +- (!) The project must eventually graduate from Level 2 to Level 3 default + +## References + +- Issue #139 — full RFC with open questions +- ADR-003 — governance (defines when to start; this defines when to stop) +- ADR-007 — knowledge acquisition (Level 4 depends on tabula rasa verification) diff --git a/docs/decisions/009-security-posture-dev-agents.md b/docs/decisions/009-security-posture-dev-agents.md new file mode 100644 index 0000000..9dc07be --- /dev/null +++ b/docs/decisions/009-security-posture-dev-agents.md @@ -0,0 +1,72 @@ +# ADR-009: Security posture and blast radius for development-time agents + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +The existing `SECURITY.md` covers runtime agent execution (inside MicroVMs). It does not cover **development-time agents** — those writing code, creating PRs, and modifying infrastructure in this repository. A development-time agent operates with the credentials of whoever invoked it, creating a risk of self-approval, policy modification, and unbounded blast radius. + +The core principle: **planners and implementors must be separated by context and ideally by identity. 
No self-approval.** + +## Decision + +### Role separation + +| Role | Can do | Cannot do | +|------|--------|-----------| +| **Planner** | Create/edit issues, write RFCs/ADRs, define roadmap | Write code, push branches, approve PRs | +| **Implementor** | Write code, create PRs, push branches, run tests | Approve own PRs, merge own PRs, modify CI/security config | +| **Reviewer** | Approve PRs, request changes, merge | Write code on the same PR being reviewed | +| **Admin** | All of the above + modify policies, approve issues | Still requires 2P for policy changes | + +### Blast radius classification + +| Action | Risk | Gate | +|--------|------|------| +| Edit code in existing patterns | Low | CI + peer review | +| Add new dependency | Medium | Security scan + review | +| Modify IAM policy / security config | High | 2P review + admin approval | +| Modify CI/CD workflow | High | 2P review + admin approval | +| Modify branch protection / approval rules | Critical | Admin-only + audit trail | +| Modify governance ADRs | Critical | Admin-only + 2P review | +| Delete or force-push protected branches | Critical | Never automated; human-only | + +### 2P (two-person) review + +For High and Critical actions: +- The author cannot be one of the two approvers +- At least one approver must be a human +- Approvals reference the specific risk being accepted + +### No self-approval (structural) + +- Branch protection requires review from someone other than the pusher +- If an agent plans AND implements, review must come from an identity that did neither +- The identity that writes code cannot approve or merge it + +### Credential scoping + +| Agent context | Minimum credentials | +|---------------|-------------------| +| Planning (issues, RFCs) | GitHub Issues write, read-only repo | +| Implementation (code, PRs) | Repo write, PR create, no merge capability | +| Review | PR review write, no push capability | +| Deployment | Separate deploy key, environment approval gate | + +## 
Consequences + +- (+) Prevents self-approval of dangerous changes +- (+) Blast radius is explicit and enforceable +- (+) Role separation enables audit trail +- (+) 2P review catches compromised or confused agents +- (-) Credential management complexity increases +- (-) Small tasks require multi-identity orchestration +- (!) Personal PATs grant all permissions — structural enforcement requires GitHub Apps or fine-grained tokens + +## References + +- Issue #140 — full RFC with open questions +- `docs/design/SECURITY.md` — runtime agent security (complementary) +- Cedar HITL gates (PR #88) — runtime tool-call governance +- ADR-003 — governance (approval gates enforced here technically) diff --git a/docs/src/content/docs/decisions/005-feedback-loop.md b/docs/src/content/docs/decisions/005-feedback-loop.md new file mode 100644 index 0000000..4cfee1a --- /dev/null +++ b/docs/src/content/docs/decisions/005-feedback-loop.md @@ -0,0 +1,72 @@ +--- +title: 005 feedback loop +--- + +# ADR-005: Feedback loop — PR reviews propagate to issues and ADRs + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +PR review comments are addressed locally (fix the code) but systemic issues they reveal are not propagated upstream. A reviewer says "this approach is wrong" but the issue still says "use this approach." ADRs are treated as immutable when they should be living decisions that evolve with implementation experience. + +Without a feedback protocol, review insights are lost, issue bodies rot, and architectural mistakes persist across stacked PR chains. + +## Decision + +### Review comment classification + +| Type | Action | Propagates to | +|------|--------|---------------| +| Nit (style, naming) | Fix in PR | Nothing | +| Bug (logic error) | Fix in PR | Nothing (unless systemic) | +| Design concern | Pause PR; evaluate | Issue body | +| Architecture challenge | Pause PR; escalate | ADR (supersede? amend?) 
| +| Scope question | Clarify | Issue body | +| Blocker (won't approve as-is) | Pause PR | Issue body | + +### Upstream propagation + +When a review surfaces a design concern or architecture challenge: + +1. **Pause** — Do not force-merge. Do not continue stacked PRs above this one. +2. **Assess** — Does this invalidate the issue's approach? The ADR's decision? +3. **Propagate** — Update the relevant upstream document (issue body, ADR, stacked PR dependents). +4. **Resolve** — Revise the approach, defend with evidence, or cancel the work. +5. **Resume** — Once resolved, unblock the PR and dependents. + +### ADR evolution + +| Trigger | Response | +|---------|----------| +| Implementation reveals the decision doesn't work | New RFC proposing a successor ADR | +| Reviewer challenges the architectural premise | `**UNRESOLVED**` on the issue; pause | +| New information makes the decision obsolete | Successor ADR with `Supersedes: ADR-NNN` | +| Decision works but needs refinement | Amend via PR (minor, no new ADR) | + +Never silently ignore a challenged decision. + +### Stacked PR chain revision + +When feedback on PR N invalidates PRs N+1 through N+M: +1. Comment on all affected PRs +2. Do not rebase dependent PRs until the base is stable +3. If architectural: re-evaluate whether the remaining stack is valid +4. If redesign needed: close dependent PRs, revise issue, re-plan + +## Consequences + +- (+) Review insights propagate to architectural decisions +- (+) Issue bodies stay current with implementation learnings +- (+) ADRs evolve rather than silently becoming outdated +- (+) Stacked PR chains have a defined recovery protocol +- (-) Adds process overhead to reviews (classification step) +- (-) Pausing stacked chains delays delivery +- (!) 
Requires discipline to actually propagate feedback upstream + +## References + +- Issue #136 — full RFC with open questions +- ADR-003 — governance (issue body as source of truth) +- ADR-001 — stacked PRs (chain revision protocol) diff --git a/docs/src/content/docs/decisions/008-definition-of-done.md b/docs/src/content/docs/decisions/008-definition-of-done.md new file mode 100644 index 0000000..27d1ea8 --- /dev/null +++ b/docs/src/content/docs/decisions/008-definition-of-done.md @@ -0,0 +1,86 @@ +--- +title: 008 definition of done +--- + +# ADR-008: Definition of Done (progressive maturity) + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +"Done" is implicit and varies by contributor. Some consider a passing build sufficient; others expect documentation, tests, and deployment verification. Agents have no unambiguous checklist to know they have completed work. Over-engineering "done" early blocks velocity; under-defining it ships incomplete work. + +The definition must be progressive — rising as the project matures — so it does not block early momentum but ensures quality at scale. 
+ +## Decision + +### Progressive levels + +**Level 1 — Basic (minimum viable):** +- Code compiles without errors +- Existing tests pass (no regressions) +- New code has tests (unit level minimum) +- Linting passes +- PR description explains what and why +- Linked issue exists + +**Level 2 — Standard (current project default):** +- All of Level 1 +- Pre-commit hooks pass +- CDK synth succeeds (if infrastructure changes) +- Security scans pass (no new HIGH/CRITICAL findings) +- Documentation updated if behavior changes +- Starlight mirrors synced (if docs changed) + +**Level 3 — Rigorous (critical paths):** +- All of Level 2 +- Integration or E2E test covers the happy path +- Error paths tested +- Reviewer approved (human or qualified agent) +- Deployed to ephemeral stack and smoke-tested (if infrastructure) +- ADR written (if architectural decision made) + +**Level 4 — Self-verifying (future target):** +- All of Level 3 +- Tabula rasa agent can replicate the outcome using only docs +- CI includes behavioral verification +- Documentation drift detection passes + +### Default level by issue type + +| Issue type | Default level | +|-----------|---------------| +| Bug fix | Level 2 | +| New feature | Level 2-3 (based on blast radius) | +| Infrastructure/IAM change | Level 3 | +| Documentation only | Level 1 | +| Security fix | Level 3 | +| RFC/ADR implementation | Level 2 + ADR written | + +Issues may override by specifying `Done: Level N` in the body. 
+ +### Verification responsibility + +| Level | Who verifies | +|-------|-------------| +| 1 | CI (automated) | +| 2 | CI + self-check by implementor | +| 3 | CI + reviewer + implementor | +| 4 | CI + reviewer + independent agent | + +## Consequences + +- (+) Agents have an unambiguous completion checklist +- (+) Quality bar rises as the project matures +- (+) Over-engineering is prevented (Level 1 for simple docs changes) +- (+) Critical paths get rigorous verification (Level 3) +- (-) Requires labeling or explicit level assignment per issue +- (-) Level 4 is aspirational and depends on ADR-007 (knowledge acquisition) +- (!) The project must eventually graduate from Level 2 to Level 3 default + +## References + +- Issue #139 — full RFC with open questions +- ADR-003 — governance (defines when to start; this defines when to stop) +- ADR-007 — knowledge acquisition (Level 4 depends on tabula rasa verification) diff --git a/docs/src/content/docs/decisions/009-security-posture-dev-agents.md b/docs/src/content/docs/decisions/009-security-posture-dev-agents.md new file mode 100644 index 0000000..7bd51b3 --- /dev/null +++ b/docs/src/content/docs/decisions/009-security-posture-dev-agents.md @@ -0,0 +1,76 @@ +--- +title: 009 security posture dev agents +--- + +# ADR-009: Security posture and blast radius for development-time agents + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +The existing `SECURITY.md` covers runtime agent execution (inside MicroVMs). It does not cover **development-time agents** — those writing code, creating PRs, and modifying infrastructure in this repository. A development-time agent operates with the credentials of whoever invoked it, creating a risk of self-approval, policy modification, and unbounded blast radius. + +The core principle: **planners and implementors must be separated by context and ideally by identity. 
No self-approval.** + +## Decision + +### Role separation + +| Role | Can do | Cannot do | +|------|--------|-----------| +| **Planner** | Create/edit issues, write RFCs/ADRs, define roadmap | Write code, push branches, approve PRs | +| **Implementor** | Write code, create PRs, push branches, run tests | Approve own PRs, merge own PRs, modify CI/security config | +| **Reviewer** | Approve PRs, request changes, merge | Write code on the same PR being reviewed | +| **Admin** | All of the above + modify policies, approve issues | Still requires 2P for policy changes | + +### Blast radius classification + +| Action | Risk | Gate | +|--------|------|------| +| Edit code in existing patterns | Low | CI + peer review | +| Add new dependency | Medium | Security scan + review | +| Modify IAM policy / security config | High | 2P review + admin approval | +| Modify CI/CD workflow | High | 2P review + admin approval | +| Modify branch protection / approval rules | Critical | Admin-only + audit trail | +| Modify governance ADRs | Critical | Admin-only + 2P review | +| Delete or force-push protected branches | Critical | Never automated; human-only | + +### 2P (two-person) review + +For High and Critical actions: +- The author cannot be one of the two approvers +- At least one approver must be a human +- Approvals reference the specific risk being accepted + +### No self-approval (structural) + +- Branch protection requires review from someone other than the pusher +- If an agent plans AND implements, review must come from an identity that did neither +- The identity that writes code cannot approve or merge it + +### Credential scoping + +| Agent context | Minimum credentials | +|---------------|-------------------| +| Planning (issues, RFCs) | GitHub Issues write, read-only repo | +| Implementation (code, PRs) | Repo write, PR create, no merge capability | +| Review | PR review write, no push capability | +| Deployment | Separate deploy key, environment approval gate | + +## 
Consequences + +- (+) Prevents self-approval of dangerous changes +- (+) Blast radius is explicit and enforceable +- (+) Role separation enables audit trail +- (+) 2P review catches compromised or confused agents +- (-) Credential management complexity increases +- (-) Small tasks require multi-identity orchestration +- (!) Personal PATs grant all permissions — structural enforcement requires GitHub Apps or fine-grained tokens + +## References + +- Issue #140 — full RFC with open questions +- `docs/design/SECURITY.md` — runtime agent security (complementary) +- Cedar HITL gates (PR #88) — runtime tool-call governance +- ADR-003 — governance (approval gates enforced here technically) From 567b0206f27c743ab73fefe5c19b5fa4d4addcb6 Mon Sep 17 00:00:00 2001 From: bgagent <345885+scottschreckengaust@users.noreply.github.com> Date: Tue, 19 May 2026 07:06:39 +0000 Subject: [PATCH 2/4] docs: ADR-006 feature flags, ADR-007 knowledge acquisition, ADR-010 error recovery, ADR-011 conflict resolution MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ADR-006: Feature flags for concurrent development. When to use, lifecycle (proposed→introduced→active→verified→permanent), ownership, maximum lifetime enforcement. ADR-007: Knowledge acquisition through progressive failure. Zero-context execution attempts, failure capture protocol, maturity model (L0-L3), self-improvement loop. ADR-010: Error recovery and rollback. Decision tree (revert vs fix- forward), stacked PR chain recovery, things agents must never do. ADR-011: Conflict resolution. Escalation ladder (4 levels), decision criteria, merge conflict ownership, human vs agent disagreements. 
Refs #137, #138, #141, #142 Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/decisions/006-feature-flags.md | 64 ++++++++++++++++ docs/decisions/007-knowledge-acquisition.md | 69 ++++++++++++++++++ docs/decisions/010-error-recovery.md | 69 ++++++++++++++++++ docs/decisions/011-conflict-resolution.md | 64 ++++++++++++++++ .../docs/decisions/006-feature-flags.md | 68 +++++++++++++++++ .../decisions/007-knowledge-acquisition.md | 73 +++++++++++++++++++ .../docs/decisions/010-error-recovery.md | 73 +++++++++++++++++++ .../docs/decisions/011-conflict-resolution.md | 68 +++++++++++++++++ 8 files changed, 548 insertions(+) create mode 100644 docs/decisions/006-feature-flags.md create mode 100644 docs/decisions/007-knowledge-acquisition.md create mode 100644 docs/decisions/010-error-recovery.md create mode 100644 docs/decisions/011-conflict-resolution.md create mode 100644 docs/src/content/docs/decisions/006-feature-flags.md create mode 100644 docs/src/content/docs/decisions/007-knowledge-acquisition.md create mode 100644 docs/src/content/docs/decisions/010-error-recovery.md create mode 100644 docs/src/content/docs/decisions/011-conflict-resolution.md diff --git a/docs/decisions/006-feature-flags.md b/docs/decisions/006-feature-flags.md new file mode 100644 index 0000000..8284070 --- /dev/null +++ b/docs/decisions/006-feature-flags.md @@ -0,0 +1,64 @@ +# ADR-006: Feature flags for concurrent development + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +Multiple agents working on related features in the same area must serialize — one waits for the other to merge. Incomplete features either block the main branch or require long-lived branches that diverge. SRE needs kill switches without reverting commits. + +Feature flags enable trunk-based development where incomplete work merges safely behind toggles, and concurrent contributors avoid blocking each other. + +## Decision + +### When to use flags + +| Situation | Use a flag? 
| +|-----------|-------------| +| Feature spans multiple PRs, incomplete state is unsafe | Yes | +| Two contributors touch the same module for different purposes | Yes | +| SRE needs a kill switch for a new capability | Yes | +| Simple refactor with no behavioral change | No | +| Bug fix | No | +| One-PR feature, complete on merge | No | + +### Flag ownership + +- Every flag has an owner (the issue that introduced it) +- Every flag has an expiration (the issue/PR that removes it) +- Flags without a removal plan are rejected in review + +### Separation of concerns + +- **Planners** decide which features get flags (issue/RFC level) +- **Implementors** add/use flags in code (PR level) +- **SRE/operators** toggle flags in production (runtime level) +- **No self-approval** — the person who introduces a flag cannot approve its removal + +### Flag lifecycle + +1. **Proposed** — issue identifies the need for a flag +2. **Introduced** — PR adds the flag (default: off) +3. **Active** — feature behind flag is in development +4. **Verified** — feature complete, flag toggled on in testing +5. **Permanent** — flag removed, feature is always-on (or removed entirely) + +### Maximum lifetime + +Flags must be removed within one release cycle of the feature being verified. Stale flags are treated as technical debt and surfaced in periodic reviews. + +## Consequences + +- (+) Concurrent work proceeds without blocking +- (+) Trunk-based development: main stays deployable +- (+) SRE can disable features without code changes +- (+) Partial features merge safely +- (-) Flag management overhead +- (-) Combinatorial testing complexity if many flags exist simultaneously +- (!) Maximum lifetime must be enforced or flags accumulate indefinitely + +## References + +- Issue #137 — full RFC with open questions on mechanism (CDK context vs. DynamoDB vs. 
env vars) +- ADR-003 — governance (flag introduction requires approval) +- ADR-005 — feedback loop (reviewer may flag-gate a feature during review) diff --git a/docs/decisions/007-knowledge-acquisition.md b/docs/decisions/007-knowledge-acquisition.md new file mode 100644 index 0000000..23b5a90 --- /dev/null +++ b/docs/decisions/007-knowledge-acquisition.md @@ -0,0 +1,69 @@ +# ADR-007: Knowledge acquisition through progressive failure + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +Agents with fresh context (tabula rasa) attempt to follow documentation and hit gaps they cannot resolve. These gaps are silently worked around (agent asks a human) rather than systematically fixed. The system cannot self-improve its onboarding because failures are not captured. + +Knowledge acquisition starts from zero. Each iteration creates the roadmap to better knowledge by discovering gaps through actual failures. + +## Decision + +### Zero-context execution attempts + +Periodically, an agent with no project memory attempts to follow guides end-to-end. The agent follows ONLY what is written — no inference, no training data knowledge, no asking colleagues. + +### Failure capture protocol + +At each failure point, the agent: +1. **Stops** — does not attempt to work around or guess +2. **Documents** — creates an issue: which document, which step, what was missing +3. **Continues** — attempts the next step (if possible) to find additional gaps + +### Knowledge artifacts (interim) + +Until documentation meets ADR-004, agents may create ephemeral artifacts: +- Semantic indices of the codebase (call graphs, dependency maps) +- Annotated walkthroughs of successful executions +- "What I learned" summaries after completing a task + +These are scaffolding that informs documentation improvements, not documentation themselves. 
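The failure capture protocol described earlier in this ADR (stop, document, continue) can be sketched as a small loop. This is an illustration under stated assumptions, not the project's tooling: `Gap`, `run_guide`, and the idea of modelling each guide step as a callable that raises on a documentation gap are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Gap:
    """One documentation gap: which document, which step, what was missing."""
    document: str
    step: int
    missing: str


def run_guide(document: str, steps: List[Callable[[], None]]) -> List[Gap]:
    """Attempt every step of a guide and record each failure as a gap."""
    gaps: List[Gap] = []
    for number, step in enumerate(steps, start=1):
        try:
            step()
        except Exception as exc:
            # Stop on this step (no workaround, no guessing), document the
            # gap, then continue with the next step to find further gaps.
            gaps.append(Gap(document, number, str(exc)))
    return gaps
```

Each recorded `Gap` would then become one filed issue. How issues are actually created is left out because the ADR does not specify a mechanism.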
+ +### Maturity model + +| Level | State | Agent capability | +|-------|-------|-----------------| +| 0 | No docs | Cannot start; files issue for missing docs | +| 1 | Partial docs | Follows docs, stops at gaps, files issues | +| 2 | Complete docs (ADR-004) | Completes end-to-end without help | +| 3 | Self-improving | Detects drift between docs and code, auto-files issues | + +### The self-improvement loop + +``` +Agent starts fresh → follows docs → hits failure → + files issue → issue gets fixed → next agent goes further → + hits next failure → files issue → ... + until end-to-end works from zero context +``` + +This runs continuously because code changes outpace documentation and different agent implementations fail at different points. + +## Consequences + +- (+) Documentation gaps become bugs with reproduction steps +- (+) Priority ordering emerges naturally (most common failures surface first) +- (+) The system self-improves without human identification of gaps +- (+) Creates a natural definition of "docs are done" (Level 2 achieved) +- (-) Generates issue volume that needs triage +- (-) Requires periodic investment in zero-context test runs +- (!) The gap between Level 1 and Level 2 may be large — patience required + +## References + +- Issue #138 — full RFC with open questions +- ADR-004 — defines the quality target (tabula rasa test) +- ADR-003 — governance for issues filed by failing agents +- ADR-008 — Level 4 Definition of Done depends on this protocol diff --git a/docs/decisions/010-error-recovery.md b/docs/decisions/010-error-recovery.md new file mode 100644 index 0000000..3a70175 --- /dev/null +++ b/docs/decisions/010-error-recovery.md @@ -0,0 +1,69 @@ +# ADR-010: Error recovery and rollback protocol + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +When merged code breaks something, the response is ad-hoc. Agents operating autonomously may merge code that passes CI but breaks integration. No protocol defines when to revert vs. 
fix forward, who decides, or how stacked PR chains recover. + +## Decision + +### Decision tree + +``` +Broken thing detected +├─ Production affected (users impacted NOW)? +│ └─ Yes → REVERT immediately, investigate after +├─ Fix obvious and < 30 minutes? +│ └─ Yes → Fix forward (new PR, not amend) +├─ Stacked PR chain? +│ └─ Yes → Pause dependent PRs, fix the base +└─ Scope of damage unclear? + └─ Yes → REVERT (safe default), then investigate +``` + +### Revert protocol + +1. Create a revert commit (not force-push) — preserves history +2. Open an issue: what broke, why CI did not catch it, what the fix needs +3. The fix goes through normal review (no rushing, no skipping gates) + +### Fix-forward protocol + +1. Only if the fix is obvious, small, and low-risk +2. Must still go through PR + review +3. If the fix introduces new complexity — revert instead + +### Stacked PR chain recovery + +1. Identify which PR introduced the breakage +2. Pause/close all PRs above it +3. Fix the base PR +4. Rebase and re-evaluate dependent PRs +5. Re-run CI on each before re-opening + +### Agents must NEVER do during recovery + +- Force-push to shared branches +- Delete branches with others' work +- Amend published commits +- Skip review "because it's urgent" +- Self-approve a revert + +## Consequences + +- (+) Clear decision tree prevents analysis paralysis during incidents +- (+) Revert-first default limits blast radius +- (+) Stacked chain recovery is defined (not improvised) +- (+) History is preserved (revert commits, not force-push) +- (-) Reverts create noise in git history +- (-) Fix-forward temptation may lead to rushed fixes +- (!) 
"Production affected" requires definition per deployment (self-hosted varies) + +## References + +- Issue #141 — full RFC with open questions +- ADR-003 — governance (no bypasses during recovery) +- ADR-001 — stacked PRs (chain recovery protocol) +- ADR-009 — security (revert authority tied to role) diff --git a/docs/decisions/011-conflict-resolution.md b/docs/decisions/011-conflict-resolution.md new file mode 100644 index 0000000..8b73852 --- /dev/null +++ b/docs/decisions/011-conflict-resolution.md @@ -0,0 +1,64 @@ +# ADR-011: Conflict resolution protocol + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +Multiple concurrent contributors — human or AI — will propose incompatible approaches, create merge conflicts, and disagree on design. Without a defined escalation path, work stalls or the loudest voice wins. + +## Decision + +### Escalation ladder + +``` +Level 1: Contributor discussion (PR comments, issue thread) + ↓ (no resolution within 2 interactions) +Level 2: Request additional reviewer (fresh perspective) + ↓ (still no resolution) +Level 3: Competing proposals in the issue body (explicit trade-off comparison) + ↓ (still no resolution) +Level 4: Admin decision (binding, documented in issue body) +``` + +### Decision criteria + +When comparing approaches, evaluate on: +1. **Correctness** — does it solve the stated problem? +2. **Simplicity** — fewer moving parts wins when correctness is equal +3. **Consistency** — follows existing codebase patterns? +4. **Reversibility** — can we change our mind later? +5. **Blast radius** — what breaks if this is wrong? 
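The escalation ladder above can be read as a small state machine. The sketch below is illustrative only: the names `Level`, `next_level`, and `ASSUMED_LATER_THRESHOLD` are hypothetical and do not come from the ADR. The ADR states a numeric threshold only for Level 1 ("2 interactions"); the threshold used here for Levels 2 and 3 is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    """Rungs of the ADR-011 escalation ladder (names are illustrative)."""
    CONTRIBUTOR_DISCUSSION = 1
    ADDITIONAL_REVIEWER = 2
    COMPETING_PROPOSALS = 3
    ADMIN_DECISION = 4


# Stated in the ADR: escalate from Level 1 after 2 unresolved interactions.
LEVEL_1_THRESHOLD = 2
# Assumption: the ADR only says "still no resolution" for Levels 2 and 3,
# so one unresolved attempt is treated as enough to escalate.
ASSUMED_LATER_THRESHOLD = 1


def next_level(level: Level, unresolved_interactions: int) -> Level:
    """Return the rung a dispute should be on after some unresolved rounds."""
    if level == Level.ADMIN_DECISION:
        # The admin decision is binding, so there is nothing to escalate to.
        return level
    threshold = (
        LEVEL_1_THRESHOLD
        if level == Level.CONTRIBUTOR_DISCUSSION
        else ASSUMED_LATER_THRESHOLD
    )
    if unresolved_interactions >= threshold:
        return Level(level + 1)
    return level
```

Keeping the function pure (no stored state) makes it straightforward to call from whatever tracks the discussion, whether that is a bot reading PR comments or a person following the ladder by hand.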
+ +### Merge conflict ownership + +| Situation | Who resolves | +|-----------|-------------| +| Two PRs modify same file, one merged first | Second PR's author rebases | +| Stacked PR conflict from lower change | Lower PR author notifies; upper PRs rebase after stable | +| Concurrent agents modified same module | First to merge wins; second adapts | +| Architectural conflict (both valid) | Escalate to Level 3 | + +### Human vs. agent disagreement + +- Agents present evidence (code, tests, measurements) not authority +- Humans can override but must document why +- Agents do not repeatedly argue a rejected point +- If an agent believes a human decision causes harm (security, data loss), it escalates to admin + +## Consequences + +- (+) Disagreements have a defined path to resolution +- (+) Merge conflicts have clear ownership +- (+) Competing approaches are compared on criteria, not authority +- (+) Admin decision is the final backstop (no infinite loops) +- (-) Escalation takes time; may slow delivery +- (-) Level 3 (written trade-off) requires effort +- (!) Must not become a veto mechanism for slow contributors + +## References + +- Issue #142 — full RFC with open questions +- ADR-003 — governance (issue body as resolution record) +- ADR-005 — feedback loop (reviewer disagreements feed into this) +- ADR-009 — security (authority levels for decisions) diff --git a/docs/src/content/docs/decisions/006-feature-flags.md b/docs/src/content/docs/decisions/006-feature-flags.md new file mode 100644 index 0000000..668b072 --- /dev/null +++ b/docs/src/content/docs/decisions/006-feature-flags.md @@ -0,0 +1,68 @@ +--- +title: 006 feature flags +--- + +# ADR-006: Feature flags for concurrent development + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +Multiple agents working on related features in the same area must serialize — one waits for the other to merge. Incomplete features either block the main branch or require long-lived branches that diverge. 
SRE needs kill switches without reverting commits. + +Feature flags enable trunk-based development where incomplete work merges safely behind toggles, and concurrent contributors avoid blocking each other. + +## Decision + +### When to use flags + +| Situation | Use a flag? | +|-----------|-------------| +| Feature spans multiple PRs, incomplete state is unsafe | Yes | +| Two contributors touch the same module for different purposes | Yes | +| SRE needs a kill switch for a new capability | Yes | +| Simple refactor with no behavioral change | No | +| Bug fix | No | +| One-PR feature, complete on merge | No | + +### Flag ownership + +- Every flag has an owner (the issue that introduced it) +- Every flag has an expiration (the issue/PR that removes it) +- Flags without a removal plan are rejected in review + +### Separation of concerns + +- **Planners** decide which features get flags (issue/RFC level) +- **Implementors** add/use flags in code (PR level) +- **SRE/operators** toggle flags in production (runtime level) +- **No self-approval** — the person who introduces a flag cannot approve its removal + +### Flag lifecycle + +1. **Proposed** — issue identifies the need for a flag +2. **Introduced** — PR adds the flag (default: off) +3. **Active** — feature behind flag is in development +4. **Verified** — feature complete, flag toggled on in testing +5. **Permanent** — flag removed, feature is always-on (or removed entirely) + +### Maximum lifetime + +Flags must be removed within one release cycle of the feature being verified. Stale flags are treated as technical debt and surfaced in periodic reviews. + +## Consequences + +- (+) Concurrent work proceeds without blocking +- (+) Trunk-based development: main stays deployable +- (+) SRE can disable features without code changes +- (+) Partial features merge safely +- (-) Flag management overhead +- (-) Combinatorial testing complexity if many flags exist simultaneously +- (!) 
Maximum lifetime must be enforced or flags accumulate indefinitely + +## References + +- Issue #137 — full RFC with open questions on mechanism (CDK context vs. DynamoDB vs. env vars) +- ADR-003 — governance (flag introduction requires approval) +- ADR-005 — feedback loop (reviewer may flag-gate a feature during review) diff --git a/docs/src/content/docs/decisions/007-knowledge-acquisition.md b/docs/src/content/docs/decisions/007-knowledge-acquisition.md new file mode 100644 index 0000000..d0b8427 --- /dev/null +++ b/docs/src/content/docs/decisions/007-knowledge-acquisition.md @@ -0,0 +1,73 @@ +--- +title: 007 knowledge acquisition +--- + +# ADR-007: Knowledge acquisition through progressive failure + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +Agents with fresh context (tabula rasa) attempt to follow documentation and hit gaps they cannot resolve. These gaps are silently worked around (agent asks a human) rather than systematically fixed. The system cannot self-improve its onboarding because failures are not captured. + +Knowledge acquisition starts from zero. Each iteration creates the roadmap to better knowledge by discovering gaps through actual failures. + +## Decision + +### Zero-context execution attempts + +Periodically, an agent with no project memory attempts to follow guides end-to-end. The agent follows ONLY what is written — no inference, no training data knowledge, no asking colleagues. + +### Failure capture protocol + +At each failure point, the agent: +1. **Stops** — does not attempt to work around or guess +2. **Documents** — creates an issue: which document, which step, what was missing +3. 
**Continues** — attempts the next step (if possible) to find additional gaps + +### Knowledge artifacts (interim) + +Until documentation meets ADR-004, agents may create ephemeral artifacts: +- Semantic indices of the codebase (call graphs, dependency maps) +- Annotated walkthroughs of successful executions +- "What I learned" summaries after completing a task + +These are scaffolding that informs documentation improvements, not documentation themselves. + +### Maturity model + +| Level | State | Agent capability | +|-------|-------|-----------------| +| 0 | No docs | Cannot start; files issue for missing docs | +| 1 | Partial docs | Follows docs, stops at gaps, files issues | +| 2 | Complete docs (ADR-004) | Completes end-to-end without help | +| 3 | Self-improving | Detects drift between docs and code, auto-files issues | + +### The self-improvement loop + +``` +Agent starts fresh → follows docs → hits failure → + files issue → issue gets fixed → next agent goes further → + hits next failure → files issue → ... + until end-to-end works from zero context +``` + +This runs continuously because code changes outpace documentation and different agent implementations fail at different points. + +## Consequences + +- (+) Documentation gaps become bugs with reproduction steps +- (+) Priority ordering emerges naturally (most common failures surface first) +- (+) The system self-improves without human identification of gaps +- (+) Creates a natural definition of "docs are done" (Level 2 achieved) +- (-) Generates issue volume that needs triage +- (-) Requires periodic investment in zero-context test runs +- (!) 
The gap between Level 1 and Level 2 may be large — patience required + +## References + +- Issue #138 — full RFC with open questions +- ADR-004 — defines the quality target (tabula rasa test) +- ADR-003 — governance for issues filed by failing agents +- ADR-008 — Level 4 Definition of Done depends on this protocol diff --git a/docs/src/content/docs/decisions/010-error-recovery.md b/docs/src/content/docs/decisions/010-error-recovery.md new file mode 100644 index 0000000..7568a24 --- /dev/null +++ b/docs/src/content/docs/decisions/010-error-recovery.md @@ -0,0 +1,73 @@ +--- +title: 010 error recovery +--- + +# ADR-010: Error recovery and rollback protocol + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +When merged code breaks something, the response is ad-hoc. Agents operating autonomously may merge code that passes CI but breaks integration. No protocol defines when to revert vs. fix forward, who decides, or how stacked PR chains recover. + +## Decision + +### Decision tree + +``` +Broken thing detected +├─ Production affected (users impacted NOW)? +│ └─ Yes → REVERT immediately, investigate after +├─ Fix obvious and < 30 minutes? +│ └─ Yes → Fix forward (new PR, not amend) +├─ Stacked PR chain? +│ └─ Yes → Pause dependent PRs, fix the base +└─ Scope of damage unclear? + └─ Yes → REVERT (safe default), then investigate +``` + +### Revert protocol + +1. Create a revert commit (not force-push) — preserves history +2. Open an issue: what broke, why CI did not catch it, what the fix needs +3. The fix goes through normal review (no rushing, no skipping gates) + +### Fix-forward protocol + +1. Only if the fix is obvious, small, and low-risk +2. Must still go through PR + review +3. If the fix introduces new complexity — revert instead + +### Stacked PR chain recovery + +1. Identify which PR introduced the breakage +2. Pause/close all PRs above it +3. Fix the base PR +4. Rebase and re-evaluate dependent PRs +5. 
Re-run CI on each before re-opening + +### Agents must NEVER do during recovery + +- Force-push to shared branches +- Delete branches with others' work +- Amend published commits +- Skip review "because it's urgent" +- Self-approve a revert + +## Consequences + +- (+) Clear decision tree prevents analysis paralysis during incidents +- (+) Revert-first default limits blast radius +- (+) Stacked chain recovery is defined (not improvised) +- (+) History is preserved (revert commits, not force-push) +- (-) Reverts create noise in git history +- (-) Fix-forward temptation may lead to rushed fixes +- (!) "Production affected" requires definition per deployment (self-hosted varies) + +## References + +- Issue #141 — full RFC with open questions +- ADR-003 — governance (no bypasses during recovery) +- ADR-001 — stacked PRs (chain recovery protocol) +- ADR-009 — security (revert authority tied to role) diff --git a/docs/src/content/docs/decisions/011-conflict-resolution.md b/docs/src/content/docs/decisions/011-conflict-resolution.md new file mode 100644 index 0000000..2fcf305 --- /dev/null +++ b/docs/src/content/docs/decisions/011-conflict-resolution.md @@ -0,0 +1,68 @@ +--- +title: 011 conflict resolution +--- + +# ADR-011: Conflict resolution protocol + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +Multiple concurrent contributors — human or AI — will propose incompatible approaches, create merge conflicts, and disagree on design. Without a defined escalation path, work stalls or the loudest voice wins. 
+ +## Decision + +### Escalation ladder + +``` +Level 1: Contributor discussion (PR comments, issue thread) + ↓ (no resolution within 2 interactions) +Level 2: Request additional reviewer (fresh perspective) + ↓ (still no resolution) +Level 3: Competing proposals in the issue body (explicit trade-off comparison) + ↓ (still no resolution) +Level 4: Admin decision (binding, documented in issue body) +``` + +### Decision criteria + +When comparing approaches, evaluate on: +1. **Correctness** — does it solve the stated problem? +2. **Simplicity** — fewer moving parts wins when correctness is equal +3. **Consistency** — follows existing codebase patterns? +4. **Reversibility** — can we change our mind later? +5. **Blast radius** — what breaks if this is wrong? + +### Merge conflict ownership + +| Situation | Who resolves | +|-----------|-------------| +| Two PRs modify same file, one merged first | Second PR's author rebases | +| Stacked PR conflict from lower change | Lower PR author notifies; upper PRs rebase after stable | +| Concurrent agents modified same module | First to merge wins; second adapts | +| Architectural conflict (both valid) | Escalate to Level 3 | + +### Human vs. agent disagreement + +- Agents present evidence (code, tests, measurements) not authority +- Humans can override but must document why +- Agents do not repeatedly argue a rejected point +- If an agent believes a human decision causes harm (security, data loss), it escalates to admin + +## Consequences + +- (+) Disagreements have a defined path to resolution +- (+) Merge conflicts have clear ownership +- (+) Competing approaches are compared on criteria, not authority +- (+) Admin decision is the final backstop (no infinite loops) +- (-) Escalation takes time; may slow delivery +- (-) Level 3 (written trade-off) requires effort +- (!) 
Must not become a veto mechanism for slow contributors + +## References + +- Issue #142 — full RFC with open questions +- ADR-003 — governance (issue body as resolution record) +- ADR-005 — feedback loop (reviewer disagreements feed into this) +- ADR-009 — security (authority levels for decisions) From d7b318dba95041e0caeb4c3ef88ba9e9e2516464 Mon Sep 17 00:00:00 2001 From: bgagent <345885+scottschreckengaust@users.noreply.github.com> Date: Tue, 19 May 2026 07:43:19 +0000 Subject: [PATCH 3/4] docs(ADR-003): GraphQL dependency graph as authoritative, assignments rework MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Predecessor validation now uses GitHub GraphQL blockedBy/subIssues as the machine-enforceable source of truth (hard gate) - Rename "Self-assignment" to "Assignments" — supports self-assignment, directed assignment, and priority-based pickup - Add dependency graph maintenance rules (addBlockedBy, addSubIssue) - Sync rule: graph is authoritative; prose explains rationale - Folds comment feedback from issue #134 discussion Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/decisions/003-contribution-governance.md | 12 +++++++++--- .../docs/decisions/003-contribution-governance.md | 12 +++++++++--- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/docs/decisions/003-contribution-governance.md b/docs/decisions/003-contribution-governance.md index 537b502..d3bd816 100644 --- a/docs/decisions/003-contribution-governance.md +++ b/docs/decisions/003-contribution-governance.md @@ -27,9 +27,9 @@ Issues align to the [product roadmap](https://github.com/aws-samples/sample-auto Only permitted users can mark an issue `approved` — a GitHub Actions workflow validates that the label applicant is authorized. An issue is not workable until it is both approved and assigned. After approval, the issue is considered scope-frozen: further revisions that change deliverables require re-approval. 
-### Self-assignment on start +### Assignments -Unassigned means available. On starting work, self-assign. Multiple assignees (>1) require intentionality verification. +Unassigned means available. Assignment may happen via self-assignment, directed assignment by another agent/human, or priority-based pickup (inspect open tasks for highest priority + earliest predecessor). Multiple assignees (>1) require intentionality verification. ### Issue body as primary directive @@ -47,10 +47,16 @@ Before implementation, the assigned contributor must: **Priority evaluation:** Identify priority (`p0`/`p1`/`p2`). If asked to work a lower-priority item while higher-priority items are unassigned, challenge: "Should I work on #X (p0) instead?" -**Predecessor validation:** If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework." +**Predecessor validation (GraphQL dependency graph is authoritative):** +- Query the issue's `blockedBy` field via GraphQL — if any blocking issue is open, this issue is **not ready** (hard gate) +- Check `parent`/`subIssues` ordering — verify prior siblings are complete or in-flight +- Reconcile graph vs. prose — graph is authoritative for enforcement; prose explains rationale +- If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework." **Cross-reference audit:** Search open issues for duplicates. Search open PRs (including drafts) for conflicts. Flag overlaps. Check the full dependency graph. Forward-look into downstream actions to ensure alignment. +**Dependency graph maintenance:** When creating/modifying issues with dependencies, use GraphQL mutations (`addBlockedBy`, `addSubIssue`) to maintain the machine-enforceable graph. Update prose to explain rationale. If they diverge, fix the wrong one (usually prose — graph is set programmatically). 
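The hard gate in the predecessor validation rule can be sketched as a pure check over already-fetched data. This is a minimal sketch under stated assumptions: `IssueNode`, `open_blockers`, and `is_ready` are hypothetical names, the in-memory shape stands in for whatever the GraphQL query returns, and the state strings `"OPEN"` and `"CLOSED"` are assumed values. The sketch does not call GitHub.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IssueNode:
    """Hypothetical in-memory view of an issue and its blockers."""
    number: int
    state: str  # assumed to be "OPEN" or "CLOSED"
    blocked_by: list[IssueNode] = field(default_factory=list)


def open_blockers(issue: IssueNode) -> list[int]:
    """Return the numbers of blocking issues that are still open."""
    return [b.number for b in issue.blocked_by if b.state == "OPEN"]


def is_ready(issue: IssueNode) -> bool:
    """Hard gate: an issue is ready only when no blocking issue is open."""
    return not open_blockers(issue)
```

Returning the list of open blockers, rather than only a boolean, lets the caller name the specific predecessors when it raises the challenge described above.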
+ **Final gate:** If all checks pass, comment "Starting implementation." ### Identity and attribution diff --git a/docs/src/content/docs/decisions/003-contribution-governance.md b/docs/src/content/docs/decisions/003-contribution-governance.md index 722c3db..67204cb 100644 --- a/docs/src/content/docs/decisions/003-contribution-governance.md +++ b/docs/src/content/docs/decisions/003-contribution-governance.md @@ -31,9 +31,9 @@ Issues align to the [product roadmap](https://github.com/aws-samples/sample-auto Only permitted users can mark an issue `approved` — a GitHub Actions workflow validates that the label applicant is authorized. An issue is not workable until it is both approved and assigned. After approval, the issue is considered scope-frozen: further revisions that change deliverables require re-approval. -### Self-assignment on start +### Assignments -Unassigned means available. On starting work, self-assign. Multiple assignees (>1) require intentionality verification. +Unassigned means available. Assignment may happen via self-assignment, directed assignment by another agent/human, or priority-based pickup (inspect open tasks for highest priority + earliest predecessor). Multiple assignees (>1) require intentionality verification. ### Issue body as primary directive @@ -51,10 +51,16 @@ Before implementation, the assigned contributor must: **Priority evaluation:** Identify priority (`p0`/`p1`/`p2`). If asked to work a lower-priority item while higher-priority items are unassigned, challenge: "Should I work on #X (p0) instead?" -**Predecessor validation:** If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework." 
+**Predecessor validation (GraphQL dependency graph is authoritative):** +- Query the issue's `blockedBy` field via GraphQL — if any blocking issue is open, this issue is **not ready** (hard gate) +- Check `parent`/`subIssues` ordering — verify prior siblings are complete or in-flight +- Reconcile graph vs. prose — graph is authoritative for enforcement; prose explains rationale +- If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework." **Cross-reference audit:** Search open issues for duplicates. Search open PRs (including drafts) for conflicts. Flag overlaps. Check the full dependency graph. Forward-look into downstream actions to ensure alignment. +**Dependency graph maintenance:** When creating/modifying issues with dependencies, use GraphQL mutations (`addBlockedBy`, `addSubIssue`) to maintain the machine-enforceable graph. Update prose to explain rationale. If they diverge, fix the wrong one (usually prose — graph is set programmatically). + **Final gate:** If all checks pass, comment "Starting implementation." ### Identity and attribution From c0fc0696403474bad42c3e60f1172c873f906028 Mon Sep 17 00:00:00 2001 From: bgagent <345885+scottschreckengaust@users.noreply.github.com> Date: Tue, 19 May 2026 10:18:01 +0000 Subject: [PATCH 4/4] docs(AGENTS.md): add governance directive — issue required before implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds two bullets to top of "Common mistakes": 1. Conversational approval ≠ governance approval — create issue first 2. Branch naming must include issue number Implements the "AGENTS.md directive" row from ADR-003's enforcement mechanisms table.
Fixes #150 Co-Authored-By: Claude Opus 4.6 (1M context) --- AGENTS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index 0346ac3..02296af 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -39,6 +39,8 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task. ### Common mistakes +- **Starting implementation without an approved GitHub issue** — Conversational approval ("yes, do it", "go ahead", "start with X") is NOT governance approval. The correct sequence is: create a GitHub issue with acceptance criteria → get the `approved` label from an admin → self-assign → comment "Starting implementation" → then begin work. Even if the user explicitly directs the work in conversation, create the durable artifact (issue) first. See [ADR-003](./docs/decisions/003-contribution-governance.md). +- **Creating branches without an issue reference** — Branch names must follow the pattern `(feat|fix|chore|docs)/-short-description`. A branch without an issue number is unauthorized work. Example: `feat/148-operational-knowledge-stack`. - Editing **`docs/src/content/docs/`** instead of **`docs/guides/`** or **`docs/design/`** — content is generated; sync from sources. - Adding or editing files in **`docs/design/`** or **`docs/guides/`** without running **`cd docs && node scripts/sync-starlight.mjs`** — CI will reject ("Fail build on mutation") because the Starlight mirror files in `docs/src/content/docs/` are stale. Always commit the regenerated mirrors alongside source changes. - Changing **`cdk/.../types.ts`** without updating **`cli/src/types.ts`** — CLI and API drift.
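The branch-naming rule above could be checked mechanically. The following is a rough sketch, not the repository's actual enforcement: `BRANCH_PATTERN` and `issue_number_from_branch` are hypothetical names, and the regular expression is inferred from the example `feat/148-operational-knowledge-stack`, assuming a numeric issue number followed by a lowercase hyphenated description.

```python
import re
from typing import Optional

# Assumed shape, inferred from the example branch name in AGENTS.md:
# <type>/<issue number>-<lowercase-hyphenated-description>
BRANCH_PATTERN = re.compile(
    r"^(feat|fix|chore|docs)/(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def issue_number_from_branch(branch: str) -> Optional[int]:
    """Return the issue number encoded in a branch name, or None if absent."""
    match = BRANCH_PATTERN.match(branch)
    return int(match.group(2)) if match else None
```

A `None` result corresponds to what the bullet calls unauthorized work: the branch either uses an unlisted prefix or carries no issue number.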