3 changes: 3 additions & 0 deletions AGENTS.md
@@ -22,6 +22,7 @@ Use this routing before editing so the right package and tests get updated:
| Agent runtime (clone, tools, prompts, container) | `agent/src/` (`pipeline.py`, `runner.py`, `config.py`, `hooks.py`, `policy.py`, `prompts/`, Dockerfile, etc.) | `agent/tests/`, `agent/README.md` for env/PAT |
| Agent progress events (written to `TaskEventsTable` from the MicroVM; read by `bgagent watch`) | `agent/src/progress_writer.py`, `agent/src/pipeline.py` and `agent/src/runner.py` (integration points) | `agent/tests/test_progress_writer.py`; `cli/src/commands/watch.ts` for the consumer side |
| User-facing or design prose | `docs/guides/`, `docs/design/` | Run **`mise //docs:sync`** or **`mise //docs:build`** (do not edit `docs/src/content/docs/` by hand) |
| Architecture decisions (ADRs) | `docs/decisions/` | Run **`mise //docs:sync`** after adding or editing an ADR |
| Monorepo tasks, CI glue | Root `mise.toml`, `scripts/`, `.github/workflows/` | — |

### CDK handler tests (quick map)
@@ -38,6 +39,8 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task.

### Common mistakes

- **Starting implementation without an approved GitHub issue** — Conversational approval ("yes, do it", "go ahead", "start with X") is NOT governance approval. The correct sequence is: create a GitHub issue with acceptance criteria → get the `approved` label from an admin → self-assign → comment "Starting implementation" → then begin work. Even if the user explicitly directs the work in conversation, create the durable artifact (issue) first. See [ADR-003](./docs/decisions/003-contribution-governance.md).
- **Creating branches without an issue reference** — Branch names must follow the pattern `(feat|fix|chore|docs)/<issue-number>-short-description`. A branch without an issue number is unauthorized work. Example: `feat/148-operational-knowledge-stack`.
- Editing **`docs/src/content/docs/`** instead of **`docs/guides/`** or **`docs/design/`** — content is generated; sync from sources.
- Adding or editing files in **`docs/design/`** or **`docs/guides/`** without running **`cd docs && node scripts/sync-starlight.mjs`** — CI will reject ("Fail build on mutation") because the Starlight mirror files in `docs/src/content/docs/` are stale. Always commit the regenerated mirrors alongside source changes.
- Changing **`cdk/.../types.ts`** without updating **`cli/src/types.ts`** — CLI and API drift.
5 changes: 5 additions & 0 deletions docs/astro.config.mjs

Some generated files are not rendered by default.

105 changes: 105 additions & 0 deletions docs/decisions/001-stacked-pull-requests.md
@@ -0,0 +1,105 @@
# ADR-001: Stacked pull requests for multi-PR features

**Status:** accepted
**Date:** 2026-05-19

## Context

Complex features in ABCA often span multiple packages, resource types, and concerns. Delivering these as a single large PR creates several problems:

- **Review fatigue:** PRs exceeding ~500 lines suffer from diminished reviewer attention — critical issues get missed in the noise of mechanical changes.
- **Context loss:** Without a framework, sequential PRs leave reviewers without knowledge of where they are in the overall delivery, what came before, or what remains.
- **Agent discoverability:** AI coding agents picking up a sub-task cannot determine the broader goal, prior decisions, or remaining work without reconstructing context from scattered commits and issues.
- **Blocked progress:** A single large PR blocks all progress until the entire feature is reviewed. Stalling on one concern (e.g., IAM review) blocks unrelated work (e.g., documentation).

The [Pragmatic Engineer analysis of stacked diffs](https://newsletter.pragmaticengineer.com/p/stacked-diffs) documents how organizations (Meta, Google, Graphite users) use this pattern to maintain velocity on complex changes while keeping review quality high.

## Decision

Use **stacked pull requests** for features requiring 4+ files changed or spanning multiple concerns. Each PR in the stack follows these rules:

### 1. Position statement

Every PR description states its position:

```markdown
## Stack position

PR {N} of {M} for #{parent-issue} — {overall goal one-liner}

### Prior (PR {N-1}): {what was delivered}
### This PR: {what this adds}
### Remaining ({M-N} PRs): {what comes next}
```

This gives reviewers and agents immediate orientation without reading the parent issue.
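
As an illustration, the position statement can be generated rather than typed by hand. The sketch below is hypothetical: `render_stack_position` and its parameters are not part of the repository, and dropping the "Prior" line for PR 1 and the "Remaining" line for the last PR is an assumption, not something the template prescribes.

```python
def render_stack_position(n, m, parent_issue, goal, prior, this_pr, remaining):
    """Render the 'Stack position' section for PR n of m (hypothetical helper)."""
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= m")
    lines = [
        "## Stack position",
        "",
        # \u2014 is the dash character used by the template above.
        f"PR {n} of {m} for #{parent_issue} \u2014 {goal}",
        "",
    ]
    if n > 1:  # assumption: PR 1 has no predecessor to describe
        lines.append(f"### Prior (PR {n - 1}): {prior}")
    lines.append(f"### This PR: {this_pr}")
    if n < m:  # assumption: the final PR has nothing remaining
        lines.append(f"### Remaining ({m - n} PRs): {remaining}")
    return "\n".join(lines)


print(render_stack_position(
    2, 3, 120, "least-privilege bootstrap",
    prior="policy types", this_pr="size tests", remaining="preflight",
))
```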

### 2. Branch targeting

- PR 1 targets `main`
- PR N targets PR N-1's branch
- Final PR merges the full stack to `main`

```
main
 └── feat/first-concern (PR 1)
      └── feat/second-concern (PR 2)
           └── feat/third-concern (PR 3 → merge to main)
```
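
The targeting rule reduces to a small lookup. The following is a minimal sketch, assuming the stack is held as an ordered list of branch names; `base_branch` is a hypothetical helper, not an existing script in the repository.

```python
def base_branch(position, branches, trunk="main"):
    """Return the branch that PR `position` (1-based) should target.

    `branches` is the ordered stack, lowest PR first (illustrative data shape).
    """
    if not 1 <= position <= len(branches):
        raise ValueError("position out of range")
    # PR 1 targets the trunk; PR N targets the branch of PR N-1.
    return trunk if position == 1 else branches[position - 2]


stack = ["feat/first-concern", "feat/second-concern", "feat/third-concern"]
for i, branch in enumerate(stack, start=1):
    print(f"PR {i}: {branch} -> {base_branch(i, stack)}")
```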

### 3. Self-contained reviewability

Each PR:
- Compiles and passes tests independently
- Can be deployed without breaking the system
- Has a single clear responsibility (one concern per PR)
- Does not leave dead code, TODOs, or broken intermediate states

### 4. Size guidelines

| Metric | Target | Maximum |
|--------|--------|---------|
| Lines changed | 200–400 | 600 |
| Review time | 20–30 min | 45 min |
| Files touched | 3–8 | 12 |

If a PR exceeds these, decompose further.
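
A check against the table above could look like the sketch below. The metric names and the three-way result (`ok`, `above target`, `decompose`) are illustrative assumptions; the thresholds are copied from the table.

```python
# Upper bound of each target range and the hard maximum, from the table above.
TARGET_UPPER = {"lines_changed": 400, "review_minutes": 30, "files_touched": 8}
MAXIMUMS = {"lines_changed": 600, "review_minutes": 45, "files_touched": 12}


def size_report(**metrics):
    """Classify each supplied metric (hypothetical helper)."""
    report = {}
    for name, value in metrics.items():
        if value > MAXIMUMS[name]:
            report[name] = "decompose"      # exceeds the maximum
        elif value > TARGET_UPPER[name]:
            report[name] = "above target"   # allowed, but worth a second look
        else:
            report[name] = "ok"
    return report


print(size_report(lines_changed=650, files_touched=5, review_minutes=35))
```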

### 5. Rebase discipline

When a lower PR changes after review feedback:
- All PRs above it in the stack must be rebased
- CI must pass on each PR independently after rebase
- Reviewers are notified of the rebase (GitHub does this automatically)
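
To make the cascade concrete: given the ordered stack and the branch that changed, the branches needing a rebase are everything above it. This is a sketch with a hypothetical helper name; it only computes the list and does not run any git commands.

```python
def prs_to_rebase(stack, changed):
    """Branches above `changed`, in the order they should be rebased (lowest first)."""
    idx = stack.index(changed)  # raises ValueError if the branch is not in the stack
    return stack[idx + 1:]


stack = ["feat/first-concern", "feat/second-concern", "feat/third-concern"]
print(prs_to_rebase(stack, "feat/first-concern"))
```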

### 6. Sub-issue linking

- Parent issue lists all sub-issues with a stack visualization diagram
- Each sub-issue references the parent and its position in the stack
- GitHub's task list in the parent tracks completion
- Estimated review time is listed per sub-issue to help reviewers plan

### 7. When NOT to use stacked PRs

- Changes under ~200 lines that fit naturally in one PR
- Hotfixes that need immediate merge
- Dependency bumps (use Dependabot grouping instead)
- Documentation-only changes that are self-contained

## Consequences

- (+) Each PR stays in the "reviewable without fatigue" window (~15–40 min)
- (+) Agents can pick up any sub-issue independently — the position statement provides full context
- (+) Partial delivery is meaningful — each merged PR adds value independently
- (+) Reviewers approve incrementally without needing full-stack mental context
- (+) Early PRs can merge and ship while later ones are still in review
- (-) Rebase cascades when early PRs receive feedback
- (-) More overhead in PR descriptions and branch management
- (-) Requires discipline to keep each PR independently valid (no "this will be fixed in PR N+1")
- (!) If the stack grows beyond ~8 PRs, consider decomposing into independent sub-stacks

## References

- [Stacked Diffs — Pragmatic Engineer](https://newsletter.pragmaticengineer.com/p/stacked-diffs)
- RFC #120 — first formal use of this pattern in ABCA
- Issue #129 — implementation of this ADR
72 changes: 72 additions & 0 deletions docs/decisions/002-least-privilege-bootstrap-policies.md
@@ -0,0 +1,72 @@
# ADR-002: Least-privilege CDK bootstrap policies as code

**Status:** accepted
**Date:** 2026-05-19
**References:** ADR-001 (delivery methodology)

## Context

CDK bootstrap creates five roles per account/region. The **CloudFormation execution role** (`cdk-hnb659fds-cfn-exec-role`) receives `AdministratorAccess` by default, and CloudFormation assumes it to create, modify, and delete stack resources. This default violates least privilege and may conflict with organizational SCPs or compliance gates.

The ABCA project documented three scoped policies in `docs/design/DEPLOYMENT_ROLES.md` (PR #46), validated against a live deployment through 7 iterations and 36 CloudTrail-discovered actions. However, these policies exist only as JSON blobs in a Markdown file — unversioned, untested, and manually applied.

**Failure mode without automation:** When a new release adds a resource type (e.g., an SQS queue), operators who pull and deploy hit a CloudFormation failure partway through the deployment, followed by a rollback, because their bootstrap policy predates the new permissions. The deploy fails 15 minutes in with no prior warning.

**Constraints:**
- IAM managed policies have a 6,144-character limit — hence the three-policy split (Infrastructure, Application, Observability).
- Bootstrap must exist before the CDK app can deploy — circular dependency prevents managing bootstrap from within the app stack.
- The four other bootstrap roles (deploy, lookup, file-publishing, image-publishing) are already scoped by the default template and don't need modification.
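
The size constraint is easy to check mechanically. The sketch below is written in Python for brevity (the ADR places the real policies and tests in TypeScript and Jest), and the helper names are hypothetical. It assumes whitespace is excluded from the count, which matches how the IAM quotas page describes policy size.

```python
import json

MANAGED_POLICY_CHAR_LIMIT = 6144  # limit quoted in the constraint above


def policy_size(policy):
    """Count non-whitespace characters in the serialized policy document."""
    compact = json.dumps(policy, separators=(",", ":"))
    return sum(1 for ch in compact if not ch.isspace())


def fits_managed_policy(policy):
    return policy_size(policy) <= MANAGED_POLICY_CHAR_LIMIT


example = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["sqs:CreateQueue"], "Resource": "*"}
    ],
}
print(policy_size(example), fits_managed_policy(example))
```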

## Decision

### Policies as typed TypeScript code in `cdk/src/bootstrap/`

Rationale for location:
- **Agent routing** — `AGENTS.md` routes CDK/IAM changes to `cdk/`. An agent modifying a construct that adds a DynamoDB table naturally looks here for the policy it must update.
- **Testability** — Jest tests can assert policy size limits, validate structure, and verify coverage against the synthesized template.
- **Co-location** — the CDK app defines what resources exist (and therefore what permissions are needed); both live in the same package.
- **Self-contained** — `cdk/` has its own `mise.toml`, build, and test pipeline.

### Triple-layer versioning

| Layer | Purpose |
|-------|---------|
| **Semver** | Quick operator answer: "do I need to re-bootstrap?" Major = breaking. |
| **SHA256 hash** | Detects console drift — manual IAM edits that diverge from code. |
| **Action-set comparison** | Precise gap reporting: exactly which actions are missing. |

Semver and hash are emitted as CloudFormation outputs on the CDKToolkit stack, enabling automated preflight checks.
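
The hash and action-set layers can be sketched as follows. This is an illustrative Python sketch, not the project's TypeScript implementation: the helper names are hypothetical, the canonical form (sorted keys, compact separators) is an assumption, and wildcard actions such as `sqs:*` are deliberately not expanded.

```python
import hashlib
import json


def policy_hash(policy):
    """SHA256 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def declared_actions(policy):
    """Collect actions from Allow statements (wildcards are not expanded)."""
    actions = set()
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        value = statement.get("Action", [])
        actions.update([value] if isinstance(value, str) else value)
    return actions


def missing_actions(required, policy):
    """Precise gap report: required actions the policy does not declare."""
    return sorted(set(required) - declared_actions(policy))


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["dynamodb:CreateTable"], "Resource": "*"}
    ],
}
print(policy_hash(policy))
print(missing_actions(["sqs:CreateQueue", "dynamodb:CreateTable"], policy))
```

A hash mismatch only says that something differs; the action-set comparison is what turns that into an actionable list.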

### Two-layer preflight validation

1. **CDK Aspect (synth-time)** — runs during `mise //cdk:synth`, visits every `CfnResource`, looks up required actions in a resource-action-map, compares against declared policy. Catches issues at dev time.
2. **Live-account validator (deploy-time)** — `mise //cdk:preflight` reads CDKToolkit stack outputs, compares version/hash against requirements. Fails fast with an actionable "re-bootstrap required" message before CloudFormation starts.
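
The deploy-time comparison might look like the sketch below. The output keys `BootstrapPolicyVersion` and `BootstrapPolicyHash` are hypothetical names chosen for illustration, as is the rule that a differing major version or an older deployed version triggers a re-bootstrap. Fetching the stack outputs is left out; the function takes them as a plain dict.

```python
def parse_semver(version):
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def preflight(outputs, required_version, expected_hash):
    """Return a list of problems; an empty list means the deploy may proceed."""
    deployed_version = outputs.get("BootstrapPolicyVersion")  # hypothetical key
    if deployed_version is None:
        return ["re-bootstrap required: no policy version output found"]
    problems = []
    deployed = parse_semver(deployed_version)
    required = parse_semver(required_version)
    if deployed[0] != required[0] or deployed < required:
        problems.append(
            f"re-bootstrap required: deployed {deployed_version}, need {required_version}"
        )
    if outputs.get("BootstrapPolicyHash") != expected_hash:  # hypothetical key
        problems.append("policy drift: deployed hash differs from code")
    return problems


print(preflight(
    {"BootstrapPolicyVersion": "1.1.0", "BootstrapPolicyHash": "abc"},
    required_version="2.0.0",
    expected_hash="abc",
))
```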

### Custom bootstrap template

Generated from the policy source code (not hand-maintained). Operators run `mise //cdk:bootstrap` to provision least-privilege roles in a single command. The template replaces `AdministratorAccess` with the three managed policies while retaining all other default bootstrap resources.

### Delivery via stacked PRs (ADR-001)

The implementation is decomposed into 8 sub-issues, each independently reviewable and deployable. See RFC #120 for the full stack.

## Consequences

- (+) Policies are diffable in PRs — IAM changes are code-reviewed like any other code
- (+) Tests enforce the 6,144-char limit and structural validity on every commit
- (+) Preflight prevents the "deploy, wait 15 minutes, fail, rollback" loop
- (+) Single `mise //cdk:bootstrap` command replaces the multi-step manual process
- (+) Agents can automatically update policies when they add new resource types
- (-) Resource-action-map requires maintenance when new AWS resource types are added
- (-) Rebase complexity from the 8-PR stack
- (!) Bootstrap template drift — CDK upstream may change defaults; requires rebase on CDK major upgrades
- (!) Operators with existing deployments must re-bootstrap (documented upgrade path provided)

## References

- Issue #121 — implementation tracking for this ADR
- RFC #120 — parent issue with full design and sub-issue breakdown
- `docs/design/DEPLOYMENT_ROLES.md` — current documentation (will become generated)
- PR #46 — original policy derivation and validation methodology
- [CDK default bootstrap template](https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk/lib/api/bootstrap/bootstrap-template.yaml)
- [IAM managed policy size limit](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html)
124 changes: 124 additions & 0 deletions docs/decisions/003-contribution-governance.md
@@ -0,0 +1,124 @@
# ADR-003: Contribution governance for async agents and humans

**Status:** accepted
**Date:** 2026-05-19

## Context

ABCA is designed for multiple autonomous agents to work concurrently on the codebase. Without explicit governance rules, agents duplicate effort, start unapproved work, ignore priority order, miss predecessors, and create merge conflicts that require human intervention to untangle.

The rules below define how any contributor — human or AI — picks up, owns, and delivers work. They prevent priority inversion, wasted rework, unauthorized scope creep, and silent conflicts at scale.

## Decision

### No branches without an Issue

Every feature branch references an issue in its name (e.g., `feat/123-short-description` or `fix/456-bug-name`). A branch without an issue reference is unauthorized work. This prevents the failure mode where work is started "just to explore" and then snowballs into a PR without governance.
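
A branch-name check along these lines can extract the issue number for later validation. The prefix set follows the `(feat|fix|chore|docs)/<issue-number>-short-description` pattern used elsewhere in this change; the allowed characters in the description part (lowercase letters, digits, hyphens) are an assumption.

```python
import re

# Prefixes come from the documented pattern; the slug character set is assumed.
BRANCH_RE = re.compile(r"^(feat|fix|chore|docs)/(\d+)-[a-z0-9]+(?:-[a-z0-9]+)*$")


def issue_number(branch):
    """Return the referenced issue number, or None if the name is not compliant."""
    match = BRANCH_RE.match(branch)
    return int(match.group(2)) if match else None


for name in ["feat/148-operational-knowledge-stack", "fix/456-bug-name", "feat/explore-idea"]:
    print(name, "->", issue_number(name))
```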

### No PRs without an Issue

Every PR references an issue. The issue provides rationale, sufficient context for the solution to be obvious, and verifiable acceptance criteria.

### Issue quality bar

An issue is "ready for work" when a contributor can read the body alone — without comments, related issues, or clarifying questions — and know exactly what to build.

### Roadmap alignment

Issues align to the [product roadmap](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/blob/main/docs/guides/ROADMAP.md). Issues that do not align require explicit admin approval.

### Admin approval gate

Only `admin` users can mark an issue `approved` (via GitHub label). Unapproved and unassigned issues are not workable. A GitHub Actions workflow prevents non-admins from adding the `approved` label.

### Assignments

Unassigned means available. Assignment may happen via self-assignment, directed assignment by another agent/human, or priority-based pickup (inspect open tasks for highest priority + earliest predecessor). An issue with more than one assignee requires confirmation that the shared assignment is intentional.

### Issue body is source of truth

Discussion threads are folded back into the body. Unresolved conflicts are marked explicitly:
- `**UNRESOLVED:** <question>` — blocks implementation
- `**DEFERRED:** <question> — tracked in #N` — does not block
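
Because the markers are plain text, a pre-start check can scan the issue body for them. The sketch below assumes each marker and its question sit on a single line; `blocking_questions` is a hypothetical helper.

```python
import re

UNRESOLVED_RE = re.compile(r"\*\*UNRESOLVED:\*\*\s*(.+)")
DEFERRED_RE = re.compile(r"\*\*DEFERRED:\*\*\s*(.+)")


def blocking_questions(body):
    """Questions marked UNRESOLVED; any result blocks implementation."""
    return [m.group(1).strip() for m in UNRESOLVED_RE.finditer(body)]


def deferred_questions(body):
    """Questions marked DEFERRED; informational only."""
    return [m.group(1).strip() for m in DEFERRED_RE.finditer(body)]


body = """Acceptance criteria...
- **UNRESOLVED:** Which region hosts the table?
- **DEFERRED:** Do we need metrics? tracked in #140
"""
print(blocking_questions(body))
```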

### Pre-start review

Before implementation, the assigned contributor must:

**Read and verify:** All comments read, no unresolved conflicts.

**Priority evaluation (soft gate):** Identify priority (`p0`/`p1`/`p2`). If asked to work a lower-priority item while higher-priority items are unassigned, challenge: "Should I work on #X (p0) instead?" Proceed if confirmed.
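
The soft gate amounts to asking whether any unassigned open issue outranks the requested one. A minimal sketch, assuming issues are represented as dicts with `number`, `priority`, and `assignees` keys (an illustrative shape, not a GitHub API response):

```python
PRIORITY_RANK = {"p0": 0, "p1": 1, "p2": 2}  # lower rank means more urgent


def higher_priority_unassigned(requested, open_issues):
    """Issue numbers that should trigger the 'Should I work on #X instead?' challenge."""
    rank = PRIORITY_RANK[requested["priority"]]
    return [
        issue["number"]
        for issue in open_issues
        if not issue["assignees"] and PRIORITY_RANK[issue["priority"]] < rank
    ]


open_issues = [
    {"number": 10, "priority": "p0", "assignees": []},
    {"number": 11, "priority": "p0", "assignees": ["alice"]},
    {"number": 12, "priority": "p2", "assignees": []},
]
requested = {"number": 20, "priority": "p1", "assignees": []}
print(higher_priority_unassigned(requested, open_issues))
```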

**Predecessor validation (GraphQL dependency graph is authoritative):**
- Query the issue's `blockedBy` field via GraphQL — if any blocking issue is open, this issue is **not ready** (hard gate)
- Check `parent`/`subIssues` ordering — verify prior siblings are complete or in-flight
- Reconcile graph vs. prose — graph is authoritative for enforcement; prose explains rationale
- If predecessors are incomplete, unassigned, and not in a stacked PR — challenge: "Steps 1-3 are incomplete. Starting step 4 may cause rework."
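
Once the `blockedBy` entries have been fetched, the hard gate itself is a simple filter. The sketch below does not perform the GraphQL query; it assumes the result has already been flattened into dicts with `number` and `state` keys, where `state` is `OPEN` or `CLOSED`.

```python
def open_blockers(blocked_by):
    """Numbers of blocking issues that are still open."""
    return [issue["number"] for issue in blocked_by if issue["state"] == "OPEN"]


def is_ready(blocked_by):
    """Hard gate: ready only when no blocking issue is open."""
    return not open_blockers(blocked_by)


blocked_by = [
    {"number": 121, "state": "CLOSED"},
    {"number": 129, "state": "OPEN"},
]
print(is_ready(blocked_by), open_blockers(blocked_by))
```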

**Cross-reference audit:** Search open issues for duplicates. Search open PRs (including drafts) for conflicts. Flag overlaps. Check the full dependency graph.

**Dependency graph maintenance:** When creating/modifying issues with dependencies, use GraphQL mutations (`addBlockedBy`, `addSubIssue`) to maintain the machine-enforceable graph. Update prose to explain rationale. If they diverge, fix the wrong one (usually prose — graph is set programmatically).

**Final gate:** If all checks pass, comment "Starting implementation."

### Identity and attribution

Agents use identifiable credentials. The prompting user and acting agent must be distinguishable. PRs include `Co-Authored-By` for AI contributors.

### Work-in-progress discipline

Provide progress signals at checkpoints. If blocked or abandoning, comment and unassign. Stale assignments (no draft PR and no activity for a configurable number of hours) are auto-unassigned. Do not start multiple issues simultaneously unless explicitly parallelizable.
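
The staleness rule combines two conditions, which the sketch below makes explicit. The function name and parameters are hypothetical, and the threshold is passed in because the source leaves it configurable.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_activity, has_draft_pr, now, max_idle_hours):
    """Stale means: no draft PR AND no activity within the configured window."""
    if has_draft_pr:
        return False
    return now - last_activity > timedelta(hours=max_idle_hours)


now = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(hours=30), has_draft_pr=False, now=now, max_idle_hours=24))
```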

### Completion and handoff

CI passes before requesting review. After merge, verify acceptance criteria and close. Create follow-up issues for discovered work before closing.

### Conversational approval is NOT issue approval

A user saying "yes, do it" or "go ahead" in a conversation does NOT satisfy the governance gate. The correct response to conversational approval is:

1. Create an issue with acceptance criteria
2. Request the `approved` label from an admin
3. Self-assign once approved
4. Then begin implementation

**Known failure mode:** Agents interpret conversational momentum ("Yes start with X") as authorization to skip issue creation. This is the most common governance bypass — it feels like permission because the user explicitly directed the work, but the governance requires a *durable, reviewable artifact* (the issue), not a transient conversation.

**Why this matters:** Conversations are ephemeral. Issues are auditable. If an agent creates work based on a conversation and that conversation is lost (context compaction, session end), no record exists of what was authorized, what the acceptance criteria were, or why the work was started.

### Enforcement mechanisms

Prose governance is necessary but insufficient. The following enforcement points prevent bypass:

| Mechanism | Layer | What it catches |
|-----------|-------|-----------------|
| Branch name convention | Git workflow | Branch must match `(feat\|fix\|chore\|docs)/<issue-number>-*`; rejects branches without issue reference |
| Commit-msg hook (Tier 0) | Pre-commit | Rejects commits without `Refs #N` or `Fixes #N` |
| Pre-push hook (Tier 1) | Pre-push | Validates referenced issue exists and has `approved` label via `gh` API |
| Claude Code hook (`PreToolUse: Write`) | Agent runtime | Blocks file creation in governed paths without declared issue context |
| Skill gate: `pickup-issue` | Agent workflow | Agent must invoke before implementation — hard-fails without valid issue |
| AGENTS.md directive | Agent prompt | Explicit instruction: "Do NOT begin implementation without an approved issue, even if the user says 'go ahead' in conversation" |

**Progressive enforcement:** Start with the commit-msg hook (cheapest, catches all contributors). Add pre-push validation next. Skill gates enforce at the agent-workflow level (ADR-012, Layer 3).
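
The Tier 0 check is small enough to sketch in a few lines. This is an assumption about how such a hook could work, not the project's actual hook: it accepts `Refs #N` or `Fixes #N` anywhere in the message and returns the referenced issue numbers, so an empty result means the commit should be rejected.

```python
import re

ISSUE_REF_RE = re.compile(r"\b(?:Refs|Fixes) #(\d+)\b")


def referenced_issues(message):
    """Issue numbers referenced via 'Refs #N' or 'Fixes #N'."""
    return [int(number) for number in ISSUE_REF_RE.findall(message)]


def commit_allowed(message):
    return bool(referenced_issues(message))


print(referenced_issues("feat: add queue\n\nRefs #148"))
print(commit_allowed("wip"))
```

A real `commit-msg` hook would read the message from the file path that git passes as its first argument and exit non-zero when `commit_allowed` is false.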

## Consequences

- (+) Prevents duplicate effort — assignment signals ownership
- (+) Prevents priority inversion — agents challenge low-priority requests
- (+) Prevents rework — predecessor validation catches out-of-order work
- (+) Issue body stays current — threads are folded back
- (+) Cross-reference audit catches duplicates early
- (+) Enforcement mechanisms catch bypass at multiple points
- (-) Pre-start overhead for small tasks
- (-) Requires discipline to fold threads into body
- (-) Commit-msg hook adds friction for rapid iteration on approved work
- (!) Assumes priority labels exist and are maintained
- (!) Conversational approval bypass is the most common failure — enforcement must be structural, not behavioral

## References

- Issue #134 — full RFC with resolved questions and automation requirements
- Roadmap: Scale and collaboration (Agent swarm, Multi-user and teams)
- ADR-001 — delivery methodology referenced by completion rules
- ADR-012 — operational knowledge stack (enforcement via skill gates)
- ADR-013 — tiered validation (enforcement hooks at Tier 0 and Tier 1)