Local command-line orchestration for a coding PR review loop.
Run a local Claude/Codex/Gemini PR review loop using your existing CLI subscriptions.
The main advantage is account reuse: the tool shells out to your
already-authenticated local CLIs (claude, codex, gemini, and gh) instead
of calling model APIs directly. If your local agent CLIs are backed by existing
AI subscriptions or authenticated developer accounts, the review loop can use
those existing entitlements rather than requiring separate model API keys.
Claude billing note (effective June 15, 2026): Non-interactive claude
usage — including claude -p as used by this tool — draws from a separate
monthly Agent SDK credit rather than your interactive subscription pool. Credits
vary by plan ($20/month on Pro, $100 on Max 5×, $200 on Max 20×). See
Anthropic's support article
for details. Gemini CLI and Codex CLI have their own separate billing models.
This is for developers who already use GitHub together with Claude Code, OpenAI Codex CLI, or Gemini CLI, and who want one local agent to implement or fix a PR while another local agent reviews it before merge.
It is especially useful when you are already doing this manually by switching between agent CLIs and copying review feedback back and forth.
GitHub Actions-based agent loops usually need model API keys, hosted workflow permissions, and separate API billing. This tool keeps the loop on your local machine and uses the CLI accounts you have already authenticated.
That makes it easier to experiment with agent-to-agent review loops before committing to hosted automation. It also keeps local workspace setup, credentials, and agent approval prompts under your direct control.
Note that as of June 15, 2026, non-interactive Claude CLI usage draws from a separate monthly Agent SDK credit (see the billing note above) rather than being unlimited within a subscription, so very high-volume automated use may still incur costs depending on your plan.
Several related projects exist. coding-review-agent-loop is deliberately
positioned as a standalone local CLI for GitHub PR lifecycle orchestration:
one agent creates or fixes a PR, one or more reviewers review it, and the loop
continues until approval.
| Tool | Focus | How this project differs |
|---|---|---|
| claude-review-loop | Claude Code plugin that has Claude implement, then Codex review. | This project is not a Claude plugin; it is a standalone CLI that can start from an issue, task, or existing PR and can reverse the coder/reviewer direction. |
| codex-review | Claude Code plugin for Codex review of plans and implementations. | This project focuses on GitHub PR creation, review, fix, and approval loops rather than plan/artifact review inside Claude Code. |
| reviewd | Local PR review assistant for GitHub/BitBucket using Claude, Gemini, or Codex CLI. | This project focuses on agent-to-agent implementation loops where the coder can create/fix the PR and reviewers gate approval. |
| codex-plugin-cc | Use Codex from inside Claude Code for review or delegated tasks. | This project stays outside either agent host and orchestrates local CLIs plus GitHub directly. |
Currently supported local agent CLIs:
- Claude Code via claude
- OpenAI Codex CLI via codex
- Gemini CLI via gemini
Clone the repo first:

    gh repo clone wwind123/coding-review-agent-loop
    cd coding-review-agent-loop

Then install the CLI into a local virtual environment:

    python3 -m venv .venv
    . .venv/bin/activate
    python -m pip install -e .
    agent-loop --help

This installs the agent-loop command from your checkout. The tool still
requires local gh, claude, codex, and/or gemini authentication depending
on which agents you use.
Use this if you are changing coding-review-agent-loop itself:

    gh repo clone wwind123/coding-review-agent-loop
    cd coding-review-agent-loop
    python3 -m venv .venv
    . .venv/bin/activate
    python -m pip install -e '.[dev]'
    python -m pytest

Start from a GitHub issue when you want the agent loop to use the issue title, body, and comments as the implementation task. Comments are included oldest to newest, and prompts tell agents that later comments may refine or supersede the original issue body:

    agent-loop issue 123 --repo OWNER/REPO

For larger or ambiguous issues, add --plan-first to run a plan review on the
issue before code is written. The coder may inspect the checkout but must not
edit files, push, or open a PR during planning. Reviewers approve or block with
AGENT_PLAN_STATE markers using explicit plan-review sections:

    ### Blocking plan issues
    ### Same-plan follow-ups
    ### Future follow-ups

When earlier plan issues remain open, reviewers must also include
### Prior unresolved plan item dispositions and disposition every carried item
exactly once with resolved, still blocking, same-plan, or future follow-up. Future follow-ups are accepted only in approved plan reviews and
are reconciled with the final approved plan instead of reopening planning. If
--approved-followups=issue or fix-and-issue is enabled and implementation
will continue after approval, those plan-stage future follow-ups are filed as
separate issues before implementation starts. If implementation continues but
issue filing is disabled, they are summarized in the planning-complete comment
with an explicit note that they are not carried into PR review. Planning
item-* IDs visible in issue history are not PR prior review items unless they
are repeated in the active PR unresolved-item ledger. By default the loop posts
the approved plan summary and stops without filing follow-up issues; add
--implement-after-approval to continue into the normal PR flow:

    agent-loop issue 123 --repo OWNER/REPO --plan-first --implement-after-approval

--plan-first also supports explicit post-approval modes:

    agent-loop issue 123 --repo OWNER/REPO --plan-first --plan-execution-mode plan-only
    agent-loop issue 123 --repo OWNER/REPO --plan-first --plan-execution-mode decompose-only
    agent-loop issue 123 --repo OWNER/REPO --plan-first --plan-execution-mode implement-one-shot
    agent-loop issue 123 --repo OWNER/REPO --plan-first --plan-execution-mode implement-by-phase

plan-only is the default. implement-one-shot is the same behavior selected
by the backward-compatible --implement-after-approval flag. decompose-only
asks the coder to turn the approved plan into ordered phases, always creates
one GitHub child issue per phase, posts a parent summary table, and stops.
implement-by-phase creates every child issue, then implements only the first
agent-pr phase and stops after that PR review loop. Before entering that child
implementation, the parent issue records a one-time handoff marker. Parent
reruns after that marker do not re-run the child implementation; resume directly
with agent-loop issue <child>. If decomposition already exists without a
handoff marker, the first child is treated as not yet attempted and the handoff
is recorded once. If the first phase is human-action or manual-close, the
loop creates and reports all child issues but stops so a human can do the
required work, add a remark/update, and close that child issue.
Each generated child issue copies the relevant parent-plan slice, constraints
and invariants, dependency notes, scope and non-goals, rollout risk,
validation/soak requirements, automation classification, and instructions for
agent execution or human closure. Decomposition is capped at 8 phases; an
over-cap response is rejected and must be consolidated, not truncated. This cap
is separate from the approved-review follow-up issue cap used by
--approved-followups.
Provide a one-off task directly when there is no issue yet:

    agent-loop task "Add a health check endpoint" --repo OWNER/REPO

Run the loop against an existing pull request when you want another review and iteration pass:

    agent-loop pr 456 --repo OWNER/REPO

If --repo is omitted, the tool runs gh repo view from the current working
directory, or from --codex-dir when that flag is provided, and uses the
detected OWNER/REPO. Pass --repo explicitly when running outside the target
repository.
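The repository fallback described above can be sketched roughly as follows. This is an illustration, not the tool's actual code: the function names `detect_repo` and `resolve_repo` and the injectable `run` parameter are hypothetical. It relies only on `gh repo view --json nameWithOwner`, which prints the repository as JSON.

```python
import json
import subprocess
from typing import Callable, Optional


def detect_repo(cwd: Optional[str] = None, run: Callable = subprocess.run) -> str:
    """Ask gh which OWNER/REPO the checkout at cwd belongs to."""
    result = run(
        ["gh", "repo", "view", "--json", "nameWithOwner"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)["nameWithOwner"]


def resolve_repo(repo_flag: Optional[str],
                 codex_dir: Optional[str] = None,
                 run: Callable = subprocess.run) -> str:
    """Prefer an explicit --repo value; otherwise detect it from a checkout.

    Detection runs in codex_dir when given, else in the current directory
    (cwd=None means "inherit the current working directory").
    """
    if repo_flag:
        return repo_flag
    return detect_repo(cwd=codex_dir, run=run)
```

Passing `run` as a parameter is only a convenience for trying the sketch without a real `gh` login.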
When --claude-dir, --codex-dir, or --gemini-dir is omitted for an active
agent, the tool creates or reuses a repo-scoped temporary checkout such as
/tmp/coding-review-agent-loop/OWNER-REPO/codex/repo. Existing clean temp
checkouts are fetched and fast-forwarded on the base branch before the agent
runs. Default temp checkouts are tool-owned and disposable; if one is dirty,
the tool resets and cleans it before reuse. Explicit persistent directories are
kept conservative: dirty explicit workdirs fail clearly, and existing git
checkouts must point at the requested repository. Use explicit persistent
directories for large repositories, long-lived agent worktrees, or setups that
should survive /tmp cleanup or reboot.
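As a rough illustration of the checkout rules above, the sketch below derives the default temp path and picks an action for a checkout. The function names and the action labels are hypothetical; only the path layout and the dirty/clean policy come from the description above.

```python
from pathlib import PurePosixPath

TEMP_ROOT = PurePosixPath("/tmp/coding-review-agent-loop")


def default_checkout_path(repo: str, agent: str) -> PurePosixPath:
    """Build the repo-scoped temp checkout path for one agent."""
    owner, name = repo.split("/", 1)
    return TEMP_ROOT / f"{owner}-{name}" / agent / "repo"


def checkout_action(tool_owned: bool, dirty: bool) -> str:
    """Return a label for what happens to an existing checkout.

    Labels are illustrative names, not values used by the tool.
    """
    if tool_owned:
        # Default temp checkouts are disposable.
        return "reset-and-clean" if dirty else "fetch-and-fast-forward"
    # Explicit persistent directories are treated conservatively.
    return "fail" if dirty else "verify-remote-matches-repo"
```

For example, `default_checkout_path("OWNER/REPO", "codex")` yields `/tmp/coding-review-agent-loop/OWNER-REPO/codex/repo`.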
Agent memory is enabled by default. Before invoking agents, the loop creates or
refreshes advisory repo memory in a durable, repo-scoped user cache directory
such as ~/.cache/coding-review-agent-loop/repos/OWNER-REPO/memory on Linux:
repo summary, architecture map, module index, execution/test profile, toolchain
facts, and changed files since the previous memory commit. On macOS the default
root is ~/Library/Caches/coding-review-agent-loop; on Windows it is
%LOCALAPPDATA%/coding-review-agent-loop/Cache. Agent prompts state that this
cache is stale-prone orientation only, and that agents must inspect source files
and PR diffs directly for correctness claims. The cache is local-only. Disable
it with --no-agent-memory, force a refresh with --refresh-agent-memory,
customize the location with --agent-memory-dir PATH, or refresh only test
command facts with --refresh-test-profile. Relative --agent-memory-dir
values are resolved inside the coder checkout. If you keep sensitive repo
details out of local cache retention, use --no-agent-memory or a custom
short-lived location. If the previous memory commit is unavailable or no longer
diffable, the loop logs the git failure and treats all tracked files as changed
for that refresh.
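The default cache locations can be pictured with a small sketch. The helper names are hypothetical, and the `repos/OWNER-REPO/memory` layout under the macOS and Windows roots is an assumption carried over from the Linux example; the source only states the roots for those platforms.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

APP = "coding-review-agent-loop"


def default_memory_root(platform: str, home: str, local_app_data: str = "") -> PurePath:
    """Pick the per-platform cache root (platform uses sys.platform-style names)."""
    if platform == "darwin":
        return PurePosixPath(home) / "Library" / "Caches" / APP
    if platform == "win32":
        return PureWindowsPath(local_app_data) / APP / "Cache"
    return PurePosixPath(home) / ".cache" / APP


def repo_memory_dir(root: PurePath, repo: str) -> PurePath:
    """Repo-scoped memory directory under a cache root (layout assumed)."""
    return root / "repos" / repo.replace("/", "-") / "memory"


def resolve_custom_dir(custom: str, coder_checkout: str) -> PurePosixPath:
    """Relative --agent-memory-dir values resolve inside the coder checkout.

    Joining an absolute path with pathlib discards the left-hand side,
    so absolute values pass through unchanged.
    """
    return PurePosixPath(coder_checkout) / custom
```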
Use --test-command to add a local test gate:

    agent-loop task "Fix the flaky test" --repo OWNER/REPO --test-command "python -m pytest"

By default, the command runs after coder-created or coder-updated changes before
reviewer rounds, and again after final reviewer approval before auto-merge. Add
--no-pre-review-tests if you only want the final post-approval local test
gate. The coder prompt also asks the coding agent to report the exact tests it
ran, or explain why it could not run tests.
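A minimal sketch of such a gate, under stated assumptions: the function names and gate labels are hypothetical, and splitting the command with `shlex` instead of running it through a shell is a choice made for this example, not something the source specifies.

```python
import shlex
import subprocess
from typing import List, Optional


def gate_points(pre_review_tests: bool = True) -> List[str]:
    """List when the test command runs (labels are illustrative)."""
    points = []
    if pre_review_tests:
        points.append("after-coder-changes")
    points.append("after-final-approval")
    return points


def run_test_gate(test_command: str, cwd: Optional[str] = None) -> bool:
    """Run the configured command; a zero exit code means the gate passed."""
    completed = subprocess.run(shlex.split(test_command), cwd=cwd)
    return completed.returncode == 0
```

With `--no-pre-review-tests`, only the post-approval entry would remain in the list returned by `gate_points(False)`.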
By default, Claude is the coder and Codex is the reviewer. Reverse that with:

    agent-loop task "Fix the flaky test" --repo OWNER/REPO --coder codex --reviewer claude

Use Gemini as either side of the loop:

    agent-loop task "Improve error handling" \
      --repo OWNER/REPO \
      --coder gemini \
      --reviewer codex

    agent-loop pr 456 \
      --repo OWNER/REPO \
      --reviewer gemini

Repeat --reviewer to require approvals from multiple reviewers. The PR is
approved only after every configured reviewer approves in the same round. The
coder may also be listed as a reviewer when you want the same agent to work in
separate coding and review passes:

    agent-loop pr 456 --repo OWNER/REPO --reviewer codex --reviewer claude

Structured-response runs use a three-level interpretation order:
- Structured JSON payloads are authoritative when present in the agent output.
- For resume/replay, AGENT_LOOP_META attached to orchestrator-posted comments is the canonical source of the active round ledger, carried prior_items, completed reviewer dispositions, and next item-N allocation for that structured-response round.
- Markdown section parsing remains a compatibility fallback for interpreting comments that do not include a structured payload or metadata for the current round.
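The order above can be read as a simple ordered fallback. The sketch below assumes the three inputs have already been extracted from a comment; the function name and source labels are hypothetical, and how the tool actually extracts each input is not shown here.

```python
from typing import Any, Optional, Tuple


def choose_interpretation(structured_payload: Optional[dict],
                          loop_meta: Optional[dict],
                          markdown_body: Optional[str]) -> Tuple[str, Any]:
    """Return the first available source in the documented priority order."""
    candidates = (
        ("structured_json", structured_payload),
        ("agent_loop_meta", loop_meta),
        ("markdown_fallback", markdown_body),
    )
    for label, value in candidates:
        if value is not None:
            return label, value
    raise ValueError("comment has no structured payload, metadata, or markdown body")
```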
Mixed histories are expected during rollout. Old raw-markdown comments can
remain earlier in the issue or PR thread, while newer orchestrator-rendered
comments carry AGENT_LOOP_META. When metadata exists for the current head or
plan subject, resume reconstruction uses that metadata-backed ledger and ignores
stale visible item IDs from older heads, superseded plans, or replayed rounds.
If a PR head advances but no current-head coder metadata was recorded, the PR
loop recovers from metadata-backed active blocking and same-pr items on the
latest recorded head and routes them through a coder follow-up before reviewers
run again.
Structured JSON is also the preferred coder format for follow-up and plan
revision rounds. Coder follow-up responses use kind: "coder_followup" with
state, summary, addressed_items, remaining_items,
human_requirements, optional addressed_item_notes / remaining_item_notes,
and optional tests_run; every carried reviewer item ID must appear exactly
once in either addressed_items or remaining_items.
Plan-revision responses use kind: "plan_revision" with state: "blocking",
summary, prior_plan_item_dispositions, and plan_steps. Structured
responses must start with one top-level JSON object, place the matching
AGENT_STATE or AGENT_PLAN_STATE footer immediately after it, and end with
only the standalone agent signature. The loop renders validated structured
payloads into normal public GitHub comments, so raw JSON is not posted.
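To make the exactly-once rule for coder follow-ups concrete, here is a hedged sketch of a validator. It assumes `addressed_items` and `remaining_items` are lists of item ID strings, which the source does not spell out; the function name and error messages are hypothetical, and the loop's real validation is stricter than this.

```python
from collections import Counter
from typing import List

REQUIRED_FIELDS = ("state", "summary", "addressed_items",
                   "remaining_items", "human_requirements")


def validate_coder_followup(payload: dict, carried_item_ids: List[str]) -> List[str]:
    """Return a list of problems; an empty list means this sketch accepts it."""
    errors = []
    if payload.get("kind") != "coder_followup":
        errors.append("kind must be 'coder_followup'")
    for field in REQUIRED_FIELDS:
        if field not in payload:
            errors.append(f"missing field: {field}")
    # Count every ID across both lists so duplicates and omissions both show up.
    seen = Counter(list(payload.get("addressed_items", []))
                   + list(payload.get("remaining_items", [])))
    for item_id in carried_item_ids:
        if seen[item_id] != 1:
            errors.append(f"{item_id} must appear exactly once (found {seen[item_id]})")
    return errors


example = {
    "kind": "coder_followup",
    "state": "<state>",  # placeholder; valid state values are not listed here
    "summary": "Renamed the helper and added a regression test.",
    "addressed_items": ["item-1"],
    "remaining_items": ["item-2"],
    "human_requirements": {},
    "tests_run": ["python -m pytest"],
}
```

Calling `validate_coder_followup(example, ["item-1", "item-2"])` returns an empty list, while omitting or repeating a carried ID produces one error per offending item.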
When a structured plan review, plan revision, PR review, or coder follow-up is present but malformed, the loop may run one Gemini-backed repair pass. The repair pass is format-only: it asks Gemini to re-emit the agent's intent as the required JSON object, footer marker, and signature. The repaired response is accepted only if it passes the same strict validation as the original response; failed repairs remain local protocol errors and are not posted to GitHub.
Signed human reviewer comments are approval-critical when they end with a
standalone -- Human Reviewer signature. The loop surfaces those requirements
to coders and reviewers. In markdown fallback paths, coders must include
<!-- HUMAN_REQUIREMENTS_ADDRESSED --> plus a ### Human requirements section
that covers every surfaced Requirement N; in structured coder follow-ups, the
same acknowledgement is carried in the human_requirements object. If details
were omitted to keep a prompt bounded, the coder must state that it checked the
GitHub discussion directly before responding. Reviewers cannot approve signed
requirements as resolved unless the approved review includes
<!-- HUMAN_REQUIREMENTS_RESOLVED -->; otherwise the loop carries a synthetic
human-requirements acknowledgement item into the next round.
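The marker checks above can be sketched with plain string tests. The function names are hypothetical, and real comments may need more careful parsing than this.

```python
SIGNATURE = "-- Human Reviewer"
ADDRESSED_MARKER = "<!-- HUMAN_REQUIREMENTS_ADDRESSED -->"
RESOLVED_MARKER = "<!-- HUMAN_REQUIREMENTS_RESOLVED -->"


def is_signed_human_comment(body: str) -> bool:
    """True when the comment ends with the signature on its own line."""
    lines = [line.strip() for line in body.strip().splitlines()]
    return bool(lines) and lines[-1] == SIGNATURE


def coder_acknowledged(markdown_response: str) -> bool:
    """Markdown fallback path: marker plus the human-requirements section."""
    return (ADDRESSED_MARKER in markdown_response
            and "### Human requirements" in markdown_response)


def approval_resolves_requirements(approved_review: str) -> bool:
    """An approved review must carry the resolved marker."""
    return RESOLVED_MARKER in approved_review
```

A comment that merely mentions the signature mid-line, such as `Thanks -- Human Reviewer`, is not treated as signed by this sketch.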
When --approved-followups is set to summarize, issue, or a fix-and-*
mode, approved reviews may include future work under:

    ### Future follow-ups
    - Add a follow-up test.

Reviewers should use that section only for substantial work that is better
handled in a separate issue or PR. The legacy heading
### Non-blocking follow-ups is still parsed as future work for compatibility.
Approval means the review is fully complete for that round: no new blocking
work, and no carried-forward unresolved items left active in the reviewer’s
disposition section.
When --approved-followups uses a fix-and-* mode, blocking reviews may also
include small, localized, low-risk current-PR cleanup under:

    ### Same-PR follow-ups
    - Rename a helper before merge.

Same-PR follow-ups are sent back to the coder in the existing PR and require a new review round. They may not appear in an approved review. They should stay narrowly scoped to files already touched by the PR or directly adjacent code; larger redesigns and independent work belong under Future follow-ups.

Approved future follow-ups remain in the round-to-round ledger so later reviewers can explicitly confirm they are still future work, resolved, or should be promoted back to same-PR or blocking status. The final summary or issue creation uses the remaining future items from that reconciled ledger, not only the final round's newly written Future follow-ups.
Before posting summaries or creating issues, the loop deduplicates the remaining
future items across reviewers using deterministic topic keys from headings,
code identifiers, docs/files, and normalized wording. The selected issue body
keeps the canonical wording plus an Original reviewer notes section so
reviewer provenance and later disposition notes are not lost. The issue modes
create at most three follow-up issues to avoid issue noise, and the final PR
comment reports how many items were filed or summarized, deduplicated, or
skipped by the cap.
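A simplified sketch of the deduplicate-then-cap step follows. It builds the topic key from normalized wording only, which is a deliberate simplification; the real keys also draw on headings, code identifiers, and docs/files. All names here are hypothetical.

```python
import re
from typing import Dict, List, Tuple

MAX_FOLLOWUP_ISSUES = 3


def topic_key(text: str) -> str:
    """Normalize wording: lowercase, punctuation collapsed to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedupe_and_cap(items: List[Tuple[str, str]], cap: int = MAX_FOLLOWUP_ISSUES) -> dict:
    """Group (reviewer, text) items by topic key, then keep at most cap groups."""
    groups: Dict[str, List[Tuple[str, str]]] = {}
    for reviewer, text in items:
        # dicts keep insertion order, so the first wording seen stays canonical.
        groups.setdefault(topic_key(text), []).append((reviewer, text))
    ordered = list(groups.values())
    return {
        "selected": ordered[:cap],  # each group keeps its original reviewer notes
        "deduplicated": len(items) - len(groups),
        "skipped_by_cap": max(0, len(groups) - cap),
    }
```

Keeping every `(reviewer, text)` pair inside a group mirrors the idea of preserving reviewer provenance alongside the canonical wording.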
Plan reviews follow the same rule: approved plan reviews may include Future
follow-ups only. Blocking plan issues, Same-plan follow-ups, or carried-forward
plan items left still blocking or same-plan keep the planning round
unapproved. Plan-stage future follow-up issues are filed before implementation
begins in issue-filing modes; PR-stage approved-review future follow-up issues
are filed after final PR approval.
By default, --approved-followups=ignore asks reviewers not to include
approved-review follow-up sections. Reviewers should mark the review blocking
instead when cleanup should be fixed before merge.
Each top-level issue, task, or pr run also writes a machine-readable
usage summary beside the normal agent logs in --log-dir as
<run-id>-usage-summary.json. The file aggregates per-call, per-agent, and
whole-run usage, including retries. When a backend exposes token counters, the
summary records them as exact or partial; when a backend exposes no usable
usage data, the loop falls back to a clearly labeled estimate based on prompt
and public-response size. --dry-run does not fabricate token usage.
--approved-followups accepts:
- ignore: ignore approved follow-up sections. This is the default.
- summarize: post future follow-ups as a grouped PR comment.
- issue: create GitHub issues for future follow-ups, then comment with the created issue links.
- fix-and-summarize: send same-PR follow-ups to the coder for another review round, then summarize future follow-ups after final approval.
- fix-and-issue: send same-PR follow-ups to the coder for another review round, then create issues for future follow-ups after final approval and comment with the created issue links.
To keep a grouped record on the PR or create follow-up issues, use:

    agent-loop pr 456 --repo OWNER/REPO --approved-followups summarize
    agent-loop pr 456 --repo OWNER/REPO --approved-followups issue
    agent-loop pr 456 --repo OWNER/REPO --approved-followups fix-and-summarize

Bullets and prose paragraphs inside the Same-PR follow-ups, Future follow-ups,
and legacy Non-blocking follow-ups sections are parsed. Each section ends at
the next heading, HTML marker, or agent signature, so final protocol markers
are not mistaken for follow-up text.
The remaining legacy compatibility surface is intentional:
- Markdown review/plan parsing remains supported in the resume path for already-completed reviewer rounds that predate structured metadata.
- The legacy heading ### Non-blocking follow-ups still maps to future work.
- Marker-only markdown paths remain compatibility fallbacks; new follow-up, review, and plan-revision examples should use structured JSON first.
- Resume reconstruction should not depend on reparsing old prose once AGENT_LOOP_META exists for the active structured-response round.
Agent subprocess logs are written under .agent-loop-logs/ in the active
checkout, and long-running agents print heartbeat lines with the exact log
path. GitHub comments come from validated public response files under
/tmp/coding-review-agent-loop/responses/... or validated fallback stdout, not
from raw logs. If an agent looks stuck or returns diagnostics, inspect the
heartbeat log path and the response-file path; quota/reset failures may exit
early with rerun guidance, while narrower transient failures retry according to
--agent-max-retries and --agent-retry-backoff-seconds. Repair-pass attempts
are also visible in the log as schema-validation failure, repair attempt, and
recovered-or-invalid repair messages.
For trusted local automation that must run without approval prompts:

    agent-loop issue 123 --repo OWNER/REPO --dangerous-agent-permissions

This project uses agent-loop to improve itself. This command asked Codex to
review existing issue and PR feedback, with both Claude and Gemini reviewing
the result. The work became PR #13:

    ~/tools/coding-review-agent-loop/.venv/bin/agent-loop task \
      "Please go over all issue and PR reviews again and see if any future follow-ups are still worth addressing but have not been addressed." \
      --repo wwind123/coding-review-agent-loop \
      --coder codex \
      --reviewer claude \
      --reviewer gemini \
      --dangerous-agent-permissions

See docs/local_agent_loop.md for the architecture diagram, full usage, and safety notes.

Run the tests with:

    python -m pytest

Tests use fake subprocess runners. They do not call real claude, codex, gemini, or gh.