diff --git a/README.md b/README.md
index e633cffd..2118042d 100644
--- a/README.md
+++ b/README.md
@@ -63,6 +63,7 @@ Ralph loops give you:
| **grow-coverage** | Write tests for untested modules, one per iteration, until coverage hits the target |
| **security-audit** | Hunt for vulnerabilities — scan, find, fix, verify, repeat |
| **clear-backlog** | Work through a TODO list or issue tracker, one task per loop |
+| **promise-completion** | Work until a target is done, then emit a promise tag so the loop stops early |
| **write-docs** | Generate documentation for undocumented modules, one at a time |
| **improve-codebase** | Find and fix code smells, refactor patterns, modernize APIs |
| **migrate** | Incrementally migrate files from one framework or pattern to another |
@@ -95,11 +96,17 @@ Scaffold a ralph and start experimenting:
ralph scaffold my-ralph
```
+The scaffolded `RALPH.md` includes the normal command/arg template plus a commented promise-completion path you can enable if the agent should stop early by emitting a matching `...` tag.
+
+For a committed example, see [`examples/promise-completion/RALPH.md`](examples/promise-completion/RALPH.md) — it shows a loop that exits early once the requested target is complete.
+
Edit `my-ralph/RALPH.md`, then run it:
```bash
ralph run my-ralph # loops until Ctrl+C
ralph run my-ralph -n 5 # run 5 iterations then stop
+# from a repo checkout:
+ralph run examples/promise-completion -n 10 --target "stabilize the failing auth tests"
```
### What `ralph run` does
diff --git a/docs/agents.md b/docs/agents.md
index a14d9a04..36e1e602 100644
--- a/docs/agents.md
+++ b/docs/agents.md
@@ -36,9 +36,34 @@ Your agent must:
1. **Read a prompt from stdin** — the full assembled prompt is piped in
2. **Do work in the current directory** — edit files, run commands, make commits
-3. **Exit when done** — exit code 0 means success, non-zero means failure
+3. **Exit cleanly** — exit code `0` means the agent process succeeded; non-zero means failure
+4. **Optionally emit a completion signal** — if you want the agent to print an explicit `...` marker, set `completion_signal` in frontmatter (default inner text: `RALPH_PROMISE_COMPLETE`)
-That's it. No special protocol, no API — just stdin in, work done, process exits.
+Normal exit codes still indicate process success or failure. They do **not** trigger promise completion by themselves.
+
+Ralphify only stops early on promise completion when both of these are true:
+
+- `stop_on_completion_signal: true`
+- the matching `...` tag is detected in agent output or captured result text
+
+`completion_signal` is the inner promise text. For example, `completion_signal: COMPLETE` means the agent must emit `COMPLETE` inside the matching `...` tag.
+
+Ralphify still keeps its own command/prompt loop architecture. Only the promise tag format and matching align with Ralph-Wiggum.
+
+Minimal example:
+
+```markdown
+---
+agent: claude -p --dangerously-skip-permissions
+completion_signal: COMPLETE
+stop_on_completion_signal: true
+---
+
+Implement the next todo. When the work is fully complete, emit the
+COMPLETE promise tag and exit.
+```
+
+That's it. No API required — just stdin in, output out, process exits.
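The contract above can be sketched as a stand-alone agent. This is illustrative only: the `run_agent` helper and its "is the target finished?" check are assumptions, not part of ralphify's API.

```python
COMPLETION_SIGNAL = "COMPLETE"  # matches completion_signal in frontmatter

def run_agent(prompt: str) -> tuple[int, str]:
    """One loop iteration: do work, report status on stdout."""
    # ... edit files / run commands in the current directory based on prompt ...
    done = True  # placeholder for a real "is the target finished?" check
    out = COMPLETION_SIGNAL if done else "still working"
    # Exit code 0 only means the process succeeded; it never stops the loop
    # by itself -- the loop stops on the promise signal, not on exit status.
    return 0, out

code, output = run_agent("implement the next todo")
print(code, output)  # -> 0 COMPLETE
```

The split matters: a well-behaved agent can exit `0` on every iteration while the loop keeps running, and only the explicit completion signal ends the run early.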
## Claude Code
diff --git a/docs/cli.md b/docs/cli.md
index 408001ce..7d4b3669 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -113,6 +113,7 @@ The loop also stops automatically when:
- All `-n` iterations have completed
- `--stop-on-error` is set and the agent exits non-zero or times out
+- `stop_on_completion_signal: true` is set in frontmatter and the matching `...` tag is detected in agent output or captured result text
### Peeking at live agent output
@@ -152,7 +153,7 @@ ralph scaffold # Creates RALPH.md in the current directory
|---|---|---|
| `[NAME]` | none | Directory name. If omitted, creates RALPH.md in the current directory |
-The generated template includes an example command (`git-log`), an example arg (`focus`), and a prompt body with placeholders for both. Edit it, then run [`ralph run`](#ralph-run). See [Getting Started](getting-started.md) for a full walkthrough.
+The generated template includes an example command (`git-log`), an example arg (`focus`), a prompt body with placeholders for both, and commented `completion_signal` / `stop_on_completion_signal` lines showing the promise-completion path. Uncomment them if you want the agent to stop early by emitting a matching `...` tag. Then run [`ralph run`](#ralph-run). See [Getting Started](getting-started.md) for a full walkthrough.
Errors if `RALPH.md` already exists at the target location.
@@ -190,6 +191,10 @@ Your instructions here. Reference args with {{ args.dir }}.
| `commands` | list | no | Commands to run each iteration (each has `name` and `run`) |
| `args` | list of strings | no | Declared argument names for user arguments. Letters, digits, hyphens, and underscores only. |
| `credit` | bool | no | Append co-author trailer instruction to prompt (default: `true`) |
+| `completion_signal` | string | no | Inner text for the completion promise tag; with `COMPLETE`, the agent must emit `COMPLETE` inside the matching tag (default inner text: `RALPH_PROMISE_COMPLETE`) |
+| `stop_on_completion_signal` | bool | no | Stop the loop early when the matching `...` tag is detected (default: `false`) |
+
+Exit code `0` still only means the agent process succeeded. Ralphify keeps its own loop architecture; only the promise tag format and matching align with Ralph-Wiggum.
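The two fields resolve against the documented defaults; a minimal sketch of that resolution, assuming frontmatter has already been parsed into a dict (the `completion_settings` helper is hypothetical, not ralphify's API):

```python
DEFAULT_SIGNAL = "RALPH_PROMISE_COMPLETE"  # documented default inner text

def completion_settings(frontmatter: dict) -> tuple[str, bool]:
    """Resolve promise-completion settings with their documented defaults."""
    signal = frontmatter.get("completion_signal", DEFAULT_SIGNAL)
    stop = bool(frontmatter.get("stop_on_completion_signal", False))
    return signal, stop

# With neither field set, the loop never stops early.
print(completion_settings({}))  # -> ('RALPH_PROMISE_COMPLETE', False)
print(completion_settings({"completion_signal": "COMPLETE",
                           "stop_on_completion_signal": True}))
```

Because `stop_on_completion_signal` defaults to `false`, setting only `completion_signal` changes the expected tag text but never shortens the run.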
### Commands
diff --git a/docs/cookbook.md b/docs/cookbook.md
index e6455725..65b1ca89 100644
--- a/docs/cookbook.md
+++ b/docs/cookbook.md
@@ -1,13 +1,13 @@
---
title: Ralph Loop Recipes
-description: Copy-pasteable ralph loop setups for autonomous ML research, test coverage, code migration, security scanning, deep research, documentation, bug fixing, and codebase improvement.
-keywords: ralphify cookbook, autonomous coding recipes, RALPH.md examples, documentation loop, bug fixing loop, codebase improvement, deep research agent, code migration loop, security scanning agent, test coverage automation, autoresearch, autonomous ML research
+description: Copy-pasteable ralph loop setups for autonomous ML research, promise-based early exit, test coverage, code migration, security scanning, deep research, documentation, bug fixing, and codebase improvement.
+keywords: ralphify cookbook, autonomous coding recipes, RALPH.md examples, promise completion, early exit agent loop, documentation loop, bug fixing loop, codebase improvement, deep research agent, code migration loop, security scanning agent, test coverage automation, autoresearch, autonomous ML research
---
# Cookbook
!!! tldr "TL;DR"
- 8 copy-pasteable ralph loops: [autoresearch](#autoresearch), [codebase improvement](#codebase-improvement), [documentation](#documentation-loop), [bug hunting](#bug-hunter), [deep research](#deep-research), [code migration](#code-migration), [security scanning](#security-scan), and [test coverage](#test-coverage). Each is a real, runnable example from the `examples/` directory.
+ 9 copy-pasteable ralph loops: [autoresearch](#autoresearch), [codebase improvement](#codebase-improvement), [documentation](#documentation-loop), [bug hunting](#bug-hunter), [deep research](#deep-research), [code migration](#code-migration), [security scanning](#security-scan), [test coverage](#test-coverage), and [promise completion](#promise-completion). Each is a real, runnable example from the `examples/` directory.
Copy-pasteable setups for common autonomous workflows. Each recipe is a real, runnable ralph from the [`examples/`](https://github.com/computerlovetech/ralphify/tree/main/examples) directory.
@@ -269,6 +269,40 @@ The coverage report gives the agent a clear metric to improve and shows exactly
---
+## Stop the loop when the task is complete {: #promise-completion }
+
+A loop that uses ralphify's promise-completion path to stop before the iteration budget is exhausted. The agent keeps working until the requested target is done, then emits the `COMPLETE` promise tag so the run ends immediately instead of burning the remaining iterations.
+
+**`promise-completion/RALPH.md`**
+
+```markdown
+--8<-- "examples/promise-completion/RALPH.md"
+```
+
+```bash
+ralph run promise-completion -n 10 --target "stabilize the failing auth tests"
+```
+
+```text
+▶ Running: promise-completion
+ 3 commands · max 10 iterations
+
+── Iteration 1 ──
+ Commands: 3 ran
+✓ Iteration 1 completed (51.4s)
+
+── Iteration 2 ──
+ Commands: 3 ran
+✓ Iteration 2 completed via promise tag COMPLETE (43.2s)
+
+──────────────────────
+Done: 2 iterations — 2 succeeded
+```
+
+Set `completion_signal` to the inner promise text and `stop_on_completion_signal: true` to enable early exit. The agent must emit the matching `...` tag — bare text does not count.
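A sketch of what "bare text does not count" means in practice. The wrapper element shown here (`ralph-promise`) is a made-up stand-in — the real tag name is not reproduced in this recipe — so treat this purely as an illustration of matching the inner text only when it is wrapped:

```python
import re

def has_promise_tag(output: str, signal: str) -> bool:
    """Match signal only when wrapped in a (hypothetical) promise element."""
    pattern = rf"<ralph-promise>\s*{re.escape(signal)}\s*</ralph-promise>"
    return re.search(pattern, output) is not None

print(has_promise_tag("done <ralph-promise>COMPLETE</ralph-promise>", "COMPLETE"))  # True
print(has_promise_tag("COMPLETE", "COMPLETE"))  # bare text -> False
```

Requiring the wrapper keeps an agent that merely *mentions* the word `COMPLETE` mid-sentence from ending the run by accident.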
+
+---
+
## Next steps
- [CLI Reference](cli.md) — all `ralph run` options (`--timeout`, `--stop-on-error`, `--delay`, user args)
diff --git a/docs/getting-started.md b/docs/getting-started.md
index 306954ad..313f4781 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -50,9 +50,10 @@ ralph scaffold my-ralph
```text
Created my-ralph/RALPH.md
Edit the file, then run: ralph run my-ralph
+Optional early exit: uncomment completion_signal + stop_on_completion_signal and emit COMPLETE.
```
-This creates `my-ralph/RALPH.md` with a ready-to-customize template including an example command, arg, and prompt. Edit the task section, [test it](#step-3-do-a-test-run), then follow [Step 4](#step-4-add-a-test-command) to add a test command — test feedback is what makes the loop self-healing.
+This creates `my-ralph/RALPH.md` with a ready-to-customize template including an example command, arg, prompt, and a commented promise-completion example. If you want the loop to stop before its iteration budget, uncomment `completion_signal` and `stop_on_completion_signal`, then tell the agent to emit the matching `...` tag when it is done. Edit the task section, [test it](#step-3-do-a-test-run), then follow [Step 4](#step-4-add-a-test-command) to add a test command — test feedback is what makes the loop self-healing.
Or create the file manually as shown below.
@@ -251,6 +252,16 @@ The agent's output streams live to your terminal between the iteration markers
If the agent breaks a test, the next iteration sees the failure output via `{{ commands.tests }}` and fixes it automatically.
+!!! tip "Optional: stop when the task is fully complete"
+ Add these frontmatter fields if you want the loop to stop on an explicit completion marker:
+
+ ```yaml
+ completion_signal: COMPLETE
+ stop_on_completion_signal: true
+ ```
+
+ `completion_signal` is the inner promise text. With `completion_signal: COMPLETE`, the agent must emit `COMPLETE`. If you omit it, the default promise tag is `RALPH_PROMISE_COMPLETE`. The loop only exits early when `stop_on_completion_signal` is enabled and that tag is detected in agent output or captured result text. Exit code `0` still only means the agent process succeeded.
+
Once you're confident the loop works, drop the `-n 3` to let it run indefinitely. Press `Ctrl+C` to stop.
## Step 7: Steer while it runs
@@ -274,7 +285,7 @@ Read TODO.md and focus only on the API module.
This is the most powerful part of ralph loops — you're steering a running agent with a text file.
!!! warning "Frontmatter changes need a restart"
- Only the **prompt body** is re-read each iteration. Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup. If you add a new command or change the agent, stop the loop with `Ctrl+C` and restart it.
+ Only the **prompt body** is re-read each iteration. Frontmatter is parsed once at startup. If you add a new command, change the agent, or change completion settings, stop the loop with `Ctrl+C` and restart it.
## Next steps
diff --git a/docs/how-it-works.md b/docs/how-it-works.md
index 74d39e9d..f4be00a7 100644
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -19,7 +19,7 @@ Every iteration follows the same sequence. Here's what happens at each step.
The prompt body (everything below the frontmatter) is read from disk **every iteration**. This means you can edit the prompt text — add rules, change the task, adjust constraints — while the loop is running. Changes take effect on the next cycle.
-Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup. To change those, restart the loop.
+Frontmatter settings are parsed once at startup. To change them, restart the loop.
### 2. Run commands and capture output
@@ -70,6 +70,8 @@ echo "" | claude -p --dangerously-skip-permissions
The agent reads the prompt, does work in the current directory (edits files, runs commands, makes commits), and exits. Ralphify waits for the agent process to finish.
+Exit codes still mean process success or failure. Promise completion is separate: `completion_signal` sets the inner promise text (default: `RALPH_PROMISE_COMPLETE`), and ralphify stops early only when `stop_on_completion_signal` is `true` and the agent emits the matching `...` tag in output or captured result text. This aligns the promise format with Ralph-Wiggum, but ralphify still uses its own command/prompt loop architecture.
+
When the agent command starts with `claude`, ralphify automatically adds `--output-format stream-json --verbose` to enable structured streaming. This lets ralphify track agent activity in real time — you don't need to configure this yourself.
### 6. Loop back with fresh context
@@ -82,7 +84,7 @@ The loop starts the next iteration from step 1. The RALPH.md is re-read, command
|---|---|---|
| Prompt body | Every iteration | Edit the prompt while the loop runs — the next iteration follows your new instructions |
| Command output | Every iteration | The agent always sees fresh data (latest git log, current test status, etc.) |
-| Frontmatter (`agent`, `commands`, `args`) | Once at startup | Parsed when the loop starts. Restart to pick up changes. |
+| Frontmatter settings | Once at startup | Parsed when the loop starts. Restart to pick up changes. |
| User arguments | Once at startup | Passed via CLI flags, constant for the run |
## How broken code gets fixed automatically
@@ -179,6 +181,7 @@ The loop continues until one of these happens:
| `Ctrl+C` (first) | Gracefully finishes the current iteration, then stops the loop. The agent completes its work and the iteration result is recorded. |
| `Ctrl+C` (second) | Force-stops immediately — kills the agent process and exits. Use when you don't want to wait for the current iteration to finish. |
| `-n` limit reached | Loop stops after completing the specified number of iterations |
+| `stop_on_completion_signal: true` and matching `...` tag detected | Loop stops after the current iteration |
| `--stop-on-error` and agent exits non-zero or times out | Loop stops after the current iteration |
| `--timeout` exceeded | Agent process is killed, iteration is marked as timed out, loop continues (unless `--stop-on-error`) |
diff --git a/docs/index.md b/docs/index.md
index 7f284a07..ca17330a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -16,7 +16,7 @@ hide:
A **ralph** is a directory with a `RALPH.md` file — a skill-like format that bundles a prompt, the commands to run between iterations, and any files the agent needs. **Ralphify** is the CLI runtime that executes them.
-See [The Ralph Format](blog/posts/the-ralph-format.md) for the full spec.
+See [The Ralph Format](blog/posts/the-ralph-standard.md) for the full spec.
```
grow-coverage/
@@ -52,7 +52,7 @@ One directory. One command. Each iteration starts with fresh context and current
*Works with any agent CLI. Swap `claude -p` for Codex, Aider, or your own — just change the `agent` field.*
[Get Started](getting-started.md){ .md-button .md-button--primary }
-[Read the Format Spec](blog/posts/the-ralph-format.md){ .md-button }
+[Read the Format Spec](blog/posts/the-ralph-standard.md){ .md-button }
---
@@ -149,7 +149,7 @@ Ralphs are to the outer loop what [skills](https://agentskills.io/) are to the i
## Next steps
-- **[The Ralph Format](blog/posts/the-ralph-format.md)** — the full spec
+- **[The Ralph Format](blog/posts/the-ralph-standard.md)** — the full spec
- **[Getting Started](getting-started.md)** — from install to a running loop in 10 minutes
- **[How it Works](how-it-works.md)** — what happens inside each iteration
- **[Cookbook](cookbook.md)** — copy-pasteable ralphs for coding, docs, research, and more
diff --git a/docs/quick-reference.md b/docs/quick-reference.md
index 3fb7d880..a8ced46f 100644
--- a/docs/quick-reference.md
+++ b/docs/quick-reference.md
@@ -78,6 +78,8 @@ Your instructions here. Use {{ args.dir }} for user arguments.
| `commands` | list | no | Commands to run each iteration |
| `args` | list | no | User argument names. Letters, digits, hyphens, and underscores only. |
| `credit` | bool | no | Append co-author trailer instruction to prompt (default: `true`) |
+| `completion_signal` | string | no | Inner text for the completion promise tag; with `COMPLETE`, the agent must emit `COMPLETE` inside the matching tag (default inner text: `RALPH_PROMISE_COMPLETE`) |
+| `stop_on_completion_signal` | bool | no | Stop the loop early when the matching `...` tag is detected (default: `false`) |
### Command fields
@@ -160,6 +162,7 @@ Each iteration:
| `P` (shift+p) | Open full-screen peek — scroll the entire activity buffer. `j/k` line, `space/b` page, `g/G` top/bottom, `q` or `P` exits |
| `-n` limit reached | Stops after the specified number of iterations |
| `--stop-on-error` | Stops if agent exits non-zero or times out |
+| matching `...` tag detected | Stops early only when `stop_on_completion_signal: true` and the configured promise tag is found in agent output/result |
## Live editing
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index a981fddc..52426260 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -522,7 +522,7 @@ For programmatic control over concurrent runs, use the [Python API's `RunManager
### Can I edit RALPH.md while the loop runs?
-Yes. The prompt body (everything below the frontmatter) is re-read every iteration — edit the prompt text and changes take effect on the next cycle. Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup, so changing those requires restarting the loop.
+Yes. The prompt body (everything below the frontmatter) is re-read every iteration — edit the prompt text and changes take effect on the next cycle. Frontmatter settings are parsed once at startup, so changing them requires restarting the loop.
### How do I disable the co-author credit in commits?
diff --git a/examples/promise-completion/RALPH.md b/examples/promise-completion/RALPH.md
new file mode 100644
index 00000000..7302a110
--- /dev/null
+++ b/examples/promise-completion/RALPH.md
@@ -0,0 +1,49 @@
+---
+agent: claude -p --dangerously-skip-permissions
+commands:
+ - name: tests
+ run: uv run pytest -x
+ - name: lint
+ run: uv run ruff check .
+ - name: git-log
+ run: git log --oneline -10
+args:
+ - target
+completion_signal: COMPLETE
+stop_on_completion_signal: true
+---
+
+# Stop Early with a Promise Tag
+
+You are an autonomous coding agent running in a loop. Each iteration
+starts with a fresh context. Your progress lives in the code and git.
+
+## Recent commits
+
+{{ commands.git-log }}
+
+## Test results
+
+{{ commands.tests }}
+
+## Lint
+
+{{ commands.lint }}
+
+Fix any failing tests or lint violations above before doing anything else.
+
+## Task
+
+Get the requested target to a clean, shippable state.
+{{ args.target }}
+
+When the task is complete and no more changes are needed, emit the
+COMPLETE promise tag and exit so the loop stops early.
+
+## Rules
+
+- One fix or improvement per iteration
+- Keep the target scoped — do not drift into unrelated cleanup
+- If tests or lint are failing, fix them before new work
+- Only emit the COMPLETE promise tag when the target is truly done
+- Commit with a descriptive message and push
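The `{{ commands.git-log }}` and `{{ args.target }}` placeholders above are filled in before the prompt reaches the agent. A minimal sketch of that substitution, assuming a simple regex-based renderer (ralphify's actual templating may differ):

```python
import re

def render_prompt(body: str, commands: dict[str, str], args: dict[str, str]) -> str:
    """Replace {{ commands.X }} / {{ args.Y }} placeholders (illustrative)."""
    def repl(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        table = commands if kind == "commands" else args
        # Leave unknown placeholders untouched rather than erase them.
        return table.get(name, match.group(0))
    return re.sub(r"\{\{\s*(commands|args)\.([\w-]+)\s*\}\}", repl, body)

body = "## Test results\n\n{{ commands.tests }}\n\nTarget: {{ args.target }}"
print(render_prompt(body, {"tests": "2 passed"}, {"target": "auth tests"}))
```

Hyphenated command names such as `git-log` are covered by the `[\w-]+` character class, matching the names used in this example's frontmatter.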
diff --git a/src/ralphify/_agent.py b/src/ralphify/_agent.py
index af6fb972..37ece89b 100644
--- a/src/ralphify/_agent.py
+++ b/src/ralphify/_agent.py
@@ -37,6 +37,7 @@
collect_output,
warn,
)
+from ralphify.adapters import CLIAdapter, select_adapter
# ── Callback type aliases ──────────────────────────────────────────────
# Used across the streaming and blocking execution paths for callbacks
@@ -48,20 +49,27 @@
OutputLineCallback = Callable[[str, OutputStream], None]
"""Receives raw output lines with their stream name ("stdout"/"stderr")."""
+
+def _call_safely(callback: Callable[..., Any] | None, *args: Any) -> None:
+ """Invoke an observer *callback* with *args*, swallowing any exception.
+
+ Used for best-effort observer callbacks during output draining: a
+ raising callback must never stop the drain loop or leave the reader
+ thread hung.
+ """
+ if callback is None:
+ return
+ try:
+ callback(*args)
+ except Exception:
+ pass
+
+
# Typed constants for the OutputStream literal so the type checker enforces
# that only "stdout" / "stderr" ever reach ``on_output_line``.
_STDOUT: OutputStream = "stdout"
_STDERR: OutputStream = "stderr"
-# Agent binary name that supports --output-format stream-json.
-# Public because _console_emitter also needs this for display logic.
-CLAUDE_BINARY = "claude"
-
-# CLI flags appended when streaming mode is used.
-_OUTPUT_FORMAT_FLAG = "--output-format"
-_STREAM_FORMAT = "stream-json"
-_VERBOSE_FLAG = "--verbose"
-
# JSON stream event types and fields for result extraction.
_RESULT_EVENT_TYPE = "result"
_RESULT_FIELD = "result"
@@ -82,6 +90,34 @@
_PROCESS_WAIT_TIMEOUT = 5.0
+def _extract_result_text_from_lines(lines: list[str] | None) -> str | None:
+ """Return the last string payload from any JSON ``result`` event in *lines*."""
+ if lines is None:
+ return None
+
+ result_text = None
+ for line in lines:
+ extracted = _extract_result_text_from_line(line)
+ if extracted is not None:
+ result_text = extracted
+ return result_text
+
+
+def _extract_result_text_from_line(line: str) -> str | None:
+ """Return the string payload from a single JSON ``result`` event line."""
+ try:
+ parsed = json.loads(line.strip())
+ except json.JSONDecodeError:
+ return None
+ if (
+ isinstance(parsed, dict)
+ and parsed.get("type") == _RESULT_EVENT_TYPE
+ and isinstance(parsed.get(_RESULT_FIELD), str)
+ ):
+ return parsed[_RESULT_FIELD]
+ return None
+
+
def _try_graceful_group_kill(proc: subprocess.Popen[Any]) -> bool:
"""Attempt to kill the process via its POSIX process group.
@@ -237,7 +273,7 @@ class AgentResult(ProcessResult):
class _StreamResult:
"""Accumulated output from reading the agent's JSON stream."""
- stdout_lines: tuple[str, ...]
+ stdout_lines: tuple[str, ...] | None
result_text: str | None
timed_out: bool
@@ -260,19 +296,6 @@ def _write_log(
return log_file
-def _supports_stream_json(cmd: list[str]) -> bool:
- """Return True if the agent command supports ``--output-format stream-json``.
-
- Currently only Claude Code supports this protocol. To add streaming
- support for another agent, extend the check here — no other changes
- needed since :func:`_run_agent_streaming` handles the protocol generically.
- """
- if not cmd:
- return False
- binary = Path(cmd[0]).stem
- return binary == CLAUDE_BINARY
-
-
def _readline_pump(
stdout: IO[str],
line_queue: queue.Queue[str | None],
@@ -298,6 +321,8 @@ def _read_agent_stream(
deadline: float | None,
on_activity: ActivityCallback | None,
on_output_line: OutputLineCallback | None = None,
+ *,
+ capture_stdout: bool = True,
) -> _StreamResult:
"""Read the agent's JSON stream line-by-line until EOF or timeout.
@@ -322,23 +347,24 @@ def _read_agent_stream(
Returns early with ``timed_out=True`` when the deadline is exceeded,
leaving the caller responsible for killing the subprocess.
+
+ When *capture_stdout* is ``False``, stdout is still drained and parsed but
+ not retained in memory. This keeps the streaming path lightweight when no
+ later completion-signal parsing or log writing needs the raw bytes.
"""
- stdout_lines: list[str] = []
+ stdout_lines: list[str] | None = [] if capture_stdout else None
result_text: str | None = None
line_q: queue.Queue[str | None] = queue.Queue()
- reader = threading.Thread(target=_readline_pump, args=(stdout, line_q), daemon=True)
- reader.start()
+ threading.Thread(target=_readline_pump, args=(stdout, line_q), daemon=True).start()
while True:
# Compute how long we can wait for the next line.
if deadline is not None:
- remaining = deadline - time.monotonic()
- # Use max(remaining, 0) so that an already-expired deadline
- # still does a non-blocking drain of queued lines before
- # returning — lines the reader thread already buffered are
- # not silently lost.
- get_timeout: float | None = max(remaining, 0)
+ # Clamp to 0 so that an already-expired deadline still does a
+ # non-blocking drain of queued lines before returning — lines
+ # the reader thread already buffered are not silently lost.
+ get_timeout: float | None = max(deadline - time.monotonic(), 0)
else:
get_timeout = None
@@ -347,50 +373,42 @@ def _read_agent_stream(
except queue.Empty:
# Deadline expired while waiting for a line.
return _StreamResult(
- stdout_lines=tuple(stdout_lines),
+ stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None,
result_text=result_text,
timed_out=True,
)
if line is None: # EOF sentinel from reader thread
return _StreamResult(
- stdout_lines=tuple(stdout_lines),
+ stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None,
result_text=result_text,
timed_out=False,
)
- stdout_lines.append(line)
- if on_output_line is not None:
- try:
- on_output_line(line.rstrip("\r\n"), _STDOUT)
- except Exception:
- # Callback is best-effort; draining must not stop.
- pass
+ if stdout_lines is not None:
+ stdout_lines.append(line)
+ _call_safely(on_output_line, line.rstrip("\r\n"), _STDOUT)
stripped = line.strip()
if stripped:
try:
parsed = json.loads(stripped)
except json.JSONDecodeError:
- parsed = None
- if isinstance(parsed, dict):
- if parsed.get("type") == _RESULT_EVENT_TYPE and isinstance(
- parsed.get(_RESULT_FIELD), str
- ):
- result_text = parsed[_RESULT_FIELD]
- if on_activity is not None:
- try:
- on_activity(parsed)
- except Exception:
- # Callback is best-effort; draining must not stop.
- pass
+ pass
+ else:
+ if isinstance(parsed, dict):
+ if parsed.get("type") == _RESULT_EVENT_TYPE and isinstance(
+ parsed.get(_RESULT_FIELD), str
+ ):
+ result_text = parsed[_RESULT_FIELD]
+ _call_safely(on_activity, parsed)
# Also check deadline after processing — if the reader thread
# already queued many lines, this prevents unbounded processing
# past the deadline.
if deadline is not None and time.monotonic() > deadline:
return _StreamResult(
- stdout_lines=tuple(stdout_lines),
+ stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None,
result_text=result_text,
timed_out=True,
)
@@ -404,46 +422,58 @@ def _run_agent_streaming(
iteration: int,
on_activity: ActivityCallback | None = None,
on_output_line: OutputLineCallback | None = None,
+ capture_result_text: bool = False,
+ capture_stdout: bool = False,
) -> AgentResult:
"""Run the agent subprocess with line-by-line streaming of JSON output.
- Used for agents that support ``--output-format stream-json`` (e.g. Claude
- Code). Stream processing is delegated to :func:`_read_agent_stream`;
- this function owns the subprocess lifecycle (spawn, stdin delivery,
- timeout kill, and cleanup via ``try/finally``).
+ Used for adapters whose ``supports_streaming`` flag is True (e.g. Claude
+ Code's ``--output-format stream-json``, Codex's ``--json``). The command
+ list *must already include* any adapter-required flags —
+ :func:`execute_agent` calls ``adapter.build_command`` before dispatching.
+
+ Stream processing is delegated to :func:`_read_agent_stream`; this
+ function owns the subprocess lifecycle (spawn, stdin delivery, timeout
+ kill, and cleanup via ``try/finally``).
stderr is drained concurrently on a background reader thread so large
stderr volume can't deadlock the child on a full OS pipe buffer while
the main thread is reading stdout.
"""
- stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG, _STREAM_FORMAT, _VERBOSE_FLAG]
start = time.monotonic()
deadline = (start + timeout) if timeout is not None else None
+ capture_stdout_text = log_dir is not None or capture_stdout
+ pipe_stderr = log_dir is not None or on_output_line is not None
+ capture_stderr_text = log_dir is not None
+
writer_thread: threading.Thread | None = None
- stderr_lines: list[str] = []
+ stderr_lines: list[str] | None = [] if capture_stderr_text else None
stderr_thread: threading.Thread | None = None
proc = subprocess.Popen(
- stream_cmd,
+ cmd,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
- stderr=subprocess.PIPE,
+ stderr=subprocess.PIPE if pipe_stderr else None,
**SUBPROCESS_TEXT_KWARGS,
**SESSION_KWARGS,
)
try:
# Popen with PIPE guarantees non-None streams; guard explicitly
# so the type checker narrows and -O mode cannot skip the check.
- if proc.stdin is None or proc.stdout is None or proc.stderr is None:
+ if proc.stdin is None or proc.stdout is None:
raise RuntimeError("subprocess.Popen failed to create PIPE streams")
+ if pipe_stderr and proc.stderr is None:
+ raise RuntimeError("subprocess.Popen failed to create PIPE stderr")
# Start the stderr pump BEFORE writing stdin so large prompts can't
# deadlock against an agent that writes substantial diagnostics to
# stderr while still reading its stdin.
- stderr_thread = _start_pump_thread(
- proc.stderr, stderr_lines, _STDERR, on_output_line
- )
+ if proc.stderr is not None:
+ stderr_thread = _start_pump_thread(
+ proc.stderr, stderr_lines, _STDERR, on_output_line
+ )
# Deliver the prompt on a background thread so that a blocked write
# (child not reading stdin, pipe buffer full) cannot prevent
@@ -452,7 +482,13 @@ def _run_agent_streaming(
# _deliver_prompt already swallows.
writer_thread = _start_writer_thread(proc, prompt)
- stream = _read_agent_stream(proc.stdout, deadline, on_activity, on_output_line)
+ stream = _read_agent_stream(
+ proc.stdout,
+ deadline,
+ on_activity,
+ on_output_line,
+ capture_stdout=capture_stdout_text,
+ )
if stream.timed_out:
_kill_process_group(proc)
@@ -460,8 +496,8 @@ def _run_agent_streaming(
finally:
_cleanup_agent(proc, stderr_thread, writer_thread)
- stdout = "".join(stream.stdout_lines)
- stderr = "".join(stderr_lines)
+ stdout = "".join(stream.stdout_lines) if stream.stdout_lines is not None else None
+ stderr = "".join(stderr_lines) if stderr_lines is not None else None
log_file = _write_log(log_dir, iteration, stdout, stderr)
@@ -471,8 +507,8 @@ def _run_agent_streaming(
log_file=log_file,
result_text=stream.result_text,
timed_out=stream.timed_out,
- captured_stdout=stdout if log_dir is not None else None,
- captured_stderr=stderr if log_dir is not None else None,
+ captured_stdout=stdout if capture_stdout_text else None,
+ captured_stderr=stderr if capture_stderr_text else None,
)
@@ -502,12 +538,7 @@ def _pump_stream(
for line in iter(stream.readline, ""):
if buffer is not None:
buffer.append(line)
- if on_output_line is not None:
- try:
- on_output_line(line.rstrip("\r\n"), stream_name)
- except Exception:
- # Callback is best-effort; draining must not stop.
- pass
+ _call_safely(on_output_line, line.rstrip("\r\n"), stream_name)
except (ValueError, OSError):
# Pipe closed concurrently — exit cleanly so join() returns.
pass
@@ -601,6 +632,8 @@ def _run_agent_blocking(
log_dir: Path | None,
iteration: int,
on_output_line: OutputLineCallback | None = None,
+ capture_result_text: bool = False,
+ capture_stdout: bool = False,
) -> AgentResult:
"""Run the agent subprocess and return the result.
@@ -613,9 +646,9 @@ def _run_agent_blocking(
- **Callback only** (``on_output_line`` set, no log dir) — reader
threads forward lines to the callback without accumulating them,
avoiding unbounded memory growth.
- - **Log capture** (``log_dir`` set) — reader threads accumulate
- lines into lists for log writing; lines are also forwarded to the
- callback if provided.
+ - **Buffered capture** (``log_dir`` or ``capture_stdout`` set) —
+ reader threads accumulate lines for log writing or later completion
+ parsing; lines are also forwarded to the callback if provided.
The subprocess is started in its own process group so that on
``KeyboardInterrupt`` or timeout the entire child tree can be killed
@@ -625,7 +658,12 @@ def _run_agent_blocking(
Raises ``FileNotFoundError`` if the command binary does not exist.
"""
start = time.monotonic()
- capture = log_dir is not None or on_output_line is not None
+ capture_stdout_text = log_dir is not None or capture_stdout
+ capture_stderr_text = log_dir is not None
+ pipe_stdout = (
+ capture_stdout_text or on_output_line is not None or capture_result_text
+ )
+ pipe_stderr = capture_stderr_text or on_output_line is not None
# When no subscriber needs the bytes, stdout/stderr are left
# un-piped so the child writes directly to the terminal. When
@@ -638,29 +676,41 @@ def _run_agent_blocking(
writer_thread: threading.Thread | None = None
stdout_thread: threading.Thread | None = None
stderr_thread: threading.Thread | None = None
- stdout_lines: list[str] | None = [] if log_dir is not None else None
- stderr_lines: list[str] | None = [] if log_dir is not None else None
+ stdout_lines: list[str] | None = [] if capture_stdout_text else None
+ stderr_lines: list[str] | None = [] if capture_stderr_text else None
+ result_text: str | None = None
+
+ def _on_output_line(line: str, stream_name: OutputStream) -> None:
+ nonlocal result_text
+ if capture_result_text and stream_name == _STDOUT:
+ extracted = _extract_result_text_from_line(line)
+ if extracted is not None:
+ result_text = extracted
+ if on_output_line is not None:
+ on_output_line(line, stream_name)
- pipe = subprocess.PIPE if capture else None
proc = subprocess.Popen(
cmd,
stdin=subprocess.PIPE,
- stdout=pipe,
- stderr=pipe,
+ stdout=subprocess.PIPE if pipe_stdout else None,
+ stderr=subprocess.PIPE if pipe_stderr else None,
**SUBPROCESS_TEXT_KWARGS,
**SESSION_KWARGS,
)
try:
if proc.stdin is None:
raise RuntimeError("subprocess.Popen failed to create PIPE stdin")
- if capture:
- if proc.stdout is None or proc.stderr is None:
- raise RuntimeError("subprocess.Popen failed to create PIPE streams")
+ if pipe_stdout:
+ if proc.stdout is None:
+ raise RuntimeError("subprocess.Popen failed to create PIPE stdout")
stdout_thread = _start_pump_thread(
- proc.stdout, stdout_lines, _STDOUT, on_output_line
+ proc.stdout, stdout_lines, _STDOUT, _on_output_line
)
+ if pipe_stderr:
+ if proc.stderr is None:
+ raise RuntimeError("subprocess.Popen failed to create PIPE stderr")
stderr_thread = _start_pump_thread(
- proc.stderr, stderr_lines, _STDERR, on_output_line
+ proc.stderr, stderr_lines, _STDERR, _on_output_line
)
writer_thread = _start_writer_thread(proc, prompt)
@@ -681,9 +731,10 @@ def _run_agent_blocking(
returncode=None if timed_out else returncode,
elapsed=time.monotonic() - start,
log_file=log_file,
+ result_text=result_text or _extract_result_text_from_lines(stdout_lines),
timed_out=timed_out,
- captured_stdout=stdout,
- captured_stderr=stderr,
+ captured_stdout=stdout if capture_stdout_text else None,
+ captured_stderr=stderr if capture_stderr_text else None,
)
@@ -694,21 +745,35 @@ def execute_agent(
timeout: float | None,
log_dir: Path | None,
iteration: int,
+ adapter: CLIAdapter | None = None,
on_activity: ActivityCallback | None = None,
on_output_line: OutputLineCallback | None = None,
+ capture_result_text: bool = False,
+ capture_stdout: bool | None = None,
) -> AgentResult:
"""Run the agent subprocess, auto-selecting streaming or blocking mode.
- Uses streaming mode for agents that support ``--output-format stream-json``
- (e.g. Claude Code); all other agents use the blocking path that drains
- stdout and stderr via reader threads. The *on_activity* callback is
- only invoked in streaming mode; *on_output_line* fires for both modes
- as raw lines arrive.
+ The *adapter* argument (or :func:`select_adapter` when omitted) decides
+ which execution path runs: adapters whose ``supports_streaming`` flag is
+ True take the line-streaming path that drives ``on_activity`` callbacks;
+ all others take the blocking path with concurrent stdout/stderr drain.
+ ``adapter.build_command(cmd)`` is applied before spawning, so the CLI
+ receives any flags the adapter requires (e.g. Claude's
+ ``--output-format stream-json --verbose`` or Codex's ``--json``).
This is the single entry point the engine should use — callers don't need
to know which execution mode is selected.
"""
- if _supports_stream_json(cmd):
+ if adapter is None:
+ adapter = select_adapter(cmd)
+ cmd = adapter.build_command(cmd)
+ supports_streaming = adapter.supports_streaming
+ if capture_stdout is None:
+ capture_stdout = log_dir is not None or (
+ not supports_streaming and on_output_line is None and capture_result_text
+ )
+
+ if supports_streaming:
return _run_agent_streaming(
cmd,
prompt,
@@ -717,6 +782,8 @@ def execute_agent(
iteration,
on_activity=on_activity,
on_output_line=on_output_line,
+ capture_result_text=capture_result_text,
+ capture_stdout=capture_stdout,
)
return _run_agent_blocking(
cmd,
@@ -725,4 +792,6 @@ def execute_agent(
log_dir,
iteration,
on_output_line=on_output_line,
+ capture_result_text=capture_result_text,
+ capture_stdout=capture_stdout,
)
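
The capture/pipe flag derivation this hunk introduces is easiest to sanity-check in isolation. A minimal sketch — `derive_flags` is a hypothetical free function restating the four boolean expressions from `_run_agent_blocking`, not part of the module:

```python
def derive_flags(log_dir, capture_stdout, on_output_line, capture_result_text):
    # Buffering flags (capture_*) decide whether lines accumulate in lists;
    # piping flags (pipe_*) decide whether the child's stream is PIPE'd at all.
    capture_stdout_text = log_dir is not None or capture_stdout
    capture_stderr_text = log_dir is not None
    pipe_stdout = (
        capture_stdout_text or on_output_line is not None or capture_result_text
    )
    pipe_stderr = capture_stderr_text or on_output_line is not None
    return capture_stdout_text, capture_stderr_text, pipe_stdout, pipe_stderr

# No subscribers: nothing piped, the child writes straight to the terminal.
print(derive_flags(None, False, None, False))  # (False, False, False, False)
# Result-text parsing alone pipes stdout but never buffers it.
print(derive_flags(None, False, None, True))   # (False, False, True, False)
```

Note the asymmetry: `capture_result_text` forces a stdout pipe (so lines can be scanned) without forcing buffering, which is exactly what keeps memory bounded for long runs.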
diff --git a/src/ralphify/_console_emitter.py b/src/ralphify/_console_emitter.py
index c58d0224..ac3bf26d 100644
--- a/src/ralphify/_console_emitter.py
+++ b/src/ralphify/_console_emitter.py
@@ -42,8 +42,8 @@
RunStoppedData,
)
from ralphify import _brand
-from ralphify._agent import CLAUDE_BINARY
from ralphify._output import format_count, format_duration
+from ralphify.adapters import select_adapter
_ICON_SUCCESS = "✓"
_ICON_FAILURE = "✗"
@@ -106,18 +106,22 @@
f"shift+{PEEK_TOGGLE_KEY} for full view[/]"
)
-# ── Claude binary detection ───────────────────────────────────────────
+# ── Adapter-driven structured-output detection ────────────────────────
-def _is_claude_command(agent: str) -> bool:
- """Return True if *agent* is a Claude Code command."""
+def _agent_renders_structured_peek(agent: str) -> bool:
+ """Return True if *agent*'s adapter feeds the structured peek panel.
+
+ Drives the ``ConsoleEmitter`` choice between :class:`_IterationPanel`
+ (structured) and :class:`_IterationSpinner` (raw). Delegates to
+ :func:`select_adapter` so adding a new CLI with peek-panel support
+ requires no edits here.
+ """
try:
- parts = shlex.split(agent)
+ cmd = shlex.split(agent)
except ValueError:
return False
- if not parts:
- return False
- return Path(parts[0]).stem == CLAUDE_BINARY
+ return select_adapter(cmd).renders_structured_peek
# ── Tool argument abbreviation ────────────────────────────────────────
@@ -169,7 +173,7 @@ def _format_params(tool_input: dict[str, Any], keys: list[str]) -> str:
val = tool_input.get(key)
if val is not None:
parts.append(f"{key}: {val}")
- return " · ".join(parts) if parts else ""
+ return " · ".join(parts)
def _extract_file_path(i: dict[str, Any]) -> str:
@@ -353,6 +357,11 @@ def freeze(self, outcome: str) -> None:
self._end = time.monotonic()
self._outcome = outcome
+ @property
+ def outcome(self) -> str | None:
+ """The frozen-iteration outcome label, or ``None`` while live."""
+ return self._outcome
+
# ── Scroll buffer management ─────────────────────────────────────
def add_scroll_line(self, markup: str) -> None:
@@ -407,8 +416,7 @@ def _build_body(self) -> Group:
"""Body group: scroll lines (or peek message) + spacer + footer."""
rows: list[Any] = []
if self._peek_visible:
- visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]
- for line in visible:
+ for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:
line.no_wrap = True
line.overflow = "ellipsis"
rows.append(line)
@@ -483,10 +491,8 @@ def apply(self, raw: dict[str, Any]) -> None:
)
def _apply_assistant(self, raw: dict[str, Any]) -> None:
- msg = raw.get("message", {})
-
# Update token counts from usage
- usage = msg.get("usage")
+ usage = raw.get("message", {}).get("usage")
if isinstance(usage, dict):
self._input_tokens = usage.get("input_tokens", self._input_tokens)
self._output_tokens = usage.get("output_tokens", self._output_tokens)
@@ -522,14 +528,14 @@ def _apply_assistant(self, raw: dict[str, Any]) -> None:
color, cat, arg = _tool_display(name, tool_input)
self._tool_categories[cat] = self._tool_categories.get(cat, 0) + 1
- # Pad short names to a fixed column so arguments line up;
- # longer names get a guaranteed two-space gap so the arg
- # never collides with the tool label.
- if len(name) < _TOOL_NAME_COL:
- name_col = f"{name:<{_TOOL_NAME_COL}}"
- else:
- name_col = f"{name} "
if arg:
+ # Pad short names to a fixed column so arguments line up;
+ # longer names get a guaranteed two-space gap so the arg
+ # never collides with the tool label.
+ if len(name) < _TOOL_NAME_COL:
+ name_col = f"{name:<{_TOOL_NAME_COL}}"
+ else:
+ name_col = f"{name} "
self.add_scroll_line(
f"[bold {color}]{escape_markup(name_col)}[/]"
f"[dim]{escape_markup(arg)}[/]"
@@ -550,16 +556,13 @@ def _apply_user(self, raw: dict[str, Any]) -> None:
def _format_tokens(self) -> str:
"""Format token counts as compact ctx/out string."""
parts: list[str] = []
- total_in = self._input_tokens
- if total_in > 0:
- parts.append(f"ctx {format_count(total_in)}")
+ if self._input_tokens > 0:
+ parts.append(f"ctx {format_count(self._input_tokens)}")
if self._output_tokens > 0:
parts.append(f"out {format_count(self._output_tokens)}")
return " · ".join(parts)
def _format_categories(self) -> str:
- if not self._tool_categories:
- return ""
parts = [f"{v} {k}" for k, v in self._tool_categories.items()]
return " · ".join(parts)
@@ -611,11 +614,10 @@ class _IterationSpinner(_LivePanelBase):
"""
def _build_footer(self) -> Table:
- line_count = len(self._scroll_lines)
summary = Text(no_wrap=True, overflow="ellipsis")
- if line_count > 0:
+ if self._scroll_lines:
summary.append(
- _plural(line_count, "line"),
+ _plural(len(self._scroll_lines), "line"),
style=f"bold {_brand.PURPLE}",
)
summary.append(" of agent output", style="dim")
@@ -632,6 +634,10 @@ def _build_footer(self) -> Table:
_FULLSCREEN_CHROME_ROWS = 2
_FULLSCREEN_MIN_VISIBLE = 3
+# Fallback terminal height used before the first render populates the
+# real value, and when ``Console.size.height`` access fails.
+_DEFAULT_CONSOLE_HEIGHT = 40
+
@dataclass(slots=True)
class _ScrollbarMetrics:
@@ -661,8 +667,8 @@ def _scrollbar_metrics(total: int, visible: int, offset: int) -> _ScrollbarMetri
if total <= visible:
return _ScrollbarMetrics(show=False, thumb_start=0, thumb_size=0)
thumb_size = max(1, visible * visible // total)
- max_off = max(total - visible, 1)
- frac = 1.0 - (offset / max_off)
+ # Safe: the early return above guarantees total > visible, so total - visible ≥ 1.
+ frac = 1.0 - (offset / (total - visible))
track_space = visible - thumb_size
thumb_start = int(frac * track_space)
return _ScrollbarMetrics(show=True, thumb_start=thumb_start, thumb_size=thumb_size)
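
The reworked fraction in `_scrollbar_metrics` can be exercised standalone. A sketch with illustrative names (a plain tuple instead of the module's dataclass):

```python
def scrollbar(total: int, visible: int, offset: int):
    # Mirrors _scrollbar_metrics: no bar when everything fits; thumb size
    # scales with visible**2 / total; offset 0 (newest line) pins the
    # thumb to the bottom of the track.
    if total <= visible:
        return None
    thumb_size = max(1, visible * visible // total)
    # total > visible here, so the divisor is at least 1.
    frac = 1.0 - offset / (total - visible)
    thumb_start = int(frac * (visible - thumb_size))
    return thumb_start, thumb_size

print(scrollbar(100, 10, 0))   # (9, 1): following the newest output
print(scrollbar(100, 10, 90))  # (0, 1): scrolled all the way back
```

The early return is what makes dropping the old `max(total - visible, 1)` clamp safe: the divisor can never reach zero.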
@@ -763,9 +769,8 @@ def scroll_up(self, lines: int = 1) -> None:
def scroll_down(self, lines: int = 1) -> None:
"""Scroll toward newer lines (offset shrinks)."""
- new_offset = max(0, self._offset - lines)
- self._offset = new_offset
- if new_offset == 0:
+ self._offset = max(0, self._offset - lines)
+ if self._offset == 0:
self._auto_scroll = True
def scroll_to_top(self) -> None:
@@ -774,54 +779,47 @@ def scroll_to_top(self) -> None:
self._auto_scroll = False
def scroll_to_bottom(self) -> None:
+ """Snap to the newest line and re-enable follow mode."""
self._offset = 0
self._auto_scroll = True
# ── Iteration navigation ─────────────────────────────────────────
- def _reset_view(self) -> None:
- """Snap to bottom + follow when switching iterations."""
- self._offset = 0
- self._auto_scroll = True
+ def _step_iteration(self, direction: int) -> bool:
+ """Move *direction* iterations (-1 = prev, +1 = next).
- def prev_iteration(self) -> bool:
- """Move to the iteration before the current one. Returns ``True``
- if the view changed; ``False`` when there is no older iteration."""
+ Returns ``True`` when the view changed; ``False`` when already at
+ the boundary in the requested direction. When the current
+ iteration was evicted from the navigator, snaps to the oldest
+ (prev) or newest (next) entry instead of failing.
+ """
ids = self._navigator.iteration_ids()
if not ids:
return False
if self._iteration_id not in ids:
- # Current iteration was evicted — snap to oldest available.
- self._iteration_id = ids[0]
- self._reset_view()
+ self._iteration_id = ids[0] if direction < 0 else ids[-1]
+ self.scroll_to_bottom()
return True
- idx = ids.index(self._iteration_id)
- if idx == 0:
+ new_idx = ids.index(self._iteration_id) + direction
+ if not 0 <= new_idx < len(ids):
return False
- self._iteration_id = ids[idx - 1]
- self._reset_view()
+ self._iteration_id = ids[new_idx]
+ self.scroll_to_bottom()
return True
+ def prev_iteration(self) -> bool:
+ """Move to the iteration before the current one. Returns ``True``
+ if the view changed; ``False`` when there is no older iteration."""
+ return self._step_iteration(-1)
+
def next_iteration(self) -> bool:
"""Move to the iteration after the current one. Returns ``True``
if the view changed; ``False`` when already on the newest."""
- ids = self._navigator.iteration_ids()
- if not ids:
- return False
- if self._iteration_id not in ids:
- self._iteration_id = ids[-1]
- self._reset_view()
- return True
- idx = ids.index(self._iteration_id)
- if idx >= len(ids) - 1:
- return False
- self._iteration_id = ids[idx + 1]
- self._reset_view()
- return True
+ return self._step_iteration(+1)
# ── Rendering ────────────────────────────────────────────────────
- _console_height: int = 40 # updated on every render
+ _console_height: int = _DEFAULT_CONSOLE_HEIGHT # updated on every render
def _build_header(self, total: int, visible: int) -> Text:
header = Text(no_wrap=True, overflow="ellipsis")
@@ -840,12 +838,12 @@ def _build_header(self, total: int, visible: int) -> Text:
header.append("live", style=f"italic {_brand.GREEN}")
else:
source = self._source
- outcome = source._outcome if source is not None else None
+ outcome = source.outcome if source is not None else None
if outcome:
header.append(" · ", style="dim")
header.append(outcome, style=f"italic {_brand.LAVENDER}")
header.append(" · ", style="dim")
- header.append(f"{_plural(total, 'line')}", style="dim")
+ header.append(_plural(total, "line"), style="dim")
if self._auto_scroll:
header.append(" · ", style="dim")
header.append("following", style=f"italic {_brand.GREEN}")
@@ -974,10 +972,10 @@ def __init__(self, console: Console) -> None:
# receiving events). ``None`` between iterations.
self._current_iteration: int | None = None
# Bounded ring buffer of finished iteration panels, keyed by
- # iteration number. Insertion order is tracked separately so
- # eviction is O(1). Used by fullscreen peek for browsing.
+    # iteration number. Python dicts preserve insertion order, so
+    # the oldest entry is always first; eviction relies on that.
+    # Fullscreen peek browses this history.
self._iteration_history: dict[int, _LivePanelBase] = {}
- self._iteration_order: list[int] = []
# Fullscreen peek state — a second Live using Rich's alt-screen
# that shows an iteration's full activity buffer with scroll +
# iteration-navigation controls. While fullscreen is active the
@@ -1044,10 +1042,7 @@ def iteration_ids(self) -> list[int]:
def panel_for(self, iteration_id: int) -> _LivePanelBase | None:
"""Look up the panel for *iteration_id* in history or active state."""
- if (
- self._current_iteration == iteration_id
- and self._active_renderable is not None
- ):
+ if self.is_live(iteration_id):
return self._active_renderable
return self._iteration_history.get(iteration_id)
@@ -1109,28 +1104,27 @@ def _archive_current_iteration_unlocked(self, outcome: str) -> None:
iteration_id = self._current_iteration
panel.freeze(outcome)
# Record (or refresh order of) the iteration in history.
- if iteration_id in self._iteration_history:
- self._iteration_order.remove(iteration_id)
+ # Pop-then-insert moves an existing entry to the end of the dict's
+ # insertion order so eviction always drops the oldest first.
+ self._iteration_history.pop(iteration_id, None)
self._iteration_history[iteration_id] = panel
- self._iteration_order.append(iteration_id)
# Eviction: drop oldest until at or below the cap, but skip the
# iteration the user is currently viewing in fullscreen.
viewing = (
- self._fullscreen_view._iteration_id
+ self._fullscreen_view.iteration_id
if self._fullscreen_view is not None
else None
)
- while len(self._iteration_order) > _MAX_HISTORY_ITERATIONS:
+ while len(self._iteration_history) > _MAX_HISTORY_ITERATIONS:
candidate = next(
- (iid for iid in self._iteration_order if iid != viewing),
+ (iid for iid in self._iteration_history if iid != viewing),
None,
)
if candidate is None:
# All remaining entries are the viewed iteration (impossible
# with one viewer) — bail to avoid an infinite loop.
break
- self._iteration_order.remove(candidate)
- self._iteration_history.pop(candidate, None)
+ self._iteration_history.pop(candidate)
self._active_renderable = None
self._current_iteration = None
@@ -1203,14 +1197,18 @@ def _panel_for_event(self, iteration: int | None) -> _LivePanelBase | None:
return self._active_renderable
def _on_agent_output_line(self, data: AgentOutputLineData) -> None:
+ # When we have structured rendering, raw lines are redundant noise.
+ # ``_structured_agent`` is write-once (set in ``_on_run_started``
+ # before any iteration events flow), so the check is lock-free —
+ # same pattern as ``_on_agent_activity``. Skipping the lock also
+ # avoids contention on every stdout line when running Claude.
+ if self._structured_agent:
+ return
with self._console_lock:
- # When we have structured rendering, raw lines are redundant noise.
- if self._structured_agent:
- return
- line = escape_markup(data["line"])
target = self._panel_for_event(data["iteration"])
if not isinstance(target, _IterationSpinner):
return
+ line = escape_markup(data["line"])
target.add_scroll_line(f"[white]{line}[/]")
self._refresh_live_unlocked(target)
@@ -1248,7 +1246,7 @@ def emit(self, event: Event) -> None:
def _on_run_started(self, data: RunStartedData) -> None:
ralph_name = data["ralph_name"]
agent = data["agent"]
- self._structured_agent = _is_claude_command(agent)
+ self._structured_agent = _agent_renders_structured_peek(agent)
with self._console_lock:
self._console.print(
f"\n[bold {_brand.PURPLE}]{_ICON_PLAY} Running:[/] [bold]{escape_markup(ralph_name)}[/]"
@@ -1296,6 +1294,15 @@ def _start_compact_live_unlocked(self, renderable: _LivePanelBase) -> None:
)
self._live.start()
+ def _stop_compact_live_unlocked(self) -> None:
+ """Stop the compact Live region if active. No-op otherwise.
+
+ Caller must hold ``_console_lock``.
+ """
+ if self._live is not None:
+ self._live.stop()
+ self._live = None
+
def _stop_live_unlocked(self) -> None:
"""Tear down all Live regions and forget the active iteration.
@@ -1309,9 +1316,7 @@ def _stop_live_unlocked(self) -> None:
self._fullscreen_live.stop()
self._fullscreen_live = None
self._fullscreen_view = None
- if self._live is not None:
- self._live.stop()
- self._live = None
+ self._stop_compact_live_unlocked()
self._active_renderable = None
self._current_iteration = None
@@ -1336,8 +1341,8 @@ def enter_fullscreen(self) -> bool:
if self._fullscreen_view is not None:
return True # already active — no-op
initial_id: int | None = self._current_iteration
- if initial_id is None and self._iteration_order:
- initial_id = self._iteration_order[-1]
+ if initial_id is None:
+ initial_id = next(reversed(self._iteration_history), None)
if initial_id is None or self.panel_for(initial_id) is None:
self._console.print("[dim]Full peek: no iterations yet[/]")
return False
@@ -1345,9 +1350,7 @@ def enter_fullscreen(self) -> bool:
self._fullscreen_view = view
# Stop the compact Live before taking over the terminal so
# the two Rich renderers don't fight for the same console.
- if self._live is not None:
- self._live.stop()
- self._live = None
+ self._stop_compact_live_unlocked()
self._fullscreen_live = Live(
view,
console=self._console,
@@ -1418,7 +1421,7 @@ def _fullscreen_page_size(self) -> int:
try:
height = self._console.size.height
except Exception:
- height = 40
+ height = _DEFAULT_CONSOLE_HEIGHT
return max(1, height - _FULLSCREEN_CHROME_ROWS - 2)
def handle_key(self, key: str) -> None:
@@ -1447,12 +1450,11 @@ def _handle_fullscreen_key(self, key: str) -> None:
if view is None:
return # raced with exit
if key not in ("q", FULLSCREEN_PEEK_KEY):
- page = self._fullscreen_page_size()
actions: dict[str, Callable[[], object]] = {
"j": lambda: view.scroll_down(1),
"k": lambda: view.scroll_up(1),
- " ": lambda: view.scroll_down(page),
- "b": lambda: view.scroll_up(page),
+ " ": lambda: view.scroll_down(self._fullscreen_page_size()),
+ "b": lambda: view.scroll_up(self._fullscreen_page_size()),
"g": view.scroll_to_top,
"G": view.scroll_to_bottom,
PREV_ITERATION_KEY: view.prev_iteration,
@@ -1472,9 +1474,9 @@ def _on_iteration_started(self, data: IterationStartedData) -> None:
with self._console_lock:
self._peek_broken = False
# Defensive: if a previous iteration didn't archive (engine
- # error), evict it now so we don't leak panel state.
- if self._active_renderable is not None:
- self._archive_current_iteration_unlocked("interrupted")
+ # error), evict it now so we don't leak panel state. The
+ # archive call no-ops when nothing is active.
+ self._archive_current_iteration_unlocked("interrupted")
self._current_iteration = iteration
renderable = self._create_panel_unlocked()
@@ -1524,9 +1526,7 @@ def _on_iteration_ended(
# underlying panel is preserved in history for fullscreen
# browsing. When fullscreen is active there is no compact
# Live to stop; the panel was buffering events directly.
- if self._live is not None:
- self._live.stop()
- self._live = None
+ self._stop_compact_live_unlocked()
self._archive_current_iteration_unlocked(outcome)
def do_print() -> None:
diff --git a/src/ralphify/_events.py b/src/ralphify/_events.py
index 2c80a53a..7e5a4b90 100644
--- a/src/ralphify/_events.py
+++ b/src/ralphify/_events.py
@@ -82,6 +82,11 @@ class EventType(Enum):
# ── Agent activity (live streaming) ─────────────────────────
AGENT_ACTIVITY = "agent_activity"
AGENT_OUTPUT_LINE = "agent_output_line"
+ TOOL_USE = "tool_use"
+
+ # ── Turn-cap enforcement ────────────────────────────────────
+ ITERATION_TURN_APPROACHING_LIMIT = "iteration_turn_approaching_limit"
+ ITERATION_TURN_CAPPED = "iteration_turn_capped"
# ── Other ───────────────────────────────────────────────────
LOG_MESSAGE = "log_message"
@@ -153,6 +158,23 @@ class AgentOutputLineData(TypedDict):
iteration: int
+class ToolUseData(TypedDict):
+ iteration: int
+ tool_name: str
+ count: int
+
+
+class TurnApproachingLimitData(TypedDict):
+ iteration: int
+ count: int
+ max_turns: int
+
+
+class TurnCappedData(TypedDict):
+ iteration: int
+ count: int
+
+
class LogMessageData(TypedDict):
message: str
level: LogLevel
@@ -169,6 +191,9 @@ class LogMessageData(TypedDict):
| PromptAssembledData
| AgentActivityData
| AgentOutputLineData
+ | ToolUseData
+ | TurnApproachingLimitData
+ | TurnCappedData
| LogMessageData
)
"""Union of all typed event data payloads."""
diff --git a/src/ralphify/_frontmatter.py b/src/ralphify/_frontmatter.py
index a361c241..737ca6ab 100644
--- a/src/ralphify/_frontmatter.py
+++ b/src/ralphify/_frontmatter.py
@@ -26,6 +26,14 @@
FIELD_ARGS = "args"
FIELD_CREDIT = "credit"
FIELD_RALPH = "ralph"
+# Promise config keeps the legacy key names. ``completion_signal`` stores the
+# inner promise text, not the surrounding ``<promise>...</promise>`` markup.
+FIELD_COMPLETION_SIGNAL = "completion_signal"
+FIELD_STOP_ON_COMPLETION_SIGNAL = "stop_on_completion_signal"
+
+# Per-iteration turn-cap configuration — see docs/specs/cli-adapter-layer.md.
+FIELD_MAX_TURNS = "max_turns"
+FIELD_MAX_TURNS_GRACE = "max_turns_grace"
# Sub-field names within each command mapping.
CMD_FIELD_NAME = "name"
@@ -108,8 +116,7 @@ def parse_frontmatter(text: str) -> tuple[dict[str, Any], str]:
Returns ``(frontmatter_dict, body_text)``.
"""
- if text.startswith(_UTF8_BOM):
- text = text.removeprefix(_UTF8_BOM)
+ text = text.removeprefix(_UTF8_BOM)
fm_raw, body = _extract_frontmatter_block(text)
if fm_raw:
try:
diff --git a/src/ralphify/_output.py b/src/ralphify/_output.py
index 766dfc33..9da59e6b 100644
--- a/src/ralphify/_output.py
+++ b/src/ralphify/_output.py
@@ -71,10 +71,9 @@ def collect_output(
parts: list[str] = []
for stream in (stdout, stderr):
if stream:
- text = ensure_str(stream)
if parts and not parts[-1].endswith("\n"):
parts.append("\n")
- parts.append(text)
+ parts.append(ensure_str(stream))
return "".join(parts)
@@ -130,8 +129,8 @@ def format_duration(seconds: float) -> str:
# latter silently drops 0.5s when the total is even (e.g. 90.5→90).
total = int(seconds + 0.5)
minutes = total // _SECONDS_PER_MINUTE
- secs = total % _SECONDS_PER_MINUTE
if minutes < _MINUTES_PER_HOUR:
+ secs = total % _SECONDS_PER_MINUTE
return f"{minutes}m {secs}s"
hours = minutes // _MINUTES_PER_HOUR
mins = minutes % _MINUTES_PER_HOUR
diff --git a/src/ralphify/_promise.py b/src/ralphify/_promise.py
new file mode 100644
index 00000000..362b78fa
--- /dev/null
+++ b/src/ralphify/_promise.py
@@ -0,0 +1,27 @@
+"""Parse promise completion tags emitted by agents."""
+
+from __future__ import annotations
+
+import re
+
+_PROMISE_TAG_RE = re.compile(r"<promise>(.*?)</promise>", re.DOTALL)
+
+
+def _normalize_promise_text(text: str) -> str:
+ """Collapse internal whitespace so config and tag payloads compare consistently."""
+ return " ".join(text.split())
+
+
+def parse_promise_tags(text: str | None) -> list[str]:
+ """Return normalized inner text from all well-formed promise tags in *text*."""
+ if not text:
+ return []
+ return [
+ _normalize_promise_text(match.group(1))
+ for match in _PROMISE_TAG_RE.finditer(text)
+ ]
+
+
+def has_promise_completion(text: str | None, completion_signal: str) -> bool:
+ """Return True when *text* contains a matching promise completion tag."""
+ return _normalize_promise_text(completion_signal) in parse_promise_tags(text)
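
Since the module is just two small functions, its behavior is easy to pin down with a standalone sketch. The ``<promise>`` tag markup is an assumption inferred from the module and field names in this diff:

```python
import re

# Standalone restatement of _promise.py for illustration only.
_TAG_RE = re.compile(r"<promise>(.*?)</promise>", re.DOTALL)

def _normalize(text: str) -> str:
    # Collapse internal whitespace runs so config and tag payloads
    # compare consistently regardless of line wrapping.
    return " ".join(text.split())

def parse_promise_tags(text):
    if not text:
        return []
    return [_normalize(m.group(1)) for m in _TAG_RE.finditer(text)]

def has_promise_completion(text, signal: str) -> bool:
    return _normalize(signal) in parse_promise_tags(text)

out = "tests green.\n<promise>RALPH_PROMISE_COMPLETE</promise>\n"
print(has_promise_completion(out, "RALPH_PROMISE_COMPLETE"))  # True
print(has_promise_completion(out, "something else"))          # False
```

Normalizing both sides means an agent that wraps the payload across lines still matches the configured `completion_signal`.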
diff --git a/src/ralphify/_resolver.py b/src/ralphify/_resolver.py
index 6281f5d6..2ec44762 100644
--- a/src/ralphify/_resolver.py
+++ b/src/ralphify/_resolver.py
@@ -35,11 +35,10 @@
def resolve_args(prompt: str, user_args: dict[str, str]) -> str:
"""Replace ``{{ args.name }}`` placeholders with user-supplied values.
- When *user_args* is empty, clears any remaining ``{{ args.* }}``
- placeholders so they don't leak into the assembled prompt.
+ Unknown names (and all placeholders when *user_args* is empty) resolve
+ to the empty string, so stray ``{{ args.* }}`` never leak into the
+ assembled prompt.
"""
- if not user_args:
- return _ARGS_RE.sub("", prompt)
def _replace(match: re.Match) -> str:
return user_args.get(match.group(1), "")
diff --git a/src/ralphify/_run_types.py b/src/ralphify/_run_types.py
index 411ec93b..161de77b 100644
--- a/src/ralphify/_run_types.py
+++ b/src/ralphify/_run_types.py
@@ -20,6 +20,9 @@
DEFAULT_COMMAND_TIMEOUT: float = 60
"""Default timeout in seconds for commands defined in RALPH.md frontmatter."""
+DEFAULT_COMPLETION_SIGNAL = "RALPH_PROMISE_COMPLETE"
+"""Default inner ``<promise>...</promise>`` text that marks promise completion."""
+
RUN_ID_LENGTH: int = 12
"""Number of hex characters used for generated run IDs."""
@@ -97,6 +100,14 @@ class RunConfig:
log_dir: Path | None = None
project_root: Path = field(default=Path("."))
credit: bool = True
+    # Inner text expected inside ``<promise>...</promise>``.
+ completion_signal: str = DEFAULT_COMPLETION_SIGNAL
+ # Stop the run when the configured promise payload is observed.
+ stop_on_completion_signal: bool = False
+ # Per-iteration tool-use cap; None disables the cap.
+ max_turns: int | None = None
+ # Soft wind-down fires at ``max_turns - max_turns_grace``.
+ max_turns_grace: int = 2
@dataclass(slots=True)
@@ -120,6 +131,7 @@ class RunState:
failed: int = 0
timed_out_count: int = 0
started_at: datetime | None = None
+ promise_completed: bool = False
_stop_event: threading.Event = field(
default_factory=threading.Event, init=False, repr=False, compare=False
@@ -134,27 +146,22 @@ def total(self) -> int:
return self.completed + self.failed
def __post_init__(self) -> None:
- # Set initially: the run starts in an unpaused (resumed) state.
self._resume_event.set()
def request_stop(self) -> None:
- """Signal the loop to stop after the current iteration."""
self._stop_event.set()
self._resume_event.set()
def request_pause(self) -> None:
- """Pause the loop between iterations until resumed."""
self.status = RunStatus.PAUSED
self._resume_event.clear()
def request_resume(self) -> None:
- """Resume a paused loop."""
self.status = RunStatus.RUNNING
self._resume_event.set()
@property
def stop_requested(self) -> bool:
- """Whether a stop has been requested."""
return self._stop_event.is_set()
def wait_for_stop(self, timeout: float | None = None) -> bool:
@@ -163,7 +170,6 @@ def wait_for_stop(self, timeout: float | None = None) -> bool:
@property
def paused(self) -> bool:
- """Whether the run is currently paused."""
return not self._resume_event.is_set()
def wait_for_unpause(self, timeout: float | None = None) -> bool:
@@ -171,11 +177,9 @@ def wait_for_unpause(self, timeout: float | None = None) -> bool:
return self._resume_event.wait(timeout=timeout)
def mark_completed(self) -> None:
- """Record a successful iteration."""
self.completed += 1
def mark_failed(self) -> None:
- """Record a failed iteration."""
self.failed += 1
def mark_timed_out(self) -> None:
diff --git a/src/ralphify/adapters/__init__.py b/src/ralphify/adapters/__init__.py
new file mode 100644
index 00000000..a21ab80a
--- /dev/null
+++ b/src/ralphify/adapters/__init__.py
@@ -0,0 +1,66 @@
+"""Pluggable CLI adapter layer.
+
+Each supported agent CLI (Claude, Codex, Copilot, ...) implements the
+:class:`CLIAdapter` protocol in its own module under this package. The
+engine dispatches on :func:`select_adapter` at run time, so adding a new
+CLI means writing one file and registering it in :data:`ADAPTERS` — no
+edits to the engine, emitter, or subprocess machinery.
+
+Adapters translate the CLI's native output format to a common
+:class:`AdapterEvent` stream and advertise capability flags so the core
+loop can gracefully degrade when a CLI lacks structured output or hook
+injection. Process lifecycle (spawn, SIGTERM at cap, reap) stays in
+``_agent.py``; adapters only observe.
+
+The Protocol, :class:`AdapterEvent`, and :data:`ADAPTERS` live in
+:mod:`_protocol` so concrete adapter modules can depend on them
+without cycling through this package's ``__init__``.
+"""
+
+from __future__ import annotations
+
+from ralphify.adapters._protocol import (
+ ADAPTERS,
+ AdapterEvent,
+ AdapterEventKind,
+ CLIAdapter,
+ CountsWhat,
+)
+
+
+def select_adapter(cmd: list[str]) -> CLIAdapter:
+ """Return the first registered adapter that claims *cmd*.
+
+ Falls back to :class:`GenericAdapter` when nothing matches. Never
+ returns None — callers can always dispatch safely.
+ """
+ from ralphify.adapters._generic import GenericAdapter
+
+ for adapter in ADAPTERS:
+ if adapter.matches(cmd):
+ return adapter
+ return GenericAdapter()
+
+
+def _register_builtin_adapters() -> None:
+ """Import concrete adapter modules so their ``ADAPTERS.append`` runs.
+
+ Keeps the registry populated without forcing callers to import every
+ adapter module manually. Imports are deferred to the bottom of this
+ module (executed once at first package import) so cyclic-import risk
+ is contained.
+ """
+ from ralphify.adapters import claude, codex, copilot # noqa: F401
+
+
+_register_builtin_adapters()
+
+
+__all__ = [
+ "ADAPTERS",
+ "AdapterEvent",
+ "AdapterEventKind",
+ "CLIAdapter",
+ "CountsWhat",
+ "select_adapter",
+]
diff --git a/src/ralphify/adapters/_generic.py b/src/ralphify/adapters/_generic.py
new file mode 100644
index 00000000..3bffb514
--- /dev/null
+++ b/src/ralphify/adapters/_generic.py
@@ -0,0 +1,70 @@
+"""Fallback adapter for CLIs with no dedicated implementation.
+
+Returned by :func:`ralphify.adapters.select_adapter` when no specific
+adapter's ``matches`` returns True. All capability flags are False, so
+the core loop treats sessions as blocking, untyped, and uncappable.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import AdapterEvent, CountsWhat
+
+
+class GenericAdapter:
+ """No-op adapter: pass commands through unchanged, parse nothing."""
+
+ name: str = "generic"
+ counts_what: CountsWhat = "none"
+ supports_streaming: bool = False
+ renders_structured_peek: bool = False
+ supports_soft_wind_down: bool = False
+ # Untyped agents have no streaming result event; the engine must keep
+ # the full stdout buffer if it wants promise detection.
+ requires_full_stdout_for_completion: bool = True
+
+ def matches(self, cmd: list[str]) -> bool:
+ return False
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ return list(cmd)
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ return None
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan the full stdout for the promise tag.
+
+ Unknown CLIs have no event schema to parse, so the whole-stdout
+ regex scan is the only reliable path. Matches the current
+ engine-side behavior so switching to adapter-owned detection does
+ not regress promise completion for untyped agents.
+
+        *result_text* is unused (the blocking path does not populate it
+        for unknown CLIs); the adapter advertises
+        ``requires_full_stdout_for_completion`` so the engine supplies
+        *stdout* whenever promise detection is requested.
+ """
+ del result_text
+ if stdout is None:
+ return False
+ return has_promise_completion(stdout, user_signal)
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ raise NotImplementedError(
+ "GenericAdapter does not support soft wind-down; max_turns will hard-kill."
+ )
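The `requires_full_stdout_for_completion` flag exists so the engine buffers transcripts only when someone will scan them. A minimal sketch of that gate, mirroring the inline decision in `engine.py` (`should_capture_stdout` is a hypothetical helper name):

```python
def should_capture_stdout(
    *, logging_enabled: bool, stop_on_signal: bool, needs_full_stdout: bool
) -> bool:
    """Buffer the full transcript only when a downstream consumer needs the
    bytes: log writing, or promise detection for adapters that cannot work
    from the streaming result text alone."""
    return logging_enabled or (stop_on_signal and needs_full_stdout)


# Generic/Codex-style adapter with promise detection enabled: must buffer.
print(should_capture_stdout(logging_enabled=False, stop_on_signal=True, needs_full_stdout=True))   # True
# Claude-style adapter: the streaming result text suffices, skip the buffer.
print(should_capture_stdout(logging_enabled=False, stop_on_signal=True, needs_full_stdout=False))  # False
# Promise detection off: only logging forces capture.
print(should_capture_stdout(logging_enabled=False, stop_on_signal=False, needs_full_stdout=True))  # False
```

The gate is what keeps verbose streaming agents from regressing memory versus the prior tail-scan path.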
diff --git a/src/ralphify/adapters/_protocol.py b/src/ralphify/adapters/_protocol.py
new file mode 100644
index 00000000..e58f2a67
--- /dev/null
+++ b/src/ralphify/adapters/_protocol.py
@@ -0,0 +1,116 @@
+"""Adapter Protocol, event type, and registry.
+
+Concrete adapter modules (:mod:`claude`, :mod:`codex`, :mod:`copilot`,
+:mod:`_generic`) import from here rather than from the package
+``__init__``. The package ``__init__`` populates :data:`ADAPTERS` by
+importing concrete adapters, and those adapters need the Protocol
+before the ``__init__`` finishes executing — keeping the Protocol in a
+leaf module makes the import graph acyclic.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Literal, NamedTuple, Protocol, runtime_checkable
+
+
+AdapterEventKind = Literal["tool_use", "turn", "message", "result"]
+"""Categories of events an adapter can surface from a CLI's output stream."""
+
+CountsWhat = Literal["tool_use", "turn", "none"]
+"""What an adapter counts against ``max_turns`` — tool uses, turns, or nothing."""
+
+
+class AdapterEvent(NamedTuple):
+ """A single structured event parsed from a CLI's output stream.
+
+ ``kind`` is the event category; ``name`` carries a tool name for
+ ``tool_use`` events (``None`` otherwise); ``raw`` is the original
+ parsed JSON object so callers can inspect extra fields when needed.
+ """
+
+ kind: AdapterEventKind
+ name: str | None = None
+ raw: dict | None = None
+
+
+@runtime_checkable
+class CLIAdapter(Protocol):
+ """Protocol every CLI adapter must satisfy.
+
+ Adapters are stateless singletons: the same instance is reused for
+ every iteration of every run. Any per-iteration state lives in the
+ caller (``_agent.py``) — adapters only translate.
+ """
+
+ name: str
+ counts_what: CountsWhat
+ supports_streaming: bool
+ renders_structured_peek: bool
+ supports_soft_wind_down: bool
+ requires_full_stdout_for_completion: bool
+
+ def matches(self, cmd: list[str]) -> bool:
+ """Return True if this adapter handles the given agent command."""
+ ...
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Return the command with any adapter-required flags appended.
+
+ Idempotent: calling twice returns the same command.
+ """
+ ...
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Parse one line of stdout into an :class:`AdapterEvent`.
+
+        Returns ``None`` for lines that are not recognized events.
+ MUST NOT raise on malformed input (per FR-8).
+ """
+ ...
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Return True if the agent's final output contains the completion signal.
+
+ The signal is wrapped in ``...`` markup; the
+ inner text equals ``user_signal``.
+
+ Adapters receive both the streaming-extracted *result_text* (the
+ terminal assistant message, when the streaming path could parse one)
+ and the full *stdout* buffer (only present when the engine chose to
+ capture it). Adapters with ``requires_full_stdout_for_completion``
+ set False MUST be able to detect completion from *result_text* alone;
+ engines may pass ``stdout=None`` to skip the memory cost.
+ """
+ ...
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ """Write hook config files into *tempdir* and return env-var overrides.
+
+ Only called when ``supports_soft_wind_down`` is True. Adapters that
+ set the flag False may leave this unimplemented (a ``NotImplementedError``
+ is acceptable and is treated as a runtime downgrade to hard-cap-only).
+ """
+ ...
+
+
+ADAPTERS: list[CLIAdapter] = []
+"""Adapter registry, populated at import time by concrete adapter modules.
+
+Ordering matters: :func:`ralphify.adapters.select_adapter` returns the
+first adapter whose ``matches`` method returns True, with
+:class:`ralphify.adapters._generic.GenericAdapter` as a final catch-all.
+Specific adapters go first, generic last.
+"""
diff --git a/src/ralphify/adapters/claude.py b/src/ralphify/adapters/claude.py
new file mode 100644
index 00000000..49b0ad92
--- /dev/null
+++ b/src/ralphify/adapters/claude.py
@@ -0,0 +1,172 @@
+"""Claude Code adapter.
+
+Claude is the only CLI shipping a stable ``--output-format stream-json``
+protocol today; its structured events drive the peek panel and power
+per-event tool-use counting. Every Claude iteration emits:
+
+1. A ``system`` init event with the model name.
+2. Zero or more ``assistant`` messages whose ``content`` list may include
+ ``tool_use``, ``thinking``, and ``text`` blocks.
+3. Zero or more ``user`` messages (tool results echoed back).
+4. A terminal ``result`` event carrying the final assistant text.
+
+Tool-use counting is scoped to ``assistant`` messages; we ignore
+``tool_use`` blocks echoed back by ``user`` events so each invocation is
+counted exactly once.
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import ADAPTERS, AdapterEvent, CountsWhat
+
+
+CLAUDE_BINARY_STEM = "claude"
+"""Binary stem (``Path(cmd[0]).stem``) that identifies the Claude CLI."""
+
+_OUTPUT_FORMAT_FLAG = "--output-format"
+_OUTPUT_FORMAT_VALUE = "stream-json"
+_VERBOSE_FLAG = "--verbose"
+
+_EVENT_TYPE_ASSISTANT = "assistant"
+_EVENT_TYPE_RESULT = "result"
+_BLOCK_TYPE_TOOL_USE = "tool_use"
+
+
+class ClaudeAdapter:
+ """Parses Claude's stream-json output and supports soft wind-down."""
+
+ name: str = "claude"
+ counts_what: CountsWhat = "tool_use"
+ supports_streaming: bool = True
+ renders_structured_peek: bool = True
+ supports_soft_wind_down: bool = True
+ # Claude's final assistant text already arrives as ``agent.result_text``
+ # via the stream-json ``result`` event, so the engine does not need to
+ # buffer the full stdout to scan for the promise tag.
+ requires_full_stdout_for_completion: bool = False
+
+ def matches(self, cmd: list[str]) -> bool:
+ if not cmd:
+ return False
+ return Path(cmd[0]).stem == CLAUDE_BINARY_STEM
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Ensure ``--output-format stream-json --verbose`` is present.
+
+        Idempotent: running twice yields the same command. If the caller
+        already supplied ``--output-format``, the existing value is
+ overwritten with ``stream-json`` — we cannot honor a user-chosen
+ format while still emitting a parseable event stream.
+ """
+ result = list(cmd)
+ try:
+ format_index = result.index(_OUTPUT_FORMAT_FLAG)
+ except ValueError:
+ result.extend([_OUTPUT_FORMAT_FLAG, _OUTPUT_FORMAT_VALUE])
+ else:
+ value_index = format_index + 1
+ if value_index < len(result):
+ result[value_index] = _OUTPUT_FORMAT_VALUE
+ else:
+ result.append(_OUTPUT_FORMAT_VALUE)
+ if _VERBOSE_FLAG not in result:
+ result.append(_VERBOSE_FLAG)
+ return result
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Parse one stream-json line into an :class:`AdapterEvent`.
+
+ Empty lines, non-JSON payloads, and non-dict JSON return ``None``.
+ ``result`` events return ``AdapterEvent(kind="result")``. An
+ ``assistant`` event whose content contains a ``tool_use`` block
+ returns the first such block as ``AdapterEvent(kind="tool_use")``;
+ Claude emits one tool_use block per assistant message, so
+ single-event dispatch matches the protocol. Every other parsed
+ event dict — including non-tool-use ``assistant`` messages —
+ returns ``AdapterEvent(kind="message")`` so callers can still
+ render them without counting against the turn cap.
+ """
+ stripped = line.strip()
+ if not stripped:
+ return None
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ return None
+ if not isinstance(parsed, dict):
+ return None
+
+ event_type = parsed.get("type")
+ if event_type == _EVENT_TYPE_RESULT:
+ return AdapterEvent(kind="result", raw=parsed)
+ if event_type != _EVENT_TYPE_ASSISTANT:
+ return AdapterEvent(kind="message", raw=parsed)
+
+ for block in _iter_content_blocks(parsed):
+ if block.get("type") == _BLOCK_TYPE_TOOL_USE:
+ name = block.get("name")
+ return AdapterEvent(
+ kind="tool_use",
+ name=name if isinstance(name, str) else None,
+ raw=parsed,
+ )
+ return AdapterEvent(kind="message", raw=parsed)
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan the streaming-extracted result text for ``{signal}``.
+
+ Claude's terminal ``result`` event carries the last assistant
+ message as a plain string, which the streaming reader already
+ captures into :attr:`AgentResult.result_text`. Using *result_text*
+ directly avoids buffering the full stdout — large transcripts can
+ run into many megabytes per iteration.
+
+        Only the parsed result text is considered — raw JSON from
+        ``system`` or ``assistant`` messages can legitimately echo
+        ``...`` substrings that must not trigger
+        completion.
+
+ *stdout* is unused (Claude does not need a fallback because the
+ streaming path always populates *result_text* on a successful run);
+ it stays in the signature for protocol parity.
+ """
+ del stdout
+ if result_text is None:
+ return False
+ return has_promise_completion(result_text, user_signal)
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ raise NotImplementedError(
+ "Claude soft wind-down (settings.json PreToolUse hook) is scheduled "
+ "for Phase 3 of the CLI adapter layer spec."
+ )
+
+
+def _iter_content_blocks(raw: dict) -> list[dict]:
+ """Return the ``message.content`` list, filtered to dict blocks only."""
+ message = raw.get("message")
+ if not isinstance(message, dict):
+ return []
+ content = message.get("content")
+ if not isinstance(content, list):
+ return []
+ return [block for block in content if isinstance(block, dict)]
+
+
+ADAPTERS.append(ClaudeAdapter())
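The flag-injection logic in `build_command` can be exercised standalone. This sketch mirrors it as a free function (`ensure_stream_json` is a hypothetical name, not the adapter method):

```python
def ensure_stream_json(cmd: list[str]) -> list[str]:
    out = list(cmd)
    try:
        i = out.index("--output-format")
    except ValueError:
        out.extend(["--output-format", "stream-json"])  # flag absent: append pair
    else:
        # Flag present: overwrite (or supply) the value so parsing still works.
        if i + 1 < len(out):
            out[i + 1] = "stream-json"
        else:
            out.append("stream-json")
    if "--verbose" not in out:
        out.append("--verbose")
    return out


once = ensure_stream_json(["claude", "-p"])
print(once)  # ['claude', '-p', '--output-format', 'stream-json', '--verbose']
print(ensure_stream_json(once) == once)  # True: applying twice changes nothing
print(ensure_stream_json(["claude", "--output-format", "text"]))
# ['claude', '--output-format', 'stream-json', '--verbose']
```

Overwriting a user-supplied format (rather than erroring) is the deliberate trade-off: the adapter cannot parse events from any other format.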
diff --git a/src/ralphify/adapters/codex.py b/src/ralphify/adapters/codex.py
new file mode 100644
index 00000000..6ee9069c
--- /dev/null
+++ b/src/ralphify/adapters/codex.py
@@ -0,0 +1,198 @@
+"""Codex CLI adapter.
+
+Codex emits newline-delimited JSON with explicit event types:
+
+- ``TurnStarted`` / ``TurnCompleted`` — conversation turn boundaries.
+- ``CollabToolCall`` / ``McpToolCall`` — tool invocations initiated by
+ the agent.
+- ``CommandExecution`` — shell commands run inside the sandbox.
+
+We map every tool-call event to ``AdapterEvent(kind="tool_use", ...)``
+so the turn-cap counter uses the same user-facing metric across CLIs.
+Turn boundaries surface as ``kind="turn"`` events for callers that want
+them (``TurnCompleted`` is classified as ``result`` because it may carry
+final text); turns do not count against ``max_turns`` today (counts_what
+is ``tool_use``, not ``turn``, for a unified metric — see spec Q13).
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import ADAPTERS, AdapterEvent, CountsWhat
+
+
+CODEX_BINARY_STEM = "codex"
+"""Binary stem (``Path(cmd[0]).stem``) that identifies the Codex CLI."""
+
+_JSON_FLAG = "--json"
+"""Flag appended to request newline-delimited JSON output."""
+
+_TURN_EVENTS: frozenset[str] = frozenset({"TurnStarted", "TurnCompleted"})
+_TOOL_CALL_EVENTS: frozenset[str] = frozenset(
+ {"CollabToolCall", "McpToolCall", "CommandExecution"}
+)
+_RESULT_EVENTS: frozenset[str] = frozenset({"TaskComplete", "TurnCompleted"})
+
+
+class CodexAdapter:
+ """Parses Codex's ``--json`` event stream."""
+
+ name: str = "codex"
+ counts_what: CountsWhat = "tool_use"
+ supports_streaming: bool = True
+ # Codex emits structured JSON that the streaming execution path parses
+ # for activity callbacks, but the console peek panel only understands
+ # Claude's stream-json schema today. Keep peek in raw-line mode until
+ # the emitter can render Codex events directly.
+ renders_structured_peek: bool = False
+ supports_soft_wind_down: bool = True
+ # Codex's terminal text lives inside ``TaskComplete`` / ``TurnCompleted``
+ # events, which the streaming reader does not extract into
+ # ``agent.result_text``. The full stdout buffer is currently the only
+ # source for promise-tag scanning.
+ requires_full_stdout_for_completion: bool = True
+
+ def matches(self, cmd: list[str]) -> bool:
+ if not cmd:
+ return False
+ return Path(cmd[0]).stem == CODEX_BINARY_STEM
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Append ``--json`` to request structured output. Idempotent."""
+ result = list(cmd)
+ if _JSON_FLAG not in result:
+ result.append(_JSON_FLAG)
+ return result
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Classify one JSONL line as turn / tool_use / message / result.
+
+ Unknown event types return ``AdapterEvent(kind="message", ...)`` so
+ callers can still render them (e.g. peek panel) without counting
+ them against the turn cap. Malformed lines return ``None``.
+ """
+ stripped = line.strip()
+ if not stripped:
+ return None
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ return None
+ if not isinstance(parsed, dict):
+ return None
+
+ event_type = _event_type(parsed)
+ if event_type in _TOOL_CALL_EVENTS:
+ return AdapterEvent(
+ kind="tool_use",
+ name=_tool_name(parsed, event_type),
+ raw=parsed,
+ )
+ if event_type in _RESULT_EVENTS:
+ return AdapterEvent(kind="result", raw=parsed)
+ if event_type in _TURN_EVENTS:
+ return AdapterEvent(kind="turn", raw=parsed)
+ return AdapterEvent(kind="message", raw=parsed)
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan every ``TurnCompleted`` / ``TaskComplete`` event for the promise tag.
+
+        Codex does not carry a single terminal ``result`` string the way
+        Claude does; completion may be spread across assistant text in
+        multiple events. The whole-stdout scan runs first (safe because
+        promise tags are explicit, non-ambiguous markup), then each
+        result event's parsed text payload is checked as a fallback.
+
+        *result_text* is unused — Codex never populates it through the
+        streaming reader (no ``{"type":"result"}`` lines). The adapter
+        advertises ``requires_full_stdout_for_completion`` so the engine
+        supplies *stdout* whenever promise detection is requested.
+ """
+ del result_text
+ if stdout is None:
+ return False
+ if has_promise_completion(stdout, user_signal):
+ return True
+ for line in stdout.splitlines():
+ stripped = line.strip()
+ if not stripped:
+ continue
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ continue
+ if isinstance(parsed, dict) and _event_type(parsed) in _RESULT_EVENTS:
+ text = _event_text_payload(parsed)
+ if text and has_promise_completion(text, user_signal):
+ return True
+ return False
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ raise NotImplementedError(
+ "Codex soft wind-down (hooks.json PostToolUse) is scheduled "
+ "for Phase 3 of the CLI adapter layer spec."
+ )
+
+
+def _event_type(parsed: dict) -> str | None:
+ """Return the Codex event type, whether top-level or nested under ``type``."""
+ event_type = parsed.get("type") or parsed.get("kind")
+ if isinstance(event_type, str):
+ return event_type
+ msg = parsed.get("msg")
+ if isinstance(msg, dict):
+ nested = msg.get("type") or msg.get("kind")
+ if isinstance(nested, str):
+ return nested
+ return None
+
+
+def _tool_name(parsed: dict, event_type: str | None) -> str | None:
+ """Best-effort extraction of the tool name from a tool-call event.
+
+ Codex event shapes vary by tool type — ``CommandExecution`` carries a
+ command, ``CollabToolCall`` a tool name, ``McpToolCall`` a server +
+ tool. When no specific name is available, return the event type.
+ """
+ for key in ("name", "tool", "tool_name"):
+ value = parsed.get(key)
+ if isinstance(value, str):
+ return value
+ msg = parsed.get("msg")
+ if isinstance(msg, dict):
+ for key in ("name", "tool", "tool_name", "command"):
+ value = msg.get(key)
+ if isinstance(value, str):
+ return value
+ return event_type
+
+
+def _event_text_payload(parsed: dict) -> str | None:
+ """Extract any final-assistant text from a Codex result event."""
+ for key in ("result", "text", "content", "output"):
+ value = parsed.get(key)
+ if isinstance(value, str):
+ return value
+ msg = parsed.get("msg")
+ if isinstance(msg, dict):
+ for key in ("result", "text", "content", "output"):
+ value = msg.get(key)
+ if isinstance(value, str):
+ return value
+ return None
+
+
+ADAPTERS.append(CodexAdapter())
diff --git a/src/ralphify/adapters/copilot.py b/src/ralphify/adapters/copilot.py
new file mode 100644
index 00000000..deeb6cf1
--- /dev/null
+++ b/src/ralphify/adapters/copilot.py
@@ -0,0 +1,151 @@
+"""GitHub Copilot CLI adapter (alpha).
+
+The standalone ``copilot`` binary (GA 2026-02-25) ships a
+``--output-format json`` mode that is **only loosely documented**. This
+adapter does best-effort counting based on the empirical shapes seen in
+the ralphify test corpus; unknown event types return ``None`` rather
+than crashing.
+
+Capability matrix:
+
+- ``counts_what = "tool_use"`` with an alpha caveat — counting accuracy
+ depends on ongoing schema discovery (see :file:`docs/agents.md`).
+- ``supports_streaming = False`` — event schema is unverified, so the
+ adapter falls through the blocking path and avoids per-line parsing.
+- ``renders_structured_peek = False`` — peek panel stays in raw-line mode.
+- ``supports_soft_wind_down = False`` — Copilot has no hook system as of
+ 2026-04, so ``install_wind_down_hook`` raises :class:`NotImplementedError`
+ (which the engine downgrades to hard-cap-only).
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from ralphify._promise import has_promise_completion
+from ralphify.adapters._protocol import ADAPTERS, AdapterEvent, CountsWhat
+
+
+COPILOT_BINARY_STEM = "copilot"
+"""Binary stem (``Path(cmd[0]).stem``) that identifies the standalone Copilot CLI.
+
+Note: this is the GA ``copilot`` binary, NOT the ``gh copilot`` subcommand.
+The ``gh`` stem is deliberately excluded because ``gh`` hosts many other
+subcommands that have nothing to do with AI agents.
+"""
+
+_OUTPUT_FORMAT_FLAGS: tuple[str, ...] = ("--output-format", "json")
+
+_TOOL_USE_EVENT_TYPES: frozenset[str] = frozenset(
+ {"tool_use", "tool_call", "ToolCall", "ToolUse"}
+)
+_RESULT_EVENT_TYPES: frozenset[str] = frozenset(
+ {"result", "response", "final", "Final", "Complete"}
+)
+
+
+class CopilotAdapter:
+ """Best-effort adapter for the standalone Copilot CLI."""
+
+ name: str = "copilot"
+ counts_what: CountsWhat = "tool_use"
+ supports_streaming: bool = False
+ renders_structured_peek: bool = False
+ supports_soft_wind_down: bool = False
+ # Copilot runs on the blocking path with no stream parsing today; the
+ # promise tag must be located somewhere in the captured stdout.
+ requires_full_stdout_for_completion: bool = True
+
+ def matches(self, cmd: list[str]) -> bool:
+ if not cmd:
+ return False
+ return Path(cmd[0]).stem == COPILOT_BINARY_STEM
+
+ def build_command(self, cmd: list[str]) -> list[str]:
+ """Ensure ``--output-format json`` is present.
+
+        Idempotent: running twice yields the same command. If the caller
+        already supplied ``--output-format``, the existing value is
+ overwritten with ``json`` — we cannot honor a user-chosen format
+ while still emitting a parseable event stream.
+ """
+ result = list(cmd)
+ output_format_flag, output_format_value = _OUTPUT_FORMAT_FLAGS
+ try:
+ format_index = result.index(output_format_flag)
+ except ValueError:
+ result.extend(_OUTPUT_FORMAT_FLAGS)
+ else:
+ value_index = format_index + 1
+ if value_index < len(result):
+ result[value_index] = output_format_value
+ else:
+ result.append(output_format_value)
+ return result
+
+ def parse_event(self, line: str) -> AdapterEvent | None:
+ """Parse best-effort; return ``None`` for unknown shapes.
+
+ The Copilot event schema is ``[unverified]`` in the spec — this
+ method intentionally errs on the side of *not* inventing events
+ so the turn cap is never inflated by false positives.
+ """
+ stripped = line.strip()
+ if not stripped:
+ return None
+ try:
+ parsed = json.loads(stripped)
+ except json.JSONDecodeError:
+ return None
+ if not isinstance(parsed, dict):
+ return None
+
+ event_type = parsed.get("type") or parsed.get("event") or parsed.get("kind")
+ if not isinstance(event_type, str):
+ return None
+
+ if event_type in _TOOL_USE_EVENT_TYPES:
+ name = parsed.get("name") or parsed.get("tool")
+ return AdapterEvent(
+ kind="tool_use",
+ name=name if isinstance(name, str) else None,
+ raw=parsed,
+ )
+ if event_type in _RESULT_EVENT_TYPES:
+ return AdapterEvent(kind="result", raw=parsed)
+ return None
+
+ def extract_completion_signal(
+ self,
+ *,
+ result_text: str | None,
+ stdout: str | None,
+ user_signal: str,
+ ) -> bool:
+ """Scan the entire stdout for the promise tag.
+
+ Without a verified event schema there is no reliable per-event
+ extraction path; the whole-stdout scan is the safest fallback.
+ *result_text* is unused — Copilot runs on the blocking path and
+ does not produce a streaming result event today.
+ """
+ del result_text
+ if stdout is None:
+ return False
+ return has_promise_completion(stdout, user_signal)
+
+ def install_wind_down_hook(
+ self,
+ tempdir: Path,
+ counter_path: Path,
+ cap: int,
+ grace: int,
+ ) -> dict[str, str]:
+ raise NotImplementedError(
+ "Copilot CLI has no hook system as of 2026-04; max_turns "
+ "will hard-kill without soft wind-down signal."
+ )
+
+
+ADAPTERS.append(CopilotAdapter())
diff --git a/src/ralphify/cli.py b/src/ralphify/cli.py
index 6652d535..48ae28e0 100644
--- a/src/ralphify/cli.py
+++ b/src/ralphify/cli.py
@@ -32,6 +32,10 @@
FIELD_ARGS,
FIELD_COMMANDS,
FIELD_CREDIT,
+ FIELD_COMPLETION_SIGNAL,
+ FIELD_MAX_TURNS,
+ FIELD_MAX_TURNS_GRACE,
+ FIELD_STOP_ON_COMPLETION_SIGNAL,
RALPH_MARKER,
VALID_NAME_CHARS_MSG,
parse_frontmatter,
@@ -39,6 +43,7 @@
from ralphify._output import IS_WINDOWS
from ralphify._run_types import (
Command,
+ DEFAULT_COMPLETION_SIGNAL,
DEFAULT_COMMAND_TIMEOUT,
RunConfig,
RunState,
@@ -56,13 +61,11 @@
def _exit_error(msg: str) -> NoReturn:
- """Print an error in red and exit with code 1."""
_console.print(f"[red]{escape_markup(msg)}[/]")
raise typer.Exit(1)
def _is_nonempty_string(value: Any) -> bool:
- """Return True if *value* is a non-empty string (after stripping whitespace)."""
return isinstance(value, str) and bool(value.strip())
@@ -128,6 +131,10 @@ def _check_unique_name(name: str, seen: set[str], context: str) -> None:
run: git log --oneline -5
args:
- focus
+# Optional early exit: uncomment both lines and have the agent emit
+# COMPLETE when the task is done.
+# completion_signal: COMPLETE
+# stop_on_completion_signal: true
---
You are an autonomous coding agent running in a loop. Each iteration
@@ -146,6 +153,7 @@ def _check_unique_name(name: str, seen: set[str], context: str) -> None:
- Implement one thing per iteration
+- If you enable promise completion above, emit COMPLETE and exit
- Run tests and fix failures before committing
- Commit with a descriptive message and push
"""
@@ -202,7 +210,13 @@ def scaffold(
help="Directory name. If omitted, creates RALPH.md in the current directory.",
),
) -> None:
- """Scaffold a new ralph with a ready-to-customize template."""
+ """Scaffold a new ralph with a ready-to-customize template.
+
+ The template includes example commands and args plus an optional
+ completion-signal path you can enable with ``completion_signal`` and
+ ``stop_on_completion_signal`` when the agent should stop early by
+ emitting a matching ``...`` tag.
+ """
if name:
target_dir = Path.cwd() / name
target_dir.mkdir(exist_ok=True)
@@ -219,6 +233,13 @@ def scaffold(
_console.print(
f"[dim]Edit the file, then run:[/] ralph run {escape_markup(name or '.')}"
)
+ _console.print(
+ "[dim]Optional early exit:[/] uncomment "
+ f"{escape_markup(FIELD_COMPLETION_SIGNAL)} + "
+ f"{escape_markup(FIELD_STOP_ON_COMPLETION_SIGNAL)} "
+ "and emit "
+ f"{escape_markup('COMPLETE')}."
+ )
def _parse_user_args(
@@ -428,6 +449,71 @@ def _validate_credit(raw_credit: Any) -> bool:
return raw_credit
+def _validate_completion_signal(raw_signal: Any) -> str:
+ """Validate the inner ``...`` text from frontmatter."""
+ if raw_signal is None:
+ return DEFAULT_COMPLETION_SIGNAL
+ if not _is_nonempty_string(raw_signal):
+ _exit_error(f"'{FIELD_COMPLETION_SIGNAL}' must be a non-empty string.")
+ if raw_signal != raw_signal.strip():
+ _exit_error(
+ f"'{FIELD_COMPLETION_SIGNAL}' must not include leading or trailing whitespace."
+ )
+ if "<" in raw_signal or ">" in raw_signal:
+ _exit_error(
+ f"'{FIELD_COMPLETION_SIGNAL}' must be the text inside "
+ "..., not markup or a raw output fragment. "
+ "Example: completion_signal: COMPLETE"
+ )
+ return raw_signal
+
+
+def _validate_max_turns(raw: Any) -> int | None:
+ """Validate the ``max_turns`` frontmatter field.
+
+ Returns ``None`` when absent (no cap). Exits with an error when the
+ value is not a positive integer.
+ """
+ if raw is None:
+ return None
+ if isinstance(raw, bool) or not isinstance(raw, int) or raw < 1:
+ _exit_error(f"'{FIELD_MAX_TURNS}' must be a positive integer, got {raw!r}.")
+ return raw
+
+
+def _validate_max_turns_grace(raw: Any, max_turns: int | None) -> int:
+ """Validate the ``max_turns_grace`` field.
+
+ Defaults to ``2`` when absent. Must be a non-negative integer and
+ strictly less than *max_turns* (when *max_turns* is set).
+ """
+ if raw is None:
+ grace = 2
+ elif isinstance(raw, bool) or not isinstance(raw, int) or raw < 0:
+ _exit_error(
+ f"'{FIELD_MAX_TURNS_GRACE}' must be a non-negative integer, got {raw!r}."
+ )
+ else:
+ grace = raw
+ if max_turns is not None and grace >= max_turns:
+ _exit_error(
+ f"'{FIELD_MAX_TURNS_GRACE}' ({grace}) must be less than "
+ f"'{FIELD_MAX_TURNS}' ({max_turns})."
+ )
+ return grace
+
+
+def _validate_stop_on_completion_signal(raw_value: Any) -> bool:
+ """Validate the stop-on-completion-signal frontmatter field."""
+ if raw_value is None:
+ return False
+ if not isinstance(raw_value, bool):
+ _exit_error(
+ f"'{FIELD_STOP_ON_COMPLETION_SIGNAL}' must be true or false, got {raw_value!r}."
+ )
+ return raw_value
+
+
def _validate_run_options(
max_iterations: int | None,
delay: float,
@@ -471,6 +557,14 @@ def _build_run_config(
ralph_args = _parse_user_args(extra_args, declared_names)
credit = _validate_credit(fm.get(FIELD_CREDIT))
+ completion_signal = _validate_completion_signal(fm.get(FIELD_COMPLETION_SIGNAL))
+ stop_on_completion_signal = _validate_stop_on_completion_signal(
+ fm.get(FIELD_STOP_ON_COMPLETION_SIGNAL)
+ )
+ max_turns = _validate_max_turns(fm.get(FIELD_MAX_TURNS))
+ max_turns_grace = _validate_max_turns_grace(
+ fm.get(FIELD_MAX_TURNS_GRACE), max_turns
+ )
return RunConfig(
agent=agent,
@@ -485,6 +579,10 @@ def _build_run_config(
log_dir=Path(log_dir) if log_dir else None,
project_root=Path.cwd(),
credit=credit,
+ completion_signal=completion_signal,
+ stop_on_completion_signal=stop_on_completion_signal,
+ max_turns=max_turns,
+ max_turns_grace=max_turns_grace,
)
@@ -534,6 +632,10 @@ def run(
passed as user arguments. Use {{ args.name }} placeholders in
RALPH.md to reference them.
+ To stop before the iteration budget, set ``completion_signal`` and
+ ``stop_on_completion_signal`` in frontmatter and have the agent emit
+ the matching ``...`` tag.
+
Keybindings (interactive terminal):
p Toggle live peek of agent output (on by default)
P Enter full-screen peek — scroll the entire buffer
diff --git a/src/ralphify/engine.py b/src/ralphify/engine.py
index 1072ee05..71876cf5 100644
--- a/src/ralphify/engine.py
+++ b/src/ralphify/engine.py
@@ -13,6 +13,7 @@
import traceback
from datetime import datetime, timezone
from pathlib import Path
+from typing import Any
from ralphify._agent import execute_agent
from ralphify._events import (
@@ -45,6 +46,7 @@
)
from ralphify._resolver import resolve_all, resolve_args
from ralphify._runner import run_command
+from ralphify.adapters import select_adapter
_PAUSE_POLL_INTERVAL = 0.25 # seconds between pause/resume checks
@@ -52,7 +54,6 @@
def _field_hint(field_name: str) -> str:
- """Return a user-facing hint pointing to a frontmatter field."""
return f"Check the '{field_name}' field in your {RALPH_MARKER} frontmatter."
@@ -103,8 +104,6 @@ def _run_commands(
quoted_args = {k: shlex.quote(v) for k, v in user_args.items()}
for cmd in commands:
run_str = resolve_args(cmd.run, quoted_args)
- # Determine working directory: if the command starts with ./ it's
- # relative to the ralph directory, otherwise use project root.
if run_str.lstrip().startswith(_RELATIVE_CMD_PREFIX):
cwd = ralph_dir
else:
@@ -167,10 +166,10 @@ def _run_agent_phase(
config: RunConfig,
state: RunState,
emit: BoundEmitter,
-) -> bool:
+) -> tuple[bool, bool]:
"""Run the agent subprocess, update state counters, and emit the result event.
- Returns ``True`` when the agent exited successfully (code 0, no timeout).
+ Returns ``(agent_succeeded, stop_for_completion_signal)``.
"""
try:
cmd = shlex.split(config.agent)
@@ -179,9 +178,9 @@ def _run_agent_phase(
f"Invalid agent command syntax: {config.agent!r}. {_field_hint(FIELD_AGENT)}"
) from exc
- # Option C: recheck per-line so mid-iteration peek toggle takes effect.
- # When neither peek nor logging needs output, pass None so the blocking
- # path can inherit file descriptors (critical-01 contract).
+ adapter = select_adapter(cmd)
+ completion_signal = config.completion_signal
+
def _on_output_line(line: str, stream: OutputStream) -> None:
if emit.wants_agent_output_lines():
emit.agent_output_line(line, stream, state.iteration)
@@ -191,18 +190,35 @@ def _on_output_line(line: str, stream: OutputStream) -> None:
else:
on_output_line = None
+ # Capture full stdout only when somebody downstream actually needs the
+ # bytes — log writing, or promise detection for adapters that cannot
+ # work from ``agent.result_text`` alone. Without this gate every
+ # iteration would buffer the entire transcript even for verbose
+ # streaming agents, regressing memory vs the prior tail-scan path.
+ capture_stdout_for_promise = (
+ config.stop_on_completion_signal and adapter.requires_full_stdout_for_completion
+ )
+ capture_stdout = config.log_dir is not None or capture_stdout_for_promise
+
try:
+
+ def on_activity(data: dict[str, Any]) -> None:
+ emit(
+ EventType.AGENT_ACTIVITY,
+ AgentActivityData(raw=data, iteration=state.iteration),
+ )
+
agent = execute_agent(
cmd,
prompt,
timeout=config.timeout,
log_dir=config.log_dir,
iteration=state.iteration,
- on_activity=lambda data: emit(
- EventType.AGENT_ACTIVITY,
- AgentActivityData(raw=data, iteration=state.iteration),
- ),
+ adapter=adapter,
+ on_activity=on_activity,
on_output_line=on_output_line,
+ capture_result_text=True,
+ capture_stdout=capture_stdout,
)
except FileNotFoundError as exc:
raise FileNotFoundError(
@@ -210,6 +226,13 @@ def _on_output_line(line: str, stream: OutputStream) -> None:
) from exc
duration = format_duration(agent.elapsed)
+ promise_completed = agent.success and adapter.extract_completion_signal(
+ result_text=agent.result_text,
+ stdout=agent.captured_stdout,
+ user_signal=completion_signal,
+ )
+ if promise_completed:
+ state.promise_completed = True
if agent.timed_out:
state.mark_timed_out()
@@ -218,7 +241,13 @@ def _on_output_line(line: str, stream: OutputStream) -> None:
elif agent.success:
state.mark_completed()
event_type = EventType.ITERATION_COMPLETED
- state_detail = f"completed ({duration})"
+ if promise_completed:
+ state_detail = (
+ "completed via promise tag "
+ f"{completion_signal} ({duration})"
+ )
+ else:
+ state_detail = f"completed ({duration})"
else:
state.mark_failed()
event_type = EventType.ITERATION_FAILED
@@ -233,32 +262,36 @@ def _on_output_line(line: str, stream: OutputStream) -> None:
log_file=str(agent.log_file) if agent.log_file else None,
result_text=agent.result_text,
)
- # When logging captured output and peek was off (lines were not rendered
- # live), include captured output so the emitter can echo it after
- # stopping the Live spinner. When peek was on, lines were already shown.
- if not emit.wants_agent_output_lines() and config.log_dir is not None:
- ended_data["echo_stdout"] = agent.captured_stdout
- ended_data["echo_stderr"] = agent.captured_stderr
+ if not emit.wants_agent_output_lines():
+ # When peek was off, echo any captured raw output after the spinner
+ # stops so blocking agents do not appear silent. Structured agents
+ # already surface their parsed result_text, so avoid echoing raw JSON
+ # unless we explicitly captured logs.
+ if config.log_dir is not None:
+ ended_data["echo_stdout"] = agent.captured_stdout
+ ended_data["echo_stderr"] = agent.captured_stderr
+ elif agent.result_text is None and agent.captured_stdout is not None:
+ ended_data["echo_stdout"] = agent.captured_stdout
emit(event_type, ended_data)
- return agent.success
+ return agent.success, promise_completed and config.stop_on_completion_signal
def _run_iteration(
config: RunConfig,
state: RunState,
emit: BoundEmitter,
-) -> bool:
+) -> tuple[bool, bool]:
"""Execute one iteration of the agent loop.
- Returns ``True`` if the loop should continue, ``False`` when
- ``--stop-on-error`` triggers.
+    Returns ``(should_continue, stop_for_completion_signal)``:
+    - ``should_continue``: ``True`` if the loop should continue, ``False`` to break
+    - ``stop_for_completion_signal``: ``True`` if a completion signal ended the run early
"""
iteration = state.iteration
emit(EventType.ITERATION_STARTED, IterationStartedData(iteration=iteration))
- # Run commands and collect outputs for placeholder resolution
command_outputs: dict[str, str] = {}
if config.commands:
emit(
@@ -279,22 +312,22 @@ def _run_iteration(
),
)
- # Assemble prompt
prompt = _assemble_prompt(config, state, command_outputs)
emit(
EventType.PROMPT_ASSEMBLED,
PromptAssembledData(iteration=iteration, prompt_length=len(prompt)),
)
- # Run agent
- agent_succeeded = _run_agent_phase(prompt, config, state, emit)
+ agent_succeeded, stop_for_completion_signal = _run_agent_phase(
+ prompt, config, state, emit
+ )
if not agent_succeeded and config.stop_on_error:
state.status = RunStatus.FAILED
emit.log_error("Stopping due to --stop-on-error.")
- return False
+ return False, stop_for_completion_signal
- return True
+ return True, stop_for_completion_signal
def _delay_if_needed(config: RunConfig, state: RunState, emit: BoundEmitter) -> None:
@@ -346,14 +379,19 @@ def run_loop(
if not _handle_control_signals(state, emit):
break
- state.iteration += 1
if (
config.max_iterations is not None
- and state.iteration > config.max_iterations
+ and state.iteration >= config.max_iterations
):
break
+ state.iteration += 1
- should_continue = _run_iteration(config, state, emit)
+ should_continue, stop_for_completion_signal = _run_iteration(
+ config, state, emit
+ )
+ if stop_for_completion_signal:
+ state.status = RunStatus.COMPLETED
+ break
if not should_continue:
break
@@ -369,11 +407,10 @@ def run_loop(
if state.status == RunStatus.RUNNING:
state.status = RunStatus.COMPLETED
- reason = state.status.reason
emit(
EventType.RUN_STOPPED,
RunStoppedData(
- reason=reason,
+ reason=state.status.reason,
total=state.total,
completed=state.completed,
failed=state.failed,
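The loop changes above can be condensed into a small sketch: the iteration helper returns a `(should_continue, stop_for_completion_signal)` pair, the cap is checked before the counter is incremented, and a matched signal marks the run completed. The names here (`LoopState`, `run_loop`, the `LOOP_DONE` marker) are illustrative stand-ins, not the ralphify API.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    iteration: int = 0
    status: str = "running"


def run_iteration(state: LoopState, transcripts: list[str], signal: str) -> tuple[bool, bool]:
    """Return (should_continue, stop_for_completion_signal)."""
    text = transcripts[state.iteration - 1]
    return True, signal in text


def run_loop(transcripts: list[str], signal: str, max_iterations: int) -> LoopState:
    state = LoopState()
    while True:
        # Check the cap before incrementing, mirroring the >= fix above.
        if state.iteration >= max_iterations:
            break
        state.iteration += 1
        should_continue, stop = run_iteration(state, transcripts, signal)
        if stop:
            # A matched completion signal ends the run early, successfully.
            state.status = "completed"
            break
        if not should_continue:
            break
    return state
```

The early break happens before the `should_continue` check, so a signal detected on a failing-and-stopping iteration still wins.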
diff --git a/tests/conftest.py b/tests/conftest.py
index ce9e46ab..4d999714 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -2,8 +2,17 @@
import pytest
+from ralphify.adapters import ADAPTERS
+
@pytest.fixture(autouse=True)
def _disable_streaming(monkeypatch):
- """Disable the Popen-based streaming path in all tests."""
- monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: False)
+ """Force the blocking path and raw peek on every registered adapter.
+
+ Tests that explicitly need the Popen streaming path or structured
+ peek rendering re-enable the relevant flag on the specific adapter
+ they exercise.
+ """
+ for adapter in ADAPTERS:
+ monkeypatch.setattr(adapter, "supports_streaming", False)
+ monkeypatch.setattr(adapter, "renders_structured_peek", False)
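The fixture works because `monkeypatch.setattr` on an object records the prior attribute value and restores it at teardown, so flags flipped on long-lived registry singletons cannot leak between tests. A minimal sketch of that behavior, using a throwaway `Adapter` class and an explicit `pytest.MonkeyPatch` rather than the real registry:

```python
import pytest


class Adapter:
    supports_streaming = True


ADAPTERS = [Adapter(), Adapter()]

mp = pytest.MonkeyPatch()
for adapter in ADAPTERS:
    # Shadows the class attribute with a per-instance override.
    mp.setattr(adapter, "supports_streaming", False)

assert all(a.supports_streaming is False for a in ADAPTERS)

mp.undo()  # what pytest does automatically after each test
assert all(a.supports_streaming is True for a in ADAPTERS)
```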
diff --git a/tests/test_adapters_claude.py b/tests/test_adapters_claude.py
new file mode 100644
index 00000000..478cff7d
--- /dev/null
+++ b/tests/test_adapters_claude.py
@@ -0,0 +1,183 @@
+"""Tests for the Claude stream-json adapter."""
+
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from ralphify.adapters import select_adapter
+from ralphify.adapters.claude import ClaudeAdapter
+
+
+def _assistant_event(*blocks: dict) -> str:
+ """Build one JSON line matching Claude's assistant message schema."""
+ return json.dumps(
+ {
+ "type": "assistant",
+ "message": {"content": list(blocks)},
+ }
+ )
+
+
+def _result_event(result_text: str) -> str:
+ return json.dumps({"type": "result", "result": result_text})
+
+
+def test_matches_claude_binary_stem() -> None:
+ adapter = ClaudeAdapter()
+ assert adapter.matches(["claude"]) is True
+ assert adapter.matches(["/usr/local/bin/claude"]) is True
+ assert adapter.matches(["claude", "--print"]) is True
+ assert adapter.matches(["codex"]) is False
+ assert adapter.matches([]) is False
+
+
+def test_build_command_appends_stream_flags() -> None:
+ adapter = ClaudeAdapter()
+ result = adapter.build_command(["claude"])
+ assert result == ["claude", "--output-format", "stream-json", "--verbose"]
+
+
+def test_build_command_is_idempotent() -> None:
+ adapter = ClaudeAdapter()
+ once = adapter.build_command(["claude"])
+ twice = adapter.build_command(once)
+ assert once == twice
+
+
+def test_build_command_preserves_user_flags() -> None:
+ adapter = ClaudeAdapter()
+ result = adapter.build_command(["claude", "--print", "-p"])
+ assert result[:3] == ["claude", "--print", "-p"]
+ assert "--output-format" in result
+ assert "stream-json" in result
+
+
+def test_parse_tool_use_event() -> None:
+ adapter = ClaudeAdapter()
+ line = _assistant_event({"type": "tool_use", "name": "Bash", "input": {}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Bash"
+
+
+def test_parse_result_event() -> None:
+ adapter = ClaudeAdapter()
+ event = adapter.parse_event(_result_event("done"))
+ assert event is not None
+ assert event.kind == "result"
+
+
+def test_parse_ignores_thinking_blocks() -> None:
+ adapter = ClaudeAdapter()
+ line = _assistant_event({"type": "thinking", "thinking": "planning..."})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "message"
+
+
+def test_parse_malformed_json_returns_none() -> None:
+ adapter = ClaudeAdapter()
+ assert adapter.parse_event("not json") is None
+ assert adapter.parse_event("") is None
+ assert adapter.parse_event(" \n") is None
+
+
+def test_parse_non_dict_json_returns_none() -> None:
+ adapter = ClaudeAdapter()
+ assert adapter.parse_event("[1, 2, 3]") is None
+ assert adapter.parse_event('"just a string"') is None
+
+
+def test_parse_tool_use_with_non_string_name() -> None:
+ """Defensive: tool_use blocks with missing name must not raise."""
+ adapter = ClaudeAdapter()
+ line = _assistant_event({"type": "tool_use", "input": {}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name is None
+
+
+def test_parse_skips_first_non_tool_use_block() -> None:
+ adapter = ClaudeAdapter()
+ line = _assistant_event(
+ {"type": "text", "text": "thinking out loud"},
+ {"type": "tool_use", "name": "Edit", "input": {}},
+ )
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Edit"
+
+
+def test_extract_completion_signal_from_result_text() -> None:
+ adapter = ClaudeAdapter()
+ result_text = "DONE"
+ assert (
+ adapter.extract_completion_signal(
+ result_text=result_text, stdout=None, user_signal="DONE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=result_text, stdout=None, user_signal="OTHER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_ignores_raw_stdout() -> None:
+ """ClaudeAdapter only inspects ``result_text``; the streaming reader
+ extracts the terminal assistant message there. Promise tags embedded
+ in raw stdout (e.g. ``status`` or ``assistant`` JSON) must not trigger
+ completion."""
+ adapter = ClaudeAdapter()
+ stdout = "raw text MARKER trailing"
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=stdout, user_signal="MARKER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_handles_missing_result_text() -> None:
+ adapter = ClaudeAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=None, user_signal="DONE"
+ )
+ is False
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text="", stdout=None, user_signal="DONE"
+ )
+ is False
+ )
+
+
+def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None:
+ adapter = ClaudeAdapter()
+ with pytest.raises(NotImplementedError):
+ adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2)
+
+
+def test_capability_flags() -> None:
+ adapter = ClaudeAdapter()
+ assert adapter.name == "claude"
+ assert adapter.counts_what == "tool_use"
+ assert adapter.supports_streaming is True
+ assert adapter.renders_structured_peek is True
+ assert adapter.supports_soft_wind_down is True
+ assert adapter.requires_full_stdout_for_completion is False
+
+
+def test_registered_in_adapters_registry() -> None:
+ """Import side-effect registration should hand back the Claude adapter."""
+ selected = select_adapter(["claude"])
+ assert isinstance(selected, ClaudeAdapter)
diff --git a/tests/test_adapters_codex.py b/tests/test_adapters_codex.py
new file mode 100644
index 00000000..f6fcd485
--- /dev/null
+++ b/tests/test_adapters_codex.py
@@ -0,0 +1,159 @@
+"""Tests for the Codex CLI adapter."""
+
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from ralphify.adapters import select_adapter
+from ralphify.adapters.codex import CodexAdapter
+
+
+def test_matches_codex_binary_stem() -> None:
+ adapter = CodexAdapter()
+ assert adapter.matches(["codex"]) is True
+ assert adapter.matches(["/usr/local/bin/codex"]) is True
+ assert adapter.matches(["codex", "exec", "--sandbox"]) is True
+ assert adapter.matches(["claude"]) is False
+ assert adapter.matches([]) is False
+
+
+def test_build_command_appends_json_flag() -> None:
+ adapter = CodexAdapter()
+ assert adapter.build_command(["codex"]) == ["codex", "--json"]
+
+
+def test_build_command_is_idempotent() -> None:
+ adapter = CodexAdapter()
+ once = adapter.build_command(["codex"])
+ twice = adapter.build_command(once)
+ assert once == twice
+
+
+def test_parse_tool_call_events() -> None:
+ adapter = CodexAdapter()
+ for event_type, expected_name in [
+ ("CollabToolCall", "Edit"),
+ ("McpToolCall", "Edit"),
+ ("CommandExecution", "Edit"),
+ ]:
+ line = json.dumps({"type": event_type, "name": "Edit"})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == expected_name
+
+
+def test_parse_turn_events() -> None:
+ adapter = CodexAdapter()
+ for event_type in ("TurnStarted", "TurnCompleted"):
+ # TurnCompleted is a result *and* turn event; result wins.
+ line = json.dumps({"type": event_type})
+ event = adapter.parse_event(line)
+ assert event is not None
+ if event_type == "TurnCompleted":
+ assert event.kind == "result"
+ else:
+ assert event.kind == "turn"
+
+
+def test_parse_unknown_events_become_message() -> None:
+ adapter = CodexAdapter()
+ event = adapter.parse_event(json.dumps({"type": "SomethingNew"}))
+ assert event is not None
+ assert event.kind == "message"
+
+
+def test_parse_malformed_returns_none() -> None:
+ adapter = CodexAdapter()
+ assert adapter.parse_event("not json") is None
+ assert adapter.parse_event("") is None
+ assert adapter.parse_event("42") is None
+
+
+def test_parse_tool_call_nested_under_msg() -> None:
+ """Some Codex builds wrap event data under a ``msg`` key."""
+ adapter = CodexAdapter()
+ line = json.dumps({"msg": {"type": "CommandExecution", "command": "git status"}})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "git status"
+
+
+def test_parse_falls_back_to_event_type_for_name() -> None:
+ adapter = CodexAdapter()
+ line = json.dumps({"type": "CollabToolCall"})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.name == "CollabToolCall"
+
+
+def test_extract_completion_signal_from_stream() -> None:
+ adapter = CodexAdapter()
+ stream = "\n".join(
+ [
+ json.dumps({"type": "TurnStarted"}),
+ json.dumps({"type": "CommandExecution", "command": "ls"}),
+ json.dumps({"type": "TaskComplete", "text": "DONE"}),
+ ]
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=stream, user_signal="DONE"
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=stream, user_signal="OTHER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_scans_plain_output() -> None:
+ adapter = CodexAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None,
+ stdout="some HI text",
+ user_signal="HI",
+ )
+ is True
+ )
+
+
+def test_extract_completion_signal_returns_false_when_stdout_missing() -> None:
+ """When the engine elects not to capture stdout, Codex cannot detect
+ completion — the streaming reader does not populate ``result_text``
+ for Codex's ``TaskComplete`` event shape."""
+ adapter = CodexAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=None, user_signal="DONE"
+ )
+ is False
+ )
+
+
+def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None:
+ adapter = CodexAdapter()
+ with pytest.raises(NotImplementedError):
+ adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2)
+
+
+def test_capability_flags() -> None:
+ adapter = CodexAdapter()
+ assert adapter.name == "codex"
+ assert adapter.counts_what == "tool_use"
+ assert adapter.supports_streaming is True
+ assert adapter.renders_structured_peek is False
+ assert adapter.supports_soft_wind_down is True
+ assert adapter.requires_full_stdout_for_completion is True
+
+
+def test_registered_in_adapters_registry() -> None:
+ selected = select_adapter(["codex"])
+ assert isinstance(selected, CodexAdapter)
diff --git a/tests/test_adapters_copilot.py b/tests/test_adapters_copilot.py
new file mode 100644
index 00000000..50819481
--- /dev/null
+++ b/tests/test_adapters_copilot.py
@@ -0,0 +1,129 @@
+"""Tests for the Copilot CLI adapter (alpha)."""
+
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from ralphify.adapters import select_adapter
+from ralphify.adapters.copilot import CopilotAdapter
+
+
+def test_matches_copilot_binary_stem() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.matches(["copilot"]) is True
+ assert adapter.matches(["/opt/copilot/bin/copilot"]) is True
+ # Deliberately does NOT match the gh subcommand
+ assert adapter.matches(["gh"]) is False
+ assert adapter.matches([]) is False
+
+
+def test_build_command_appends_json_flags() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.build_command(["copilot"]) == [
+ "copilot",
+ "--output-format",
+ "json",
+ ]
+
+
+def test_build_command_is_idempotent() -> None:
+ adapter = CopilotAdapter()
+ once = adapter.build_command(["copilot"])
+ twice = adapter.build_command(once)
+ assert once == twice
+
+
+def test_parse_tool_use_variants() -> None:
+ adapter = CopilotAdapter()
+ for event_type in ("tool_use", "tool_call", "ToolCall", "ToolUse"):
+ line = json.dumps({"type": event_type, "name": "Edit"})
+ event = adapter.parse_event(line)
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Edit"
+
+
+def test_parse_result_variants() -> None:
+ adapter = CopilotAdapter()
+ for event_type in ("result", "response", "final", "Complete"):
+ event = adapter.parse_event(json.dumps({"type": event_type}))
+ assert event is not None
+ assert event.kind == "result"
+
+
+def test_parse_unknown_returns_none() -> None:
+ """Unknown event types must NOT count against the turn cap."""
+ adapter = CopilotAdapter()
+ assert adapter.parse_event(json.dumps({"type": "SomethingElse"})) is None
+ assert adapter.parse_event(json.dumps({"type": "progress"})) is None
+
+
+def test_parse_missing_type_returns_none() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.parse_event(json.dumps({"name": "Edit"})) is None
+
+
+def test_parse_malformed_returns_none() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.parse_event("not json") is None
+ assert adapter.parse_event("") is None
+
+
+def test_parse_event_with_alternate_key_names() -> None:
+ """Covers ``event`` / ``kind`` alternative type keys."""
+ adapter = CopilotAdapter()
+ event = adapter.parse_event(json.dumps({"event": "tool_use", "name": "Bash"}))
+ assert event is not None
+ assert event.kind == "tool_use"
+ assert event.name == "Bash"
+
+
+def test_extract_completion_signal_scans_stdout() -> None:
+ adapter = CopilotAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None,
+ stdout="chat chatter MARKER more text",
+ user_signal="MARKER",
+ )
+ is True
+ )
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout="no marker here", user_signal="MARKER"
+ )
+ is False
+ )
+
+
+def test_extract_completion_signal_returns_false_when_stdout_missing() -> None:
+ adapter = CopilotAdapter()
+ assert (
+ adapter.extract_completion_signal(
+ result_text=None, stdout=None, user_signal="MARKER"
+ )
+ is False
+ )
+
+
+def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None:
+ adapter = CopilotAdapter()
+ with pytest.raises(NotImplementedError, match="no hook system"):
+ adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2)
+
+
+def test_capability_flags() -> None:
+ adapter = CopilotAdapter()
+ assert adapter.name == "copilot"
+ assert adapter.counts_what == "tool_use"
+ assert adapter.supports_streaming is False
+ assert adapter.renders_structured_peek is False
+ assert adapter.supports_soft_wind_down is False
+ assert adapter.requires_full_stdout_for_completion is True
+
+
+def test_registered_in_adapters_registry() -> None:
+ selected = select_adapter(["copilot"])
+ assert isinstance(selected, CopilotAdapter)
diff --git a/tests/test_adapters_registry.py b/tests/test_adapters_registry.py
new file mode 100644
index 00000000..8072e96c
--- /dev/null
+++ b/tests/test_adapters_registry.py
@@ -0,0 +1,43 @@
+"""Tests for the adapter registry and first-match dispatch."""
+
+from __future__ import annotations
+
+from ralphify.adapters import ADAPTERS, CLIAdapter, select_adapter
+from ralphify.adapters._generic import GenericAdapter
+from ralphify.adapters.claude import ClaudeAdapter
+from ralphify.adapters.codex import CodexAdapter
+from ralphify.adapters.copilot import CopilotAdapter
+
+
+def test_registry_contains_builtin_adapters() -> None:
+ types = {type(a) for a in ADAPTERS}
+ assert ClaudeAdapter in types
+ assert CodexAdapter in types
+ assert CopilotAdapter in types
+
+
+def test_select_adapter_dispatches_by_binary_stem() -> None:
+ assert isinstance(select_adapter(["claude"]), ClaudeAdapter)
+ assert isinstance(select_adapter(["codex", "exec"]), CodexAdapter)
+ assert isinstance(select_adapter(["copilot"]), CopilotAdapter)
+
+
+def test_select_adapter_falls_back_to_generic() -> None:
+ selected = select_adapter(["aider", "--model", "claude-4"])
+ assert isinstance(selected, GenericAdapter)
+
+
+def test_select_adapter_handles_empty_cmd() -> None:
+ assert isinstance(select_adapter([]), GenericAdapter)
+
+
+def test_generic_adapter_parse_never_raises() -> None:
+ generic = GenericAdapter()
+ assert generic.parse_event("garbage") is None
+ assert generic.parse_event("") is None
+
+
+def test_all_adapters_satisfy_protocol() -> None:
+ """Runtime Protocol check catches shape regressions in any adapter."""
+ for adapter in ADAPTERS:
+ assert isinstance(adapter, CLIAdapter)
diff --git a/tests/test_agent.py b/tests/test_agent.py
index f3358448..d6b88c4e 100644
--- a/tests/test_agent.py
+++ b/tests/test_agent.py
@@ -19,35 +19,11 @@
_read_agent_stream,
_run_agent_blocking,
_run_agent_streaming,
- _supports_stream_json,
_write_log,
execute_agent,
)
-
-
-class TestSupportsStreamJson:
- def test_claude_binary(self):
- assert _supports_stream_json(["claude", "-p"]) is True
-
- def test_claude_absolute_path(self):
- assert _supports_stream_json(["/usr/local/bin/claude", "-p"]) is True
-
- def test_non_claude_binary(self):
- assert _supports_stream_json(["aider", "--yes"]) is False
-
- def test_empty_command(self):
- assert _supports_stream_json([]) is False
-
- def test_claude_like_name(self):
- assert _supports_stream_json(["claude-code"]) is False
-
- def test_claude_with_cmd_extension(self):
- """On Windows, npm installs claude as claude.cmd — streaming must
- still be detected."""
- assert _supports_stream_json(["claude.cmd", "-p"]) is True
-
- def test_claude_with_exe_extension(self):
- assert _supports_stream_json(["claude.exe", "-p"]) is True
+from ralphify.adapters import select_adapter
+from ralphify.adapters.claude import ClaudeAdapter
class TestWriteLog:
@@ -365,6 +341,32 @@ def test_timeout_captures_partial_output(self, mock_popen, tmp_path):
assert result.captured_stdout == "partial stdout"
assert result.captured_stderr == "partial stderr"
+ @patch(MOCK_SUBPROCESS)
+ def test_capture_result_text_does_not_buffer_blocking_output_without_log_dir(
+ self, mock_popen
+ ):
+ mock_popen.return_value = ok_proc(
+ stdout_text="done\n",
+ stderr_text="some stderr\n",
+ )
+ result = _run_agent_blocking(
+ ["echo"],
+ "prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ capture_result_text=True,
+ )
+
+ assert result.captured_stdout is None
+ assert result.captured_stderr is None
+ assert result.result_text is None
+
+ call_kwargs = mock_popen.call_args[1]
+ assert call_kwargs.get("stdin") == subprocess.PIPE
+ assert call_kwargs.get("stdout") == subprocess.PIPE
+ assert call_kwargs.get("stderr") is None
+
@patch(MOCK_SUBPROCESS, side_effect=ok_proc)
def test_no_log_when_dir_not_set(self, mock_popen):
result = execute_agent(
@@ -400,19 +402,22 @@ def test_timed_out(self):
assert result.timed_out is True
assert result.returncode is None
- def test_success_when_zero_exit(self):
- result = AgentResult(returncode=0, elapsed=1.0, log_file=None)
- assert result.success is True
-
- def test_not_success_when_nonzero_exit(self):
- result = AgentResult(returncode=1, elapsed=1.0, log_file=None)
- assert result.success is False
-
- def test_not_success_when_timed_out(self):
- result = AgentResult(
- returncode=None, elapsed=5.0, log_file=None, timed_out=True
- )
- assert result.success is False
+ @pytest.mark.parametrize(
+ ("result", "expected"),
+ [
+ (AgentResult(returncode=0, elapsed=1.0, log_file=None), True),
+ (AgentResult(returncode=1, elapsed=1.0, log_file=None), False),
+ (
+ AgentResult(
+ returncode=None, elapsed=5.0, log_file=None, timed_out=True
+ ),
+ False,
+ ),
+ ],
+ ids=["zero-exit", "nonzero-exit", "timed-out"],
+ )
+ def test_success(self, result, expected):
+ assert result.success is expected
class TestExecuteAgentDispatch:
@@ -420,8 +425,11 @@ class TestExecuteAgentDispatch:
@patch(MOCK_SUBPROCESS)
def test_dispatches_to_streaming_for_claude(self, mock_popen, monkeypatch):
- """execute_agent uses the streaming path when the agent supports it."""
- monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: True)
+ """execute_agent uses the streaming path when the adapter renders
+ structured output."""
+ claude_adapter = select_adapter(["claude"])
+ assert isinstance(claude_adapter, ClaudeAdapter)
+ monkeypatch.setattr(claude_adapter, "supports_streaming", True)
mock_popen.return_value = make_mock_popen(
stdout_lines='{"type": "result", "result": "done"}\n',
returncode=0,
@@ -438,10 +446,148 @@ def test_dispatches_to_streaming_for_claude(self, mock_popen, monkeypatch):
assert result.result_text == "done"
mock_popen.assert_called_once()
+ def test_execute_agent_passes_capture_result_text_to_streaming_helper(
+ self, monkeypatch
+ ):
+ on_activity = MagicMock()
+ on_output_line = MagicMock()
+ fake_streaming = MagicMock(return_value=AgentResult(returncode=0, elapsed=0.01))
+
+ claude_adapter = select_adapter(["claude"])
+ assert isinstance(claude_adapter, ClaudeAdapter)
+ monkeypatch.setattr(claude_adapter, "supports_streaming", True)
+ monkeypatch.setattr("ralphify._agent._run_agent_streaming", fake_streaming)
+
+ execute_agent(
+ ["claude", "-p"],
+ "prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ on_activity=on_activity,
+ on_output_line=on_output_line,
+ capture_result_text=True,
+ )
+
+ fake_streaming.assert_called_once_with(
+ claude_adapter.build_command(["claude", "-p"]),
+ "prompt",
+ None,
+ None,
+ 1,
+ on_activity=on_activity,
+ on_output_line=on_output_line,
+ capture_result_text=True,
+ capture_stdout=False,
+ )
+
+ def test_execute_agent_passes_capture_result_text_to_blocking_helper(
+ self, monkeypatch
+ ):
+ on_output_line = MagicMock()
+ fake_blocking = MagicMock(return_value=AgentResult(returncode=0, elapsed=0.01))
+
+ # The autouse conftest fixture already forces every adapter's
+ # supports_streaming flag to False, so ``echo`` falls into the
+ # blocking path through the GenericAdapter fallback.
+ monkeypatch.setattr("ralphify._agent._run_agent_blocking", fake_blocking)
+
+ execute_agent(
+ ["echo"],
+ "prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ on_output_line=on_output_line,
+ capture_result_text=True,
+ )
+
+ fake_blocking.assert_called_once_with(
+ ["echo"],
+ "prompt",
+ None,
+ None,
+ 1,
+ on_output_line=on_output_line,
+ capture_result_text=True,
+ capture_stdout=False,
+ )
+
class TestExecuteAgentStreaming:
"""Tests for the streaming execution path (_run_agent_streaming)."""
+ @patch(MOCK_SUBPROCESS)
+ def test_streaming_result_event_populates_result_text(self, mock_popen):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines='{"type": "result", "result": "early done"}\n',
+ returncode=0,
+ )
+ result = _run_agent_streaming(
+ ["claude", "-p"],
+ "prompt",
+ timeout=10,
+ log_dir=None,
+ iteration=1,
+ )
+ assert result.result_text == "early done"
+ assert result.returncode == 0
+ assert result.timed_out is False
+
+ @patch(MOCK_SUBPROCESS)
+ def test_blocking_result_event_populates_result_text_when_captured(
+ self, mock_popen, tmp_path
+ ):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines='{"type": "result", "result": "early done"}\n',
+ returncode=0,
+ )
+ result = _run_agent_blocking(
+ ["claude", "-p"],
+ "prompt",
+ timeout=10,
+ log_dir=tmp_path,
+ iteration=1,
+ )
+
+ assert result.result_text == "early done"
+ assert result.returncode == 0
+ assert result.timed_out is False
+
+ @patch(MOCK_SUBPROCESS)
+ def test_result_text_absent_when_no_result_event(self, mock_popen):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines="status: working\n",
+ returncode=0,
+ )
+ result = _run_agent_streaming(
+ ["claude", "-p"],
+ "prompt",
+ timeout=10,
+ log_dir=None,
+ iteration=1,
+ )
+ assert result.result_text is None
+ assert result.returncode == 0
+ assert result.timed_out is False
+
+ @patch(MOCK_SUBPROCESS)
+ def test_last_result_event_wins(self, mock_popen):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines='{"type": "result", "result": "first"}\n{"type": "result", "result": "second"}\n',
+ returncode=0,
+ )
+ result = _run_agent_streaming(
+ ["claude", "-p"],
+ "prompt",
+ timeout=10,
+ log_dir=None,
+ iteration=1,
+ )
+ assert result.result_text == "second"
+ assert result.returncode == 0
+ assert result.timed_out is False
+
@patch(MOCK_SUBPROCESS)
def test_success(self, mock_popen):
mock_popen.return_value = make_mock_popen(
@@ -507,10 +653,12 @@ def test_sends_prompt_to_stdin(self, mock_popen):
proc.stdin.close.assert_called_once()
@patch(MOCK_SUBPROCESS)
- def test_adds_stream_json_flags(self, mock_popen):
+ def test_passes_cmd_verbatim_to_popen(self, mock_popen):
+ """_run_agent_streaming no longer appends its own flags — the caller
+ (via ``adapter.build_command``) owns CLI flags now."""
mock_popen.return_value = make_mock_popen(returncode=0)
_run_agent_streaming(
- ["claude", "-p"],
+ ["claude", "-p", "--output-format", "stream-json", "--verbose"],
"prompt",
timeout=None,
log_dir=None,
@@ -519,9 +667,13 @@ def test_adds_stream_json_flags(self, mock_popen):
call_args = mock_popen.call_args
cmd = call_args[0][0]
- assert "--output-format" in cmd
- assert "stream-json" in cmd
- assert "--verbose" in cmd
+ assert cmd == [
+ "claude",
+ "-p",
+ "--output-format",
+ "stream-json",
+ "--verbose",
+ ]
@patch(MOCK_SUBPROCESS)
def test_writes_log_on_success(self, mock_popen, tmp_path):
@@ -566,6 +718,35 @@ def test_captured_output_set_when_logging(self, mock_popen, tmp_path):
assert result.captured_stdout == "agent output\n"
assert result.captured_stderr == "some stderr\n"
+ @patch(MOCK_SUBPROCESS)
+ def test_capture_result_text_does_not_buffer_stream_output_without_log_dir(
+ self, mock_popen
+ ):
+ mock_popen.return_value = make_mock_popen(
+ stdout_lines="done\n",
+ stderr_text="some stderr\n",
+ returncode=0,
+ )
+ result = _run_agent_streaming(
+ ["claude", "-p"],
+ "prompt",
+ timeout=None,
+ log_dir=None,
+ iteration=1,
+ capture_result_text=True,
+ )
+
+ assert result.captured_stdout is None
+ assert result.captured_stderr is None
+ assert result.result_text is None
+
+ call_args = mock_popen.call_args
+ assert call_args.args[0] == ["claude", "-p"]
+ call_kwargs = call_args[1]
+ assert call_kwargs.get("stdin") == subprocess.PIPE
+ assert call_kwargs.get("stdout") == subprocess.PIPE
+ assert call_kwargs.get("stderr") is None
+
@patch(MOCK_SUBPROCESS)
def test_no_log_when_dir_not_set(self, mock_popen):
mock_popen.return_value = make_mock_popen(returncode=0)
diff --git a/tests/test_cli.py b/tests/test_cli.py
index 79e224ea..650026f1 100644
--- a/tests/test_cli.py
+++ b/tests/test_cli.py
@@ -1,6 +1,7 @@
"""Tests for the CLI."""
import importlib
+import re
import signal
from unittest.mock import patch, MagicMock
@@ -24,6 +25,31 @@
runner = CliRunner()
+def _flatten_help(output: str) -> str:
+ no_ansi = re.sub(r"\x1b\[[0-9;]*[a-zA-Z]", "", output)
+ return re.sub(r"\s+", " ", no_ansi)
+
+
+class TestHelp:
+ def test_scaffold_help_mentions_promise_completion(self):
+ result = runner.invoke(app, ["scaffold", "--help"])
+
+ assert result.exit_code == 0
+ flat = _flatten_help(result.output)
+ assert "completion_signal" in flat
+ assert "stop_on_completion_signal" in flat
+ assert "..." in flat
+
+ def test_run_help_mentions_promise_completion(self):
+ result = runner.invoke(app, ["run", "--help"])
+
+ assert result.exit_code == 0
+ flat = _flatten_help(result.output)
+ assert "completion_signal" in flat
+ assert "stop_on_completion_signal" in flat
+ assert "..." in flat
+
+
class TestVersion:
@pytest.mark.parametrize("flag", ["--version", "-V"])
def test_version_flag(self, flag):
@@ -95,6 +121,103 @@ def test_errors_with_malformed_agent_field(self, mock_which, tmp_path, monkeypat
assert result.exit_code == 1
assert "malformed" in result.output.lower()
+ def test_run_uses_default_completion_signal_config(
+ self, mock_which, tmp_path, monkeypatch
+ ):
+ monkeypatch.chdir(tmp_path)
+ ralph_dir = make_ralph(tmp_path)
+ with patch("ralphify.cli.run_loop") as mock_run_loop:
+ result = runner.invoke(app, ["run", str(ralph_dir), "-n", "3"])
+
+ assert result.exit_code == 0
+ mock_run_loop.assert_called_once()
+ config = mock_run_loop.call_args.args[0]
+ assert config.completion_signal == "RALPH_PROMISE_COMPLETE"
+ assert config.stop_on_completion_signal is False
+ assert config.max_iterations == 3
+
+ def test_run_passes_completion_signal_frontmatter_to_config(
+ self, mock_which, tmp_path, monkeypatch
+ ):
+ monkeypatch.chdir(tmp_path)
+ ralph_dir = tmp_path / "my-ralph"
+ ralph_dir.mkdir()
+ (ralph_dir / RALPH_MARKER).write_text(
+ "---\n"
+ "agent: claude -p --dangerously-skip-permissions\n"
+ "completion_signal: CUSTOM_DONE\n"
+ "stop_on_completion_signal: true\n"
+ "---\n"
+ "go"
+ )
+ with patch("ralphify.cli.run_loop") as mock_run_loop:
+ result = runner.invoke(app, ["run", str(ralph_dir), "-n", "2"])
+
+ assert result.exit_code == 0
+ mock_run_loop.assert_called_once()
+ config = mock_run_loop.call_args.args[0]
+ assert config.completion_signal == "CUSTOM_DONE"
+ assert config.stop_on_completion_signal is True
+
+ @pytest.mark.parametrize(
+ ("frontmatter_line", "expected_error"),
+ [
+ ("completion_signal: 0", "must be a non-empty string"),
+ (
+ 'completion_signal: " CUSTOM_DONE "',
+ "must not include leading or trailing whitespace",
+ ),
+ (
+                'completion_signal: "<promise>CUSTOM_DONE</promise>"',
+                "must be the text inside <promise>...</promise>",
+ ),
+ ],
+ ids=["wrong-type", "surrounding-whitespace", "markup-instead-of-text"],
+ )
+ def test_run_rejects_invalid_completion_signal_frontmatter(
+ self, mock_which, tmp_path, monkeypatch, frontmatter_line, expected_error
+ ):
+ monkeypatch.chdir(tmp_path)
+ ralph_dir = tmp_path / "my-ralph"
+ ralph_dir.mkdir()
+ (ralph_dir / RALPH_MARKER).write_text(
+ "---\n"
+ "agent: claude -p --dangerously-skip-permissions\n"
+ f"{frontmatter_line}\n"
+ "---\n"
+ "go"
+ )
+
+ with patch("ralphify.cli.run_loop") as mock_run_loop:
+ result = runner.invoke(app, ["run", str(ralph_dir), "-n", "1"])
+
+ assert result.exit_code == 1
+ assert "completion_signal" in result.output.lower()
+ assert expected_error in result.output.lower()
+ mock_run_loop.assert_not_called()
+
+ def test_run_rejects_non_boolean_stop_on_completion_signal(
+ self, mock_which, tmp_path, monkeypatch
+ ):
+ monkeypatch.chdir(tmp_path)
+ ralph_dir = tmp_path / "my-ralph"
+ ralph_dir.mkdir()
+ (ralph_dir / RALPH_MARKER).write_text(
+ "---\n"
+ "agent: claude -p --dangerously-skip-permissions\n"
+ 'stop_on_completion_signal: "maybe"\n'
+ "---\n"
+ "go"
+ )
+
+ with patch("ralphify.cli.run_loop") as mock_run_loop:
+ result = runner.invoke(app, ["run", str(ralph_dir), "-n", "1"])
+
+ assert result.exit_code == 1
+ assert "stop_on_completion_signal" in result.output.lower()
+ assert "must be true or false" in result.output.lower()
+ mock_run_loop.assert_not_called()
+
@pytest.mark.parametrize(
"frontmatter, expected_error",
[
@@ -517,6 +640,16 @@ def test_creates_ralph_with_name(self, tmp_path, monkeypatch):
assert ralph_file.exists()
assert "Created" in result.output
+ def test_scaffold_output_mentions_promise_completion(self, tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+
+ result = runner.invoke(app, ["scaffold", "my-task"])
+
+ assert result.exit_code == 0
+ assert "completion_signal" in result.output
+ assert "stop_on_completion_signal" in result.output
+ assert "COMPLETE" in result.output
+
def test_creates_ralph_in_cwd(self, tmp_path, monkeypatch):
monkeypatch.chdir(tmp_path)
result = runner.invoke(app, ["scaffold"])
@@ -556,6 +689,16 @@ def test_template_has_valid_frontmatter(self, tmp_path, monkeypatch):
assert "{{ commands.git-log }}" in body
assert "{{ args.focus }}" in body
+ def test_template_mentions_promise_completion_path(self, tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+
+ runner.invoke(app, ["scaffold", "my-task"])
+ content = (tmp_path / "my-task" / RALPH_MARKER).read_text()
+
+ assert "# completion_signal: COMPLETE" in content
+ assert "# stop_on_completion_signal: true" in content
+ assert "COMPLETE" in content
+
class TestParseUserArgs:
def test_named_flag(self):
diff --git a/tests/test_cli_frontmatter_fields.py b/tests/test_cli_frontmatter_fields.py
new file mode 100644
index 00000000..91e8071e
--- /dev/null
+++ b/tests/test_cli_frontmatter_fields.py
@@ -0,0 +1,152 @@
+"""Tests for the new max_turns / max_turns_grace frontmatter fields."""
+
+from __future__ import annotations
+
+from pathlib import Path
+from unittest.mock import patch
+
+import pytest
+import typer
+
+from helpers import MOCK_WHICH
+from ralphify.cli import (
+ _build_run_config,
+ _validate_max_turns,
+ _validate_max_turns_grace,
+)
+
+
+class TestValidateMaxTurns:
+ def test_absent_returns_none(self) -> None:
+ assert _validate_max_turns(None) is None
+
+ def test_positive_int_passes(self) -> None:
+ assert _validate_max_turns(5) == 5
+
+ def test_zero_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns(0)
+
+ def test_negative_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns(-1)
+
+ def test_bool_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns(True)
+
+ def test_string_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns("5")
+
+ def test_float_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns(5.0)
+
+
+class TestValidateMaxTurnsGrace:
+ def test_absent_defaults_to_two(self) -> None:
+ assert _validate_max_turns_grace(None, max_turns=None) == 2
+
+ def test_absent_with_cap_defaults_to_two(self) -> None:
+ assert _validate_max_turns_grace(None, max_turns=10) == 2
+
+ def test_zero_allowed(self) -> None:
+ assert _validate_max_turns_grace(0, max_turns=10) == 0
+
+ def test_negative_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(-1, max_turns=10)
+
+ def test_bool_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(True, max_turns=10)
+
+ def test_float_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(1.5, max_turns=10)
+
+ def test_grace_equal_to_cap_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(10, max_turns=10)
+
+ def test_grace_above_cap_rejected(self) -> None:
+ with pytest.raises(typer.Exit):
+ _validate_max_turns_grace(11, max_turns=10)
+
+ def test_grace_below_cap_allowed(self) -> None:
+ assert _validate_max_turns_grace(3, max_turns=10) == 3
+
+ def test_grace_without_cap_any_non_negative_allowed(self) -> None:
+ # No cap means no upper bound on grace; the value is still retained.
+ assert _validate_max_turns_grace(99, max_turns=None) == 99
+
+
+@patch(MOCK_WHICH, return_value="/usr/bin/claude")
+class TestBuildRunConfigTurnCapFields:
+ def _write_ralph(self, tmp_path: Path, body: str) -> Path:
+ ralph = tmp_path / "RALPH.md"
+ ralph.write_text(body, encoding="utf-8")
+ return ralph
+
+ def test_defaults_when_absent(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\nagent: claude -p\n---\nhello\n",
+ )
+ config = _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+ assert config.max_turns is None
+ assert config.max_turns_grace == 2
+
+ def test_values_threaded_through(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\nagent: claude -p\nmax_turns: 20\nmax_turns_grace: 5\n---\nhello\n",
+ )
+ config = _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+ assert config.max_turns == 20
+ assert config.max_turns_grace == 5
+
+ def test_invalid_max_turns_exits(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\nagent: claude -p\nmax_turns: 0\n---\nhello\n",
+ )
+ with pytest.raises(typer.Exit):
+ _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
+
+ def test_grace_at_or_above_cap_exits(self, _mock_which, tmp_path: Path) -> None:
+ self._write_ralph(
+ tmp_path,
+ "---\nagent: claude -p\nmax_turns: 5\nmax_turns_grace: 5\n---\nhello\n",
+ )
+ with pytest.raises(typer.Exit):
+ _build_run_config(
+ ralph_path=str(tmp_path),
+ max_iterations=1,
+ stop_on_error=False,
+ delay=0,
+ log_dir=None,
+ timeout=None,
+ )
diff --git a/tests/test_console_emitter.py b/tests/test_console_emitter.py
index b6476d46..730b4edd 100644
--- a/tests/test_console_emitter.py
+++ b/tests/test_console_emitter.py
@@ -15,9 +15,9 @@
_IterationPanel,
_IterationSpinner,
_SinglePanelNavigator,
+ _agent_renders_structured_peek,
_format_run_info,
_format_summary,
- _is_claude_command,
_scrollbar_metrics,
_shorten_path,
)
@@ -332,7 +332,11 @@ def test_startup_hint_shown_when_peek_on_by_default(self):
output = console.export_text()
assert "press p to hide" in output
- def test_startup_hint_structured_for_claude(self):
+ def test_startup_hint_structured_for_claude(self, monkeypatch):
+ from ralphify.adapters import select_adapter
+
+ claude_adapter = select_adapter(["claude"])
+ monkeypatch.setattr(claude_adapter, "renders_structured_peek", True)
emitter, console = _capture_emitter()
emitter._peek_enabled = True
emitter.emit(
@@ -1499,24 +1503,33 @@ def test_peek_message_shown_in_spinner(self):
assert "live output on" in output
-class TestIsClaudeCommand:
- def test_claude_binary(self):
- assert _is_claude_command("claude") is True
+class TestAgentRendersStructuredPeek:
+ """Verify the emitter routes through :func:`select_adapter`."""
- def test_claude_with_flags(self):
- assert _is_claude_command("claude --dangerously-skip-permissions") is True
+ def test_claude_adapter_reports_structured_peek(self, monkeypatch):
+ from ralphify.adapters import select_adapter
+ from ralphify.adapters.claude import ClaudeAdapter
- def test_claude_full_path(self):
- assert _is_claude_command("/usr/local/bin/claude -p") is True
+ adapter = select_adapter(["claude"])
+ assert isinstance(adapter, ClaudeAdapter)
+ monkeypatch.setattr(adapter, "renders_structured_peek", True)
+ assert _agent_renders_structured_peek("claude") is True
+ assert (
+ _agent_renders_structured_peek("claude --dangerously-skip-permissions")
+ is True
+ )
+ assert _agent_renders_structured_peek("/usr/local/bin/claude -p") is True
- def test_not_claude(self):
- assert _is_claude_command("aider --yes") is False
+ def test_unknown_agent_falls_back_to_generic(self):
+ # conftest forces every adapter's renders_structured_peek to False;
+ # the GenericAdapter fallback also reports False.
+ assert _agent_renders_structured_peek("aider --yes") is False
- def test_empty(self):
- assert _is_claude_command("") is False
+ def test_empty_string_is_not_structured(self):
+ assert _agent_renders_structured_peek("") is False
- def test_invalid_shlex(self):
- assert _is_claude_command("claude 'unterminated") is False
+ def test_invalid_shlex_is_not_structured(self):
+ assert _agent_renders_structured_peek("claude 'unterminated") is False
def _populate_buffer(spinner, count: int, prefix: str = "line") -> None:
@@ -1858,7 +1871,7 @@ def test_history_eviction_protects_viewed_iteration(self):
) if i > 2 else None
emitter.emit(_make_event(EventType.ITERATION_STARTED, iteration=i))
assert 1 in emitter._iteration_history
- assert len(emitter._iteration_order) <= _MAX_HISTORY_ITERATIONS
+ assert len(emitter._iteration_history) <= _MAX_HISTORY_ITERATIONS
finally:
emitter._stop_live()
diff --git a/tests/test_engine.py b/tests/test_engine.py
index b05f8d64..a1d0fdb9 100644
--- a/tests/test_engine.py
+++ b/tests/test_engine.py
@@ -20,6 +20,7 @@
)
from rich.console import Console
+from ralphify._agent import AgentResult
from ralphify._console_emitter import ConsoleEmitter
from ralphify._events import BoundEmitter, EventType, NullEmitter, QueueEmitter
from ralphify._run_types import Command, RunStatus
@@ -126,6 +127,320 @@ def test_log_dir_creates_files(self, mock_run, tmp_path):
assert log_files[1].name.startswith("002_")
+class TestPromiseCompletionSignals:
+ @patch("ralphify.engine.execute_agent")
+ def test_tagged_promise_does_not_stop_by_default(
+ self, mock_execute_agent, tmp_path
+ ):
+ """Without ``stop_on_completion_signal`` the loop must run all iterations.
+
+ A non-streaming adapter (here ``echo`` → GenericAdapter) only sees
+ promise completion when the engine elects to capture stdout, which
+ in turn requires either logging or the explicit opt-in. Skipping
+ the buffer is the whole point of the gating, so
+ ``state.promise_completed`` legitimately stays False here.
+ """
+ config = make_config(tmp_path, max_iterations=3)
+ state = make_state()
+ emitter = NullEmitter()
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout=None,
+ )
+
+ run_loop(config, state, emitter)
+
+ assert mock_execute_agent.call_count == 3
+ assert state.completed == 3
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is False
+ assert [
+ call.kwargs["on_output_line"] for call in mock_execute_agent.call_args_list
+ ] == [None, None, None]
+ assert [
+ call.kwargs["capture_result_text"]
+ for call in mock_execute_agent.call_args_list
+ ] == [True, True, True]
+ # Generic adapter requires full stdout, but the user didn't opt in
+ # and no log dir is set — capture should stay off to avoid
+ # buffering verbose agent output.
+ assert [
+ call.kwargs["capture_stdout"] for call in mock_execute_agent.call_args_list
+ ] == [False, False, False]
+
+ @patch("ralphify.engine.execute_agent")
+ def test_capture_stdout_off_for_streaming_adapter_without_logging(
+ self, mock_execute_agent, tmp_path
+ ):
+ """Claude exposes ``result_text`` directly; the engine must not
+ force-buffer the full stdout transcript even when the user opts
+ into ``stop_on_completion_signal``."""
+ config = make_config(
+ tmp_path,
+ agent="claude",
+ max_iterations=1,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ result_text="RALPH_PROMISE_COMPLETE",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_args.kwargs["capture_stdout"] is False
+ assert state.promise_completed is True
+
+ @patch("ralphify.engine.execute_agent")
+ def test_capture_stdout_on_for_blocking_adapter_with_opt_in(
+ self, mock_execute_agent, tmp_path
+ ):
+ """Generic / Copilot adapters need the full stdout buffer to
+ scan for the promise tag — engine must opt in when the user
+ opts into completion signalling."""
+ config = make_config(
+ tmp_path,
+ max_iterations=1,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="RALPH_PROMISE_COMPLETE\n",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_args.kwargs["capture_stdout"] is True
+ assert state.promise_completed is True
+
+ @patch("ralphify.engine.execute_agent")
+ def test_capture_stdout_on_when_log_dir_set(self, mock_execute_agent, tmp_path):
+ """Logging always needs the buffer regardless of completion signal."""
+ log_dir = tmp_path / "logs"
+ config = make_config(tmp_path, max_iterations=1, log_dir=log_dir)
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="anything\n",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_args.kwargs["capture_stdout"] is True
+
+ @patch("ralphify.engine.execute_agent")
+ def test_tagged_promise_stops_early_when_enabled(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=5,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+ emitter = QueueEmitter()
+ emitter.wants_agent_output_lines = lambda: True
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="RALPH_PROMISE_COMPLETE\n",
+ )
+
+ run_loop(config, state, emitter)
+
+ mock_execute_agent.assert_called_once()
+ assert mock_execute_agent.call_args.kwargs["capture_result_text"] is True
+ assert state.iteration == 1
+ assert state.completed == 1
+ assert state.failed == 0
+ assert state.total == 1
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is True
+
+ events = drain_events(emitter)
+ completed_events = events_of_type(events, EventType.ITERATION_COMPLETED)
+ assert len(completed_events) == 1
+ stop_event = events_of_type(events, EventType.RUN_STOPPED)[0]
+ assert stop_event.data["reason"] == "completed"
+ assert stop_event.data["total"] == 1
+ assert stop_event.data["completed"] == 1
+ assert stop_event.data["failed"] == 0
+ assert stop_event.data["timed_out_count"] == 0
+
+ @patch("ralphify.engine.execute_agent")
+ def test_custom_promise_text_matches_inner_tag_text(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=4,
+ completion_signal="CUSTOM_DONE",
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+ emitter = QueueEmitter()
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="CUSTOM_DONE\n",
+ )
+
+ run_loop(config, state, emitter)
+
+ mock_execute_agent.assert_called_once()
+ assert state.iteration == 1
+ assert state.completed == 1
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is True
+
+ events = drain_events(emitter)
+ stop_event = events_of_type(events, EventType.RUN_STOPPED)[0]
+ assert stop_event.data["reason"] == "completed"
+ assert stop_event.data["total"] == 1
+ assert stop_event.data["completed"] == 1
+
+ @patch("ralphify.engine.execute_agent")
+ def test_promise_tag_normalizes_inner_whitespace_before_matching(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=4,
+ completion_signal="CUSTOM DONE",
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="\n CUSTOM\tDONE \n\n",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_count == 1
+ assert state.iteration == 1
+ assert state.completed == 1
+ assert state.failed == 0
+ assert state.total == 1
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is True
+
+ @patch("ralphify.engine.execute_agent")
+ def test_untagged_raw_text_does_not_match_completion_signal(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=3,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ result_text="done RALPH_PROMISE_COMPLETE without promise tags",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_count == 3
+ assert state.completed == 3
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is False
+
+ @patch("ralphify.engine.execute_agent")
+ def test_different_tagged_promise_text_does_not_match(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(
+ tmp_path,
+ max_iterations=3,
+ completion_signal="CUSTOM_DONE",
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ result_text="CUSTOM_DONE_NOW",
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_count == 3
+ assert state.completed == 3
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is False
+
+ @patch("ralphify.engine.execute_agent")
+ def test_structured_agents_ignore_raw_stdout_for_promise_detection(
+ self, mock_execute_agent, tmp_path
+ ):
+ """ClaudeAdapter only looks at ``result`` events — embedded
+ promise tags inside ``status`` or ``assistant`` JSON messages
+ must not trigger early completion."""
+ config = make_config(
+ tmp_path,
+ agent="claude",
+ max_iterations=2,
+ stop_on_completion_signal=True,
+ )
+ state = make_state()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ result_text="done without promise tag",
+            captured_stdout='{"type":"status","message":"<promise>RALPH_PROMISE_COMPLETE</promise>"}\n',
+ )
+
+ run_loop(config, state, NullEmitter())
+
+ assert mock_execute_agent.call_count == 2
+ assert state.completed == 2
+ assert state.failed == 0
+ assert state.status == RunStatus.COMPLETED
+ assert state.promise_completed is False
+
+ @patch("ralphify.engine.execute_agent")
+ def test_blocking_captured_stdout_is_echoed_when_peek_is_off(
+ self, mock_execute_agent, tmp_path
+ ):
+ config = make_config(tmp_path, max_iterations=1)
+ state = make_state()
+ emitter = QueueEmitter()
+
+ mock_execute_agent.return_value = AgentResult(
+ returncode=0,
+ elapsed=0.01,
+ captured_stdout="plain blocking output\n",
+ )
+
+ run_loop(config, state, emitter)
+
+ completed_event = events_of_type(
+ drain_events(emitter), EventType.ITERATION_COMPLETED
+ )[0]
+ assert completed_event.data["echo_stdout"] == "plain blocking output\n"
+
+
class TestRunLoopDefaults:
@patch(MOCK_SUBPROCESS, side_effect=ok_proc)
def test_runs_without_emitter(self, mock_run, tmp_path):
diff --git a/tests/test_promise.py b/tests/test_promise.py
new file mode 100644
index 00000000..55125e34
--- /dev/null
+++ b/tests/test_promise.py
@@ -0,0 +1,47 @@
+"""Tests for strict promise-tag parsing."""
+
+import pytest
+
+from ralphify._promise import has_promise_completion, parse_promise_tags
+
+
+class TestParsePromiseTags:
+ @pytest.mark.parametrize(
+ "raw_text", [None, "", "plain text", "missing close"]
+ )
+ def test_parse_promise_tags_invalid_input_returns_empty_list(self, raw_text):
+ assert parse_promise_tags(raw_text) == []
+
+ def test_parse_promise_tags_normalizes_whitespace_and_preserves_unicode(self):
+ text = (
+ "before "
+ "\n CUSTOM\tDONE \n "
+ "middle "
+ "✅ shipped"
+ )
+
+ assert parse_promise_tags(text) == ["CUSTOM DONE", "✅ shipped"]
+
+
+class TestHasPromiseCompletion:
+ def test_has_promise_completion_matches_only_exact_tag_payload(self):
+ text = (
+ "raw CUSTOM_DONE text "
+ "CUSTOM_DONE_NOW "
+ "CUSTOM_DONE"
+ )
+
+ assert has_promise_completion(text, "CUSTOM_DONE") is True
+ assert has_promise_completion(text, "CUSTOM_DONE_NOW") is True
+ assert has_promise_completion(text, "CUSTOM") is False
+
+ def test_has_promise_completion_ignores_wrong_case_and_malformed_tags(self):
+ text = "CUSTOM_DONECUSTOM_DONE"
+
+ assert has_promise_completion(text, "CUSTOM_DONE") is False
+
+ def test_has_promise_completion_normalizes_completion_signal_whitespace(self):
+ text = "\n CUSTOM\tDONE \n"
+
+ assert has_promise_completion(text, "CUSTOM DONE") is True
+ assert has_promise_completion(text, "CUSTOM\tDONE") is True
diff --git a/workspace/ralphs/improve-codebase/PLAN.md b/workspace/ralphs/improve-codebase/PLAN.md
new file mode 100644
index 00000000..f6a3fcaf
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/PLAN.md
@@ -0,0 +1,56 @@
+# Improve Codebase — Plan
+
+Ralphify is a small, well-tested Python CLI (~4.7k LOC src, 628 tests passing,
+ruff + ty clean). Recent commits have been steady refactor work targeting
+`_console_emitter.py` (the biggest file at ~1.6k LOC). Continue that thread:
+squeeze out duplication and complexity in the hottest files first, then fan
+out to smaller polish.
+
+## Phases
+
+1. **Dead code / unused symbols** — private helpers with no callers, stale
+ constants, dead branches. Verify with grep + tests.
+2. **Duplication** — copy-pasted blocks (especially in `_console_emitter.py`
+ and `_agent.py`). Extract helpers where extraction does not hurt clarity.
+3. **Magic values & local constants** — stringly/numeric literals that repeat
+ inside a single module; lift to a named constant near the top.
+4. **Complex conditionals & long functions** — split `_console_emitter.py`
+ functions that juggle many states; extract pure helpers.
+5. **Naming & structure** — vague names, misplaced helpers, module-level
+ imports that could collapse.
+6. **Tests** — tighten unclear test names, merge duplicated fixtures.
+
+Each iteration must preserve behavior. If an "improvement" changes observable
+behavior (even by one log line), skip it and leave a note in backlog.md.
+
+## Current phase
+
+**Phase 4 — complex conditionals & long functions.** Phase 3 (magic values)
+is now essentially drained: every module-level scan for numeric literals
+≥ 10 across `src/ralphify/` turns up only named constants. Phase 1 (dead
+code) and Phase 2 (duplication) stay open for anything spotted in passing.
+Move on to simplifying local control flow where a variable is computed
+unconditionally but only used on one branch, or where a helper can tighten
+a nested conditional without losing clarity. The 134078d `name_col`
+scope narrowing is a representative Phase 4 move: same output, dead work
+gone, clearer scope.
+
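The `name_col` move can be sketched generically. The functions below are illustrative only (not ralphify code): the value computed unconditionally in the "before" version moves into the one branch that uses it, so the skipped branch does no dead work and the local's scope documents where it matters.

```python
def render_row_before(name, value, show_name):
    name_col = name.ljust(20)  # computed even when show_name is False
    if show_name:
        return f"{name_col}{value}"
    return str(value)


def render_row_after(name, value, show_name):
    if show_name:
        name_col = name.ljust(20)  # computed only where it is used
        return f"{name_col}{value}"
    return str(value)
```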
+## Priorities (tailored to this repo)
+
+- The largest module is `_console_emitter.py`; every iteration there should
+  leave the module smaller *or* clearer, never neither.
+- `_agent.py` has two execution paths (streaming / blocking) that historically
+ drift apart — watch for duplication.
+- Constants like `_MAX_VISIBLE_SCROLL`, `_MAX_SCROLL_LINES`,
+ `_SIGTERM_GRACE_PERIOD` live where they're used. Don't centralize them
+ unless two modules need the same value.
+- Do not churn public API: `src/ralphify/__init__.py` re-exports, CLI
+ commands, and event payload types.
+- Do not change docs wording in this ralph; that's for other ralphs.
+
+## Out of scope
+
+- New features or behavior changes
+- Dependency upgrades
+- Docs content (beyond fixing stale contributor notes encountered in passing)
+- Release tooling
diff --git a/workspace/ralphs/improve-codebase/backlog.md b/workspace/ralphs/improve-codebase/backlog.md
new file mode 100644
index 00000000..5c647856
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/backlog.md
@@ -0,0 +1,118 @@
+# Backlog
+
+Ordered roughly by phase, then by expected payoff. Add items freely; remove
+only when they land in a commit.
+
+## Phase 1 — dead code
+
+- Audit `_console_emitter.py` for unused private helpers / constants (grep
+ each `_foo` name for other references inside the module and tests).
+- Audit `_agent.py` for parallel streaming/blocking helpers that reference
+ the same constants but define their own copies. (cb61477 — extracted
+ `_call_safely` for the 3× best-effort observer-callback pattern; no
+ remaining obvious dup after that pass. Streaming's `_readline_pump` and
+ blocking's `_pump_stream` look similar but do genuinely different work:
+ the queue-based pump feeds a main-thread loop that parses JSON, while
+ the list-based pump does its callback work inline on its own thread.)
+- Check `cli.py` validators for unreachable error branches after recent
+ TypedDict refactors.
+- Confirm every `from typing import ...` import in `src/ralphify/` is used.
+ (Checked 4ccfa9a — all six modules import only what they use.)
+- vulture 60% flags that were verified as live: `clear_scroll`,
+ `_SinglePanelNavigator`, `_stop_live`, `serialize_frontmatter`,
+ `to_dict`, `_atexit_hook`, RunManager public methods — all used in tests,
+ docs, or scripts/. TypedDict field "unused" warnings are spurious.
+- Consider inlining `_validate_name` into `_check_unique_name` in `cli.py`
+ (the former has exactly one caller). Tradeoff: the split doc-strings
+ document the two concerns (format vs uniqueness) cleanly.
+- `_is_claude_command` (`_console_emitter.py`) and `_supports_stream_json`
+ (`_agent.py`) both check `Path(parts[0]).stem == CLAUDE_BINARY` but on
+ different inputs (string vs list). Consolidating would cross module
+ boundaries for modest payoff — revisit only if a third caller appears.
+
+## Phase 2 — duplication
+
+- Look for repeated `console.print(...)` formatting patterns in
+ `_console_emitter.py`.
+- Look for repeated dict/TypedDict key access patterns in the event handlers.
+- The fullscreen-Live teardown (`self._fullscreen_live.stop()` followed by
+  resetting the attribute to `None`)
+ appears in `_stop_live_unlocked` and `_teardown_fullscreen_unlocked` —
+ only two call sites and each is adjacent to other state mutations, so
+ extracting right now would just add indirection. Revisit if a third
+ caller appears.
+- The `try: fn(); except Exception: pass` pattern appears in
+ `_print_or_defer_unlocked`, `_flush_deferred_unlocked` (loop body),
+ and around `handle_key`'s body. Could extract a tiny `_safe_call`
+ but each site is one line; not worth the indirection unless a fourth
+ appears.
+- `_IterationPanel._build_footer` and `_IterationSpinner._build_footer`
+ both create `summary = Text(no_wrap=True, overflow="ellipsis")` then
+ branch on count > 0 vs "waiting…". Two subclasses only — already
+ noted in coverage as not-worth-extracting.
+- (01f2f1c — dropped `_FullscreenPeek._reset_view` which had the same body
+ as `scroll_to_bottom`.) No other near-duplicate scroll helpers spotted
+ in that class; `scroll_up` / `scroll_down` / `scroll_to_top` each touch
+ `_auto_scroll` under different conditions. (b19625e — dropped the
+ `new_offset` alias in `scroll_down`; `scroll_up` keeps its local
+ because it compares old vs new before the assignment.)
+
+## Phase 3 — magic values
+
+Essentially drained. Latest full scan (at 134078d): every bare integer
+≥ 10 across `src/ralphify/` already resolves to a named constant, and
+the handful of remaining single-site `2`s are flagged below with
+"only if a second site appears".
+
+- Scan each module's numeric literals (especially timeouts, widths, retry
+ counts) and promote to module constants when reused. (1d7251f —
+ promoted `40` → `_DEFAULT_CONSOLE_HEIGHT` for the two fallback-height
+ sites in `_console_emitter.py`.)
+- `_console_emitter.py:_fullscreen_page_size` uses a bare `2` as the
+ "page overlap lines" magic — only one site, but could be named
+ `_PAGE_OVERLAP` for symmetry with `_FULLSCREEN_CHROME_ROWS` if a
+ second page-size helper ever appears.
+- `_keypress.py` has `_POLL_INTERVAL`, `_WIN_POLL_INTERVAL`,
+ `_THREAD_JOIN_TIMEOUT` already at module top. No obvious leftover
+ literals worth promoting.
+
+## Phase 4 — complex conditionals & long functions
+
+- (134078d — narrowed `name_col` scope in `_IterationPanel._apply_assistant`
+ so the padded name column is only computed on the branch that renders
+ it.)
+- (7730dd4 — narrowed `secs = total % _SECONDS_PER_MINUTE` in
+ `_output.py:format_duration` into the `if minutes < _MINUTES_PER_HOUR:`
+ branch. Saved a modulo on every duration ≥ 1h and co-located the
+ local with its only use site.)
+- (ce487d3 — inlined `text = ensure_str(stream)` in
+ `_output.py:collect_output`. Same alias-inline shape as fc5e1cb /
+ 497c028 / 52e0272. Helper name `ensure_str` already documents the
+ decode step, so the intermediate binding added no clarity.)
+- (d0060b3 — exposed `_LivePanelBase.outcome` as a public property and
+ switched `_FullscreenPeek._build_header`'s `source._outcome` read to
+ go through it. Mirror of ef9a178's `iteration_id` cleanup.) Two
+ private-attr cross-class reads remain — both of `source._scroll_lines`
+ in `_FullscreenPeek._max_offset` and `__rich_console__`. Those touch a
+ mutable list the class itself appends to, so a read-only property
+ would hide the mutation asymmetry; defer unless a clearer abstraction
+ emerges (e.g., a "get a snapshot of visible lines" helper).
+- `_IterationPanel._apply_assistant` still juggles three block types
+ (`thinking` / `text` / `tool_use`) in one ~50-line method. Splitting
+ into `_render_thinking_block` / `_render_text_block` / `_render_tool_use_block`
+ would shorten the outer loop but each helper is short enough that the
+ indirection may not pay off — revisit only if a fourth block type lands.
+- `cli.py:_parse_user_args` is 55 lines of token-by-token iteration with
+ two nested branches and a while-loop that skips already-filled declared
+ names. Could be split into `_consume_flag` / `_consume_positional`
+ helpers without changing any error message. Medium payoff, medium
+ churn — land only once behavior is fully pinned by tests (which it is).
+
+## Notes / ideas to triage
+
+- `scripts/tui_dev/` has its own fixtures; out of scope unless it blocks a
+ src/ralphify/ change.
+- `_IterationPanel._cache_read_tokens` is captured from usage but never
+ read in production — only the regression test
+ `test_format_tokens_does_not_double_count_cached_input` reads it. The
+ capture protects against a hypothetical future display, so not strictly
+ dead, but worth revisiting when token rendering changes.
diff --git a/workspace/ralphs/improve-codebase/conventions.md b/workspace/ralphs/improve-codebase/conventions.md
new file mode 100644
index 00000000..4a16e49e
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/conventions.md
@@ -0,0 +1,3 @@
+# Conventions learned in this codebase
+
+Record repo-wide patterns worth preserving. One bullet per pattern.
diff --git a/workspace/ralphs/improve-codebase/coverage/.gitkeep b/workspace/ralphs/improve-codebase/coverage/.gitkeep
new file mode 100644
index 00000000..e69de29b
diff --git a/workspace/ralphs/improve-codebase/coverage/_agent.md b/workspace/ralphs/improve-codebase/coverage/_agent.md
new file mode 100644
index 00000000..898b2029
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_agent.md
@@ -0,0 +1,136 @@
+# `_agent.py` coverage
+
+Valid at: 7402f04
+
+## Recent changes
+
+- 7402f04 — inlined the `stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG,
+ _STREAM_FORMAT, _VERBOSE_FLAG]` local in `_run_agent_streaming`.
+ The binding was consumed exactly once on the very next statement
+ (the first positional arg to `subprocess.Popen(...)`); no other
+ references exist in src/ or tests/ (grep confirmed). The three
+ appended tokens are already named constants
+ (`_OUTPUT_FORMAT_FLAG`, `_STREAM_FORMAT`, `_VERBOSE_FLAG`) so the
+ "extended command for streaming mode" intent reads cleanly at the
+ call site without the intermediate name. Same Phase 4 inline-alias
+ shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0
+ (`visible`), and e1ad87a (`binary`). Behavior preserved —
+ subprocess.Popen still receives the same list; pinned by the
+ streaming-path test coverage in `tests/test_agent.py`.
+- 66d6c60 — inlined the `remaining = deadline - time.monotonic()` local
+ in `_read_agent_stream`'s per-iteration timeout calc. The alias was
+ read exactly once on the next line as `max(remaining, 0)`; collapsing
+ to `max(deadline - time.monotonic(), 0)` matches the inline-alias
+ style from e1ad87a / 497c028 / 52e0272 / ce487d3. The adjacent
+ comment was updated ("max(remaining, 0)" → "clamp to 0") since the
+ name no longer exists. Behavior preserved — the clamped timeout
+ still reaches `line_q.get(timeout=...)`, so the non-blocking drain
+ on an already-expired deadline still fires and deadline enforcement
+ is unchanged. Pinned by the streaming-path agent tests. No other
+ `remaining` locals remain in the module (grep confirmed).
+- b24accf — inlined the `reader` thread handle in `_read_agent_stream`.
+ The local served only to call `.start()`; the thread is never joined
+ explicitly (termination is signalled through the queue's `None`
+ sentinel produced by `_readline_pump`'s `finally` and through the
+ daemon flag). Collapsing into the fluent
+ `threading.Thread(target=_readline_pump, args=(stdout, line_q),
+ daemon=True).start()` drops an unused binding and matches the
+ fire-and-forget intent. Python keeps live threads reachable via
+ `threading._active`, so no GC risk. Side effects preserved: the
+ reader still closes cleanly on `_close_pipes` (OSError in
+ `readline`), and the main loop still relies on `line_q.get` for
+ deadline enforcement. Pinned by the full `tests/test_agent.py`
+ suite (streaming-path coverage). This is the same alias/handle-drop
+ shape as e1ad87a / 497c028 / b19625e, specialised to a Thread —
+ thread-return values aren't special, they're just another handle
+ whose only use was `.start()`.
+- e1ad87a — inlined the `binary = Path(cmd[0]).stem` local in
+ `_supports_stream_json`. The alias was read exactly once on the
+ following line as `binary == CLAUDE_BINARY`. Collapsing to
+ `return Path(cmd[0]).stem == CLAUDE_BINARY` matches the already-inline
+ sibling check in `_console_emitter.py:_is_claude_command`
+ (`return Path(parts[0]).stem == CLAUDE_BINARY`). Same Phase 4
+ inline-alias shape as ce487d3 / 52e0272 / 497c028 / fc5e1cb. Empty-cmd
+ short-circuit (`if not cmd: return False`) preserved so
+ `Path(cmd[0])` never gets indexed into an empty list. Backlog note
+ about consolidating `_is_claude_command` and `_supports_stream_json`
+ across modules is unchanged — still deferred until a third caller
+ appears.
+- cf72fd9 — replaced the `parsed = None` sentinel in `_read_agent_stream`
+ with a `try/except/else` block. The old code set `parsed = None` in
+ the JSON-decode-except branch solely so the next line's
+ `if isinstance(parsed, dict):` would fall through; restructuring with
+ `try: ... except: pass; else: if isinstance(...):` makes the "only
+ forward when parsing succeeded" intent structural instead of encoded
+ through a sentinel value. The error path now skips the isinstance
+ check entirely (dead work before), and the success path is unchanged.
+ `parsed` is no longer bound when the except clause runs, which matches
+ Python convention — the value was always meant to be ignored there.
+ Pinned by `tests/test_agent.py::test_ignores_non_json_lines` and the
+ broader stream-JSON coverage in that file.
+- d8d5592 — gated the `"".join(...)` of `stream.stdout_lines` and
+ `stderr_lines` at the tail of `_run_agent_streaming` on
+ `log_dir is not None`. The joined strings were only consumed by
+ `_write_log` (which short-circuits when log_dir is None) and by the
+ `captured_stdout` / `captured_stderr` AgentResult fields, both of
+ which previously discarded the joined string with
+ `... if log_dir is not None else None`. Now matches the
+ already-lazy `"".join(x) if x is not None else None` idiom in
+ `_run_agent_blocking`'s tail, and the duplicated ternary on each
+ AgentResult field collapses to a bare `stdout` / `stderr`. Same
+ observable behavior — pinned by `test_captured_output_set_when_logging`
+ and `test_no_log_when_dir_not_set` in tests/test_agent.py.
+- cb61477 — added `_call_safely(callback, *args)` helper next to the
+ callback type aliases. Replaces three copies of the
+ `if cb is not None: try: cb(...); except Exception: pass` pattern
+ (two in `_read_agent_stream`, one in `_pump_stream`) with single-line
+ calls. Behavior preserved — identical None guard, identical broad
+ `Exception` suppression, identical argument-once semantics.
+
+## Shape of the module
+
+- Two execution paths: `_run_agent_streaming` (JSON line stream, used for
+ `claude`) and `_run_agent_blocking` (subprocess.Popen with optional
+ capture, used for all other agents).
+- `execute_agent` is the single public entry point; selects mode via
+ `_supports_stream_json(cmd)` (checks `Path(cmd[0]).stem == CLAUDE_BINARY`).
+- Shared shutdown sequence is centralized in `_cleanup_agent`:
+ 1. `_ensure_process_dead` (SIGTERM → SIGKILL via `_try_graceful_group_kill`,
+ then `proc.kill()`).
+ 2. `_close_pipes` (raw `os.close` on stdout/stderr fds to unblock readers).
+ 3. `_drain_readers` (bounded join on reader/writer threads).
+ 4. `_finalize_pipes` (Python-level `pipe.close()` for GC hygiene).
+- Thread spawning uses `_start_writer_thread` / `_start_pump_thread` to
+ centralize the `Thread(..., daemon=True); .start()` boilerplate.
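A minimal sketch of the pump-thread handoff described above, using an in-memory stream in place of a subprocess pipe (the real module reads `proc.stdout`):

```python
import io
import queue
import threading


def _readline_pump(stream, line_q: "queue.Queue[str | None]") -> None:
    # push lines until EOF; the finally-clause sentinel tells the
    # consumer to stop without ever joining the thread
    try:
        for line in stream:
            line_q.put(line)
    finally:
        line_q.put(None)


line_q: "queue.Queue[str | None]" = queue.Queue()
# fluent fire-and-forget start: the handle's only use would be .start(),
# so no binding is kept (the b24accf shape)
threading.Thread(
    target=_readline_pump, args=(io.StringIO("a\nb\n"), line_q), daemon=True
).start()

lines = []
while (item := line_q.get(timeout=1)) is not None:
    lines.append(item)
```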
+
+## Verified live (grepped, confirmed used)
+
+- `CLAUDE_BINARY` — public; imported by `_console_emitter.py` for display
+ logic (see backlog note about consolidating `_is_claude_command` /
+ `_supports_stream_json`; deferred until a third caller appears).
+- `_STDOUT`, `_STDERR` — used in `_run_agent_streaming` /
+ `_run_agent_blocking` stderr pump calls and inside `_read_agent_stream`.
+- `_SIGTERM_GRACE_PERIOD`, `_THREAD_JOIN_TIMEOUT`, `_PROCESS_WAIT_TIMEOUT`
+ — each referenced exactly once; constants kept near usage as the
+ project convention prefers.
+- `AgentResult`, `_StreamResult` — returned from streaming/blocking paths
+ and consumed by `engine.py`.
+
+## Potential future wins (not yet taken)
+
+- `_run_agent_streaming` and `_run_agent_blocking` both finish with the
+ same "`stdout = "".join(...) if … else None; stderr = "".join(...)
+ if … else None; log_file = _write_log(...); return AgentResult(...)`"
+ tail. After d8d5592 the conditional-join idiom is now identical
+ across both paths (gated on `log_dir is not None` for streaming,
+ on `stdout_lines is not None` for blocking — but `stdout_lines` is
+ itself `[] if log_dir is not None else None`, so the conditions are
+ equivalent). The intermediate state still differs (`stream.stdout_lines`
+ tuple vs `stdout_lines` list|None), so extracting a shared helper
+ would mostly move arguments around. Revisit only if a third
+ execution path appears.
+- The two `if proc.stdin/stdout/stderr is None: raise RuntimeError(...)`
+ guards just after `Popen` could use a single helper, but `subprocess`
+ guarantees these are non-None when `PIPE` is passed — the guards exist
+ mainly to narrow for the type checker, and a helper would make the
+ narrow less explicit. Leave as-is.
diff --git a/workspace/ralphs/improve-codebase/coverage/_console_emitter.md b/workspace/ralphs/improve-codebase/coverage/_console_emitter.md
new file mode 100644
index 00000000..c12834d9
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_console_emitter.md
@@ -0,0 +1,205 @@
+# `_console_emitter.py` coverage
+
+Valid at: 2fda4f0
+
+## Recent changes
+
+- 2fda4f0 — inlined the `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]`
+ local in `_LivePanelBase._build_body`. The alias was read exactly
+ once, as the iterable of the very next `for line in visible:` loop.
+ Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:`
+ matches the inline-alias pattern from 497c028 (`agent`), fc5e1cb
+ (`total_in`), 52e0272 (`msg`), ce487d3 (`text`), and e1ad87a
+ (`binary`). Behavior unchanged — each iteration still mutates the
+ Text in place (`no_wrap` / `overflow`) before appending to `rows`,
+ and the slice still materializes the last `_MAX_VISIBLE_SCROLL`
+ items. No other `visible` locals remain in the class; the name is
+ reused elsewhere in the module (fullscreen viewport height, scrollbar
+ geometry) but all in unrelated scopes.
+- 3823019 — narrowed `line = escape_markup(data["line"])` scope in
+ `_on_agent_output_line` past the
+ `if not isinstance(target, _IterationSpinner): return` guard. The
+ escape_markup call was wasted work on the early-return path (target
+ None or wrong-type panel); moving the binding after the guard keeps
+ the _IterationSpinner branch behavior identical and drops the wasted
+ work on the other branch. Same shape as 134078d's `name_col`
+ narrowing — unconditional compute that only one branch consumes.
+ Note: the `_structured_agent` short-circuit earlier in the method
+ already skips this path for Claude runs (ad7523e), so this narrowing
+ only affects the raw-stdout path.
+- d0060b3 — added a public `outcome` property on `_LivePanelBase` and
+ replaced `source._outcome` in `_FullscreenPeek._build_header` with
+ `source.outcome`. Mirrors ef9a178's `iteration_id` cleanup — both
+ commits expose an existing private attribute through a getter so the
+ cross-class read doesn't have to dip into private state. The
+ `_outcome` attribute is still the single source of truth (written
+ only in `freeze`), and tests that read `_outcome` directly
+ (test_console_emitter.py:1766) keep working. Two private-attr
+ cross-class reads remain in the module (`source._scroll_lines` at
+ lines 750 and 872); those touch a mutable list that the class itself
+ appends to, so a read-only property would paper over the mutation
+ asymmetry — not taking until a real need appears.
+- 3a8908d — replaced the `if initial_id is None and self._iteration_history:`
+ guard in `enter_fullscreen` with `next(reversed(self._iteration_history), None)`.
+ The compound condition was doing two jobs at once: pick the fallback only
+ when nothing is live, *and* sidestep `next(reversed({}))` raising
+ StopIteration on the empty dict. The `next(it, default)` form moves the
+ empty-handling into the standard library idiom so the outer `if` reads
+ as a single concern. Same observable behavior — the immediately-following
+ `if initial_id is None or self.panel_for(initial_id) is None:` branch
+ still prints "Full peek: no iterations yet" and returns False when the
+ fallback yielded nothing. Pinned by `test_enter_without_iteration_prints_hint`.
+- 59b0e34 — inlined `self._fullscreen_page_size()` into the space/b
+ action lambdas in `_handle_fullscreen_key`. The `page` local was
+ computed unconditionally in the non-exit branch but only consumed by
+ the page-down (" ") and page-up ("b") lambdas — j/k/g/G/[/] now skip
+ the call entirely. Space/b still compute it exactly once per keypress,
+ now at action-invocation time (under the same `_console_lock`), so
+ behavior is unchanged. `_fullscreen_page_size()` is a pure read of
+ `self._console.size.height` in a try/except, so deferring evaluation
+ has no observable effect — the dict build and action invocation happen
+ back-to-back inside the lock. Same Phase 4 shape as 134078d / ef176bf
+ / b19625e.
+- 52e0272 — inlined the `msg = raw.get("message", {})` local in
+ `_IterationPanel._apply_assistant`. The alias was read exactly once on
+ the next line as `msg.get("usage")`. Collapsing to
+ `usage = raw.get("message", {}).get("usage")` matches the chained-get
+ style already used by `_iter_content_blocks` two functions above
+ (`raw.get("message", {}).get("content", [])`), and the same
+ inline-alias pattern as 497c028 (`agent`) and fc5e1cb (`total_in`).
+ No other reference to `msg` exists in the function — verified by grep.
+- b19625e — dropped the `new_offset` alias in `_FullscreenPeek.scroll_down`.
+ The local was assigned directly to `self._offset`, then the
+ follow-mode check re-read it as `new_offset == 0` — identical to
+ `self._offset == 0` after the assignment. Sibling `scroll_up` keeps
+ its local because it compares old vs new *before* assigning (needed
+ to conditionally disable auto-scroll on a real move); `scroll_down`
+ has no such comparison, so the alias was dead. Same Phase 4 shape
+ as ef176bf (`line_count`) and 134078d (`name_col`).
+- 497c028 — inlined the `agent = data["agent"]` local in `_on_run_started`.
+ The alias was read exactly once, immediately below, as the arg to
+ `_is_claude_command(agent)`. Reading `data["agent"]` directly matches
+ the style established by fc5e1cb (inlined `total_in`). `ralph_name`
+ was preserved — it's used inside an f-string where `data['ralph_name']`
+ would be awkward. Same Phase 4 shape as fc5e1cb.
+- ef176bf — dropped the `line_count = len(self._scroll_lines)` alias in
+ `_IterationSpinner._build_footer`. The local served dual duty as a
+ predicate (`if line_count > 0`) and as the `_plural` arg, but both
+ uses were on the same truthy branch. Replaced the predicate with
+ `if self._scroll_lines:` (idiomatic list truthiness) and moved the
+ `len()` call inside the branch that needs it. This matches the
+ sibling `_IterationPanel._build_footer` which uses `if self._tool_count > 0:`
+ inline with no local alias. Same Phase 4 shape as 134078d.
+- ad7523e — moved the `if self._structured_agent: return` short-circuit
+ in `_on_agent_output_line` from inside `_console_lock` to before
+ acquisition. The flag is write-once (set in `_on_run_started` before
+ any iteration events can flow) and already read lock-free in
+ `_on_agent_activity` — now both structured/raw output handlers use the
+ same pattern. Bonus: avoids a lock acquisition per stdout line under
+ Claude, where every line short-circuits anyway. Added a comment
+ explaining the write-once invariant so the lock-free read doesn't look
+ accidental.
+- bcadee1 — dropped the `if self._active_renderable is not None:` guard
+ wrapping `_archive_current_iteration_unlocked("interrupted")` in
+ `_on_iteration_started`. The archive helper already no-ops when
+ nothing is active (docstring: "No-op when no iteration is active"),
+ so the outer guard was mechanically redundant. Updated the
+ surrounding comment to note the no-op behavior so the call's intent
+ still reads as defensive. Same shape as 5337d88 / 4ccfa9a / 8cb0d47.
+- 134078d — narrowed `name_col` scope in `_IterationPanel._apply_assistant`'s
+ tool_use branch. The padded name column was computed unconditionally
+ but only rendered when `arg` was truthy (the `else` branch uses raw
+ `name` without padding). Moved the pad-to-column if/else inside
+ `if arg:` so the helper variable lives only where it's used. Same
+ output in both branches — only the dead formatting work is gone.
+- 1d7251f — promoted the `40` fallback-terminal-height literal to a named
+ module constant `_DEFAULT_CONSOLE_HEIGHT` near the other fullscreen
+ constants (`_FULLSCREEN_CHROME_ROWS`, `_FULLSCREEN_MIN_VISIBLE`). Two
+ use-sites both meant "reasonable default terminal height when the
+ real value isn't available": `_FullscreenPeek._console_height` (class-
+ attribute default used before the first `__rich_console__` call) and
+ `ConsoleEmitter._fullscreen_page_size`'s except-branch fallback. The
+ constant keeps them in lockstep. No other bare `40`s remain in the
+ module (grep confirmed).
+- d34e957 — dropped redundant `f"{_plural(total, 'line')}"` wrap in
+ `_FullscreenPeek._build_header`. `_plural` already returns a str
+ so the f-string just format-identity-copied it. Same shape as the
+ surrounding `header.append(literal, style=...)` calls. No other
+ `f"{_plural(...)}"` wraps remain in the module (checked with grep).
+- fc5e1cb — inlined `total_in = self._input_tokens` alias in
+ `_IterationPanel._format_tokens`. The rename hinted at a "total
+ input" aggregate that no longer exists (cache-read tokens are
+ intentionally excluded from ctx); reading `self._input_tokens`
+ directly matches what the value actually is. Matches the existing
+ style in the sibling `if self._output_tokens > 0` branch.
+- 3838006 — rewrote `ConsoleEmitter.panel_for` to call `self.is_live(...)`
+ for its guard instead of re-stating the
+ `cur_iter == id and active is not None` expression. Same behavior;
+ one source of truth for "this is the active iteration" check. Type
+ checker is happy: returning `self._active_renderable` (typed
+ `_LivePanelBase | None`) matches the declared return type even though
+ the runtime invariant guarantees non-None whenever `is_live` is True.
+- 01f2f1c — dropped `_FullscreenPeek._reset_view`. Its body
+ (`self._offset = 0; self._auto_scroll = True`) was byte-for-byte identical
+ to `scroll_to_bottom`. The two call sites in `_step_iteration` now call
+ `scroll_to_bottom()` directly; the "snap to newest line + follow" intent
+ moved into a docstring on the surviving method. No other scroll-reset
+ duplication remains.
+- ef9a178 — replaced the single cross-class `_fullscreen_view._iteration_id`
+ access in `_archive_current_iteration_unlocked` with the public
+ `iteration_id` property on `_FullscreenPeek`. No behavior change —
+ `_FullscreenPeek` already exposes this via an `@property` (line 739-741);
+ the private-attribute shortcut was an oversight from earlier iterations.
+ Now `_iteration_id` is only read from within `_FullscreenPeek` itself.
+ (d0060b3 applied the same pattern to `_LivePanelBase._outcome`.)
+- c4469a1 — extracted `_FullscreenPeek._step_iteration(direction)` from
+ `prev_iteration` / `next_iteration`. The two methods were 12-line
+ mirror images differing only in step direction (-1 vs +1) and
+ eviction-fallback (`ids[0]` vs `ids[-1]`). Combined boundary check
+ uses `0 <= new_idx < len(ids)` which collapses both `idx == 0` (prev)
+ and `idx >= len(ids) - 1` (next) into one expression.
+- 5337d88 — dropped `if not self._tool_categories: return ""` early
+ return in `_IterationPanel._format_categories`. Empty dict yields an
+ empty list comprehension which `" · ".join` turns into `""`, so the
+ guard was dead — same shape as 4ccfa9a's `_format_params` cleanup.
+ No other empty-collection-then-join pattern remains in the module
+ (`_format_tokens` builds its `parts` list with conditional appends, so
+ it has no comprehension to short-circuit).
+- 8cb0d47 — dropped the `max(total - visible, 1)` guard in
+ `_scrollbar_metrics`. The early return `if total <= visible` already
+ guarantees `total - visible ≥ 1`, so the `max(..., 1)` floor was dead
+ defensive code. Inlined the subtraction directly into the `frac`
+ calculation and added a comment noting the invariant.
+- 0900aad — dropped `_iteration_order` list; `_iteration_history` dict
+ preserves insertion order by itself. Archive now pops-then-inserts to
+ move existing entries to the end; eviction iterates the dict
+ (oldest-first); `enter_fullscreen` uses `next(reversed(...))` for the
+ most recent finished iteration. Updated the single direct-field
+ reference in `tests/test_console_emitter.py`.
+- 3e9627b — extracted `_stop_compact_live_unlocked` helper to dedupe the
+ `if self._live is not None: self._live.stop(); self._live = None` pattern
+ across `_stop_live_unlocked`, `enter_fullscreen`, and `_on_iteration_ended`.
+- 4ccfa9a — dropped `if parts else ""` branch in `_format_params`.
+ `" · ".join([])` returns `""`, so the guard was dead.
+
+## Verified live (grepped, confirmed used)
+
+Private helpers and constants that look unused but are legitimately used:
+
+- `_ICON_SUCCESS`, `_ICON_FAILURE`, `_ICON_TIMEOUT`, `_ICON_ARROW`,
+ `_ICON_DASH`, `_ICON_PLAY` — all referenced in handler print strings.
+- `clear_scroll` — used by test_console_emitter tests.
+- `_SinglePanelNavigator` — used by tests and `scripts/tui_dev/snapshot.py`.
+- `_stop_live` (the locked wrapper) — used only in tests for cleanup
+ between test cases. Production code uses `_stop_live_unlocked` inside
+ an existing lock.
+- `_format_params`, `_extract_file_path`, `_extract_key`, `_extract_params`
+ — all referenced in the `_TOOL_REGISTRY` table (`"Read"`, `"Glob"`,
+ `"Grep"`, `"Edit"`, `"Write"`, `"Bash"`, `"WebFetch"`, `"WebSearch"`, etc.).
+
+## Potential future wins (not yet taken)
+
+- `_IterationPanel._build_footer` and `_IterationSpinner._build_footer` both
+ start with `Text(no_wrap=True, overflow="ellipsis")` and use
+ `_footer_grid(summary)` — the `Text(...)` construction repeats, but only
+ twice. Not worth extracting unless a third subclass appears.
diff --git a/workspace/ralphs/improve-codebase/coverage/_frontmatter.md b/workspace/ralphs/improve-codebase/coverage/_frontmatter.md
new file mode 100644
index 00000000..3f4faeb2
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_frontmatter.md
@@ -0,0 +1,52 @@
+# `_frontmatter.py` coverage
+
+Valid at: a6f4c47
+
+## Recent changes
+
+- a6f4c47 — dropped the `if text.startswith(_UTF8_BOM):` guard in
+ `parse_frontmatter` before the `text = text.removeprefix(_UTF8_BOM)`
+ call. Python's `str.removeprefix` is already a no-op (returns the
+ same string object) when the prefix is absent, so the guard was
+ purely decorative dead code. Behavior preserved:
+ - BOM-prefixed input still gets stripped (pinned by
+ `test_utf8_bom_does_not_break_frontmatter` in
+ `tests/test_frontmatter.py`).
+ - Non-BOM input is passed through unchanged (exercised by every
+ other parse test in the file).
+ - The CPython implementation returns the same object identity when
+ no prefix match occurs, so there's no allocation overhead either.
+
+## Shape of the module
+
+- `parse_frontmatter(text)` — public entry point. Strips optional
+ UTF-8 BOM, splits on `---` delimiters via `_extract_frontmatter_block`,
+ runs `yaml.safe_load`, strips HTML comments from the body with
+ `_strip_html_comments`, returns `(dict, body)`.
+- `serialize_frontmatter(frontmatter, body)` — inverse; emits
+ `---`-delimited blocks only when the frontmatter is non-empty *or*
+ the body would otherwise be mis-parsed as frontmatter.
+- Constants: `RALPH_MARKER`, `FIELD_*`, `CMD_FIELD_*`, `NAME_RE`,
+ `VALID_NAME_CHARS_MSG`. All imported by `cli.py` and
+ `_resolver.py`; each one is reused across modules so centralisation
+ is justified.
+
+## Verified live (grepped, confirmed used)
+
+- `_FRONTMATTER_DELIMITER` — used 4× in `serialize_frontmatter` (plus
+ 2× in `_extract_frontmatter_block`).
+- `_FENCE_OR_COMMENT_RE` — used in `_strip_html_comments`.
+- `_UTF8_BOM` — used in `parse_frontmatter` (BOM strip).
+
+## Potential future wins (not yet taken)
+
+- `_extract_frontmatter_block` splits `text` on `"\n"` up front, then
+ re-joins slices for the body — the body slice is re-joined even when
+ the input is tiny. Could stream with `str.find` + `str.index` to
+ avoid the list allocation, but this function runs once per
+ iteration and the input is ~1 KB in practice; not worth the churn.
+- The `serialize_frontmatter` `needs_delimiters` expression uses
+ `body.lstrip().startswith(...)` which allocates a new stripped
+ string just to check a prefix. Could collapse via
+ `re.match(r"\s*---", body)` but the current form reads cleanly;
+ revisit only if this becomes hot.
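The two prefix checks in the last bullet agree because `\s*` under `re.match` skips the same leading whitespace that `lstrip()` removes (a sketch of the equivalence, not the module's code):

```python
import re

for body in ("---\nfoo", "  \n\t---\nfoo", "body text", ""):
    via_lstrip = body.lstrip().startswith("---")
    via_regex = re.match(r"\s*---", body) is not None
    # same answer on every case, including the empty string
    assert via_lstrip == via_regex
```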
diff --git a/workspace/ralphs/improve-codebase/coverage/_output.md b/workspace/ralphs/improve-codebase/coverage/_output.md
new file mode 100644
index 00000000..e0a9dfbb
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_output.md
@@ -0,0 +1,45 @@
+# `_output.py` coverage
+
+Valid at: ce487d3
+
+## Recent changes
+
+- ce487d3 — inlined `text = ensure_str(stream)` in `collect_output`.
+ The local was assigned then read exactly once on the next line as
+ `parts.append(text)`. The `ensure_str` helper name already
+ documents the decode step, so the intermediate binding was pure
+ noise. Same alias-inline shape as fc5e1cb / 497c028 / 52e0272.
+- 7730dd4 — narrowed `secs = total % _SECONDS_PER_MINUTE` scope in
+ `format_duration`. The local was computed unconditionally between
+ the `total = int(seconds + 0.5)` / `minutes = total // 60` setup and
+ the `if minutes < _MINUTES_PER_HOUR:` branch, but only consumed by
+ the `f"{minutes}m {secs}s"` return on the truthy branch. The hours
+ branch (`hours = minutes // 60; mins = minutes % 60`) never touches
+ `secs`, so for any duration ≥ 1h the modulo was wasted work. Moved
+ the assignment inside the if to co-locate with its only use site.
+ Same Phase 4 shape as 134078d / ef176bf / 59b0e34 / b19625e.
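Putting the 7730dd4 note in code, the narrowed function reads roughly like this — the sub-minute and hour formats are assumptions; only the minutes branch and the `int(seconds + 0.5)` rounding are pinned by these notes:

```python
_SECONDS_PER_MINUTE = 60
_MINUTES_PER_HOUR = 60


def format_duration(seconds: float) -> str:
    total = int(seconds + 0.5)  # round half up to whole seconds
    if total < _SECONDS_PER_MINUTE:
        return f"{total}s"
    minutes = total // _SECONDS_PER_MINUTE
    if minutes < _MINUTES_PER_HOUR:
        # secs lives only on the branch that renders it, so durations
        # >= 1h no longer pay for the modulo
        secs = total % _SECONDS_PER_MINUTE
        return f"{minutes}m {secs}s"
    return f"{minutes // _MINUTES_PER_HOUR}h {minutes % _MINUTES_PER_HOUR}m"
```

Note how the rounding step also produces the cross-into-next-unit behavior mentioned later (59.95s renders as "1m 0s").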
+
+## Layout snapshot (at 7730dd4)
+
+- Module is 139 lines — small, mostly format helpers.
+- Top: `IS_WINDOWS`, `SUBPROCESS_TEXT_KWARGS`, `SESSION_KWARGS` —
+ shared subprocess kwargs imported by `_agent.py` and `_runner.py`.
+- `ProcessResult` — base dataclass for `RunResult` / `AgentResult`,
+ with the shared `success` property.
+- Format helpers: `ensure_str`, `collect_output`, `warn`, `format_count`,
+ `format_duration`.
+- Module-level constants `_COUNT_THOUSANDS`, `_COUNT_MILLIONS`,
+ `_SECONDS_PER_MINUTE`, `_MINUTES_PER_HOUR` are used only by the two
+ `format_*` functions below them.
+
+## Potential future wins (not yet taken)
+
+- `format_count` repeats `f"{n / _COUNT_MILLIONS:.1f}M"` in two
+ branches (the bare ≥1M return and the rounded-cross-into-M guard).
+ Could lift to a `_format_millions(n)` helper, but each site is one
+ line and the duplication is intentional (the rounding guard explains
+ the second site). Skip unless a third user appears.
+- The two `format_*` functions both have a "rounded value crossed into
+ next unit" guard (59.95s → "1m 0s"; 999_950 → "1.0M") with parallel
+ comments referencing each other. Already kept in lockstep — no
+ refactor needed.
diff --git a/workspace/ralphs/improve-codebase/coverage/_resolver.md b/workspace/ralphs/improve-codebase/coverage/_resolver.md
new file mode 100644
index 00000000..db553415
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/_resolver.md
@@ -0,0 +1,20 @@
+# `_resolver.py` coverage
+
+Valid at: 6227863
+
+## Recent changes
+
+- 6227863 — dropped `if not user_args: return _ARGS_RE.sub("", prompt)`
+ early return in `resolve_args`. `_ARGS_RE.sub` with the callable
+ `lambda m: user_args.get(m.group(1), "")` already resolves every match
+ to `""` when the dict is empty, so the fast path produced byte-for-byte
+ identical output to the general path. Test
+ `test_empty_args_clears_placeholders` still covers the empty-dict
+ behavior.
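The equivalence is easy to see with a toy stand-in — the real `_ARGS_RE` placeholder syntax isn't shown in these notes, so the `{{name}}` form here is an assumption:

```python
import re

_ARGS_RE = re.compile(r"\{\{(\w+)\}\}")  # hypothetical placeholder syntax


def resolve_args(prompt: str, user_args: dict[str, str]) -> str:
    # the callable default maps unmatched names — all of them, when the
    # dict is empty — to "", so no empty-dict fast path is required
    return _ARGS_RE.sub(lambda m: user_args.get(m.group(1), ""), prompt)
```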
+
+## Potential future wins (not yet taken)
+
+- None obvious; the module is 80 lines and both `resolve_args` /
+ `resolve_all` share the substitution shape but operate on different
+ regexes (single-kind vs multi-kind), so extracting a helper would add
+ indirection for two very short call sites.
diff --git a/workspace/ralphs/improve-codebase/coverage/engine.md b/workspace/ralphs/improve-codebase/coverage/engine.md
new file mode 100644
index 00000000..274137c1
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/coverage/engine.md
@@ -0,0 +1,34 @@
+# `engine.py` coverage
+
+Valid at: c5ce11d
+
+## Recent changes
+
+- c5ce11d — inlined the `reason = state.status.reason` local in
+ `run_loop`. The alias was read exactly once on the next line as the
+ `reason=` kwarg to `RunStoppedData(...)`. Inlined to
+ `reason=state.status.reason` to match the chained-read style elsewhere
+ (e.g. `_IterationPanel._apply_assistant`'s `raw.get("message", {}).get("usage")`
+ after 52e0272). Same Phase 4 inline-alias shape as ce487d3 (`text`),
+ 52e0272 (`msg`), 497c028 (`agent`), fc5e1cb (`total_in`).
+ Safe: the immediately-preceding `if state.status == RunStatus.RUNNING:
+ state.status = RunStatus.COMPLETED` normalizes the status to a terminal
+ value, so `RunStatus.reason`'s ValueError guard (non-terminal) cannot
+ fire here.
+
+## Structure notes
+
+`run_loop` is the main loop orchestrator. Its control-flow helpers
+(`_handle_control_signals`, `_wait_for_resume`, `_run_iteration`,
+`_delay_if_needed`) are each short and single-purpose. `_run_iteration`
+splits into `_run_commands` → `_assemble_prompt` → `_run_agent_phase`.
+`_run_agent_phase` is the longest at ~80 lines but its branches map 1:1
+to the agent outcome tri-state (timed_out / success / failure) — no
+obvious extraction candidate without inventing synthetic abstractions.
+
+## Potential future wins (not yet taken)
+
+- None spotted. The locals that survive in `_run_agent_phase`
+ (`duration`, `event_type`, `state_detail`, `ended_data`) all have
+ multiple uses and good names. `_run_commands`'s `output` is mutated
+ before use. `_run_iteration`'s `iteration` alias is used 4 times.
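The terminal-status safety argument above can be sketched; `RunStatus` here is a minimal stand-in, not the real enum:

```python
from enum import Enum

class RunStatus(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    STOPPED = "stopped"
    FAILED = "failed"

    @property
    def reason(self) -> str:
        # Guard sketched from the note: a non-terminal status has no reason.
        if self is RunStatus.RUNNING:
            raise ValueError("non-terminal status has no reason")
        return self.value

status = RunStatus.RUNNING
# run_loop normalizes RUNNING to a terminal value first, so the inlined
# read `reason=status.reason` cannot hit the ValueError guard.
if status == RunStatus.RUNNING:
    status = RunStatus.COMPLETED
print(status.reason)  # 'completed'
```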
diff --git a/workspace/ralphs/improve-codebase/iterations.md b/workspace/ralphs/improve-codebase/iterations.md
new file mode 100644
index 00000000..c742b441
--- /dev/null
+++ b/workspace/ralphs/improve-codebase/iterations.md
@@ -0,0 +1,58 @@
+# Iterations
+
+One line per iteration: `<commit> <one-line summary>`.
+
+7402f04 refactor: inline `stream_cmd` into Popen call in `_run_agent_streaming` — the `stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG, _STREAM_FORMAT, _VERBOSE_FLAG]` local was consumed exactly once on the very next statement (the first positional arg to `subprocess.Popen(...)`), and grep confirmed no other references in src/ or tests/. The three appended tokens are already named constants, so the intent reads cleanly at the call site without the intermediate binding. Same Phase 4 inline-alias shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0 (`visible`), e1ad87a (`binary`), and 52e0272 (`msg`). Behavior preserved — subprocess.Popen still receives the same list, pinned by the full streaming-path test coverage in `tests/test_agent.py`.
+
+66d6c60 refactor: inline `remaining` alias in `_read_agent_stream` timeout calc — the `remaining = deadline - time.monotonic()` local served a single use on the very next line as `max(remaining, 0)`. Collapsing to `max(deadline - time.monotonic(), 0)` matches the inline-alias pattern from e1ad87a (`binary`), 497c028 (`agent`), fc5e1cb (`total_in`), 52e0272 (`msg`), and ce487d3 (`text`). The adjacent comment was updated to refer to "clamp to 0" since `remaining` no longer exists. Behavior preserved — the same clamped value reaches `line_q.get(timeout=...)` on every iteration of the read loop, so deadline enforcement and the non-blocking drain on an already-expired deadline both behave identically. Pinned by the streaming-path tests in `tests/test_agent.py`.
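The clamped-timeout read loop can be sketched as follows (names are illustrative stand-ins, not the actual `_read_agent_stream` internals):

```python
import queue
import time

def drain_lines(line_q: queue.Queue, deadline: float) -> list:
    lines = []
    while True:
        try:
            # Clamp to 0 so an already-expired deadline degrades to a
            # non-blocking drain instead of a negative-timeout error.
            item = line_q.get(timeout=max(deadline - time.monotonic(), 0))
        except queue.Empty:
            return lines
        if item is None:  # reader's end-of-stream sentinel
            return lines
        lines.append(item)

q = queue.Queue()
for item in ("out-1", "out-2", None):
    q.put(item)
print(drain_lines(q, time.monotonic() + 1.0))  # ['out-1', 'out-2']
```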
+
+b24accf refactor: inline `reader` thread handle in `_read_agent_stream` — the local served only to call `.start()`; the thread is never joined because termination flows through the queue's None sentinel plus the daemon flag, so the handle had no further role. Collapsing into the fluent `threading.Thread(..., daemon=True).start()` form matches the fire-and-forget intent and drops an unused binding. Python's `threading` module keeps live threads reachable via `threading._active`, so dropping the local reference does not affect thread lifetime — verified by the full agent test suite (628 passed). Same alias/handle-drop shape as 2fda4f0 / e1ad87a / 497c028 / b19625e, specialised to a Thread.
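The fire-and-forget form can be sketched; `_pump` is a stand-in for the real reader target:

```python
import queue
import threading

line_q = queue.Queue()

def _pump() -> None:
    # Stand-in reader: enqueue output, then the None end-of-stream sentinel.
    for line in ("line-1", "line-2"):
        line_q.put(line)
    line_q.put(None)

# No local handle: the thread is never joined, and Python keeps live
# threads reachable internally, so dropping the reference is safe.
threading.Thread(target=_pump, daemon=True).start()

collected = []
while (item := line_q.get(timeout=1)) is not None:
    collected.append(item)
print(collected)  # ['line-1', 'line-2']
```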
+
+2fda4f0 refactor: inline `visible` alias in `_LivePanelBase._build_body` — the `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]` local was read exactly once, as the iterable of the very next `for line in visible:` loop. Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:` matches the inline-alias pattern from 497c028 / fc5e1cb / 52e0272 / ce487d3 / e1ad87a. Behavior unchanged — the slice still materializes the last `_MAX_VISIBLE_SCROLL` items, and each loop iteration still mutates the Text in-place (`no_wrap` / `overflow`) before appending to `rows`. Pinned by the broad `test_console_emitter.py` suite (peek-visible rendering paths).
+
+e1ad87a refactor: inline `binary` alias in `_supports_stream_json` — the local was assigned to `Path(cmd[0]).stem` and then read exactly once on the next line as `binary == CLAUDE_BINARY`. Collapsing to the chained form `Path(cmd[0]).stem == CLAUDE_BINARY` matches the already-inline sibling check in `_console_emitter.py:_is_claude_command` (`return Path(parts[0]).stem == CLAUDE_BINARY`) and the inline-alias pattern from ce487d3 / 52e0272 / 497c028 / fc5e1cb. Same behavior — `_supports_stream_json` still short-circuits on empty `cmd` first, and the final boolean result is unchanged. Pinned by `tests/test_agent.py::test_streaming_mode_used_for_claude` and the broader agent-selection tests.
+
+c5ce11d refactor: inline `reason = state.status.reason` alias in `run_loop`'s `RUN_STOPPED` emit. The local was read exactly once as the `reason=` kwarg to `RunStoppedData(...)` on the next line; collapsing to `reason=state.status.reason` matches the inline-alias pattern from ce487d3 (`text`) / 52e0272 (`msg`) / 497c028 (`agent`) / fc5e1cb (`total_in`). Safe at this call site — the immediately-preceding `if state.status == RunStatus.RUNNING: state.status = RunStatus.COMPLETED` normalizes status to a terminal value (STOPPED / COMPLETED / FAILED), so the `RunStatus.reason` property's "non-terminal raises ValueError" guard cannot fire.
+
+3823019 refactor: narrow `line = escape_markup(data["line"])` scope past the early-return guard in `_on_agent_output_line` — the local was computed unconditionally inside `_console_lock` but only used on the `_IterationSpinner` branch; the `if not isinstance(target, _IterationSpinner): return` early-return path would have thrown the value away. Moving the binding below the guard preserves exactly the same output (same escape_markup call, same f-string, same add_scroll_line) while skipping the wasted work whenever target is None or an _IterationPanel. Same Phase 4 shape as 134078d (`name_col`), 7730dd4 (`secs`), ef176bf (`line_count`), 59b0e34 (`page`). Pinned by the raw-line handling tests in test_console_emitter.py.
+
+ce487d3 refactor: inline `text` alias in `_output.py:collect_output` — the local was assigned from `ensure_str(stream)` and read exactly once on the next line as `parts.append(text)`. Same alias-inline shape as fc5e1cb / 497c028 / 52e0272 — the `ensure_str` helper name already documents the decode step, so the intermediate binding was pure noise. Tests in `test_output.py` cover the str, bytes, and stdout+stderr-newline-join paths.
+
+7730dd4 refactor: narrow `secs = total % _SECONDS_PER_MINUTE` scope into the `if minutes < _MINUTES_PER_HOUR:` branch in `_output.py:format_duration` — the local was computed unconditionally but only consumed by `f"{minutes}m {secs}s"` on the truthy branch. The hours branch derives `hours`/`mins` independently and never references `secs`, so the modulo was wasted work for any duration ≥ 1h. Same Phase 4 narrow-the-scope shape as 134078d (`name_col`), ef176bf (`line_count`), 59b0e34 (`page`), b19625e (`new_offset`). Tests in test_output.py cover both branches (sub-hour: lines 110–137; hour+: lines 140–144), so the move is pinned in either direction.
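The scope-narrowing shape can be sketched with a simplified two-branch `format_duration` (the real function likely handles more cases):

```python
_SECONDS_PER_MINUTE = 60
_MINUTES_PER_HOUR = 60

def format_duration(total: int) -> str:
    minutes = total // _SECONDS_PER_MINUTE
    if minutes < _MINUTES_PER_HOUR:
        # secs is only consumed on this branch, so compute it here
        # instead of unconditionally before the if.
        secs = total % _SECONDS_PER_MINUTE
        return f"{minutes}m {secs}s"
    hours = minutes // _MINUTES_PER_HOUR
    mins = minutes % _MINUTES_PER_HOUR
    return f"{hours}h {mins}m"

print(format_duration(125))   # '2m 5s'
print(format_duration(3720))  # '1h 2m'
```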
+
+d0060b3 refactor: replace `source._outcome` cross-class access with public `outcome` property on `_LivePanelBase` — `_FullscreenPeek._build_header` was reading the private attribute directly on its `source` panel. Added an `@property outcome` that returns `self._outcome`, matching the `iteration_id` property pattern from ef9a178 (which cleaned up the parallel `_fullscreen_view._iteration_id` cross-class access). The two remaining cross-class private reads on `_LivePanelBase` (`source._scroll_lines` at lines 750 and 872) read a mutable list that's also appended to from within the class, so a read-only property would mask the asymmetry — leaving those alone until a real need appears. Same observable behavior — `_outcome` is still the single source of truth written by `freeze`, and all existing readers get the same value. The direct `panel_before._outcome == "completed"` assertion in test_console_emitter.py:1766 still passes (underlying attribute is unchanged); fullscreen-header rendering paths exercise the new property transitively.
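The property-over-private-read shape can be sketched with minimal stand-ins (the real classes render Rich panels and carry much more state):

```python
class LivePanelBase:
    def __init__(self) -> None:
        self._outcome = None

    def freeze(self, outcome: str) -> None:
        self._outcome = outcome  # single writer

    @property
    def outcome(self):
        # Read-only public view, mirroring the iteration_id property shape.
        return self._outcome

class FullscreenPeek:
    def __init__(self, source: LivePanelBase) -> None:
        self.source = source

    def build_header(self) -> str:
        # Public property instead of source._outcome cross-class access.
        return self.source.outcome or "running"

panel = LivePanelBase()
peek = FullscreenPeek(panel)
print(peek.build_header())  # 'running'
panel.freeze("completed")
print(peek.build_header())  # 'completed'
```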
+
+cf72fd9 refactor: replace `parsed = None` sentinel with try/except/else in `_read_agent_stream` — the sentinel only existed so that `isinstance(parsed, dict)` on the next line would fall through on JSON-parse failure. Moving the dict-handling into the try/else branch expresses "only run when parse succeeded" structurally and drops both the sentinel assignment and the redundant isinstance check on the error path. Behavior preserved — valid-but-non-dict JSON (lists, numbers, strings) still skips forwarding via the inner isinstance guard. Same sentinel→structure shape as prior Phase 4 scope narrowings, applied one level up.
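The sentinel-to-structure shape can be sketched (a stand-in, not the actual stream-forwarding code):

```python
import json

def maybe_forward(line: str, out: list) -> None:
    # try/except/else expresses "only when parse succeeded" structurally,
    # with no parsed = None sentinel and no isinstance on the error path.
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        pass  # non-JSON line: ignore
    else:
        if isinstance(parsed, dict):  # lists/numbers/strings still skipped
            out.append(parsed)

events = []
maybe_forward('{"type": "text"}', events)
maybe_forward('[1, 2]', events)   # valid JSON but not a dict
maybe_forward('not json', events)
print(events)  # [{'type': 'text'}]
```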
+
+a6f4c47 refactor: drop redundant BOM-startswith guard before `removeprefix` in `parse_frontmatter` — `str.removeprefix` already returns the string unchanged when the prefix is absent, so the `if text.startswith(_UTF8_BOM):` wrapper was dead defensive code. Behavior preserved — `test_utf8_bom_does_not_break_frontmatter` still pins the BOM-stripping path and every other test exercises the no-BOM path. Same "drop the guard when the operation already handles the no-match case" shape as 5337d88 (empty dict → `" · ".join([])`), 4ccfa9a (empty list → `" · ".join([])`), and 8cb0d47 (redundant `max(..., 1)` floor).
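The guard-drop can be sketched in isolation; `str.removeprefix` (Python 3.9+) is a no-op when the prefix is absent:

```python
_UTF8_BOM = "\ufeff"

def strip_bom(text: str) -> str:
    # removeprefix returns the string unchanged when the prefix is
    # absent, so a startswith guard around it is dead defensive code.
    return text.removeprefix(_UTF8_BOM)

print(strip_bom("\ufeff---\ntitle: x"))  # BOM stripped
print(strip_bom("---\ntitle: x"))        # unchanged
```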
+
+3a8908d refactor: use `next(reversed(...), None)` in `enter_fullscreen` history fallback — the compound `if initial_id is None and self._iteration_history:` guard mixed two concerns (only fall back when nothing is live, *and* avoid `next(reversed({}))` raising StopIteration). Switching to the `next(it, default)` sentinel idiom lets the standard default handle empty-dict case so the outer `if` only encodes the "fall back when nothing live" intent. Behavior preserved — the existing `if initial_id is None or panel_for(...) is None:` next branch still prints "no iterations yet" and returns False, covered by `test_enter_without_iteration_prints_hint`.
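The `next(it, default)` idiom can be sketched (dicts support `reversed` since Python 3.8, yielding keys newest-first here):

```python
iteration_history = {"it-1": object(), "it-2": object()}

# next(iterator, default) absorbs the empty case, so no truthiness guard
# on the dict is needed before falling back to the latest iteration.
latest = next(reversed(iteration_history), None)
print(latest)  # 'it-2'

print(next(reversed({}), None))  # None, not StopIteration
```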
+
+59b0e34 refactor: inline `_fullscreen_page_size()` into the space/b lambdas in `_handle_fullscreen_key` — the `page` local was computed unconditionally in the non-exit branch but only consumed by the page-down (" ") and page-up ("b") action lambdas. j/k/g/G/[/] now skip the call entirely; space/b still compute it exactly once per keypress, now at action-invocation time under the same lock. Same Phase 4 "narrow the scope of a one-branch local" shape as 134078d / ef176bf / b19625e.
+
+52e0272 refactor: inline `msg` alias in `_apply_assistant` — the `msg = raw.get("message", {})` local was read exactly once on the next line as `msg.get("usage")`. Collapsing into `raw.get("message", {}).get("usage")` matches the chained-get style already used by `_iter_content_blocks` (which independently does `raw.get("message", {}).get("content", [])`) and the inline-alias pattern from 497c028 / fc5e1cb.
+
+b19625e refactor: drop `new_offset` alias in `_FullscreenPeek.scroll_down` — the local was assigned to `self._offset` and then re-read only as `new_offset == 0`, which is identical to `self._offset == 0` after the assignment. Sibling `scroll_up` needs its local because it compares old vs new before the assignment; `scroll_down` has no such comparison, so the alias was dead. Same shape as ef176bf and 134078d.
+
+497c028 refactor: inline `agent` alias in `_on_run_started` — the local served a single use as the arg to `_is_claude_command(...)`. Reading `data["agent"]` directly matches the sibling style from fc5e1cb (`total_in` inline). Preserved `ralph_name` — it's used inside an f-string on line 1247 where the alias still helps readability.
+
+ef176bf refactor: drop `line_count` alias in `_IterationSpinner._build_footer` — local was computed unconditionally but only used on the truthy branch (both as predicate `> 0` and as `_plural` arg). Inlined the truthy check on `self._scroll_lines` and moved the `len()` call inside the branch that uses it. Matches the style of sibling `_IterationPanel._build_footer` which uses `if self._tool_count > 0:` with no local alias. Same Phase 4 shape as 134078d's `name_col` narrowing.
+
+d8d5592 refactor: gate `"".join(stdout/stderr_lines)` in `_run_agent_streaming` on `log_dir is not None` — the joined strings were only consumed by `_write_log` and the `captured_*` AgentResult fields, both of which discarded the result when log_dir was None. Mirrors the already-lazy idiom in `_run_agent_blocking` and drops the duplicated `... if log_dir is not None else None` ternary on each AgentResult field.
+ad7523e refactor: move `_structured_agent` short-circuit out of `_console_lock` in `_on_agent_output_line` — flag is write-once (set in `_on_run_started`), matches `_on_agent_activity`'s pattern, avoids lock acquisition per stdout line under Claude
+bcadee1 refactor: drop redundant `_active_renderable` guard in `_on_iteration_started` — archive call already no-ops when nothing is active
+134078d refactor: narrow `name_col` scope into `if arg:` in `_apply_assistant` — padded variant was computed but unused when arg falsy
+1d7251f refactor: promote 40-line fallback height to `_DEFAULT_CONSOLE_HEIGHT` — two sites used the literal 40 for "default terminal height when unknown"
+6227863 refactor: drop redundant empty-user_args branch in `resolve_args` — callable path already yields "" for every match when dict is empty
+d34e957 refactor: drop redundant f-string wrap around `_plural(total, 'line')` in fullscreen header
+fc5e1cb refactor: inline `total_in` alias in `_format_tokens` — direct read of `self._input_tokens` clarifies intent
+3838006 refactor: defer `panel_for` guard to `is_live` to dedupe identical condition
+01f2f1c refactor: drop `_reset_view` in `_FullscreenPeek` — identical body to `scroll_to_bottom`
+ef9a178 refactor: replace `_fullscreen_view._iteration_id` cross-class private access with public `iteration_id` property
+c4469a1 refactor: extract `_step_iteration` to dedupe prev/next iteration browsing in `_FullscreenPeek`
+5337d88 refactor: drop redundant empty-dict guard in `_format_categories` — `" · ".join([])` already returns `""`
+8cb0d47 refactor: drop redundant `max(total - visible, 1)` guard in `_scrollbar_metrics` — early return makes `total - visible ≥ 1`
+cb61477 refactor: extract `_call_safely` helper — dedupes 3× best-effort callback guards in `_agent.py`
+4ccfa9a refactor: drop redundant `if parts else ""` in `_format_params` (empty join already returns "")
+3e9627b refactor: extract `_stop_compact_live_unlocked` to dedupe compact-Live teardown across 3 call sites
+0900aad refactor: drop redundant `_iteration_order` list — dict insertion order suffices
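The "drop the guard when the operation already handles the no-match case" shape (5337d88 / 4ccfa9a) can be sketched with a stand-in formatter:

```python
def format_categories(counts: dict) -> str:
    # " · ".join(...) already returns "" for an empty input, so no
    # empty-dict guard is needed before joining.
    return " · ".join(f"{name} {n}" for name, n in counts.items())

print(format_categories({"read": 3, "edit": 1}))  # 'read 3 · edit 1'
print(format_categories({}))                      # ''
```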