From 85ff945e8b8fa062ab1df8bce1669bbadebbff46 Mon Sep 17 00:00:00 2001 From: Paul Sorensen Date: Sat, 18 Apr 2026 14:22:14 -0700 Subject: [PATCH 1/4] feat(loop): add completion tag for early exit Adds opt-in early-exit to the loop via a structured completion tag. When stop_on_completion_signal: true is set in frontmatter and the agent emits a matching tag in its output, ralphify stops after the current iteration instead of continuing to the iteration budget. - _promise.py: strict TEXT parser with whitespace normalization - Two optional frontmatter fields: completion_signal and stop_on_completion_signal - examples/promise-completion/RALPH.md: canonical example - Docs updated across getting-started, cli, how-it-works, agents, cookbook, quick-reference - Tests in test_promise.py, test_agent.py, test_cli.py, test_engine.py (~98% coverage) --- README.md | 7 + docs/agents.md | 29 +++- docs/cli.md | 7 +- docs/cookbook.md | 40 ++++- docs/getting-started.md | 15 +- docs/how-it-works.md | 7 +- docs/index.md | 6 +- docs/quick-reference.md | 3 + docs/troubleshooting.md | 2 +- examples/promise-completion/RALPH.md | 49 ++++++ src/ralphify/_agent.py | 150 ++++++++++++++---- src/ralphify/_frontmatter.py | 4 + src/ralphify/_promise.py | 27 ++++ src/ralphify/_run_types.py | 16 +- src/ralphify/cli.py | 65 +++++++- src/ralphify/engine.py | 113 +++++++++---- tests/test_agent.py | 224 ++++++++++++++++++++++++-- tests/test_cli.py | 143 +++++++++++++++++ tests/test_engine.py | 229 +++++++++++++++++++++++++++ tests/test_promise.py | 47 ++++++ 20 files changed, 1079 insertions(+), 104 deletions(-) create mode 100644 examples/promise-completion/RALPH.md create mode 100644 src/ralphify/_promise.py create mode 100644 tests/test_promise.py diff --git a/README.md b/README.md index e633cffd..2118042d 100644 --- a/README.md +++ b/README.md @@ -63,6 +63,7 @@ Ralph loops give you: | **grow-coverage** | Write tests for untested modules, one per iteration, until coverage hits the target | | **security-audit** | Hunt for vulnerabilities — scan, find, fix, verify, repeat | | **clear-backlog** | Work through a TODO list or issue tracker, one task per loop | +| **promise-completion** | Work until a target is done, then emit a promise tag so the loop stops early | | **write-docs** | Generate documentation for undocumented modules, one at a time | | **improve-codebase** | Find and fix code smells, refactor patterns, modernize APIs | | **migrate** | Incrementally migrate files from one framework or pattern to another | @@ -95,11 +96,17 @@ Scaffold a ralph and start experimenting: ralph scaffold my-ralph ``` +The scaffolded `RALPH.md` includes the normal command/arg template plus a commented promise-completion path you can enable if the agent should stop early by emitting a matching `...` tag. + +For a committed example, see [`examples/promise-completion/RALPH.md`](examples/promise-completion/RALPH.md) — it shows a loop that exits early once the requested target is complete. + Edit `my-ralph/RALPH.md`, then run it: ```bash ralph run my-ralph # loops until Ctrl+C ralph run my-ralph -n 5 # run 5 iterations then stop +# from a repo checkout: +ralph run examples/promise-completion -n 10 --target "stabilize the failing auth tests" ``` ### What `ralph run` does diff --git a/docs/agents.md b/docs/agents.md index a14d9a04..36e1e602 100644 --- a/docs/agents.md +++ b/docs/agents.md @@ -36,9 +36,34 @@ Your agent must: 1. **Read a prompt from stdin** — the full assembled prompt is piped in 2. **Do work in the current directory** — edit files, run commands, make commits -3. **Exit when done** — exit code 0 means success, non-zero means failure +3. **Exit cleanly** — exit code `0` means the agent process succeeded; non-zero means failure +4. **Optionally emit a completion signal** — set `completion_signal` in frontmatter (default inner text: `RALPH_PROMISE_COMPLETE`) if you want the agent to print an explicit `...` marker -That's it. No special protocol, no API — just stdin in, work done, process exits. +Normal exit codes still indicate process success or failure. They do **not** trigger promise completion by themselves. + +Ralphify only stops early on promise completion when both of these are true: + +- `stop_on_completion_signal: true` +- the matching `...` tag is detected in agent output or captured result text + +`completion_signal` is the inner promise text. For example, `completion_signal: COMPLETE` means the agent must output `COMPLETE`. + +Ralphify still keeps its own command/prompt loop architecture. Only the promise tag format and matching align with Ralph-Wiggum. + +Minimal example: + +```markdown +--- +agent: claude -p --dangerously-skip-permissions +completion_signal: COMPLETE +stop_on_completion_signal: true +--- + +Implement the next todo. When the work is fully complete, print +COMPLETE and exit. +``` + +That's it. No API required — just stdin in, output out, process exits. ## Claude Code diff --git a/docs/cli.md b/docs/cli.md index 408001ce..7d4b3669 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -113,6 +113,7 @@ The loop also stops automatically when: - All `-n` iterations have completed - `--stop-on-error` is set and the agent exits non-zero or times out +- `stop_on_completion_signal: true` is set in frontmatter and the matching `...` tag is detected in agent output or captured result text ### Peeking at live agent output @@ -152,7 +153,7 @@ ralph scaffold # Creates RALPH.md in the current directory |---|---|---| | `[NAME]` | none | Directory name. If omitted, creates RALPH.md in the current directory | -The generated template includes an example command (`git-log`), an example arg (`focus`), and a prompt body with placeholders for both. Edit it, then run [`ralph run`](#ralph-run). See [Getting Started](getting-started.md) for a full walkthrough. +The generated template includes an example command (`git-log`), an example arg (`focus`), a prompt body with placeholders for both, and commented `completion_signal` / `stop_on_completion_signal` lines showing the promise-completion path. Uncomment them if you want the agent to stop early by emitting a matching `...` tag. Then run [`ralph run`](#ralph-run). See [Getting Started](getting-started.md) for a full walkthrough. Errors if `RALPH.md` already exists at the target location. @@ -190,6 +191,10 @@ Your instructions here. Reference args with {{ args.dir }}. | `commands` | list | no | Commands to run each iteration (each has `name` and `run`) | | `args` | list of strings | no | Declared argument names for user arguments. Letters, digits, hyphens, and underscores only. | | `credit` | bool | no | Append co-author trailer instruction to prompt (default: `true`) | +| `completion_signal` | string | no | Inner text for the completion promise tag. `COMPLETE` means the agent must emit `COMPLETE` (default inner text: `RALPH_PROMISE_COMPLETE`) | +| `stop_on_completion_signal` | bool | no | Stop the loop early when the matching `...` tag is detected (default: `false`) | + +Exit code `0` still only means the agent process succeeded. Ralphify keeps its own loop architecture; only the promise tag format and matching align with Ralph-Wiggum. ### Commands diff --git a/docs/cookbook.md b/docs/cookbook.md index e6455725..65b1ca89 100644 --- a/docs/cookbook.md +++ b/docs/cookbook.md @@ -1,13 +1,13 @@ --- title: Ralph Loop Recipes -description: Copy-pasteable ralph loop setups for autonomous ML research, test coverage, code migration, security scanning, deep research, documentation, bug fixing, and codebase improvement. -keywords: ralphify cookbook, autonomous coding recipes, RALPH.md examples, documentation loop, bug fixing loop, codebase improvement, deep research agent, code migration loop, security scanning agent, test coverage automation, autoresearch, autonomous ML research +description: Copy-pasteable ralph loop setups for autonomous ML research, promise-based early exit, test coverage, code migration, security scanning, deep research, documentation, bug fixing, and codebase improvement. +keywords: ralphify cookbook, autonomous coding recipes, RALPH.md examples, promise completion, early exit agent loop, documentation loop, bug fixing loop, codebase improvement, deep research agent, code migration loop, security scanning agent, test coverage automation, autoresearch, autonomous ML research --- # Cookbook !!! tldr "TL;DR" - 8 copy-pasteable ralph loops: [autoresearch](#autoresearch), [codebase improvement](#codebase-improvement), [documentation](#documentation-loop), [bug hunting](#bug-hunter), [deep research](#deep-research), [code migration](#code-migration), [security scanning](#security-scan), and [test coverage](#test-coverage). Each is a real, runnable example from the `examples/` directory. + 9 copy-pasteable ralph loops: [autoresearch](#autoresearch), [codebase improvement](#codebase-improvement), [documentation](#documentation-loop), [bug hunting](#bug-hunter), [deep research](#deep-research), [code migration](#code-migration), [security scanning](#security-scan), [test coverage](#test-coverage), and [promise completion](#promise-completion). Each is a real, runnable example from the `examples/` directory. Copy-pasteable setups for common autonomous workflows. Each recipe is a real, runnable ralph from the [`examples/`](https://github.com/computerlovetech/ralphify/tree/main/examples) directory. @@ -269,6 +269,40 @@ The coverage report gives the agent a clear metric to improve and shows exactly --- +## Stop the loop when the task is complete {: #promise-completion } + +A loop that uses ralphify's promise-completion path to stop before the iteration budget. The agent keeps working until the requested target is done, then emits `COMPLETE` so the run ends immediately instead of burning the remaining iterations. + +**`promise-completion/RALPH.md`** + +```markdown +--8<-- "examples/promise-completion/RALPH.md" +``` + +```bash +ralph run promise-completion -n 10 --target "stabilize the failing auth tests" +``` + +```text +▶ Running: promise-completion + 3 commands · max 10 iterations + +── Iteration 1 ── + Commands: 3 ran +✓ Iteration 1 completed (51.4s) + +── Iteration 2 ── + Commands: 3 ran +✓ Iteration 2 completed via promise tag COMPLETE (43.2s) + +────────────────────── +Done: 2 iterations — 2 succeeded +``` + +Set `completion_signal` to the inner promise text and `stop_on_completion_signal: true` to enable early exit. The agent must emit the matching `...` tag — bare text does not count. + +--- + ## Next steps - [CLI Reference](cli.md) — all `ralph run` options (`--timeout`, `--stop-on-error`, `--delay`, user args) diff --git a/docs/getting-started.md b/docs/getting-started.md index 306954ad..313f4781 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -50,9 +50,10 @@ ralph scaffold my-ralph ```text Created my-ralph/RALPH.md Edit the file, then run: ralph run my-ralph +Optional early exit: uncomment completion_signal + stop_on_completion_signal and emit COMPLETE. ``` -This creates `my-ralph/RALPH.md` with a ready-to-customize template including an example command, arg, and prompt. Edit the task section, [test it](#step-3-do-a-test-run), then follow [Step 4](#step-4-add-a-test-command) to add a test command — test feedback is what makes the loop self-healing. +This creates `my-ralph/RALPH.md` with a ready-to-customize template including an example command, arg, prompt, and a commented promise-completion example. If you want the loop to stop before its iteration budget, uncomment `completion_signal` and `stop_on_completion_signal`, then tell the agent to emit the matching `...` tag when it is done. Edit the task section, [test it](#step-3-do-a-test-run), then follow [Step 4](#step-4-add-a-test-command) to add a test command — test feedback is what makes the loop self-healing. Or create the file manually as shown below. @@ -251,6 +252,16 @@ The agent's output streams live to your terminal between the iteration markers If the agent breaks a test, the next iteration sees the failure output via `{{ commands.tests }}` and fixes it automatically. +!!! tip "Optional: stop when the task is fully complete" + Add these frontmatter fields if you want the loop to stop on an explicit completion marker: + + ```yaml + completion_signal: COMPLETE + stop_on_completion_signal: true + ``` + + `completion_signal` is the inner promise text. With `completion_signal: COMPLETE`, the agent must emit `COMPLETE`. If you omit it, the default promise tag is `RALPH_PROMISE_COMPLETE`. The loop only exits early when `stop_on_completion_signal` is enabled and that tag is detected in agent output or captured result text. Exit code `0` still only means the agent process succeeded. + Once you're confident the loop works, drop the `-n 3` to let it run indefinitely. Press `Ctrl+C` to stop. ## Step 7: Steer while it runs @@ -274,7 +285,7 @@ Read TODO.md and focus only on the API module. This is the most powerful part of ralph loops — you're steering a running agent with a text file. !!! warning "Frontmatter changes need a restart" - Only the **prompt body** is re-read each iteration. Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup. If you add a new command or change the agent, stop the loop with `Ctrl+C` and restart it. + Only the **prompt body** is re-read each iteration. Frontmatter is parsed once at startup. If you add a new command, change the agent, or change completion settings, stop the loop with `Ctrl+C` and restart it. ## Next steps diff --git a/docs/how-it-works.md b/docs/how-it-works.md index 74d39e9d..f4be00a7 100644 --- a/docs/how-it-works.md +++ b/docs/how-it-works.md @@ -19,7 +19,7 @@ Every iteration follows the same sequence. Here's what happens at each step. The prompt body (everything below the frontmatter) is read from disk **every iteration**. This means you can edit the prompt text — add rules, change the task, adjust constraints — while the loop is running. Changes take effect on the next cycle. -Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup. To change those, restart the loop. +Frontmatter settings are parsed once at startup. To change them, restart the loop. ### 2. Run commands and capture output @@ -70,6 +70,8 @@ echo "" | claude -p --dangerously-skip-permissions The agent reads the prompt, does work in the current directory (edits files, runs commands, makes commits), and exits. Ralphify waits for the agent process to finish. +Exit codes still mean process success or failure. Promise completion is separate: `completion_signal` is the inner promise text (default: `RALPH_PROMISE_COMPLETE`), so ralphify only stops early when `stop_on_completion_signal` is `true` and the agent emits the matching `...` tag in output or captured result text. This aligns the promise format with Ralph-Wiggum, but ralphify still uses its own command/prompt loop architecture. + When the agent command starts with `claude`, ralphify automatically adds `--output-format stream-json --verbose` to enable structured streaming. This lets ralphify track agent activity in real time — you don't need to configure this yourself. ### 6. Loop back with fresh context @@ -82,7 +84,7 @@ The loop starts the next iteration from step 1. The RALPH.md is re-read, command |---|---|---| | Prompt body | Every iteration | Edit the prompt while the loop runs — the next iteration follows your new instructions | | Command output | Every iteration | The agent always sees fresh data (latest git log, current test status, etc.) | -| Frontmatter (`agent`, `commands`, `args`) | Once at startup | Parsed when the loop starts. Restart to pick up changes. | +| Frontmatter settings | Once at startup | Parsed when the loop starts. Restart to pick up changes. | | User arguments | Once at startup | Passed via CLI flags, constant for the run | ## How broken code gets fixed automatically @@ -179,6 +181,7 @@ The loop continues until one of these happens: | `Ctrl+C` (first) | Gracefully finishes the current iteration, then stops the loop. The agent completes its work and the iteration result is recorded. | | `Ctrl+C` (second) | Force-stops immediately — kills the agent process and exits. Use when you don't want to wait for the current iteration to finish. | | `-n` limit reached | Loop stops after completing the specified number of iterations | +| `stop_on_completion_signal: true` and matching `...` tag detected | Loop stops after the current iteration | | `--stop-on-error` and agent exits non-zero or times out | Loop stops after the current iteration | | `--timeout` exceeded | Agent process is killed, iteration is marked as timed out, loop continues (unless `--stop-on-error`) | diff --git a/docs/index.md b/docs/index.md index 7f284a07..ca17330a 100644 --- a/docs/index.md +++ b/docs/index.md @@ -16,7 +16,7 @@ hide: A **ralph** is a directory with a `RALPH.md` file — a skill-like format that bundles a prompt, the commands to run between iterations, and any files the agent needs. **Ralphify** is the CLI runtime that executes them. -See [The Ralph Format](blog/posts/the-ralph-format.md) for the full spec. +See [The Ralph Format](blog/posts/the-ralph-standard.md) for the full spec. ``` grow-coverage/ @@ -52,7 +52,7 @@ One directory. One command. Each iteration starts with fresh context and current *Works with any agent CLI. Swap `claude -p` for Codex, Aider, or your own — just change the `agent` field.* [Get Started](getting-started.md){ .md-button .md-button--primary } -[Read the Format Spec](blog/posts/the-ralph-format.md){ .md-button } +[Read the Format Spec](blog/posts/the-ralph-standard.md){ .md-button } --- @@ -149,7 +149,7 @@ Ralphs are to the outer loop what [skills](https://agentskills.io/) are to the i ## Next steps -- **[The Ralph Format](blog/posts/the-ralph-format.md)** — the full spec +- **[The Ralph Format](blog/posts/the-ralph-standard.md)** — the full spec - **[Getting Started](getting-started.md)** — from install to a running loop in 10 minutes - **[How it Works](how-it-works.md)** — what happens inside each iteration - **[Cookbook](cookbook.md)** — copy-pasteable ralphs for coding, docs, research, and more diff --git a/docs/quick-reference.md b/docs/quick-reference.md index 3fb7d880..a8ced46f 100644 --- a/docs/quick-reference.md +++ b/docs/quick-reference.md @@ -78,6 +78,8 @@ Your instructions here. Use {{ args.dir }} for user arguments. | `commands` | list | no | Commands to run each iteration | | `args` | list | no | User argument names. Letters, digits, hyphens, and underscores only. | | `credit` | bool | no | Append co-author trailer instruction to prompt (default: `true`) | +| `completion_signal` | string | no | Inner text for the completion promise tag. `COMPLETE` means the agent must emit `COMPLETE` (default inner text: `RALPH_PROMISE_COMPLETE`) | +| `stop_on_completion_signal` | bool | no | Stop the loop early when the matching `...` tag is detected (default: `false`) | ### Command fields @@ -160,6 +162,7 @@ Each iteration: | `P` (shift+p) | Open full-screen peek — scroll the entire activity buffer. `j/k` line, `space/b` page, `g/G` top/bottom, `q` or `P` exits | | `-n` limit reached | Stops after the specified number of iterations | | `--stop-on-error` | Stops if agent exits non-zero or times out | +| matching `...` tag detected | Stops early only when `stop_on_completion_signal: true` and the configured promise tag is found in agent output/result | ## Live editing diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index a981fddc..52426260 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -522,7 +522,7 @@ For programmatic control over concurrent runs, use the [Python API's `RunManager ### Can I edit RALPH.md while the loop runs? -Yes. The prompt body (everything below the frontmatter) is re-read every iteration — edit the prompt text and changes take effect on the next cycle. Frontmatter fields (`agent`, `commands`, `args`) are parsed once at startup, so changing those requires restarting the loop. +Yes. The prompt body (everything below the frontmatter) is re-read every iteration — edit the prompt text and changes take effect on the next cycle. Frontmatter settings are parsed once at startup, so changing them requires restarting the loop. ### How do I disable the co-author credit in commits? diff --git a/examples/promise-completion/RALPH.md b/examples/promise-completion/RALPH.md new file mode 100644 index 00000000..7302a110 --- /dev/null +++ b/examples/promise-completion/RALPH.md @@ -0,0 +1,49 @@ +--- +agent: claude -p --dangerously-skip-permissions +commands: + - name: tests + run: uv run pytest -x + - name: lint + run: uv run ruff check . + - name: git-log + run: git log --oneline -10 +args: + - target +completion_signal: COMPLETE +stop_on_completion_signal: true +--- + +# Stop Early with a Promise Tag + +You are an autonomous coding agent running in a loop. Each iteration +starts with a fresh context. Your progress lives in the code and git. + +## Recent commits + +{{ commands.git-log }} + +## Test results + +{{ commands.tests }} + +## Lint + +{{ commands.lint }} + +Fix any failing tests or lint violations above before doing anything else. + +## Task + +Get the requested target to a clean, shippable state. +{{ args.target }} + +When the task is complete and no more changes are needed, print +COMPLETE and exit so the loop stops early. + +## Rules + +- One fix or improvement per iteration +- Keep the target scoped — do not drift into unrelated cleanup +- If tests or lint are failing, fix them before new work +- Only emit COMPLETE when the target is truly done +- Commit with a descriptive message and push diff --git a/src/ralphify/_agent.py b/src/ralphify/_agent.py index af6fb972..6bf1a0da 100644 --- a/src/ralphify/_agent.py +++ b/src/ralphify/_agent.py @@ -82,6 +82,34 @@ _PROCESS_WAIT_TIMEOUT = 5.0 +def _extract_result_text_from_lines(lines: list[str] | None) -> str | None: + """Return the last string payload from any JSON ``result`` event in *lines*.""" + if lines is None: + return None + + result_text = None + for line in lines: + extracted = _extract_result_text_from_line(line) + if extracted is not None: + result_text = extracted + return result_text + + +def _extract_result_text_from_line(line: str) -> str | None: + """Return the string payload from a single JSON ``result`` event line.""" + try: + parsed = json.loads(line.strip()) + except json.JSONDecodeError: + return None + if ( + isinstance(parsed, dict) + and parsed.get("type") == _RESULT_EVENT_TYPE + and isinstance(parsed.get(_RESULT_FIELD), str) + ): + return parsed[_RESULT_FIELD] + return None + + def _try_graceful_group_kill(proc: subprocess.Popen[Any]) -> bool: """Attempt to kill the process via its POSIX process group. @@ -237,7 +265,7 @@ class AgentResult(ProcessResult): class _StreamResult: """Accumulated output from reading the agent's JSON stream.""" - stdout_lines: tuple[str, ...] + stdout_lines: tuple[str, ...] | None result_text: str | None timed_out: bool @@ -298,6 +326,8 @@ def _read_agent_stream( deadline: float | None, on_activity: ActivityCallback | None, on_output_line: OutputLineCallback | None = None, + *, + capture_stdout: bool = True, ) -> _StreamResult: """Read the agent's JSON stream line-by-line until EOF or timeout. @@ -322,8 +352,12 @@ def _read_agent_stream( Returns early with ``timed_out=True`` when the deadline is exceeded, leaving the caller responsible for killing the subprocess. + + When *capture_stdout* is ``False``, stdout is still drained and parsed but + not retained in memory. This keeps the streaming path lightweight when no + later completion-signal parsing or log writing needs the raw bytes. """ - stdout_lines: list[str] = [] + stdout_lines: list[str] | None = [] if capture_stdout else None result_text: str | None = None line_q: queue.Queue[str | None] = queue.Queue() @@ -347,19 +381,20 @@ def _read_agent_stream( except queue.Empty: # Deadline expired while waiting for a line. return _StreamResult( - stdout_lines=tuple(stdout_lines), + stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None, result_text=result_text, timed_out=True, ) if line is None: # EOF sentinel from reader thread return _StreamResult( - stdout_lines=tuple(stdout_lines), + stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None, result_text=result_text, timed_out=False, ) - stdout_lines.append(line) + if stdout_lines is not None: + stdout_lines.append(line) if on_output_line is not None: try: on_output_line(line.rstrip("\r\n"), _STDOUT) @@ -390,7 +425,7 @@ def _read_agent_stream( # past the deadline. if deadline is not None and time.monotonic() > deadline: return _StreamResult( - stdout_lines=tuple(stdout_lines), + stdout_lines=tuple(stdout_lines) if stdout_lines is not None else None, result_text=result_text, timed_out=True, ) @@ -404,6 +439,8 @@ def _run_agent_streaming( iteration: int, on_activity: ActivityCallback | None = None, on_output_line: OutputLineCallback | None = None, + capture_result_text: bool = False, + capture_stdout: bool = False, ) -> AgentResult: """Run the agent subprocess with line-by-line streaming of JSON output. @@ -420,30 +457,37 @@ def _run_agent_streaming( start = time.monotonic() deadline = (start + timeout) if timeout is not None else None + capture_stdout_text = log_dir is not None or capture_stdout + pipe_stderr = log_dir is not None or on_output_line is not None + capture_stderr_text = log_dir is not None + writer_thread: threading.Thread | None = None - stderr_lines: list[str] = [] + stderr_lines: list[str] | None = [] if capture_stderr_text else None stderr_thread: threading.Thread | None = None proc = subprocess.Popen( stream_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, - stderr=subprocess.PIPE, + stderr=subprocess.PIPE if pipe_stderr else None, **SUBPROCESS_TEXT_KWARGS, **SESSION_KWARGS, ) try: # Popen with PIPE guarantees non-None streams; guard explicitly # so the type checker narrows and -O mode cannot skip the check. - if proc.stdin is None or proc.stdout is None or proc.stderr is None: + if proc.stdin is None or proc.stdout is None: raise RuntimeError("subprocess.Popen failed to create PIPE streams") + if pipe_stderr and proc.stderr is None: + raise RuntimeError("subprocess.Popen failed to create PIPE stderr") # Start the stderr pump BEFORE writing stdin so large prompts can't # deadlock against an agent that writes substantial diagnostics to # stderr while still reading its stdin. - stderr_thread = _start_pump_thread( - proc.stderr, stderr_lines, _STDERR, on_output_line - ) + if proc.stderr is not None: + stderr_thread = _start_pump_thread( + proc.stderr, stderr_lines, _STDERR, on_output_line + ) # Deliver the prompt on a background thread so that a blocked write # (child not reading stdin, pipe buffer full) cannot prevent @@ -452,7 +496,13 @@ def _run_agent_streaming( # _deliver_prompt already swallows. writer_thread = _start_writer_thread(proc, prompt) - stream = _read_agent_stream(proc.stdout, deadline, on_activity, on_output_line) + stream = _read_agent_stream( + proc.stdout, + deadline, + on_activity, + on_output_line, + capture_stdout=capture_stdout_text, + ) if stream.timed_out: _kill_process_group(proc) @@ -460,8 +510,8 @@ def _run_agent_streaming( finally: _cleanup_agent(proc, stderr_thread, writer_thread) - stdout = "".join(stream.stdout_lines) - stderr = "".join(stderr_lines) + stdout = "".join(stream.stdout_lines) if stream.stdout_lines is not None else None + stderr = "".join(stderr_lines) if stderr_lines is not None else None log_file = _write_log(log_dir, iteration, stdout, stderr) @@ -471,8 +521,8 @@ def _run_agent_streaming( log_file=log_file, result_text=stream.result_text, timed_out=stream.timed_out, - captured_stdout=stdout if log_dir is not None else None, - captured_stderr=stderr if log_dir is not None else None, + captured_stdout=stdout if capture_stdout_text else None, + captured_stderr=stderr if capture_stderr_text else None, ) @@ -601,6 +651,8 @@ def _run_agent_blocking( log_dir: Path | None, iteration: int, on_output_line: OutputLineCallback | None = None, + capture_result_text: bool = False, + capture_stdout: bool = False, ) -> AgentResult: """Run the agent subprocess and return the result. @@ -613,9 +665,9 @@ def _run_agent_blocking( - **Callback only** (``on_output_line`` set, no log dir) — reader threads forward lines to the callback without accumulating them, avoiding unbounded memory growth. - - **Log capture** (``log_dir`` set) — reader threads accumulate - lines into lists for log writing; lines are also forwarded to the - callback if provided. + - **Buffered capture** (``log_dir`` or ``capture_stdout`` set) — + reader threads accumulate lines for log writing or later completion + parsing; lines are also forwarded to the callback if provided. The subprocess is started in its own process group so that on ``KeyboardInterrupt`` or timeout the entire child tree can be killed @@ -625,7 +677,12 @@ def _run_agent_blocking( Raises ``FileNotFoundError`` if the command binary does not exist. """ start = time.monotonic() - capture = log_dir is not None or on_output_line is not None + capture_stdout_text = log_dir is not None or capture_stdout + capture_stderr_text = log_dir is not None + pipe_stdout = ( + capture_stdout_text or on_output_line is not None or capture_result_text + ) + pipe_stderr = capture_stderr_text or on_output_line is not None # When no subscriber needs the bytes, stdout/stderr are left # un-piped so the child writes directly to the terminal. When @@ -638,29 +695,41 @@ def _run_agent_blocking( writer_thread: threading.Thread | None = None stdout_thread: threading.Thread | None = None stderr_thread: threading.Thread | None = None - stdout_lines: list[str] | None = [] if log_dir is not None else None - stderr_lines: list[str] | None = [] if log_dir is not None else None + stdout_lines: list[str] | None = [] if capture_stdout_text else None + stderr_lines: list[str] | None = [] if capture_stderr_text else None + result_text: str | None = None + + def _on_output_line(line: str, stream_name: OutputStream) -> None: + nonlocal result_text + if capture_result_text and stream_name == _STDOUT: + extracted = _extract_result_text_from_line(line) + if extracted is not None: + result_text = extracted + if on_output_line is not None: + on_output_line(line, stream_name) - pipe = subprocess.PIPE if capture else None proc = subprocess.Popen( cmd, stdin=subprocess.PIPE, - stdout=pipe, - stderr=pipe, + stdout=subprocess.PIPE if pipe_stdout else None, + stderr=subprocess.PIPE if pipe_stderr else None, **SUBPROCESS_TEXT_KWARGS, **SESSION_KWARGS, ) try: if proc.stdin is None: raise RuntimeError("subprocess.Popen failed to create PIPE stdin") - if capture: - if proc.stdout is None or proc.stderr is None: - raise RuntimeError("subprocess.Popen failed to create PIPE streams") + if pipe_stdout: + if proc.stdout is None: + raise RuntimeError("subprocess.Popen failed to create PIPE stdout") stdout_thread = _start_pump_thread( - proc.stdout, stdout_lines, _STDOUT, on_output_line + proc.stdout, stdout_lines, _STDOUT, _on_output_line ) + if pipe_stderr: + if proc.stderr is None: + raise RuntimeError("subprocess.Popen failed to create PIPE stderr") stderr_thread = _start_pump_thread( - proc.stderr, stderr_lines, _STDERR, on_output_line + proc.stderr, stderr_lines, _STDERR, _on_output_line ) writer_thread = _start_writer_thread(proc, prompt) @@ -681,9 +750,10 @@ def _run_agent_blocking( returncode=None if timed_out else returncode, elapsed=time.monotonic() - start, log_file=log_file, + result_text=result_text or _extract_result_text_from_lines(stdout_lines), timed_out=timed_out, - captured_stdout=stdout, - captured_stderr=stderr, + captured_stdout=stdout if capture_stdout_text else None, + captured_stderr=stderr if capture_stderr_text else None, ) @@ -696,6 +766,8 @@ def execute_agent( iteration: int, on_activity: ActivityCallback | None = None, on_output_line: OutputLineCallback | None = None, + capture_result_text: bool = False, + capture_stdout: bool | None = None, ) -> AgentResult: """Run the agent subprocess, auto-selecting streaming or blocking mode. @@ -708,7 +780,13 @@ def execute_agent( This is the single entry point the engine should use — callers don't need to know which execution mode is selected. """ - if _supports_stream_json(cmd): + supports_stream_json = _supports_stream_json(cmd) + if capture_stdout is None: + capture_stdout = log_dir is not None or ( + not supports_stream_json and on_output_line is None and capture_result_text + ) + + if supports_stream_json: return _run_agent_streaming( cmd, prompt, @@ -717,6 +795,8 @@ def execute_agent( iteration, on_activity=on_activity, on_output_line=on_output_line, + capture_result_text=capture_result_text, + capture_stdout=capture_stdout, ) return _run_agent_blocking( cmd, @@ -725,4 +805,6 @@ def execute_agent( log_dir, iteration, on_output_line=on_output_line, + capture_result_text=capture_result_text, + capture_stdout=capture_stdout, ) diff --git a/src/ralphify/_frontmatter.py b/src/ralphify/_frontmatter.py index a361c241..fa3f67aa 100644 --- a/src/ralphify/_frontmatter.py +++ b/src/ralphify/_frontmatter.py @@ -26,6 +26,10 @@ FIELD_ARGS = "args" FIELD_CREDIT = "credit" FIELD_RALPH = "ralph" +# Promise config keeps the legacy key names. ``completion_signal`` stores the +# inner promise text, not the surrounding ``...`` markup. +FIELD_COMPLETION_SIGNAL = "completion_signal" +FIELD_STOP_ON_COMPLETION_SIGNAL = "stop_on_completion_signal" # Sub-field names within each command mapping. CMD_FIELD_NAME = "name" diff --git a/src/ralphify/_promise.py b/src/ralphify/_promise.py new file mode 100644 index 00000000..362b78fa --- /dev/null +++ b/src/ralphify/_promise.py @@ -0,0 +1,27 @@ +"""Parse promise completion tags emitted by agents.""" + +from __future__ import annotations + +import re + +_PROMISE_TAG_RE = re.compile(r"(.*?)", re.DOTALL) + + +def _normalize_promise_text(text: str) -> str: + """Collapse internal whitespace so config and tag payloads compare consistently.""" + return " ".join(text.split()) + + +def parse_promise_tags(text: str | None) -> list[str]: + """Return normalized inner text from all well-formed promise tags in *text*.""" + if not text: + return [] + return [ + _normalize_promise_text(match.group(1)) + for match in _PROMISE_TAG_RE.finditer(text) + ] + + +def has_promise_completion(text: str | None, completion_signal: str) -> bool: + """Return True when *text* contains a matching promise completion tag.""" + return _normalize_promise_text(completion_signal) in parse_promise_tags(text) diff --git a/src/ralphify/_run_types.py b/src/ralphify/_run_types.py index 411ec93b..9218b03b 100644 --- a/src/ralphify/_run_types.py +++ b/src/ralphify/_run_types.py @@ -20,6 +20,9 @@ DEFAULT_COMMAND_TIMEOUT: float = 60 """Default timeout in seconds for commands defined in RALPH.md frontmatter.""" +DEFAULT_COMPLETION_SIGNAL = "RALPH_PROMISE_COMPLETE" +"""Default inner ``...`` text that marks promise completion.""" + RUN_ID_LENGTH: int = 12 """Number of hex characters used for generated run IDs.""" @@ -97,6 +100,10 @@ class RunConfig: log_dir: Path | None = None project_root: Path = field(default=Path(".")) credit: bool = True + # Inner text expected inside ``...``. + completion_signal: str = DEFAULT_COMPLETION_SIGNAL + # Stop the run when the configured promise payload is observed. + stop_on_completion_signal: bool = False @dataclass(slots=True) @@ -120,6 +127,7 @@ class RunState: failed: int = 0 timed_out_count: int = 0 started_at: datetime | None = None + promise_completed: bool = False _stop_event: threading.Event = field( default_factory=threading.Event, init=False, repr=False, compare=False @@ -134,27 +142,22 @@ def total(self) -> int: return self.completed + self.failed def __post_init__(self) -> None: - # Set initially: the run starts in an unpaused (resumed) state. self._resume_event.set() def request_stop(self) -> None: - """Signal the loop to stop after the current iteration.""" self._stop_event.set() self._resume_event.set() def request_pause(self) -> None: - """Pause the loop between iterations until resumed.""" self.status = RunStatus.PAUSED self._resume_event.clear() def request_resume(self) -> None: - """Resume a paused loop.""" self.status = RunStatus.RUNNING self._resume_event.set() @property def stop_requested(self) -> bool: - """Whether a stop has been requested.""" return self._stop_event.is_set() def wait_for_stop(self, timeout: float | None = None) -> bool: @@ -163,7 +166,6 @@ def wait_for_stop(self, timeout: float | None = None) -> bool: @property def paused(self) -> bool: - """Whether the run is currently paused.""" return not self._resume_event.is_set() def wait_for_unpause(self, timeout: float | None = None) -> bool: @@ -171,11 +173,9 @@ def wait_for_unpause(self, timeout: float | None = None) -> bool: return self._resume_event.wait(timeout=timeout) def mark_completed(self) -> None: - """Record a successful iteration.""" self.completed += 1 def mark_failed(self) -> None: - """Record a failed iteration.""" self.failed += 1 def mark_timed_out(self) -> None: diff --git a/src/ralphify/cli.py b/src/ralphify/cli.py index 6652d535..93bbd0c3 100644 --- a/src/ralphify/cli.py +++ b/src/ralphify/cli.py @@ -32,6 +32,8 @@ FIELD_ARGS, FIELD_COMMANDS, FIELD_CREDIT, + FIELD_COMPLETION_SIGNAL, + FIELD_STOP_ON_COMPLETION_SIGNAL, RALPH_MARKER, VALID_NAME_CHARS_MSG, parse_frontmatter, @@ -39,6 +41,7 @@ from ralphify._output import IS_WINDOWS from ralphify._run_types import ( Command, + DEFAULT_COMPLETION_SIGNAL, DEFAULT_COMMAND_TIMEOUT, RunConfig, RunState, @@ -56,13 +59,11 @@ def _exit_error(msg: str) -> NoReturn: - """Print an error in red and exit with code 1.""" _console.print(f"[red]{escape_markup(msg)}[/]") raise typer.Exit(1) def _is_nonempty_string(value: Any) -> bool: - """Return True if *value* is a non-empty string (after stripping whitespace).""" return isinstance(value, str) and bool(value.strip()) @@ -128,6 +129,10 @@ def _check_unique_name(name: str, seen: set[str], context: str) -> None: run: git log --oneline -5 args: - focus +# Optional early exit: uncomment both lines and have the agent emit +# COMPLETE when the task is done. +# completion_signal: COMPLETE +# stop_on_completion_signal: true --- You are an autonomous coding agent running in a loop. Each iteration @@ -146,6 +151,7 @@ def _check_unique_name(name: str, seen: set[str], context: str) -> None: - Implement one thing per iteration +- If you enable promise completion above, emit COMPLETE and exit - Run tests and fix failures before committing - Commit with a descriptive message and push """ @@ -202,7 +208,13 @@ def scaffold( help="Directory name. If omitted, creates RALPH.md in the current directory.", ), ) -> None: - """Scaffold a new ralph with a ready-to-customize template.""" + """Scaffold a new ralph with a ready-to-customize template. + + The template includes example commands and args plus an optional + completion-signal path you can enable with ``completion_signal`` and + ``stop_on_completion_signal`` when the agent should stop early by + emitting a matching ``...`` tag. + """ if name: target_dir = Path.cwd() / name target_dir.mkdir(exist_ok=True) @@ -219,6 +231,13 @@ def scaffold( _console.print( f"[dim]Edit the file, then run:[/] ralph run {escape_markup(name or '.')}" ) + _console.print( + "[dim]Optional early exit:[/] uncomment " + f"{escape_markup(FIELD_COMPLETION_SIGNAL)} + " + f"{escape_markup(FIELD_STOP_ON_COMPLETION_SIGNAL)} " + "and emit " + f"{escape_markup('COMPLETE')}." + ) def _parse_user_args( @@ -428,6 +447,36 @@ def _validate_credit(raw_credit: Any) -> bool: return raw_credit +def _validate_completion_signal(raw_signal: Any) -> str: + """Validate the inner ``...`` text from frontmatter.""" + if raw_signal is None: + return DEFAULT_COMPLETION_SIGNAL + if not _is_nonempty_string(raw_signal): + _exit_error(f"'{FIELD_COMPLETION_SIGNAL}' must be a non-empty string.") + if raw_signal != raw_signal.strip(): + _exit_error( + f"'{FIELD_COMPLETION_SIGNAL}' must not include leading or trailing whitespace." + ) + if "<" in raw_signal or ">" in raw_signal: + _exit_error( + f"'{FIELD_COMPLETION_SIGNAL}' must be the text inside " + "..., not markup or a raw output fragment. " + "Example: completion_signal: COMPLETE" + ) + return raw_signal + + +def _validate_stop_on_completion_signal(raw_value: Any) -> bool: + """Validate the stop-on-completion-signal frontmatter field.""" + if raw_value is None: + return False + if not isinstance(raw_value, bool): + _exit_error( + f"'{FIELD_STOP_ON_COMPLETION_SIGNAL}' must be true or false, got {raw_value!r}." + ) + return raw_value + + def _validate_run_options( max_iterations: int | None, delay: float, @@ -471,6 +520,10 @@ def _build_run_config( ralph_args = _parse_user_args(extra_args, declared_names) credit = _validate_credit(fm.get(FIELD_CREDIT)) + completion_signal = _validate_completion_signal(fm.get(FIELD_COMPLETION_SIGNAL)) + stop_on_completion_signal = _validate_stop_on_completion_signal( + fm.get(FIELD_STOP_ON_COMPLETION_SIGNAL) + ) return RunConfig( agent=agent, @@ -485,6 +538,8 @@ def _build_run_config( log_dir=Path(log_dir) if log_dir else None, project_root=Path.cwd(), credit=credit, + completion_signal=completion_signal, + stop_on_completion_signal=stop_on_completion_signal, ) @@ -534,6 +589,10 @@ def run( passed as user arguments. Use {{ args.name }} placeholders in RALPH.md to reference them. + To stop before the iteration budget, set ``completion_signal`` and + ``stop_on_completion_signal`` in frontmatter and have the agent emit + the matching ``...`` tag. + Keybindings (interactive terminal): p Toggle live peek of agent output (on by default) P Enter full-screen peek — scroll the entire buffer diff --git a/src/ralphify/engine.py b/src/ralphify/engine.py index 1072ee05..5e3729e4 100644 --- a/src/ralphify/engine.py +++ b/src/ralphify/engine.py @@ -9,10 +9,12 @@ from __future__ import annotations +import json import shlex import traceback from datetime import datetime, timezone from pathlib import Path +from typing import Any from ralphify._agent import execute_agent from ralphify._events import ( @@ -37,6 +39,7 @@ parse_frontmatter, ) from ralphify._output import format_duration +from ralphify._promise import has_promise_completion from ralphify._run_types import ( Command, RunConfig, @@ -52,7 +55,6 @@ def _field_hint(field_name: str) -> str: - """Return a user-facing hint pointing to a frontmatter field.""" return f"Check the '{field_name}' field in your {RALPH_MARKER} frontmatter." @@ -103,8 +105,6 @@ def _run_commands( quoted_args = {k: shlex.quote(v) for k, v in user_args.items()} for cmd in commands: run_str = resolve_args(cmd.run, quoted_args) - # Determine working directory: if the command starts with ./ it's - # relative to the ralph directory, otherwise use project root. if run_str.lstrip().startswith(_RELATIVE_CMD_PREFIX): cwd = ralph_dir else: @@ -167,10 +167,10 @@ def _run_agent_phase( config: RunConfig, state: RunState, emit: BoundEmitter, -) -> bool: +) -> tuple[bool, bool]: """Run the agent subprocess, update state counters, and emit the result event. - Returns ``True`` when the agent exited successfully (code 0, no timeout). + Returns ``(agent_succeeded, stop_for_completion_signal)``. """ try: cmd = shlex.split(config.agent) @@ -179,12 +179,32 @@ def _run_agent_phase( f"Invalid agent command syntax: {config.agent!r}. {_field_hint(FIELD_AGENT)}" ) from exc - # Option C: recheck per-line so mid-iteration peek toggle takes effect. - # When neither peek nor logging needs output, pass None so the blocking - # path can inherit file descriptors (critical-01 contract). + completion_signal = config.completion_signal + promise_completed_from_output = False + promise_scan_tail = "" + promise_scan_limit = max(4096, len(completion_signal) * 4 + 64) + + def _record_promise_fragment(fragment: str) -> None: + nonlocal promise_completed_from_output, promise_scan_tail + if promise_completed_from_output or not fragment: + return + promise_scan_tail = (promise_scan_tail + fragment)[-promise_scan_limit:] + promise_completed_from_output = has_promise_completion( + promise_scan_tail, completion_signal + ) + def _on_output_line(line: str, stream: OutputStream) -> None: if emit.wants_agent_output_lines(): emit.agent_output_line(line, stream, state.iteration) + if stream != "stdout": + return + stripped = line.strip() + if not stripped: + return + try: + json.loads(stripped) + except json.JSONDecodeError: + _record_promise_fragment(f"{line}\n") if emit.wants_agent_output_lines() or config.log_dir is not None: on_output_line = _on_output_line @@ -192,17 +212,22 @@ def _on_output_line(line: str, stream: OutputStream) -> None: on_output_line = None try: + + def on_activity(data: dict[str, Any]) -> None: + emit( + EventType.AGENT_ACTIVITY, + AgentActivityData(raw=data, iteration=state.iteration), + ) + agent = execute_agent( cmd, prompt, timeout=config.timeout, log_dir=config.log_dir, iteration=state.iteration, - on_activity=lambda data: emit( - EventType.AGENT_ACTIVITY, - AgentActivityData(raw=data, iteration=state.iteration), - ), + on_activity=on_activity, on_output_line=on_output_line, + capture_result_text=True, ) except FileNotFoundError as exc: raise FileNotFoundError( @@ -210,6 +235,15 @@ def _on_output_line(line: str, stream: OutputStream) -> None: ) from exc duration = format_duration(agent.elapsed) + completion_text = ( + agent.result_text if agent.result_text is not None else agent.captured_stdout + ) + promise_completed = agent.success and ( + promise_completed_from_output + or has_promise_completion(completion_text, completion_signal) + ) + if promise_completed: + state.promise_completed = True if agent.timed_out: state.mark_timed_out() @@ -218,7 +252,13 @@ def _on_output_line(line: str, stream: OutputStream) -> None: elif agent.success: state.mark_completed() event_type = EventType.ITERATION_COMPLETED - state_detail = f"completed ({duration})" + if promise_completed: + state_detail = ( + "completed via promise tag " + f"{completion_signal} ({duration})" + ) + else: + state_detail = f"completed ({duration})" else: state.mark_failed() event_type = EventType.ITERATION_FAILED @@ -233,32 +273,36 @@ def _on_output_line(line: str, stream: OutputStream) -> None: log_file=str(agent.log_file) if agent.log_file else None, result_text=agent.result_text, ) - # When logging captured output and peek was off (lines were not rendered - # live), include captured output so the emitter can echo it after - # stopping the Live spinner. When peek was on, lines were already shown. - if not emit.wants_agent_output_lines() and config.log_dir is not None: - ended_data["echo_stdout"] = agent.captured_stdout - ended_data["echo_stderr"] = agent.captured_stderr + if not emit.wants_agent_output_lines(): + # When peek was off, echo any captured raw output after the spinner + # stops so blocking agents do not appear silent. Structured agents + # already surface their parsed result_text, so avoid echoing raw JSON + # unless we explicitly captured logs. + if config.log_dir is not None: + ended_data["echo_stdout"] = agent.captured_stdout + ended_data["echo_stderr"] = agent.captured_stderr + elif agent.result_text is None and agent.captured_stdout is not None: + ended_data["echo_stdout"] = agent.captured_stdout emit(event_type, ended_data) - return agent.success + return agent.success, promise_completed and config.stop_on_completion_signal def _run_iteration( config: RunConfig, state: RunState, emit: BoundEmitter, -) -> bool: +) -> tuple[bool, bool]: """Execute one iteration of the agent loop. - Returns ``True`` if the loop should continue, ``False`` when - ``--stop-on-error`` triggers. + Returns (should_continue, stop_for_completion_signal): + - should_continue: True if the loop should continue, False to break + - stop_for_completion_signal: True if a completion signal ended the run early """ iteration = state.iteration emit(EventType.ITERATION_STARTED, IterationStartedData(iteration=iteration)) - # Run commands and collect outputs for placeholder resolution command_outputs: dict[str, str] = {} if config.commands: emit( @@ -279,22 +323,22 @@ def _run_iteration( ), ) - # Assemble prompt prompt = _assemble_prompt(config, state, command_outputs) emit( EventType.PROMPT_ASSEMBLED, PromptAssembledData(iteration=iteration, prompt_length=len(prompt)), ) - # Run agent - agent_succeeded = _run_agent_phase(prompt, config, state, emit) + agent_succeeded, stop_for_completion_signal = _run_agent_phase( + prompt, config, state, emit + ) if not agent_succeeded and config.stop_on_error: state.status = RunStatus.FAILED emit.log_error("Stopping due to --stop-on-error.") - return False + return False, stop_for_completion_signal - return True + return True, stop_for_completion_signal def _delay_if_needed(config: RunConfig, state: RunState, emit: BoundEmitter) -> None: @@ -346,14 +390,19 @@ def run_loop( if not _handle_control_signals(state, emit): break - state.iteration += 1 if ( config.max_iterations is not None - and state.iteration > config.max_iterations + and state.iteration >= config.max_iterations ): break + state.iteration += 1 - should_continue = _run_iteration(config, state, emit) + should_continue, stop_for_completion_signal = _run_iteration( + config, state, emit + ) + if stop_for_completion_signal: + state.status = RunStatus.COMPLETED + break if not should_continue: break diff --git a/tests/test_agent.py b/tests/test_agent.py index f3358448..58520141 100644 --- a/tests/test_agent.py +++ b/tests/test_agent.py @@ -365,6 +365,32 @@ def test_timeout_captures_partial_output(self, mock_popen, tmp_path): assert result.captured_stdout == "partial stdout" assert result.captured_stderr == "partial stderr" + @patch(MOCK_SUBPROCESS) + def test_capture_result_text_does_not_buffer_blocking_output_without_log_dir( + self, mock_popen + ): + mock_popen.return_value = ok_proc( + stdout_text="done\n", + stderr_text="some stderr\n", + ) + result = _run_agent_blocking( + ["echo"], + "prompt", + timeout=None, + log_dir=None, + iteration=1, + capture_result_text=True, + ) + + assert result.captured_stdout is None + assert result.captured_stderr is None + assert result.result_text is None + + call_kwargs = mock_popen.call_args[1] + assert call_kwargs.get("stdin") == subprocess.PIPE + assert call_kwargs.get("stdout") == subprocess.PIPE + assert call_kwargs.get("stderr") is None + @patch(MOCK_SUBPROCESS, side_effect=ok_proc) def test_no_log_when_dir_not_set(self, mock_popen): result = execute_agent( @@ -400,19 +426,22 @@ def test_timed_out(self): assert result.timed_out is True assert result.returncode is None - def test_success_when_zero_exit(self): - result = AgentResult(returncode=0, elapsed=1.0, log_file=None) - assert result.success is True - - def test_not_success_when_nonzero_exit(self): - result = AgentResult(returncode=1, elapsed=1.0, log_file=None) - assert result.success is False - - def test_not_success_when_timed_out(self): - result = AgentResult( - returncode=None, elapsed=5.0, log_file=None, timed_out=True - ) - assert result.success is False + @pytest.mark.parametrize( + ("result", "expected"), + [ + (AgentResult(returncode=0, elapsed=1.0, log_file=None), True), + (AgentResult(returncode=1, elapsed=1.0, log_file=None), False), + ( + AgentResult( + returncode=None, elapsed=5.0, log_file=None, timed_out=True + ), + False, + ), + ], + ids=["zero-exit", "nonzero-exit", "timed-out"], + ) + def test_success(self, result, expected): + assert result.success is expected class TestExecuteAgentDispatch: @@ -438,10 +467,144 @@ def test_dispatches_to_streaming_for_claude(self, mock_popen, monkeypatch): assert result.result_text == "done" mock_popen.assert_called_once() + def test_execute_agent_passes_capture_result_text_to_streaming_helper( + self, monkeypatch + ): + on_activity = MagicMock() + on_output_line = MagicMock() + fake_streaming = MagicMock(return_value=AgentResult(returncode=0, elapsed=0.01)) + + monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: True) + monkeypatch.setattr("ralphify._agent._run_agent_streaming", fake_streaming) + + execute_agent( + ["claude", "-p"], + "prompt", + timeout=None, + log_dir=None, + iteration=1, + on_activity=on_activity, + on_output_line=on_output_line, + capture_result_text=True, + ) + + fake_streaming.assert_called_once_with( + ["claude", "-p"], + "prompt", + None, + None, + 1, + on_activity=on_activity, + on_output_line=on_output_line, + capture_result_text=True, + capture_stdout=False, + ) + + def test_execute_agent_passes_capture_result_text_to_blocking_helper( + self, monkeypatch + ): + on_output_line = MagicMock() + fake_blocking = MagicMock(return_value=AgentResult(returncode=0, elapsed=0.01)) + + monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: False) + monkeypatch.setattr("ralphify._agent._run_agent_blocking", fake_blocking) + + execute_agent( + ["echo"], + "prompt", + timeout=None, + log_dir=None, + iteration=1, + on_output_line=on_output_line, + capture_result_text=True, + ) + + fake_blocking.assert_called_once_with( + ["echo"], + "prompt", + None, + None, + 1, + on_output_line=on_output_line, + capture_result_text=True, + capture_stdout=False, + ) + class TestExecuteAgentStreaming: """Tests for the streaming execution path (_run_agent_streaming).""" + @patch(MOCK_SUBPROCESS) + def test_streaming_result_event_populates_result_text(self, mock_popen): + mock_popen.return_value = make_mock_popen( + stdout_lines='{"type": "result", "result": "early done"}\n', + returncode=0, + ) + result = _run_agent_streaming( + ["claude", "-p"], + "prompt", + timeout=10, + log_dir=None, + iteration=1, + ) + assert result.result_text == "early done" + assert result.returncode == 0 + assert result.timed_out is False + + @patch(MOCK_SUBPROCESS) + def test_blocking_result_event_populates_result_text_when_captured( + self, mock_popen, tmp_path + ): + mock_popen.return_value = make_mock_popen( + stdout_lines='{"type": "result", "result": "early done"}\n', + returncode=0, + ) + result = _run_agent_blocking( + ["claude", "-p"], + "prompt", + timeout=10, + log_dir=tmp_path, + iteration=1, + ) + + assert result.result_text == "early done" + assert result.returncode == 0 + assert result.timed_out is False + + @patch(MOCK_SUBPROCESS) + def test_result_text_absent_when_no_result_event(self, mock_popen): + mock_popen.return_value = make_mock_popen( + stdout_lines="status: working\n", + returncode=0, + ) + result = _run_agent_streaming( + ["claude", "-p"], + "prompt", + timeout=10, + log_dir=None, + iteration=1, + ) + assert result.result_text is None + assert result.returncode == 0 + assert result.timed_out is False + + @patch(MOCK_SUBPROCESS) + def test_last_result_event_wins(self, mock_popen): + mock_popen.return_value = make_mock_popen( + stdout_lines='{"type": "result", "result": "first"}\n{"type": "result", "result": "second"}\n', + returncode=0, + ) + result = _run_agent_streaming( + ["claude", "-p"], + "prompt", + timeout=10, + log_dir=None, + iteration=1, + ) + assert result.result_text == "second" + assert result.returncode == 0 + assert result.timed_out is False + @patch(MOCK_SUBPROCESS) def test_success(self, mock_popen): mock_popen.return_value = make_mock_popen( @@ -566,6 +729,41 @@ def test_captured_output_set_when_logging(self, mock_popen, tmp_path): assert result.captured_stdout == "agent output\n" assert result.captured_stderr == "some stderr\n" + @patch(MOCK_SUBPROCESS) + def test_capture_result_text_does_not_buffer_stream_output_without_log_dir( + self, mock_popen + ): + mock_popen.return_value = make_mock_popen( + stdout_lines="done\n", + stderr_text="some stderr\n", + returncode=0, + ) + result = _run_agent_streaming( + ["claude", "-p"], + "prompt", + timeout=None, + log_dir=None, + iteration=1, + capture_result_text=True, + ) + + assert result.captured_stdout is None + assert result.captured_stderr is None + assert result.result_text is None + + call_args = mock_popen.call_args + assert call_args.args[0] == [ + "claude", + "-p", + "--output-format", + "stream-json", + "--verbose", + ] + call_kwargs = call_args[1] + assert call_kwargs.get("stdin") == subprocess.PIPE + assert call_kwargs.get("stdout") == subprocess.PIPE + assert call_kwargs.get("stderr") is None + @patch(MOCK_SUBPROCESS) def test_no_log_when_dir_not_set(self, mock_popen): mock_popen.return_value = make_mock_popen(returncode=0) diff --git a/tests/test_cli.py b/tests/test_cli.py index 79e224ea..650026f1 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -1,6 +1,7 @@ """Tests for the CLI.""" import importlib +import re import signal from unittest.mock import patch, MagicMock @@ -24,6 +25,31 @@ runner = CliRunner() +def _flatten_help(output: str) -> str: + no_ansi = re.sub(r"\x1b\[[0-9;]*[a-zA-Z]", "", output) + return re.sub(r"\s+", " ", no_ansi) + + +class TestHelp: + def test_scaffold_help_mentions_promise_completion(self): + result = runner.invoke(app, ["scaffold", "--help"]) + + assert result.exit_code == 0 + flat = _flatten_help(result.output) + assert "completion_signal" in flat + assert "stop_on_completion_signal" in flat + assert "..." in flat + + def test_run_help_mentions_promise_completion(self): + result = runner.invoke(app, ["run", "--help"]) + + assert result.exit_code == 0 + flat = _flatten_help(result.output) + assert "completion_signal" in flat + assert "stop_on_completion_signal" in flat + assert "..." in flat + + class TestVersion: @pytest.mark.parametrize("flag", ["--version", "-V"]) def test_version_flag(self, flag): @@ -95,6 +121,103 @@ def test_errors_with_malformed_agent_field(self, mock_which, tmp_path, monkeypat assert result.exit_code == 1 assert "malformed" in result.output.lower() + def test_run_uses_default_completion_signal_config( + self, mock_which, tmp_path, monkeypatch + ): + monkeypatch.chdir(tmp_path) + ralph_dir = make_ralph(tmp_path) + with patch("ralphify.cli.run_loop") as mock_run_loop: + result = runner.invoke(app, ["run", str(ralph_dir), "-n", "3"]) + + assert result.exit_code == 0 + mock_run_loop.assert_called_once() + config = mock_run_loop.call_args.args[0] + assert config.completion_signal == "RALPH_PROMISE_COMPLETE" + assert config.stop_on_completion_signal is False + assert config.max_iterations == 3 + + def test_run_passes_completion_signal_frontmatter_to_config( + self, mock_which, tmp_path, monkeypatch + ): + monkeypatch.chdir(tmp_path) + ralph_dir = tmp_path / "my-ralph" + ralph_dir.mkdir() + (ralph_dir / RALPH_MARKER).write_text( + "---\n" + "agent: claude -p --dangerously-skip-permissions\n" + "completion_signal: CUSTOM_DONE\n" + "stop_on_completion_signal: true\n" + "---\n" + "go" + ) + with patch("ralphify.cli.run_loop") as mock_run_loop: + result = runner.invoke(app, ["run", str(ralph_dir), "-n", "2"]) + + assert result.exit_code == 0 + mock_run_loop.assert_called_once() + config = mock_run_loop.call_args.args[0] + assert config.completion_signal == "CUSTOM_DONE" + assert config.stop_on_completion_signal is True + + @pytest.mark.parametrize( + ("frontmatter_line", "expected_error"), + [ + ("completion_signal: 0", "must be a non-empty string"), + ( + 'completion_signal: " CUSTOM_DONE "', + "must not include leading or trailing whitespace", + ), + ( + 'completion_signal: "CUSTOM_DONE"', + "must be the text inside ...", + ), + ], + ids=["wrong-type", "surrounding-whitespace", "markup-instead-of-text"], + ) + def test_run_rejects_invalid_completion_signal_frontmatter( + self, mock_which, tmp_path, monkeypatch, frontmatter_line, expected_error + ): + monkeypatch.chdir(tmp_path) + ralph_dir = tmp_path / "my-ralph" + ralph_dir.mkdir() + (ralph_dir / RALPH_MARKER).write_text( + "---\n" + "agent: claude -p --dangerously-skip-permissions\n" + f"{frontmatter_line}\n" + "---\n" + "go" + ) + + with patch("ralphify.cli.run_loop") as mock_run_loop: + result = runner.invoke(app, ["run", str(ralph_dir), "-n", "1"]) + + assert result.exit_code == 1 + assert "completion_signal" in result.output.lower() + assert expected_error in result.output.lower() + mock_run_loop.assert_not_called() + + def test_run_rejects_non_boolean_stop_on_completion_signal( + self, mock_which, tmp_path, monkeypatch + ): + monkeypatch.chdir(tmp_path) + ralph_dir = tmp_path / "my-ralph" + ralph_dir.mkdir() + (ralph_dir / RALPH_MARKER).write_text( + "---\n" + "agent: claude -p --dangerously-skip-permissions\n" + 'stop_on_completion_signal: "maybe"\n' + "---\n" + "go" + ) + + with patch("ralphify.cli.run_loop") as mock_run_loop: + result = runner.invoke(app, ["run", str(ralph_dir), "-n", "1"]) + + assert result.exit_code == 1 + assert "stop_on_completion_signal" in result.output.lower() + assert "must be true or false" in result.output.lower() + mock_run_loop.assert_not_called() + @pytest.mark.parametrize( "frontmatter, expected_error", [ @@ -517,6 +640,16 @@ def test_creates_ralph_with_name(self, tmp_path, monkeypatch): assert ralph_file.exists() assert "Created" in result.output + def test_scaffold_output_mentions_promise_completion(self, tmp_path, monkeypatch): + monkeypatch.chdir(tmp_path) + + result = runner.invoke(app, ["scaffold", "my-task"]) + + assert result.exit_code == 0 + assert "completion_signal" in result.output + assert "stop_on_completion_signal" in result.output + assert "COMPLETE" in result.output + def test_creates_ralph_in_cwd(self, tmp_path, monkeypatch): monkeypatch.chdir(tmp_path) result = runner.invoke(app, ["scaffold"]) @@ -556,6 +689,16 @@ def test_template_has_valid_frontmatter(self, tmp_path, monkeypatch): assert "{{ commands.git-log }}" in body assert "{{ args.focus }}" in body + def test_template_mentions_promise_completion_path(self, tmp_path, monkeypatch): + monkeypatch.chdir(tmp_path) + + runner.invoke(app, ["scaffold", "my-task"]) + content = (tmp_path / "my-task" / RALPH_MARKER).read_text() + + assert "# completion_signal: COMPLETE" in content + assert "# stop_on_completion_signal: true" in content + assert "COMPLETE" in content + class TestParseUserArgs: def test_named_flag(self): diff --git a/tests/test_engine.py b/tests/test_engine.py index b05f8d64..a6e3956c 100644 --- a/tests/test_engine.py +++ b/tests/test_engine.py @@ -20,6 +20,7 @@ ) from rich.console import Console +from ralphify._agent import AgentResult from ralphify._console_emitter import ConsoleEmitter from ralphify._events import BoundEmitter, EventType, NullEmitter, QueueEmitter from ralphify._run_types import Command, RunStatus @@ -126,6 +127,234 @@ def test_log_dir_creates_files(self, mock_run, tmp_path): assert log_files[1].name.startswith("002_") +class TestPromiseCompletionSignals: + @patch("ralphify.engine.execute_agent") + def test_tagged_promise_does_not_stop_by_default( + self, mock_execute_agent, tmp_path + ): + config = make_config(tmp_path, max_iterations=3) + state = make_state() + emitter = NullEmitter() + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + captured_stdout="RALPH_PROMISE_COMPLETE\n", + ) + + run_loop(config, state, emitter) + + assert mock_execute_agent.call_count == 3 + assert state.completed == 3 + assert state.failed == 0 + assert state.status == RunStatus.COMPLETED + assert state.promise_completed is True + assert [ + call.kwargs["on_output_line"] for call in mock_execute_agent.call_args_list + ] == [None, None, None] + assert [ + call.kwargs["capture_result_text"] + for call in mock_execute_agent.call_args_list + ] == [True, True, True] + + @patch("ralphify.engine.execute_agent") + def test_tagged_promise_stops_early_when_enabled( + self, mock_execute_agent, tmp_path + ): + config = make_config( + tmp_path, + max_iterations=5, + stop_on_completion_signal=True, + ) + state = make_state() + emitter = QueueEmitter() + emitter.wants_agent_output_lines = lambda: True + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + result_text="RALPH_PROMISE_COMPLETE", + ) + + run_loop(config, state, emitter) + + mock_execute_agent.assert_called_once() + assert mock_execute_agent.call_args.kwargs["capture_result_text"] is True + assert state.iteration == 1 + assert state.completed == 1 + assert state.failed == 0 + assert state.total == 1 + assert state.status == RunStatus.COMPLETED + assert state.promise_completed is True + + events = drain_events(emitter) + completed_events = events_of_type(events, EventType.ITERATION_COMPLETED) + assert len(completed_events) == 1 + stop_event = events_of_type(events, EventType.RUN_STOPPED)[0] + assert stop_event.data["reason"] == "completed" + assert stop_event.data["total"] == 1 + assert stop_event.data["completed"] == 1 + assert stop_event.data["failed"] == 0 + assert stop_event.data["timed_out_count"] == 0 + + @patch("ralphify.engine.execute_agent") + def test_custom_promise_text_matches_inner_tag_text( + self, mock_execute_agent, tmp_path + ): + config = make_config( + tmp_path, + max_iterations=4, + completion_signal="CUSTOM_DONE", + stop_on_completion_signal=True, + ) + state = make_state() + emitter = QueueEmitter() + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + result_text="CUSTOM_DONE", + ) + + run_loop(config, state, emitter) + + mock_execute_agent.assert_called_once() + assert state.iteration == 1 + assert state.completed == 1 + assert state.failed == 0 + assert state.status == RunStatus.COMPLETED + assert state.promise_completed is True + + events = drain_events(emitter) + stop_event = events_of_type(events, EventType.RUN_STOPPED)[0] + assert stop_event.data["reason"] == "completed" + assert stop_event.data["total"] == 1 + assert stop_event.data["completed"] == 1 + + @patch("ralphify.engine.execute_agent") + def test_promise_tag_normalizes_inner_whitespace_before_matching( + self, mock_execute_agent, tmp_path + ): + config = make_config( + tmp_path, + max_iterations=4, + completion_signal="CUSTOM DONE", + stop_on_completion_signal=True, + ) + state = make_state() + + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + result_text="\n CUSTOM\tDONE \n", + ) + + run_loop(config, state, NullEmitter()) + + assert mock_execute_agent.call_count == 1 + assert state.iteration == 1 + assert state.completed == 1 + assert state.failed == 0 + assert state.total == 1 + assert state.status == RunStatus.COMPLETED + assert state.promise_completed is True + + @patch("ralphify.engine.execute_agent") + def test_untagged_raw_text_does_not_match_completion_signal( + self, mock_execute_agent, tmp_path + ): + config = make_config( + tmp_path, + max_iterations=3, + stop_on_completion_signal=True, + ) + state = make_state() + + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + result_text="done RALPH_PROMISE_COMPLETE without promise tags", + ) + + run_loop(config, state, NullEmitter()) + + assert mock_execute_agent.call_count == 3 + assert state.completed == 3 + assert state.failed == 0 + assert state.status == RunStatus.COMPLETED + assert state.promise_completed is False + + @patch("ralphify.engine.execute_agent") + def test_different_tagged_promise_text_does_not_match( + self, mock_execute_agent, tmp_path + ): + config = make_config( + tmp_path, + max_iterations=3, + completion_signal="CUSTOM_DONE", + stop_on_completion_signal=True, + ) + state = make_state() + + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + result_text="CUSTOM_DONE_NOW", + ) + + run_loop(config, state, NullEmitter()) + + assert mock_execute_agent.call_count == 3 + assert state.completed == 3 + assert state.failed == 0 + assert state.status == RunStatus.COMPLETED + assert state.promise_completed is False + + @patch("ralphify.engine.execute_agent") + def test_structured_agents_ignore_raw_stdout_for_promise_detection( + self, mock_execute_agent, tmp_path + ): + config = make_config( + tmp_path, + max_iterations=2, + stop_on_completion_signal=True, + ) + state = make_state() + + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + result_text="done without promise tag", + captured_stdout='{"type":"status","message":"RALPH_PROMISE_COMPLETE"}\n', + ) + + run_loop(config, state, NullEmitter()) + + assert mock_execute_agent.call_count == 2 + assert state.completed == 2 + assert state.failed == 0 + assert state.status == RunStatus.COMPLETED + assert state.promise_completed is False + + @patch("ralphify.engine.execute_agent") + def test_blocking_captured_stdout_is_echoed_when_peek_is_off( + self, mock_execute_agent, tmp_path + ): + config = make_config(tmp_path, max_iterations=1) + state = make_state() + emitter = QueueEmitter() + + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + captured_stdout="plain blocking output\n", + ) + + run_loop(config, state, emitter) + + completed_event = events_of_type( + drain_events(emitter), EventType.ITERATION_COMPLETED + )[0] + assert completed_event.data["echo_stdout"] == "plain blocking output\n" + + class TestRunLoopDefaults: @patch(MOCK_SUBPROCESS, side_effect=ok_proc) def test_runs_without_emitter(self, mock_run, tmp_path): diff --git a/tests/test_promise.py b/tests/test_promise.py new file mode 100644 index 00000000..55125e34 --- /dev/null +++ b/tests/test_promise.py @@ -0,0 +1,47 @@ +"""Tests for strict promise-tag parsing.""" + +import pytest + +from ralphify._promise import has_promise_completion, parse_promise_tags + + +class TestParsePromiseTags: + @pytest.mark.parametrize( + "raw_text", [None, "", "plain text", "missing close"] + ) + def test_parse_promise_tags_invalid_input_returns_empty_list(self, raw_text): + assert parse_promise_tags(raw_text) == [] + + def test_parse_promise_tags_normalizes_whitespace_and_preserves_unicode(self): + text = ( + "before " + "\n CUSTOM\tDONE \n " + "middle " + "✅ shipped" + ) + + assert parse_promise_tags(text) == ["CUSTOM DONE", "✅ shipped"] + + +class TestHasPromiseCompletion: + def test_has_promise_completion_matches_only_exact_tag_payload(self): + text = ( + "raw CUSTOM_DONE text " + "CUSTOM_DONE_NOW " + "CUSTOM_DONE" + ) + + assert has_promise_completion(text, "CUSTOM_DONE") is True + assert has_promise_completion(text, "CUSTOM_DONE_NOW") is True + assert has_promise_completion(text, "CUSTOM") is False + + def test_has_promise_completion_ignores_wrong_case_and_malformed_tags(self): + text = "CUSTOM_DONECUSTOM_DONE" + + assert has_promise_completion(text, "CUSTOM_DONE") is False + + def test_has_promise_completion_normalizes_completion_signal_whitespace(self): + text = "\n CUSTOM\tDONE \n" + + assert has_promise_completion(text, "CUSTOM DONE") is True + assert has_promise_completion(text, "CUSTOM\tDONE") is True From 8c4882107b60d1fb9a8dd611acbccbe2c156c991 Mon Sep 17 00:00:00 2001 From: Paul Sorensen Date: Sat, 18 Apr 2026 20:02:16 -0700 Subject: [PATCH 2/4] feat: introduce CLI adapter layer with Claude, Codex, Copilot adapters (#6) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat: seed CLI adapter layer with Claude, Codex, Copilot adapters Introduces a pluggable CLIAdapter Protocol and lands the three concrete adapters that exercise it, so the abstraction ships with demonstrated value on day one. - src/ralphify/adapters/__init__.py — CLIAdapter Protocol, AdapterEvent, first-match ADAPTERS registry, select_adapter() dispatch with GenericAdapter fallback. - src/ralphify/adapters/_generic.py — blocking-path fallback adapter. - src/ralphify/adapters/claude.py — stem "claude", --output-format stream-json --verbose, parses tool_use blocks in assistant messages, extracts completion signal from terminal result event. - src/ralphify/adapters/codex.py — stem "codex", --json, maps CollabToolCall/McpToolCall/CommandExecution to tool_use events, scans TurnCompleted/TaskComplete for promise tags. - src/ralphify/adapters/copilot.py — stem "copilot" (not "gh"), --output-format json, renders_structured=False, supports_soft_windown =False. install_wind_down_hook is part of the Protocol surface but every concrete adapter currently raises NotImplementedError — the actual hook plumbing (soft wind-down, AgentHook fanout, max_turns enforcement) lands in subsequent PRs. This PR is pure additions with no engine wiring: adapters are registered and discoverable, but the run loop still uses the old hard-coded Claude path. Tests: 50 new tests covering each adapter's event parsing, command building, completion-signal extraction, and registry dispatch. Co-Authored-By: WOZCODE * feat: route Claude detection and promise completion through adapter Wire the CLI adapter layer into the three sites that previously hard-coded Claude knowledge: - _agent.py: execute_agent resolves the adapter (or accepts one from the caller) and delegates flag construction to adapter.build_command. _run_agent_streaming no longer appends --output-format/--verbose itself — the cmd it receives is already flagged. - _console_emitter.py: startup-hint text picks "live activity on" vs "live output on" via adapter.renders_structured instead of a claude-binary check. - engine.py: promise completion runs through adapter.extract_completion_signal(captured_stdout, signal). ClaudeAdapter scans the terminal result event only; GenericAdapter scans full stdout to preserve behavior for unknown CLIs. The engine-side _record_promise_fragment incremental scanner is gone. Tightens ClaudeAdapter.extract_completion_signal to return False when no result event is present so embedded tags inside status or assistant JSON can't trigger early completion. * fix: address copilot review feedback on adapter seed PR - Rename `supports_soft_windown` → `supports_soft_wind_down` across the Protocol and all adapters to match the `install_wind_down_hook` / "wind-down" terminology used elsewhere (caught before it baked into the public adapter surface). - Claude and Copilot `build_command()` now detect the `--output-format` flag pair and overwrite a user-supplied value instead of treating the flag and value tokens independently (which produced invalid argv when a caller pre-set `--output-format `). - Align Claude `parse_event()` docstring with actual behavior: `result` events return `kind="result"` and non-assistant events return `kind="message"` rather than `None`. - Flip Codex `renders_structured` to False so the peek panel stays in raw-line mode until the emitter is adapter-driven; Codex's streaming path currently feeds an `_IterationPanel` that only understands Claude's stream-json schema. * refactor: split renders_structured into supports_streaming and renders_structured_peek Addresses the architectural concern in PR #6 review: the old `renders_structured` flag conflated two orthogonal capabilities: 1. Whether the adapter emits a structured JSON stream that the execution path can parse into on_activity callbacks. 2. Whether the console peek panel understands the adapter's event schema and should render it via `_IterationPanel` instead of raw lines. Before the split, an adapter had to choose between a useful streaming path with an empty peek panel (Codex's previous state) or a working raw peek with no activity callbacks. The split makes the correct combination representable: - ClaudeAdapter: supports_streaming=True, renders_structured_peek=True - CodexAdapter: supports_streaming=True, renders_structured_peek=False - CopilotAdapter / GenericAdapter: both False Codex now takes the streaming path again (on_activity callbacks fire) while the peek panel stays in raw-line mode until the emitter is adapter-driven. Callers updated: - _agent.execute_agent dispatches on adapter.supports_streaming - _console_emitter consumes adapter.renders_structured_peek via _agent_renders_structured_peek (renamed) - tests/conftest autouse fixture forces both flags to False * perf: skip stdout capture when adapter exposes result_text The previous adapter seed forced capture_stdout=True on every iteration so the engine could feed the buffered transcript into adapter.extract_completion_signal. For verbose streaming agents this regressed memory vs the prior tail-scan approach, and Claude in particular was paying for a full transcript buffer it did not need: agent.result_text already carries the terminal assistant message. Add a requires_full_stdout_for_completion capability flag and a keyword-only extract_completion_signal(result_text=, stdout=, user_signal=) signature so each adapter declares its own input. Claude sets the flag False and reads result_text directly; Codex / Copilot / Generic keep the full-stdout path. The engine now gates capture on log_dir or (stop_on_completion_signal AND requires_full_stdout_for_completion), so streaming agents skip the buffer entirely unless logging is on, and non-streaming agents only buffer when the user opts into completion signalling. Co-Authored-By: Claude Opus 4.7 --------- Co-authored-by: WOZCODE Co-authored-by: Claude Opus 4.7 --- src/ralphify/_agent.py | 60 ++++----- src/ralphify/_console_emitter.py | 22 ++-- src/ralphify/adapters/__init__.py | 158 ++++++++++++++++++++++++ src/ralphify/adapters/_generic.py | 70 +++++++++++ src/ralphify/adapters/claude.py | 172 ++++++++++++++++++++++++++ src/ralphify/adapters/codex.py | 198 ++++++++++++++++++++++++++++++ src/ralphify/adapters/copilot.py | 151 +++++++++++++++++++++++ src/ralphify/engine.py | 47 +++---- tests/conftest.py | 13 +- tests/test_adapters_claude.py | 183 +++++++++++++++++++++++++++ tests/test_adapters_codex.py | 159 ++++++++++++++++++++++++ tests/test_adapters_copilot.py | 129 +++++++++++++++++++ tests/test_adapters_registry.py | 43 +++++++ tests/test_agent.py | 69 ++++------- tests/test_console_emitter.py | 43 ++++--- tests/test_engine.py | 96 ++++++++++++++- 16 files changed, 1474 insertions(+), 139 deletions(-) create mode 100644 src/ralphify/adapters/__init__.py create mode 100644 src/ralphify/adapters/_generic.py create mode 100644 src/ralphify/adapters/claude.py create mode 100644 src/ralphify/adapters/codex.py create mode 100644 src/ralphify/adapters/copilot.py create mode 100644 tests/test_adapters_claude.py create mode 100644 tests/test_adapters_codex.py create mode 100644 tests/test_adapters_copilot.py create mode 100644 tests/test_adapters_registry.py diff --git a/src/ralphify/_agent.py b/src/ralphify/_agent.py index 6bf1a0da..6823cf3d 100644 --- a/src/ralphify/_agent.py +++ b/src/ralphify/_agent.py @@ -37,6 +37,7 @@ collect_output, warn, ) +from ralphify.adapters import CLIAdapter, select_adapter # ── Callback type aliases ────────────────────────────────────────────── # Used across the streaming and blocking execution paths for callbacks @@ -53,15 +54,6 @@ _STDOUT: OutputStream = "stdout" _STDERR: OutputStream = "stderr" -# Agent binary name that supports --output-format stream-json. -# Public because _console_emitter also needs this for display logic. -CLAUDE_BINARY = "claude" - -# CLI flags appended when streaming mode is used. -_OUTPUT_FORMAT_FLAG = "--output-format" -_STREAM_FORMAT = "stream-json" -_VERBOSE_FLAG = "--verbose" - # JSON stream event types and fields for result extraction. _RESULT_EVENT_TYPE = "result" _RESULT_FIELD = "result" @@ -288,19 +280,6 @@ def _write_log( return log_file -def _supports_stream_json(cmd: list[str]) -> bool: - """Return True if the agent command supports ``--output-format stream-json``. - - Currently only Claude Code supports this protocol. To add streaming - support for another agent, extend the check here — no other changes - needed since :func:`_run_agent_streaming` handles the protocol generically. - """ - if not cmd: - return False - binary = Path(cmd[0]).stem - return binary == CLAUDE_BINARY - - def _readline_pump( stdout: IO[str], line_queue: queue.Queue[str | None], @@ -444,16 +423,19 @@ def _run_agent_streaming( ) -> AgentResult: """Run the agent subprocess with line-by-line streaming of JSON output. - Used for agents that support ``--output-format stream-json`` (e.g. Claude - Code). Stream processing is delegated to :func:`_read_agent_stream`; - this function owns the subprocess lifecycle (spawn, stdin delivery, - timeout kill, and cleanup via ``try/finally``). + Used for adapters whose ``supports_streaming`` flag is True (e.g. Claude + Code's ``--output-format stream-json``, Codex's ``--json``). The command + list *must already include* any adapter-required flags — + :func:`execute_agent` calls ``adapter.build_command`` before dispatching. + + Stream processing is delegated to :func:`_read_agent_stream`; this + function owns the subprocess lifecycle (spawn, stdin delivery, timeout + kill, and cleanup via ``try/finally``). stderr is drained concurrently on a background reader thread so large stderr volume can't deadlock the child on a full OS pipe buffer while the main thread is reading stdout. """ - stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG, _STREAM_FORMAT, _VERBOSE_FLAG] start = time.monotonic() deadline = (start + timeout) if timeout is not None else None @@ -466,7 +448,7 @@ def _run_agent_streaming( stderr_thread: threading.Thread | None = None proc = subprocess.Popen( - stream_cmd, + cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE if pipe_stderr else None, @@ -764,6 +746,7 @@ def execute_agent( timeout: float | None, log_dir: Path | None, iteration: int, + adapter: CLIAdapter | None = None, on_activity: ActivityCallback | None = None, on_output_line: OutputLineCallback | None = None, capture_result_text: bool = False, @@ -771,22 +754,27 @@ def execute_agent( ) -> AgentResult: """Run the agent subprocess, auto-selecting streaming or blocking mode. - Uses streaming mode for agents that support ``--output-format stream-json`` - (e.g. Claude Code); all other agents use the blocking path that drains - stdout and stderr via reader threads. The *on_activity* callback is - only invoked in streaming mode; *on_output_line* fires for both modes - as raw lines arrive. + The *adapter* argument (or :func:`select_adapter` when omitted) decides + which execution path runs: adapters whose ``supports_streaming`` flag is + True take the line-streaming path that drives ``on_activity`` callbacks; + all others take the blocking path with concurrent stdout/stderr drain. + ``adapter.build_command(cmd)`` is applied before spawning, so the CLI + receives any flags the adapter requires (e.g. Claude's + ``--output-format stream-json --verbose`` or Codex's ``--json``). This is the single entry point the engine should use — callers don't need to know which execution mode is selected. """ - supports_stream_json = _supports_stream_json(cmd) + if adapter is None: + adapter = select_adapter(cmd) + cmd = adapter.build_command(cmd) + supports_streaming = adapter.supports_streaming if capture_stdout is None: capture_stdout = log_dir is not None or ( - not supports_stream_json and on_output_line is None and capture_result_text + not supports_streaming and on_output_line is None and capture_result_text ) - if supports_stream_json: + if supports_streaming: return _run_agent_streaming( cmd, prompt, diff --git a/src/ralphify/_console_emitter.py b/src/ralphify/_console_emitter.py index c58d0224..9ae84422 100644 --- a/src/ralphify/_console_emitter.py +++ b/src/ralphify/_console_emitter.py @@ -42,8 +42,8 @@ RunStoppedData, ) from ralphify import _brand -from ralphify._agent import CLAUDE_BINARY from ralphify._output import format_count, format_duration +from ralphify.adapters import select_adapter _ICON_SUCCESS = "✓" _ICON_FAILURE = "✗" @@ -106,18 +106,22 @@ f"shift+{PEEK_TOGGLE_KEY} for full view[/]" ) -# ── Claude binary detection ─────────────────────────────────────────── +# ── Adapter-driven structured-output detection ──────────────────────── -def _is_claude_command(agent: str) -> bool: - """Return True if *agent* is a Claude Code command.""" +def _agent_renders_structured_peek(agent: str) -> bool: + """Return True if *agent*'s adapter feeds the structured peek panel. + + Drives the ``ConsoleEmitter`` choice between :class:`_IterationPanel` + (structured) and :class:`_IterationSpinner` (raw). Delegates to + :func:`select_adapter` so adding a new CLI with peek-panel support + requires no edits here. + """ try: - parts = shlex.split(agent) + cmd = shlex.split(agent) except ValueError: return False - if not parts: - return False - return Path(parts[0]).stem == CLAUDE_BINARY + return select_adapter(cmd).renders_structured_peek # ── Tool argument abbreviation ──────────────────────────────────────── @@ -1248,7 +1252,7 @@ def emit(self, event: Event) -> None: def _on_run_started(self, data: RunStartedData) -> None: ralph_name = data["ralph_name"] agent = data["agent"] - self._structured_agent = _is_claude_command(agent) + self._structured_agent = _agent_renders_structured_peek(agent) with self._console_lock: self._console.print( f"\n[bold {_brand.PURPLE}]{_ICON_PLAY} Running:[/] [bold]{escape_markup(ralph_name)}[/]" diff --git a/src/ralphify/adapters/__init__.py b/src/ralphify/adapters/__init__.py new file mode 100644 index 00000000..c0619e19 --- /dev/null +++ b/src/ralphify/adapters/__init__.py @@ -0,0 +1,158 @@ +"""Pluggable CLI adapter layer. + +Each supported agent CLI (Claude, Codex, Copilot, ...) implements the +:class:`CLIAdapter` protocol in its own module under this package. The +engine dispatches on :func:`select_adapter` at run time, so adding a new +CLI means writing one file and registering it in :data:`ADAPTERS` — no +edits to the engine, emitter, or subprocess machinery. + +Adapters translate the CLI's native output format to a common +:class:`AdapterEvent` stream and advertise capability flags so the core +loop can gracefully degrade when a CLI lacks structured output or hook +injection. Process lifecycle (spawn, SIGTERM at cap, reap) stays in +``_agent.py``; adapters only observe. +""" + +from __future__ import annotations + +from pathlib import Path +from typing import Literal, NamedTuple, Protocol, runtime_checkable + + +AdapterEventKind = Literal["tool_use", "turn", "message", "result"] +"""Categories of events an adapter can surface from a CLI's output stream.""" + +CountsWhat = Literal["tool_use", "turn", "none"] +"""What an adapter counts against ``max_turns`` — tool uses, turns, or nothing.""" + + +class AdapterEvent(NamedTuple): + """A single structured event parsed from a CLI's output stream. + + ``kind`` is the event category; ``name`` carries a tool name for + ``tool_use`` events (``None`` otherwise); ``raw`` is the original + parsed JSON object so callers can inspect extra fields when needed. + """ + + kind: AdapterEventKind + name: str | None = None + raw: dict | None = None + + +@runtime_checkable +class CLIAdapter(Protocol): + """Protocol every CLI adapter must satisfy. + + Adapters are stateless singletons: the same instance is reused for + every iteration of every run. Any per-iteration state lives in the + caller (``_agent.py``) — adapters only translate. + """ + + name: str + counts_what: CountsWhat + supports_streaming: bool + renders_structured_peek: bool + supports_soft_wind_down: bool + requires_full_stdout_for_completion: bool + + def matches(self, cmd: list[str]) -> bool: + """Return True if this adapter handles the given agent command.""" + ... + + def build_command(self, cmd: list[str]) -> list[str]: + """Return the command with any adapter-required flags appended. + + Idempotent: calling twice returns the same command. + """ + ... + + def parse_event(self, line: str) -> AdapterEvent | None: + """Parse one line of stdout into an :class:`AdapterEvent`. + + Returns ``None`` for lines that are not recognised events. + MUST NOT raise on malformed input (per FR-8). + """ + ... + + def extract_completion_signal( + self, + *, + result_text: str | None, + stdout: str | None, + user_signal: str, + ) -> bool: + """Return True if the agent's final output contains the completion signal. + + The signal is wrapped in ``...`` markup; the + inner text equals ``user_signal``. + + Adapters receive both the streaming-extracted *result_text* (the + terminal assistant message, when the streaming path could parse one) + and the full *stdout* buffer (only present when the engine chose to + capture it). Adapters with ``requires_full_stdout_for_completion`` + set False MUST be able to detect completion from *result_text* alone; + engines may pass ``stdout=None`` to skip the memory cost. + """ + ... + + def install_wind_down_hook( + self, + tempdir: Path, + counter_path: Path, + cap: int, + grace: int, + ) -> dict[str, str]: + """Write hook config files into *tempdir* and return env-var overrides. + + Only called when ``supports_soft_wind_down`` is True. Adapters that + set the flag False may leave this unimplemented (a ``NotImplementedError`` + is acceptable and is treated as a runtime downgrade to hard-cap-only). + """ + ... + + +ADAPTERS: list[CLIAdapter] = [] +"""Adapter registry, populated at import time by concrete adapter modules. + +Ordering matters: :func:`select_adapter` returns the first adapter whose +``matches`` method returns True, with :class:`GenericAdapter` as a final +catch-all. Specific adapters go first, generic last. +""" + + +def select_adapter(cmd: list[str]) -> CLIAdapter: + """Return the first registered adapter that claims *cmd*. + + Falls back to :class:`GenericAdapter` when nothing matches. Never + returns None — callers can always dispatch safely. + """ + from ralphify.adapters._generic import GenericAdapter + + for adapter in ADAPTERS: + if adapter.matches(cmd): + return adapter + return GenericAdapter() + + +def _register_builtin_adapters() -> None: + """Import concrete adapter modules so their ``ADAPTERS.append`` runs. + + Keeps the registry populated without forcing callers to import every + adapter module manually. Imports are deferred to the bottom of this + module (executed once at first package import) so cyclic-import risk + is contained. + """ + from ralphify.adapters import claude, codex, copilot # noqa: F401 + + +_register_builtin_adapters() + + +__all__ = [ + "ADAPTERS", + "AdapterEvent", + "AdapterEventKind", + "CLIAdapter", + "CountsWhat", + "select_adapter", +] diff --git a/src/ralphify/adapters/_generic.py b/src/ralphify/adapters/_generic.py new file mode 100644 index 00000000..85ee477a --- /dev/null +++ b/src/ralphify/adapters/_generic.py @@ -0,0 +1,70 @@ +"""Fallback adapter for CLIs with no dedicated implementation. + +Returned by :func:`ralphify.adapters.select_adapter` when no specific +adapter's ``matches`` returns True. All capability flags are False, so +the core loop treats sessions as blocking, untyped, and uncappable. +""" + +from __future__ import annotations + +from pathlib import Path + +from ralphify._promise import has_promise_completion +from ralphify.adapters import AdapterEvent, CountsWhat + + +class GenericAdapter: + """No-op adapter: pass commands through unchanged, parse nothing.""" + + name: str = "generic" + counts_what: CountsWhat = "none" + supports_streaming: bool = False + renders_structured_peek: bool = False + supports_soft_wind_down: bool = False + # Untyped agents have no streaming result event; the engine must keep + # the full stdout buffer if it wants promise detection. + requires_full_stdout_for_completion: bool = True + + def matches(self, cmd: list[str]) -> bool: + return False + + def build_command(self, cmd: list[str]) -> list[str]: + return list(cmd) + + def parse_event(self, line: str) -> AdapterEvent | None: + return None + + def extract_completion_signal( + self, + *, + result_text: str | None, + stdout: str | None, + user_signal: str, + ) -> bool: + """Scan the full stdout for the promise tag. + + Unknown CLIs have no event schema to parse, so the whole-stdout + regex scan is the only reliable path. Matches the current + engine-side behavior so switching to adapter-owned detection does + not regress promise completion for untyped agents. + + *result_text* is unused (the blocking path does not populate it + for unknown CLIs); the engine opts into + ``requires_full_stdout_for_completion`` to make sure *stdout* is + supplied when promise detection is requested. + """ + del result_text + if stdout is None: + return False + return has_promise_completion(stdout, user_signal) + + def install_wind_down_hook( + self, + tempdir: Path, + counter_path: Path, + cap: int, + grace: int, + ) -> dict[str, str]: + raise NotImplementedError( + "GenericAdapter does not support soft wind-down; max_turns will hard-kill." + ) diff --git a/src/ralphify/adapters/claude.py b/src/ralphify/adapters/claude.py new file mode 100644 index 00000000..d54849a7 --- /dev/null +++ b/src/ralphify/adapters/claude.py @@ -0,0 +1,172 @@ +"""Claude Code adapter. + +Claude is the only CLI shipping a stable ``--output-format stream-json`` +protocol today; its structured events drive the peek panel and power +per-event tool-use counting. Every Claude iteration emits: + +1. A ``system`` init event with the model name. +2. Zero or more ``assistant`` messages whose ``content`` list may include + ``tool_use``, ``thinking``, and ``text`` blocks. +3. Zero or more ``user`` messages (tool results echoed back). +4. A terminal ``result`` event carrying the final assistant text. + +Tool-use counting is scoped to ``assistant`` messages; we ignore +``tool_use`` blocks echoed back by ``user`` events so each invocation is +counted exactly once. +""" + +from __future__ import annotations + +import json +from pathlib import Path + +from ralphify._promise import has_promise_completion +from ralphify.adapters import ADAPTERS, AdapterEvent, CountsWhat + + +CLAUDE_BINARY_STEM = "claude" +"""Binary stem (``Path(cmd[0]).stem``) that identifies the Claude CLI.""" + +_OUTPUT_FORMAT_FLAG = "--output-format" +_OUTPUT_FORMAT_VALUE = "stream-json" +_VERBOSE_FLAG = "--verbose" + +_EVENT_TYPE_ASSISTANT = "assistant" +_EVENT_TYPE_RESULT = "result" +_BLOCK_TYPE_TOOL_USE = "tool_use" + + +class ClaudeAdapter: + """Parses Claude's stream-json output and supports soft wind-down.""" + + name: str = "claude" + counts_what: CountsWhat = "tool_use" + supports_streaming: bool = True + renders_structured_peek: bool = True + supports_soft_wind_down: bool = True + # Claude's final assistant text already arrives as ``agent.result_text`` + # via the stream-json ``result`` event, so the engine does not need to + # buffer the full stdout to scan for the promise tag. + requires_full_stdout_for_completion: bool = False + + def matches(self, cmd: list[str]) -> bool: + if not cmd: + return False + return Path(cmd[0]).stem == CLAUDE_BINARY_STEM + + def build_command(self, cmd: list[str]) -> list[str]: + """Ensure ``--output-format stream-json --verbose`` is present. + + Idempotent: running twice yields the same command. If the caller + already supplied ``--output-format ``, the existing value is + overwritten with ``stream-json`` — we cannot honor a user-chosen + format while still emitting a parseable event stream. + """ + result = list(cmd) + try: + format_index = result.index(_OUTPUT_FORMAT_FLAG) + except ValueError: + result.extend([_OUTPUT_FORMAT_FLAG, _OUTPUT_FORMAT_VALUE]) + else: + value_index = format_index + 1 + if value_index < len(result): + result[value_index] = _OUTPUT_FORMAT_VALUE + else: + result.append(_OUTPUT_FORMAT_VALUE) + if _VERBOSE_FLAG not in result: + result.append(_VERBOSE_FLAG) + return result + + def parse_event(self, line: str) -> AdapterEvent | None: + """Parse one stream-json line into an :class:`AdapterEvent`. + + Empty lines, non-JSON payloads, and non-dict JSON return ``None``. + ``result`` events return ``AdapterEvent(kind="result")``. An + ``assistant`` event whose content contains a ``tool_use`` block + returns the first such block as ``AdapterEvent(kind="tool_use")``; + Claude emits one tool_use block per assistant message, so + single-event dispatch matches the protocol. Every other parsed + event dict — including non-tool-use ``assistant`` messages — + returns ``AdapterEvent(kind="message")`` so callers can still + render them without counting against the turn cap. + """ + stripped = line.strip() + if not stripped: + return None + try: + parsed = json.loads(stripped) + except json.JSONDecodeError: + return None + if not isinstance(parsed, dict): + return None + + event_type = parsed.get("type") + if event_type == _EVENT_TYPE_RESULT: + return AdapterEvent(kind="result", raw=parsed) + if event_type != _EVENT_TYPE_ASSISTANT: + return AdapterEvent(kind="message", raw=parsed) + + for block in _iter_content_blocks(parsed): + if block.get("type") == _BLOCK_TYPE_TOOL_USE: + name = block.get("name") + return AdapterEvent( + kind="tool_use", + name=name if isinstance(name, str) else None, + raw=parsed, + ) + return AdapterEvent(kind="message", raw=parsed) + + def extract_completion_signal( + self, + *, + result_text: str | None, + stdout: str | None, + user_signal: str, + ) -> bool: + """Scan the streaming-extracted result text for ``{signal}``. + + Claude's terminal ``result`` event carries the last assistant + message as a plain string, which the streaming reader already + captures into :attr:`AgentResult.result_text`. Using *result_text* + directly avoids buffering the full stdout — large transcripts can + run into many megabytes per iteration. + + Only the parsed result text is considered — raw JSON from + ``status`` or ``assistant`` messages can legitimately echo + ``...`` substrings that must not trigger + completion. + + *stdout* is unused (Claude does not need a fallback because the + streaming path always populates *result_text* on a successful run); + it stays in the signature for protocol parity. + """ + del stdout + if result_text is None: + return False + return has_promise_completion(result_text, user_signal) + + def install_wind_down_hook( + self, + tempdir: Path, + counter_path: Path, + cap: int, + grace: int, + ) -> dict[str, str]: + raise NotImplementedError( + "Claude soft wind-down (settings.json PreToolUse hook) is scheduled " + "for Phase 3 of the CLI adapter layer spec." + ) + + +def _iter_content_blocks(raw: dict) -> list[dict]: + """Return the ``message.content`` list, filtered to dict blocks only.""" + message = raw.get("message") + if not isinstance(message, dict): + return [] + content = message.get("content") + if not isinstance(content, list): + return [] + return [block for block in content if isinstance(block, dict)] + + +ADAPTERS.append(ClaudeAdapter()) diff --git a/src/ralphify/adapters/codex.py b/src/ralphify/adapters/codex.py new file mode 100644 index 00000000..a64853ea --- /dev/null +++ b/src/ralphify/adapters/codex.py @@ -0,0 +1,198 @@ +"""Codex CLI adapter. + +Codex emits newline-delimited JSON with explicit event types: + +- ``TurnStarted`` / ``TurnCompleted`` — conversation turn boundaries. +- ``CollabToolCall`` / ``McpToolCall`` — tool invocations initiated by + the agent. +- ``CommandExecution`` — shell commands run inside the sandbox. + +We map every tool-call event to ``AdapterEvent(kind="tool_use", ...)`` +so the turn-cap counter uses the same user-facing metric across CLIs. +Turn boundaries surface as ``kind="turn"`` events for adapters that want +them; they do not count against ``max_turns`` today (counts_what is +``tool_use``, not ``turn``, for a unified metric — see spec Q13). +""" + +from __future__ import annotations + +import json +from pathlib import Path + +from ralphify._promise import has_promise_completion +from ralphify.adapters import ADAPTERS, AdapterEvent, CountsWhat + + +CODEX_BINARY_STEM = "codex" +"""Binary stem (``Path(cmd[0]).stem``) that identifies the Codex CLI.""" + +_JSON_FLAG = "--json" +"""Flag appended to request newline-delimited JSON output.""" + +_TURN_EVENTS: frozenset[str] = frozenset({"TurnStarted", "TurnCompleted"}) +_TOOL_CALL_EVENTS: frozenset[str] = frozenset( + {"CollabToolCall", "McpToolCall", "CommandExecution"} +) +_RESULT_EVENTS: frozenset[str] = frozenset({"TaskComplete", "TurnCompleted"}) + + +class CodexAdapter: + """Parses Codex's ``--json`` event stream.""" + + name: str = "codex" + counts_what: CountsWhat = "tool_use" + supports_streaming: bool = True + # Codex emits structured JSON that the streaming execution path parses + # for activity callbacks, but the console peek panel only understands + # Claude's stream-json schema today. Keep peek in raw-line mode until + # the emitter can render Codex events directly. + renders_structured_peek: bool = False + supports_soft_wind_down: bool = True + # Codex's terminal text lives inside ``TaskComplete`` / ``TurnCompleted`` + # events, which the streaming reader does not extract into + # ``agent.result_text``. The full stdout buffer is currently the only + # source for promise-tag scanning. + requires_full_stdout_for_completion: bool = True + + def matches(self, cmd: list[str]) -> bool: + if not cmd: + return False + return Path(cmd[0]).stem == CODEX_BINARY_STEM + + def build_command(self, cmd: list[str]) -> list[str]: + """Append ``--json`` to request structured output. Idempotent.""" + result = list(cmd) + if _JSON_FLAG not in result: + result.append(_JSON_FLAG) + return result + + def parse_event(self, line: str) -> AdapterEvent | None: + """Classify one JSONL line as turn / tool_use / message / result. + + Unknown event types return ``AdapterEvent(kind="message", ...)`` so + callers can still render them (e.g. peek panel) without counting + them against the turn cap. Malformed lines return ``None``. + """ + stripped = line.strip() + if not stripped: + return None + try: + parsed = json.loads(stripped) + except json.JSONDecodeError: + return None + if not isinstance(parsed, dict): + return None + + event_type = _event_type(parsed) + if event_type in _TOOL_CALL_EVENTS: + return AdapterEvent( + kind="tool_use", + name=_tool_name(parsed, event_type), + raw=parsed, + ) + if event_type in _RESULT_EVENTS: + return AdapterEvent(kind="result", raw=parsed) + if event_type in _TURN_EVENTS: + return AdapterEvent(kind="turn", raw=parsed) + return AdapterEvent(kind="message", raw=parsed) + + def extract_completion_signal( + self, + *, + result_text: str | None, + stdout: str | None, + user_signal: str, + ) -> bool: + """Scan every ``TurnCompleted`` / ``TaskComplete`` event for the promise tag. + + Codex does not carry a single terminal ``result`` string the way + Claude does; completion may be spread across assistant text in + multiple events. Falling back to a whole-stdout scan is safe + because promise tags are explicit and non-ambiguous markup. + + *result_text* is unused — Codex never populates it through the + streaming reader (no ``{"type":"result"}`` lines). The engine + opts into ``requires_full_stdout_for_completion`` to make sure + *stdout* is supplied when promise detection is requested. + """ + del result_text + if stdout is None: + return False + if has_promise_completion(stdout, user_signal): + return True + for line in stdout.splitlines(): + stripped = line.strip() + if not stripped: + continue + try: + parsed = json.loads(stripped) + except json.JSONDecodeError: + continue + if isinstance(parsed, dict) and _event_type(parsed) in _RESULT_EVENTS: + text = _event_text_payload(parsed) + if text and has_promise_completion(text, user_signal): + return True + return False + + def install_wind_down_hook( + self, + tempdir: Path, + counter_path: Path, + cap: int, + grace: int, + ) -> dict[str, str]: + raise NotImplementedError( + "Codex soft wind-down (hooks.json PostToolUse) is scheduled " + "for Phase 3 of the CLI adapter layer spec." + ) + + +def _event_type(parsed: dict) -> str | None: + """Return the Codex event type, whether top-level or nested under ``type``.""" + event_type = parsed.get("type") or parsed.get("kind") + if isinstance(event_type, str): + return event_type + msg = parsed.get("msg") + if isinstance(msg, dict): + nested = msg.get("type") or msg.get("kind") + if isinstance(nested, str): + return nested + return None + + +def _tool_name(parsed: dict, event_type: str | None) -> str | None: + """Best-effort extraction of the tool name from a tool-call event. + + Codex event shapes vary by tool type — ``CommandExecution`` carries a + command, ``CollabToolCall`` a tool name, ``McpToolCall`` a server + + tool. When no specific name is available, return the event type. + """ + for key in ("name", "tool", "tool_name"): + value = parsed.get(key) + if isinstance(value, str): + return value + msg = parsed.get("msg") + if isinstance(msg, dict): + for key in ("name", "tool", "tool_name", "command"): + value = msg.get(key) + if isinstance(value, str): + return value + return event_type + + +def _event_text_payload(parsed: dict) -> str | None: + """Extract any final-assistant text from a Codex result event.""" + for key in ("result", "text", "content", "output"): + value = parsed.get(key) + if isinstance(value, str): + return value + msg = parsed.get("msg") + if isinstance(msg, dict): + for key in ("result", "text", "content", "output"): + value = msg.get(key) + if isinstance(value, str): + return value + return None + + +ADAPTERS.append(CodexAdapter()) diff --git a/src/ralphify/adapters/copilot.py b/src/ralphify/adapters/copilot.py new file mode 100644 index 00000000..8766db31 --- /dev/null +++ b/src/ralphify/adapters/copilot.py @@ -0,0 +1,151 @@ +"""GitHub Copilot CLI adapter (alpha). + +The standalone ``copilot`` binary (GA 2026-02-25) ships a +``--output-format json`` mode that is **only loosely documented**. This +adapter does best-effort counting based on the empirical shapes seen in +the ralphify test corpus; unknown event types return ``None`` rather +than crashing. + +Capability matrix: + +- ``counts_what = "tool_use"`` with an alpha caveat — counting accuracy + depends on ongoing schema discovery (see :file:`docs/agents.md`). +- ``supports_streaming = False`` — event schema is unverified, so the + adapter falls through the blocking path and avoids per-line parsing. +- ``renders_structured_peek = False`` — peek panel stays in raw-line mode. +- ``supports_soft_wind_down = False`` — Copilot has no hook system as of + 2026-04, so ``install_wind_down_hook`` raises :class:`NotImplementedError` + (which the engine downgrades to hard-cap-only). +""" + +from __future__ import annotations + +import json +from pathlib import Path + +from ralphify._promise import has_promise_completion +from ralphify.adapters import ADAPTERS, AdapterEvent, CountsWhat + + +COPILOT_BINARY_STEM = "copilot" +"""Binary stem (``Path(cmd[0]).stem``) that identifies the standalone Copilot CLI. + +Note: this is the GA ``copilot`` binary, NOT the ``gh copilot`` subcommand. +The ``gh`` stem is deliberately excluded because ``gh`` hosts many other +subcommands that have nothing to do with AI agents. +""" + +_OUTPUT_FORMAT_FLAGS: tuple[str, ...] = ("--output-format", "json") + +_TOOL_USE_EVENT_TYPES: frozenset[str] = frozenset( + {"tool_use", "tool_call", "ToolCall", "ToolUse"} +) +_RESULT_EVENT_TYPES: frozenset[str] = frozenset( + {"result", "response", "final", "Final", "Complete"} +) + + +class CopilotAdapter: + """Best-effort adapter for the standalone Copilot CLI.""" + + name: str = "copilot" + counts_what: CountsWhat = "tool_use" + supports_streaming: bool = False + renders_structured_peek: bool = False + supports_soft_wind_down: bool = False + # Copilot runs on the blocking path with no stream parsing today; the + # promise tag must be located somewhere in the captured stdout. + requires_full_stdout_for_completion: bool = True + + def matches(self, cmd: list[str]) -> bool: + if not cmd: + return False + return Path(cmd[0]).stem == COPILOT_BINARY_STEM + + def build_command(self, cmd: list[str]) -> list[str]: + """Ensure ``--output-format json`` is present. + + Idempotent: running twice yields the same command. If the caller + already supplied ``--output-format ``, the existing value is + overwritten with ``json`` — we cannot honor a user-chosen format + while still emitting a parseable event stream. + """ + result = list(cmd) + output_format_flag, output_format_value = _OUTPUT_FORMAT_FLAGS + try: + format_index = result.index(output_format_flag) + except ValueError: + result.extend(_OUTPUT_FORMAT_FLAGS) + else: + value_index = format_index + 1 + if value_index < len(result): + result[value_index] = output_format_value + else: + result.append(output_format_value) + return result + + def parse_event(self, line: str) -> AdapterEvent | None: + """Parse best-effort; return ``None`` for unknown shapes. + + The Copilot event schema is ``[unverified]`` in the spec — this + method intentionally errs on the side of *not* inventing events + so the turn cap is never inflated by false positives. + """ + stripped = line.strip() + if not stripped: + return None + try: + parsed = json.loads(stripped) + except json.JSONDecodeError: + return None + if not isinstance(parsed, dict): + return None + + event_type = parsed.get("type") or parsed.get("event") or parsed.get("kind") + if not isinstance(event_type, str): + return None + + if event_type in _TOOL_USE_EVENT_TYPES: + name = parsed.get("name") or parsed.get("tool") + return AdapterEvent( + kind="tool_use", + name=name if isinstance(name, str) else None, + raw=parsed, + ) + if event_type in _RESULT_EVENT_TYPES: + return AdapterEvent(kind="result", raw=parsed) + return None + + def extract_completion_signal( + self, + *, + result_text: str | None, + stdout: str | None, + user_signal: str, + ) -> bool: + """Scan the entire stdout for the promise tag. + + Without a verified event schema there is no reliable per-event + extraction path; the whole-stdout scan is the safest fallback. + *result_text* is unused — Copilot runs on the blocking path and + does not produce a streaming result event today. + """ + del result_text + if stdout is None: + return False + return has_promise_completion(stdout, user_signal) + + def install_wind_down_hook( + self, + tempdir: Path, + counter_path: Path, + cap: int, + grace: int, + ) -> dict[str, str]: + raise NotImplementedError( + "Copilot CLI has no hook system as of 2026-04; max_turns " + "will hard-kill without soft wind-down signal." + ) + + +ADAPTERS.append(CopilotAdapter()) diff --git a/src/ralphify/engine.py b/src/ralphify/engine.py index 5e3729e4..8b88139c 100644 --- a/src/ralphify/engine.py +++ b/src/ralphify/engine.py @@ -9,7 +9,6 @@ from __future__ import annotations -import json import shlex import traceback from datetime import datetime, timezone @@ -39,7 +38,6 @@ parse_frontmatter, ) from ralphify._output import format_duration -from ralphify._promise import has_promise_completion from ralphify._run_types import ( Command, RunConfig, @@ -48,6 +46,7 @@ ) from ralphify._resolver import resolve_all, resolve_args from ralphify._runner import run_command +from ralphify.adapters import select_adapter _PAUSE_POLL_INTERVAL = 0.25 # seconds between pause/resume checks @@ -179,38 +178,28 @@ def _run_agent_phase( f"Invalid agent command syntax: {config.agent!r}. {_field_hint(FIELD_AGENT)}" ) from exc + adapter = select_adapter(cmd) completion_signal = config.completion_signal - promise_completed_from_output = False - promise_scan_tail = "" - promise_scan_limit = max(4096, len(completion_signal) * 4 + 64) - - def _record_promise_fragment(fragment: str) -> None: - nonlocal promise_completed_from_output, promise_scan_tail - if promise_completed_from_output or not fragment: - return - promise_scan_tail = (promise_scan_tail + fragment)[-promise_scan_limit:] - promise_completed_from_output = has_promise_completion( - promise_scan_tail, completion_signal - ) def _on_output_line(line: str, stream: OutputStream) -> None: if emit.wants_agent_output_lines(): emit.agent_output_line(line, stream, state.iteration) - if stream != "stdout": - return - stripped = line.strip() - if not stripped: - return - try: - json.loads(stripped) - except json.JSONDecodeError: - _record_promise_fragment(f"{line}\n") if emit.wants_agent_output_lines() or config.log_dir is not None: on_output_line = _on_output_line else: on_output_line = None + # Capture full stdout only when somebody downstream actually needs the + # bytes — log writing, or promise detection for adapters that cannot + # work from ``agent.result_text`` alone. Without this gate every + # iteration would buffer the entire transcript even for verbose + # streaming agents, regressing memory vs the prior tail-scan path. + capture_stdout_for_promise = ( + config.stop_on_completion_signal and adapter.requires_full_stdout_for_completion + ) + capture_stdout = config.log_dir is not None or capture_stdout_for_promise + try: def on_activity(data: dict[str, Any]) -> None: @@ -225,9 +214,11 @@ def on_activity(data: dict[str, Any]) -> None: timeout=config.timeout, log_dir=config.log_dir, iteration=state.iteration, + adapter=adapter, on_activity=on_activity, on_output_line=on_output_line, capture_result_text=True, + capture_stdout=capture_stdout, ) except FileNotFoundError as exc: raise FileNotFoundError( @@ -235,12 +226,10 @@ def on_activity(data: dict[str, Any]) -> None: ) from exc duration = format_duration(agent.elapsed) - completion_text = ( - agent.result_text if agent.result_text is not None else agent.captured_stdout - ) - promise_completed = agent.success and ( - promise_completed_from_output - or has_promise_completion(completion_text, completion_signal) + promise_completed = agent.success and adapter.extract_completion_signal( + result_text=agent.result_text, + stdout=agent.captured_stdout, + user_signal=completion_signal, ) if promise_completed: state.promise_completed = True diff --git a/tests/conftest.py b/tests/conftest.py index ce9e46ab..4d999714 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -2,8 +2,17 @@ import pytest +from ralphify.adapters import ADAPTERS + @pytest.fixture(autouse=True) def _disable_streaming(monkeypatch): - """Disable the Popen-based streaming path in all tests.""" - monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: False) + """Force the blocking path and raw peek on every registered adapter. + + Tests that explicitly need the Popen streaming path or structured + peek rendering re-enable the relevant flag on the specific adapter + they exercise. + """ + for adapter in ADAPTERS: + monkeypatch.setattr(adapter, "supports_streaming", False) + monkeypatch.setattr(adapter, "renders_structured_peek", False) diff --git a/tests/test_adapters_claude.py b/tests/test_adapters_claude.py new file mode 100644 index 00000000..478cff7d --- /dev/null +++ b/tests/test_adapters_claude.py @@ -0,0 +1,183 @@ +"""Tests for the Claude stream-json adapter.""" + +from __future__ import annotations + +import json + +import pytest + +from ralphify.adapters import select_adapter +from ralphify.adapters.claude import ClaudeAdapter + + +def _assistant_event(*blocks: dict) -> str: + """Build one JSON line matching Claude's assistant message schema.""" + return json.dumps( + { + "type": "assistant", + "message": {"content": list(blocks)}, + } + ) + + +def _result_event(result_text: str) -> str: + return json.dumps({"type": "result", "result": result_text}) + + +def test_matches_claude_binary_stem() -> None: + adapter = ClaudeAdapter() + assert adapter.matches(["claude"]) is True + assert adapter.matches(["/usr/local/bin/claude"]) is True + assert adapter.matches(["claude", "--print"]) is True + assert adapter.matches(["codex"]) is False + assert adapter.matches([]) is False + + +def test_build_command_appends_stream_flags() -> None: + adapter = ClaudeAdapter() + result = adapter.build_command(["claude"]) + assert result == ["claude", "--output-format", "stream-json", "--verbose"] + + +def test_build_command_is_idempotent() -> None: + adapter = ClaudeAdapter() + once = adapter.build_command(["claude"]) + twice = adapter.build_command(once) + assert once == twice + + +def test_build_command_preserves_user_flags() -> None: + adapter = ClaudeAdapter() + result = adapter.build_command(["claude", "--print", "-p"]) + assert result[:3] == ["claude", "--print", "-p"] + assert "--output-format" in result + assert "stream-json" in result + + +def test_parse_tool_use_event() -> None: + adapter = ClaudeAdapter() + line = _assistant_event({"type": "tool_use", "name": "Bash", "input": {}}) + event = adapter.parse_event(line) + assert event is not None + assert event.kind == "tool_use" + assert event.name == "Bash" + + +def test_parse_result_event() -> None: + adapter = ClaudeAdapter() + event = adapter.parse_event(_result_event("done")) + assert event is not None + assert event.kind == "result" + + +def test_parse_ignores_thinking_blocks() -> None: + adapter = ClaudeAdapter() + line = _assistant_event({"type": "thinking", "thinking": "planning..."}) + event = adapter.parse_event(line) + assert event is not None + assert event.kind == "message" + + +def test_parse_malformed_json_returns_none() -> None: + adapter = ClaudeAdapter() + assert adapter.parse_event("not json") is None + assert adapter.parse_event("") is None + assert adapter.parse_event(" \n") is None + + +def test_parse_non_dict_json_returns_none() -> None: + adapter = ClaudeAdapter() + assert adapter.parse_event("[1, 2, 3]") is None + assert adapter.parse_event('"just a string"') is None + + +def test_parse_tool_use_with_non_string_name() -> None: + """Defensive: tool_use blocks with missing name must not raise.""" + adapter = ClaudeAdapter() + line = _assistant_event({"type": "tool_use", "input": {}}) + event = adapter.parse_event(line) + assert event is not None + assert event.kind == "tool_use" + assert event.name is None + + +def test_parse_skips_first_non_tool_use_block() -> None: + adapter = ClaudeAdapter() + line = _assistant_event( + {"type": "text", "text": "thinking out loud"}, + {"type": "tool_use", "name": "Edit", "input": {}}, + ) + event = adapter.parse_event(line) + assert event is not None + assert event.kind == "tool_use" + assert event.name == "Edit" + + +def test_extract_completion_signal_from_result_text() -> None: + adapter = ClaudeAdapter() + result_text = "DONE" + assert ( + adapter.extract_completion_signal( + result_text=result_text, stdout=None, user_signal="DONE" + ) + is True + ) + assert ( + adapter.extract_completion_signal( + result_text=result_text, stdout=None, user_signal="OTHER" + ) + is False + ) + + +def test_extract_completion_signal_ignores_raw_stdout() -> None: + """ClaudeAdapter only inspects ``result_text``; the streaming reader + extracts the terminal assistant message there. Promise tags embedded + in raw stdout (e.g. ``status`` or ``assistant`` JSON) must not trigger + completion.""" + adapter = ClaudeAdapter() + stdout = "raw text MARKER trailing" + assert ( + adapter.extract_completion_signal( + result_text=None, stdout=stdout, user_signal="MARKER" + ) + is False + ) + + +def test_extract_completion_signal_handles_missing_result_text() -> None: + adapter = ClaudeAdapter() + assert ( + adapter.extract_completion_signal( + result_text=None, stdout=None, user_signal="DONE" + ) + is False + ) + assert ( + adapter.extract_completion_signal( + result_text="", stdout=None, user_signal="DONE" + ) + is False + ) + + +def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None: + adapter = ClaudeAdapter() + with pytest.raises(NotImplementedError): + adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2) + + +def test_capability_flags() -> None: + adapter = ClaudeAdapter() + assert adapter.name == "claude" + assert adapter.counts_what == "tool_use" + assert adapter.supports_streaming is True + assert adapter.renders_structured_peek is True + assert adapter.supports_soft_wind_down is True + assert adapter.requires_full_stdout_for_completion is False + + +def test_registered_in_adapters_registry() -> None: + """Import side-effect registration should hand back the Claude adapter.""" + selected = select_adapter(["claude"]) + assert isinstance(selected, ClaudeAdapter) diff --git a/tests/test_adapters_codex.py b/tests/test_adapters_codex.py new file mode 100644 index 00000000..f6fcd485 --- /dev/null +++ b/tests/test_adapters_codex.py @@ -0,0 +1,159 @@ +"""Tests for the Codex CLI adapter.""" + +from __future__ import annotations + +import json + +import pytest + +from ralphify.adapters import select_adapter +from ralphify.adapters.codex import CodexAdapter + + +def test_matches_codex_binary_stem() -> None: + adapter = CodexAdapter() + assert adapter.matches(["codex"]) is True + assert adapter.matches(["/usr/local/bin/codex"]) is True + assert adapter.matches(["codex", "exec", "--sandbox"]) is True + assert adapter.matches(["claude"]) is False + assert adapter.matches([]) is False + + +def test_build_command_appends_json_flag() -> None: + adapter = CodexAdapter() + assert adapter.build_command(["codex"]) == ["codex", "--json"] + + +def test_build_command_is_idempotent() -> None: + adapter = CodexAdapter() + once = adapter.build_command(["codex"]) + twice = adapter.build_command(once) + assert once == twice + + +def test_parse_tool_call_events() -> None: + adapter = CodexAdapter() + for event_type, expected_name in [ + ("CollabToolCall", "Edit"), + ("McpToolCall", "Edit"), + ("CommandExecution", "Edit"), + ]: + line = json.dumps({"type": event_type, "name": "Edit"}) + event = adapter.parse_event(line) + assert event is not None + assert event.kind == "tool_use" + assert event.name == expected_name + + +def test_parse_turn_events() -> None: + adapter = CodexAdapter() + for event_type in ("TurnStarted", "TurnCompleted"): + # TurnCompleted is a result *and* turn event; result wins. + line = json.dumps({"type": event_type}) + event = adapter.parse_event(line) + assert event is not None + if event_type == "TurnCompleted": + assert event.kind == "result" + else: + assert event.kind == "turn" + + +def test_parse_unknown_events_become_message() -> None: + adapter = CodexAdapter() + event = adapter.parse_event(json.dumps({"type": "SomethingNew"})) + assert event is not None + assert event.kind == "message" + + +def test_parse_malformed_returns_none() -> None: + adapter = CodexAdapter() + assert adapter.parse_event("not json") is None + assert adapter.parse_event("") is None + assert adapter.parse_event("42") is None + + +def test_parse_tool_call_nested_under_msg() -> None: + """Some Codex builds wrap event data under a ``msg`` key.""" + adapter = CodexAdapter() + line = json.dumps({"msg": {"type": "CommandExecution", "command": "git status"}}) + event = adapter.parse_event(line) + assert event is not None + assert event.kind == "tool_use" + assert event.name == "git status" + + +def test_parse_falls_back_to_event_type_for_name() -> None: + adapter = CodexAdapter() + line = json.dumps({"type": "CollabToolCall"}) + event = adapter.parse_event(line) + assert event is not None + assert event.name == "CollabToolCall" + + +def test_extract_completion_signal_from_stream() -> None: + adapter = CodexAdapter() + stream = "\n".join( + [ + json.dumps({"type": "TurnStarted"}), + json.dumps({"type": "CommandExecution", "command": "ls"}), + json.dumps({"type": "TaskComplete", "text": "DONE"}), + ] + ) + assert ( + adapter.extract_completion_signal( + result_text=None, stdout=stream, user_signal="DONE" + ) + is True + ) + assert ( + adapter.extract_completion_signal( + result_text=None, stdout=stream, user_signal="OTHER" + ) + is False + ) + + +def test_extract_completion_signal_scans_plain_output() -> None: + adapter = CodexAdapter() + assert ( + adapter.extract_completion_signal( + result_text=None, + stdout="some HI text", + user_signal="HI", + ) + is True + ) + + +def test_extract_completion_signal_returns_false_when_stdout_missing() -> None: + """When the engine elects not to capture stdout, Codex cannot detect + completion — the streaming reader does not populate ``result_text`` + for Codex's ``TaskComplete`` event shape.""" + adapter = CodexAdapter() + assert ( + adapter.extract_completion_signal( + result_text=None, stdout=None, user_signal="DONE" + ) + is False + ) + + +def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None: + adapter = CodexAdapter() + with pytest.raises(NotImplementedError): + adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2) + + +def test_capability_flags() -> None: + adapter = CodexAdapter() + assert adapter.name == "codex" + assert adapter.counts_what == "tool_use" + assert adapter.supports_streaming is True + assert adapter.renders_structured_peek is False + assert adapter.supports_soft_wind_down is True + assert adapter.requires_full_stdout_for_completion is True + + +def test_registered_in_adapters_registry() -> None: + selected = select_adapter(["codex"]) + assert isinstance(selected, CodexAdapter) diff --git a/tests/test_adapters_copilot.py b/tests/test_adapters_copilot.py new file mode 100644 index 00000000..50819481 --- /dev/null +++ b/tests/test_adapters_copilot.py @@ -0,0 +1,129 @@ +"""Tests for the Copilot CLI adapter (alpha).""" + +from __future__ import annotations + +import json + +import pytest + +from ralphify.adapters import select_adapter +from ralphify.adapters.copilot import CopilotAdapter + + +def test_matches_copilot_binary_stem() -> None: + adapter = CopilotAdapter() + assert adapter.matches(["copilot"]) is True + assert adapter.matches(["/opt/copilot/bin/copilot"]) is True + # Deliberately does NOT match the gh subcommand + assert adapter.matches(["gh"]) is False + assert adapter.matches([]) is False + + +def test_build_command_appends_json_flags() -> None: + adapter = CopilotAdapter() + assert adapter.build_command(["copilot"]) == [ + "copilot", + "--output-format", + "json", + ] + + +def test_build_command_is_idempotent() -> None: + adapter = CopilotAdapter() + once = adapter.build_command(["copilot"]) + twice = adapter.build_command(once) + assert once == twice + + +def test_parse_tool_use_variants() -> None: + adapter = CopilotAdapter() + for event_type in ("tool_use", "tool_call", "ToolCall", "ToolUse"): + line = json.dumps({"type": event_type, "name": "Edit"}) + event = adapter.parse_event(line) + assert event is not None + assert event.kind == "tool_use" + assert event.name == "Edit" + + +def test_parse_result_variants() -> None: + adapter = CopilotAdapter() + for event_type in ("result", "response", "final", "Complete"): + event = adapter.parse_event(json.dumps({"type": event_type})) + assert event is not None + assert event.kind == "result" + + +def test_parse_unknown_returns_none() -> None: + """Unknown event types must NOT count against the turn cap.""" + adapter = CopilotAdapter() + assert adapter.parse_event(json.dumps({"type": "SomethingElse"})) is None + assert adapter.parse_event(json.dumps({"type": "progress"})) is None + + +def test_parse_missing_type_returns_none() -> None: + adapter = CopilotAdapter() + assert adapter.parse_event(json.dumps({"name": "Edit"})) is None + + +def test_parse_malformed_returns_none() -> None: + adapter = CopilotAdapter() + assert adapter.parse_event("not json") is None + assert adapter.parse_event("") is None + + +def test_parse_event_with_alternate_key_names() -> None: + """Covers ``event`` / ``kind`` alternative type keys.""" + adapter = CopilotAdapter() + event = adapter.parse_event(json.dumps({"event": "tool_use", "name": "Bash"})) + assert event is not None + assert event.kind == "tool_use" + assert event.name == "Bash" + + +def test_extract_completion_signal_scans_stdout() -> None: + adapter = CopilotAdapter() + assert ( + adapter.extract_completion_signal( + result_text=None, + stdout="chat chatter MARKER more text", + user_signal="MARKER", + ) + is True + ) + assert ( + adapter.extract_completion_signal( + result_text=None, stdout="no marker here", user_signal="MARKER" + ) + is False + ) + + +def test_extract_completion_signal_returns_false_when_stdout_missing() -> None: + adapter = CopilotAdapter() + assert ( + adapter.extract_completion_signal( + result_text=None, stdout=None, user_signal="MARKER" + ) + is False + ) + + +def test_install_wind_down_hook_raises_not_implemented(tmp_path) -> None: + adapter = CopilotAdapter() + with pytest.raises(NotImplementedError, match="no hook system"): + adapter.install_wind_down_hook(tmp_path, tmp_path / "counter", 10, 2) + + +def test_capability_flags() -> None: + adapter = CopilotAdapter() + assert adapter.name == "copilot" + assert adapter.counts_what == "tool_use" + assert adapter.supports_streaming is False + assert adapter.renders_structured_peek is False + assert adapter.supports_soft_wind_down is False + assert adapter.requires_full_stdout_for_completion is True + + +def test_registered_in_adapters_registry() -> None: + selected = select_adapter(["copilot"]) + assert isinstance(selected, CopilotAdapter) diff --git a/tests/test_adapters_registry.py b/tests/test_adapters_registry.py new file mode 100644 index 00000000..8072e96c --- /dev/null +++ b/tests/test_adapters_registry.py @@ -0,0 +1,43 @@ +"""Tests for the adapter registry and first-match dispatch.""" + +from __future__ import annotations + +from ralphify.adapters import ADAPTERS, CLIAdapter, select_adapter +from ralphify.adapters._generic import GenericAdapter +from ralphify.adapters.claude import ClaudeAdapter +from ralphify.adapters.codex import CodexAdapter +from ralphify.adapters.copilot import CopilotAdapter + + +def test_registry_contains_builtin_adapters() -> None: + types = {type(a) for a in ADAPTERS} + assert ClaudeAdapter in types + assert CodexAdapter in types + assert CopilotAdapter in types + + +def test_select_adapter_dispatches_by_binary_stem() -> None: + assert isinstance(select_adapter(["claude"]), ClaudeAdapter) + assert isinstance(select_adapter(["codex", "exec"]), CodexAdapter) + assert isinstance(select_adapter(["copilot"]), CopilotAdapter) + + +def test_select_adapter_falls_back_to_generic() -> None: + selected = select_adapter(["aider", "--model", "claude-4"]) + assert isinstance(selected, GenericAdapter) + + +def test_select_adapter_handles_empty_cmd() -> None: + assert isinstance(select_adapter([]), GenericAdapter) + + +def test_generic_adapter_parse_never_raises() -> None: + generic = GenericAdapter() + assert generic.parse_event("garbage") is None + assert generic.parse_event("") is None + + +def test_all_adapters_satisfy_protocol() -> None: + """Runtime Protocol check catches shape regressions in any adapter.""" + for adapter in ADAPTERS: + assert isinstance(adapter, CLIAdapter) diff --git a/tests/test_agent.py b/tests/test_agent.py index 58520141..d6b88c4e 100644 --- a/tests/test_agent.py +++ b/tests/test_agent.py @@ -19,35 +19,11 @@ _read_agent_stream, _run_agent_blocking, _run_agent_streaming, - _supports_stream_json, _write_log, execute_agent, ) - - -class TestSupportsStreamJson: - def test_claude_binary(self): - assert _supports_stream_json(["claude", "-p"]) is True - - def test_claude_absolute_path(self): - assert _supports_stream_json(["/usr/local/bin/claude", "-p"]) is True - - def test_non_claude_binary(self): - assert _supports_stream_json(["aider", "--yes"]) is False - - def test_empty_command(self): - assert _supports_stream_json([]) is False - - def test_claude_like_name(self): - assert _supports_stream_json(["claude-code"]) is False - - def test_claude_with_cmd_extension(self): - """On Windows, npm installs claude as claude.cmd — streaming must - still be detected.""" - assert _supports_stream_json(["claude.cmd", "-p"]) is True - - def test_claude_with_exe_extension(self): - assert _supports_stream_json(["claude.exe", "-p"]) is True +from ralphify.adapters import select_adapter +from ralphify.adapters.claude import ClaudeAdapter class TestWriteLog: @@ -449,8 +425,11 @@ class TestExecuteAgentDispatch: @patch(MOCK_SUBPROCESS) def test_dispatches_to_streaming_for_claude(self, mock_popen, monkeypatch): - """execute_agent uses the streaming path when the agent supports it.""" - monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: True) + """execute_agent uses the streaming path when the adapter renders + structured output.""" + claude_adapter = select_adapter(["claude"]) + assert isinstance(claude_adapter, ClaudeAdapter) + monkeypatch.setattr(claude_adapter, "supports_streaming", True) mock_popen.return_value = make_mock_popen( stdout_lines='{"type": "result", "result": "done"}\n', returncode=0, @@ -474,7 +453,9 @@ def test_execute_agent_passes_capture_result_text_to_streaming_helper( on_output_line = MagicMock() fake_streaming = MagicMock(return_value=AgentResult(returncode=0, elapsed=0.01)) - monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: True) + claude_adapter = select_adapter(["claude"]) + assert isinstance(claude_adapter, ClaudeAdapter) + monkeypatch.setattr(claude_adapter, "supports_streaming", True) monkeypatch.setattr("ralphify._agent._run_agent_streaming", fake_streaming) execute_agent( @@ -489,7 +470,7 @@ def test_execute_agent_passes_capture_result_text_to_streaming_helper( ) fake_streaming.assert_called_once_with( - ["claude", "-p"], + claude_adapter.build_command(["claude", "-p"]), "prompt", None, None, @@ -506,7 +487,9 @@ def test_execute_agent_passes_capture_result_text_to_blocking_helper( on_output_line = MagicMock() fake_blocking = MagicMock(return_value=AgentResult(returncode=0, elapsed=0.01)) - monkeypatch.setattr("ralphify._agent._supports_stream_json", lambda cmd: False) + # The autouse conftest fixture already forces every adapter's + # supports_streaming flag to False, so ``echo`` falls into the + # blocking path through the GenericAdapter fallback. monkeypatch.setattr("ralphify._agent._run_agent_blocking", fake_blocking) execute_agent( @@ -670,10 +653,12 @@ def test_sends_prompt_to_stdin(self, mock_popen): proc.stdin.close.assert_called_once() @patch(MOCK_SUBPROCESS) - def test_adds_stream_json_flags(self, mock_popen): + def test_passes_cmd_verbatim_to_popen(self, mock_popen): + """_run_agent_streaming no longer appends its own flags — the caller + (via ``adapter.build_command``) owns CLI flags now.""" mock_popen.return_value = make_mock_popen(returncode=0) _run_agent_streaming( - ["claude", "-p"], + ["claude", "-p", "--output-format", "stream-json", "--verbose"], "prompt", timeout=None, log_dir=None, @@ -682,9 +667,13 @@ def test_adds_stream_json_flags(self, mock_popen): call_args = mock_popen.call_args cmd = call_args[0][0] - assert "--output-format" in cmd - assert "stream-json" in cmd - assert "--verbose" in cmd + assert cmd == [ + "claude", + "-p", + "--output-format", + "stream-json", + "--verbose", + ] @patch(MOCK_SUBPROCESS) def test_writes_log_on_success(self, mock_popen, tmp_path): @@ -752,13 +741,7 @@ def test_capture_result_text_does_not_buffer_stream_output_without_log_dir( assert result.result_text is None call_args = mock_popen.call_args - assert call_args.args[0] == [ - "claude", - "-p", - "--output-format", - "stream-json", - "--verbose", - ] + assert call_args.args[0] == ["claude", "-p"] call_kwargs = call_args[1] assert call_kwargs.get("stdin") == subprocess.PIPE assert call_kwargs.get("stdout") == subprocess.PIPE diff --git a/tests/test_console_emitter.py b/tests/test_console_emitter.py index b6476d46..d5c57a87 100644 --- a/tests/test_console_emitter.py +++ b/tests/test_console_emitter.py @@ -15,9 +15,9 @@ _IterationPanel, _IterationSpinner, _SinglePanelNavigator, + _agent_renders_structured_peek, _format_run_info, _format_summary, - _is_claude_command, _scrollbar_metrics, _shorten_path, ) @@ -332,7 +332,11 @@ def test_startup_hint_shown_when_peek_on_by_default(self): output = console.export_text() assert "press p to hide" in output - def test_startup_hint_structured_for_claude(self): + def test_startup_hint_structured_for_claude(self, monkeypatch): + from ralphify.adapters import select_adapter + + claude_adapter = select_adapter(["claude"]) + monkeypatch.setattr(claude_adapter, "renders_structured_peek", True) emitter, console = _capture_emitter() emitter._peek_enabled = True emitter.emit( @@ -1499,24 +1503,33 @@ def test_peek_message_shown_in_spinner(self): assert "live output on" in output -class TestIsClaudeCommand: - def test_claude_binary(self): - assert _is_claude_command("claude") is True +class TestAgentRendersStructuredPeek: + """Verify the emitter routes through :func:`select_adapter`.""" - def test_claude_with_flags(self): - assert _is_claude_command("claude --dangerously-skip-permissions") is True + def test_claude_adapter_reports_structured_peek(self, monkeypatch): + from ralphify.adapters import select_adapter + from ralphify.adapters.claude import ClaudeAdapter - def test_claude_full_path(self): - assert _is_claude_command("/usr/local/bin/claude -p") is True + adapter = select_adapter(["claude"]) + assert isinstance(adapter, ClaudeAdapter) + monkeypatch.setattr(adapter, "renders_structured_peek", True) + assert _agent_renders_structured_peek("claude") is True + assert ( + _agent_renders_structured_peek("claude --dangerously-skip-permissions") + is True + ) + assert _agent_renders_structured_peek("/usr/local/bin/claude -p") is True - def test_not_claude(self): - assert _is_claude_command("aider --yes") is False + def test_unknown_agent_falls_back_to_generic(self): + # conftest forces every adapter's renders_structured_peek to False; + # the GenericAdapter fallback also reports False. + assert _agent_renders_structured_peek("aider --yes") is False - def test_empty(self): - assert _is_claude_command("") is False + def test_empty_string_is_not_structured(self): + assert _agent_renders_structured_peek("") is False - def test_invalid_shlex(self): - assert _is_claude_command("claude 'unterminated") is False + def test_invalid_shlex_is_not_structured(self): + assert _agent_renders_structured_peek("claude 'unterminated") is False def _populate_buffer(spinner, count: int, prefix: str = "line") -> None: diff --git a/tests/test_engine.py b/tests/test_engine.py index a6e3956c..a1d0fdb9 100644 --- a/tests/test_engine.py +++ b/tests/test_engine.py @@ -132,13 +132,21 @@ class TestPromiseCompletionSignals: def test_tagged_promise_does_not_stop_by_default( self, mock_execute_agent, tmp_path ): + """Without ``stop_on_completion_signal`` the loop must run all iterations. + + A non-streaming adapter (here ``echo`` → GenericAdapter) only sees + promise completion when the engine elects to capture stdout, which + in turn requires either logging or the explicit opt-in. Skipping + the buffer is the whole point of the gating, so + ``state.promise_completed`` legitimately stays False here. + """ config = make_config(tmp_path, max_iterations=3) state = make_state() emitter = NullEmitter() mock_execute_agent.return_value = AgentResult( returncode=0, elapsed=0.01, - captured_stdout="RALPH_PROMISE_COMPLETE\n", + captured_stdout=None, ) run_loop(config, state, emitter) @@ -147,7 +155,7 @@ def test_tagged_promise_does_not_stop_by_default( assert state.completed == 3 assert state.failed == 0 assert state.status == RunStatus.COMPLETED - assert state.promise_completed is True + assert state.promise_completed is False assert [ call.kwargs["on_output_line"] for call in mock_execute_agent.call_args_list ] == [None, None, None] @@ -155,6 +163,80 @@ def test_tagged_promise_does_not_stop_by_default( call.kwargs["capture_result_text"] for call in mock_execute_agent.call_args_list ] == [True, True, True] + # Generic adapter requires full stdout, but the user didn't opt in + # and no log dir is set — capture should stay off to avoid + # buffering verbose agent output. + assert [ + call.kwargs["capture_stdout"] for call in mock_execute_agent.call_args_list + ] == [False, False, False] + + @patch("ralphify.engine.execute_agent") + def test_capture_stdout_off_for_streaming_adapter_without_logging( + self, mock_execute_agent, tmp_path + ): + """Claude exposes ``result_text`` directly; the engine must not + force-buffer the full stdout transcript even when the user opts + into ``stop_on_completion_signal``.""" + config = make_config( + tmp_path, + agent="claude", + max_iterations=1, + stop_on_completion_signal=True, + ) + state = make_state() + + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + result_text="RALPH_PROMISE_COMPLETE", + ) + + run_loop(config, state, NullEmitter()) + + assert mock_execute_agent.call_args.kwargs["capture_stdout"] is False + assert state.promise_completed is True + + @patch("ralphify.engine.execute_agent") + def test_capture_stdout_on_for_blocking_adapter_with_opt_in( + self, mock_execute_agent, tmp_path + ): + """Generic / Copilot adapters need the full stdout buffer to + scan for the promise tag — engine must opt in when the user + opts into completion signalling.""" + config = make_config( + tmp_path, + max_iterations=1, + stop_on_completion_signal=True, + ) + state = make_state() + + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + captured_stdout="RALPH_PROMISE_COMPLETE\n", + ) + + run_loop(config, state, NullEmitter()) + + assert mock_execute_agent.call_args.kwargs["capture_stdout"] is True + assert state.promise_completed is True + + @patch("ralphify.engine.execute_agent") + def test_capture_stdout_on_when_log_dir_set(self, mock_execute_agent, tmp_path): + """Logging always needs the buffer regardless of completion signal.""" + log_dir = tmp_path / "logs" + config = make_config(tmp_path, max_iterations=1, log_dir=log_dir) + state = make_state() + + mock_execute_agent.return_value = AgentResult( + returncode=0, + elapsed=0.01, + captured_stdout="anything\n", + ) + + run_loop(config, state, NullEmitter()) + + assert mock_execute_agent.call_args.kwargs["capture_stdout"] is True @patch("ralphify.engine.execute_agent") def test_tagged_promise_stops_early_when_enabled( @@ -171,7 +253,7 @@ def test_tagged_promise_stops_early_when_enabled( mock_execute_agent.return_value = AgentResult( returncode=0, elapsed=0.01, - result_text="RALPH_PROMISE_COMPLETE", + captured_stdout="RALPH_PROMISE_COMPLETE\n", ) run_loop(config, state, emitter) @@ -210,7 +292,7 @@ def test_custom_promise_text_matches_inner_tag_text( mock_execute_agent.return_value = AgentResult( returncode=0, elapsed=0.01, - result_text="CUSTOM_DONE", + captured_stdout="CUSTOM_DONE\n", ) run_loop(config, state, emitter) @@ -243,7 +325,7 @@ def test_promise_tag_normalizes_inner_whitespace_before_matching( mock_execute_agent.return_value = AgentResult( returncode=0, elapsed=0.01, - result_text="\n CUSTOM\tDONE \n", + captured_stdout="\n CUSTOM\tDONE \n\n", ) run_loop(config, state, NullEmitter()) @@ -311,8 +393,12 @@ def test_different_tagged_promise_text_does_not_match( def test_structured_agents_ignore_raw_stdout_for_promise_detection( self, mock_execute_agent, tmp_path ): + """ClaudeAdapter only looks at ``result`` events — embedded + promise tags inside ``status`` or ``assistant`` JSON messages + must not trigger early completion.""" config = make_config( tmp_path, + agent="claude", max_iterations=2, stop_on_completion_signal=True, ) From 4d375d3c2d17ea81c0122eeabfd10f7048b16798 Mon Sep 17 00:00:00 2001 From: Paul Sorensen Date: Sat, 18 Apr 2026 22:05:45 -0700 Subject: [PATCH 3/4] refactor+feat: extract adapter Protocol and seed turn-cap foundation (#7) * refactor: extract adapter Protocol and registry into _protocol.py The adapter package's __init__ both imports concrete adapter modules (to populate ADAPTERS) and defines the Protocol those modules depend on. That's a latent circular-import hazard: it works today only because __init__ defines the Protocol before it imports concrete adapters. As soon as an adapter grows a dependency that could re-enter the package namespace, the order-of-execution becomes load-bearing. Split the Protocol, the AdapterEvent NamedTuple, the Literal kind enums, and the ADAPTERS list into a new leaf module _protocol.py. Concrete adapters (claude, codex, copilot, _generic) now import from _protocol directly. __init__.py re-exports the public names so existing `from ralphify.adapters import CLIAdapter` imports keep working, and keeps select_adapter plus the builtin-registration bootstrap. No behaviour change. * feat: add turn-cap event types and max_turns frontmatter fields Preparatory foundation for the forthcoming turn-cap enforcement work: no enforcement yet, just the event-type surface and the validated RunConfig fields that enforcement will read. - _events.py: TOOL_USE, ITERATION_TURN_APPROACHING_LIMIT, and ITERATION_TURN_CAPPED event kinds with typed data payloads (ToolUseData, TurnApproachingLimitData, TurnCappedData) threaded through the EventData union. - _frontmatter.py: FIELD_MAX_TURNS and FIELD_MAX_TURNS_GRACE constants. - _run_types.py: max_turns (int | None, default None) and max_turns_grace (int, default 2) fields on RunConfig. - cli.py: _validate_max_turns / _validate_max_turns_grace helpers with range and type checks; grace must be strictly less than the cap when the cap is set. _build_run_config threads both values into the returned RunConfig. - tests/test_cli_frontmatter_fields.py: unit coverage for the validators plus end-to-end _build_run_config assertions for the default, valid, and invalid cases. Nothing consumes max_turns yet; it will be read by the adapter-driven enforcement path in a follow-up PR. * fix: mock shutil.which in frontmatter field tests CI runners don't have 'claude' on PATH, so _build_run_config's agent validation was failing before reaching the turn-cap assertions. --- src/ralphify/_events.py | 25 +++++ src/ralphify/_frontmatter.py | 4 + src/ralphify/_run_types.py | 4 + src/ralphify/adapters/__init__.py | 114 ++------------------ src/ralphify/adapters/_generic.py | 2 +- src/ralphify/adapters/_protocol.py | 116 ++++++++++++++++++++ src/ralphify/adapters/claude.py | 2 +- src/ralphify/adapters/codex.py | 2 +- src/ralphify/adapters/copilot.py | 2 +- src/ralphify/cli.py | 43 ++++++++ tests/test_cli_frontmatter_fields.py | 152 +++++++++++++++++++++++++++ 11 files changed, 359 insertions(+), 107 deletions(-) create mode 100644 src/ralphify/adapters/_protocol.py create mode 100644 tests/test_cli_frontmatter_fields.py diff --git a/src/ralphify/_events.py b/src/ralphify/_events.py index 2c80a53a..7e5a4b90 100644 --- a/src/ralphify/_events.py +++ b/src/ralphify/_events.py @@ -82,6 +82,11 @@ class EventType(Enum): # ── Agent activity (live streaming) ───────────────────────── AGENT_ACTIVITY = "agent_activity" AGENT_OUTPUT_LINE = "agent_output_line" + TOOL_USE = "tool_use" + + # ── Turn-cap enforcement ──────────────────────────────────── + ITERATION_TURN_APPROACHING_LIMIT = "iteration_turn_approaching_limit" + ITERATION_TURN_CAPPED = "iteration_turn_capped" # ── Other ─────────────────────────────────────────────────── LOG_MESSAGE = "log_message" @@ -153,6 +158,23 @@ class AgentOutputLineData(TypedDict): iteration: int +class ToolUseData(TypedDict): + iteration: int + tool_name: str + count: int + + +class TurnApproachingLimitData(TypedDict): + iteration: int + count: int + max_turns: int + + +class TurnCappedData(TypedDict): + iteration: int + count: int + + class LogMessageData(TypedDict): message: str level: LogLevel @@ -169,6 +191,9 @@ class LogMessageData(TypedDict): | PromptAssembledData | AgentActivityData | AgentOutputLineData + | ToolUseData + | TurnApproachingLimitData + | TurnCappedData | LogMessageData ) """Union of all typed event data payloads.""" diff --git a/src/ralphify/_frontmatter.py b/src/ralphify/_frontmatter.py index fa3f67aa..ed92a1cc 100644 --- a/src/ralphify/_frontmatter.py +++ b/src/ralphify/_frontmatter.py @@ -31,6 +31,10 @@ FIELD_COMPLETION_SIGNAL = "completion_signal" FIELD_STOP_ON_COMPLETION_SIGNAL = "stop_on_completion_signal" +# Per-iteration turn-cap configuration — see docs/specs/cli-adapter-layer.md. +FIELD_MAX_TURNS = "max_turns" +FIELD_MAX_TURNS_GRACE = "max_turns_grace" + # Sub-field names within each command mapping. CMD_FIELD_NAME = "name" CMD_FIELD_RUN = "run" diff --git a/src/ralphify/_run_types.py b/src/ralphify/_run_types.py index 9218b03b..161de77b 100644 --- a/src/ralphify/_run_types.py +++ b/src/ralphify/_run_types.py @@ -104,6 +104,10 @@ class RunConfig: completion_signal: str = DEFAULT_COMPLETION_SIGNAL # Stop the run when the configured promise payload is observed. stop_on_completion_signal: bool = False + # Per-iteration tool-use cap; None disables the cap. + max_turns: int | None = None + # Soft wind-down fires at ``max_turns - max_turns_grace``. + max_turns_grace: int = 2 @dataclass(slots=True) diff --git a/src/ralphify/adapters/__init__.py b/src/ralphify/adapters/__init__.py index c0619e19..a21ab80a 100644 --- a/src/ralphify/adapters/__init__.py +++ b/src/ralphify/adapters/__init__.py @@ -11,113 +11,21 @@ loop can gracefully degrade when a CLI lacks structured output or hook injection. Process lifecycle (spawn, SIGTERM at cap, reap) stays in ``_agent.py``; adapters only observe. + +The Protocol, :class:`AdapterEvent`, and :data:`ADAPTERS` live in +:mod:`_protocol` so concrete adapter modules can depend on them +without cycling through this package's ``__init__``. """ from __future__ import annotations -from pathlib import Path -from typing import Literal, NamedTuple, Protocol, runtime_checkable - - -AdapterEventKind = Literal["tool_use", "turn", "message", "result"] -"""Categories of events an adapter can surface from a CLI's output stream.""" - -CountsWhat = Literal["tool_use", "turn", "none"] -"""What an adapter counts against ``max_turns`` — tool uses, turns, or nothing.""" - - -class AdapterEvent(NamedTuple): - """A single structured event parsed from a CLI's output stream. - - ``kind`` is the event category; ``name`` carries a tool name for - ``tool_use`` events (``None`` otherwise); ``raw`` is the original - parsed JSON object so callers can inspect extra fields when needed. - """ - - kind: AdapterEventKind - name: str | None = None - raw: dict | None = None - - -@runtime_checkable -class CLIAdapter(Protocol): - """Protocol every CLI adapter must satisfy. - - Adapters are stateless singletons: the same instance is reused for - every iteration of every run. Any per-iteration state lives in the - caller (``_agent.py``) — adapters only translate. - """ - - name: str - counts_what: CountsWhat - supports_streaming: bool - renders_structured_peek: bool - supports_soft_wind_down: bool - requires_full_stdout_for_completion: bool - - def matches(self, cmd: list[str]) -> bool: - """Return True if this adapter handles the given agent command.""" - ... - - def build_command(self, cmd: list[str]) -> list[str]: - """Return the command with any adapter-required flags appended. - - Idempotent: calling twice returns the same command. - """ - ... - - def parse_event(self, line: str) -> AdapterEvent | None: - """Parse one line of stdout into an :class:`AdapterEvent`. - - Returns ``None`` for lines that are not recognised events. - MUST NOT raise on malformed input (per FR-8). - """ - ... - - def extract_completion_signal( - self, - *, - result_text: str | None, - stdout: str | None, - user_signal: str, - ) -> bool: - """Return True if the agent's final output contains the completion signal. - - The signal is wrapped in ``...`` markup; the - inner text equals ``user_signal``. - - Adapters receive both the streaming-extracted *result_text* (the - terminal assistant message, when the streaming path could parse one) - and the full *stdout* buffer (only present when the engine chose to - capture it). Adapters with ``requires_full_stdout_for_completion`` - set False MUST be able to detect completion from *result_text* alone; - engines may pass ``stdout=None`` to skip the memory cost. - """ - ... - - def install_wind_down_hook( - self, - tempdir: Path, - counter_path: Path, - cap: int, - grace: int, - ) -> dict[str, str]: - """Write hook config files into *tempdir* and return env-var overrides. - - Only called when ``supports_soft_wind_down`` is True. Adapters that - set the flag False may leave this unimplemented (a ``NotImplementedError`` - is acceptable and is treated as a runtime downgrade to hard-cap-only). - """ - ... - - -ADAPTERS: list[CLIAdapter] = [] -"""Adapter registry, populated at import time by concrete adapter modules. - -Ordering matters: :func:`select_adapter` returns the first adapter whose -``matches`` method returns True, with :class:`GenericAdapter` as a final -catch-all. Specific adapters go first, generic last. -""" +from ralphify.adapters._protocol import ( + ADAPTERS, + AdapterEvent, + AdapterEventKind, + CLIAdapter, + CountsWhat, +) def select_adapter(cmd: list[str]) -> CLIAdapter: diff --git a/src/ralphify/adapters/_generic.py b/src/ralphify/adapters/_generic.py index 85ee477a..3bffb514 100644 --- a/src/ralphify/adapters/_generic.py +++ b/src/ralphify/adapters/_generic.py @@ -10,7 +10,7 @@ from pathlib import Path from ralphify._promise import has_promise_completion -from ralphify.adapters import AdapterEvent, CountsWhat +from ralphify.adapters._protocol import AdapterEvent, CountsWhat class GenericAdapter: diff --git a/src/ralphify/adapters/_protocol.py b/src/ralphify/adapters/_protocol.py new file mode 100644 index 00000000..e58f2a67 --- /dev/null +++ b/src/ralphify/adapters/_protocol.py @@ -0,0 +1,116 @@ +"""Adapter Protocol, event type, and registry. + +Concrete adapter modules (:mod:`claude`, :mod:`codex`, :mod:`copilot`, +:mod:`_generic`) import from here rather than from the package +``__init__``. The package ``__init__`` populates :data:`ADAPTERS` by +importing concrete adapters, and those adapters need the Protocol +before the ``__init__`` finishes executing — keeping the Protocol in a +leaf module makes the import graph acyclic. +""" + +from __future__ import annotations + +from pathlib import Path +from typing import Literal, NamedTuple, Protocol, runtime_checkable + + +AdapterEventKind = Literal["tool_use", "turn", "message", "result"] +"""Categories of events an adapter can surface from a CLI's output stream.""" + +CountsWhat = Literal["tool_use", "turn", "none"] +"""What an adapter counts against ``max_turns`` — tool uses, turns, or nothing.""" + + +class AdapterEvent(NamedTuple): + """A single structured event parsed from a CLI's output stream. + + ``kind`` is the event category; ``name`` carries a tool name for + ``tool_use`` events (``None`` otherwise); ``raw`` is the original + parsed JSON object so callers can inspect extra fields when needed. + """ + + kind: AdapterEventKind + name: str | None = None + raw: dict | None = None + + +@runtime_checkable +class CLIAdapter(Protocol): + """Protocol every CLI adapter must satisfy. + + Adapters are stateless singletons: the same instance is reused for + every iteration of every run. Any per-iteration state lives in the + caller (``_agent.py``) — adapters only translate. + """ + + name: str + counts_what: CountsWhat + supports_streaming: bool + renders_structured_peek: bool + supports_soft_wind_down: bool + requires_full_stdout_for_completion: bool + + def matches(self, cmd: list[str]) -> bool: + """Return True if this adapter handles the given agent command.""" + ... + + def build_command(self, cmd: list[str]) -> list[str]: + """Return the command with any adapter-required flags appended. + + Idempotent: calling twice returns the same command. + """ + ... + + def parse_event(self, line: str) -> AdapterEvent | None: + """Parse one line of stdout into an :class:`AdapterEvent`. + + Returns ``None`` for lines that are not recognised events. + MUST NOT raise on malformed input (per FR-8). + """ + ... + + def extract_completion_signal( + self, + *, + result_text: str | None, + stdout: str | None, + user_signal: str, + ) -> bool: + """Return True if the agent's final output contains the completion signal. + + The signal is wrapped in ``...`` markup; the + inner text equals ``user_signal``. + + Adapters receive both the streaming-extracted *result_text* (the + terminal assistant message, when the streaming path could parse one) + and the full *stdout* buffer (only present when the engine chose to + capture it). Adapters with ``requires_full_stdout_for_completion`` + set False MUST be able to detect completion from *result_text* alone; + engines may pass ``stdout=None`` to skip the memory cost. + """ + ... + + def install_wind_down_hook( + self, + tempdir: Path, + counter_path: Path, + cap: int, + grace: int, + ) -> dict[str, str]: + """Write hook config files into *tempdir* and return env-var overrides. + + Only called when ``supports_soft_wind_down`` is True. Adapters that + set the flag False may leave this unimplemented (a ``NotImplementedError`` + is acceptable and is treated as a runtime downgrade to hard-cap-only). + """ + ... + + +ADAPTERS: list[CLIAdapter] = [] +"""Adapter registry, populated at import time by concrete adapter modules. + +Ordering matters: :func:`ralphify.adapters.select_adapter` returns the +first adapter whose ``matches`` method returns True, with +:class:`ralphify.adapters._generic.GenericAdapter` as a final catch-all. +Specific adapters go first, generic last. +""" diff --git a/src/ralphify/adapters/claude.py b/src/ralphify/adapters/claude.py index d54849a7..49b0ad92 100644 --- a/src/ralphify/adapters/claude.py +++ b/src/ralphify/adapters/claude.py @@ -21,7 +21,7 @@ from pathlib import Path from ralphify._promise import has_promise_completion -from ralphify.adapters import ADAPTERS, AdapterEvent, CountsWhat +from ralphify.adapters._protocol import ADAPTERS, AdapterEvent, CountsWhat CLAUDE_BINARY_STEM = "claude" diff --git a/src/ralphify/adapters/codex.py b/src/ralphify/adapters/codex.py index a64853ea..6ee9069c 100644 --- a/src/ralphify/adapters/codex.py +++ b/src/ralphify/adapters/codex.py @@ -20,7 +20,7 @@ from pathlib import Path from ralphify._promise import has_promise_completion -from ralphify.adapters import ADAPTERS, AdapterEvent, CountsWhat +from ralphify.adapters._protocol import ADAPTERS, AdapterEvent, CountsWhat CODEX_BINARY_STEM = "codex" diff --git a/src/ralphify/adapters/copilot.py b/src/ralphify/adapters/copilot.py index 8766db31..deeb6cf1 100644 --- a/src/ralphify/adapters/copilot.py +++ b/src/ralphify/adapters/copilot.py @@ -24,7 +24,7 @@ from pathlib import Path from ralphify._promise import has_promise_completion -from ralphify.adapters import ADAPTERS, AdapterEvent, CountsWhat +from ralphify.adapters._protocol import ADAPTERS, AdapterEvent, CountsWhat COPILOT_BINARY_STEM = "copilot" diff --git a/src/ralphify/cli.py b/src/ralphify/cli.py index 93bbd0c3..48ae28e0 100644 --- a/src/ralphify/cli.py +++ b/src/ralphify/cli.py @@ -33,6 +33,8 @@ FIELD_COMMANDS, FIELD_CREDIT, FIELD_COMPLETION_SIGNAL, + FIELD_MAX_TURNS, + FIELD_MAX_TURNS_GRACE, FIELD_STOP_ON_COMPLETION_SIGNAL, RALPH_MARKER, VALID_NAME_CHARS_MSG, @@ -466,6 +468,41 @@ def _validate_completion_signal(raw_signal: Any) -> str: return raw_signal +def _validate_max_turns(raw: Any) -> int | None: + """Validate the ``max_turns`` frontmatter field. + + Returns ``None`` when absent (no cap). Exits with an error when the + value is not a positive integer. + """ + if raw is None: + return None + if isinstance(raw, bool) or not isinstance(raw, int) or raw < 1: + _exit_error(f"'{FIELD_MAX_TURNS}' must be a positive integer, got {raw!r}.") + return raw + + +def _validate_max_turns_grace(raw: Any, max_turns: int | None) -> int: + """Validate the ``max_turns_grace`` field. + + Defaults to ``2`` when absent. Must be a non-negative integer and + strictly less than *max_turns* (when *max_turns* is set). + """ + if raw is None: + grace = 2 + elif isinstance(raw, bool) or not isinstance(raw, int) or raw < 0: + _exit_error( + f"'{FIELD_MAX_TURNS_GRACE}' must be a non-negative integer, got {raw!r}." + ) + else: + grace = raw + if max_turns is not None and grace >= max_turns: + _exit_error( + f"'{FIELD_MAX_TURNS_GRACE}' ({grace}) must be less than " + f"'{FIELD_MAX_TURNS}' ({max_turns})." + ) + return grace + + def _validate_stop_on_completion_signal(raw_value: Any) -> bool: """Validate the stop-on-completion-signal frontmatter field.""" if raw_value is None: @@ -524,6 +561,10 @@ def _build_run_config( stop_on_completion_signal = _validate_stop_on_completion_signal( fm.get(FIELD_STOP_ON_COMPLETION_SIGNAL) ) + max_turns = _validate_max_turns(fm.get(FIELD_MAX_TURNS)) + max_turns_grace = _validate_max_turns_grace( + fm.get(FIELD_MAX_TURNS_GRACE), max_turns + ) return RunConfig( agent=agent, @@ -540,6 +581,8 @@ def _build_run_config( credit=credit, completion_signal=completion_signal, stop_on_completion_signal=stop_on_completion_signal, + max_turns=max_turns, + max_turns_grace=max_turns_grace, ) diff --git a/tests/test_cli_frontmatter_fields.py b/tests/test_cli_frontmatter_fields.py new file mode 100644 index 00000000..91e8071e --- /dev/null +++ b/tests/test_cli_frontmatter_fields.py @@ -0,0 +1,152 @@ +"""Tests for the new max_turns / max_turns_grace frontmatter fields.""" + +from __future__ import annotations + +from pathlib import Path +from unittest.mock import patch + +import pytest +import typer + +from helpers import MOCK_WHICH +from ralphify.cli import ( + _build_run_config, + _validate_max_turns, + _validate_max_turns_grace, +) + + +class TestValidateMaxTurns: + def test_absent_returns_none(self) -> None: + assert _validate_max_turns(None) is None + + def test_positive_int_passes(self) -> None: + assert _validate_max_turns(5) == 5 + + def test_zero_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns(0) + + def test_negative_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns(-1) + + def test_bool_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns(True) + + def test_string_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns("5") + + def test_float_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns(5.0) + + +class TestValidateMaxTurnsGrace: + def test_absent_defaults_to_two(self) -> None: + assert _validate_max_turns_grace(None, max_turns=None) == 2 + + def test_absent_with_cap_defaults_to_two(self) -> None: + assert _validate_max_turns_grace(None, max_turns=10) == 2 + + def test_zero_allowed(self) -> None: + assert _validate_max_turns_grace(0, max_turns=10) == 0 + + def test_negative_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns_grace(-1, max_turns=10) + + def test_bool_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns_grace(True, max_turns=10) + + def test_float_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns_grace(1.5, max_turns=10) + + def test_grace_equal_to_cap_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns_grace(10, max_turns=10) + + def test_grace_above_cap_rejected(self) -> None: + with pytest.raises(typer.Exit): + _validate_max_turns_grace(11, max_turns=10) + + def test_grace_below_cap_allowed(self) -> None: + assert _validate_max_turns_grace(3, max_turns=10) == 3 + + def test_grace_without_cap_any_non_negative_allowed(self) -> None: + # No cap means no upper bound on grace; the value is still retained. + assert _validate_max_turns_grace(99, max_turns=None) == 99 + + +@patch(MOCK_WHICH, return_value="/usr/bin/claude") +class TestBuildRunConfigTurnCapFields: + def _write_ralph(self, tmp_path: Path, body: str) -> Path: + ralph = tmp_path / "RALPH.md" + ralph.write_text(body, encoding="utf-8") + return ralph + + def test_defaults_when_absent(self, _mock_which, tmp_path: Path) -> None: + self._write_ralph( + tmp_path, + "---\nagent: claude -p\n---\nhello\n", + ) + config = _build_run_config( + ralph_path=str(tmp_path), + max_iterations=1, + stop_on_error=False, + delay=0, + log_dir=None, + timeout=None, + ) + assert config.max_turns is None + assert config.max_turns_grace == 2 + + def test_values_threaded_through(self, _mock_which, tmp_path: Path) -> None: + self._write_ralph( + tmp_path, + "---\nagent: claude -p\nmax_turns: 20\nmax_turns_grace: 5\n---\nhello\n", + ) + config = _build_run_config( + ralph_path=str(tmp_path), + max_iterations=1, + stop_on_error=False, + delay=0, + log_dir=None, + timeout=None, + ) + assert config.max_turns == 20 + assert config.max_turns_grace == 5 + + def test_invalid_max_turns_exits(self, _mock_which, tmp_path: Path) -> None: + self._write_ralph( + tmp_path, + "---\nagent: claude -p\nmax_turns: 0\n---\nhello\n", + ) + with pytest.raises(typer.Exit): + _build_run_config( + ralph_path=str(tmp_path), + max_iterations=1, + stop_on_error=False, + delay=0, + log_dir=None, + timeout=None, + ) + + def test_grace_at_or_above_cap_exits(self, _mock_which, tmp_path: Path) -> None: + self._write_ralph( + tmp_path, + "---\nagent: claude -p\nmax_turns: 5\nmax_turns_grace: 5\n---\nhello\n", + ) + with pytest.raises(typer.Exit): + _build_run_config( + ralph_path=str(tmp_path), + max_iterations=1, + stop_on_error=False, + delay=0, + log_dir=None, + timeout=None, + ) From 8a68022c17f870d565cdd704774274fd57057163 Mon Sep 17 00:00:00 2001 From: Paul Sorensen Date: Wed, 22 Apr 2026 00:02:39 -0700 Subject: [PATCH 4/4] chore: sync 50 upstream micro-refactors from computerlovetech/ralphify (#10) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * workspace: bootstrap improve-codebase Co-authored-by: Ralphify * refactor: drop redundant `if parts else ""` — empty join already returns "" `" · ".join([])` returns `""`, so the explicit empty-case guard in `_format_params` is dead. Behavior is identical either way. Co-authored-by: Ralphify * workspace: record `_format_params` simplification + verified-live audit - iterations.md: log 4ccfa9a - backlog.md: note vulture false-positives confirmed live; add two future-win candidates to triage - coverage/_console_emitter.md: first coverage note for this module Co-authored-by: Ralphify * refactor: extract `_stop_compact_live_unlocked` to dedupe compact-Live teardown The three-line `if self._live is not None: self._live.stop(); self._live = None` pattern appeared in three call sites (`_stop_live_unlocked`, `enter_fullscreen`, `_on_iteration_ended`). Collapsed into a single helper — behavior unchanged. Co-authored-by: Ralphify * workspace: record `_stop_compact_live_unlocked` extraction + backlog note Co-authored-by: Ralphify * refactor: drop redundant `_iteration_order` list — dict insertion order suffices Python dicts preserve insertion order since 3.7, so the parallel `_iteration_order` list tracking the same sequence as `_iteration_history` was pure duplication. Archive re-orders via pop-then-insert, eviction iterates the dict (oldest-first), and enter_fullscreen's most-recent lookup uses `reversed()`. Behavior is unchanged; the only test hit was a direct field reference. Co-authored-by: Ralphify * workspace: record `_iteration_order` removal Co-authored-by: Ralphify * refactor: extract `_call_safely` helper for best-effort observer callbacks The same guarded-callback pattern (`if cb is not None: try: cb(...); except Exception: pass`, with identical "best-effort; draining must not stop" comment) was duplicated three times across `_read_agent_stream` and `_pump_stream`. Consolidate into a single `_call_safely` helper defined next to the callback type aliases. Behavior unchanged — the helper preserves the None guard, suppresses all exceptions, and only fires the callback once with the provided arguments. Co-authored-by: Ralphify * workspace: record `_call_safely` extraction + seed `_agent.py` coverage note Co-authored-by: Ralphify * refactor: drop redundant max(total - visible, 1) guard in `_scrollbar_metrics` The early return `if total <= visible` above guarantees `total > visible`, so `total - visible` is always ≥ 1. Inline the subtraction directly and note the invariant in a comment. Co-authored-by: Ralphify * workspace: record scrollbar-metrics simplification Co-authored-by: Ralphify * refactor: drop redundant empty-dict guard in `_format_categories` `" · ".join([])` already returns `""`, so the early-return on an empty `_tool_categories` dict was dead code — same pattern as 4ccfa9a's `_format_params` cleanup. Co-authored-by: Ralphify * workspace: record `_format_categories` empty-dict guard removal Co-authored-by: Ralphify * refactor: extract `_step_iteration` to dedupe prev/next iteration browsing `prev_iteration` and `next_iteration` were 12-line mirror images differing only in step direction and eviction-fallback choice (oldest vs newest). Extracted to a single direction-parametric helper — behavior preserved. Co-authored-by: Ralphify * workspace: record `_step_iteration` extraction Co-authored-by: Ralphify * refactor: use public `iteration_id` property in history eviction The one cross-class `_iteration_id` access in the module now goes through the public property, keeping `_FullscreenPeek`'s private attribute private. Co-authored-by: Ralphify * workspace: record `iteration_id` property-access cleanup Co-authored-by: Ralphify * refactor: drop `_reset_view` — identical to `scroll_to_bottom` `_FullscreenPeek._reset_view` set `_offset = 0` and `_auto_scroll = True`, the same two assignments as `scroll_to_bottom`. Dropped the private helper and called `scroll_to_bottom()` from the two `_step_iteration` sites instead. Added the docstring that used to live on `_reset_view` to the surviving method so the "snap to newest line + follow" intent is still recorded. Co-authored-by: Ralphify * workspace: record `_reset_view` removal Co-authored-by: Ralphify * refactor: defer `panel_for` guard to `is_live` to dedupe identical condition Both methods checked `self._current_iteration == iteration_id and self._active_renderable is not None`. Rewrite `panel_for` to call `is_live` so the condition lives in exactly one place — if the "this is the live iteration" check ever needs to grow (e.g. add a paused state), `panel_for` follows automatically. Co-authored-by: Ralphify * workspace: record `panel_for` / `is_live` dedup Co-authored-by: Ralphify * refactor: inline `total_in` alias in `_format_tokens` The local `total_in` variable was a pointless rename of `self._input_tokens` — it hinted at an aggregated "total" that no longer exists (cache-read tokens are intentionally excluded). Read `self._input_tokens` directly so the label reflects what the value actually is. Co-authored-by: Ralphify * workspace: record `total_in` alias inlining Co-authored-by: Ralphify * refactor: drop redundant f-string wrap around `_plural` result `_plural` already returns a str, so `f"{_plural(...)}"` constructs the same string via an extra format step. Pass the value straight through, matching the style of the sibling `header.append(...)` calls. Co-authored-by: Ralphify * workspace: record `_plural` f-string wrap cleanup Co-authored-by: Ralphify * refactor: drop redundant empty-user_args branch in `resolve_args` `_ARGS_RE.sub` with a callable that returns `dict.get(name, "")` already resolves every placeholder to the empty string when the dict is empty, so the `if not user_args: return _ARGS_RE.sub("", prompt)` early return was dead code producing the same output as the general path. Co-authored-by: Ralphify * workspace: record `resolve_args` empty-branch cleanup Co-authored-by: Ralphify * refactor: promote 40-line fallback height to `_DEFAULT_CONSOLE_HEIGHT` constant Both `_FullscreenPeek._console_height` (pre-render default) and `_fullscreen_page_size`'s except-branch fallback used the literal 40 to mean "reasonable terminal height when the real value isn't available." Naming the shared intent in one place keeps the two in lockstep. Co-authored-by: Ralphify * workspace: record `_DEFAULT_CONSOLE_HEIGHT` constant promotion Shift current phase from Phase 1 (dead code — exhausted) to Phase 3 (magic values), update the _console_emitter.py coverage note with the new sha, and log the iteration. Co-authored-by: Ralphify * refactor: narrow `name_col` scope to `if arg:` in `_apply_assistant` The tool_use branch computed the padded `name_col` unconditionally, then only used it when `arg` was truthy — the `else` branch uses the raw `name` instead. Move the pad-to-column formatting inside `if arg:` so the helper variable only exists where it's actually rendered, and no unnecessary f-string work happens when a tool call has no primary arg. Co-authored-by: Ralphify * workspace: record `name_col` scope narrowing and open Phase 4 Phase 3 (magic values) is essentially drained — mark it so and open Phase 4 (complex conditionals & long functions) as the current phase. The 134078d narrowing is the first concrete Phase 4 item. Co-authored-by: Ralphify * refactor: drop redundant `_active_renderable` guard in `_on_iteration_started` `_archive_current_iteration_unlocked` already no-ops when no iteration is active (guarded at its entry), so wrapping the call in `if self._active_renderable is not None:` added nothing. Dropped the `if` and updated the surrounding comment to note the archive call's no-op behavior. Same shape as 5337d88 and 4ccfa9a's dead-guard removals. Co-authored-by: Ralphify * workspace: record `_on_iteration_started` guard removal Co-authored-by: Ralphify * refactor: move `_structured_agent` check out of lock in `_on_agent_output_line` Mirrors the pattern already used in `_on_agent_activity`: the `_structured_agent` flag is write-once (set in `_on_run_started` before any iteration events can flow), so the short-circuit check is safe lock-free. Also avoids acquiring the console lock for every stdout line when running Claude in structured mode. Co-authored-by: Ralphify * workspace: record `_on_agent_output_line` lock-free check move Co-authored-by: Ralphify * refactor: skip dead `"".join(...)` in `_run_agent_streaming` when not logging The trailing block joined `stream.stdout_lines` and `stderr_lines` unconditionally, then the AgentResult ternary discarded both joined strings when `log_dir is None` (they were only ever consumed by `_write_log` and the `captured_stdout`/`captured_stderr` fields). Fold the `if log_dir is not None` check up into the join so the no-log path skips the work entirely — same shape as the already-lazy `_run_agent_blocking` tail, which uses `"".join(...) if x is not None else None` exactly once. Drops the duplicated ternary on the AgentResult fields too. Same observable behavior (verified by `test_captured_output_set_when_logging` and `test_no_log_when_dir_not_set`). Co-authored-by: Ralphify * workspace: record `_run_agent_streaming` lazy-join refactor Co-authored-by: Ralphify * refactor: drop `line_count` alias in `_IterationSpinner._build_footer` The local was computed unconditionally but only read on the truthy branch (both as predicate and as `_plural` arg). Inline the truthy check on `self._scroll_lines` and call `len()` only inside the branch that needs it — matches the sibling `_IterationPanel._build_footer` style (`if self._tool_count > 0:` with direct attribute references). Co-authored-by: Ralphify * workspace: record `_IterationSpinner._build_footer` alias drop Co-authored-by: Ralphify * refactor: inline `agent` alias in `_on_run_started` The local was read exactly once — the only use was the immediately following `_is_claude_command(agent)` call. Reading `data["agent"]` directly matches the style of the sibling fc5e1cb inline of `total_in` and keeps the function focused on what it does. Co-authored-by: Ralphify * workspace: record `agent` alias inline in `_on_run_started` Co-authored-by: Ralphify * refactor: drop `new_offset` alias in `_FullscreenPeek.scroll_down` The local was assigned immediately to `self._offset`, then the follow-mode check re-read it as `new_offset == 0`. Reading `self._offset` directly after assignment has identical semantics — the alias added no value. Sibling `scroll_up` keeps its local because it compares the new value against the old one *before* the assignment (to conditionally disable auto-scroll), which requires two distinct values. `scroll_down` has no such comparison, so the alias is dead. Same shape as ef176bf (`line_count`) and 134078d (`name_col`). Co-authored-by: Ralphify * workspace: record `_FullscreenPeek.scroll_down` alias drop Co-authored-by: Ralphify * refactor: inline `msg` alias in `_apply_assistant` The `msg = raw.get("message", {})` local was read exactly once on the next line as `msg.get("usage")`. Reading `raw.get("message", {}).get("usage")` directly matches the chained-get style already used in `_iter_content_blocks` and the inline-alias style established by 497c028 (`agent`) and fc5e1cb (`total_in`). Co-authored-by: Ralphify * workspace: record `msg` alias inline in `_apply_assistant` Co-authored-by: Ralphify * refactor: inline `_fullscreen_page_size()` into space/b lambdas The `page` local in `_handle_fullscreen_key` was computed unconditionally in the non-exit branch but only consumed by the space/b action lambdas (page-down / page-up). Inlining `self._fullscreen_page_size()` into those two lambdas means it's only evaluated when space or b is actually pressed — j/k/g/G/[/] now skip the call entirely. Behavior is identical: the dict is built and a single action is invoked per keypress, and the page value is read at action-invocation time under the same lock. Same Phase 4 shape as 134078d / ef176bf / b19625e — narrow the scope of a local that was only live on one branch. Co-authored-by: Ralphify * workspace: record `_fullscreen_page_size()` lambda-inline refactor Co-authored-by: Ralphify * refactor: use `next(reversed(...), None)` in `enter_fullscreen` history fallback The compound `if initial_id is None and self._iteration_history:` guard was doing two jobs: pick the fallback only when there's no live iteration, *and* avoid `next(reversed({}))` raising StopIteration on an empty dict. Switching to `next(reversed(...), None)` lets the standard sentinel default handle the empty case so the outer `if` only encodes the "fall back when nothing is live" intent. Same observable behavior: when both `_current_iteration` is None and `_iteration_history` is empty, `initial_id` ends up None and the existing `if initial_id is None or self.panel_for(...) is None:` branch prints "Full peek: no iterations yet" and returns False — covered by `test_enter_without_iteration_prints_hint`. Co-authored-by: Ralphify * workspace: record `enter_fullscreen` history-fallback simplification Co-authored-by: Ralphify * refactor: drop redundant BOM-startswith guard before `removeprefix` `str.removeprefix` is already a no-op when the prefix is absent — the `if text.startswith(_UTF8_BOM)` guard was dead defensive code. Same shape as 5337d88 / 4ccfa9a / 8cb0d47's "drop the guard when the operation already handles the empty/no-match case." Co-authored-by: Ralphify * workspace: record BOM-guard drop in `parse_frontmatter` Co-authored-by: Ralphify * refactor: replace `parsed = None` sentinel with try/except/else in `_read_agent_stream` The sentinel was a no-op placeholder for the error path: setting `parsed = None` just so `isinstance(parsed, dict)` would skip the forwarding block on the next line. Using the `try/except/else` structure expresses the same intent directly — the dict-handling logic runs only when `json.loads` succeeds — and drops the sentinel assignment plus the redundant dict check on the JSON-error path. Behavior preserved: the inner `isinstance(parsed, dict)` still gates forwarding for valid-but-non-dict JSON (lists, strings, numbers). Covered by the existing stream tests (`test_agent.py::test_stream_json_*`). Co-authored-by: Ralphify * workspace: record try/except/else refactor in `_read_agent_stream` Co-authored-by: Ralphify * refactor: replace `source._outcome` cross-class access with public `outcome` property `_FullscreenPeek._build_header` was reading `source._outcome` directly on a `_LivePanelBase` instance, crossing class boundaries into a private attribute. Expose an `outcome` property on `_LivePanelBase` (matching the `iteration_id` property added in ef9a178 for the parallel case) so the fullscreen header reads the value through a public API. Same observable behavior; the attribute is still the single source of truth inside `freeze`. Co-authored-by: Ralphify * workspace: record `outcome` property refactor on `_LivePanelBase` Iteration note, coverage update, and backlog entry for d0060b3 — the ef9a178-style public-property substitution applied to the remaining `source._outcome` cross-class access in the fullscreen header. Co-authored-by: Ralphify * refactor: narrow `secs` scope into `if minutes < _MINUTES_PER_HOUR:` branch `secs = total % _SECONDS_PER_MINUTE` was computed unconditionally but only consumed by the `f"{minutes}m {secs}s"` return on the truthy branch. The hours branch derives its own `mins`/`hours` pair and never references `secs`. Moving the computation inside the branch that uses it keeps the local co-located with its use site. Same Phase 4 narrow-the-scope shape as 134078d (`name_col`), ef176bf (`line_count`), 59b0e34 (`page`), b19625e (`new_offset`). Co-Authored-By: Claude Opus 4.7 (1M context) Co-authored-by: Ralphify * workspace: record `secs` scope-narrowing in `format_duration` Co-Authored-By: Claude Opus 4.7 (1M context) Co-authored-by: Ralphify * refactor: inline `text` alias in `collect_output` The local was assigned from `ensure_str(stream)` and read exactly once on the next line as `parts.append(text)`. Inlining matches the alias-inline style from fc5e1cb / 497c028 / 52e0272 — the helper name already documents the decode step. Co-authored-by: Ralphify * workspace: record `text` alias inline in `collect_output` Co-authored-by: Ralphify * refactor: narrow `line` scope past early-return in `_on_agent_output_line` `escape_markup(data["line"])` was computed unconditionally before the `if not isinstance(target, _IterationSpinner): return` guard, so the work was wasted whenever the event hit the early-return branch (no active panel, or structured panel for a non-structured-agent edge case). Moving the binding below the guard keeps the same output for the _IterationSpinner branch and drops the dead work for the other branch. Same Phase 4 shape as 134078d's `name_col` narrowing and 7730dd4's `secs` narrowing. Co-authored-by: Ralphify * workspace: record `line` scope narrowing in `_on_agent_output_line` Co-authored-by: Ralphify * refactor: inline `reason` alias in `run_loop` stop-event emit The `reason = state.status.reason` local was read exactly once on the next line as the `reason=` kwarg to `RunStoppedData(...)`. Collapsing to `reason=state.status.reason` matches the inline-alias pattern from ce487d3 (`text`) / 52e0272 (`msg`) / 497c028 (`agent`) / fc5e1cb (`total_in`). The property access is safe at this call site: the immediately-preceding `if state.status == RunStatus.RUNNING: state.status = RunStatus.COMPLETED` ensures `state.status` is always a terminal status (STOPPED / COMPLETED / FAILED), so `RunStatus.reason` cannot raise. Co-authored-by: Ralphify * workspace: record `reason` alias inline in `run_loop` Added coverage/engine.md noting the inline-alias refactor and the safety argument for the `state.status.reason` property access. Co-authored-by: Ralphify * refactor: inline `binary` alias in `_supports_stream_json` The local was read exactly once on the next line as `binary == CLAUDE_BINARY`. Collapsing to `Path(cmd[0]).stem == CLAUDE_BINARY` matches the sibling check in `_console_emitter.py:_is_claude_command`, which already uses the chained form, and the inline-alias pattern from ce487d3 / 52e0272 / 497c028 / fc5e1cb. Co-authored-by: Ralphify * workspace: record `binary` alias inline in `_supports_stream_json` Co-authored-by: Ralphify * refactor: inline `visible` alias in `_LivePanelBase._build_body` The `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]` local was read exactly once as the iterable of the very next `for line in ...` loop. Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:` matches the inline-alias pattern from 497c028 / fc5e1cb / 52e0272 / ce487d3 / e1ad87a — the slice already documents "the last N scroll lines" without needing a named intermediate. Co-authored-by: Ralphify * workspace: record `visible` alias inline in `_build_body` Co-authored-by: Ralphify * refactor: inline `reader` thread handle in `_read_agent_stream` The `reader` local served only to call `.start()` — the thread is never joined explicitly because termination is signalled through the queue's None sentinel and the daemon flag. Collapsing into the fluent `Thread(...).start()` form matches the fire-and-forget intent and drops an unused binding. Python's `threading` module keeps live threads reachable via `threading._active`, so dropping the local reference does not affect thread lifetime. Co-authored-by: Ralphify * workspace: record `reader` thread handle inline in `_read_agent_stream` Co-authored-by: Ralphify * refactor: inline `remaining` alias in `_read_agent_stream` timeout calc The `remaining = deadline - time.monotonic()` local served a single use on the very next line as `max(remaining, 0)`. Collapsing to `max(deadline - time.monotonic(), 0)` matches the inline-alias style established by e1ad87a (`binary`), 497c028 (`agent`), fc5e1cb (`total_in`), 52e0272 (`msg`), and ce487d3 (`text`). Updated the adjacent comment to refer to "clamp to 0" since the `remaining` name no longer exists. Behavior preserved — same value reaches `line_q.get(timeout=...)` on every iteration of the read loop. Co-authored-by: Ralphify * workspace: record `remaining` alias inline in `_read_agent_stream` Co-authored-by: Ralphify * refactor: inline `stream_cmd` into Popen call in `_run_agent_streaming` The local was constructed and read exactly once on the very next statement as the first positional arg to `subprocess.Popen(...)`. The three appended tokens are already named constants (`_OUTPUT_FORMAT_FLAG`, `_STREAM_FORMAT`, `_VERBOSE_FLAG`) so the intent reads clearly at the call site without the intermediate binding. Same Phase 4 inline-alias shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0 (`visible`), and e1ad87a (`binary`). Co-authored-by: Ralphify * workspace: record `stream_cmd` inline in `_run_agent_streaming` Co-authored-by: Ralphify --------- Co-authored-by: Kasper Junge Co-authored-by: Ralphify Co-authored-by: Claude Opus 4.7 (1M context) --- src/ralphify/_agent.py | 63 +++--- src/ralphify/_console_emitter.py | 182 ++++++++-------- src/ralphify/_frontmatter.py | 3 +- src/ralphify/_output.py | 5 +- src/ralphify/_resolver.py | 7 +- src/ralphify/engine.py | 3 +- tests/test_console_emitter.py | 2 +- workspace/ralphs/improve-codebase/PLAN.md | 56 +++++ workspace/ralphs/improve-codebase/backlog.md | 118 ++++++++++ .../ralphs/improve-codebase/conventions.md | 3 + .../ralphs/improve-codebase/coverage/.gitkeep | 0 .../improve-codebase/coverage/_agent.md | 136 ++++++++++++ .../coverage/_console_emitter.md | 205 ++++++++++++++++++ .../improve-codebase/coverage/_frontmatter.md | 52 +++++ .../improve-codebase/coverage/_output.md | 45 ++++ .../improve-codebase/coverage/_resolver.md | 20 ++ .../improve-codebase/coverage/engine.md | 34 +++ .../ralphs/improve-codebase/iterations.md | 58 +++++ 18 files changed, 855 insertions(+), 137 deletions(-) create mode 100644 workspace/ralphs/improve-codebase/PLAN.md create mode 100644 workspace/ralphs/improve-codebase/backlog.md create mode 100644 workspace/ralphs/improve-codebase/conventions.md create mode 100644 workspace/ralphs/improve-codebase/coverage/.gitkeep create mode 100644 workspace/ralphs/improve-codebase/coverage/_agent.md create mode 100644 workspace/ralphs/improve-codebase/coverage/_console_emitter.md create mode 100644 workspace/ralphs/improve-codebase/coverage/_frontmatter.md create mode 100644 workspace/ralphs/improve-codebase/coverage/_output.md create mode 100644 workspace/ralphs/improve-codebase/coverage/_resolver.md create mode 100644 workspace/ralphs/improve-codebase/coverage/engine.md create mode 100644 workspace/ralphs/improve-codebase/iterations.md diff --git a/src/ralphify/_agent.py b/src/ralphify/_agent.py index 6823cf3d..37ece89b 100644 --- a/src/ralphify/_agent.py +++ b/src/ralphify/_agent.py @@ -49,6 +49,22 @@ OutputLineCallback = Callable[[str, OutputStream], None] """Receives raw output lines with their stream name ("stdout"/"stderr").""" + +def _call_safely(callback: Callable[..., Any] | None, *args: Any) -> None: + """Invoke an observer *callback* with *args*, swallowing any exception. + + Used for best-effort observer callbacks during output draining: a + raising callback must never stop the drain loop or leave the reader + thread hung. + """ + if callback is None: + return + try: + callback(*args) + except Exception: + pass + + # Typed constants for the OutputStream literal so the type checker enforces # that only "stdout" / "stderr" ever reach ``on_output_line``. _STDOUT: OutputStream = "stdout" @@ -340,18 +356,15 @@ def _read_agent_stream( result_text: str | None = None line_q: queue.Queue[str | None] = queue.Queue() - reader = threading.Thread(target=_readline_pump, args=(stdout, line_q), daemon=True) - reader.start() + threading.Thread(target=_readline_pump, args=(stdout, line_q), daemon=True).start() while True: # Compute how long we can wait for the next line. if deadline is not None: - remaining = deadline - time.monotonic() - # Use max(remaining, 0) so that an already-expired deadline - # still does a non-blocking drain of queued lines before - # returning — lines the reader thread already buffered are - # not silently lost. - get_timeout: float | None = max(remaining, 0) + # Clamp to 0 so that an already-expired deadline still does a + # non-blocking drain of queued lines before returning — lines + # the reader thread already buffered are not silently lost. + get_timeout: float | None = max(deadline - time.monotonic(), 0) else: get_timeout = None @@ -374,30 +387,21 @@ def _read_agent_stream( if stdout_lines is not None: stdout_lines.append(line) - if on_output_line is not None: - try: - on_output_line(line.rstrip("\r\n"), _STDOUT) - except Exception: - # Callback is best-effort; draining must not stop. - pass + _call_safely(on_output_line, line.rstrip("\r\n"), _STDOUT) stripped = line.strip() if stripped: try: parsed = json.loads(stripped) except json.JSONDecodeError: - parsed = None - if isinstance(parsed, dict): - if parsed.get("type") == _RESULT_EVENT_TYPE and isinstance( - parsed.get(_RESULT_FIELD), str - ): - result_text = parsed[_RESULT_FIELD] - if on_activity is not None: - try: - on_activity(parsed) - except Exception: - # Callback is best-effort; draining must not stop. - pass + pass + else: + if isinstance(parsed, dict): + if parsed.get("type") == _RESULT_EVENT_TYPE and isinstance( + parsed.get(_RESULT_FIELD), str + ): + result_text = parsed[_RESULT_FIELD] + _call_safely(on_activity, parsed) # Also check deadline after processing — if the reader thread # already queued many lines, this prevents unbounded processing @@ -534,12 +538,7 @@ def _pump_stream( for line in iter(stream.readline, ""): if buffer is not None: buffer.append(line) - if on_output_line is not None: - try: - on_output_line(line.rstrip("\r\n"), stream_name) - except Exception: - # Callback is best-effort; draining must not stop. - pass + _call_safely(on_output_line, line.rstrip("\r\n"), stream_name) except (ValueError, OSError): # Pipe closed concurrently — exit cleanly so join() returns. pass diff --git a/src/ralphify/_console_emitter.py b/src/ralphify/_console_emitter.py index 9ae84422..ac3bf26d 100644 --- a/src/ralphify/_console_emitter.py +++ b/src/ralphify/_console_emitter.py @@ -173,7 +173,7 @@ def _format_params(tool_input: dict[str, Any], keys: list[str]) -> str: val = tool_input.get(key) if val is not None: parts.append(f"{key}: {val}") - return " · ".join(parts) if parts else "" + return " · ".join(parts) def _extract_file_path(i: dict[str, Any]) -> str: @@ -357,6 +357,11 @@ def freeze(self, outcome: str) -> None: self._end = time.monotonic() self._outcome = outcome + @property + def outcome(self) -> str | None: + """The frozen-iteration outcome label, or ``None`` while live.""" + return self._outcome + # ── Scroll buffer management ───────────────────────────────────── def add_scroll_line(self, markup: str) -> None: @@ -411,8 +416,7 @@ def _build_body(self) -> Group: """Body group: scroll lines (or peek message) + spacer + footer.""" rows: list[Any] = [] if self._peek_visible: - visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:] - for line in visible: + for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]: line.no_wrap = True line.overflow = "ellipsis" rows.append(line) @@ -487,10 +491,8 @@ def apply(self, raw: dict[str, Any]) -> None: ) def _apply_assistant(self, raw: dict[str, Any]) -> None: - msg = raw.get("message", {}) - # Update token counts from usage - usage = msg.get("usage") + usage = raw.get("message", {}).get("usage") if isinstance(usage, dict): self._input_tokens = usage.get("input_tokens", self._input_tokens) self._output_tokens = usage.get("output_tokens", self._output_tokens) @@ -526,14 +528,14 @@ def _apply_assistant(self, raw: dict[str, Any]) -> None: color, cat, arg = _tool_display(name, tool_input) self._tool_categories[cat] = self._tool_categories.get(cat, 0) + 1 - # Pad short names to a fixed column so arguments line up; - # longer names get a guaranteed two-space gap so the arg - # never collides with the tool label. - if len(name) < _TOOL_NAME_COL: - name_col = f"{name:<{_TOOL_NAME_COL}}" - else: - name_col = f"{name} " if arg: + # Pad short names to a fixed column so arguments line up; + # longer names get a guaranteed two-space gap so the arg + # never collides with the tool label. + if len(name) < _TOOL_NAME_COL: + name_col = f"{name:<{_TOOL_NAME_COL}}" + else: + name_col = f"{name} " self.add_scroll_line( f"[bold {color}]{escape_markup(name_col)}[/]" f"[dim]{escape_markup(arg)}[/]" @@ -554,16 +556,13 @@ def _apply_user(self, raw: dict[str, Any]) -> None: def _format_tokens(self) -> str: """Format token counts as compact ctx/out string.""" parts: list[str] = [] - total_in = self._input_tokens - if total_in > 0: - parts.append(f"ctx {format_count(total_in)}") + if self._input_tokens > 0: + parts.append(f"ctx {format_count(self._input_tokens)}") if self._output_tokens > 0: parts.append(f"out {format_count(self._output_tokens)}") return " · ".join(parts) def _format_categories(self) -> str: - if not self._tool_categories: - return "" parts = [f"{v} {k}" for k, v in self._tool_categories.items()] return " · ".join(parts) @@ -615,11 +614,10 @@ class _IterationSpinner(_LivePanelBase): """ def _build_footer(self) -> Table: - line_count = len(self._scroll_lines) summary = Text(no_wrap=True, overflow="ellipsis") - if line_count > 0: + if self._scroll_lines: summary.append( - _plural(line_count, "line"), + _plural(len(self._scroll_lines), "line"), style=f"bold {_brand.PURPLE}", ) summary.append(" of agent output", style="dim") @@ -636,6 +634,10 @@ def _build_footer(self) -> Table: _FULLSCREEN_CHROME_ROWS = 2 _FULLSCREEN_MIN_VISIBLE = 3 +# Fallback terminal height used before the first render populates the +# real value, and when ``Console.size.height`` access fails. +_DEFAULT_CONSOLE_HEIGHT = 40 + @dataclass(slots=True) class _ScrollbarMetrics: @@ -665,8 +667,8 @@ def _scrollbar_metrics(total: int, visible: int, offset: int) -> _ScrollbarMetri if total <= visible: return _ScrollbarMetrics(show=False, thumb_start=0, thumb_size=0) thumb_size = max(1, visible * visible // total) - max_off = max(total - visible, 1) - frac = 1.0 - (offset / max_off) + # Safe: the early return above guarantees total > visible, so total - visible ≥ 1. + frac = 1.0 - (offset / (total - visible)) track_space = visible - thumb_size thumb_start = int(frac * track_space) return _ScrollbarMetrics(show=True, thumb_start=thumb_start, thumb_size=thumb_size) @@ -767,9 +769,8 @@ def scroll_up(self, lines: int = 1) -> None: def scroll_down(self, lines: int = 1) -> None: """Scroll toward newer lines (offset shrinks).""" - new_offset = max(0, self._offset - lines) - self._offset = new_offset - if new_offset == 0: + self._offset = max(0, self._offset - lines) + if self._offset == 0: self._auto_scroll = True def scroll_to_top(self) -> None: @@ -778,54 +779,47 @@ def scroll_to_top(self) -> None: self._auto_scroll = False def scroll_to_bottom(self) -> None: + """Snap to the newest line and re-enable follow mode.""" self._offset = 0 self._auto_scroll = True # ── Iteration navigation ───────────────────────────────────────── - def _reset_view(self) -> None: - """Snap to bottom + follow when switching iterations.""" - self._offset = 0 - self._auto_scroll = True + def _step_iteration(self, direction: int) -> bool: + """Move *direction* iterations (-1 = prev, +1 = next). - def prev_iteration(self) -> bool: - """Move to the iteration before the current one. Returns ``True`` - if the view changed; ``False`` when there is no older iteration.""" + Returns ``True`` when the view changed; ``False`` when already at + the boundary in the requested direction. When the current + iteration was evicted from the navigator, snaps to the oldest + (prev) or newest (next) entry instead of failing. + """ ids = self._navigator.iteration_ids() if not ids: return False if self._iteration_id not in ids: - # Current iteration was evicted — snap to oldest available. - self._iteration_id = ids[0] - self._reset_view() + self._iteration_id = ids[0] if direction < 0 else ids[-1] + self.scroll_to_bottom() return True - idx = ids.index(self._iteration_id) - if idx == 0: + new_idx = ids.index(self._iteration_id) + direction + if not 0 <= new_idx < len(ids): return False - self._iteration_id = ids[idx - 1] - self._reset_view() + self._iteration_id = ids[new_idx] + self.scroll_to_bottom() return True + def prev_iteration(self) -> bool: + """Move to the iteration before the current one. Returns ``True`` + if the view changed; ``False`` when there is no older iteration.""" + return self._step_iteration(-1) + def next_iteration(self) -> bool: """Move to the iteration after the current one. Returns ``True`` if the view changed; ``False`` when already on the newest.""" - ids = self._navigator.iteration_ids() - if not ids: - return False - if self._iteration_id not in ids: - self._iteration_id = ids[-1] - self._reset_view() - return True - idx = ids.index(self._iteration_id) - if idx >= len(ids) - 1: - return False - self._iteration_id = ids[idx + 1] - self._reset_view() - return True + return self._step_iteration(+1) # ── Rendering ──────────────────────────────────────────────────── - _console_height: int = 40 # updated on every render + _console_height: int = _DEFAULT_CONSOLE_HEIGHT # updated on every render def _build_header(self, total: int, visible: int) -> Text: header = Text(no_wrap=True, overflow="ellipsis") @@ -844,12 +838,12 @@ def _build_header(self, total: int, visible: int) -> Text: header.append("live", style=f"italic {_brand.GREEN}") else: source = self._source - outcome = source._outcome if source is not None else None + outcome = source.outcome if source is not None else None if outcome: header.append(" · ", style="dim") header.append(outcome, style=f"italic {_brand.LAVENDER}") header.append(" · ", style="dim") - header.append(f"{_plural(total, 'line')}", style="dim") + header.append(_plural(total, "line"), style="dim") if self._auto_scroll: header.append(" · ", style="dim") header.append("following", style=f"italic {_brand.GREEN}") @@ -978,10 +972,10 @@ def __init__(self, console: Console) -> None: # receiving events). ``None`` between iterations. self._current_iteration: int | None = None # Bounded ring buffer of finished iteration panels, keyed by - # iteration number. Insertion order is tracked separately so - # eviction is O(1). Used by fullscreen peek for browsing. + # iteration number. Python dicts preserve insertion order, so + # the oldest entry is always first — used for eviction. Used by + # fullscreen peek for browsing. self._iteration_history: dict[int, _LivePanelBase] = {} - self._iteration_order: list[int] = [] # Fullscreen peek state — a second Live using Rich's alt-screen # that shows an iteration's full activity buffer with scroll + # iteration-navigation controls. While fullscreen is active the @@ -1048,10 +1042,7 @@ def iteration_ids(self) -> list[int]: def panel_for(self, iteration_id: int) -> _LivePanelBase | None: """Look up the panel for *iteration_id* in history or active state.""" - if ( - self._current_iteration == iteration_id - and self._active_renderable is not None - ): + if self.is_live(iteration_id): return self._active_renderable return self._iteration_history.get(iteration_id) @@ -1113,28 +1104,27 @@ def _archive_current_iteration_unlocked(self, outcome: str) -> None: iteration_id = self._current_iteration panel.freeze(outcome) # Record (or refresh order of) the iteration in history. - if iteration_id in self._iteration_history: - self._iteration_order.remove(iteration_id) + # Pop-then-insert moves an existing entry to the end of the dict's + # insertion order so eviction always drops the oldest first. + self._iteration_history.pop(iteration_id, None) self._iteration_history[iteration_id] = panel - self._iteration_order.append(iteration_id) # Eviction: drop oldest until at or below the cap, but skip the # iteration the user is currently viewing in fullscreen. viewing = ( - self._fullscreen_view._iteration_id + self._fullscreen_view.iteration_id if self._fullscreen_view is not None else None ) - while len(self._iteration_order) > _MAX_HISTORY_ITERATIONS: + while len(self._iteration_history) > _MAX_HISTORY_ITERATIONS: candidate = next( - (iid for iid in self._iteration_order if iid != viewing), + (iid for iid in self._iteration_history if iid != viewing), None, ) if candidate is None: # All remaining entries are the viewed iteration (impossible # with one viewer) — bail to avoid an infinite loop. break - self._iteration_order.remove(candidate) - self._iteration_history.pop(candidate, None) + self._iteration_history.pop(candidate) self._active_renderable = None self._current_iteration = None @@ -1207,14 +1197,18 @@ def _panel_for_event(self, iteration: int | None) -> _LivePanelBase | None: return self._active_renderable def _on_agent_output_line(self, data: AgentOutputLineData) -> None: + # When we have structured rendering, raw lines are redundant noise. + # ``_structured_agent`` is write-once (set in ``_on_run_started`` + # before any iteration events flow), so the check is lock-free — + # same pattern as ``_on_agent_activity``. Skipping the lock also + # avoids contention on every stdout line when running Claude. + if self._structured_agent: + return with self._console_lock: - # When we have structured rendering, raw lines are redundant noise. - if self._structured_agent: - return - line = escape_markup(data["line"]) target = self._panel_for_event(data["iteration"]) if not isinstance(target, _IterationSpinner): return + line = escape_markup(data["line"]) target.add_scroll_line(f"[white]{line}[/]") self._refresh_live_unlocked(target) @@ -1300,6 +1294,15 @@ def _start_compact_live_unlocked(self, renderable: _LivePanelBase) -> None: ) self._live.start() + def _stop_compact_live_unlocked(self) -> None: + """Stop the compact Live region if active. No-op otherwise. + + Caller must hold ``_console_lock``. + """ + if self._live is not None: + self._live.stop() + self._live = None + def _stop_live_unlocked(self) -> None: """Tear down all Live regions and forget the active iteration. @@ -1313,9 +1316,7 @@ def _stop_live_unlocked(self) -> None: self._fullscreen_live.stop() self._fullscreen_live = None self._fullscreen_view = None - if self._live is not None: - self._live.stop() - self._live = None + self._stop_compact_live_unlocked() self._active_renderable = None self._current_iteration = None @@ -1340,8 +1341,8 @@ def enter_fullscreen(self) -> bool: if self._fullscreen_view is not None: return True # already active — no-op initial_id: int | None = self._current_iteration - if initial_id is None and self._iteration_order: - initial_id = self._iteration_order[-1] + if initial_id is None: + initial_id = next(reversed(self._iteration_history), None) if initial_id is None or self.panel_for(initial_id) is None: self._console.print("[dim]Full peek: no iterations yet[/]") return False @@ -1349,9 +1350,7 @@ def enter_fullscreen(self) -> bool: self._fullscreen_view = view # Stop the compact Live before taking over the terminal so # the two Rich renderers don't fight for the same console. - if self._live is not None: - self._live.stop() - self._live = None + self._stop_compact_live_unlocked() self._fullscreen_live = Live( view, console=self._console, @@ -1422,7 +1421,7 @@ def _fullscreen_page_size(self) -> int: try: height = self._console.size.height except Exception: - height = 40 + height = _DEFAULT_CONSOLE_HEIGHT return max(1, height - _FULLSCREEN_CHROME_ROWS - 2) def handle_key(self, key: str) -> None: @@ -1451,12 +1450,11 @@ def _handle_fullscreen_key(self, key: str) -> None: if view is None: return # raced with exit if key not in ("q", FULLSCREEN_PEEK_KEY): - page = self._fullscreen_page_size() actions: dict[str, Callable[[], object]] = { "j": lambda: view.scroll_down(1), "k": lambda: view.scroll_up(1), - " ": lambda: view.scroll_down(page), - "b": lambda: view.scroll_up(page), + " ": lambda: view.scroll_down(self._fullscreen_page_size()), + "b": lambda: view.scroll_up(self._fullscreen_page_size()), "g": view.scroll_to_top, "G": view.scroll_to_bottom, PREV_ITERATION_KEY: view.prev_iteration, @@ -1476,9 +1474,9 @@ def _on_iteration_started(self, data: IterationStartedData) -> None: with self._console_lock: self._peek_broken = False # Defensive: if a previous iteration didn't archive (engine - # error), evict it now so we don't leak panel state. - if self._active_renderable is not None: - self._archive_current_iteration_unlocked("interrupted") + # error), evict it now so we don't leak panel state. The + # archive call no-ops when nothing is active. + self._archive_current_iteration_unlocked("interrupted") self._current_iteration = iteration renderable = self._create_panel_unlocked() @@ -1528,9 +1526,7 @@ def _on_iteration_ended( # underlying panel is preserved in history for fullscreen # browsing. When fullscreen is active there is no compact # Live to stop; the panel was buffering events directly. - if self._live is not None: - self._live.stop() - self._live = None + self._stop_compact_live_unlocked() self._archive_current_iteration_unlocked(outcome) def do_print() -> None: diff --git a/src/ralphify/_frontmatter.py b/src/ralphify/_frontmatter.py index ed92a1cc..737ca6ab 100644 --- a/src/ralphify/_frontmatter.py +++ b/src/ralphify/_frontmatter.py @@ -116,8 +116,7 @@ def parse_frontmatter(text: str) -> tuple[dict[str, Any], str]: Returns ``(frontmatter_dict, body_text)``. """ - if text.startswith(_UTF8_BOM): - text = text.removeprefix(_UTF8_BOM) + text = text.removeprefix(_UTF8_BOM) fm_raw, body = _extract_frontmatter_block(text) if fm_raw: try: diff --git a/src/ralphify/_output.py b/src/ralphify/_output.py index 766dfc33..9da59e6b 100644 --- a/src/ralphify/_output.py +++ b/src/ralphify/_output.py @@ -71,10 +71,9 @@ def collect_output( parts: list[str] = [] for stream in (stdout, stderr): if stream: - text = ensure_str(stream) if parts and not parts[-1].endswith("\n"): parts.append("\n") - parts.append(text) + parts.append(ensure_str(stream)) return "".join(parts) @@ -130,8 +129,8 @@ def format_duration(seconds: float) -> str: # latter silently drops 0.5s when the total is even (e.g. 90.5→90). total = int(seconds + 0.5) minutes = total // _SECONDS_PER_MINUTE - secs = total % _SECONDS_PER_MINUTE if minutes < _MINUTES_PER_HOUR: + secs = total % _SECONDS_PER_MINUTE return f"{minutes}m {secs}s" hours = minutes // _MINUTES_PER_HOUR mins = minutes % _MINUTES_PER_HOUR diff --git a/src/ralphify/_resolver.py b/src/ralphify/_resolver.py index 6281f5d6..2ec44762 100644 --- a/src/ralphify/_resolver.py +++ b/src/ralphify/_resolver.py @@ -35,11 +35,10 @@ def resolve_args(prompt: str, user_args: dict[str, str]) -> str: """Replace ``{{ args.name }}`` placeholders with user-supplied values. - When *user_args* is empty, clears any remaining ``{{ args.* }}`` - placeholders so they don't leak into the assembled prompt. + Unknown names (and all placeholders when *user_args* is empty) resolve + to the empty string, so stray ``{{ args.* }}`` never leak into the + assembled prompt. """ - if not user_args: - return _ARGS_RE.sub("", prompt) def _replace(match: re.Match) -> str: return user_args.get(match.group(1), "") diff --git a/src/ralphify/engine.py b/src/ralphify/engine.py index 8b88139c..71876cf5 100644 --- a/src/ralphify/engine.py +++ b/src/ralphify/engine.py @@ -407,11 +407,10 @@ def run_loop( if state.status == RunStatus.RUNNING: state.status = RunStatus.COMPLETED - reason = state.status.reason emit( EventType.RUN_STOPPED, RunStoppedData( - reason=reason, + reason=state.status.reason, total=state.total, completed=state.completed, failed=state.failed, diff --git a/tests/test_console_emitter.py b/tests/test_console_emitter.py index d5c57a87..730b4edd 100644 --- a/tests/test_console_emitter.py +++ b/tests/test_console_emitter.py @@ -1871,7 +1871,7 @@ def test_history_eviction_protects_viewed_iteration(self): ) if i > 2 else None emitter.emit(_make_event(EventType.ITERATION_STARTED, iteration=i)) assert 1 in emitter._iteration_history - assert len(emitter._iteration_order) <= _MAX_HISTORY_ITERATIONS + assert len(emitter._iteration_history) <= _MAX_HISTORY_ITERATIONS finally: emitter._stop_live() diff --git a/workspace/ralphs/improve-codebase/PLAN.md b/workspace/ralphs/improve-codebase/PLAN.md new file mode 100644 index 00000000..f6a3fcaf --- /dev/null +++ b/workspace/ralphs/improve-codebase/PLAN.md @@ -0,0 +1,56 @@ +# Improve Codebase — Plan + +Ralphify is a small, well-tested Python CLI (~4.7k LOC src, 628 tests passing, +ruff + ty clean). Recent commits have been steady refactor work targeting +`_console_emitter.py` (the biggest file at ~1.6k LOC). Continue that thread: +squeeze out duplication and complexity in the hottest files first, then fan +out to smaller polish. + +## Phases + +1. **Dead code / unused symbols** — private helpers with no callers, stale + constants, dead branches. Verify with grep + tests. +2. **Duplication** — copy-pasted blocks (especially in `_console_emitter.py` + and `_agent.py`). Extract helpers where extraction does not hurt clarity. +3. **Magic values & local constants** — stringly/numeric literals that repeat + inside a single module; lift to a named constant near the top. +4. **Complex conditionals & long functions** — split `_console_emitter.py` + functions that juggle many states; extract pure helpers. +5. **Naming & structure** — vague names, misplaced helpers, module-level + imports that could collapse. +6. **Tests** — tighten unclear test names, merge duplicated fixtures. + +Each iteration must preserve behavior. If an "improvement" changes observable +behavior (even by one log line), skip it and leave a note in backlog.md. + +## Current phase + +**Phase 4 — complex conditionals & long functions.** Phase 3 (magic values) +is now essentially drained: every module-level scan for numeric literals +≥ 10 across `src/ralphify/` turns up only named constants. Phase 1 (dead +code) and Phase 2 (duplication) stay open for anything spotted in passing. +Move on to simplifying local control flow where a variable is computed +unconditionally but only used on one branch, or where a helper can tighten +a nested conditional without losing clarity. The 134078d `name_col` +scope narrowing is a representative Phase 4 move: same output, dead work +gone, clearer scope. + +## Priorities (tailored to this repo) + +- The largest module is `_console_emitter.py`; every iteration there should + leave the module smaller *or* clearer, never both-at-once. +- `_agent.py` has two execution paths (streaming / blocking) that historically + drift apart — watch for duplication. +- Constants like `_MAX_VISIBLE_SCROLL`, `_MAX_SCROLL_LINES`, + `_SIGTERM_GRACE_PERIOD` live where they're used. Don't centralize them + unless two modules need the same value. +- Do not churn public API: `src/ralphify/__init__.py` re-exports, CLI + commands, and event payload types. +- Do not change docs wording in this ralph; that's for other ralphs. + +## Out of scope + +- New features or behavior changes +- Dependency upgrades +- Docs content (beyond fixing stale contributor notes encountered in passing) +- Release tooling diff --git a/workspace/ralphs/improve-codebase/backlog.md b/workspace/ralphs/improve-codebase/backlog.md new file mode 100644 index 00000000..5c647856 --- /dev/null +++ b/workspace/ralphs/improve-codebase/backlog.md @@ -0,0 +1,118 @@ +# Backlog + +Ordered roughly by phase, then by expected payoff. Add items freely; remove +only when they land in a commit. + +## Phase 1 — dead code + +- Audit `_console_emitter.py` for unused private helpers / constants (grep + each `_foo` name for other references inside the module and tests). +- Audit `_agent.py` for parallel streaming/blocking helpers that reference + the same constants but define their own copies. (cb61477 — extracted + `_call_safely` for the 3× best-effort observer-callback pattern; no + remaining obvious dup after that pass. Streaming's `_readline_pump` and + blocking's `_pump_stream` look similar but do genuinely different work: + the queue-based pump feeds a main-thread loop that parses JSON, while + the list-based pump does its callback work inline on its own thread.) +- Check `cli.py` validators for unreachable error branches after recent + TypedDict refactors. +- Confirm every `from typing import ...` import in `src/ralphify/` is used. + (Checked 4ccfa9a — all six modules import only what they use.) +- vulture 60% flags that were verified as live: `clear_scroll`, + `_SinglePanelNavigator`, `_stop_live`, `serialize_frontmatter`, + `to_dict`, `_atexit_hook`, RunManager public methods — all used in tests, + docs, or scripts/. TypedDict field "unused" warnings are spurious. +- Consider inlining `_validate_name` into `_check_unique_name` in `cli.py` + (the former has exactly one caller). Tradeoff: the split doc-strings + document the two concerns (format vs uniqueness) cleanly. +- `_is_claude_command` (`_console_emitter.py`) and `_supports_stream_json` + (`_agent.py`) both check `Path(parts[0]).stem == CLAUDE_BINARY` but on + different inputs (string vs list). Consolidating would cross module + boundaries for modest payoff — revisit only if a third caller appears. + +## Phase 2 — duplication + +- Look for repeated `console.print(...)` formatting patterns in + `_console_emitter.py`. +- Look for repeated dict/TypedDict key access patterns in the event handlers. +- The fullscreen-Live teardown (`self._fullscreen_live.stop(); = None`) + appears in `_stop_live_unlocked` and `_teardown_fullscreen_unlocked` — + only two call sites and each is adjacent to other state mutations, so + extracting right now would just add indirection. Revisit if a third + caller appears. +- The `try: fn(); except Exception: pass` pattern appears in + `_print_or_defer_unlocked`, `_flush_deferred_unlocked` (loop body), + and around `handle_key`'s body. Could extract a tiny `_safe_call` + but each site is one line; not worth the indirection unless a fourth + appears. +- `_IterationPanel._build_footer` and `_IterationSpinner._build_footer` + both create `summary = Text(no_wrap=True, overflow="ellipsis")` then + branch on count > 0 vs "waiting…". Two subclasses only — already + noted in coverage as not-worth-extracting. +- (01f2f1c — dropped `_FullscreenPeek._reset_view` which had the same body + as `scroll_to_bottom`.) No other near-duplicate scroll helpers spotted + in that class; `scroll_up` / `scroll_down` / `scroll_to_top` each touch + `_auto_scroll` under different conditions. (b19625e — dropped the + `new_offset` alias in `scroll_down`; `scroll_up` keeps its local + because it compares old vs new before the assignment.) + +## Phase 3 — magic values + +Essentially drained. Latest full scan (at 134078d): every bare integer +≥ 10 across `src/ralphify/` already resolves to a named constant, and +the handful of remaining single-site `2`s are flagged below with +"only if a second site appears". + +- Scan each module's numeric literals (especially timeouts, widths, retry + counts) and promote to module constants when reused. (1d7251f — + promoted `40` → `_DEFAULT_CONSOLE_HEIGHT` for the two fallback-height + sites in `_console_emitter.py`.) +- `_console_emitter.py:_fullscreen_page_size` uses a bare `2` as the + "page overlap lines" magic — only one site, but could be named + `_PAGE_OVERLAP` for symmetry with `_FULLSCREEN_CHROME_ROWS` if a + second page-size helper ever appears. +- `_keypress.py` has `_POLL_INTERVAL`, `_WIN_POLL_INTERVAL`, + `_THREAD_JOIN_TIMEOUT` already at module top. No obvious leftover + literals worth promoting. + +## Phase 4 — complex conditionals & long functions + +- (134078d — narrowed `name_col` scope in `_IterationPanel._apply_assistant` + so the padded name column is only computed on the branch that renders + it.) +- (7730dd4 — narrowed `secs = total % _SECONDS_PER_MINUTE` in + `_output.py:format_duration` into the `if minutes < _MINUTES_PER_HOUR:` + branch. Saved a modulo on every duration ≥ 1h and co-located the + local with its only use site.) +- (ce487d3 — inlined `text = ensure_str(stream)` in + `_output.py:collect_output`. Same alias-inline shape as fc5e1cb / + 497c028 / 52e0272. Helper name `ensure_str` already documents the + decode step, so the intermediate binding added no clarity.) +- (d0060b3 — exposed `_LivePanelBase.outcome` as a public property and + switched `_FullscreenPeek._build_header`'s `source._outcome` read to + go through it. Mirror of ef9a178's `iteration_id` cleanup.) Two + private-attr cross-class reads remain — both of `source._scroll_lines` + in `_FullscreenPeek._max_offset` and `__rich_console__`. Those touch a + mutable list the class itself appends to, so a read-only property + would hide the mutation asymmetry; defer unless a clearer abstraction + emerges (e.g., a "get a snapshot of visible lines" helper). +- `_IterationPanel._apply_assistant` still juggles three block types + (`thinking` / `text` / `tool_use`) in one ~50-line method. Splitting + into `_render_thinking_block` / `_render_text_block` / `_render_tool_use_block` + would shorten the outer loop but each helper is short enough that the + indirection may not pay off — revisit only if a fourth block type lands. +- `cli.py:_parse_user_args` is 55 lines of token-by-token iteration with + two nested branches and a while-loop that skips already-filled declared + names. Could be split into `_consume_flag` / `_consume_positional` + helpers without changing any error message. Medium payoff, medium + churn — land only once behavior is fully pinned by tests (which it is). + +## Notes / ideas to triage + +- `scripts/tui_dev/` has its own fixtures; out of scope unless it blocks a + src/ralphify/ change. +- `_IterationPanel._cache_read_tokens` is captured from usage but never + read in production — only the regression test + `test_format_tokens_does_not_double_count_cached_input` reads it. The + capture protects against a hypothetical future display, so not strictly + dead, but worth revisiting when token rendering changes. diff --git a/workspace/ralphs/improve-codebase/conventions.md b/workspace/ralphs/improve-codebase/conventions.md new file mode 100644 index 00000000..4a16e49e --- /dev/null +++ b/workspace/ralphs/improve-codebase/conventions.md @@ -0,0 +1,3 @@ +# Conventions learned in this codebase + +Record repo-wide patterns worth preserving. One bullet per pattern. diff --git a/workspace/ralphs/improve-codebase/coverage/.gitkeep b/workspace/ralphs/improve-codebase/coverage/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/workspace/ralphs/improve-codebase/coverage/_agent.md b/workspace/ralphs/improve-codebase/coverage/_agent.md new file mode 100644 index 00000000..898b2029 --- /dev/null +++ b/workspace/ralphs/improve-codebase/coverage/_agent.md @@ -0,0 +1,136 @@ +# `_agent.py` coverage + +Valid at: 7402f04 + +## Recent changes + +- 7402f04 — inlined the `stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG, + _STREAM_FORMAT, _VERBOSE_FLAG]` local in `_run_agent_streaming`. + The binding was consumed exactly once on the very next statement + (the first positional arg to `subprocess.Popen(...)`); no other + references exist in src/ or tests/ (grep confirmed). The three + appended tokens are already named constants + (`_OUTPUT_FORMAT_FLAG`, `_STREAM_FORMAT`, `_VERBOSE_FLAG`) so the + "extended command for streaming mode" intent reads cleanly at the + call site without the intermediate name. Same Phase 4 inline-alias + shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0 + (`visible`), and e1ad87a (`binary`). Behavior preserved — + subprocess.Popen still receives the same list; pinned by the + streaming-path test coverage in `tests/test_agent.py`. +- 66d6c60 — inlined the `remaining = deadline - time.monotonic()` local + in `_read_agent_stream`'s per-iteration timeout calc. The alias was + read exactly once on the next line as `max(remaining, 0)`; collapsing + to `max(deadline - time.monotonic(), 0)` matches the inline-alias + style from e1ad87a / 497c028 / 52e0272 / ce487d3. The adjacent + comment was updated ("max(remaining, 0)" → "clamp to 0") since the + name no longer exists. Behavior preserved — the clamped timeout + still reaches `line_q.get(timeout=...)`, so the non-blocking drain + on an already-expired deadline still fires and deadline enforcement + is unchanged. Pinned by the streaming-path agent tests. No other + `remaining` locals remain in the module (grep confirmed). +- b24accf — inlined the `reader` thread handle in `_read_agent_stream`. + The local served only to call `.start()`; the thread is never joined + explicitly (termination is signalled through the queue's `None` + sentinel produced by `_readline_pump`'s `finally` and through the + daemon flag). Collapsing into the fluent + `threading.Thread(target=_readline_pump, args=(stdout, line_q), + daemon=True).start()` drops an unused binding and matches the + fire-and-forget intent. Python keeps live threads reachable via + `threading._active`, so no GC risk. Side effects preserved: the + reader still closes cleanly on `_close_pipes` (OSError in + `readline`), and the main loop still relies on `line_q.get` for + deadline enforcement. Pinned by the full `tests/test_agent.py` + suite (streaming-path coverage). This is the same alias/handle-drop + shape as e1ad87a / 497c028 / b19625e, specialised to a Thread — + thread-return values aren't special, they're just another handle + whose only use was `.start()`. +- e1ad87a — inlined the `binary = Path(cmd[0]).stem` local in + `_supports_stream_json`. The alias was read exactly once on the + following line as `binary == CLAUDE_BINARY`. Collapsing to + `return Path(cmd[0]).stem == CLAUDE_BINARY` matches the already-inline + sibling check in `_console_emitter.py:_is_claude_command` + (`return Path(parts[0]).stem == CLAUDE_BINARY`). Same Phase 4 + inline-alias shape as ce487d3 / 52e0272 / 497c028 / fc5e1cb. Empty-cmd + short-circuit (`if not cmd: return False`) preserved so + `Path(cmd[0])` never gets indexed into an empty list. Backlog note + about consolidating `_is_claude_command` and `_supports_stream_json` + across modules is unchanged — still deferred until a third caller + appears. +- cf72fd9 — replaced the `parsed = None` sentinel in `_read_agent_stream` + with a `try/except/else` block. The old code set `parsed = None` in + the JSON-decode-except branch solely so the next line's + `if isinstance(parsed, dict):` would fall through; restructuring with + `try: ... except: pass; else: if isinstance(...):` makes the "only + forward when parsing succeeded" intent structural instead of encoded + through a sentinel value. The error path now skips the isinstance + check entirely (dead work before), and the success path is unchanged. + `parsed` is no longer bound when the except clause runs, which matches + Python convention — the value was always meant to be ignored there. + Pinned by `tests/test_agent.py::test_ignores_non_json_lines` and the + broader stream-JSON coverage in that file. +- d8d5592 — gated the `"".join(...)` of `stream.stdout_lines` and + `stderr_lines` at the tail of `_run_agent_streaming` on + `log_dir is not None`. The joined strings were only consumed by + `_write_log` (which short-circuits when log_dir is None) and by the + `captured_stdout` / `captured_stderr` AgentResult fields, both of + which previously discarded the joined string with + `... if log_dir is not None else None`. Now matches the + already-lazy `"".join(x) if x is not None else None` idiom in + `_run_agent_blocking`'s tail, and the duplicated ternary on each + AgentResult field collapses to a bare `stdout` / `stderr`. Same + observable behavior — pinned by `test_captured_output_set_when_logging` + and `test_no_log_when_dir_not_set` in tests/test_agent.py. +- cb61477 — added `_call_safely(callback, *args)` helper next to the + callback type aliases. Replaces three copies of the + `if cb is not None: try: cb(...); except Exception: pass` pattern + (two in `_read_agent_stream`, one in `_pump_stream`) with single-line + calls. Behavior preserved — identical None guard, identical broad + `Exception` suppression, identical argument-once semantics. + +## Shape of the module + +- Two execution paths: `_run_agent_streaming` (JSON line stream, used for + `claude`) and `_run_agent_blocking` (subprocess.Popen with optional + capture, used for all other agents). +- `execute_agent` is the single public entry point; selects mode via + `_supports_stream_json(cmd)` (checks `Path(cmd[0]).stem == CLAUDE_BINARY`). +- Shared shutdown sequence is centralized in `_cleanup_agent`: + 1. `_ensure_process_dead` (SIGTERM → SIGKILL via `_try_graceful_group_kill`, + then `proc.kill()`). + 2. `_close_pipes` (raw `os.close` on stdout/stderr fds to unblock readers). + 3. `_drain_readers` (bounded join on reader/writer threads). + 4. `_finalize_pipes` (Python-level `pipe.close()` for GC hygiene). +- Thread spawning uses `_start_writer_thread` / `_start_pump_thread` to + centralize the `Thread(..., daemon=True); .start()` boilerplate. + +## Verified live (grepped, confirmed used) + +- `CLAUDE_BINARY` — public; imported by `_console_emitter.py` for display + logic (see backlog note about consolidating `_is_claude_command` / + `_supports_stream_json`; deferred until a third caller appears). +- `_STDOUT`, `_STDERR` — used in `_run_agent_streaming` / + `_run_agent_blocking` stderr pump calls and inside `_read_agent_stream`. +- `_SIGTERM_GRACE_PERIOD`, `_THREAD_JOIN_TIMEOUT`, `_PROCESS_WAIT_TIMEOUT` + — each referenced exactly once; constants kept near usage as the + project convention prefers. +- `AgentResult`, `_StreamResult` — returned from streaming/blocking paths + and consumed by `engine.py`. + +## Potential future wins (not yet taken) + +- `_run_agent_streaming` and `_run_agent_blocking` both finish with the + same "`stdout = "".join(...) if … else None; stderr = "".join(...) + if … else None; log_file = _write_log(...); return AgentResult(...)`" + tail. After d8d5592 the conditional-join idiom is now identical + across both paths (gated on `log_dir is not None` for streaming, + on `stdout_lines is not None` for blocking — but `stdout_lines` is + itself `[] if log_dir is not None else None`, so the conditions are + equivalent). The intermediate state still differs (`stream.stdout_lines` + tuple vs `stdout_lines` list|None), so extracting a shared helper + would mostly move arguments around. Revisit only if a third + execution path appears. +- The two `if proc.stdin/stdout/stderr is None: raise RuntimeError(...)` + guards just after `Popen` could use a single helper, but `subprocess` + guarantees these are non-None when `PIPE` is passed — the guards exist + mainly to narrow for the type checker, and a helper would make the + narrow less explicit. Leave as-is. diff --git a/workspace/ralphs/improve-codebase/coverage/_console_emitter.md b/workspace/ralphs/improve-codebase/coverage/_console_emitter.md new file mode 100644 index 00000000..c12834d9 --- /dev/null +++ b/workspace/ralphs/improve-codebase/coverage/_console_emitter.md @@ -0,0 +1,205 @@ +# `_console_emitter.py` coverage + +Valid at: 2fda4f0 + +## Recent changes + +- 2fda4f0 — inlined the `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]` + local in `_LivePanelBase._build_body`. The alias was read exactly + once, as the iterable of the very next `for line in visible:` loop. + Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:` + matches the inline-alias pattern from 497c028 (`agent`), fc5e1cb + (`total_in`), 52e0272 (`msg`), ce487d3 (`text`), and e1ad87a + (`binary`). Behavior unchanged — each iteration still mutates the + Text in place (`no_wrap` / `overflow`) before appending to `rows`, + and the slice still materializes the last `_MAX_VISIBLE_SCROLL` + items. No other `visible` locals remain in the class; the name is + reused elsewhere in the module (fullscreen viewport height, scrollbar + geometry) but all in unrelated scopes. +- 3823019 — narrowed `line = escape_markup(data["line"])` scope in + `_on_agent_output_line` past the + `if not isinstance(target, _IterationSpinner): return` guard. The + escape_markup call was wasted work on the early-return path (target + None or wrong-type panel); moving the binding after the guard keeps + the _IterationSpinner branch behavior identical and drops the wasted + work on the other branch. Same shape as 134078d's `name_col` + narrowing — unconditional compute that only one branch consumes. + Note: the `_structured_agent` short-circuit earlier in the method + already skips this path for Claude runs (ad7523e), so this narrowing + only affects the raw-stdout path. +- d0060b3 — added a public `outcome` property on `_LivePanelBase` and + replaced `source._outcome` in `_FullscreenPeek._build_header` with + `source.outcome`. Mirrors ef9a178's `iteration_id` cleanup — both + commits expose an existing private attribute through a getter so the + cross-class read doesn't have to dip into private state. The + `_outcome` attribute is still the single source of truth (written + only in `freeze`), and tests that read `_outcome` directly + (test_console_emitter.py:1766) keep working. Two private-attr + cross-class reads remain in the module (`source._scroll_lines` at + lines 750 and 872); those touch a mutable list that the class itself + appends to, so a read-only property would paper over the mutation + asymmetry — not taking until a real need appears. +- 3a8908d — replaced the `if initial_id is None and self._iteration_history:` + guard in `enter_fullscreen` with `next(reversed(self._iteration_history), None)`. + The compound condition was doing two jobs at once: pick the fallback only + when nothing is live, *and* sidestep `next(reversed({}))` raising + StopIteration on the empty dict. The `next(it, default)` form moves the + empty-handling into the standard library idiom so the outer `if` reads + as a single concern. Same observable behavior — the immediately-following + `if initial_id is None or self.panel_for(initial_id) is None:` branch + still prints "Full peek: no iterations yet" and returns False when the + fallback yielded nothing. Pinned by `test_enter_without_iteration_prints_hint`. +- 59b0e34 — inlined `self._fullscreen_page_size()` into the space/b + action lambdas in `_handle_fullscreen_key`. The `page` local was + computed unconditionally in the non-exit branch but only consumed by + the page-down (" ") and page-up ("b") lambdas — j/k/g/G/[/] now skip + the call entirely. Space/b still compute it exactly once per keypress, + now at action-invocation time (under the same `_console_lock`), so + behavior is unchanged. `_fullscreen_page_size()` is a pure read of + `self._console.size.height` in a try/except, so deferring evaluation + has no observable effect — the dict build and action invocation happen + back-to-back inside the lock. Same Phase 4 shape as 134078d / ef176bf + / b19625e. +- 52e0272 — inlined the `msg = raw.get("message", {})` local in + `_IterationPanel._apply_assistant`. The alias was read exactly once on + the next line as `msg.get("usage")`. Collapsing to + `usage = raw.get("message", {}).get("usage")` matches the chained-get + style already used by `_iter_content_blocks` two functions above + (`raw.get("message", {}).get("content", [])`), and the same + inline-alias pattern as 497c028 (`agent`) and fc5e1cb (`total_in`). + No other reference to `msg` exists in the function — verified by grep. +- b19625e — dropped the `new_offset` alias in `_FullscreenPeek.scroll_down`. + The local was assigned directly to `self._offset`, then the + follow-mode check re-read it as `new_offset == 0` — identical to + `self._offset == 0` after the assignment. Sibling `scroll_up` keeps + its local because it compares old vs new *before* assigning (needed + to conditionally disable auto-scroll on a real move); `scroll_down` + has no such comparison, so the alias was dead. Same Phase 4 shape + as ef176bf (`line_count`) and 134078d (`name_col`). +- 497c028 — inlined the `agent = data["agent"]` local in `_on_run_started`. + The alias was read exactly once, immediately below, as the arg to + `_is_claude_command(agent)`. Reading `data["agent"]` directly matches + the style established by fc5e1cb (inlined `total_in`). `ralph_name` + was preserved — it's used inside an f-string where `data['ralph_name']` + would be awkward. Same Phase 4 shape as fc5e1cb. +- ef176bf — dropped the `line_count = len(self._scroll_lines)` alias in + `_IterationSpinner._build_footer`. The local served dual duty as a + predicate (`if line_count > 0`) and as the `_plural` arg, but both + uses were on the same truthy branch. Replaced the predicate with + `if self._scroll_lines:` (idiomatic list truthiness) and moved the + `len()` call inside the branch that needs it. This matches the + sibling `_IterationPanel._build_footer` which uses `if self._tool_count > 0:` + inline with no local alias. Same Phase 4 shape as 134078d. +- ad7523e — moved the `if self._structured_agent: return` short-circuit + in `_on_agent_output_line` from inside `_console_lock` to before + acquisition. The flag is write-once (set in `_on_run_started` before + any iteration events can flow) and already read lock-free in + `_on_agent_activity` — now both structured/raw output handlers use the + same pattern. Bonus: avoids a lock acquisition per stdout line under + Claude, where every line short-circuits anyway. Added a comment + explaining the write-once invariant so the lock-free read doesn't look + accidental. +- bcadee1 — dropped the `if self._active_renderable is not None:` guard + wrapping `_archive_current_iteration_unlocked("interrupted")` in + `_on_iteration_started`. The archive helper already no-ops when + nothing is active (docstring: "No-op when no iteration is active"), + so the outer guard was mechanically redundant. Updated the + surrounding comment to note the no-op behavior so the call's intent + still reads as defensive. Same shape as 5337d88 / 4ccfa9a / 8cb0d47. +- 134078d — narrowed `name_col` scope in `_IterationPanel._apply_assistant`'s + tool_use branch. The padded name column was computed unconditionally + but only rendered when `arg` was truthy (the `else` branch uses raw + `name` without padding). Moved the pad-to-column if/else inside + `if arg:` so the helper variable lives only where it's used. Same + output in both branches — only the dead formatting work is gone. +- 1d7251f — promoted the `40` fallback-terminal-height literal to a named + module constant `_DEFAULT_CONSOLE_HEIGHT` near the other fullscreen + constants (`_FULLSCREEN_CHROME_ROWS`, `_FULLSCREEN_MIN_VISIBLE`). Two + use-sites both meant "reasonable default terminal height when the + real value isn't available": `_FullscreenPeek._console_height` (class- + attribute default used before the first `__rich_console__` call) and + `ConsoleEmitter._fullscreen_page_size`'s except-branch fallback. The + constant keeps them in lockstep. No other bare `40`s remain in the + module (grep confirmed). +- d34e957 — dropped redundant `f"{_plural(total, 'line')}"` wrap in + `_FullscreenPeek._build_header`. `_plural` already returns a str + so the f-string just format-identity-copied it. Same shape as the + surrounding `header.append(literal, style=...)` calls. No other + `f"{_plural(...)}"` wraps remain in the module (checked with grep). +- fc5e1cb — inlined `total_in = self._input_tokens` alias in + `_IterationPanel._format_tokens`. The rename hinted at a "total + input" aggregate that no longer exists (cache-read tokens are + intentionally excluded from ctx); reading `self._input_tokens` + directly matches what the value actually is. Matches the existing + style in the sibling `if self._output_tokens > 0` branch. +- 3838006 — rewrote `ConsoleEmitter.panel_for` to call `self.is_live(...)` + for its guard instead of re-stating the + `cur_iter == id and active is not None` expression. Same behavior; + one source of truth for "this is the active iteration" check. Type + checker is happy: returning `self._active_renderable` (typed + `_LivePanelBase | None`) matches the declared return type even though + the runtime invariant guarantees non-None whenever `is_live` is True. +- 01f2f1c — dropped `_FullscreenPeek._reset_view`. Its body + (`self._offset = 0; self._auto_scroll = True`) was byte-for-byte identical + to `scroll_to_bottom`. The two call sites in `_step_iteration` now call + `scroll_to_bottom()` directly; the "snap to newest line + follow" intent + moved into a docstring on the surviving method. No other scroll-reset + duplication remains. +- ef9a178 — replaced the single cross-class `_fullscreen_view._iteration_id` + access in `_archive_current_iteration_unlocked` with the public + `iteration_id` property on `_FullscreenPeek`. No behavior change — + `_FullscreenPeek` already exposes this via an `@property` (line 739-741); + the private-attribute shortcut was an oversight from earlier iterations. + Now `_iteration_id` is only read from within `_FullscreenPeek` itself. + (d0060b3 applied the same pattern to `_LivePanelBase._outcome`.) +- c4469a1 — extracted `_FullscreenPeek._step_iteration(direction)` from + `prev_iteration` / `next_iteration`. The two methods were 12-line + mirror images differing only in step direction (-1 vs +1) and + eviction-fallback (`ids[0]` vs `ids[-1]`). Combined boundary check + uses `0 <= new_idx < len(ids)` which collapses both `idx == 0` (prev) + and `idx >= len(ids) - 1` (next) into one expression. +- 5337d88 — dropped `if not self._tool_categories: return ""` early + return in `_IterationPanel._format_categories`. Empty dict yields an + empty list comprehension which `" · ".join` turns into `""`, so the + guard was dead — same shape as 4ccfa9a's `_format_params` cleanup. + No other empty-collection-then-join pattern remains in the module + (`_format_tokens` builds its `parts` list with conditional appends, so + it has no comprehension to short-circuit). +- 8cb0d47 — dropped the `max(total - visible, 1)` guard in + `_scrollbar_metrics`. The early return `if total <= visible` already + guarantees `total - visible ≥ 1`, so the `max(..., 1)` floor was dead + defensive code. Inlined the subtraction directly into the `frac` + calculation and added a comment noting the invariant. +- 0900aad — dropped `_iteration_order` list; `_iteration_history` dict + preserves insertion order by itself. Archive now pops-then-inserts to + move existing entries to the end; eviction iterates the dict + (oldest-first); `enter_fullscreen` uses `next(reversed(...))` for the + most recent finished iteration. Updated the single direct-field + reference in `tests/test_console_emitter.py`. +- 3e9627b — extracted `_stop_compact_live_unlocked` helper to dedupe the + `if self._live is not None: self._live.stop(); self._live = None` pattern + across `_stop_live_unlocked`, `enter_fullscreen`, and `_on_iteration_ended`. +- 4ccfa9a — dropped `if parts else ""` branch in `_format_params`. + `" · ".join([])` returns `""`, so the guard was dead. + +## Verified live (grepped, confirmed used) + +Private helpers and constants that look unused but are legitimately used: + +- `_ICON_SUCCESS`, `_ICON_FAILURE`, `_ICON_TIMEOUT`, `_ICON_ARROW`, + `_ICON_DASH`, `_ICON_PLAY` — all referenced in handler print strings. +- `clear_scroll` — used by test_console_emitter tests. +- `_SinglePanelNavigator` — used by tests and `scripts/tui_dev/snapshot.py`. +- `_stop_live` (the locked wrapper) — used only in tests for cleanup + between test cases. Production code uses `_stop_live_unlocked` inside + an existing lock. +- `_format_params`, `_extract_file_path`, `_extract_key`, `_extract_params` + — all referenced in the `_TOOL_REGISTRY` table (`"Read"`, `"Glob"`, + `"Grep"`, `"Edit"`, `"Write"`, `"Bash"`, `"WebFetch"`, `"WebSearch"`, etc.). + +## Potential future wins (not yet taken) + +- `_IterationPanel._build_footer` and `_IterationSpinner._build_footer` both + start with `Text(no_wrap=True, overflow="ellipsis")` and use + `_footer_grid(summary)` — the `Text(...)` construction repeats, but only + twice. Not worth extracting unless a third subclass appears. diff --git a/workspace/ralphs/improve-codebase/coverage/_frontmatter.md b/workspace/ralphs/improve-codebase/coverage/_frontmatter.md new file mode 100644 index 00000000..3f4faeb2 --- /dev/null +++ b/workspace/ralphs/improve-codebase/coverage/_frontmatter.md @@ -0,0 +1,52 @@ +# `_frontmatter.py` coverage + +Valid at: a6f4c47 + +## Recent changes + +- a6f4c47 — dropped the `if text.startswith(_UTF8_BOM):` guard in + `parse_frontmatter` before the `text = text.removeprefix(_UTF8_BOM)` + call. Python's `str.removeprefix` is already a no-op (returns the + same string object) when the prefix is absent, so the guard was + purely decorative dead code. Behavior preserved: + - BOM-prefixed input still gets stripped (pinned by + `test_utf8_bom_does_not_break_frontmatter` in + `tests/test_frontmatter.py`). + - Non-BOM input is passed through unchanged (exercised by every + other parse test in the file). + - The CPython implementation returns the same object identity when + no prefix match occurs, so there's no allocation overhead either. + +## Shape of the module + +- `parse_frontmatter(text)` — public entry point. Strips optional + UTF-8 BOM, splits on `---` delimiters via `_extract_frontmatter_block`, + runs `yaml.safe_load`, strips HTML comments from the body with + `_strip_html_comments`, returns `(dict, body)`. +- `serialize_frontmatter(frontmatter, body)` — inverse; emits + `---`-delimited blocks only when the frontmatter is non-empty *or* + the body would otherwise be mis-parsed as frontmatter. +- Constants: `RALPH_MARKER`, `FIELD_*`, `CMD_FIELD_*`, `NAME_RE`, + `VALID_NAME_CHARS_MSG`. All imported by `cli.py` and + `_resolver.py`; each one is reused across modules so centralisation + is justified. + +## Verified live (grepped, confirmed used) + +- `_FRONTMATTER_DELIMITER` — used 4× in `serialize_frontmatter` (plus + 2× in `_extract_frontmatter_block`). +- `_FENCE_OR_COMMENT_RE` — used in `_strip_html_comments`. +- `_UTF8_BOM` — used in `parse_frontmatter` (BOM strip). + +## Potential future wins (not yet taken) + +- `_extract_frontmatter_block` splits `text` on `"\n"` up front, then + re-joins slices for the body — the body slice is re-joined even when + the input is tiny. Could stream with `str.find` + `str.index` to + avoid the list allocation, but this function runs once per + iteration and the input is ~1 KB in practice; not worth the churn. +- The `serialize_frontmatter` `needs_delimiters` expression uses + `body.lstrip().startswith(...)` which allocates a new stripped + string just to check a prefix. Could collapse via + `re.match(r"\s*---", body)` but the current form reads cleanly; + revisit only if this becomes hot. diff --git a/workspace/ralphs/improve-codebase/coverage/_output.md b/workspace/ralphs/improve-codebase/coverage/_output.md new file mode 100644 index 00000000..e0a9dfbb --- /dev/null +++ b/workspace/ralphs/improve-codebase/coverage/_output.md @@ -0,0 +1,45 @@ +# `_output.py` coverage + +Valid at: ce487d3 + +## Recent changes + +- ce487d3 — inlined `text = ensure_str(stream)` in `collect_output`. + The local was assigned then read exactly once on the next line as + `parts.append(text)`. The `ensure_str` helper name already + documents the decode step, so the intermediate binding was pure + noise. Same alias-inline shape as fc5e1cb / 497c028 / 52e0272. +- 7730dd4 — narrowed `secs = total % _SECONDS_PER_MINUTE` scope in + `format_duration`. The local was computed unconditionally between + the `total = int(seconds + 0.5)` / `minutes = total // 60` setup and + the `if minutes < _MINUTES_PER_HOUR:` branch, but only consumed by + the `f"{minutes}m {secs}s"` return on the truthy branch. The hours + branch (`hours = minutes // 60; mins = minutes % 60`) never touches + `secs`, so for any duration ≥ 1h the modulo was wasted work. Moved + the assignment inside the if to co-locate with its only use site. + Same Phase 4 shape as 134078d / ef176bf / 59b0e34 / b19625e. + +## Layout snapshot (at 7730dd4) + +- Module is 139 lines — small, mostly format helpers. +- Top: `IS_WINDOWS`, `SUBPROCESS_TEXT_KWARGS`, `SESSION_KWARGS` — + shared subprocess kwargs imported by `_agent.py` and `_runner.py`. +- `ProcessResult` — base dataclass for `RunResult` / `AgentResult`, + with the shared `success` property. +- Format helpers: `ensure_str`, `collect_output`, `warn`, `format_count`, + `format_duration`. +- Module-level constants `_COUNT_THOUSANDS`, `_COUNT_MILLIONS`, + `_SECONDS_PER_MINUTE`, `_MINUTES_PER_HOUR` are used only by the two + `format_*` functions below them. + +## Potential future wins (not yet taken) + +- `format_count` repeats `f"{n / _COUNT_MILLIONS:.1f}M"` in two + branches (the bare ≥1M return and the rounded-cross-into-M guard). + Could lift to a `_format_millions(n)` helper, but each site is one + line and the duplication is intentional (the rounding guard explains + the second site). Skip unless a third user appears. +- The two `format_*` functions both have a "rounded value crossed into + next unit" guard (59.95s → "1m 0s"; 999_950 → "1.0M") with parallel + comments referencing each other. Already kept in lockstep — no + refactor needed. diff --git a/workspace/ralphs/improve-codebase/coverage/_resolver.md b/workspace/ralphs/improve-codebase/coverage/_resolver.md new file mode 100644 index 00000000..db553415 --- /dev/null +++ b/workspace/ralphs/improve-codebase/coverage/_resolver.md @@ -0,0 +1,20 @@ +# `_resolver.py` coverage + +Valid at: 6227863 + +## Recent changes + +- 6227863 — dropped `if not user_args: return _ARGS_RE.sub("", prompt)` + early return in `resolve_args`. `_ARGS_RE.sub` with the callable + `lambda m: user_args.get(m.group(1), "")` already resolves every match + to `""` when the dict is empty, so the fast path produced byte-for-byte + identical output to the general path. Test + `test_empty_args_clears_placeholders` still covers the empty-dict + behavior. + +## Potential future wins (not yet taken) + +- None obvious; the module is 80 lines and both `resolve_args` / + `resolve_all` share the substitution shape but operate on different + regexes (single-kind vs multi-kind), so extracting a helper would add + indirection for two very short call sites. diff --git a/workspace/ralphs/improve-codebase/coverage/engine.md b/workspace/ralphs/improve-codebase/coverage/engine.md new file mode 100644 index 00000000..274137c1 --- /dev/null +++ b/workspace/ralphs/improve-codebase/coverage/engine.md @@ -0,0 +1,34 @@ +# `engine.py` coverage + +Valid at: c5ce11d + +## Recent changes + +- c5ce11d — inlined the `reason = state.status.reason` local in + `run_loop`. The alias was read exactly once on the next line as the + `reason=` kwarg to `RunStoppedData(...)`. Inlined to + `reason=state.status.reason` to match the chained-read style elsewhere + (e.g. `_IterationPanel._apply_assistant`'s `raw.get("message", {}).get("usage")` + after 52e0272). Same Phase 4 inline-alias shape as ce487d3 (`text`), + 52e0272 (`msg`), 497c028 (`agent`), fc5e1cb (`total_in`). + Safe: the immediately-preceding `if state.status == RunStatus.RUNNING: + state.status = RunStatus.COMPLETED` normalizes the status to a terminal + value, so `RunStatus.reason`'s ValueError guard (non-terminal) cannot + fire here. + +## Structure notes + +`run_loop` is the main loop orchestrator. Its control-flow helpers +(`_handle_control_signals`, `_wait_for_resume`, `_run_iteration`, +`_delay_if_needed`) are each short and single-purpose. `_run_iteration` +splits into `_run_commands` → `_assemble_prompt` → `_run_agent_phase`. +`_run_agent_phase` is the longest at ~80 lines but its branches map 1:1 +to the agent outcome tri-state (timed_out / success / failure) — no +obvious extraction candidate without inventing synthetic abstractions. + +## Potential future wins (not yet taken) + +- None spotted. The locals that survive in `_run_agent_phase` + (`duration`, `event_type`, `state_detail`, `ended_data`) all have + multiple uses and good names. `_run_commands`'s `output` is mutated + before use. `_run_iteration`'s `iteration` alias is used 4 times. diff --git a/workspace/ralphs/improve-codebase/iterations.md b/workspace/ralphs/improve-codebase/iterations.md new file mode 100644 index 00000000..c742b441 --- /dev/null +++ b/workspace/ralphs/improve-codebase/iterations.md @@ -0,0 +1,58 @@ +# Iterations + +One line per iteration: ` `. + +7402f04 refactor: inline `stream_cmd` into Popen call in `_run_agent_streaming` — the `stream_cmd = cmd + [_OUTPUT_FORMAT_FLAG, _STREAM_FORMAT, _VERBOSE_FLAG]` local was consumed exactly once on the very next statement (the first positional arg to `subprocess.Popen(...)`), and grep confirmed no other references in src/ or tests/. The three appended tokens are already named constants, so the intent reads cleanly at the call site without the intermediate binding. Same Phase 4 inline-alias shape as 66d6c60 (`remaining`), b24accf (`reader`), 2fda4f0 (`visible`), e1ad87a (`binary`), and 52e0272 (`msg`). Behavior preserved — subprocess.Popen still receives the same list, pinned by the full streaming-path test coverage in `tests/test_agent.py`. + +66d6c60 refactor: inline `remaining` alias in `_read_agent_stream` timeout calc — the `remaining = deadline - time.monotonic()` local served a single use on the very next line as `max(remaining, 0)`. Collapsing to `max(deadline - time.monotonic(), 0)` matches the inline-alias pattern from e1ad87a (`binary`), 497c028 (`agent`), fc5e1cb (`total_in`), 52e0272 (`msg`), and ce487d3 (`text`). The adjacent comment was updated to refer to "clamp to 0" since `remaining` no longer exists. Behavior preserved — the same clamped value reaches `line_q.get(timeout=...)` on every iteration of the read loop, so deadline enforcement and the non-blocking drain on an already-expired deadline both behave identically. Pinned by the streaming-path tests in `tests/test_agent.py`. + +b24accf refactor: inline `reader` thread handle in `_read_agent_stream` — the local served only to call `.start()`; the thread is never joined because termination flows through the queue's None sentinel plus the daemon flag, so the handle had no further role. Collapsing into the fluent `threading.Thread(..., daemon=True).start()` form matches the fire-and-forget intent and drops an unused binding. Python's `threading` module keeps live threads reachable via `threading._active`, so dropping the local reference does not affect thread lifetime — verified by the full agent test suite (628 passed). Same alias/handle-drop shape as 2fda4f0 / e1ad87a / 497c028 / b19625e, specialised to a Thread. + +2fda4f0 refactor: inline `visible` alias in `_LivePanelBase._build_body` — the `visible = self._scroll_lines[-_MAX_VISIBLE_SCROLL:]` local was read exactly once, as the iterable of the very next `for line in visible:` loop. Collapsing to `for line in self._scroll_lines[-_MAX_VISIBLE_SCROLL:]:` matches the inline-alias pattern from 497c028 / fc5e1cb / 52e0272 / ce487d3 / e1ad87a. Behavior unchanged — the slice still materializes the last `_MAX_VISIBLE_SCROLL` items, and each loop iteration still mutates the Text in-place (`no_wrap` / `overflow`) before appending to `rows`. Pinned by the broad `test_console_emitter.py` suite (peek-visible rendering paths). + +e1ad87a refactor: inline `binary` alias in `_supports_stream_json` — the local was assigned to `Path(cmd[0]).stem` and then read exactly once on the next line as `binary == CLAUDE_BINARY`. Collapsing to the chained form `Path(cmd[0]).stem == CLAUDE_BINARY` matches the already-inline sibling check in `_console_emitter.py:_is_claude_command` (`return Path(parts[0]).stem == CLAUDE_BINARY`) and the inline-alias pattern from ce487d3 / 52e0272 / 497c028 / fc5e1cb. Same behavior — `_supports_stream_json` still short-circuits on empty `cmd` first, and the final boolean result is unchanged. Pinned by `tests/test_agent.py::test_streaming_mode_used_for_claude` and the broader agent-selection tests. + +c5ce11d refactor: inline `reason = state.status.reason` alias in `run_loop`'s `RUN_STOPPED` emit. The local was read exactly once as the `reason=` kwarg to `RunStoppedData(...)` on the next line; collapsing to `reason=state.status.reason` matches the inline-alias pattern from ce487d3 (`text`) / 52e0272 (`msg`) / 497c028 (`agent`) / fc5e1cb (`total_in`). Safe at this call site — the immediately-preceding `if state.status == RunStatus.RUNNING: state.status = RunStatus.COMPLETED` normalizes status to a terminal value (STOPPED / COMPLETED / FAILED), so the `RunStatus.reason` property's "non-terminal raises ValueError" guard cannot fire. + +3823019 refactor: narrow `line = escape_markup(data["line"])` scope past the early-return guard in `_on_agent_output_line` — the local was computed unconditionally inside `_console_lock` but only used on the `_IterationSpinner` branch; the `if not isinstance(target, _IterationSpinner): return` early-return path would have thrown the value away. Moving the binding below the guard preserves exactly the same output (same escape_markup call, same f-string, same add_scroll_line) while skipping the wasted work whenever target is None or an _IterationPanel. Same Phase 4 shape as 134078d (`name_col`), 7730dd4 (`secs`), ef176bf (`line_count`), 59b0e34 (`page`). Pinned by the raw-line handling tests in test_console_emitter.py. + +ce487d3 refactor: inline `text` alias in `_output.py:collect_output` — the local was assigned from `ensure_str(stream)` and read exactly once on the next line as `parts.append(text)`. Same alias-inline shape as fc5e1cb / 497c028 / 52e0272 — the `ensure_str` helper name already documents the decode step, so the intermediate binding was pure noise. Tests in `test_output.py` cover the str, bytes, and stdout+stderr-newline-join paths. + +7730dd4 refactor: narrow `secs = total % _SECONDS_PER_MINUTE` scope into the `if minutes < _MINUTES_PER_HOUR:` branch in `_output.py:format_duration` — the local was computed unconditionally but only consumed by `f"{minutes}m {secs}s"` on the truthy branch. The hours branch derives `hours`/`mins` independently and never references `secs`, so the modulo was wasted work for any duration ≥ 1h. Same Phase 4 narrow-the-scope shape as 134078d (`name_col`), ef176bf (`line_count`), 59b0e34 (`page`), b19625e (`new_offset`). Tests in test_output.py cover both branches (sub-hour: lines 110–137; hour+: lines 140–144), so the move is pinned in either direction. + +d0060b3 refactor: replace `source._outcome` cross-class access with public `outcome` property on `_LivePanelBase` — `_FullscreenPeek._build_header` was reading the private attribute directly on its `source` panel. Added an `@property outcome` that returns `self._outcome`, matching the `iteration_id` property pattern from ef9a178 (which cleaned up the parallel `_fullscreen_view._iteration_id` cross-class access). The two remaining cross-class private reads on `_LivePanelBase` (`source._scroll_lines` at lines 750 and 872) read a mutable list that's also appended to from within the class, so a read-only property would mask the asymmetry — leaving those alone until a real need appears. Same observable behavior — `_outcome` is still the single source of truth written by `freeze`, and all existing readers get the same value. The direct `panel_before._outcome == "completed"` assertion in test_console_emitter.py:1766 still passes (underlying attribute is unchanged); fullscreen-header rendering paths exercise the new property transitively. + +cf72fd9 refactor: replace `parsed = None` sentinel with try/except/else in `_read_agent_stream` — the sentinel only existed so that `isinstance(parsed, dict)` on the next line would fall through on JSON-parse failure. Moving the dict-handling into the try/else branch expresses "only run when parse succeeded" structurally and drops both the sentinel assignment and the redundant isinstance check on the error path. Behavior preserved — valid-but-non-dict JSON (lists, numbers, strings) still skips forwarding via the inner isinstance guard. Same sentinel→structure shape as prior Phase 4 scope narrowings, applied one level up. + +a6f4c47 refactor: drop redundant BOM-startswith guard before `removeprefix` in `parse_frontmatter` — `str.removeprefix` already returns the string unchanged when the prefix is absent, so the `if text.startswith(_UTF8_BOM):` wrapper was dead defensive code. Behavior preserved — `test_utf8_bom_does_not_break_frontmatter` still pins the BOM-stripping path and every other test exercises the no-BOM path. Same "drop the guard when the operation already handles the no-match case" shape as 5337d88 (empty dict → `" · ".join([])`), 4ccfa9a (empty list → `" · ".join([])`), and 8cb0d47 (redundant `max(..., 1)` floor). + +3a8908d refactor: use `next(reversed(...), None)` in `enter_fullscreen` history fallback — the compound `if initial_id is None and self._iteration_history:` guard mixed two concerns (only fall back when nothing is live, *and* avoid `next(reversed({}))` raising StopIteration). Switching to the `next(it, default)` sentinel idiom lets the standard default handle empty-dict case so the outer `if` only encodes the "fall back when nothing live" intent. Behavior preserved — the existing `if initial_id is None or panel_for(...) is None:` next branch still prints "no iterations yet" and returns False, covered by `test_enter_without_iteration_prints_hint`. + +59b0e34 refactor: inline `_fullscreen_page_size()` into the space/b lambdas in `_handle_fullscreen_key` — the `page` local was computed unconditionally in the non-exit branch but only consumed by the page-down (" ") and page-up ("b") action lambdas. j/k/g/G/[/] now skip the call entirely; space/b still compute it exactly once per keypress, now at action-invocation time under the same lock. Same Phase 4 "narrow the scope of a one-branch local" shape as 134078d / ef176bf / b19625e. + +52e0272 refactor: inline `msg` alias in `_apply_assistant` — the `msg = raw.get("message", {})` local was read exactly once on the next line as `msg.get("usage")`. Collapsing into `raw.get("message", {}).get("usage")` matches the chained-get style already used by `_iter_content_blocks` (which independently does `raw.get("message", {}).get("content", [])`) and the inline-alias pattern from 497c028 / fc5e1cb. + +b19625e refactor: drop `new_offset` alias in `_FullscreenPeek.scroll_down` — the local was assigned to `self._offset` and then re-read only as `new_offset == 0`, which is identical to `self._offset == 0` after the assignment. Sibling `scroll_up` needs its local because it compares old vs new before the assignment; `scroll_down` has no such comparison, so the alias was dead. Same shape as ef176bf and 134078d. + +497c028 refactor: inline `agent` alias in `_on_run_started` — the local served a single use as the arg to `_is_claude_command(...)`. Reading `data["agent"]` directly matches the sibling style from fc5e1cb (`total_in` inline). Preserved `ralph_name` — it's used inside an f-string on line 1247 where the alias still helps readability. + +ef176bf refactor: drop `line_count` alias in `_IterationSpinner._build_footer` — local was computed unconditionally but only used on the truthy branch (both as predicate `> 0` and as `_plural` arg). Inlined the truthy check on `self._scroll_lines` and moved the `len()` call inside the branch that uses it. Matches the style of sibling `_IterationPanel._build_footer` which uses `if self._tool_count > 0:` with no local alias. Same Phase 4 shape as 134078d's `name_col` narrowing. + +d8d5592 refactor: gate `"".join(stdout/stderr_lines)` in `_run_agent_streaming` on `log_dir is not None` — the joined strings were only consumed by `_write_log` and the `captured_*` AgentResult fields, both of which discarded the result when log_dir was None. Mirrors the already-lazy idiom in `_run_agent_blocking` and drops the duplicated `... if log_dir is not None else None` ternary on each AgentResult field. +ad7523e refactor: move `_structured_agent` short-circuit out of `_console_lock` in `_on_agent_output_line` — flag is write-once (set in `_on_run_started`), matches `_on_agent_activity`'s pattern, avoids lock acquisition per stdout line under Claude +bcadee1 refactor: drop redundant `_active_renderable` guard in `_on_iteration_started` — archive call already no-ops when nothing is active +134078d refactor: narrow `name_col` scope into `if arg:` in `_apply_assistant` — padded variant was computed but unused when arg falsy +1d7251f refactor: promote 40-line fallback height to `_DEFAULT_CONSOLE_HEIGHT` — two sites used the literal 40 for "default terminal height when unknown" +6227863 refactor: drop redundant empty-user_args branch in `resolve_args` — callable path already yields "" for every match when dict is empty +d34e957 refactor: drop redundant f-string wrap around `_plural(total, 'line')` in fullscreen header +fc5e1cb refactor: inline `total_in` alias in `_format_tokens` — direct read of `self._input_tokens` clarifies intent +3838006 refactor: defer `panel_for` guard to `is_live` to dedupe identical condition +01f2f1c refactor: drop `_reset_view` in `_FullscreenPeek` — identical body to `scroll_to_bottom` +ef9a178 refactor: replace `_fullscreen_view._iteration_id` cross-class private access with public `iteration_id` property +c4469a1 refactor: extract `_step_iteration` to dedupe prev/next iteration browsing in `_FullscreenPeek` +5337d88 refactor: drop redundant empty-dict guard in `_format_categories` — `" · ".join([])` already returns `""` +8cb0d47 refactor: drop redundant `max(total - visible, 1)` guard in `_scrollbar_metrics` — early return makes `total - visible ≥ 1` +cb61477 refactor: extract `_call_safely` helper — dedupes 3× best-effort callback guards in `_agent.py` +4ccfa9a refactor: drop redundant `if parts else ""` in `_format_params` (empty join already returns "") +3e9627b refactor: extract `_stop_compact_live_unlocked` to dedupe compact-Live teardown across 3 call sites +0900aad refactor: drop redundant `_iteration_order` list — dict insertion order suffices