Add goal-based validation gate to AgenticPhase tasks#23840
Conversation
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 956672a | Docs | Datadog PR Page | Give us feedback! |
eab3f8d to
11dba42
Compare
Validation ReportAll 21 validations passed. Show details
|
Codecov Report❌ Patch coverage is Additional details and impacted files🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 956672a0ae
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| the phase callbacks see only the bracketing before/after_goal_check events. | ||
| """ | ||
| log_dir = log_root / "goal_agent" / phase_id | ||
| log_path = log_dir / f"{task.name}.jsonl" |
There was a problem hiding this comment.
Sanitize task names before building reviewer log paths
When a goal-enabled task name contains a path separator such as "setup/db", the config still validates (there is no pattern restriction on TaskConfig.name), but this constructs goal_agent/<phase>/setup/db.jsonl after only creating goal_agent/<phase>. AgentLogger then fails to open the file because the intermediate directory does not exist, so the whole phase fails before validation even runs for an otherwise valid flow. Either constrain task names to path-safe values or derive the log filename from a sanitized task name.
Useful? React with 👍 / 👎.
What does this PR do?
Adds an optional, non-deterministic validation gate to
AgenticPhasetasks. When a task declares agoal(orgoal_path), a fresh independent reviewer agent runs after the worker finishes and checks whether the goal was met. If the check fails, the worker gets one retry; this repeats up tomax_goal_attemptstotal reviewer runs (default: 5). On exhaustion the phase raisesGoalAttemptsExhausted, which flows through the existingPhase.on_errorpath. Tasks without agoalare unaffected.Key design decisions:
read_only=Truesubset of the parent agent's tools.ToolSpecgains aread_onlyflag for this, along with afilter_read_only()helper.AgentConfigare intentionally not forwarded.spawn_subagentis not a read-only tool, so it is filtered out automatically.Files changed:
phases/config.py—TaskConfiggainsgoal,goal_path, andmax_goal_attemptsfields with validators.tools/registry.py—ToolSpecgainsread_only: bool; all manifest entries are explicitly annotated; newfilter_read_only()helper.phases/goal.py(new) — Reviewer system prompt, exceptions, helper functions, andrun_goal_loop().agent/build.py—build_goal_agent()andmake_goal_agent_builder().phases/agentic_phase.py—goal_agent_builderparam,_compact_if_needed()helper, goal loop integration inrun_tasks(),goal_validationssurfaced in the success checkpoint.callbacks/callbacks.py—OnBeforeGoalCheckCallbackandOnAfterGoalCheckCallbackwith matchingfire_*methods onCallbackSetandCallbacks.Motivation
Agentic pipelines produce output that is difficult to verify deterministically. A lightweight, independent reviewer pass — run against the same files the worker produced — catches systematic gaps (missing tests, incomplete implementations, wrong output format) before the phase is considered complete, without requiring the worker to self-evaluate against the goal criterion.
Review checklist (to be filled by reviewers)
qa/requiredif this PR needs QA validation, orqa/skip-qaif it does not. Exactly one of the two is required.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged