Add goal-based validation gate to AgenticPhase tasks #23840

Draft: luisorofino wants to merge 2 commits into loa/subagent-tool from loa/goal-task


@luisorofino luisorofino commented May 26, 2026

What does this PR do?

Adds an optional, non-deterministic validation gate to AgenticPhase tasks. When a task declares a goal (or goal_path), a fresh, independent reviewer agent runs after the worker finishes and checks whether the goal was met. If the check fails, the worker retries using the reviewer's reason as feedback and the reviewer runs again; this repeats until the check passes or max_goal_attempts reviewer runs (default: 5) have been used. On exhaustion, the phase raises GoalAttemptsExhausted, which flows through the existing Phase.on_error path. Tasks without a goal are unaffected.
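For readers skimming the flow, here is a minimal sketch of the loop described above. It is not the code in phases/goal.py: the `Verdict` type, the callable parameters, and the retry prompt wording are hypothetical stand-ins, and skipping the worker retry after the final failed review is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


class GoalAttemptsExhausted(Exception):
    """Raised when the reviewer rejects the work on every allowed attempt."""


@dataclass
class Verdict:
    # Hypothetical reviewer result; the real shape is not shown in this PR text.
    passed: bool
    reason: str = ""


def run_goal_loop(
    run_worker: Callable[[str], None],
    run_reviewer: Callable[[], Verdict],
    task_prompt: str,
    max_goal_attempts: int = 5,
) -> int:
    """Return the number of reviewer runs it took to pass the goal check."""
    run_worker(task_prompt)  # initial worker pass, before any review
    for attempt in range(1, max_goal_attempts + 1):
        verdict = run_reviewer()  # a fresh reviewer on every attempt
        if verdict.passed:
            return attempt
        # Assumption: no worker retry after the last allowed review.
        if attempt < max_goal_attempts:
            # The worker only ever sees the reviewer's reason, not the goal.
            run_worker(f"The reviewer rejected your work: {verdict.reason}")
    raise GoalAttemptsExhausted(
        f"goal not met after {max_goal_attempts} reviewer runs"
    )
```

With these assumptions, a task that fails once and then passes costs two worker runs and two reviewer runs, and a task that never passes costs max_goal_attempts runs of each before the exception is raised.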

Key design decisions:

  • Worker is blind to the goal. It only receives a generic suffix ("your work will be checked by an independent reviewer") appended to its task prompt. It learns the specific criterion only when a check fails, and only sees the reviewer's reason.
  • Reviewer is fresh every attempt. History is reset between attempts; it never sees prior reviewer turns.
  • Reviewer is read-only. It gets only the read_only=True subset of the parent agent's tools. ToolSpec gains a read_only flag for this, along with a filter_read_only() helper.
  • Reviewer uses the parent's provider with default model/max_tokens. No user-visible knobs; overrides declared on the parent AgentConfig are intentionally not forwarded.
  • Reviewer cannot spawn subagents. spawn_subagent is not a read-only tool, so it is filtered out automatically.
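The read-only restriction in the list above can be illustrated with a small sketch. Only the `read_only` flag and the `filter_read_only()` name come from this PR; the `name` field, the helper's signature, and the `read_file` / `write_file` tool names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    # Only `read_only` is named in the PR; `name` is a placeholder field.
    name: str
    read_only: bool = False


def filter_read_only(specs: list[ToolSpec]) -> list[ToolSpec]:
    """Keep only the tools explicitly marked read-only (assumed signature)."""
    return [spec for spec in specs if spec.read_only]


parent_tools = [
    ToolSpec("read_file", read_only=True),    # hypothetical tool name
    ToolSpec("write_file", read_only=False),  # hypothetical tool name
    ToolSpec("spawn_subagent", read_only=False),
]

# The reviewer would receive only the read-only subset of the parent's tools.
reviewer_tools = filter_read_only(parent_tools)
```

Because spawn_subagent is not marked read-only, it drops out of the reviewer's tool set without needing a special case.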

Files changed:

  • phases/config.py: TaskConfig gains goal, goal_path, and max_goal_attempts fields with validators.
  • tools/registry.py: ToolSpec gains read_only: bool; all manifest entries are explicitly annotated; new filter_read_only() helper.
  • phases/goal.py (new): Reviewer system prompt, exceptions, helper functions, and run_goal_loop().
  • agent/build.py: build_goal_agent() and make_goal_agent_builder().
  • phases/agentic_phase.py: goal_agent_builder param, _compact_if_needed() helper, goal loop integration in run_tasks(), goal_validations surfaced in the success checkpoint.
  • callbacks/callbacks.py: OnBeforeGoalCheckCallback and OnAfterGoalCheckCallback with matching fire_* methods on CallbackSet and Callbacks.
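As a rough illustration of the TaskConfig change listed above, the sketch below uses a plain dataclass. The field names and the default of 5 come from this PR; the validation rules (goal and goal_path being mutually exclusive, max_goal_attempts being at least 1), the `has_goal` property, and the use of a dataclass are assumptions, since the actual validators are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskConfig:
    name: str
    goal: Optional[str] = None       # inline goal text
    goal_path: Optional[str] = None  # path to a file holding the goal
    max_goal_attempts: int = 5       # default taken from the PR description

    def __post_init__(self) -> None:
        # Assumed rule: the two goal sources are mutually exclusive.
        if self.goal is not None and self.goal_path is not None:
            raise ValueError("set either goal or goal_path, not both")
        # Assumed rule: at least one reviewer run must be allowed.
        if self.max_goal_attempts < 1:
            raise ValueError("max_goal_attempts must be >= 1")

    @property
    def has_goal(self) -> bool:
        # Hypothetical convenience: tasks without a goal skip the gate.
        return self.goal is not None or self.goal_path is not None
```

A task such as `TaskConfig(name="write-tests", goal="All new functions have unit tests")` would opt in to the gate, while `TaskConfig(name="write-tests")` would behave as before.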

Motivation

Agentic pipelines produce output that is difficult to verify deterministically. A lightweight, independent reviewer pass — run against the same files the worker produced — catches systematic gaps (missing tests, incomplete implementations, wrong output format) before the phase is considered complete, without requiring the worker to self-evaluate against the goal criterion.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@luisorofino luisorofino added the qa/skip-qa label (Automatically skip this PR for the next QA) May 26, 2026
@dd-octo-sts dd-octo-sts Bot added the ddev label May 26, 2026
datadog-datadog-prod-us1 Bot commented May 26, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 97.93%
Overall Coverage: 87.84% (+0.04%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 956672a

@luisorofino luisorofino changed the title Add goal option to phase task Add goal-based validation gate to AgenticPhase tasks May 26, 2026
dd-octo-sts Bot commented May 26, 2026

Validation Report

All 21 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and Codecov settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| qa-label | Validate the pull request declares whether it needs QA for the next Agent release | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |



codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 97.92627% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.14%. Comparing base (c6bddc5) to head (956672a).


luisorofino (Contributor, Author) commented

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 956672a0ae


the phase callbacks see only the bracketing before/after_goal_check events.
"""
log_dir = log_root / "goal_agent" / phase_id
log_path = log_dir / f"{task.name}.jsonl"

P2: Sanitize task names before building reviewer log paths

When a goal-enabled task name contains a path separator such as "setup/db", the config still validates (there is no pattern restriction on TaskConfig.name), but this constructs goal_agent/<phase>/setup/db.jsonl after only creating goal_agent/<phase>. AgentLogger then fails to open the file because the intermediate directory does not exist, so the whole phase fails before validation even runs for an otherwise valid flow. Either constrain task names to path-safe values or derive the log filename from a sanitized task name.
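One possible shape for the second option (deriving the filename from a sanitized task name) is sketched below. The `safe_log_name` and `goal_log_path` helpers and the character allowlist are hypothetical and not part of this PR; only the `goal_agent/<phase>/<task>.jsonl` layout comes from the snippet under review.

```python
import re
from pathlib import Path


def safe_log_name(task_name: str) -> str:
    """Map an arbitrary task name to a single path-safe filename stem."""
    # Replace anything outside a conservative allowlist, including "/" and "\\".
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", task_name).strip("._")
    # Fall back to a fixed stem if nothing usable is left.
    return stem or "task"


def goal_log_path(log_root: Path, phase_id: str, task_name: str) -> Path:
    """Build the reviewer log path without introducing extra directories."""
    log_dir = log_root / "goal_agent" / phase_id
    return log_dir / f"{safe_log_name(task_name)}.jsonl"
```

With this sketch, a task named "setup/db" would log to goal_agent/<phase>/setup_db.jsonl. Sanitizing can map two distinct task names to the same file, so constraining TaskConfig.name at validation time may still be the cleaner fix.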
