Add goal-based validation gate to AgenticPhase tasks by luisorofino · Pull Request #23840 · DataDog/integrations-core

luisorofino · 2026-05-26T15:35:43Z

What does this PR do?

Adds an optional, non-deterministic validation gate to AgenticPhase tasks. When a task declares a goal (or goal_path), a fresh independent reviewer agent runs after the worker finishes and checks whether the goal was met. If the check fails, the worker gets another pass with the reviewer's reason and a new reviewer checks again; this cycle repeats for up to max_goal_attempts reviewer runs in total (default: 5). On exhaustion the phase raises GoalAttemptsExhausted, which flows through the existing Phase.on_error path. Tasks without a goal are unaffected.
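A minimal sketch of the retry loop described above, assuming the worker and reviewer can be treated as plain callables. The names `GoalVerdict`, `run_worker`, and `run_reviewer` are hypothetical. Only `run_goal_loop`, `GoalAttemptsExhausted`, and `max_goal_attempts` come from this PR, and the real signatures in phases/goal.py may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class GoalAttemptsExhausted(Exception):
    """Raised when every allowed reviewer run has rejected the worker's output."""


@dataclass
class GoalVerdict:
    # Hypothetical shape of a reviewer result: pass/fail plus a reason.
    met: bool
    reason: str


def run_goal_loop(
    run_worker: Callable[[Optional[str]], None],
    run_reviewer: Callable[[], GoalVerdict],
    max_goal_attempts: int = 5,
) -> List[GoalVerdict]:
    """Alternate worker and reviewer until the goal is met or attempts run out.

    run_worker receives None on the first pass and the last reviewer reason on retries.
    run_reviewer is assumed to build a fresh reviewer with empty history on each call.
    """
    verdicts: List[GoalVerdict] = []
    feedback: Optional[str] = None
    for _ in range(max_goal_attempts):
        run_worker(feedback)
        verdict = run_reviewer()
        verdicts.append(verdict)
        if verdict.met:
            return verdicts
        # The worker only learns the reviewer's reason, never the goal itself.
        feedback = verdict.reason
    raise GoalAttemptsExhausted(
        f"goal not met after {max_goal_attempts} reviewer runs: {feedback}"
    )
```

Because `run_worker` only ever receives `None` or a reviewer's reason, the goal text stays inside the reviewer closure, which matches the "worker is blind to the goal" decision in this PR.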

Key design decisions:

Worker is blind to the goal. It only receives a generic suffix ("your work will be checked by an independent reviewer") appended to its task prompt. It learns the specific criterion only when a check fails, and only sees the reviewer's reason.
Reviewer is fresh every attempt. History is reset between attempts; it never sees prior reviewer turns.
Reviewer is read-only. It gets only the read_only=True subset of the parent agent's tools. ToolSpec gains a read_only flag for this, along with a filter_read_only() helper.
Reviewer uses the parent's provider with the default model/max_tokens. There are no user-visible knobs, and overrides declared on the parent AgentConfig are intentionally not forwarded.
Reviewer cannot spawn subagents. spawn_subagent is not a read-only tool, so it is filtered out automatically.
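The read-only filtering could look roughly like this. Only `ToolSpec`, `read_only`, `filter_read_only()`, and `spawn_subagent` are named in this PR. The other tool names are hypothetical, and making `read_only` a required field with no default is an assumed way to force every manifest entry to be annotated explicitly.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToolSpec:
    name: str
    # No default value: every manifest entry must state whether it is read-only.
    read_only: bool


def filter_read_only(tools: Iterable[ToolSpec]) -> List[ToolSpec]:
    """Return only the tools flagged read_only=True, preserving their order."""
    return [tool for tool in tools if tool.read_only]


# Hypothetical parent manifest; the real tool names and fields may differ.
PARENT_TOOLS = [
    ToolSpec("read_file", read_only=True),
    ToolSpec("grep", read_only=True),
    ToolSpec("write_file", read_only=False),
    ToolSpec("spawn_subagent", read_only=False),
]

# The reviewer gets the filtered subset, so spawn_subagent drops out automatically.
reviewer_tools = filter_read_only(PARENT_TOOLS)
```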

Files changed:

phases/config.py — TaskConfig gains goal, goal_path, and max_goal_attempts fields with validators.
tools/registry.py — ToolSpec gains read_only: bool; all manifest entries are explicitly annotated; new filter_read_only() helper.
phases/goal.py (new) — Reviewer system prompt, exceptions, helper functions, and run_goal_loop().
agent/build.py — build_goal_agent() and make_goal_agent_builder().
phases/agentic_phase.py — goal_agent_builder param, _compact_if_needed() helper, goal loop integration in run_tasks(), goal_validations surfaced in the success checkpoint.
callbacks/callbacks.py — OnBeforeGoalCheckCallback and OnAfterGoalCheckCallback with matching fire_* methods on CallbackSet and Callbacks.
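A sketch of the kind of validation `TaskConfig` might perform, written as a stdlib dataclass for self-containment. The `prompt` field, the `resolve_goal()` helper, and both validation rules (`goal` and `goal_path` are mutually exclusive, `max_goal_attempts >= 1`) are assumptions; the PR only states that the three fields come with validators.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class TaskConfig:
    name: str
    prompt: str  # hypothetical field, included only to make the example realistic
    goal: Optional[str] = None
    goal_path: Optional[Path] = None
    max_goal_attempts: int = 5

    def __post_init__(self) -> None:
        # Assumed rule: a goal is given inline or by file, never both.
        if self.goal is not None and self.goal_path is not None:
            raise ValueError("set either goal or goal_path, not both")
        # Assumed rule: at least one reviewer run must be allowed.
        if self.max_goal_attempts < 1:
            raise ValueError("max_goal_attempts must be >= 1")

    def resolve_goal(self) -> Optional[str]:
        """Return the goal text, reading goal_path if needed; None means no gate."""
        if self.goal is not None:
            return self.goal
        if self.goal_path is not None:
            return Path(self.goal_path).read_text(encoding="utf-8")
        return None
```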

Motivation

Agentic pipelines produce output that is difficult to verify deterministically. A lightweight, independent reviewer pass, run against the same files the worker produced, catches systematic gaps (missing tests, incomplete implementations, wrong output format) before the phase is considered complete, without requiring the worker to self-evaluate against the goal criterion.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

datadog-datadog-prod-us1 · 2026-05-26T15:36:54Z

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 97.93%
• Overall Coverage: 87.84% (+0.04%)

🔗 Commit SHA: 956672a

dd-octo-sts · 2026-05-26T16:23:30Z

Validation Report

All 21 validations passed.


Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and Codecov settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`qa-label`	Validate the pull request declares whether it needs QA for the next Agent release	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅


codecov · 2026-05-26T16:45:32Z

Codecov Report

❌ Patch coverage is 97.92627% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.14%. Comparing base (c6bddc5) to head (956672a).


luisorofino · 2026-05-27T08:41:21Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 956672a0ae


chatgpt-codex-connector · 2026-05-27T08:44:17Z

+    the phase callbacks see only the bracketing before/after_goal_check events.
+    """
+    log_dir = log_root / "goal_agent" / phase_id
+    log_path = log_dir / f"{task.name}.jsonl"


Sanitize task names before building reviewer log paths

When a goal-enabled task name contains a path separator such as "setup/db", the config still validates (there is no pattern restriction on TaskConfig.name), but this constructs goal_agent/<phase>/setup/db.jsonl after only creating goal_agent/<phase>. AgentLogger then fails to open the file because the intermediate directory does not exist, so the whole phase fails before validation even runs for an otherwise valid flow. Either constrain task names to path-safe values or derive the log filename from a sanitized task name.
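One way to follow the sanitization option is sketched here; `goal_log_path()` is a hypothetical helper, not code from the PR. It maps any character outside a conservative set to `_`, so a name like `setup/db` becomes the single component `setup_db.jsonl`, and it creates the log directory before returning the path.

```python
import re
from pathlib import Path


def goal_log_path(log_root: Path, phase_id: str, task_name: str) -> Path:
    """Build goal_agent/<phase_id>/<safe task name>.jsonl and create its directory."""
    # Replace anything outside [A-Za-z0-9._-] (including "/" and "\\") with "_",
    # then strip leading/trailing dots so names like ".." cannot escape log_dir.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", task_name).strip(".") or "task"
    log_dir = log_root / "goal_agent" / phase_id
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir / f"{safe_name}.jsonl"
```

Sanitizing can make distinct names collide (`setup/db` and `setup_db` map to the same file); constraining `TaskConfig.name` with a pattern validator avoids that, at the cost of rejecting names that currently validate.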


luisorofino added the qa/skip-qa label May 26, 2026

dd-octo-sts Bot added the ddev label May 26, 2026

luisorofino changed the title ~~Add goal option to phase task~~ Add goal-based validation gate to AgenticPhase tasks May 26, 2026

Add goal option to phase task

11dba42

luisorofino force-pushed the loa/goal-task branch from eab3f8d to 11dba42 Compare May 26, 2026 15:48

Little nits

956672a

chatgpt-codex-connector Bot reviewed May 27, 2026


luisorofino wants to merge 2 commits into loa/subagent-tool from loa/goal-task
