
feat: Add evaluations support to ManagedAgent.run() #153

Draft
jsonbailey wants to merge 1 commit into jb/aic-2174/langchain-graph-runner from jb/aic-2174/agent-evaluations

Conversation

@jsonbailey
Contributor

Summary

  • Wires judge evaluations into ManagedAgent.run() via an asyncio.Task, mirroring ManagedModel.run() (PR 7 / PR 8); see the sketch after this list
  • run() returns immediately; awaiting result.evaluations guarantees both evaluation and tracker.track_judge_result() complete
  • Uses ai_config.evaluator.evaluate(input, content) — resolves to an empty list with Evaluator.noop()
  • Failed judge results (success=False) do NOT trigger track_judge_result()
  • Adds 6 new tests covering the full evaluations contract
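
For context, a minimal sketch of how this wiring might look. Only ManagedAgent.run(), result.evaluations, ai_config.evaluator.evaluate(input, content), tracker.track_judge_result(), and the success=False rule come from this PR; AgentResult, _invoke_agent, and the other names are hypothetical, and the real implementation may be structured differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentResult:
    # Field names here are assumptions; only `result.evaluations` is named in the PR.
    content: str
    evaluations: Optional[asyncio.Task] = None


class ManagedAgent:
    def __init__(self, ai_config, tracker):
        self._ai_config = ai_config
        self._tracker = tracker

    async def run(self, input: str) -> AgentResult:
        content = await self._invoke_agent(input)
        result = AgentResult(content=content)
        # Schedule evaluations on a Task so run() returns immediately; awaiting
        # result.evaluations later guarantees evaluation and tracking completed.
        result.evaluations = asyncio.create_task(
            self._evaluate_and_track(input, content)
        )
        return result

    async def _invoke_agent(self, input: str) -> str:
        # Placeholder for the real agent invocation.
        return f"agent response to: {input}"

    async def _evaluate_and_track(self, input: str, content: str):
        # With Evaluator.noop() this resolves to an empty list.
        judge_results = await self._ai_config.evaluator.evaluate(input, content)
        for judge_result in judge_results:
            # Failed judge results (success=False) are not tracked.
            if judge_result.success:
                self._tracker.track_judge_result(judge_result)
        return judge_results
```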

Depends on

Test plan

  • All existing tests pass (uv run pytest packages/sdk/server-ai/tests/)
  • New TestManagedAgentEvaluations tests cover: run() returns before evaluations resolve, results can be collected, tracking fires on await, the noop evaluator returns an empty list, and failed results are not tracked (one case is sketched below)
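
A hedged sketch of one of those cases, written against the ManagedAgent sketch above; it assumes pytest-asyncio and mock-based stubs, which may differ from the real fixtures in the test module.

```python
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock

import pytest


@pytest.mark.asyncio
async def test_run_returns_before_evaluations_resolve():
    # Stub evaluator returning one successful judge result, and a mock tracker.
    judge_result = SimpleNamespace(success=True)
    ai_config = SimpleNamespace(
        evaluator=SimpleNamespace(evaluate=AsyncMock(return_value=[judge_result]))
    )
    tracker = MagicMock()

    agent = ManagedAgent(ai_config, tracker)
    result = await agent.run("hello")

    # run() has returned, but tracking only happens once the evaluations
    # task is awaited.
    assert result.content
    tracker.track_judge_result.assert_not_called()

    evaluations = await result.evaluations

    assert evaluations == [judge_result]
    tracker.track_judge_result.assert_called_once_with(judge_result)
```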

🤖 Generated with Claude Code

Wire judge evaluations into ManagedAgent.run() via an asyncio.Task, mirroring
ManagedModel.run(). Awaiting result.evaluations guarantees both evaluation and
tracker.track_judge_result() complete. run() returns immediately; the
evaluations task resolves asynchronously.
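
A caller-side usage sketch, reusing the ManagedAgent sketch above with a noop-style evaluator and a do-nothing tracker; only run() and result.evaluations come from the commit message, the stubs are illustrative.

```python
import asyncio
from types import SimpleNamespace


async def main() -> None:
    async def noop_evaluate(input: str, content: str) -> list:
        return []  # mirrors the Evaluator.noop() behavior described above

    ai_config = SimpleNamespace(evaluator=SimpleNamespace(evaluate=noop_evaluate))
    tracker = SimpleNamespace(track_judge_result=lambda jr: None)
    agent = ManagedAgent(ai_config, tracker)

    result = await agent.run("Summarize the incident report")
    print(result.content)                     # available immediately

    judge_results = await result.evaluations  # resolves asynchronously
    print(judge_results)                      # [] with the noop evaluator


if __name__ == "__main__":
    asyncio.run(main())
```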

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from 4f29d99 to 0ea4a04 on April 28, 2026 at 23:56
@jsonbailey changed the base branch from jb/aic-2388/enrich-metrics to jb/aic-2174/langchain-graph-runner on April 28, 2026 at 23:57