fix(frontend): re-enable full-page playground for evaluator workflows by ardaerzin · Pull Request #4474 · Agenta-AI/agenta

ardaerzin · 2026-05-28T11:12:40Z

Summary

PR #4384 disabled EVALUATOR_FULL_PAGE_NAV_ENABLED because the app-style playground was a regression for evaluators (lost the upstream-app connection) and app-scoped observability defaulted to "invocation" instead of "annotation" for evaluator workflows. This change addresses both blockers and re-enables the flow by default.

Playground

added app chaining for evaluator workflows
minor UI fixes

Observability

fixed and improved filtering for evaluator workflows

QA follow-up

Full app-pages router tests for evaluator workflows, and a check against the reasons we disabled this feature after its initial release

Demo

Checklist

I have included a video or screen recording for UI changes, or marked Demo as N/A
Relevant tests pass locally
Relevant linting and formatting pass locally
I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

PR #4384 disabled EVALUATOR_FULL_PAGE_NAV_ENABLED because the app-style playground was a regression for evaluators (lost the upstream-app connection) and app-scoped observability defaulted to "invocation" instead of "annotation" for evaluator workflows. This change addresses both blockers and re-enables the flow by default.

Playground
- ConfigureEvaluatorPage: upstream app workflow can be connected via EntityPicker (skip-variant adapter, filtered to non-evaluator non-feedback workflows). Disconnect affordance on the picker trigger and as a popup footer.
- Standalone evaluator runs no longer require an upstream app (TestsetDropdown is always available; runDisabled gate removed).
- Playground chain traces now write evaluator references (evaluator / evaluator_variant / evaluator_revision slots) so the per-evaluator observability page can find them.
- EntityPicker search bar respects a new parentLabel option so app pickers no longer show "Search evaluator..."

Observability filters
- Per-workflow-kind trace_type default extracted into @agenta/entities (defaultTraceTypeForWorkflow): annotation for evaluators, invocation otherwise. Pure helper unit-tested with vitest.
- References scope filter adapts to the effective trace_type: evaluators with trace_type=annotation pin to references.evaluator, invocation pins to references.application, and "no trace_type" ORs across both slots so all traces mentioning the evaluator surface.
- Dialog reconciliation: live label flip while editing trace_type in the filter dialog ("Application ID" / "Evaluator ID") via an opt-in reconcileFilterRows callback on Filters; observability page provides an evaluator-workflow-aware reconciler.
- Filter persistence across reloads: per-app via atomWithStorage under "agenta:observability:filters", with __global__ fallback for project-level pages. Both userFilters and traceTypeChoice share one packed storage atom.
- Cleaner state machine for trace_type intent: tagged union (default / value / cleared) replaces the dual-atom dance that could silently revert.
- application_id URL param dropped for evaluator workflows; the query is gated on workflow context being settled to avoid firing with the wrong scope.

Tests
- vitest unit tests for defaultTraceTypeForWorkflow.
- Playwright acceptance for full-page playground: post-create nav, row click for LLM and declarative evaluators, direct URL, sidebar switcher; fixes the previously broken select-app-and-run test for the new flow.
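The per-workflow-kind default and the tagged-union intent described in this commit message can be pictured with a small sketch. It is an illustration under assumptions, not the code from `@agenta/entities` or `controls.ts`: the `WorkflowKind` type, the names `sketchDefaultTraceType` and `resolveEffectiveTraceType`, and the payload of the `value` variant are hypothetical. Only the three kinds (default / value / cleared) and the "annotation for evaluators, invocation otherwise" rule come from the text above.

```typescript
type TraceType = "invocation" | "annotation"

// Hypothetical: the real workflow-kind representation is not shown in this PR page.
type WorkflowKind = "app" | "evaluator"

// Tagged union for the user's trace_type intent (kinds taken from the commit message).
type TraceTypeChoice =
    | {kind: "default"} // user has not chosen; follow the workflow default
    | {kind: "value"; value: TraceType} // user picked an explicit value
    | {kind: "cleared"} // user explicitly removed the filter

// Assumed rule: annotation for evaluators, invocation otherwise.
function sketchDefaultTraceType(kind: WorkflowKind): TraceType {
    return kind === "evaluator" ? "annotation" : "invocation"
}

// Derive the trace_type that should actually be applied to the query.
function resolveEffectiveTraceType(
    choice: TraceTypeChoice,
    kind: WorkflowKind,
): TraceType | null {
    switch (choice.kind) {
        case "default":
            return sketchDefaultTraceType(kind)
        case "value":
            return choice.value
        case "cleared":
            return null
    }
}
```

In a shape like this, "the user has not chosen" and "the user removed the filter" are distinct states, which is one plausible way to avoid a cleared filter being replaced by the default again.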

vercel · 2026-05-28T11:12:46Z

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	May 29, 2026 12:48pm

coderabbitai · 2026-05-28T11:12:49Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b479207d-0410-4d83-af93-b4dfc2944ce8

Walkthrough

This PR enables Phase 5 evaluator full-page playground navigation. It flips EVALUATOR_FULL_PAGE_NAV_ENABLED to true and updates routing, state management, and observability filter behavior to support evaluators as first-class playground entities with full-page rendering, app connection controls, and workflow-aware trace type defaults.

Changes

Evaluator Full-Page Navigation & Observability Integration

Layer / File(s)	Summary
Trace type defaults helper and workflow schema extensions `web/packages/agenta-entities/src/workflow/core/traceTypeDefault.ts`, `web/packages/agenta-entities/src/workflow/core/schema.ts`, `web/packages/agenta-entities/src/workflow/core/index.ts`, `web/packages/agenta-entities/src/workflow/index.ts`, `web/packages/agenta-entities/tests/unit/traceTypeDefault.test.ts`	New module defines soft-default `trace_type` behavior (`"annotation"` for evaluator/traces, `"invocation"` for app workflows, `null` otherwise); schema docs clarify `workflow_slug`/`workflow_variant_slug`; exports and tests added.
Feature flag and router navigation `web/oss/src/state/workflow/flags.ts`, `web/oss/src/components/PlaygroundRouter/index.tsx`	`EVALUATOR_FULL_PAGE_NAV_ENABLED` flips to `true` and `PlaygroundRouter` conditionally renders `ConfigureEvaluatorPage` when `workflowKind` is `"evaluator"` (excluding feedback evaluators).
App connection state management `web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts`	`selectedAppLabelAtom` becomes derived from node graph depth; `connectAppToEvaluatorAtom` persists only after graph mutations succeed; new `disconnectAppFromEvaluatorAtom` clears selection and removes downstream node.
Evaluator header UI with app controls `web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx`	Adds app disconnect button (in popover footer and as icon), manages disconnect callback, and renders `TestsetDropdown` unconditionally with updated rationale.
ConfigureEvaluatorPage for full-page mode `web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx`	Removes run-disabled gating and inline app-picker prompt; wires `handleAppSelect` for app connection; sets `parentLabel: "Application"` on workflow adapter.
Evaluators registry row-click simplified `web/oss/src/components/Evaluators/index.tsx`	Removes `hasFullPagePlaygroundUX` predicate; routes non-archived evaluators directly to full-page when flag enabled and `workflowId` present.
Drawer navigation and post-create flow `web/oss/src/components/WorkflowRevisionDrawerWrapper/index.tsx`, `web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx`	Restores persisted app via `connectApp` (not direct label writes); simplifies post-create eligibility to flag + presence checks; adds `parentLabel: "Application"` to drawer configs.
Sidebar evaluator switcher gating `web/oss/src/components/Sidebar/components/WorkflowEntityCard.tsx`	Uses `nonArchivedEvaluatorsAtom` directly gated by feature flag instead of removed `fullPagePlaygroundEvaluatorsAtom`.
Trace type state persistence and derivation `web/oss/src/state/newObservability/atoms/controls.ts`	Introduces `TraceTypeChoice` (`default`/`value`/`cleared`) and `effectiveTraceTypeAtomFamily` derived from stored choice plus workflow defaults; persists per-app/per-tab in `filtersByAppAtom`.
Filter regeneration and scope composition `web/oss/src/state/newObservability/atoms/controls.ts`, `web/oss/src/state/newObservability/atoms/queries.ts`	`filtersAtomFamily` regenerates permanent scope filter (with evaluator-specific reference mapping), appends derived `trace_type` row, then user filters; `tracesQueryAtom` uses `effectiveAppId` and blocks while resolving.
Filter UI reconciliation for evaluators `web/oss/src/components/Filters/Filters.tsx`, `web/oss/src/components/Filters/types.d.ts`, `web/oss/src/components/pages/observability/assets/filters/fieldAdapter.ts`, `web/oss/src/components/pages/observability/components/ObservabilityHeader/index.tsx`	`Filters.tsx` adds `reconcileFilterRows` prop for display-only projection; field adapter adds `referenceCategory` and de-duplicates values; `ObservabilityHeader` implements reconciler to remap reference categories based on derived trace_type for evaluator workflows.
Workflow adapter labeling `web/packages/agenta-entity-ui/src/selection/adapters/workflowRevisionRelationAdapter.ts`, `web/oss/src/components/Playground/Components/PlaygroundVariantConfig/assets/PlaygroundVariantConfigHeader.tsx`	`CreateWorkflowRevisionAdapterOptions` accepts `parentLabel` (defaults `"Evaluator"`); applied as `"Application"` in evaluator contexts to customize UI labels and messages.
Evaluator trace reference construction `web/packages/agenta-playground/src/state/execution/executionRunner.ts`	Adds `buildEvaluatorSelfReferences` helper to construct `references.evaluator*` fields; merges with upstream references for non-root execution stages.
Playwright test coverage `web/oss/tests/playwright/acceptance/evaluators/tests.ts`, `web/oss/tests/playwright/acceptance/evaluators/index.ts`	Exports new test ID constants and LLM-as-a-judge template name; rewrites playground test for full-page flow; adds comprehensive acceptance tests for post-create/row-click navigation, direct URLs, and sidebar switcher.
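The evaluator-specific reference mapping mentioned in the filter regeneration row follows the rule stated in the commit message: for evaluators, annotation pins to `references.evaluator`, invocation pins to `references.application`, and no trace_type ORs across both. A minimal sketch of that decision is below. The function name `scopeSlotsFor`, its parameters, and the treatment of non-evaluator workflows are assumptions made for illustration; the actual `filtersAtomFamily` code is not shown on this page.

```typescript
type TraceType = "invocation" | "annotation"
type ReferenceSlot = "application" | "evaluator"

// Hypothetical helper: which reference slots the permanent scope filter should match on.
// More than one slot in the result means the slots are OR-ed together.
function scopeSlotsFor(
    isEvaluatorWorkflow: boolean,
    effectiveTraceType: TraceType | null,
): ReferenceSlot[] {
    // Assumption: non-evaluator workflows keep scoping by the application reference.
    if (!isEvaluatorWorkflow) return ["application"]

    if (effectiveTraceType === "annotation") return ["evaluator"]
    if (effectiveTraceType === "invocation") return ["application"]

    // No trace_type: match traces that mention the evaluator in either slot.
    return ["application", "evaluator"]
}
```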

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Agenta-AI/agenta#4384: Directly continues this PR's predecessor by flipping EVALUATOR_FULL_PAGE_NAV_ENABLED to true and implementing gated routing/navigation changes across PlaygroundRouter, EvaluatorsRegistry, WorkflowRevisionDrawerWrapper, and sidebar evaluator switcher.
Agenta-AI/agenta#4274: Touches evaluator/create flow wiring near WorkflowRevisionDrawerWrapper; changes are related to persisted app selection and drawer commit handling.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly and concisely describes the main change: re-enabling the full-page playground for evaluator workflows, which is the primary objective stated in the PR description and embodied across multiple file changes.
Description check	✅ Passed	The PR description is well-structured and directly related to the changeset, explaining the rationale (fixing regressions from PR `#4384`), the changes made (app chaining, UI fixes, improved filtering), and QA follow-up items.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts (1)

165-178: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Persisted app selection can get stale on failed connect/disconnect edge paths.

persistedAppSelectionAtom is written before the primary-node swap succeeds, and disconnect exits early without clearing persisted state when no downstream node is found. That can rehydrate an app selection that is no longer actually connected.

Proposed fix

 export const connectAppToEvaluatorAtom = atom(
@@
-        // Persist across sessions. The picker display label is derived from
-        // the depth-0 node's `label` via `selectedAppLabelAtom`, so no extra
-        // write needed here.
-        set(persistedAppSelectionAtom, {appRevisionId, appLabel})
-
         // Replace primary node with app
         const nodeId = set(playgroundController.actions.changePrimaryNode, {
             type: "workflow",
             id: appRevisionId,
             label: appLabel,
         })
 
         if (!nodeId) return
+        // Persist only after graph mutation succeeds.
+        set(persistedAppSelectionAtom, {appRevisionId, appLabel})
@@
 export const disconnectAppFromEvaluatorAtom = atom(null, (get, set) => {
     const nodes = get(playgroundController.selectors.nodes())
     const downstreamEvaluator = nodes.find((n) => n.depth > 0)
-    if (!downstreamEvaluator) return
+    if (!downstreamEvaluator) {
+        set(persistedAppSelectionAtom, null)
+        return
+    }

Also applies to: 208-225

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 0b9012d and 048d662.

📒 Files selected for processing (25)

web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx
web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts
web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx
web/oss/src/components/Evaluators/index.tsx
web/oss/src/components/Filters/Filters.tsx
web/oss/src/components/Filters/types.d.ts
web/oss/src/components/Playground/Components/PlaygroundVariantConfig/assets/PlaygroundVariantConfigHeader.tsx
web/oss/src/components/PlaygroundRouter/index.tsx
web/oss/src/components/Sidebar/components/WorkflowEntityCard.tsx
web/oss/src/components/WorkflowRevisionDrawerWrapper/index.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx
web/oss/src/components/pages/observability/assets/filters/fieldAdapter.ts
web/oss/src/components/pages/observability/components/ObservabilityHeader/index.tsx
web/oss/src/state/newObservability/atoms/controls.ts
web/oss/src/state/newObservability/atoms/queries.ts
web/oss/src/state/workflow/flags.ts
web/oss/tests/playwright/acceptance/evaluators/index.ts
web/oss/tests/playwright/acceptance/evaluators/tests.ts
web/packages/agenta-entities/src/workflow/core/index.ts
web/packages/agenta-entities/src/workflow/core/schema.ts
web/packages/agenta-entities/src/workflow/core/traceTypeDefault.ts
web/packages/agenta-entities/src/workflow/index.ts
web/packages/agenta-entities/tests/unit/traceTypeDefault.test.ts
web/packages/agenta-entity-ui/src/selection/adapters/workflowRevisionRelationAdapter.ts
web/packages/agenta-playground/src/state/execution/executionRunner.ts

CodeRabbit flagged 5 issues on the evaluator-full-page rollout PR. This commit addresses each:

1. PlaygroundRouter: `is_feedback` evaluators skip the full-page swap. `workflowKind === "evaluator"` was too broad. Human/feedback evaluators are drawer-only in /evaluators (they capture human input, they don't run), so routing them to ConfigureEvaluatorPage produced a run-controls UI for a workflow with nothing to run. Added a `flags.is_feedback` exclusion next to the workflowKind check.
2. Sidebar: switcher filters out `is_feedback` evaluators. `nonArchivedEvaluatorsAtom` only filters by `deleted_at` and includes human evaluators; the switcher was exposing entries that, when clicked, would land on the (now-correctly-gated) generic <Playground /> for a feedback workflow. Filtered the list at the switcher boundary.
3. controls.ts: handle array-valued `trace_type` for in/not_in. The dialog dispatches `{operator: "in", value: ["annotation"]}` for the IN operator family, but the intent setter only normalized scalars, so the user's choice was silently dropped to `{kind: "cleared"}`. Normalize to an array, filter to enum values, and collapse single-value arrays back to a scalar. Multi-value selections (which mean "no filter" for a 2-value enum) still map to `cleared`.
4. Playwright: drop stale `[data-row-key]` poll in select-app-and-run. The test asserted post-create navigation to /apps/<id>/playground AFTER polling for the new row in the evaluators table, but the redirect wins first, the table disappears, and the poll became a timing-dependent failure. Removed the registry-side wait; evaluator-in-registry assertion is covered by the post-create-row-click test alongside.
5. ConfigureEvaluator/atoms.ts: fix persistedAppSelectionAtom race. `connectAppToEvaluatorAtom` persisted the app selection BEFORE `changePrimaryNode` ran, so a failed swap (returns `null` with no primary to swap from) left a stale localStorage record that the next mount re-hydrated into a phantom "connected" state. Moved the persist call to after both graph mutations succeed. `disconnectAppFromEvaluatorAtom` early-returned on no-downstream without clearing the persisted state, allowing the same phantom record to survive a disconnect attempt. Clear it on that branch too.

No behavior change for the happy-path full-page flow; these all narrow edge cases the reviewer flagged.
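The normalization described in item 3 (wrap into an array, keep only enum values, collapse a single survivor back to a scalar, treat anything else as cleared) could look roughly like the sketch below. The function name `choiceFromFilterValue`, the `isTraceType` guard, and the payload of the `value` variant are hypothetical, and operator handling (`in` versus `not_in`) is deliberately left out; this is not the setter from `controls.ts`.

```typescript
type TraceType = "invocation" | "annotation"

type TraceTypeChoice =
    | {kind: "default"}
    | {kind: "value"; value: TraceType}
    | {kind: "cleared"}

const TRACE_TYPES: readonly string[] = ["invocation", "annotation"]

// Type guard: keeps only values that belong to the trace_type enum.
function isTraceType(v: unknown): v is TraceType {
    return typeof v === "string" && TRACE_TYPES.includes(v)
}

// Hypothetical setter helper: turn a raw filter value (scalar or array) into an intent.
function choiceFromFilterValue(raw: unknown): TraceTypeChoice {
    // Normalize scalars and arrays to one array shape.
    const candidates: unknown[] = Array.isArray(raw) ? raw : [raw]
    const [first, ...rest] = candidates.filter(isTraceType)

    // Exactly one valid value collapses back to a scalar choice.
    if (first !== undefined && rest.length === 0) {
        return {kind: "value", value: first}
    }
    // Zero valid values, or several (no effective filter for a 2-value enum).
    return {kind: "cleared"}
}
```

With this shape, `{operator: "in", value: ["annotation"]}` and a plain `"annotation"` scalar would end up as the same stored choice.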

coderabbitai · 2026-05-28T13:20:07Z

Actionable comments posted: 0

…ssion-fix Resolves a single conflict in `web/packages/agenta-entities/src/workflow/core/schema.ts` — release v0.100.4 added `artifact_slug` / `variant_slug` to the revision schema alongside the `workflow_slug` / `workflow_variant_slug` fields this branch had introduced for emitting evaluator references on playground chain runs. Both sides added `workflow_slug` and `workflow_variant_slug` with overlapping intent; resolution keeps all four fields and merges the two doc comments into one that covers both purposes (parent-workflow identification for ID-less callers + evaluator chain-trace emission). No source behavior change — schema is additive on both sides.

coderabbitai · 2026-05-28T13:44:50Z

Actionable comments posted: 0

github-actions · 2026-05-28T13:50:11Z

Railway Preview Environment


Preview URL	https://gateway-production-e120.up.railway.app/w
Image tag	`pr-4474-0748c8b`
Status	Failed
Railway logs	Open logs
Logs	View workflow run
Updated at 2026-05-29T12:55:30.139Z

ardaerzin marked this pull request as ready for review May 28, 2026 11:20

dosubot Bot added size:XXL (This PR changes 1000+ lines, ignoring generated files) and Frontend labels May 28, 2026

coderabbitai Bot reviewed May 28, 2026

vercel Bot deployed to Preview May 28, 2026 13:16 View deployment

vercel Bot deployed to Preview May 28, 2026 13:41 View deployment

Merge branch 'main' into fe-fix/app-workflow-router-unification-regre…

78aef13

…ssion-fix

vercel Bot deployed to Preview May 29, 2026 09:02 View deployment

junaway changed the base branch from main to release/v0.100.7 May 29, 2026 12:46

Merge branch 'release/v0.100.7' into fe-fix/app-workflow-router-unifi…

ed4f1e1

…cation-regression-fix

vercel Bot deployed to Preview May 29, 2026 12:48 View deployment

ardaerzin wants to merge 5 commits into release/v0.100.7 from fe-fix/app-workflow-router-unification-regression-fix

2 participants
