fix(frontend): re-enable full-page playground for evaluator workflows#4474
fix(frontend): re-enable full-page playground for evaluator workflows#4474ardaerzin wants to merge 5 commits into
Conversation
PR #4384 disabled EVALUATOR_FULL_PAGE_NAV_ENABLED because the app-style playground was a regression for evaluators (lost the upstream-app connection) and app-scoped observability defaulted to "invocation" instead of "annotation" for evaluator workflows. This change addresses both blockers and re-enables the flow by default. Playground - ConfigureEvaluatorPage: upstream app workflow can be connected via EntityPicker (skip-variant adapter, filtered to non-evaluator non-feedback workflows). Disconnect affordance on the picker trigger and as a popup footer. - Standalone evaluator runs no longer require an upstream app (TestsetDropdown is always available; runDisabled gate removed). - Playground chain traces now write evaluator references (evaluator / evaluator_variant / evaluator_revision slots) so the per-evaluator observability page can find them. EntityPicker search bar respects a new parentLabel option so app pickers no longer show "Search evaluator..." Observability filters - Per-workflow-kind trace_type default extracted into @agenta/entities (defaultTraceTypeForWorkflow): annotation for evaluators, invocation otherwise. Pure helper unit-tested with vitest. - References scope filter adapts to the effective trace_type: evaluators with trace_type=annotation pin to references.evaluator, invocation pins to references.application, and "no trace_type" ORs across both slots so all traces mentioning the evaluator surface. - Dialog reconciliation: live label flip while editing trace_type in the filter dialog ("Application ID" / "Evaluator ID") via an opt-in reconcileFilterRows callback on Filters; observability page provides an evaluator-workflow-aware reconciler. - Filter persistence across reloads: per-app via atomWithStorage under "agenta:observability:filters", with __global__ fallback for project-level pages. Both userFilters and traceTypeChoice share one packed storage atom. - Cleaner state machine for trace_type intent: tagged union (default / value / cleared) replaces the dual-atom dance that could silently revert. - application_id URL param dropped for evaluator workflows; the query is gated on workflow context being settled to avoid firing with the wrong scope. Tests - vitest unit tests for defaultTraceTypeForWorkflow. - Playwright acceptance for full-page playground: post-create nav, row click for LLM and declarative evaluators, direct URL, sidebar switcher; fixes the previously broken select-app-and-run test for the new flow.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR enables Phase 5 evaluator full-page playground navigation. It flips ChangesEvaluator Full-Page Navigation & Observability Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts (1)
165-178:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPersisted app selection can get stale on failed connect/disconnect edge paths.
persistedAppSelectionAtomis written before the primary-node swap succeeds, and disconnect exits early without clearing persisted state when no downstream node is found. That can rehydrate an app selection that is no longer actually connected.Proposed fix
export const connectAppToEvaluatorAtom = atom( @@ - // Persist across sessions. The picker display label is derived from - // the depth-0 node's `label` via `selectedAppLabelAtom`, so no extra - // write needed here. - set(persistedAppSelectionAtom, {appRevisionId, appLabel}) - // Replace primary node with app const nodeId = set(playgroundController.actions.changePrimaryNode, { type: "workflow", id: appRevisionId, label: appLabel, }) if (!nodeId) return + // Persist only after graph mutation succeeds. + set(persistedAppSelectionAtom, {appRevisionId, appLabel}) @@ export const disconnectAppFromEvaluatorAtom = atom(null, (get, set) => { const nodes = get(playgroundController.selectors.nodes()) const downstreamEvaluator = nodes.find((n) => n.depth > 0) - if (!downstreamEvaluator) return + if (!downstreamEvaluator) { + set(persistedAppSelectionAtom, null) + return + }Also applies to: 208-225
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ce60569f-f33c-480b-a472-4ceb822d0b1e
📒 Files selected for processing (25)
web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsxweb/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.tsweb/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsxweb/oss/src/components/Evaluators/index.tsxweb/oss/src/components/Filters/Filters.tsxweb/oss/src/components/Filters/types.d.tsweb/oss/src/components/Playground/Components/PlaygroundVariantConfig/assets/PlaygroundVariantConfigHeader.tsxweb/oss/src/components/PlaygroundRouter/index.tsxweb/oss/src/components/Sidebar/components/WorkflowEntityCard.tsxweb/oss/src/components/WorkflowRevisionDrawerWrapper/index.tsxweb/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsxweb/oss/src/components/pages/observability/assets/filters/fieldAdapter.tsweb/oss/src/components/pages/observability/components/ObservabilityHeader/index.tsxweb/oss/src/state/newObservability/atoms/controls.tsweb/oss/src/state/newObservability/atoms/queries.tsweb/oss/src/state/workflow/flags.tsweb/oss/tests/playwright/acceptance/evaluators/index.tsweb/oss/tests/playwright/acceptance/evaluators/tests.tsweb/packages/agenta-entities/src/workflow/core/index.tsweb/packages/agenta-entities/src/workflow/core/schema.tsweb/packages/agenta-entities/src/workflow/core/traceTypeDefault.tsweb/packages/agenta-entities/src/workflow/index.tsweb/packages/agenta-entities/tests/unit/traceTypeDefault.test.tsweb/packages/agenta-entity-ui/src/selection/adapters/workflowRevisionRelationAdapter.tsweb/packages/agenta-playground/src/state/execution/executionRunner.ts
CodeRabbit flagged 5 issues on the evaluator-full-page rollout PR.
This commit addresses each:
1. PlaygroundRouter — `is_feedback` evaluators skip the full-page swap.
`workflowKind === "evaluator"` was too broad. Human/feedback
evaluators are drawer-only in /evaluators (they capture human input,
they don't run), so routing them to ConfigureEvaluatorPage produced
a run-controls UI for a workflow with nothing to run. Added a
`flags.is_feedback` exclusion next to the workflowKind check.
2. Sidebar — switcher filters out `is_feedback` evaluators.
`nonArchivedEvaluatorsAtom` only filters by `deleted_at` and
includes human evaluators; the switcher was exposing entries that,
when clicked, would land on the (now-correctly-gated) generic
<Playground /> for a feedback workflow. Filtered the list at the
switcher boundary.
3. controls.ts — handle array-valued `trace_type` for in/not_in.
The dialog dispatches `{operator: "in", value: ["annotation"]}` for
the IN operator family, but the intent setter only normalized
scalars — so the user's choice was silently dropped to
`{kind: "cleared"}`. Normalize to an array, filter to enum values,
and collapse single-value arrays back to a scalar. Multi-value
selections (which mean "no filter" for a 2-value enum) still map
to `cleared`.
4. Playwright — drop stale `[data-row-key]` poll in select-app-and-run.
The test asserted post-create navigation to /apps/<id>/playground
AFTER polling for the new row in the evaluators table — but the
redirect wins first, the table disappears, and the poll became a
timing-dependent failure. Removed the registry-side wait;
evaluator-in-registry assertion is covered by the
post-create-row-click test alongside.
5. ConfigureEvaluator/atoms.ts — fix persistedAppSelectionAtom race.
`connectAppToEvaluatorAtom` persisted the app selection BEFORE
`changePrimaryNode` ran, so a failed swap (returns `null` with no
primary to swap from) left a stale localStorage record that the
next mount re-hydrated into a phantom "connected" state. Moved the
persist call to after both graph mutations succeed.
`disconnectAppFromEvaluatorAtom` early-returned on no-downstream
without clearing the persisted state, allowing the same phantom
record to survive a disconnect attempt. Clear it on that branch
too.
No behavior change for the happy-path full-page flow — these all
narrow edge cases the reviewer flagged.
|
Actionable comments posted: 0 |
…ssion-fix Resolves a single conflict in `web/packages/agenta-entities/src/workflow/core/schema.ts` — release v0.100.4 added `artifact_slug` / `variant_slug` to the revision schema alongside the `workflow_slug` / `workflow_variant_slug` fields this branch had introduced for emitting evaluator references on playground chain runs. Both sides added `workflow_slug` and `workflow_variant_slug` with overlapping intent; resolution keeps all four fields and merges the two doc comments into one that covers both purposes (parent-workflow identification for ID-less callers + evaluator chain-trace emission). No source behavior change — schema is additive on both sides.
|
Actionable comments posted: 0 |
Railway Preview Environment
|
…cation-regression-fix
Summary
PR #4384 disabled EVALUATOR_FULL_PAGE_NAV_ENABLED because the app-style playground was a regression for evaluators (lost the upstream-app connection) and app-scoped observability defaulted to "invocation" instead of "annotation" for evaluator workflows. This change addresses both blockers and re-enables the flow by default.
Playground
Observability
QA follow-up
Demo
Checklist
Contributor Resources