
refactor(results): remove public trace artifact surface#1526

Merged
christso merged 1 commit into main from refactor/av-9ly-remove-public-trace-artifact on Jun 26, 2026

@christso christso commented Jun 26, 2026

Collaborator

Summary

Bead: av-9ly

AgentV result bundles no longer advertise or persist trace.json as a public result artifact. New eval, export, combine, validation, metrics, and docs paths now use the transcript, metrics, result, answer, grading, timing, and external_trace surfaces as the supported inspection contract. The dashboard still resolves transcripts from transcript_path and renders per-case results normally.

The remaining trace-envelope code is kept where it still projects transcript/result read models or handles legacy read-only inputs. Old trace_path data is rejected from new/public indexes and stripped during combine so copied legacy bundles do not re-publish the removed surface.
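
A minimal sketch of the combine-time stripping described above, assuming a simplified index-entry shape. The IndexEntry type and the stripLegacyTraceFields name are hypothetical and are not taken from the AgentV codebase; only the field names trace_path, transcript_path, and artifact_pointers.trace come from this PR.

```typescript
// Hypothetical index-entry shape, for illustration only.
interface IndexEntry {
  [key: string]: unknown;
  transcript_path?: string;
  trace_path?: string;
  artifact_pointers?: Record<string, unknown>;
}

// Returns a copy of the entry without the removed public trace surface.
// The input object is not mutated.
function stripLegacyTraceFields(entry: IndexEntry): IndexEntry {
  // Drop the top-level trace_path and keep every other field.
  const { trace_path: _tracePath, artifact_pointers, ...rest } = entry;
  if (artifact_pointers === undefined) {
    return rest;
  }
  // Drop only the trace pointer; other pointer families pass through.
  const { trace: _tracePointer, ...pointers } = artifact_pointers;
  return { ...rest, artifact_pointers: pointers };
}
```

Returning a copy rather than mutating in place keeps the legacy source bundle untouched, which fits the read-only compatibility stance taken for old inputs.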

Verification

bun test apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/results/combine.test.ts apps/cli/test/commands/results/export.test.ts apps/cli/test/commands/results/validate.test.ts apps/cli/test/commands/results/remote-auto-export.test.ts apps/cli/src/commands/results/serve-file-tree.test.ts packages/core/test/evaluation/results-repo.test.ts
# 166 pass, 0 fail

bun --filter @agentv/core build
bun --filter @agentv/sdk build
bun --filter agentv typecheck

bunx biome check .agents/conventions.md AGENTS.md ROADMAP.md apps/cli/src/commands/eval/artifact-writer.ts apps/cli/src/commands/results/combine-run.ts apps/cli/src/commands/results/projection-bundle.ts apps/cli/src/commands/results/serve-file-tree.test.ts apps/cli/src/commands/results/serve.ts apps/cli/src/commands/results/validate.ts apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/results/combine.test.ts apps/cli/test/commands/results/export.test.ts apps/cli/test/commands/results/remote-auto-export.test.ts apps/cli/test/commands/results/validate.test.ts apps/web/src/content/docs/docs/evaluation/running-evals.mdx apps/web/src/content/docs/docs/tools/import.mdx apps/web/src/content/docs/docs/tools/prepare.mdx apps/web/src/content/docs/docs/tools/results.mdx docs/adr/0003-keep-opik-export-as-post-run-adapter-over-agentv-result-bundles.md docs/adr/0008-normalized-transcript-artifact-contract.md packages/core/src/evaluation/metrics.ts packages/core/src/evaluation/result-artifact-contract.ts packages/core/src/evaluation/run-artifacts.ts packages/core/test/evaluation/results-repo.test.ts
# Checked 15 files. No fixes applied.

cd apps/dashboard && bun run build

bun apps/cli/src/cli.ts eval run /tmp/agentv-av-9ly-dogfood/dataset.eval.yaml --targets /tmp/agentv-av-9ly-dogfood/targets.yaml --target azure-live --grader-target azure-live --workers 1 --agent-timeout 90 --experiment av-9ly-dogfood --threshold 0
bun apps/cli/src/cli.ts results validate .agentv/results/av-9ly-dogfood/2026-06-26T08-49-03-415Z

Dashboard UAT used:

bun apps/cli/src/cli.ts dashboard --dir . --single --port 3127

Inspected with agent-browser at http://localhost:3127: run list, run detail, checks view, transcript tab, and file tree. The API returned metrics_path, transcript_path, and transcript_raw_path, with no trace fields; transcript rendering loaded from transcript.jsonl.
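
The API observation above can be expressed as a small check on a single run record. This is a sketch under assumptions: the RunRecord shape and the checkRunRecord name are hypothetical, and treating all three path fields as required is an assumption drawn from this one UAT session, not a documented contract.

```typescript
// Hypothetical flat run record as returned by the dashboard API.
type RunRecord = Record<string, unknown>;

// Path fields observed in the UAT response (assumed required here).
const EXPECTED_PATH_FIELDS = ["metrics_path", "transcript_path", "transcript_raw_path"] as const;
// Fields that should no longer appear on a public record.
const REMOVED_FIELDS = ["trace_path"] as const;

// Returns a list of human-readable problems; an empty list means the record passes.
function checkRunRecord(record: RunRecord): string[] {
  const problems: string[] = [];
  for (const field of EXPECTED_PATH_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value === "") {
      problems.push(`missing ${field}`);
    }
  }
  for (const field of REMOVED_FIELDS) {
    if (field in record) {
      problems.push(`unexpected ${field}`);
    }
  }
  return problems;
}
```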

Dogfood

  • Local OpenAI-compatible endpoint check returned 502 with "Provided authentication token is expired. Please try signing in again."
  • pi-cli via the local endpoint completed at .agentv/results/av-9ly-dogfood/pi-cli-local-2026-06-26T08-46-50-011Z but scored 0%; the bundle had transcript/metrics surfaces and no trace.json.
  • codex-sdk via the local endpoint failed on /v1/responses with 401 Unauthorized, token_expired.
  • copilot-sdk via the local endpoint failed with provider authentication HTTP 401.
  • Supplemental live Azure provider plus live LLM grader passed 100% at .agentv/results/av-9ly-dogfood/2026-06-26T08-49-03-415Z; results validate passed and file/grep checks found no trace.json, trace_path, or artifact_pointers.trace.

Evidence

Private evidence is on EntityProcess/agentv-private branch evidence/av-9ly-remove-public-trace-artifact, commit 01dca48, path av-9ly/remove-public-trace-artifact/.

Key artifact: screenshots/dashboard-transcript-tab.png shows the dashboard transcript view populated from the new supported transcript surface.

Risks

  • Legacy trace-session serving is retained as read-only compatibility for old materialized inputs, but new bundles do not write trace_path or artifact_pointers.trace.
  • prepared_attempt.trace_path remains intentionally because it describes prepared trace input metadata, not public result artifact discovery.
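
The distinction in the second risk can be made concrete with a path-aware scan: trace_path is flagged everywhere except directly under prepared_attempt, and trace is flagged only directly under artifact_pointers. This is an illustrative sketch; findRemovedTraceFields and the Json type are hypothetical names, and the nesting of these keys in real AgentV documents is assumed.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Walks a parsed JSON document and returns dotted paths of removed trace fields.
// Array indices are omitted from the reported paths to keep them short.
function findRemovedTraceFields(value: Json, path: string[] = []): string[] {
  if (value === null || typeof value !== "object") {
    return [];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item) => findRemovedTraceFields(item, path));
  }
  const hits: string[] = [];
  const parent = path[path.length - 1];
  for (const [key, child] of Object.entries(value)) {
    const here = [...path, key];
    // prepared_attempt.trace_path describes prepared input metadata, so it is exempt.
    const isPublicTracePath = key === "trace_path" && parent !== "prepared_attempt";
    const isTracePointer = key === "trace" && parent === "artifact_pointers";
    if (isPublicTracePath || isTracePointer) {
      hits.push(here.join("."));
    }
    hits.push(...findRemovedTraceFields(child, here));
  }
  return hits;
}
```

Matching on the parent key rather than on a bare substring avoids false positives on supported surfaces such as external_trace.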

Compound Engineering
GPT-5

@christso christso force-pushed the refactor/av-9ly-remove-public-trace-artifact branch from 104e348 to d88926b Compare June 26, 2026 09:08
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 26, 2026


Deploying agentv with Cloudflare Pages

Latest commit: 8b15647
Status: ✅  Deploy successful!
Preview URL: https://86d84167.agentv.pages.dev
Branch Preview URL: https://refactor-av-9ly-remove-publi.agentv.pages.dev


@christso christso force-pushed the refactor/av-9ly-remove-public-trace-artifact branch from d88926b to 8b15647 Compare June 26, 2026 09:14
@christso

Collaborator Author

Review passed on head 8b15647e7acd19b43239efc3f518a84dce668c7d.

What I verified:

  • Reviewed origin/main...HEAD for the public result artifact contract: new generated/indexed/exported bundles no longer expose trace.json, trace_path, or artifact_pointers.trace as public surfaces.
  • Confirmed transcript and Dashboard paths still resolve from transcript_path; the previously stale trace-session missing-state test expectation now checks for transcript.jsonl guidance instead of trace.json guidance.
  • Confirmed metrics.json remains useful without source_artifacts.trace_path; it keeps trace identity metadata plus transcript_path, grading_path, and timing_path.
  • Rechecked combine/export/remote publishing. The earlier remote-publish risk is fixed: sidecar publishing is transcript-only, unsupported pointer families are stripped from the published index, and trace.json is skipped from the published results tree.
  • Ran a targeted legacy fixture locally: a source run containing both artifact_pointers.trace and artifact_pointers.transcript publishes only the transcript pointer, writes no trace.json to the results branch, and writes only transcript.jsonl to agentv/artifacts/v1.
  • Inspected private evidence branch EntityProcess/agentv-private:evidence/av-9ly-remove-public-trace-artifact at 01dca48045c8076eacb0398d06685335a363eee6; the canonical live bundle and provider-blocker bundles contain transcript/raw transcript/metrics/result/grading/timing artifacts and no trace.json, trace_path, or artifact_pointers.trace matches. Dashboard screenshots are present.
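
The remote-publish behaviour verified above (transcript-only pointers, trace.json skipped from the published tree) can be sketched as an allowlist filter. The PublishInput shape and the planPublish name are hypothetical; the assumption that transcript is the only publishable pointer family is taken from the review notes, not from the implementation.

```typescript
// Assumption: only the transcript pointer family is published.
const PUBLISHABLE_POINTER_FAMILIES = new Set(["transcript"]);

// Hypothetical input: relative file paths in the source run plus its pointer map.
interface PublishInput {
  files: string[];
  artifactPointers: Record<string, unknown>;
}

// Returns what would be published: files without any trace.json,
// and pointers restricted to the allowlisted families.
function planPublish(input: PublishInput): PublishInput {
  const files = input.files.filter((file) => file.split("/").pop() !== "trace.json");
  const artifactPointers = Object.fromEntries(
    Object.entries(input.artifactPointers).filter(([family]) =>
      PUBLISHABLE_POINTER_FAMILIES.has(family),
    ),
  );
  return { files, artifactPointers };
}
```

An allowlist is used here instead of a denylist so that any future unsupported pointer family is dropped by default rather than published by accident.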

Validation run:

  • bun --filter @agentv/core build
  • bun test apps/cli/test/commands/eval/artifact-writer.test.ts apps/cli/test/commands/results/combine.test.ts apps/cli/test/commands/results/export.test.ts apps/cli/test/commands/results/validate.test.ts apps/cli/test/commands/results/remote-auto-export.test.ts apps/cli/src/commands/results/serve-file-tree.test.ts packages/core/test/evaluation/results-repo.test.ts -> 166 pass, 0 fail.
  • gh pr checks 1526 -> Build, Typecheck, Lint, Test, Check Links, Validate Marketplace, Validate Evals, and Cloudflare Pages all pass.

Residual risk: legacy trace-session reader support intentionally remains isolated for old artifacts, so any future cleanup should avoid re-promoting that path into the public result artifact contract.

@christso christso merged commit 856f31e into main Jun 26, 2026
8 checks passed
@christso christso deleted the refactor/av-9ly-remove-public-trace-artifact branch June 26, 2026 09:21

1 participant