Merged
4 changes: 2 additions & 2 deletions .agents/conventions.md
@@ -126,9 +126,9 @@ If you spot a camelCase key already on disk or in a response, treat it as a bug

## Result Artifact Pointers

`artifact_pointers` are for offloading large detached payload bytes from the results metadata/control plane. They describe where payloads such as trace or transcript files live when a run is projected to `agentv/artifacts/v1` or a future object store, including `key`, `object_version`, `sha256`, `size`, `media_type`, and `schema_version`.
`artifact_pointers` are for offloading large detached payload bytes from the results metadata/control plane. They describe where payloads such as transcript files live when a run is projected to `agentv/artifacts/v1` or a future object store, including `key`, `object_version`, `sha256`, `size`, `media_type`, and `schema_version`.

Do not add an `artifact_pointers.*` entry just because a new per-case artifact exists. Normal sidecars that stay in the run tree should be discoverable through explicit path fields on `index.jsonl`, manifests, or trace envelope artifacts, for example `metrics_path` for `outputs/metrics.json`.
Do not add an `artifact_pointers.*` entry just because a new per-case artifact exists. Normal sidecars that stay in the run tree should be discoverable through explicit path fields on `index.jsonl` or manifests, for example `metrics_path` for `outputs/metrics.json`.

Before adding a new pointer family, verify that the artifact is large enough or detached enough to benefit from offloading and that published result repos should avoid carrying those payload bytes on the primary results branch.

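The split between offload pointers and explicit path fields described above can be sketched as follows. This is a minimal illustration, not AgentV's real types: `ArtifactPointerSketch`, `IndexRowSketch`, and `resolveTranscriptLocation` are hypothetical names, the lookup order (in-tree `transcript_path` first, offload pointer second) is an assumption, and the `media_type` and `schema_version` values are placeholders. Only the pointer field names and the `metrics_path` convention come from the guide.

```typescript
// Hypothetical, simplified model of one index.jsonl row; not AgentV's real types.
interface ArtifactPointerSketch {
  readonly key: string;
  readonly object_version?: string;
  readonly sha256: string;
  readonly size: number;
  readonly media_type: string;
  readonly schema_version: string;
}

interface IndexRowSketch {
  readonly test_id: string;
  // Ordinary sidecars that stay in the run tree: explicit path fields.
  readonly metrics_path?: string;
  readonly transcript_path?: string;
  // Offload indirection for large detached payloads only.
  readonly artifact_pointers?: { readonly transcript?: ArtifactPointerSketch };
}

type TranscriptLocation =
  | { readonly kind: 'run_tree'; readonly path: string }
  | { readonly kind: 'offloaded'; readonly key: string; readonly sha256: string }
  | { readonly kind: 'missing' };

// Assumed precedence: prefer the in-tree path field, then the offload pointer.
function resolveTranscriptLocation(row: IndexRowSketch): TranscriptLocation {
  if (row.transcript_path) {
    return { kind: 'run_tree', path: row.transcript_path };
  }
  const pointer = row.artifact_pointers?.transcript;
  if (pointer) {
    return { kind: 'offloaded', key: pointer.key, sha256: pointer.sha256 };
  }
  return { kind: 'missing' };
}

const row: IndexRowSketch = {
  test_id: 'case-1',
  metrics_path: 'case-1/outputs/metrics.json',
  artifact_pointers: {
    transcript: {
      key: 'runs/default/run-1/case-1/transcript.jsonl',
      // SHA-256 of zero bytes, matching size 0 in this placeholder.
      sha256: 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855',
      size: 0,
      media_type: 'application/x-ndjson', // placeholder value
      schema_version: 'example.v1', // placeholder value
    },
  },
};

console.log(resolveTranscriptLocation(row).kind); // prints: offloaded
```

Note that `metrics_path` never needs a pointer here: the row names the sidecar directly, which is the discovery path the guide asks for.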
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -48,7 +48,7 @@ Read the full rationale and examples in [.agents/product-boundary.md](.agents/pr
- When dogfood or review reveals a durable workflow lesson, capture it in this guide or the relevant `.agents/*.md` guide before merge; do not leave durable agent instructions only in PR comments, Bead comments, or private evidence. Use `docs/solutions/` for fuller reusable writeups.
- Wire formats are `snake_case`; internal TypeScript is `camelCase`. Translate only at the boundary.
- In AgentV, a `project` holds runs, traces, and experiments; a `benchmark` is a curated eval suite. Do not collapse those terms.
- `artifact_pointers` are an offload indirection for large detached payload bytes, such as trace and transcript artifacts. Do not use them as the discovery path for ordinary per-case sidecars; expose those with explicit index/manifest path fields such as `metrics_path`.
- `artifact_pointers` are an offload indirection for large detached payload bytes, such as transcript artifacts. Do not use them as the discovery path for ordinary per-case sidecars; expose those with explicit index/manifest path fields such as `metrics_path`.

## Repo Map

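The AGENTS.md rule to keep wire formats in `snake_case` and internal TypeScript in `camelCase`, translating only at the boundary, matches the pairing visible elsewhere in this PR (`metricsPath` in the `buildIndexArtifactEntry` options versus `metrics_path` in `index.jsonl`). A minimal sketch of such a boundary, using hypothetical names (`IndexEntryInput`, `toSnakeCase`, `toWire`) rather than AgentV's actual serializer:

```typescript
// Hypothetical internal shape: camelCase, as internal TypeScript code uses.
interface IndexEntryInput {
  readonly testId: string;
  readonly transcriptPath?: string;
  readonly transcriptRawPath?: string;
  readonly metricsPath?: string;
}

// Convert one camelCase key to snake_case, e.g. transcriptRawPath -> transcript_raw_path.
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`);
}

// Translate once, at serialization time. Undefined fields are dropped so
// they never reach disk.
function toWire(input: IndexEntryInput): Record<string, string> {
  const wire: Record<string, string> = {};
  for (const [key, value] of Object.entries(input)) {
    if (value !== undefined) {
      wire[toSnakeCase(key)] = value;
    }
  }
  return wire;
}

const line = JSON.stringify(
  toWire({ testId: 'case-1', transcriptPath: 'transcript.jsonl', metricsPath: 'outputs/metrics.json' }),
);
console.log(line);
// {"test_id":"case-1","transcript_path":"transcript.jsonl","metrics_path":"outputs/metrics.json"}
```

A generic key converter keeps the sketch short; an explicit field-by-field mapping would let the compiler catch a misspelled wire key.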
2 changes: 1 addition & 1 deletion ROADMAP.md
@@ -15,7 +15,7 @@ This roadmap translates [STRATEGY.md](STRATEGY.md) into the next few product pha

## Phase 1: Finish the artifact and local inspection foundation

- Keep the canonical handoff surface centered on completed run bundles, `index.jsonl`, grading/timing artifacts, and `outputs/trace.json` sidecars.
- Keep the canonical handoff surface centered on completed run bundles, `index.jsonl`, grading/timing/metrics artifacts, normalized transcripts, and optional `external_trace` link metadata.
- Finish the vendor-neutral local export seams that let completed runs be re-read, compared, exported, and attached to non-Phoenix adapters without vendor-specific logic in core.
- Keep OTLP/OpenInference mapping generic and reusable before building backend-specific upload or import paths.

1 change: 0 additions & 1 deletion apps/cli/src/commands/eval/artifact-writer.ts
@@ -97,7 +97,6 @@ export function buildIndexArtifactEntry(
summaryPath?: string;
outputPath?: string;
answerPath?: string;
tracePath?: string;
transcriptPath?: string;
transcriptRawPath?: string;
metricsPath?: string;
44 changes: 26 additions & 18 deletions apps/cli/src/commands/results/combine-run.ts
@@ -15,6 +15,7 @@ import {
existsSync,
mkdirSync,
readFileSync,
rmSync,
statSync,
writeFileSync,
} from 'node:fs';
@@ -367,7 +368,6 @@ const MANIFEST_PATH_FIELDS = [
'input_path',
'output_path',
'response_path',
'trace_path',
'transcript_path',
'transcript_raw_path',
'metrics_path',
@@ -380,7 +380,6 @@ const MANIFEST_PATH_FIELDS = [
] as const;

const POINTER_FAMILIES = {
trace: 'traces',
transcript: 'transcripts',
} as const;

@@ -467,15 +466,29 @@ function rewriteArtifactPointers(
return undefined;
}

return {
trace: rewriteArtifactPointer('trace', pointers.trace, sourceBaseDir, outputDir, sourceIndex),
transcript: rewriteTranscriptArtifactPointer(
pointers.transcript,
sourceBaseDir,
outputDir,
sourceIndex,
),
};
const transcript = rewriteTranscriptArtifactPointer(
pointers.transcript,
sourceBaseDir,
outputDir,
sourceIndex,
);
return transcript ? { transcript } : undefined;
}

function removeCopiedDeprecatedTraceArtifact(
row: SelectedRow,
outputDir: string,
sourceBaseDir: string,
): void {
const tracePath = row.record.trace_path;
if (!tracePath || !isSafeRelativeArtifactPath(tracePath)) {
return;
}
if (!existsSync(path.join(sourceBaseDir, tracePath))) {
return;
}
const copiedTracePath = path.join(outputDir, `sources/source-${row.source.index + 1}`, tracePath);
rmSync(copiedTracePath, { force: true });
}

function rewriteAndCopyRecord(
@@ -501,13 +514,8 @@ function rewriteAndCopyRecord(
row.source.index,
);
rewritten.artifact_pointers = artifactPointers;
if (
row.record.trace_path &&
rewritten.trace_path === row.record.trace_path &&
artifactPointers?.trace?.path
) {
rewritten.trace_path = artifactPointers.trace.path;
}
removeCopiedDeprecatedTraceArtifact(row, outputDir, sourceBaseDir);
rewritten.trace_path = undefined;
if (
row.record.transcript_path &&
rewritten.transcript_path === row.record.transcript_path &&
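The new `removeCopiedDeprecatedTraceArtifact` deletes a copied legacy trace only after `isSafeRelativeArtifactPath` accepts the recorded `trace_path`. That guard matters because `rmSync(..., { force: true })` removes whatever file path it receives and stays silent if the file is missing. The helper's body is not part of this diff, so the following is a hypothetical stand-in (`isSafeRelativeArtifactPathSketch` and `copiedArtifactPath` are illustrative names). It shows one common way to write such a guard and the `sources/source-${index + 1}` path it protects.

```typescript
import path from 'node:path';

// Hypothetical stand-in for isSafeRelativeArtifactPath; the real helper is not
// shown in this diff. Accepts only relative paths that stay inside the base
// directory after normalization.
function isSafeRelativeArtifactPathSketch(candidate: string): boolean {
  if (candidate.length === 0 || candidate.includes('\0')) return false;
  // Reject POSIX absolute paths and Windows rooted, drive-absolute, or UNC paths.
  if (path.posix.isAbsolute(candidate) || path.win32.isAbsolute(candidate)) return false;
  const normalized = path.posix.normalize(candidate.replace(/\\/g, '/'));
  return normalized !== '.' && normalized !== '..' && !normalized.startsWith('../');
}

// Mirrors the `sources/source-${index + 1}` layout from the diff:
// source indexes are zero-based, directory names are one-based.
function copiedArtifactPath(
  outputDir: string,
  sourceIndex: number,
  relativePath: string,
): string | undefined {
  if (!isSafeRelativeArtifactPathSketch(relativePath)) return undefined;
  return path.join(outputDir, `sources/source-${sourceIndex + 1}`, relativePath);
}

console.log(copiedArtifactPath('/tmp/combined', 0, 'case-1/outputs/trace.json'));
// On POSIX: /tmp/combined/sources/source-1/case-1/outputs/trace.json
console.log(copiedArtifactPath('/tmp/combined', 0, '../escape.json'));
// undefined
```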
14 changes: 1 addition & 13 deletions apps/cli/src/commands/results/projection-bundle.ts
@@ -62,7 +62,6 @@ export interface ProjectionBundleEntry {
readonly trace_id: string;
readonly root_span_id: string;
readonly span_count: number;
readonly envelope_ref?: string;
};
readonly trace_envelope: TraceEnvelopeWire;
readonly feedback: {
@@ -101,7 +100,7 @@ export type ProjectionBundleArtifactRefs = Partial<
| 'targets_path'
| 'files_path'
| 'graders_path'
> & { readonly trace_path: string }
>
> & {
readonly status: 'planned_export' | 'emitted';
};
@@ -147,13 +146,6 @@ function shortHash(parts: readonly string[], length = 20): string {
return createHash('sha256').update(parts.join('\n')).digest('hex').slice(0, length);
}

function tracePathFor(indexEntry: IndexArtifactEntry): string | undefined {
return (
indexEntry.trace_path ??
(indexEntry.artifact_dir ? path.posix.join(indexEntry.artifact_dir, 'trace.json') : undefined)
);
}

function artifactRefs(
indexEntry: IndexArtifactEntry,
options: {
@@ -181,7 +173,6 @@ function artifactRefs(
transcript_path: indexEntry.transcript_path,
transcript_raw_path: indexEntry.transcript_raw_path,
metrics_path: indexEntry.metrics_path,
trace_path: tracePathFor(indexEntry),
task_dir: indexEntry.task_dir,
eval_path: indexEntry.eval_path,
targets_path: indexEntry.targets_path,
@@ -274,13 +265,11 @@ function buildEntry(
): ProjectionBundleEntry {
const includeRawContent = options.includeRawContent ?? false;
const sourcePath = toPortablePath(options.sourceFile, options.cwd);
const plannedIndexEntry = buildResultIndexArtifact(result);
const envelope = buildTraceEnvelopeFromEvaluationResult(result, {
evalPath: sourcePath,
runId: options.runId,
source: { kind: 'agentv_run', path: sourcePath, format: 'agentv_result' },
artifacts: {
trace_path: tracePathFor(indexRecord ?? plannedIndexEntry),
answer_path: result.output.length > 0 ? 'outputs/answer.md' : undefined,
},
duplicatePolicy: options.duplicatePolicy,
@@ -334,7 +323,6 @@ function buildEntry(
trace_id: envelopeWire.trace.trace_id,
root_span_id: envelopeWire.trace.root_span_id,
span_count: envelopeWire.trace.spans.length,
envelope_ref: refs.trace_path,
}),
trace_envelope: envelopeWire,
feedback,
36 changes: 17 additions & 19 deletions apps/cli/src/commands/results/serve-file-tree.test.ts
@@ -26,13 +26,13 @@ function localTreeRootedAtTestDir(prefix: string): FileNode[] {
];
}

function gitTraceEntry(prefix: string): ArtifactCatalogEntry {
function gitTranscriptEntry(prefix: string): ArtifactCatalogEntry {
return {
displayPath: `${prefix}/outputs/trace.json`,
kind: 'trace',
displayPath: `${prefix}/transcript.jsonl`,
kind: 'transcript',
storage: 'git',
ref: 'agentv/artifacts/v1',
key: `runs/default/2026-06-22T01-12-44-924Z/${prefix}/outputs/trace.json`,
key: `runs/default/2026-06-22T01-12-44-924Z/${prefix}/transcript.jsonl`,
};
}

@@ -45,22 +45,18 @@ describe('overlayCatalogFileNodes', () => {

it('overlays git artifacts into the existing folder instead of a duplicate subtree', () => {
const files = localTreeRootedAtTestDir(prefix);
overlayCatalogFileNodes(files, [gitTraceEntry(prefix)], prefix);
overlayCatalogFileNodes(files, [gitTranscriptEntry(prefix)], prefix);

// No duplicate `wtg-academy-n1-test` root node was created.
expect(findByName(files, 'wtg-academy-n1-test')).toBeUndefined();

// trace.json merged into the existing top-level `outputs` folder...
const outputs = findByName(files, 'outputs');
expect(outputs?.type).toBe('dir');
const trace = findByName(outputs?.children ?? [], 'trace.json');
expect(trace).toBeDefined();
// ...alongside the local answer.md, and with its full manifest-relative path
// preserved for content reads.
expect(findByName(outputs?.children ?? [], 'answer.md')).toBeDefined();
expect(trace?.path).toBe(`${prefix}/outputs/trace.json`);
expect(trace?.storage).toBe('git');
expect(trace?.ref).toBe('agentv/artifacts/v1');
// transcript.jsonl merged into the existing top-level test artifact view
// with its full manifest-relative path preserved for content reads.
const transcript = findByName(files, 'transcript.jsonl');
expect(transcript).toBeDefined();
expect(transcript?.path).toBe(`${prefix}/transcript.jsonl`);
expect(transcript?.storage).toBe('git');
expect(transcript?.ref).toBe('agentv/artifacts/v1');
});

it('does not re-add local files already present in the tree', () => {
@@ -80,15 +76,17 @@ describe('overlayCatalogFileNodes', () => {
it('falls back to full-path nesting when no root prefix applies', () => {
const files: FileNode[] = [];
const entry: ArtifactCatalogEntry = {
displayPath: 'outputs/trace.json',
kind: 'trace',
displayPath: 'outputs/transcript.jsonl',
kind: 'transcript',
storage: 'git',
ref: 'agentv/artifacts/v1',
};
overlayCatalogFileNodes(files, [entry], undefined);

const outputs = findByName(files, 'outputs');
expect(outputs?.type).toBe('dir');
expect(findByName(outputs?.children ?? [], 'trace.json')?.path).toBe('outputs/trace.json');
expect(findByName(outputs?.children ?? [], 'transcript.jsonl')?.path).toBe(
'outputs/transcript.jsonl',
);
});
});
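These tests specify three behaviours of `overlayCatalogFileNodes`. Entries under the root prefix merge into the existing folders instead of creating a duplicate `<prefix>/` subtree. Overlaid files keep their full manifest-relative path for content reads. Without a prefix, entries nest by their full display path. The implementation is not part of this diff, so the following is a simplified, hypothetical model of that behaviour. `FileNodeSketch`, `CatalogEntrySketch`, and `overlaySketch` are illustrative names, and the real types likely carry more fields.

```typescript
// Hypothetical, simplified shapes; not AgentV's FileNode / ArtifactCatalogEntry.
interface FileNodeSketch {
  name: string;
  path: string;
  type: 'file' | 'dir';
  children?: FileNodeSketch[];
  storage?: 'local' | 'git';
  ref?: string;
}

interface CatalogEntrySketch {
  displayPath: string;
  storage: 'git';
  ref: string;
}

// Merge catalog entries into an existing tree, in place.
function overlaySketch(
  files: FileNodeSketch[],
  entries: readonly CatalogEntrySketch[],
  rootPrefix?: string,
): void {
  for (const entry of entries) {
    // With a matching root prefix, strip it so the entry joins the existing
    // top-level folders; otherwise nest by the full display path.
    const stripped = rootPrefix !== undefined && entry.displayPath.startsWith(`${rootPrefix}/`);
    const relative = stripped ? entry.displayPath.slice(`${rootPrefix}/`.length) : entry.displayPath;
    const segments = relative.split('/');
    const fileName = segments[segments.length - 1] ?? relative;

    let level = files;
    let dirPath = stripped && rootPrefix !== undefined ? rootPrefix : '';
    for (const segment of segments.slice(0, -1)) {
      dirPath = dirPath ? `${dirPath}/${segment}` : segment;
      let dir = level.find((node) => node.type === 'dir' && node.name === segment);
      if (!dir) {
        dir = { name: segment, path: dirPath, type: 'dir', children: [] };
        level.push(dir);
      }
      level = dir.children ??= [];
    }

    // Do not re-add a file that already exists at this level.
    if (!level.some((node) => node.name === fileName)) {
      level.push({
        name: fileName,
        path: entry.displayPath, // full path preserved for content reads
        type: 'file',
        storage: entry.storage,
        ref: entry.ref,
      });
    }
  }
}

const tree: FileNodeSketch[] = [
  {
    name: 'outputs',
    path: 'case-1/outputs',
    type: 'dir',
    children: [{ name: 'answer.md', path: 'case-1/outputs/answer.md', type: 'file', storage: 'local' }],
  },
];
overlaySketch(
  tree,
  [{ displayPath: 'case-1/transcript.jsonl', storage: 'git', ref: 'agentv/artifacts/v1' }],
  'case-1',
);
// The transcript now sits beside `outputs`, and no duplicate `case-1` folder exists.
console.log(tree.map((node) => node.path));
```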
11 changes: 8 additions & 3 deletions apps/cli/src/commands/results/serve.ts
@@ -514,11 +514,16 @@ function resolveRecordArtifactPointer(
record: ResultManifestRecord,
kind: 'transcript' | 'answer' | 'trace',
): ResolvedArtifactPointer {
const legacyArtifactPointers = record.artifact_pointers as
| (ResultManifestRecord['artifact_pointers'] & {
readonly trace?: NonNullable<ResultManifestRecord['artifact_pointers']>['transcript'];
})
| undefined;
const pointer =
kind === 'transcript'
? record.artifact_pointers?.transcript
: kind === 'trace'
? record.artifact_pointers?.trace
? legacyArtifactPointers?.trace
: undefined;
const pointerPath = artifactPointerPath(pointer);
const description = artifactPointerDescription(pointer);
@@ -1059,8 +1064,8 @@ function traceSessionArtifactResponse(

function missingTraceMessage(): string {
return [
'This result does not include canonical trace.json metadata.',
'Dashboard trace sessions require an agentv.trace.v1 sidecar artifact.',
'This result does not include legacy trace artifact metadata.',
'Dashboard transcript inspection uses transcript.jsonl for current run bundles.',
].join(' ');
}

20 changes: 20 additions & 0 deletions apps/cli/src/commands/results/validate.ts
@@ -149,6 +149,26 @@ function checkIndexJsonl(runDir: string): { diagnostics: Diagnostic[]; entries:
});
}

if (typeof entry.trace_path === 'string') {
diagnostics.push({
severity: 'error',
message: `index.jsonl line ${i + 1} (${entry.test_id ?? '?'}): trace_path is no longer supported; use transcript_path and metrics_path`,
});
}

const artifactPointers = entry.artifact_pointers;
if (
artifactPointers &&
typeof artifactPointers === 'object' &&
!Array.isArray(artifactPointers) &&
Object.hasOwn(artifactPointers, 'trace')
) {
diagnostics.push({
severity: 'error',
message: `index.jsonl line ${i + 1} (${entry.test_id ?? '?'}): artifact_pointers.trace is no longer supported`,
});
}

if (!entry.scores || !Array.isArray(entry.scores) || entry.scores.length === 0) {
diagnostics.push({
severity: 'warning',
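The two new `validate.ts` checks reject any `index.jsonl` row that still carries a string `trace_path` or an own `trace` key under `artifact_pointers`. A standalone sketch of the same rules follows, assuming the input is the raw JSONL text. `DiagnosticSketch` and `checkRetiredTraceFields` are hypothetical names. `Object.prototype.hasOwnProperty.call` is used as the pre-ES2022 equivalent of the `Object.hasOwn` call in the diff.

```typescript
// Hypothetical diagnostic shape, modelled on the one validate.ts pushes.
interface DiagnosticSketch {
  severity: 'error' | 'warning';
  message: string;
}

function checkRetiredTraceFields(indexJsonl: string): DiagnosticSketch[] {
  const diagnostics: DiagnosticSketch[] = [];
  const lines = indexJsonl.split('\n');
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i]?.trim();
    if (!text) continue; // blank lines, including the one after a trailing newline

    const parsed: unknown = JSON.parse(text);
    if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) continue;
    const entry = parsed as Record<string, unknown>;
    const label = `index.jsonl line ${i + 1} (${String(entry.test_id ?? '?')})`;

    if (typeof entry.trace_path === 'string') {
      diagnostics.push({
        severity: 'error',
        message: `${label}: trace_path is no longer supported; use transcript_path and metrics_path`,
      });
    }

    // Only a plain object can carry an own `trace` pointer; strings, arrays,
    // and null are skipped, as in the diff's guard.
    const pointers = entry.artifact_pointers;
    if (
      pointers !== null &&
      typeof pointers === 'object' &&
      !Array.isArray(pointers) &&
      Object.prototype.hasOwnProperty.call(pointers, 'trace')
    ) {
      diagnostics.push({
        severity: 'error',
        message: `${label}: artifact_pointers.trace is no longer supported`,
      });
    }
  }
  return diagnostics;
}

const sample = `${[
  JSON.stringify({ test_id: 'a', transcript_path: 'a/transcript.jsonl' }),
  JSON.stringify({ test_id: 'b', trace_path: 'b/outputs/trace.json' }),
  JSON.stringify({ test_id: 'c', artifact_pointers: { trace: { key: 'k' } } }),
].join('\n')}\n`;

for (const diagnostic of checkRetiredTraceFields(sample)) {
  console.log(`${diagnostic.severity}: ${diagnostic.message}`);
}
// error: index.jsonl line 2 (b): trace_path is no longer supported; use transcript_path and metrics_path
// error: index.jsonl line 3 (c): artifact_pointers.trace is no longer supported
```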
23 changes: 9 additions & 14 deletions apps/cli/test/commands/eval/artifact-writer.test.ts
@@ -1,27 +1,17 @@
import { afterEach, beforeEach, describe, expect, it } from 'bun:test';
import { createHash } from 'node:crypto';
import { mkdir, readFile, readdir, rm, writeFile } from 'node:fs/promises';
import path from 'node:path';

import {
AGENTV_RESULTS_ARTIFACTS_REF,
CANONICAL_METRICS_ARTIFACT_PATH,
CANONICAL_TRACE_ARTIFACT_PATH,
CANONICAL_TRANSCRIPT_ARTIFACT_PATH,
EXECUTION_TRACE_SCHEMA_VERSION,
type EvalTest,
type EvaluationResult,
type GraderResult,
METRICS_SCHEMA_VERSION,
MetricsArtifactWireSchema,
TRACE_JSON_MEDIA_TYPE,
TRANSCRIPT_JSONL_MEDIA_TYPE,
TRANSCRIPT_SCHEMA_VERSION,
TraceEnvelopeWireSchema,
buildTraceFromMessages,
fromTraceEnvelopeWire,
parseYamlValue,
traceEnvelopeToTranscriptJsonLines,
} from '@agentv/core';

import {
@@ -85,10 +75,6 @@ function makeEvaluatorResult(overrides: Partial<GraderResult> = {}): GraderResul
} as GraderResult;
}

function sha256Hex(content: Buffer): string {
return createHash('sha256').update(content).digest('hex');
}

// ---------------------------------------------------------------------------
// Grading artifact
// ---------------------------------------------------------------------------
@@ -1250,6 +1236,9 @@ describe('writeArtifactsFromResults', () => {
await expect(
readFile(path.join(testDir, 'transcript-case', 'transcript.json'), 'utf8'),
).rejects.toThrow();
await expect(
readFile(path.join(testDir, 'transcript-case', 'run-1', 'trace.json'), 'utf8'),
).rejects.toThrow();

const indexLine = JSON.parse(
(await readFile(path.join(testDir, 'index.jsonl'), 'utf8')).trim(),
@@ -1366,6 +1355,12 @@ describe('writeArtifactsFromResults', () => {
);

expect(summary.schema_version).toBe(METRICS_SCHEMA_VERSION);
expect(summary.trace).toMatchObject({
schema_version: 'agentv.trace.v1',
trace_id: expect.any(String),
root_span_id: expect.any(String),
});
expect(summary.trace).not.toHaveProperty('path');
expect(summary.source_artifacts).toMatchObject({
transcript_path: 'transcript.jsonl',
grading_path: 'grading.json',