Merged
1 change: 1 addition & 0 deletions .agents/publish.md
@@ -32,6 +32,7 @@ Contract gates:
## Published Packages

- `packages/core/` publishes as `@agentv/core`
- `packages/sdk/` publishes as `@agentv/sdk`
- `apps/cli/` publishes as `agentv`
- The CLI bundles workspace dependencies via tsup with `noExternal: ["@agentv/core"]`
- Install with `bun install -g agentv` or `npm install -g agentv`
3 changes: 0 additions & 3 deletions Dockerfile
@@ -5,7 +5,6 @@ WORKDIR /app
COPY package.json bun.lock ./
COPY packages/core/package.json packages/core/
COPY packages/sdk/package.json packages/sdk/
COPY packages/eval/package.json packages/eval/
COPY apps/cli/package.json apps/cli/
COPY apps/dashboard/package.json apps/dashboard/
COPY apps/web/package.json apps/web/
@@ -60,8 +59,6 @@ COPY --from=build /app/packages/core/dist ./packages/core/dist
COPY --from=build /app/packages/core/package.json ./packages/core/
COPY --from=build /app/packages/sdk/dist ./packages/sdk/dist
COPY --from=build /app/packages/sdk/package.json ./packages/sdk/
COPY --from=build /app/packages/eval/dist ./packages/eval/dist
COPY --from=build /app/packages/eval/package.json ./packages/eval/
COPY --from=build /app/apps/cli/dist ./apps/cli/dist
COPY --from=build /app/apps/cli/package.json ./apps/cli/
COPY --from=build /app/apps/cli/node_modules ./apps/cli/node_modules
2 changes: 1 addition & 1 deletion apps/web/src/content/docs/docs/evaluation/sdk.mdx
@@ -35,7 +35,7 @@ npm install @agentv/sdk
import { defineCodeGrader } from '@agentv/sdk';
```

The general policy is hard convergence for same-week or unreleased surface names: use the correct package, field, or wire name instead of carrying aliases. The package rename is the exception because `@agentv/eval` was already published. It remains a temporary deprecated compatibility package that re-exports `@agentv/sdk` for existing consumers, but it should not appear in new docs, examples, scaffolds, or skills except as migration guidance.
The general policy is hard convergence for same-week or unreleased surface names: use the correct package, field, or wire name instead of carrying aliases. `@agentv/eval` was already published, then deprecated on npm, and has been removed from this repository. New docs, examples, scaffolds, and skills should use `@agentv/sdk` directly.

## Choose a Surface

9 changes: 0 additions & 9 deletions bun.lock

Some generated files are not rendered by default.

28 changes: 15 additions & 13 deletions docs/adr/2026-06-18-sdk-surface-decision.md
@@ -4,6 +4,10 @@ Date: 2026-06-18

Status: Accepted

Update 2026-06-22: `@agentv/eval` has been deprecated on npm and removed from
this repository. `@agentv/sdk` is the only lightweight TypeScript SDK package
published by the release workflow.

Supersedes: the earlier 2026-06-18 decision in this file that rejected a
separate `@agentv/sdk` package.

@@ -61,13 +65,14 @@ For this package rename, npm evidence changes the compatibility choice:
2026-05-19 through 2026-06-17.
- `@agentv/sdk` is not yet published at the time of this decision.

Therefore `@agentv/eval` remains only as a thin deprecated compatibility
package that re-exports `@agentv/sdk` for existing consumers. It should not be
used by new docs, examples, scaffolds, or skills except when explaining the
migration.
Therefore `@agentv/eval` was kept temporarily as a thin deprecated
compatibility package that re-exported `@agentv/sdk` for existing consumers. It
must not be used by new docs, examples, scaffolds, or skills except when
explaining the migration.

Future removal of `@agentv/eval` requires an explicit release/migration
decision. The compatibility package must not grow new API surface.
After npm deprecation, the explicit removal decision was made on 2026-06-22.
The compatibility package is no longer part of the workspace, release script,
publish script, or runtime Docker image.

## Non-Goals

@@ -95,15 +100,12 @@ Positive:

Negative:

- one temporary compatibility package remains until a later removal decision
- release scripts and examples need to carry both package paths during the
migration window
- users of the deprecated package must migrate imports to `@agentv/sdk`

## Tracker Impact

- `av-bv4.11`: this ADR supersedes the previous no-new-sdk decision and records
the new package-boundary decision.
- `av-bv4.12`: implementation should move the SDK surface to `packages/sdk` /
`@agentv/sdk`, keep `@agentv/eval` only as a deprecated shim because it was
already published, and update repo references so the old name is not taught as
primary.
- `av-bv4.12`: implementation moved the SDK surface to `packages/sdk` /
`@agentv/sdk`. The deprecated shim has since been removed after npm
deprecation.
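
The ADR leaves existing consumers with one task: switching import specifiers from `@agentv/eval` to `@agentv/sdk`. A minimal sketch of that rewrite follows, assuming the change is a plain specifier swap with no subpath imports. `migrateImports` is a hypothetical helper written for illustration and is not part of this repository.

```typescript
// Hypothetical helper (not part of the AgentV repo): rewrites quoted
// '@agentv/eval' module specifiers to '@agentv/sdk' in a source string.
function migrateImports(source: string): string {
  // Match the package name only when it is the entire quoted specifier,
  // so longer names such as '@agentv/evaluator' are left untouched.
  return source.replace(/(['"])@agentv\/eval\1/g, '$1@agentv/sdk$1');
}

const before = "import { defineCodeGrader } from '@agentv/eval';";
console.log(migrateImports(before));
```

Matching on the surrounding quotes keeps the rewrite conservative; a real codemod would more likely work on the parsed syntax tree than on raw text.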
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@ Prior research on `av-r0s` found that AgentV already has the right low-level com
- Custom assertions are discovered from `.agentv/assertions/` in `packages/core/src/evaluation/registry/assertion-discovery.ts`.
- Custom graders are discovered from `.agentv/graders/` in `packages/core/src/evaluation/registry/grader-discovery.ts`.
- `agentv create` already scaffolds evals, assertions, and providers in `apps/cli/src/commands/create/commands.ts`.
- The lightweight SDK contract lives in `packages/eval/src/assertion.ts`.
- The lightweight SDK contract lives in `packages/sdk/src/assertion.ts`.

The main ceremony problem is not that the schema cannot represent tasks. It is that users must repeatedly hand-write the same layout, provenance metadata, adapter scripts, and integration glue.

4 changes: 2 additions & 2 deletions docs/plans/trace-evaluation-architecture.md
@@ -188,7 +188,7 @@ The exact schema belongs in implementation, but these concepts should be stable:
### U1. Trace Artifact Model

- **Goal:** Introduce the core TypeScript model, Zod validation, and snake_case boundary conversion for trace artifacts.
- **Files:** `packages/core/src/evaluation/trace.ts`, `packages/core/src/evaluation/types.ts`, `packages/eval/src/schemas.ts`, new focused files under `packages/core/src/evaluation/trace/` if the existing file becomes too large.
- **Files:** `packages/core/src/evaluation/trace.ts`, `packages/core/src/evaluation/types.ts`, `packages/sdk/src/schemas.ts`, new focused files under `packages/core/src/evaluation/trace/` if the existing file becomes too large.
- **Patterns:** Follow the existing `TraceSummary`, `TokenUsage`, and project boundary conversion conventions. Keep internal fields camelCase and persisted fields snake_case.
- **Test Scenarios:** Add tests that validate round-trip conversion, missing optional content, inferred duration flags, branch metadata, and raw evidence handles.
- **Verification:** Unit tests should prove summaries can be derived from trace artifacts without changing current summary behavior, and that trace artifacts do not embed a separate summary payload.
@@ -268,7 +268,7 @@ The exact schema belongs in implementation, but these concepts should be stable:
### U7. Grader Context Upgrade

- **Goal:** Let built-in and code graders receive trace artifacts in addition to compact summaries and output messages.
- **Files:** `packages/core/src/evaluation/graders/types.ts`, `packages/core/src/evaluation/graders/tool-trajectory.ts`, `packages/core/src/evaluation/graders/execution-metrics.ts`, `packages/core/src/evaluation/graders/code-grader.ts`, `packages/eval/src/index.ts`, `packages/eval/src/schemas.ts`.
- **Files:** `packages/core/src/evaluation/graders/types.ts`, `packages/core/src/evaluation/graders/tool-trajectory.ts`, `packages/core/src/evaluation/graders/execution-metrics.ts`, `packages/core/src/evaluation/graders/code-grader.ts`, `packages/sdk/src/index.ts`, `packages/sdk/src/schemas.ts`.
- **Patterns:** Keep existing graders that only read `trace` or `output` working. Trace-aware graders use the richer object.
- **Test Scenarios:** Existing `tool-trajectory` modes should pass from live output and from trace artifact input. Argument matching, ordering, latency, status/error matching, and evidence text should be covered.
- **Verification:** `trace score` should run `tool-trajectory` against imported traces, not only metrics-only graders.
6 changes: 3 additions & 3 deletions package.json
@@ -6,15 +6,15 @@
"packageManager": "bun@1.3.3",
"workspaces": ["apps/*", "packages/*"],
"scripts": {
"build": "bun --filter @agentv/core build && bun --filter @agentv/sdk build && bun --filter @agentv/eval build && bun --filter @agentv/phoenix-adapter build && bun --filter @agentv/dashboard build && bun --filter agentv build",
"build": "bun --filter @agentv/core build && bun --filter @agentv/sdk build && bun --filter @agentv/phoenix-adapter build && bun --filter @agentv/dashboard build && bun --filter agentv build",
"verify": "bun run build && bun run typecheck && bun run lint && bun run test",
"typecheck": "bun --filter @agentv/core typecheck && bun --filter @agentv/sdk typecheck && bun --filter @agentv/eval typecheck && bun --filter @agentv/phoenix-adapter typecheck && bun --filter agentv typecheck",
"typecheck": "bun --filter @agentv/core typecheck && bun --filter @agentv/sdk typecheck && bun --filter @agentv/phoenix-adapter typecheck && bun --filter agentv typecheck",
"typecheck:workspace": "tsc -b tsconfig.build.json",
"typecheck:watch": "bun --filter @agentv/core typecheck -- --watch & bun --filter agentv typecheck -- --watch",
"lint": "biome check .",
"format": "biome format --write .",
"fix": "biome check --write .",
"test": "bun --filter @agentv/core test && bun --filter @agentv/sdk test && bun --filter @agentv/eval test && bun --filter @agentv/phoenix-adapter test && bun --filter agentv test && bun --filter @agentv/dashboard test",
"test": "bun --filter @agentv/core test && bun --filter @agentv/sdk test && bun --filter @agentv/phoenix-adapter test && bun --filter agentv test && bun --filter @agentv/dashboard test",
"test:watch": "bun --filter @agentv/core test:watch & bun --filter agentv test:watch",
"agentv": "bun apps/cli/src/cli.ts",
"agentv:buildrun": "bun run build && bun apps/cli/dist/cli.js",
22 changes: 0 additions & 22 deletions packages/eval/README.md

This file was deleted.

38 changes: 0 additions & 38 deletions packages/eval/package.json

This file was deleted.

9 changes: 0 additions & 9 deletions packages/eval/src/index.ts

This file was deleted.

17 changes: 0 additions & 17 deletions packages/eval/test/compatibility.test.ts

This file was deleted.

8 changes: 0 additions & 8 deletions packages/eval/tsconfig.json

This file was deleted.

21 changes: 0 additions & 21 deletions packages/eval/tsup.config.ts

This file was deleted.

2 changes: 1 addition & 1 deletion packages/sdk/README.md
@@ -21,7 +21,7 @@ npm install @agentv/sdk
import { defineCodeGrader } from '@agentv/sdk';
```

`@agentv/eval` remains only as a temporary deprecated compatibility package that re-exports this SDK for existing consumers. New docs, examples, scaffolds, and skills should not import from it.
`@agentv/eval` was a temporary deprecated compatibility package for this SDK. It is no longer published from this repository. Use `@agentv/sdk` directly.

## Quick Start

2 changes: 1 addition & 1 deletion scripts/publish.ts
@@ -25,7 +25,7 @@ if (requestedTag !== undefined && requestedTag !== 'next') {
const npmTag = requestedTag ?? 'latest';
const publishArgs = ['--tag', npmTag, '--access', 'public'];

const PACKAGES = ['packages/core', 'packages/sdk', 'packages/eval', 'apps/cli'];
const PACKAGES = ['packages/core', 'packages/sdk', 'apps/cli'];

interface PackageJson {
name: string;
1 change: 0 additions & 1 deletion scripts/release.ts
@@ -35,7 +35,6 @@ const NEXT_PRERELEASE_TAG = 'next';
const PACKAGE_PATHS = [
'packages/core/package.json',
'packages/sdk/package.json',
'packages/eval/package.json',
'apps/cli/package.json',
];
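
`scripts/publish.ts` and `scripts/release.ts` each keep their own package list, and this change had to edit both. A small check along these lines could flag drift between the two. It is only a sketch: `findListDrift` is a hypothetical name, and it assumes the publish list holds package directories while the release list holds the matching `package.json` paths, as the two hunks show.

```typescript
// Hypothetical consistency check; not part of scripts/publish.ts or
// scripts/release.ts. Compares publish directories with release manifests.
function findListDrift(packages: string[], packagePaths: string[]): string[] {
  // Each publish directory is expected to have a manifest in the release list.
  const expected = packages.map((dir) => `${dir}/package.json`);
  const missingFromRelease = expected
    .filter((manifest) => !packagePaths.some((p) => p === manifest))
    .map((manifest) => `only in publish list: ${manifest}`);
  const missingFromPublish = packagePaths
    .filter((manifest) => !expected.some((e) => e === manifest))
    .map((manifest) => `only in release list: ${manifest}`);
  return missingFromRelease.concat(missingFromPublish);
}

// The two lists as they stand after this change.
const PACKAGES = ['packages/core', 'packages/sdk', 'apps/cli'];
const PACKAGE_PATHS = [
  'packages/core/package.json',
  'packages/sdk/package.json',
  'apps/cli/package.json',
];

console.log(findListDrift(PACKAGES, PACKAGE_PATHS));
```

An empty result means the lists agree. Removing a package from only one script would show up as a single labelled entry.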

2 changes: 1 addition & 1 deletion skills-data/agentv-eval-writer/SKILL.md
@@ -18,7 +18,7 @@ Comprehensive docs: https://agentv.dev

Treat YAML as the canonical portable model. Prefer authoring `.eval.yaml` / `EVAL.yaml` first, then use TypeScript helpers, Python scripts, or executable graders only when they lower to the same fields or when the evaluation logic must actually run code.

Use `@agentv/sdk` for TypeScript helper imports. Do not use `@agentv/eval` for new evals, examples, scaffolds, or skill guidance; it is only a deprecated compatibility shim for existing consumers during migration.
Use `@agentv/sdk` for TypeScript helper imports. Do not use `@agentv/eval` for new evals, examples, scaffolds, or skill guidance; it was a deprecated compatibility package and has been removed from this repository.

## Evaluation Types
