feat: bundle all skills in CLI + single agentv-dev wrapper + subagent pipeline (#1231)
Skills are now bundled inside the CLI npm package (`apps/cli/skills/` → `dist/skills/` at build time), version-matched to the binary. A new `agentv skills` subcommand serves the bundled content without any separate plugin install step.

- `agentv skills list` — list available skill names (`--json`)
- `agentv skills get <name>` — print SKILL.md content (`--full`, `--json`)
- `agentv skills get --all` — print all skills
- `agentv skills path [<name>]` — print the resolved skills directory

Resolution walks upward from the module file, validating candidates by SKILL.md presence to avoid false matches, and prefers `dist/skills/` (production layout) over bare `skills/` (source layout).

The marketplace plugin SKILL.md files are converted to discovery stubs that redirect agents to `agentv skills get <name>`. Full skill content lives in `apps/cli/skills/` as the single source of truth.

Docs: update installation.mdx so the canonical setup is `npm install -g agentv` alone; the allagents plugin step moves to an optional "Claude Code Plugin" section.

Closes EntityProcess#1224

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
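The upward walk can be sketched like this (a minimal sketch with hypothetical helper names; the real resolver lives in the CLI's skills command):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Walk upward from a starting directory, preferring the production
// layout (dist/skills/) over the bare source layout (skills/).
// A candidate only counts if some child directory contains SKILL.md,
// which guards against false matches on unrelated "skills" folders.
function resolveSkillsDir(startDir: string): string | undefined {
  let dir = path.resolve(startDir);
  while (true) {
    for (const name of ["dist/skills", "skills"]) {
      const candidate = path.join(dir, name);
      if (hasSkillMd(candidate)) return candidate;
    }
    const parent = path.dirname(dir);
    if (parent === dir) return undefined; // hit the filesystem root
    dir = parent;
  }
}

function hasSkillMd(dir: string): boolean {
  if (!fs.existsSync(dir)) return false;
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .some(
      (e) =>
        e.isDirectory() &&
        fs.existsSync(path.join(dir, e.name, "SKILL.md"))
    );
}
```

Because the walk validates by SKILL.md presence rather than directory name alone, a stray `skills/` folder higher in the tree never shadows the real bundle.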
Closes EntityProcess#1229.

- `skills get <name> --ref <file>`: load a single reference without `--full`. Searches `references/`, `templates/`, `agents/`, then the skill root, and auto-appends `.md` if the caller passed a bare name. `--ref` is incompatible with `--all` and takes precedence over `--full`.
- `readSkill --full` now also collects `agents/` alongside `references/` and `templates/`, so agent role definitions ship together with the skill.
- Drop `scripts/` and `assets/` from every bundled skill. Scripts already duplicated CLI behavior (`onboard-agentv.sh` ↔ `agentv init`, `trajectory.html` / `eval_review.html` ↔ `agentv studio`); `lint_eval.py` is replaced by an inline structural checklist in agentv-eval-review's SKILL.md until a dedicated `agentv eval lint` lands.
- Refresh the affected SKILL.md files: agentv-onboarding now invokes `agentv init` directly (no platform script), agentv-eval-review inlines the deterministic checks the deleted lint script performed, and every skill documents `skills get --ref <file>` / `skills path` for selective reference loading.
- Tests: extend the skills unit-test fixture to exercise `agents/` and bare-root files; assert `findRefFile` lookup order, `.md` auto-append, and the miss path.
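A sketch of the lookup, assuming a `findRefFile` helper shaped like the one the tests exercise (everything beyond the documented search order is hypothetical):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Search order from the commit message: references/, templates/,
// agents/, then the skill root ("" joins to the root itself).
const REF_DIRS = ["references", "templates", "agents", ""];

function findRefFile(skillDir: string, ref: string): string | undefined {
  for (const sub of REF_DIRS) {
    // Try the name as given, then with ".md" auto-appended for bare names.
    const names = ref.endsWith(".md") ? [ref] : [ref, `${ref}.md`];
    for (const name of names) {
      const candidate = path.join(skillDir, sub, name);
      if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
        return candidate;
      }
    }
  }
  return undefined; // miss path: caller reports the ref as not found
}
```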
…er pattern

Skills are now sourced from `<repo-root>/skills-data/` instead of `apps/cli/skills/`. This mirrors agent-browser's top-level `skill-data/` layout and keeps user-authored content out of the CLI workspace.

- `git mv apps/cli/skills → skills-data`
- `tsup.config.ts`: `srcSkillsDir` now resolves to `../../skills-data`
- The skills resolver in `src/commands/skills/index.ts` learns a third candidate name (`skills-data/`) so dev-mode source runs (`bun apps/cli/src/cli.ts skills …`) keep working without first building. Order at each ancestor: `dist/skills/` → `skills-data/` → `skills/` (legacy fallback).
- Build output stays at `dist/skills/`, so the npm tarball is unchanged.
- Verified: `bun run build` populates `dist/skills/`, and `node dist/cli.js skills list` / `get --ref` / `path` all return the expected content. Source mode (no dist) also resolves via `skills-data/`.
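The build-time copy might look roughly like this (a hypothetical sketch of the relevant `tsup.config.ts` fragment, not the actual config):

```typescript
// tsup.config.ts (sketch) — copy <repo-root>/skills-data/ into dist/skills/
import { defineConfig } from "tsup";
import * as path from "node:path";

// Skills now live at the repo root, two levels up from apps/cli/
const srcSkillsDir = path.resolve(__dirname, "../../skills-data");

export default defineConfig({
  entry: ["src/cli.ts"],
  // After each successful build, copy the skill sources into the
  // production layout so the npm tarball still ships dist/skills/
  onSuccess: `rm -rf dist/skills && cp -R ${srcSkillsDir} dist/skills`,
});
```

Since the copy targets `dist/skills/`, published packages are byte-identical to before the move; only the source location changed.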
When `pipeline input` or `pipeline run` detects a non-CLI target (subagent-as-target mode), print actionable next steps for the orchestrating agent:

- Dispatch executor subagents per test case
- Run code graders via `pipeline grade`
- Dispatch LLM grader subagents (read `agents/grader.md`)
- Merge scores via `pipeline bench`

Also point to the full procedure reference: `agentv skills get agentv-bench --ref subagent-pipeline`. This addresses the gap where agents running in subagent mode had no visibility into what to do after `pipeline input` extracted the test cases.
When the agent IS the target (subagent-as-target mode), the pipeline guidance now tells the agent to grade its own outputs against criteria rather than dispatching separate grader subagents. The agent already IS the LLM — it can read its own response.md, evaluate against criteria.md, and write llm_grader_results directly.

Updated:
- `pipeline input`: guidance says "grade your own responses"
- `pipeline run`: same guidance for subagent mode
- subagent-pipeline.md: clarifies self-grading in subagent mode
Revert over-correction — the main agent should NOT grade its own outputs. Instead it spawns grader subagents (one per test × LLM grader pair) using agents/grader.md as their instructions. The orchestrating agent dispatches:

1. Executor subagents (one per test case)
2. Grader subagents (one per test × LLM grader pair)
3. `pipeline bench` to merge scores

agents/grader.md defines the full grading procedure for spawned subagents.
…instructions

The main agent reads agents/grader.md and embeds its full content as system instructions in each grader subagent prompt. Subagents do not self-discover the file — they need it passed to them.
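The dispatch can be sketched as follows (hypothetical function and prompt shape; only the read-and-embed behavior and the per test × grader fan-out come from the commits above):

```typescript
import * as fs from "node:fs";

// The orchestrator reads agents/grader.md once and embeds its full
// content in every grader subagent prompt — subagents never
// self-discover the file.
function buildGraderPrompts(
  graderMdPath: string,
  testIds: string[],
  llmGraders: string[]
): { testId: string; grader: string; prompt: string }[] {
  const instructions = fs.readFileSync(graderMdPath, "utf8");
  const prompts: { testId: string; grader: string; prompt: string }[] = [];
  for (const testId of testIds) {
    for (const grader of llmGraders) {
      // one grader subagent per test × LLM grader pair
      prompts.push({
        testId,
        grader,
        prompt:
          `${instructions}\n\n` +
          `Grade test "${testId}" using grader config llm_graders/${grader}.json.`,
      });
    }
  }
  return prompts;
}
```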
The grader parser normalizes `rubrics` assertions to `type: llm-grader` with a `rubrics` array, but `writeGraderConfigs` only wrote `prompt_content` (empty for rubrics) and dropped the `rubrics` array. The rubrics criteria array is now included in `llm_graders/<name>.json` so grader subagents can evaluate each criterion directly.
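A sketch of the fixed writer (the config shape is an assumption; only `prompt_content`, the `rubrics` array, and the `llm_graders/<name>.json` path come from the commit message):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface LlmGrader {
  name: string;
  type: "llm-grader";
  prompt_content?: string; // empty for rubrics-style assertions
  rubrics?: string[];      // criteria normalized by the grader parser
}

// Before the fix, only prompt_content was serialized and the rubrics
// array was dropped; now both survive into llm_graders/<name>.json.
function writeGraderConfigs(outDir: string, graders: LlmGrader[]): void {
  const dir = path.join(outDir, "llm_graders");
  fs.mkdirSync(dir, { recursive: true });
  for (const g of graders) {
    const config = {
      type: g.type,
      prompt_content: g.prompt_content ?? "",
      rubrics: g.rubrics ?? [],
    };
    fs.writeFileSync(
      path.join(dir, `${g.name}.json`),
      JSON.stringify(config, null, 2)
    );
  }
}
```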
- `eval run`: print a TIP about pipeline when the target is claude-cli/copilot-cli
- `pipeline --help`: description now says to use this for agent targets
- `pipeline run --help`: hints about executor subagents for agent targets

Previously Claude would default to `eval run` and never discover pipeline. Now both the top-level help and the `eval run` output guide toward pipeline.
…l --help

Pipeline now shows: "Subagent-mode eval pipeline (input → executor subagents → grade → bench) — use this when the eval target is an AI agent (Claude, Codex, etc.)"

This means Claude/Codex can discover pipeline from `agentv --help` without needing a nudge.
Agents read CLAUDE.md before running tasks. Without this note, they default to eval run instead of pipeline for agent targets.
…ility" This reverts commit 1431cc5.
**Copilot Eval Pipeline Test — GPT-5.4 vs Sonnet 4.6**

Setup:

- Sonnet 4.6 — Pipeline discovered, full subagent flow
- GPT-5.4 — Pipeline not discovered, went to eval run

Failed tests (GPT-5.4):

Root Cause: Sonnet reads the installed skill (

What This Means:
Commands Used:

```shell
# Sonnet 4.6 (pipe mode)
timeout 600 copilot -p "run evals on evals/self/skills/output-correctness.eval.yaml using agentv" --yolo

# GPT-5.4 (pipe mode)
timeout 600 copilot -p "run evals on evals/self/skills/output-correctness.eval.yaml using agentv" --yolo --model gpt-5.4

# Skill install
npx skills add tsoyangbot/agentv -y
```

Artifacts:
**Copilot Eval Pipeline Test — GPT-5.4 in Clean Directory (No AGENTS.md)**

Setup: Clean temp dir (

Result: GPT-5.4 discovered pipeline from skill alone

Failed test:

Key Finding: Without AGENTS.md, GPT-5.4 follows the skill correctly. The previous test (in the agentv repo with AGENTS.md) failed because GPT-5.4 read AGENTS.md first and went to

Implications:
Commands Used:

```shell
# Setup clean directory
mkdir -p /tmp/agentv-test/evals/self/skills
cp evals/self/skills/output-correctness.eval.yaml /tmp/agentv-test/evals/self/skills/
cp -r evals/self/skills/fixtures /tmp/agentv-test/evals/self/skills/
cd /tmp/agentv-test && npx skills add tsoyangbot/agentv -y

# Run eval with GPT-5.4
timeout 600 copilot -p "run evals on evals/self/skills/output-correctness.eval.yaml using agentv" --yolo --model gpt-5.4
```

Artifacts:
**WTG.AI.Prompts Repo Test — GPT-5.4 Subagent Mode**

Repo:

Result:

Flow:

Key finding: When explicitly asked for subagent mode, GPT-5.4 follows the correct pipeline. The repo's

All tests this session:
- Replace the hardcoded [references, templates, agents] list with dynamic iteration over ALL subdirectories in each skill folder
- Add a listSkillSubdirs() helper that reads directory entries at runtime
- --full now includes scripts/, assets/, fixtures/, etc. automatically
- --ref searches all subdirectories (not just the hardcoded three)
- Bundle trajectory.html and lint_eval.py scripts in skills-data
- No more code changes needed when adding new subdirectory types
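A minimal sketch of the dynamic listing (the helper name comes from the bullets above; the rest is assumed):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Instead of a hardcoded [references, templates, agents] list,
// enumerate every subdirectory of the skill folder at runtime, so
// new subdirectory types (scripts/, assets/, fixtures/, ...) are
// picked up without code changes.
function listSkillSubdirs(skillDir: string): string[] {
  return fs
    .readdirSync(skillDir, { withFileTypes: true })
    .filter((e) => e.isDirectory())
    .map((e) => e.name)
    .sort();
}
```

Both `--full` collection and the `--ref` search can then iterate this list rather than a fixed set of names.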
Schema was missing after the plugins/ cleanup during skills consolidation. Regenerated with `bun packages/core/scripts/generate-eval-schema.ts`. 1752 tests pass, 0 fail.
- Update generate script to write to skills-data/ instead of plugins/ - Update test to read from skills-data/ - Delete redundant plugins/ copy - skills-data/ is now the single source of truth for all skill content
AgentV Bundled Skills + Subagent Pipeline
Supersedes #1226.
What changed

- Single `agentv-dev` wrapper skill in `plugins/agentv-dev/skills/agentv-dev/SKILL.md` — replaces 7 individual skill wrappers. User installs once via `npx skills add EntityProcess/agentv` and gets one skill that lists all CLI skills.
- All skills bundled into CLI dist via `skills-data/` → `apps/cli/dist/skills/`:
  - `agentv-bench` — run evals, benchmark, optimize, autoresearch
  - `agentv-eval-writer` — write/edit eval YAML
  - `agentv-eval-review` — review/lint eval quality
  - `agentv-governance` — governance blocks (OWASP, MITRE, EU AI Act)
  - `agentv-trace-analyst` — analyze traces, find regressions
- Agent loads wrapper → picks skill → `agentv skills get <name>` → CLI serves the full content (version-matched).
- Subagent pipeline improvements:
  - `read_agent` loops)
  - `pipeline` description in top-level `--help` for discoverability
  - `llm_graders/` output

Architecture
Verified
GPT-5.4 follows AGENTS.md before skills. Without AGENTS.md, skill discovery works correctly.