feat: Eval benchmark repo sync to remote targets #1232

@christso

Objective

Provide a first-class way for agentv to deliver an eval benchmark repo (code + fixtures) onto a target before running evals, so the same eval definition can run unchanged against local, ephemeral CI, and long-lived VM targets.

Current Problem

Today, getting a benchmark repo onto a remote eval target (Azure VM, runner, etc.) is left to the caller. There is no agentv-level model for:

  • where the source of truth lives (repo, ref, fixtures),
  • how it is delivered to a given target,
  • how agentv verifies prerequisites are in place before a run starts.

This forces users to script ad-hoc clones / rsyncs per target and re-implement retries, atomic swap, and auth. As more eval targets land (Azure VM, GHA runner, long-lived dev boxes), this gap compounds.

Proposal

1. Declarative source + targets config

Separate what (source of truth) from where/how (delivery):

source:
  repo: https://github.com/org/eval-benchmarks
  ref: main
  fixtures:
    manifest: evals/fixtures.lock     # content-addressed blobs
    cache_path: .agentv/cache

targets:
  local:
    type: local
    sync: noop                        # source already on disk

  ci:
    type: gha-runner
    sync: oneshot                     # one-shot clone, no daemon

  vm-eastus:
    type: azure-vm
    host: eval-vm-1.eastus.example.com
    workdir: /srv/agentv/repo
    sync: continuous                  # long-running, kept fresh
    sync_options:                     # declarative, not raw flags
      mode: shallow                   # shallow | full | sparse
      sparse_paths: [evals/, tests/]
      refresh_interval: 60s

User config describes intent. The sync mechanism is not exposed.
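The schema above could be modelled as types, roughly as follows. This is a minimal sketch, assuming agentv represents its config in TypeScript. Every name here (`TargetConfig`, `parseInterval`, `validateTarget`) is hypothetical, and the duration grammar and the "continuous needs a workdir" rule are assumptions, not part of the proposal.

```typescript
// Hypothetical types mirroring the YAML above; names are illustrative only.
type SyncMode = "noop" | "oneshot" | "continuous";
type CloneMode = "shallow" | "full" | "sparse";

interface SyncOptions {
  mode?: CloneMode;
  sparse_paths?: string[];
  refresh_interval?: string; // e.g. "60s"
}

interface TargetConfig {
  type: string; // e.g. "local", "gha-runner", "azure-vm"
  sync: SyncMode;
  host?: string;
  workdir?: string;
  sync_options?: SyncOptions;
}

const SYNC_MODES: readonly string[] = ["noop", "oneshot", "continuous"];

// Assumed duration grammar: an integer followed by s, m, or h.
function parseInterval(text: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (!match) throw new Error(`invalid refresh_interval: '${text}'`);
  const unitMs: Record<string, number> = { s: 1000, m: 60000, h: 3600000 };
  return Number(match[1]) * unitMs[match[2]];
}

// Returns a list of problems; an empty list means the target looks usable.
function validateTarget(name: string, target: TargetConfig): string[] {
  const problems: string[] = [];
  if (!SYNC_MODES.includes(target.sync)) {
    problems.push(`target '${name}': unknown sync mode '${target.sync}'`);
  }
  // Assumption: a continuous mirror needs somewhere to live on the target.
  if (target.sync === "continuous" && !target.workdir) {
    problems.push(`target '${name}': continuous sync needs a workdir`);
  }
  const interval = target.sync_options?.refresh_interval;
  if (interval !== undefined) {
    try {
      parseInterval(interval);
    } catch (err) {
      problems.push(`target '${name}': ${(err as Error).message}`);
    }
  }
  return problems;
}
```

Collecting problems into a list, rather than throwing on the first one, would let a single run report every misconfigured target at once.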

2. Internal SyncStrategy interface (not user-facing)

Built-in implementations, selected by the sync: key:

  • noop — source already present on target.
  • oneshot — single shallow clone (e.g. git clone --depth 1 --filter=blob:none), suitable for ephemeral runners.
  • continuous — long-running mirror; initial backing impl is the kubernetes/git-sync Go binary (partial-clone and sparse-checkout capable, atomic swap via worktree+symlink, supports GitHub App auth).
  • (future) rsync-from-bucket, entireio/git-sync regional ref-mirror pre-stage, etc.

The binary backing continuous is an implementation detail. Strategy is swappable without breaking user config.
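To make the "strategy is the contract" idea concrete, here is one possible shape. It is a sketch only: the issue leaves the real interface to the implementer, so `SyncStrategy`, `SyncContext`, `plan`, and `selectStrategy` are hypothetical names. The oneshot command reuses the git flags quoted above. The git-sync arguments are deliberately left out rather than guessed.

```typescript
// Hypothetical interface; the issue leaves the real shape to the implementer.
type SyncMode = "noop" | "oneshot" | "continuous";

interface SyncContext {
  repo: string;
  ref: string;
  workdir: string;
}

interface SyncStrategy {
  readonly mode: SyncMode;
  // External binaries that must be on PATH before a run starts.
  readonly requiredBinaries: readonly string[];
  // Returns the command the strategy would run; nothing is executed here.
  plan(ctx: SyncContext): string[];
}

const noop: SyncStrategy = {
  mode: "noop",
  requiredBinaries: [],
  plan: () => [], // source already present, nothing to do
};

const oneshot: SyncStrategy = {
  mode: "oneshot",
  requiredBinaries: ["git"],
  plan: (ctx) => [
    "git", "clone", "--depth", "1", "--filter=blob:none",
    "--branch", ctx.ref, ctx.repo, ctx.workdir,
  ],
};

const continuous: SyncStrategy = {
  mode: "continuous",
  requiredBinaries: ["git-sync"],
  // git-sync arguments are intentionally omitted from this sketch.
  plan: () => ["git-sync"],
};

const strategies: Record<SyncMode, SyncStrategy> = { noop, oneshot, continuous };

// User config only ever names a mode; the backing implementation stays hidden.
function selectStrategy(mode: SyncMode): SyncStrategy {
  return strategies[mode];
}
```

Because callers only see `selectStrategy(mode)`, the entry behind `continuous` could later be replaced without touching user config.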

3. BYO external binaries — no bundling, no postinstall download

agentv is an npm/Node package (a few MB). git-sync is a ~25MB Go binary, built per platform. Bundling it, or downloading it in a postinstall script, has known failure modes at enterprise scale: npm install --ignore-scripts policies, registries that strip postinstall, and locked-down VMs whose HTTPS proxy does not allowlist GitHub releases. Auto-downloading at first use just moves the same failure to a more confusing place, mid-run.

Instead:

  • Runtime preflight. When a target's strategy requires an external binary, check PATH at run start. If missing, fail fast with an actionable message:
    agentv: target 'vm-eastus' uses sync mode 'continuous', which requires the
    git-sync binary. It is not installed.
    
    Install with:    agentv install git-sync
    or via:          brew install git-sync   |   apt install git-sync   |   manual download
    
    For air-gapped environments, place the binary on PATH manually.
    
  • agentv install git-sync subcommand. Opt-in installer. Downloads a pinned version from GitHub releases, verifies SHA-256 against a vendored manifest, installs to ~/.agentv/bin/ (which agentv prepends to PATH for child processes it spawns). Idempotent.
  • agentv doctor subcommand. Lists every external dependency (git-sync, rsync, ssh, target-specific CLIs) with version + resolved path, and any missing ones with their install hint.
  • Version pinning via vendored deps.json:
    {
      "git-sync": {
        "version": "vX.Y.Z",
        "sha256": {
          "linux-amd64": "...",
          "linux-arm64": "...",
          "darwin-amd64": "...",
          "darwin-arm64": "..."
        }
      }
    }
    Never resolve latest. Bumps are deliberate, reviewable, and rollback-able.
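The runtime preflight described in this list could look roughly like the following. This is a sketch under assumptions: `findOnPath`, `Requirement`, and `preflight` are hypothetical names, the existence check is injected so the lookup can be tested without touching the filesystem, and executability is not checked.

```typescript
// Hypothetical helper: `exists` is injected so the lookup stays testable.
function findOnPath(
  binary: string,
  pathEnv: string,
  exists: (candidate: string) => boolean,
  separator = ":",
): string | undefined {
  for (const dir of pathEnv.split(separator)) {
    if (dir === "") continue; // skip empty PATH entries
    const candidate = `${dir.replace(/\/+$/, "")}/${binary}`;
    if (exists(candidate)) return candidate;
  }
  return undefined;
}

interface Requirement {
  target: string;   // e.g. "vm-eastus"
  syncMode: string; // e.g. "continuous"
  binary: string;   // e.g. "git-sync"
}

// Returns one actionable message per missing binary; empty means "go".
function preflight(
  requirements: Requirement[],
  pathEnv: string,
  exists: (candidate: string) => boolean,
): string[] {
  const errors: string[] = [];
  for (const req of requirements) {
    if (findOnPath(req.binary, pathEnv, exists) === undefined) {
      errors.push(
        `agentv: target '${req.target}' uses sync mode '${req.syncMode}', ` +
          `which requires the ${req.binary} binary. It is not installed.\n` +
          `Install with: agentv install ${req.binary}`,
      );
    }
  }
  return errors;
}
```

In a real run the caller would presumably pass `process.env.PATH`, `path.delimiter`, and `fs.existsSync`, and abort before any sync starts if the returned list is non-empty.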

4. Cross-platform scope

  • Linux + macOS: primary, both covered by agentv install.
  • Windows: remote-target sync via WSL is primary. Windows-native is best-effort and not blocking for v1. The local target on Windows-native continues to work without git-sync.
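An installer would need to map the running platform onto the keys used in deps.json. A minimal sketch, assuming the four keys shown earlier are the full supported set; `platformKey` is a hypothetical helper. The inputs are meant to be Node's `process.platform` and `process.arch` values, passed in explicitly so the function stays pure.

```typescript
// Keys match the deps.json sketch in this issue; the mapping is an assumption.
type PlatformKey = "linux-amd64" | "linux-arm64" | "darwin-amd64" | "darwin-arm64";

// Returns undefined for platforms the installer does not cover
// (for example Windows-native, where Node reports "win32").
function platformKey(platform: string, arch: string): PlatformKey | undefined {
  const os = platform === "linux" || platform === "darwin" ? platform : undefined;
  const cpu = arch === "x64" ? "amd64" : arch === "arm64" ? "arm64" : undefined;
  if (os === undefined || cpu === undefined) return undefined;
  return `${os}-${cpu}` as PlatformKey;
}
```

Node running inside WSL reports `linux` as its platform, so the WSL path would resolve to a Linux key without any special casing. An `undefined` result is where a "not supported by agentv install, place the binary on PATH manually" hint could be raised.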

Design Latitude

  • Strategy names (noop / oneshot / continuous) are suggestions — pick what fits the existing target/provider vocabulary.
  • The shape of the internal SyncStrategy interface is up to the implementer.
  • Plugin vs built-in is open. Per AGENTS.md design principles #1 (Lightweight Core / Plugin Extensibility) and #3 (Composition), the implementer should first audit whether before_all / after_all hooks + a target-provider plugin can cover this without a new built-in primitive. If composition is sufficient, this issue resolves as "document the pattern" rather than "ship new core code."
  • agentv install may start as a single-purpose installer or be generalised to agentv install <dep> reading from deps.json — both are acceptable.
  • kubernetes/git-sync is the initial backing impl for continuous. Not a long-term commitment — the strategy interface is the contract.

Acceptance Signals

  • An eval can be defined once and run against local, ci, and a remote VM target via the same agentv eval ... invocation, with target-appropriate sync happening transparently.
  • agentv doctor reports presence/absence and version of any required external binaries for the configured targets.
  • agentv install git-sync (or generic equivalent) installs a pinned version with SHA-256 verification into ~/.agentv/bin/ and is idempotent.
  • When a target requires an external binary that isn't installed, the run fails fast at preflight with an actionable error message (not mid-run).
  • No postinstall script in the agentv npm package downloads or extracts external binaries.
  • User-facing docs describe the source / targets schema and the supported sync: modes. Raw git-sync flags are not documented as user surface.
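The install signal above (pinned version, SHA-256 verification, idempotent) might be checked along these lines. This is a sketch, not the proposed implementation: `PinnedDep`, `verifyDownload`, and `needsInstall` are hypothetical names, and the download and file I/O are left out so that only the checksum logic is shown.

```typescript
import { createHash } from "node:crypto";

// Hypothetical view of one deps.json entry.
interface PinnedDep {
  version: string;
  sha256: Record<string, string>; // platform key -> hex digest
}

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Throws if the downloaded bytes do not match the vendored checksum.
function verifyDownload(dep: PinnedDep, platform: string, bytes: Uint8Array): void {
  const expected = dep.sha256[platform];
  if (expected === undefined) {
    throw new Error(`no pinned checksum for platform '${platform}'`);
  }
  const actual = sha256Hex(bytes);
  if (actual !== expected.toLowerCase()) {
    throw new Error(`checksum mismatch for ${dep.version}: expected ${expected}, got ${actual}`);
  }
}

// Idempotence check: skip the download when the installed file already matches.
function needsInstall(installed: Uint8Array | undefined, dep: PinnedDep, platform: string): boolean {
  if (installed === undefined) return true;
  return sha256Hex(installed) !== dep.sha256[platform]?.toLowerCase();
}
```

Comparing the installed binary's digest, rather than only checking that a file exists, would also let a rerun repair a corrupted or outdated copy.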

Non-Goals

  • Bundling or auto-downloading git-sync (or any third-party binary) at install time.
  • Exposing raw git-sync flags as first-class config. A sync_options.advanced.extra_args escape hatch may exist but is intentionally absent from golden-path docs.
  • Two-way sync, conflict resolution, or write-back to the source repo.
  • Replacing existing before_all / after_all hooks for users who already script their own delivery — the new schema is opt-in.
  • Windows-native (non-WSL) support for continuous sync in v1.

Related

  • Brainstorm notes (workspace repo christso/agentv-allagents) comparing kubernetes/git-sync vs DIY vs entireio/git-sync.
  • kubernetes/git-sync — initial backing impl candidate for continuous.
  • entireio/git-sync — candidate future strategy for regional ref-mirror pre-staging (remote-to-remote, no local checkout).
  • AGENTS.md §Design Principles, particularly §1 (Lightweight Core / Plugin Extensibility), §3 (Composition), and §5 (YAGNI) — implementer should validate that this cannot be covered by composing existing primitives before adding new core surface.

Metadata

  • Labels: core (anything pertaining to core functionality of AgentV)
  • Status: Backlog