[Extension]: Add Research Harness #2924

@formin

Extension ID

harness

Extension Name

Research Harness

Version

1.0.0

Description

State-externalizing research harness: budgeted exploration, evidence curation, and claim verification for spec-driven development. Based on Harness-1 (arXiv:2606.02373).

Author

formin

Repository URL

https://github.com/formin/spec-kit-harness

Download URL

https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip

License

MIT

Homepage (optional)

https://github.com/formin/spec-kit-harness

Documentation URL (optional)

https://github.com/formin/spec-kit-harness/blob/main/README.md

Changelog URL (optional)

https://github.com/formin/spec-kit-harness/blob/main/CHANGELOG.md

Required Spec Kit Version

>=0.2.0

Required Tools (optional)

None — commands are plain prompt files for the coding agent. No external tools, MCP servers, or network access required.

Number of Commands

5

Number of Hooks (optional)

2

Tags

research, verification, evidence, context-management, workflow

Key Features

  • Externalizes long-horizon research state to per-feature markdown files: candidate pool (candidates.md), importance-tagged curated set (curated.md), compact evidence links (evidence.md), verification records (verification.md), compressed deduplicated observations (observations.md), and a budget ledger (budget.md)
  • /speckit.harness.explore — budget-aware exploration loop with a strict policy/bookkeeping separation (the agent decides search / inspect / curate / stop; the harness rules handle dedup, compression, eviction, accounting) and a marginal-gain stop rule
  • /speckit.harness.verify — adversarial claim verification: re-checks load-bearing spec/plan claims against primary sources, records verdict + method + confidence; refutations propagate as suggested artifact corrections
  • /speckit.harness.status — budget-aware context rendering: compact state slices (never full files) plus one recommended next action; lets a brand-new session resume research from files with zero context carryover
  • /speckit.harness.report — synthesizes evidence + verdicts into the feature's research.md with a requirement-coverage table (covered-verified / covered-unverified / contradicted / uncovered)
  • Optional hooks: after_specify → speckit.harness.init, after_plan → speckit.harness.verify
  • Design adapted from Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses (Jiang et al., arXiv:2606.02373; reference implementation pat-jj/harness-1), applied to the research phase of spec-driven development
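
The four coverage labels used by /speckit.harness.report can be illustrated with a small sketch. This is not the extension's implementation (the commands are prompt files executed by the coding agent); the `Requirement` record, its field names, the verdict strings "confirmed" and "refuted", and the precedence between labels are all assumptions made here for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Requirement:
    # Hypothetical record; the extension keeps this state in markdown files.
    req_id: str
    evidence_ids: list = field(default_factory=list)  # evidence linked to the requirement
    verdict: Optional[str] = None  # assumed values: "confirmed", "refuted", or None


def coverage_status(req: Requirement) -> str:
    """Map one requirement to a coverage label (assumed precedence)."""
    if req.verdict == "refuted":
        # A refutation outranks everything else in this sketch.
        return "contradicted"
    if not req.evidence_ids:
        # No linked evidence means the requirement was never covered.
        return "uncovered"
    if req.verdict == "confirmed":
        return "covered-verified"
    # Evidence exists but no verification verdict has been recorded.
    return "covered-unverified"


if __name__ == "__main__":
    for req in [
        Requirement("R1", ["E1"], "confirmed"),
        Requirement("R2", ["E2"]),
        Requirement("R3", ["E3"], "refuted"),
        Requirement("R4"),
    ]:
        print(req.req_id, coverage_status(req))
```

Treating a refutation as the highest-priority label keeps contradicted requirements visible even when supporting evidence also exists; the actual report may order these differently.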

Testing Checklist

  • Extension installs successfully via download URL
  • All commands execute without errors
  • Documentation is complete and accurate
  • No security vulnerabilities identified
  • Tested on at least one real project

Submission Requirements

  • Valid extension.yml manifest included
  • README.md with installation and usage instructions
  • LICENSE file included
  • GitHub release created with version tag
  • All command files exist and are properly formatted
  • Extension ID follows naming conventions (lowercase-with-hyphens)

Testing Details

Tested on:

  • Windows 11 (PowerShell 7), specify CLI from spec-kit @ main (commit 5ae7ff5) via uvx, Claude Code integration

Test project: a fresh scaffold created with specify init harness-ext-test --integration claude

Test scenarios:

  1. Installed from the release archive: specify extension add harness --from https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip
  2. Verified registration: specify extension list shows harness 1.0.0; all five speckit.harness.* command files registered for the Claude integration
  3. Manifest validated: YAML parses, all provides.commands[].file paths resolve, command names match ^speckit\.[a-z0-9-]+\.[a-z0-9-]+$
  4. Exercised the command lifecycle end-to-end on the test project (agent-executed): init (state files created with budget ledger) → explore (budgeted actions, candidates/curated/evidence updated, observation compression) → verify (claim re-checked against primary source, verdict recorded) → status (slice rendering + next-action recommendation) → report (research.md generated with coverage table)
  5. Confirmed idempotency guard (init refuses to clobber existing state) and read-only behavior of status
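
The command-name check from scenario 3 can be reproduced with a few lines of Python. This is a minimal sketch that uses the pattern quoted above; the helper name `invalid_command_names` and the sample names are hypothetical, and this is not necessarily how the manifest was validated during testing.

```python
import re

# Pattern quoted in scenario 3 of the test report.
COMMAND_NAME = re.compile(r"^speckit\.[a-z0-9-]+\.[a-z0-9-]+$")


def invalid_command_names(names):
    """Return the names that do not match the command naming pattern."""
    # fullmatch also rejects a trailing newline, which a bare `$` would allow.
    return [name for name in names if COMMAND_NAME.fullmatch(name) is None]


if __name__ == "__main__":
    sample = [
        "speckit.harness.init",
        "speckit.harness.explore",
        "speckit.Harness.verify",  # uppercase letter: rejected
        "harness.status",          # missing "speckit." prefix: rejected
    ]
    print(invalid_command_names(sample))
```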

Example Usage

# Install extension
specify extension add harness --from https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip

# After /speckit.specify, set up the harness and research with a budget
/speckit.harness.init How is session state currently handled, and what are the revocation options?
/speckit.harness.explore
/speckit.harness.verify
/speckit.harness.report     # research.md now has a requirement-coverage table
/speckit.plan               # planning starts from verified evidence

Proposed Catalog Entry

{
  "harness": {
    "name": "Research Harness",
    "id": "harness",
    "description": "State-externalizing research harness: budgeted exploration, evidence curation, and claim verification for spec-driven development",
    "author": "formin",
    "version": "1.0.0",
    "download_url": "https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip",
    "repository": "https://github.com/formin/spec-kit-harness",
    "homepage": "https://github.com/formin/spec-kit-harness",
    "documentation": "https://github.com/formin/spec-kit-harness/blob/main/README.md",
    "changelog": "https://github.com/formin/spec-kit-harness/blob/main/CHANGELOG.md",
    "license": "MIT",
    "category": "process",
    "effect": "read-write",
    "requires": {
      "speckit_version": ">=0.2.0"
    },
    "provides": {
      "commands": 5,
      "hooks": 2
    },
    "tags": ["research", "verification", "evidence", "context-management", "workflow"],
    "verified": false,
    "downloads": 0,
    "stars": 0,
    "created_at": "2026-06-11T00:00:00Z",
    "updated_at": "2026-06-11T00:00:00Z"
  }
}
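
As a rough illustration of what the requires.speckit_version constraint in the entry above expresses, the sketch below compares dotted version strings numerically. How Spec Kit itself evaluates the constraint is not stated in this issue; the functions `parse_version` and `satisfies` are hypothetical, and only the `>=` operator is handled.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '0.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(constraint: str, version: str) -> bool:
    """Check a version against a '>=X.Y.Z' constraint (the only form supported here)."""
    constraint = constraint.strip()
    if not constraint.startswith(">="):
        raise ValueError(f"unsupported constraint in this sketch: {constraint!r}")
    # Tuple comparison is numeric per component, so 0.10.1 ranks above 0.2.0.
    return parse_version(version) >= parse_version(constraint[2:])


if __name__ == "__main__":
    for installed in ("0.1.9", "0.2.0", "0.10.1"):
        print(installed, satisfies(">=0.2.0", installed))
```

Comparing integer tuples rather than raw strings avoids the lexicographic trap where "0.10.1" would sort below "0.2.0".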

Additional Context

The extension adapts the harness side of Harness-1. The paper trains a 20B RL policy; here the policy is whatever coding agent runs Spec Kit, and the harness is a file-based protocol that the commands enforce. The mapping from each paper mechanism to the extension, along with the deliberate differences, is documented in docs/concepts.md.

Submitted via gh CLI following the Extension Submission issue form fields, so form labels may be missing from this issue.

🤖 Generated with Claude Code
