[Extension]: Add Research Harness #2925

@formin

Extension ID

harness

Extension Name

Research Harness

Version

1.0.0

Description

State-externalizing research harness: budgeted exploration, evidence curation, and claim verification for spec-driven development. Based on Harness-1 (arXiv:2606.02373).

Author

formin

Repository URL

https://github.com/formin/spec-kit-harness

Download URL

https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip

License

MIT

Homepage (optional)

https://github.com/formin/spec-kit-harness

Documentation URL (optional)

https://github.com/formin/spec-kit-harness/blob/main/README.md

Changelog URL (optional)

https://github.com/formin/spec-kit-harness/blob/main/CHANGELOG.md

Required Spec Kit Version

>=0.2.0

Required Tools (optional)

None — commands are plain prompt files for the coding agent. No external tools, MCP servers, or network access required.

Number of Commands

5

Number of Hooks (optional)

2

Tags

research, verification, evidence, context-management, workflow

Key Features

  • Externalizes long-horizon research state to per-feature markdown files: candidate pool (candidates.md), importance-tagged curated set (curated.md), compact evidence links (evidence.md), verification records (verification.md), compressed deduplicated observations (observations.md), and a budget ledger (budget.md)
  • /speckit.harness.explore — budget-aware exploration loop with a strict policy/bookkeeping separation (the agent decides search / inspect / curate / stop; the harness rules handle dedup, compression, eviction, accounting) and a marginal-gain stop rule
  • /speckit.harness.verify — adversarial claim verification: re-checks load-bearing spec/plan claims against primary sources, records verdict + method + confidence; refutations propagate as suggested artifact corrections
  • /speckit.harness.status — budget-aware context rendering: compact state slices (never full files) plus one recommended next action; lets a brand-new session resume research from files with zero context carryover
  • /speckit.harness.report — synthesizes evidence + verdicts into the feature's research.md with a requirement-coverage table (covered-verified / covered-unverified / contradicted / uncovered)
  • Optional hooks: after_specify → speckit.harness.init, after_plan → speckit.harness.verify
  • Design adapted from Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses (Jiang et al., arXiv:2606.02373; reference implementation github.com/pat-jj/harness-1), applied to the research phase of spec-driven development
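A minimal sketch of how the budget ledger and marginal-gain stop rule described above might fit together. The BudgetLedger class, the should_stop function, the three-step window, and the gain threshold are all hypothetical names and values chosen for illustration; the extension itself expresses these rules as prompt files and markdown state, not Python.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetLedger:
    """Hypothetical in-memory stand-in for the budget.md ledger."""
    total_steps: int
    spent_steps: int = 0
    gains: list = field(default_factory=list)  # newly curated items per step

    def record(self, new_items: int) -> None:
        """Account for one exploration step and what it contributed."""
        self.spent_steps += 1
        self.gains.append(new_items)

    def remaining(self) -> int:
        return self.total_steps - self.spent_steps


def should_stop(ledger: BudgetLedger, window: int = 3, min_gain: float = 1.0) -> bool:
    """Stop when the budget is spent or recent steps add too little evidence.

    The window size and threshold are assumptions, not values from the extension.
    """
    if ledger.remaining() <= 0:
        return True
    if len(ledger.gains) < window:
        return False  # not enough history to judge marginal gain
    recent = ledger.gains[-window:]
    return sum(recent) / window < min_gain


# Simulated run: each number is how many new curated items a step produced.
ledger = BudgetLedger(total_steps=10)
for found in [4, 2, 1, 0, 0]:
    ledger.record(found)
    if should_stop(ledger):
        break
```

The agent stays responsible for choosing what to search or inspect; a rule like this only covers the bookkeeping side (accounting and the stop decision).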

Testing Checklist

  • Extension installs successfully via download URL
  • All commands execute without errors
  • Documentation is complete and accurate
  • No security vulnerabilities identified
  • Tested on at least one real project

Submission Requirements

  • Valid extension.yml manifest included
  • README.md with installation and usage instructions
  • LICENSE file included
  • GitHub release created with version tag
  • All command files exist and are properly formatted
  • Extension ID follows naming conventions (lowercase-with-hyphens)

Testing Details

Tested on:

  • Windows 11 (PowerShell 7), specify CLI from spec-kit @ main (commit 5ae7ff5) via uvx, Claude Code integration

Test project: a fresh scaffold created with specify init harness-ext-test --integration claude

Test scenarios:

  1. Installed from the release archive: specify extension add harness --from https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip
  2. Verified registration: specify extension list shows harness 1.0.0 (5 commands, 2 hooks, Enabled); five speckit-harness-* skills registered for the Claude integration
  3. Manifest validated: YAML parses, all provides.commands[].file paths resolve, command names match the required speckit.{ext-id}.{command} pattern
  4. Exercised the command lifecycle end-to-end on the test project (agent-executed): init → explore → verify → status → report (state files created, budget ledger updated, verification record written, research.md generated with a coverage table)
  5. Confirmed idempotency guard (init refuses to clobber existing state) and read-only behavior of status
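The command-name check in scenario 3 can be approximated with a short script. This is a sketch under assumptions: the "name" key on each command entry and the allowed characters in the final segment are guesses for illustration, and the manifest is passed in as an already-parsed list rather than read from extension.yml.

```python
import re


def invalid_command_names(ext_id: str, commands: list) -> list:
    """Return command names that do not match speckit.{ext-id}.{command}.

    The character set allowed for the command segment is an assumption.
    """
    pattern = re.compile(rf"^speckit\.{re.escape(ext_id)}\.[a-z][a-z0-9-]*$")
    return [c["name"] for c in commands if not pattern.match(c["name"])]


# The five command names listed in this submission.
commands = [
    {"name": f"speckit.harness.{cmd}"}
    for cmd in ("init", "explore", "verify", "status", "report")
]

# A deliberately wrong entry shows what a failure looks like.
bad = invalid_command_names("harness", commands + [{"name": "speckit.other.init"}])
```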

Example Usage

# Install extension
specify extension add harness --from https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip

# After /speckit.specify, set up the harness and research with a budget
/speckit.harness.init How is session state currently handled, and what are the revocation options?
/speckit.harness.explore
/speckit.harness.verify
/speckit.harness.report     # research.md now has a requirement-coverage table
/speckit.plan               # planning starts from verified evidence
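
The four coverage statuses that /speckit.harness.report writes into research.md could be assigned along these lines. This is only an illustrative sketch: the coverage_status function, the verdict labels "verified" and "refuted", and the precedence between the rules are assumptions, not the extension's documented behavior.

```python
def coverage_status(evidence_count: int, verdicts: list) -> str:
    """Classify one requirement for the coverage table (hypothetical rules)."""
    if evidence_count == 0:
        return "uncovered"  # nothing in the evidence set supports it
    if "refuted" in verdicts:
        return "contradicted"  # at least one linked claim was refuted
    if verdicts and all(v == "verified" for v in verdicts):
        return "covered-verified"
    return "covered-unverified"  # evidence exists but is not fully verified


# Hypothetical requirements: (evidence link count, verdicts of linked claims).
requirements = {
    "REQ-1": (2, ["verified", "verified"]),
    "REQ-2": (1, []),
    "REQ-3": (3, ["verified", "refuted"]),
    "REQ-4": (0, []),
}
table = {req: coverage_status(n, v) for req, (n, v) in requirements.items()}
```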

Proposed Catalog Entry

{
  "harness": {
    "name": "Research Harness",
    "id": "harness",
    "description": "State-externalizing research harness: budgeted exploration, evidence curation, and claim verification for spec-driven development",
    "author": "formin",
    "version": "1.0.0",
    "download_url": "https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip",
    "repository": "https://github.com/formin/spec-kit-harness",
    "homepage": "https://github.com/formin/spec-kit-harness",
    "documentation": "https://github.com/formin/spec-kit-harness/blob/main/README.md",
    "changelog": "https://github.com/formin/spec-kit-harness/blob/main/CHANGELOG.md",
    "license": "MIT",
    "category": "process",
    "effect": "read-write",
    "requires": {
      "speckit_version": ">=0.2.0"
    },
    "provides": {
      "commands": 5,
      "hooks": 2
    },
    "tags": [
      "research",
      "verification",
      "evidence",
      "context-management",
      "workflow"
    ],
    "verified": false,
    "downloads": 0,
    "stars": 0,
    "created_at": "2026-06-11T00:00:00Z",
    "updated_at": "2026-06-11T00:00:00Z"
  }
}
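
For reference, a minimal sketch of how a constraint such as ">=0.2.0" from the "requires" block could be evaluated. The helper names are hypothetical, it handles only plain dotted numeric versions with the ">=" operator, and it is not how Spec Kit itself resolves versions.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version like '0.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_minimum(installed: str, constraint: str) -> bool:
    """Check an installed version against a '>=X.Y.Z' constraint only."""
    if not constraint.startswith(">="):
        raise ValueError(f"unsupported constraint: {constraint!r}")
    return parse_version(installed) >= parse_version(constraint[2:])


ok = satisfies_minimum("0.2.0", ">=0.2.0")
too_old = satisfies_minimum("0.1.9", ">=0.2.0")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.10.0" would sort before "0.2.0".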

Additional Context

The extension adapts the harness side of Harness-1 (the paper trains a 20B RL policy; here the policy is whatever coding agent runs Spec Kit, and the harness is a file-based protocol the commands enforce). The mapping from each paper mechanism to the extension — and the deliberate differences — are documented in docs/concepts.md: https://github.com/formin/spec-kit-harness/blob/main/docs/concepts.md

Supersedes #2924, which I filed via the gh CLI before realizing the form's automatic labels (extension-submission) would be missing; #2924 is now closed in favor of this issue. Sorry for the noise.

🤖 Generated with Claude Code (https://claude.com/claude-code)
