Extension ID
harness
Extension Name
Research Harness
Version
1.0.0
Description
State-externalizing research harness: budgeted exploration, evidence curation, and claim verification for spec-driven development. Based on Harness-1 (arXiv:2606.02373).
Author
formin
Repository URL
https://github.com/formin/spec-kit-harness
Download URL
https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip
License
MIT
Homepage (optional)
https://github.com/formin/spec-kit-harness
Documentation URL (optional)
https://github.com/formin/spec-kit-harness/blob/main/README.md
Changelog URL (optional)
https://github.com/formin/spec-kit-harness/blob/main/CHANGELOG.md
Required Spec Kit Version
>=0.2.0
Required Tools (optional)
None — commands are plain prompt files for the coding agent. No external tools, MCP servers, or network access required.
Number of Commands
5
Number of Hooks (optional)
2
Tags
research, verification, evidence, context-management, workflow
Key Features
- Externalizes long-horizon research state to per-feature markdown files: candidate pool (candidates.md), importance-tagged curated set (curated.md), compact evidence links (evidence.md), verification records (verification.md), compressed deduplicated observations (observations.md), and a budget ledger (budget.md)
- /speckit.harness.explore: budget-aware exploration loop with a strict policy/bookkeeping separation (the agent decides search / inspect / curate / stop; the harness rules handle dedup, compression, eviction, accounting) and a marginal-gain stop rule
- /speckit.harness.verify: adversarial claim verification that re-checks load-bearing spec/plan claims against primary sources and records verdict + method + confidence; refutations propagate as suggested artifact corrections
- /speckit.harness.status: budget-aware context rendering with compact state slices (never full files) plus one recommended next action; lets a brand-new session resume research from files with zero context carryover
- /speckit.harness.report: synthesizes evidence + verdicts into the feature's research.md with a requirement-coverage table (covered-verified / covered-unverified / contradicted / uncovered)
- Optional hooks: after_specify → speckit.harness.init, after_plan → speckit.harness.verify
- Design adapted from Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses (Jiang et al., arXiv:2606.02373; reference implementation github.com/pat-jj/harness-1), applied to the research phase of spec-driven development
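As a rough illustration of the four coverage statuses named for /speckit.harness.report, here is a minimal Python sketch. The Requirement record, the "verified" / "refuted" verdict strings, and the precedence order (a refutation outranks everything else) are assumptions made for this example; the extension itself is a set of prompt files and may classify requirements differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    # Hypothetical record; the field names are illustrative only.
    req_id: str
    evidence_ids: List[str] = field(default_factory=list)
    verdicts: List[str] = field(default_factory=list)  # assumed: "verified" or "refuted"


def coverage_status(req: Requirement) -> str:
    """Map one requirement to a coverage-table status (assumed precedence)."""
    if "refuted" in req.verdicts:
        return "contradicted"        # any refutation wins
    if not req.evidence_ids:
        return "uncovered"           # nothing curated for this requirement
    if "verified" in req.verdicts:
        return "covered-verified"    # evidence plus a passing verdict
    return "covered-unverified"      # evidence, but no verdict recorded yet


if __name__ == "__main__":
    rows = [
        Requirement("R1", ["E1"], ["verified"]),
        Requirement("R2", ["E2"]),
        Requirement("R3", ["E3"], ["verified", "refuted"]),
        Requirement("R4"),
    ]
    for row in rows:
        print(row.req_id, coverage_status(row))
```

Checking for a refutation first is a deliberately conservative choice in this sketch: a requirement with mixed verdicts is surfaced as contradicted rather than hidden behind a passing check.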
Testing Checklist
Submission Requirements
Testing Details
Tested on:
- Windows 11 (PowerShell 7), specify CLI from spec-kit @ main (commit 5ae7ff5) via uvx, Claude Code integration
Test project: fresh specify init harness-ext-test --integration claude scaffold
Test scenarios:
- Installed from the release archive: specify extension add harness --from https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip
- Verified registration: specify extension list shows harness 1.0.0 (5 commands, 2 hooks, Enabled); five speckit-harness-* skills registered for the Claude integration
- Manifest validated: YAML parses, all provides.commands[].file paths resolve, command names match the required speckit.{ext-id}.{command} pattern
- Exercised the command lifecycle end-to-end on the test project (agent-executed): init → explore → verify → status → report (state files created, budget ledger updated, verification record written, research.md generated with a coverage table)
- Confirmed idempotency guard (init refuses to clobber existing state) and read-only behavior of status
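The command-name check in the manifest scenario above could be reproduced with a few lines of Python. This is only a sketch: the speckit.{ext-id}.{command} shape comes from this submission, but the exact character set allowed in the command segment (lowercase letters, digits, hyphens) is an assumption, and the helper names are hypothetical rather than part of the specify CLI.

```python
import re
from typing import List


def command_name_pattern(ext_id: str) -> "re.Pattern[str]":
    # Assumed rule: the command segment starts with a lowercase letter and
    # continues with lowercase letters, digits, or hyphens.
    return re.compile(rf"speckit\.{re.escape(ext_id)}\.[a-z][a-z0-9-]*")


def invalid_command_names(ext_id: str, names: List[str]) -> List[str]:
    """Return the names that do not follow speckit.{ext-id}.{command}."""
    pattern = command_name_pattern(ext_id)
    return [name for name in names if not pattern.fullmatch(name)]


# The five commands exercised in the lifecycle test.
COMMANDS = [
    "speckit.harness.init",
    "speckit.harness.explore",
    "speckit.harness.verify",
    "speckit.harness.status",
    "speckit.harness.report",
]

if __name__ == "__main__":
    bad = invalid_command_names("harness", COMMANDS)
    print("all names valid" if not bad else f"invalid names: {bad}")
```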
Example Usage
# Install extension
specify extension add harness --from https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip
# After /speckit.specify, set up the harness and research with a budget
/speckit.harness.init How is session state currently handled, and what are the revocation options?
/speckit.harness.explore
/speckit.harness.verify
/speckit.harness.report # research.md now has a requirement-coverage table
/speckit.plan # planning starts from verified evidence
Proposed Catalog Entry
{
  "harness": {
    "name": "Research Harness",
    "id": "harness",
    "description": "State-externalizing research harness: budgeted exploration, evidence curation, and claim verification for spec-driven development",
    "author": "formin",
    "version": "1.0.0",
    "download_url": "https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip",
    "repository": "https://github.com/formin/spec-kit-harness",
    "homepage": "https://github.com/formin/spec-kit-harness",
    "documentation": "https://github.com/formin/spec-kit-harness/blob/main/README.md",
    "changelog": "https://github.com/formin/spec-kit-harness/blob/main/CHANGELOG.md",
    "license": "MIT",
    "category": "process",
    "effect": "read-write",
    "requires": {
      "speckit_version": ">=0.2.0"
    },
    "provides": {
      "commands": 5,
      "hooks": 2
    },
    "tags": [
      "research",
      "verification",
      "evidence",
      "context-management",
      "workflow"
    ],
    "verified": false,
    "downloads": 0,
    "stars": 0,
    "created_at": "2026-06-11T00:00:00Z",
    "updated_at": "2026-06-11T00:00:00Z"
  }
}
Additional Context
The extension adapts the harness side of Harness-1 (the paper trains a 20B RL policy; here the policy is whatever coding agent runs Spec Kit, and the harness is a file-based protocol the commands enforce). The mapping from each paper mechanism to the extension — and the deliberate differences — are documented in docs/concepts.md: https://github.com/formin/spec-kit-harness/blob/main/docs/concepts.md
Supersedes #2924, which I filed via the gh CLI before realizing the form's automatic labels (extension-submission) would be missing; #2924 is now closed in favor of this issue. Sorry for the noise.
🤖 Generated with Claude Code (https://claude.com/claude-code)