[Extension]: Add Research Harness

### Extension ID

harness

### Extension Name

Research Harness

### Version

1.0.0

### Description

State-externalizing research harness: budgeted exploration, evidence curation, and claim verification for spec-driven development. Based on Harness-1 (arXiv:2606.02373).

### Author

formin

### Repository URL

https://github.com/formin/spec-kit-harness

### Download URL

https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip

### License

MIT

### Homepage (optional)

https://github.com/formin/spec-kit-harness

### Documentation URL (optional)

https://github.com/formin/spec-kit-harness/blob/main/README.md

### Changelog URL (optional)

https://github.com/formin/spec-kit-harness/blob/main/CHANGELOG.md

### Required Spec Kit Version

>=0.2.0

### Required Tools (optional)

None. Commands are plain prompt files executed by the coding agent; no external tools, MCP servers, or network access are required.

### Number of Commands

5

### Number of Hooks (optional)

2

### Tags

research, verification, evidence, context-management, workflow

### Key Features

- Externalizes long-horizon research state to per-feature markdown files, created by `/speckit.harness.init`: candidate pool (`candidates.md`), importance-tagged curated set (`curated.md`), compact evidence links (`evidence.md`), verification records (`verification.md`), compressed deduplicated observations (`observations.md`), and a budget ledger (`budget.md`)
- `/speckit.harness.explore` — budget-aware exploration loop with a strict policy/bookkeeping separation (the agent decides *search / inspect / curate / stop*; the harness rules handle dedup, compression, eviction, accounting) and a marginal-gain stop rule
- `/speckit.harness.verify` — adversarial claim verification: re-checks load-bearing spec/plan claims against primary sources, records verdict + method + confidence; refutations propagate as suggested artifact corrections
- `/speckit.harness.status` — budget-aware context rendering: compact state slices (never full files) plus one recommended next action; lets a brand-new session resume research from files with zero context carryover
- `/speckit.harness.report` — synthesizes evidence + verdicts into the feature's `research.md` with a requirement-coverage table (covered-verified / covered-unverified / contradicted / uncovered)
- Optional hooks: `after_specify` → `speckit.harness.init`, `after_plan` → `speckit.harness.verify`
- Design adapted from *Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses* (Jiang et al., [arXiv:2606.02373](https://arxiv.org/abs/2606.02373); reference implementation [pat-jj/harness-1](https://github.com/pat-jj/harness-1)), applied to the research phase of spec-driven development
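In the extension these rules live in prompt files that the coding agent follows, so there is no executable harness to show. As a rough illustration only, the Python sketch below models two of the bookkeeping rules described above: a budget ledger with a marginal-gain stop rule, and the four-way requirement-coverage classification used by `/speckit.harness.report`. The names `BudgetLedger` and `coverage_status`, the gain metric (newly curated items per action), the `window`/`min_gain` thresholds, the verdict vocabulary, and the "any refutation wins" precedence are all hypothetical assumptions, not the extension's actual rules.

```python
from dataclasses import dataclass, field

# Hypothetical model of two harness bookkeeping rules; the real extension
# expresses these as prompt instructions for the agent, not as code.

ACTIONS = {"search", "inspect", "curate"}  # "stop" ends the loop and costs nothing


@dataclass
class BudgetLedger:
    budget: int             # total actions allowed (assumed unit: one action = one step)
    window: int = 3         # number of recent actions the stop rule averages over (assumed)
    min_gain: float = 0.5   # keep exploring while the mean recent gain is >= this (assumed)
    gains: list = field(default_factory=list)

    def remaining(self) -> int:
        return self.budget - len(self.gains)

    def record(self, action: str, new_curated: int) -> None:
        """Charge one action to the budget; its gain is the number of newly curated items."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if self.remaining() <= 0:
            raise RuntimeError("budget exhausted")
        self.gains.append(new_curated)

    def should_stop(self) -> bool:
        """Stop when the budget is spent or the recent marginal gain drops below min_gain."""
        if self.remaining() <= 0:
            return True
        if len(self.gains) < self.window:
            return False
        recent = self.gains[-self.window:]
        return sum(recent) / self.window < self.min_gain


def coverage_status(requirement: str, evidence: dict, verdicts: dict) -> str:
    """Map a requirement to one of the four coverage buckets in research.md.

    evidence: requirement id -> list of evidence ids (assumed shape)
    verdicts: evidence id -> "supported" or "refuted" (assumed vocabulary)
    Precedence (assumed): any refutation marks the requirement contradicted.
    """
    ids = evidence.get(requirement, [])
    if not ids:
        return "uncovered"
    results = {verdicts.get(i) for i in ids}
    if "refuted" in results:
        return "contradicted"
    if "supported" in results:
        return "covered-verified"
    return "covered-unverified"


# Usage: gains of 2, 1, 0, 0 new curated items; the last three average 1/3 < 0.5.
ledger = BudgetLedger(budget=10)
for gain in (2, 1, 0, 0):
    ledger.record("search", gain)
print(ledger.should_stop(), ledger.remaining())  # True 6
```

Separating the ledger (pure accounting) from the decision of which action to take mirrors the policy/bookkeeping split described for `/speckit.harness.explore`: the agent only chooses actions, while the stop condition is computed from recorded state.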

### Testing Checklist

- [x] Extension installs successfully via download URL
- [x] All commands execute without errors
- [x] Documentation is complete and accurate
- [x] No security vulnerabilities identified
- [x] Tested on at least one real project

### Submission Requirements

- [x] Valid `extension.yml` manifest included
- [x] README.md with installation and usage instructions
- [x] LICENSE file included
- [x] GitHub release created with version tag
- [x] All command files exist and are properly formatted
- [x] Extension ID follows naming conventions (lowercase-with-hyphens)

### Testing Details

**Tested on:**
- Windows 11 (PowerShell 7), specify CLI from spec-kit @ main (commit `5ae7ff5`) via `uvx`, Claude Code integration

**Test project:** fresh `specify init harness-ext-test --integration claude` scaffold

**Test scenarios:**
1. Installed from the release archive: `specify extension add harness --from https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip`
2. Verified registration: `specify extension list` shows `harness 1.0.0`; all five `speckit.harness.*` command files registered for the Claude integration
3. Manifest validated: YAML parses, all `provides.commands[].file` paths resolve, command names match `^speckit\.[a-z0-9-]+\.[a-z0-9-]+$`
4. Exercised the command lifecycle end-to-end on the test project (agent-executed): `init` (state files created with budget ledger) → `explore` (budgeted actions, candidates/curated/evidence updated, observation compression) → `verify` (claim re-checked against primary source, verdict recorded) → `status` (slice rendering + next-action recommendation) → `report` (`research.md` generated with coverage table)
5. Confirmed idempotency guard (`init` refuses to clobber existing state) and read-only behavior of `status`

### Example Usage

```bash
# Install extension
specify extension add harness --from https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip

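# Then, inside the coding agent (slash commands, not shell commands),
# after /speckit.specify: set up the harness and research with a budget
/speckit.harness.init How is session state currently handled, and what are the revocation options?
/speckit.harness.explore
/speckit.harness.status     # optional: compact state slices + one recommended next action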
/speckit.harness.verify
/speckit.harness.report     # research.md now has a requirement-coverage table
/speckit.plan               # planning starts from verified evidence
```

### Proposed Catalog Entry

```json
{
  "harness": {
    "name": "Research Harness",
    "id": "harness",
    "description": "State-externalizing research harness: budgeted exploration, evidence curation, and claim verification for spec-driven development",
    "author": "formin",
    "version": "1.0.0",
    "download_url": "https://github.com/formin/spec-kit-harness/archive/refs/tags/v1.0.0.zip",
    "repository": "https://github.com/formin/spec-kit-harness",
    "homepage": "https://github.com/formin/spec-kit-harness",
    "documentation": "https://github.com/formin/spec-kit-harness/blob/main/README.md",
    "changelog": "https://github.com/formin/spec-kit-harness/blob/main/CHANGELOG.md",
    "license": "MIT",
    "category": "process",
    "effect": "read-write",
    "requires": {
      "speckit_version": ">=0.2.0"
    },
    "provides": {
      "commands": 5,
      "hooks": 2
    },
    "tags": ["research", "verification", "evidence", "context-management", "workflow"],
    "verified": false,
    "downloads": 0,
    "stars": 0,
    "created_at": "2026-06-11T00:00:00Z",
    "updated_at": "2026-06-11T00:00:00Z"
  }
}
```
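Before submitting, a few cross-field invariants of the proposed entry can be checked mechanically: the top-level key matches `id`, `download_url` points at the `v<version>` tag, `provides` matches the Number of Commands and Number of Hooks fields above, and `created_at` does not postdate `updated_at`. The sketch below is a hypothetical pre-submission check using only the Python standard library; these invariants are inferred from this submission, not taken from a published catalog schema.

```python
import json
import re


# Hypothetical consistency checks for a proposed catalog entry (not an official schema).
def check_catalog_entry(doc: str, expected_commands: int, expected_hooks: int) -> list:
    data = json.loads(doc)
    problems = []
    if len(data) != 1:
        problems.append("expected exactly one top-level extension key")
    for key, entry in data.items():
        if entry.get("id") != key:
            problems.append(f"{key}: id {entry.get('id')!r} does not match key")
        version = entry.get("version", "")
        if not re.fullmatch(r"\d+\.\d+\.\d+", version):
            problems.append(f"{key}: version {version!r} is not MAJOR.MINOR.PATCH")
        if f"/tags/v{version}.zip" not in entry.get("download_url", ""):
            problems.append(f"{key}: download_url does not point at tag v{version}")
        provides = entry.get("provides", {})
        if provides.get("commands") != expected_commands:
            problems.append(f"{key}: provides.commands != {expected_commands}")
        if provides.get("hooks") != expected_hooks:
            problems.append(f"{key}: provides.hooks != {expected_hooks}")
        # ISO-8601 UTC timestamps in the same format compare correctly as strings.
        if entry.get("created_at", "") > entry.get("updated_at", ""):
            problems.append(f"{key}: created_at is after updated_at")
    return problems
```

Run against the JSON above as `check_catalog_entry(text, 5, 2)`; an empty list means the entry is internally consistent. A common slip this catches is bumping `version` without updating `download_url`.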

### Additional Context

The extension adapts the *harness* side of Harness-1 (the paper trains a 20B RL policy; here the policy is whatever coding agent runs Spec Kit, and the harness is a file-based protocol the commands enforce). The mapping from each paper mechanism to the extension — and the deliberate differences — are documented in [docs/concepts.md](https://github.com/formin/spec-kit-harness/blob/main/docs/concepts.md).

Submitted via `gh` CLI following the Extension Submission issue form fields, so form labels may be missing from this issue.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

