From 4119bc04a739d3c199ded8e185ec0cd8718fd82b Mon Sep 17 00:00:00 2001
From: jsPark
Date: Fri, 19 Jun 2026 13:49:32 +0900
Subject: [PATCH] Prepare v0.6.10 verified evidence release candidate

---
 README.md                         |  16 +-
 README_ko.md                      |  16 +-
 docs/CHANGELOG.md                 |   7 +
 docs/CHANGELOG_ko.md              |   7 +
 internal/app/app.go               |  15 +
 internal/app/commands.go          |  17 +
 internal/app/doctor.go            |   3 +
 internal/app/finish_gate.go       |  14 +-
 internal/app/goal_state.go        |   8 +-
 internal/app/layout.go            |  24 +-
 internal/app/main_test.go         | 171 +++++++++-
 internal/app/plan.go              |   2 +-
 internal/app/readiness.go         |   1 +
 internal/app/verified_evidence.go | 534 ++++++++++++++++++++++++++++++
 plan.md                           |   7 +-
 15 files changed, 816 insertions(+), 26 deletions(-)
 create mode 100644 internal/app/verified_evidence.go

diff --git a/README.md b/README.md
index eb7c20b..55cd5a7 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ If `plan.md` has a `Target Stage`, plain `hyper run` keeps moving packet by pack
 
 The goal is simple: start from a tiny MVP and keep upgrading it until it can behave like a real service, without every AI session losing the project thread.
 
-Current release: `v0.6.9`. It can continue packet by packet toward a target stage, stop and write review notes when evidence is weak, require approval before changing stages, compare Service Quality work against category references, verify release downloads, and recover stale stage state with `hyper migrate`.
+Current release: `v0.6.10`. It can continue packet by packet toward a target stage, stop and write review notes when evidence is weak, record command execution through Verified Evidence, require approval before changing stages, compare Service Quality work against category references, verify release downloads, and recover stale stage state with `hyper migrate`.
 
 ## First Run
 
@@ -47,6 +47,14 @@ What happens:
 5. The agent runs the finish gate internally to check the packet.
 6. `.hyper/next-packet.md` tells Codex whether to continue, fix the same packet, advance stage, or stop.
 
+When a repeatable command proves the packet, the agent should prefer:
+
+```bash
+hyper verify --axis validation_coverage --name "go tests" -- go test ./...
+```
+
+This records exit code, stdout/stderr hashes, commit SHA, worktree status hash, run ID, and goal ID under `.hyper/verified-evidence/`. `evidence.md` should cite the Verified Evidence ID and explain what decision it supports.
+
 ## Why It Helps
 
 Long AI coding sessions often drift:
@@ -74,6 +82,8 @@ What stays in your repo:
 | `hyper run` | Creates the next focused packet from the plan and prior evidence. |
 | `goal.md` / `tasks.md` | What the AI should do now. |
 | `evidence.md` | What changed and how it was checked. |
+| `hyper verify -- ` | Runs a validation command and records machine-readable proof. |
+| `.hyper/verified-evidence/` | Verified command records, stdout/stderr logs, hashes, commit SHA, and goal/run metadata. |
 | `review.md` | What must be fixed if the packet is not good enough yet. |
 | `next.md` | One next step and reusable lessons. |
 | Agent finish gate (`hyper complete`) | Agent/runtime check that closes the packet and prepares the next action. |
@@ -102,6 +112,7 @@ You do not need these terms to start, but they explain what Hyper Run is doing:
 | --- | --- |
 | Runtime packet | The next AI work bundle. |
 | Evidence | Proof that the work was done and checked. |
+| Verified Evidence | Machine-recorded command proof from `hyper verify`, including exit code, log hashes, commit SHA, and goal/run metadata. |
 | Proof Contract | The packet's proof checklist. |
 | Learn | Extracting reusable lessons from `evidence.md` and `next.md`. Not a summary. |
 | Pressure Ledger | A list of repeated needs, gaps, or failures the project keeps showing. |
@@ -138,6 +149,8 @@ For Service Quality benchmark examples, see [Reference Benchmark Evidence Exampl
 - While an explicit override is active, `hyper status` shows both the active override target and the `plan.md` target when they differ.
 - If you change or remove `Target Stage`, the next status/run/migrate cycle follows the updated plan target.
 - The agent finish gate runs before learning from the packet. If evidence is weak, it writes `review.md` findings and keeps the agent in the same packet. The underlying recovery command is `hyper complete`.
+- `hyper verify` can run repeatable command checks directly and store machine-readable Verified Evidence. Finish gates and readiness can use those records instead of relying only on Markdown claims.
+- `hyper status` and `hyper doctor` summarize the current packet's Verified Evidence records, including counts, newest record ID, command, status, and failed exit code when one exists.
 - If the same finish-gate findings repeat, Hyper Run records the repeat count and warns the agent to stop the auto loop unless the next fix directly addresses those findings.
 - `hyper run --auto --until ` still works as an explicit override. It still requires ready proof before stage advancement.
 - `hyper advance` applies a stage change only after `hyper status` says the gate is ready. In an active auto target, `.hyper/next-packet.md` can carry that advancement after the Stage Advancement Review; outside auto mode, user acceptance is still required.
@@ -156,6 +169,7 @@ hyper init
 hyper run
 
 # agent implements the generated packet
+# agent records repeatable command proof with hyper verify when available
 # agent updates evidence.md/next.md and runs the finish gate internally
 
 hyper status --short
diff --git a/README_ko.md b/README_ko.md
index dd565da..7b3167f 100644
--- a/README_ko.md
+++ b/README_ko.md
@@ -21,7 +21,7 @@ hyper run
 
 목표는 단순합니다. 작은 MVP에서 시작해, AI 세션이 바뀌어도 문맥을 잃지 않고 실제 서비스처럼 다룰 수 있는 수준까지 계속 개선하는 것입니다.
 
-현재 릴리즈는 `v0.6.9`입니다. 목표 stage까지 packet 단위로 이어가고, evidence가 약하면 멈춰서 review를 남기며, stage 변경은 사용자가 승인할 때만 적용합니다. Service Quality에서는 비슷한 reference와 비교할 수 있고, 설치/업데이트를 검증하며, 오래된 stage 상태는 `hyper migrate`로 복구합니다.
+현재 릴리즈는 `v0.6.10`입니다. 목표 stage까지 packet 단위로 이어가고, evidence가 약하면 멈춰서 review를 남기며, 명령 실행은 Verified Evidence로 기록할 수 있고, stage 변경은 사용자가 승인할 때만 적용합니다. Service Quality에서는 비슷한 reference와 비교할 수 있고, 설치/업데이트를 검증하며, 오래된 stage 상태는 `hyper migrate`로 복구합니다.
 
 ## 첫 실행
 
@@ -47,6 +47,14 @@ $hyper run
 5. agent가 finish gate를 내부적으로 실행해 packet을 확인합니다.
 6. `.hyper/next-packet.md`가 계속할지, 같은 packet을 고칠지, stage를 올릴지, 멈출지 알려줍니다.
 
+반복 가능한 명령이 packet을 증명한다면 agent는 가능한 한 이렇게 기록합니다.
+
+```bash
+hyper verify --axis validation_coverage --name "go tests" -- go test ./...
+```
+
+이 명령은 exit code, stdout/stderr hash, commit SHA, worktree status hash, run ID, goal ID를 `.hyper/verified-evidence/` 아래에 저장합니다. `evidence.md`는 Verified Evidence ID를 인용하고, 그 기록이 어떤 판단을 뒷받침하는지 설명하는 사람이 읽는 요약 역할을 합니다.
+
 ## 왜 도움이 되나요
 
 AI 코딩을 오래 이어가면 이런 문제가 생깁니다.
@@ -74,6 +82,8 @@ repo 안에 남는 것은 이 정도입니다.
 | `hyper run` | 계획과 이전 evidence를 읽고 다음 packet을 만듭니다. |
 | `goal.md` / `tasks.md` | AI가 지금 해야 할 작업입니다. |
 | `evidence.md` | 무엇을 바꿨고 어떻게 확인했는지 남기는 파일입니다. |
+| `hyper verify -- ` | 검증 명령을 실행하고 기계가 읽을 수 있는 proof를 저장합니다. |
+| `.hyper/verified-evidence/` | 검증된 명령 기록, stdout/stderr 로그, hash, commit SHA, goal/run metadata입니다. |
 | `review.md` | packet이 아직 부족하면 고칠 내용을 남기는 파일입니다. |
 | `next.md` | 다음 작업 하나와 재사용 가능한 배운 점을 남기는 파일입니다. |
 | Agent finish gate (`hyper complete`) | agent/runtime이 packet을 닫고 다음 행동을 준비하는 확인 단계입니다. |
@@ -102,6 +112,7 @@ Hyper Run은 첫날부터 하네스를 만들라고 하지 않습니다.
 | --- | --- |
 | Runtime packet | 다음 AI 작업 묶음입니다. |
 | Evidence | 작업이 됐고 확인했다는 증거입니다. |
+| Verified Evidence | `hyper verify`가 남기는 기계 기록입니다. exit code, log hash, commit SHA, goal/run metadata를 포함합니다. |
 | Proof Contract | 이번 packet의 증명 체크리스트입니다. |
 | Learn | `evidence.md`와 `next.md`에서 다음 작업에 다시 쓸 신호만 뽑는 단계입니다. 단순 요약이 아닙니다. |
 | Pressure Ledger | 프로젝트가 반복해서 보여준 필요, gap, 실패를 모아둔 목록입니다. |
@@ -138,6 +149,8 @@ Service Quality benchmark 예시는 [Reference Benchmark Evidence 예시](docs/e
 - 명시 override가 켜져 있고 `plan.md` 목표와 다르면 `hyper status`가 현재 override 목표와 `plan.md` 목표를 같이 보여줍니다.
 - `Target Stage`를 바꾸거나 제거하면 다음 status/run/migrate 흐름이 수정된 plan target을 따릅니다.
 - Agent finish gate는 packet을 학습하기 전에 실행됩니다. evidence가 약하면 `review.md`에 보강할 내용을 남기고 agent가 같은 packet에 머무릅니다. 내부 복구 명령은 `hyper complete`입니다.
+- `hyper verify`는 반복 가능한 명령 검증을 직접 실행하고 기계가 읽는 Verified Evidence를 저장합니다. Finish gate와 readiness는 Markdown 주장만이 아니라 이 기록도 근거로 사용할 수 있습니다.
+- `hyper status`와 `hyper doctor`는 현재 packet의 Verified Evidence record 수, 최신 record ID, command, status, 실패한 exit code를 요약해 보여줍니다.
 - 같은 finish-gate finding이 반복되면 반복 횟수를 기록하고, 다음 수정이 그 finding을 직접 해결하지 않는 한 auto loop를 멈추도록 경고합니다.
 - `hyper run --auto --until `는 명시적인 override로 계속 사용할 수 있습니다. stage advancement 전에는 여전히 ready proof가 필요합니다.
 - `hyper advance`는 `hyper status`가 gate ready라고 말할 때만 적용합니다. active auto target 안에서는 `.hyper/next-packet.md`가 Stage Advancement Review 뒤의 advancement를 이어갈 수 있고, auto mode 밖에서는 사용자 승인이 필요합니다.
@@ -156,6 +169,7 @@ hyper init
 hyper run
 
 # agent가 생성된 packet을 구현합니다
+# 반복 가능한 명령 proof가 있으면 agent가 hyper verify로 기록합니다
 # agent가 evidence.md/next.md를 업데이트하고 finish gate를 내부 실행합니다
 
 hyper status --short
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index 1387d96..499e1d8 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -2,6 +2,13 @@
 
 ## Unreleased
 
+## v0.6.10 - 2026-06-19
+
+- Add `hyper verify [--axis axis] [--name name] -- ` to run validation commands directly and record exit code, stdout/stderr hashes, commit SHA, worktree status hash, run ID, and goal ID under `.hyper/verified-evidence/`.
+- Let packet completion, finish gates, readiness evidence, and active validator checks consume passed Verified Evidence records instead of relying only on Markdown validation claims.
+- Update generated packet templates, Codex routing docs, `plan.md`, README, and README_ko so command proof is machine-recorded while humans stay focused on approval and policy boundaries.
+- Show the current packet's Verified Evidence summary in `hyper status` and `hyper doctor`, including record counts, newest command record, and failed exit code when a failed record exists.
+
 ## v0.6.9 - 2026-06-19
 
 - Treat `Sustained Service Quality` as an ongoing operating target so a plan target at that stage keeps planning focused quality packets instead of stopping as target-proof-complete.
diff --git a/docs/CHANGELOG_ko.md b/docs/CHANGELOG_ko.md
index 00731c3..ddb308d 100644
--- a/docs/CHANGELOG_ko.md
+++ b/docs/CHANGELOG_ko.md
@@ -2,6 +2,13 @@
 
 ## Unreleased
 
+## v0.6.10 - 2026-06-19
+
+- `hyper verify [--axis axis] [--name name] -- `를 추가했습니다. 검증 명령을 직접 실행하고 exit code, stdout/stderr hash, commit SHA, worktree status hash, run ID, goal ID를 `.hyper/verified-evidence/` 아래에 기록합니다.
+- Packet 완료 판정, finish gate, readiness evidence, active validator check가 Markdown validation 주장만 보지 않고 통과한 Verified Evidence record도 근거로 사용할 수 있게 했습니다.
+- Command proof는 기계 기록으로 남기고, 인간은 승인과 정책 경계에 집중하도록 generated packet template, Codex routing 문서, `plan.md`, README, README_ko를 갱신했습니다.
+- `hyper status`와 `hyper doctor`에서 현재 packet의 Verified Evidence 요약을 보여줍니다. record 수, 최신 command record, 실패 record의 exit code를 확인할 수 있습니다.
+
 ## v0.6.9 - 2026-06-19
 
 - `Sustained Service Quality`를 계속 운영하는 target으로 처리해, 해당 stage를 plan target으로 둔 경우 target-proof-complete로 멈추지 않고 focused quality packet을 계속 계획하게 했습니다.
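The CHANGELOG entries above say `hyper verify` runs a command directly and records its exit code and stdout/stderr hashes, but `internal/app/verified_evidence.go` itself falls outside this excerpt. As a rough illustration only, the capture step could look like the Go sketch below. `verifiedRecord`, `runVerified`, and `sha256Hex` are hypothetical names, not the patch's API, and the sketch deliberately leaves out the commit SHA, worktree status hash, run ID, goal ID, and writing records under `.hyper/verified-evidence/`.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"os/exec"
	"time"
)

// verifiedRecord is a hypothetical shape for one Verified Evidence entry.
// Field names are illustrative; the real record type lives in
// internal/app/verified_evidence.go, which this excerpt does not show.
type verifiedRecord struct {
	Axis         string
	Name         string
	Command      []string
	ExitCode     int
	Status       string // "passed" when ExitCode == 0, otherwise "failed"
	StdoutSHA256 string
	StderrSHA256 string
	Duration     time.Duration
}

// sha256Hex returns the lowercase hex SHA-256 digest of data.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// runVerified runs argv directly (no shell), captures stdout and stderr,
// and records the exit code and log hashes. A non-zero exit becomes a
// "failed" record; only a failure to start the process is returned as an error.
func runVerified(axis, name string, argv []string) (verifiedRecord, error) {
	if len(argv) == 0 {
		return verifiedRecord{}, errors.New("no command given after --")
	}
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	start := time.Now()
	runErr := cmd.Run()
	rec := verifiedRecord{Axis: axis, Name: name, Command: argv, Duration: time.Since(start)}

	var exitErr *exec.ExitError
	if errors.As(runErr, &exitErr) {
		rec.ExitCode = exitErr.ExitCode()
	} else if runErr != nil {
		return verifiedRecord{}, runErr // e.g. executable not found
	}

	rec.Status = "failed"
	if rec.ExitCode == 0 {
		rec.Status = "passed"
	}
	rec.StdoutSHA256 = sha256Hex(stdout.Bytes())
	rec.StderrSHA256 = sha256Hex(stderr.Bytes())
	return rec, nil
}
```

Under these assumptions, `runVerified("validation_coverage", "go tests", []string{"go", "test", "./..."})` would roughly mirror the README's `hyper verify --axis validation_coverage --name "go tests" -- go test ./...` call. Keeping a non-zero exit as a failed record, rather than an error, leaves room for the passed/failed distinction the changelog describes for finish gates and `hyper status`.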
diff --git a/internal/app/app.go b/internal/app/app.go
index d466de4..09ab5a2 100644
--- a/internal/app/app.go
+++ b/internal/app/app.go
@@ -63,6 +63,11 @@ func runCLI(args []string, fsys fsRoot, updater updater) (commandOutput, *hyperE
 			return stdout(commandUsage("status")), nil
 		}
 		return statusHyper(fsys, rest)
+	case "verify":
+		if helpRequested(rest) {
+			return stdout(commandUsage("verify")), nil
+		}
+		return verifyHyper(fsys, rest)
 	case "doctor":
 		if helpRequested(rest) {
 			return stdout(commandUsage("doctor")), nil
@@ -147,6 +152,7 @@ func usage() string {
 		" hyper advance",
 		" hyper status",
 		" hyper status --short",
+		" hyper verify [--axis axis] [--name name] -- [args...]",
 		" hyper doctor",
 		" hyper repair",
 		" hyper resume",
@@ -159,6 +165,7 @@ func usage() string {
 		" Edit plan.md, set `Target Stage` when you want continuation, then use plain `hyper run` to create the next runtime packet.",
 		" Use `hyper run --auto --until service-quality [focus]` only when you need to override the plan target from the command line.",
 		" The agent updates evidence.md and next.md, then runs the finish gate internally before another packet starts.",
+		" Use `hyper verify -- ` when command proof should be recorded with exit code, log hashes, commit SHA, and worktree state.",
 		" When `hyper status` says the stage gate is ready, use `hyper advance` after user acceptance or an active auto-target review.",
 		"",
 		"Advanced/recovery:",
@@ -203,6 +210,14 @@ func commandUsage(command string) string {
 			"",
 			"Shows current stage, gate, proof, pressure, next action, and blocking gaps.",
 		},
+		"verify": {
+			"Usage:",
+			" hyper verify [--axis axis] [--name name] -- [args...]",
+			"",
+			"Runs a validation command directly and records verified evidence under `.hyper/verified-evidence/`.",
+			"Captured metadata includes exit code, stdout/stderr hashes, commit SHA, worktree status hash, duration, run ID, and goal ID.",
+			"Use `--axis` to attach the proof to a readiness axis such as validation_coverage, core_ux, or sustained_quality.",
+		},
 		"doctor": {
 			"Usage:",
 			" hyper doctor",
diff --git a/internal/app/commands.go b/internal/app/commands.go
index e97612b..ad318ee 100644
--- a/internal/app/commands.go
+++ b/internal/app/commands.go
@@ -448,14 +448,31 @@ func statusHyper(fsys fsRoot, args []string) (commandOutput, *hyperError) {
 	refresh := statusRefreshFor(root, state)
 	if short {
 		lines := statusShortLinesWithRefresh(state, derived, readiness, growth, refresh)
+		lines = appendStatusVerifiedEvidence(lines, root, state.CurrentGoalID, true)
 		lines = appendStatusReviewFindings(lines, root, state.CurrentGoalID, derived)
 		return stdout(strings.Join(lines, "\n")), nil
 	}
 	lines := statusDashboardLinesWithRefresh(state, derived, readiness, growth, runs, goals, refresh)
+	lines = appendStatusVerifiedEvidence(lines, root, state.CurrentGoalID, false)
 	lines = appendStatusReviewFindings(lines, root, state.CurrentGoalID, derived)
 	return stdout(strings.Join(lines, "\n")), nil
 }
 
+func appendStatusVerifiedEvidence(lines []string, root, goalID string, short bool) []string {
+	if strings.TrimSpace(goalID) == "" {
+		return lines
+	}
+	if len(lines) > 0 && strings.TrimSpace(lines[len(lines)-1]) == "" {
+		lines = lines[:len(lines)-1]
+	}
+	if short {
+		lines = append(lines, verifiedEvidenceShortLine(root, goalID))
+		return append(lines, "")
+	}
+	lines = append(lines, verifiedEvidenceDashboardLines(root, goalID)...)
+	return append(lines, "")
+}
+
 func appendStatusReviewFindings(lines []string, root, goalID string, derived goalState) []string {
 	if !isFailedFinishGateReason(derived.Reason) {
 		return lines
diff --git a/internal/app/doctor.go b/internal/app/doctor.go
index 6d164e2..aa2a583 100644
--- a/internal/app/doctor.go
+++ b/internal/app/doctor.go
@@ -69,6 +69,7 @@ func doctorHyper(fsys fsRoot) (commandOutput, *hyperError) {
 	checks = append(checks, doctorStateChecks(root)...)
 	checks = append(checks, doctorGrowthMigrationCheck(root))
 	checks = append(checks, doctorReadinessStateCheck(root))
+	checks = append(checks, doctorVerifiedEvidenceCheck(root))
 	checks = append(checks, doctorNextPacketPlanCheck(root))
 	checks = append(checks, doctorSignatureCheck())
 	checks = append(checks, doctorDBCheck(root))
@@ -516,6 +517,8 @@ func doctorActionForCheck(check doctorCheck) string {
 			return "Run `hyper repair`, then run `hyper doctor` again."
 		}
 		return "Run `hyper migrate`, then run `hyper doctor` again."
+	case "verified evidence":
+		return "Inspect the failed Verified Evidence record, fix the command or implementation, then rerun `hyper verify -- `."
 	case "signature verification":
 		return "Install `cosign` or unset `HYPER_RUN_VERIFY_SIGNATURE`."
 	case "sqlite":
diff --git a/internal/app/finish_gate.go b/internal/app/finish_gate.go
index c098752..45cd056 100644
--- a/internal/app/finish_gate.go
+++ b/internal/app/finish_gate.go
@@ -46,7 +46,7 @@ func runFinishGate(root string, state projectState, derived goalState, readiness
 		result.Status = "failed"
 		result.Findings = append(result.Findings, "Runtime packet is not completed yet: "+derived.Reason)
 	}
-	if !hasNonPendingSection(evidenceText, "Validation") {
+	if !hasNonPendingSection(evidenceText, "Validation") && !goalHasPassedVerifiedEvidence(root, state.CurrentGoalID) {
 		result.Status = "failed"
 		result.Findings = append(result.Findings, "Add concrete command, smoke, browser, or manual validation output under `## Validation`.")
 	}
@@ -54,11 +54,11 @@ func runFinishGate(root string, state projectState, derived goalState, readiness
 		result.Status = "failed"
 		result.Findings = append(result.Findings, "Add the next recommended runtime episode under `## Recommended Next Goal` in `next.md`.")
 	}
-	if finding := readinessFinishGateFinding(state, evidenceText, readiness); finding != "" {
+	if finding := readinessFinishGateFinding(root, state, evidenceText, readiness); finding != "" {
 		result.Status = "failed"
 		result.Findings = append(result.Findings, finding)
 	}
-	if finding := activeCapabilityFinishGateFinding(root, evidenceText); finding != "" {
+	if finding := activeCapabilityFinishGateFinding(root, state.CurrentGoalID, evidenceText); finding != "" {
 		result.Status = "failed"
 		result.Findings = append(result.Findings, finding)
 	}
@@ -128,13 +128,14 @@ func isFailedFinishGateReason(reason string) bool {
 	return strings.Contains(strings.ToLower(strings.TrimSpace(reason)), "finish gate failed")
 }
 
-func readinessFinishGateFinding(state projectState, evidenceText string, readiness readinessState) string {
+func readinessFinishGateFinding(root string, state projectState, evidenceText string, readiness readinessState) string {
 	axis := strings.TrimSpace(readiness.NextPressure.Axis)
 	axisName := strings.TrimSpace(readiness.NextPressure.AxisName)
 	if axis == "" || axisName == "" || axis == "stage_advancement" || axis == "product_completeness" || axis == "reference_benchmark" {
 		return ""
 	}
 	records := readinessEvidenceRecordsFromGoalText(state.CurrentGoalID, evidenceText)
+	records = append(records, verifiedReadinessEvidenceRecords(root, state.CurrentGoalID, readinessDimensionDefs())...)
 	if axis == "sustained_quality" {
 		for _, record := range records {
 			if record.Axis == axis {
@@ -211,7 +212,7 @@ func readinessFinishGateHint(axis string) string {
 	}
 }
 
-func activeCapabilityFinishGateFinding(root, evidenceText string) string {
+func activeCapabilityFinishGateFinding(root, goalID, evidenceText string) string {
 	capabilities, err := activeCapabilities(root)
 	if err != nil || len(capabilities) == 0 {
 		return ""
@@ -225,6 +226,9 @@ func activeCapabilityFinishGateFinding(root, evidenceText string) string {
 		if activeValidatorValidationCovers(capability, evidenceText) {
 			continue
 		}
+		if activeValidatorVerifiedEvidenceCovers(root, goalID, capability) {
+			continue
+		}
 		missing = append(missing, capability.Name)
 	}
 	if len(missing) == 0 {
diff --git a/internal/app/goal_state.go b/internal/app/goal_state.go
index 339ffaa..53832f7 100644
--- a/internal/app/goal_state.go
+++ b/internal/app/goal_state.go
@@ -11,10 +11,14 @@ func deriveCurrentGoalState(root, goalID string) goalState {
 		return goalState{State: "initialized", Reason: "No current runtime packet recorded."}
 	}
 	goalDir := filepath.Join(root, hyperDir, "goals", goalID)
-	return deriveGoalState(readIfExists(filepath.Join(goalDir, "evidence.md")), readIfExists(filepath.Join(goalDir, "next.md")))
+	return deriveGoalStateWithVerified(readIfExists(filepath.Join(goalDir, "evidence.md")), readIfExists(filepath.Join(goalDir, "next.md")), goalHasPassedVerifiedEvidence(root, goalID))
 }
 
 func deriveGoalState(evidenceText, nextText string) goalState {
+	return deriveGoalStateWithVerified(evidenceText, nextText, false)
+}
+
+func deriveGoalStateWithVerified(evidenceText, nextText string, hasVerifiedEvidence bool) goalState {
 	if status := firstNonBlank(explicitStatus(evidenceText), explicitStatus(nextText)); status != "" {
 		reason := firstNonBlank(firstLabelValue(evidenceText, "Reason"), firstLabelValue(nextText, "Reason"), "Explicit status marker: "+status)
 		return goalState{State: status, Reason: reason}
@@ -23,7 +27,7 @@ func deriveGoalState(evidenceText, nextText string) goalState {
 	if len(blockers) > 0 {
 		return goalState{State: "blocked", Reason: firstNonBlank(blockers[0], "Blocker section is populated.")}
 	}
-	if hasNonPendingSection(nextText, "Recommended Next Goal") && hasNonPendingSection(evidenceText, "Validation") {
+	if hasNonPendingSection(nextText, "Recommended Next Goal") && (hasNonPendingSection(evidenceText, "Validation") || hasVerifiedEvidence) {
 		if surfaceProofFollowupRequiredFromEvidence(evidenceText) {
 			return goalState{State: "completed", Reason: "Evidence and next recommendation are populated; surface proof follow-up is needed."}
 		}
diff --git a/internal/app/layout.go b/internal/app/layout.go
index 1b266d3..4ed44a2 100644
--- a/internal/app/layout.go
+++ b/internal/app/layout.go
@@ -30,6 +30,7 @@ func ensureProjectLayout(root string) *hyperError {
 		".hyper/logs",
 		".hyper/goals",
 		".hyper/memories",
+		".hyper/verified-evidence",
 		".hyper/skills/generated",
 		".hyper/agents/candidates",
 		".hyper/agents/active",
@@ -83,7 +84,7 @@ func agentsHyperRunSection() string {
 		"",
 		"## Hyper Run",
 		"",
-		"When the user writes `$hyper`, `$hyper run`, `$hyper-run`, `$hyper status`, `$hyper status --short`, `$hyper migrate`, `$hyper advance`, `$hyper doctor`, `hyper run`, or asks Hyper Run to continue the project, treat it as a project workflow command inside the current Codex session.",
+		"When the user writes `$hyper`, `$hyper run`, `$hyper-run`, `$hyper status`, `$hyper status --short`, `$hyper verify`, `$hyper migrate`, `$hyper advance`, `$hyper doctor`, `hyper run`, or asks Hyper Run to continue the project, treat it as a project workflow command inside the current Codex session.",
 		"",
 		"Use `.agents/skills/hyper/SKILL.md` as the thin Codex Desktop router. Keep product judgment, execution state, learning, and generated project knowledge in `plan.md`, `.hyper/`, and the `hyper` CLI rather than in static skill text.",
 		"",
@@ -101,8 +102,8 @@ func agentsHyperRunSection() string {
 		"2. Read the generated runtime packet path from the CLI output, or read `.hyper/state.json` and use `current_goal_path`.",
 		"3. Read `.hyper/goals//goal.md` and `.hyper/goals//tasks.md`.",
 		"4. Implement the smallest coherent step that satisfies the current episode.",
-		"5. Run the safest available validation or record why validation is blocked.",
-		"6. Update `.hyper/goals//evidence.md` with validation output, readiness evidence, active capability evidence, pressure signals, changed files, decisions, reusable patterns, and blockers.",
+		"5. Run the safest available validation or record why validation is blocked; prefer `hyper verify -- ` for repeatable command proof.",
+		"6. Update `.hyper/goals//evidence.md` with validation output or Verified Evidence IDs, readiness evidence, active capability evidence, pressure signals, changed files, decisions, reusable patterns, and blockers.",
 		"7. Write `.hyper/goals//next.md` with the next recommended runtime episode and Learn Notes.",
 		"8. Run the agent finish gate with `hyper complete`; if it fails, fix the same packet using `review.md` before continuing.",
 		"9. In auto mode, read `.hyper/next-packet.md`, obey its Guard and Progress Guard, and continue only through the planned next command: `run` continues, `advance` requires Stage Advancement Review authorization or user acceptance, `complete-current` fixes review.md/evidence.md/next.md in the same packet, and `stop` reports the stop reason and waits.",
@@ -154,7 +155,7 @@ func hyperRouterSkillGuide() string {
 	return strings.Join([]string{
 		"---",
 		"name: hyper",
-		"description: Thin Codex Desktop router for Hyper Run. Use when the user says $hyper, $hyper run, $hyper init, $hyper status, $hyper status --short, $hyper migrate, $hyper advance, $hyper doctor, $hyper resume, hyper run, or asks Hyper Run to continue the current project.",
+		"description: Thin Codex Desktop router for Hyper Run. Use when the user says $hyper, $hyper run, $hyper init, $hyper status, $hyper status --short, $hyper verify, $hyper migrate, $hyper advance, $hyper doctor, $hyper resume, hyper run, or asks Hyper Run to continue the current project.",
 		"---",
 		"",
 		"# Hyper Router",
@@ -187,6 +188,7 @@ func hyperRouterSkillGuide() string {
 		"- `$hyper run [focus]`: run `hyper run [focus]`; if `plan.md` has `Target Stage`, plain `hyper run` uses it as the guarded auto target until that target stage's readiness proof is complete. Read the generated runtime packet, implement it in the current Codex session, update `evidence.md`, and write `next.md`.",
 		"- `$hyper run --auto --until [focus]`: run `hyper run --auto --until [focus]` as an explicit target override, then continue packet by packet using `.hyper/next-packet.md` until the target stage proof is complete or a guard stops progress.",
 		"- `$hyper complete`: advanced/recovery command. Run it only as the agent finish gate after evidence and next notes are written so project readiness is refreshed.",
+		"- `$hyper verify -- `: run repeatable validation through the CLI so exit code, log hashes, commit SHA, worktree status hash, goal ID, and run ID are recorded under `.hyper/verified-evidence/`.",
 		"- `$hyper status`: run `hyper status` and use the dashboard to decide whether the agent should finish the packet, repair, advance, migrate, or start the next packet.",
 		"- `$hyper status --short`: run `hyper status --short` when the user wants only the current stage, gate, proof, and next action.",
 		"- `$hyper migrate`: run `hyper migrate` after CLI updates or when growth state/candidates look stale; then check `hyper status --short`.",
@@ -199,8 +201,8 @@ func hyperRouterSkillGuide() string {
 		"1. Run a CLI command only when a new or resumed runtime packet is needed; if `plan.md` has `Target Stage`, plain `hyper run` uses it as the guarded auto target until that target stage's readiness proof is complete.",
 		"2. Read the generated runtime packet in `goal.md` and the checklist in `tasks.md` before editing project files.",
 		"3. Keep implementation scoped to the current runtime episode.",
-		"4. Run the safest available validation, or record why validation is blocked.",
-		"5. Update the active runtime packet's `evidence.md` with changed files, validation output, readiness evidence, active capability evidence, pressure signals, decisions, reusable patterns, and blockers.",
+		"4. Run the safest available validation, or record why validation is blocked. Prefer `hyper verify -- ` when a real command can prove the behavior.",
+		"5. Update the active runtime packet's `evidence.md` with changed files, validation output or Verified Evidence IDs, readiness evidence, active capability evidence, pressure signals, decisions, reusable patterns, and blockers.",
 		"6. Write the active runtime packet's `next.md` with the next recommended runtime episode and Learn Notes.",
 		"7. Run the agent finish gate with `hyper complete`; if it fails, fix the same packet using `review.md` before continuing.",
 		"8. In auto mode, read `.hyper/next-packet.md`, obey its Guard and Progress Guard, and continue only through the planned next command: `run` continues, `advance` requires Stage Advancement Review authorization or user acceptance, `complete-current` fixes review.md/evidence.md/next.md in the same packet, and `stop` reports the stop reason and waits.",
@@ -228,8 +230,8 @@ func hyperRunSkillGuide() string {
 		"- Run `hyper run --auto --until [focus]` when the user wants to override the plan target.",
 		"- Read the generated runtime packet at `.hyper/goals//goal.md` and `tasks.md` before implementation.",
 		"- Implement the work directly in the current Codex session.",
-		"- Run the safest available validation or record why validation is blocked.",
-		"- Update `evidence.md` with validation output, readiness evidence, active capability evidence, pressure signals, changed files, decisions, reusable patterns, and blockers.",
+		"- Run the safest available validation or record why validation is blocked; prefer `hyper verify -- ` for repeatable command proof.",
+		"- Update `evidence.md` with validation output or Verified Evidence IDs, readiness evidence, active capability evidence, pressure signals, changed files, decisions, reusable patterns, and blockers.",
 		"- Write `next.md` with the next recommended runtime episode and Learn Notes.",
 		"- Run `hyper complete` internally as the agent finish gate after evidence and next notes are written; if it fails, fix the same packet using `review.md`.",
 		"- In auto mode, read `.hyper/next-packet.md`, obey its Guard and Progress Guard, and continue through the planned command until a guard stops progress.",
@@ -282,8 +284,8 @@ func codexDesktopGuide() string {
 		"2. Read the runtime packet path from stdout, or read `.hyper/state.json` and use `current_goal_path`.",
 		"3. Read the generated `goal.md` runtime packet and `tasks.md` checklist.",
 		"4. Work checkpoint by checkpoint toward the current episode.",
-		"5. Run the smallest safe validation available.",
-		"6. Update `evidence.md` with validation output, readiness evidence, active capability evidence, pressure signals, changed files, decisions, reusable patterns, and blockers.",
+		"5. Run the smallest safe validation available; prefer `hyper verify -- ` for repeatable command proof.",
+		"6. Update `evidence.md` with validation output or Verified Evidence IDs, readiness evidence, active capability evidence, pressure signals, changed files, decisions, reusable patterns, and blockers.",
 		"7. Update `next.md` with the next recommended runtime episode and Learn Notes.",
 		"8. Run the agent finish gate with `hyper complete`; if it fails, fix the same packet using `review.md` before continuing.",
 		"9. In auto mode, read `.hyper/next-packet.md`, obey its Guard and Progress Guard, and continue only through the planned next command: `run` continues, `advance` requires Stage Advancement Review authorization or user acceptance, `complete-current` fixes review.md/evidence.md/next.md in the same packet, and `stop` reports the stop reason and waits.",
@@ -311,7 +313,7 @@ func codexDesktopGuide() string {
 }
 
 func hyperRunCommandGuide() string {
-	return "# $hyper run\n\nMeaning: create the next Hyper Run runtime packet, execute the current episode, record evidence, capture Learn signals, and let repeated pressure become future structure. If `plan.md` has `Target Stage`, plain `hyper run` uses it as the guarded auto target until that target stage's readiness proof is complete and remains the continuation command in `.hyper/next-packet.md` until a guard stops progress.\n\nGrowth order: " + growthLoopDefinition + "\n\nPrinciples: " + growthPrinciplesLine() + "\n\nRequired flow:\n\n1. Execute `hyper run [focus]`.\n2. Open the generated runtime packet under `.hyper/goals//`.\n3. Implement the smallest coherent step that satisfies the current episode in `goal.md`.\n4. Mark real progress in `evidence.md`, including validation, readiness evidence, active capability evidence, pressure signals, decisions, reusable patterns, and blockers.\n5. Write the next recommended runtime episode and Learn Notes in `next.md`.\n6. Run `hyper complete` internally as the agent finish gate to close the packet and refresh Learn, Growth, and Readiness.\n7. Read `.hyper/next-packet.md`, obey its Guard and Progress Guard, and continue only through the planned command: if it says `run`, continue only through that command; if it says `advance`, continue only when the active auto target authorizes it after the Stage Advancement Review, otherwise wait for user review; if it says `complete-current`, fix review.md/evidence.md/next.md in the same packet and rerun the agent finish gate; if it says `stop`, report the stop reason from `.hyper/next-packet.md` and wait.\n\nCompletion requires implementation evidence, Learn signals where applicable, a next recommendation, and a completed Hyper packet.\n"
+	return "# $hyper run\n\nMeaning: create the next Hyper Run runtime packet, execute the current episode, record evidence, capture Learn signals, and let repeated pressure become future structure. If `plan.md` has `Target Stage`, plain `hyper run` uses it as the guarded auto target until that target stage's readiness proof is complete and remains the continuation command in `.hyper/next-packet.md` until a guard stops progress.\n\nGrowth order: " + growthLoopDefinition + "\n\nPrinciples: " + growthPrinciplesLine() + "\n\nRequired flow:\n\n1. Execute `hyper run [focus]`.\n2. Open the generated runtime packet under `.hyper/goals//`.\n3. Implement the smallest coherent step that satisfies the current episode in `goal.md`.\n4. Prefer `hyper verify -- ` for repeatable command proof, then mark real progress in `evidence.md`, including validation or Verified Evidence IDs, readiness evidence, active capability evidence, pressure signals, decisions, reusable patterns, and blockers.\n5. Write the next recommended runtime episode and Learn Notes in `next.md`.\n6. Run `hyper complete` internally as the agent finish gate to close the packet and refresh Learn, Growth, and Readiness.\n7. Read `.hyper/next-packet.md`, obey its Guard and Progress Guard, and continue only through the planned command: if it says `run`, continue only through that command; if it says `advance`, continue only when the active auto target authorizes it after the Stage Advancement Review, otherwise wait for user review; if it says `complete-current`, fix review.md/evidence.md/next.md in the same packet and rerun the agent finish gate; if it says `stop`, report the stop reason from `.hyper/next-packet.md` and wait.\n\nCompletion requires implementation evidence, Learn signals where applicable, a next recommendation, and a completed Hyper packet.\n"
 }
 
 func ensureMemoryFiles(root string) *hyperError {
diff --git a/internal/app/main_test.go b/internal/app/main_test.go
index 9a3dde8..f2618ad 100644
--- a/internal/app/main_test.go
+++ b/internal/app/main_test.go
@@ -28,10 +28,12 @@ func TestInitCreatesProjectStateAndRules(t *testing.T) {
 	assertContains(t, readFile(t, filepath.Join(root, ".hyper", "logs", "project.jsonl")), "project_initialized")
 	assertContains(t, readFile(t, filepath.Join(root, "AGENTS.md")), "$hyper run")
 	assertContains(t, readFile(t, filepath.Join(root, "AGENTS.md")), "$hyper status --short")
+	assertContains(t, readFile(t, filepath.Join(root, "AGENTS.md")), "$hyper verify")
 	assertContains(t, readFile(t, filepath.Join(root, "AGENTS.md")), "$hyper migrate")
 	assertContains(t, readFile(t, filepath.Join(root, ".agents", "skills", "hyper", "SKILL.md")), "name: hyper")
 	assertContains(t, readFile(t, filepath.Join(root, 
".agents", "skills", "hyper", "SKILL.md")), "compatibility shim")
 	assertContains(t, readFile(t, filepath.Join(root, ".agents", "skills", "hyper", "SKILL.md")), "$hyper status --short")
+	assertContains(t, readFile(t, filepath.Join(root, ".agents", "skills", "hyper", "SKILL.md")), "$hyper verify")
 	assertContains(t, readFile(t, filepath.Join(root, ".agents", "skills", "hyper", "SKILL.md")), "$hyper migrate")
 	assertContains(t, readFile(t, filepath.Join(root, ".agents", "skills", "hyper-run", "SKILL.md")), "name: hyper-run")
 	assertContains(t, readFile(t, filepath.Join(root, ".agents", "skills", "hyper-run", "SKILL.md")), "hyper run")
@@ -41,6 +43,9 @@ func TestInitCreatesProjectStateAndRules(t *testing.T) {
 	assertContains(t, readFile(t, filepath.Join(root, ".hyper", "growth", "state.json")), `"pressure_ledger"`)
 	assertContains(t, readFile(t, filepath.Join(root, ".hyper", "growth", "state.json")), `"No structure before pressure."`)
 	assertContains(t, readFile(t, filepath.Join(root, ".hyper", "readiness", "state.json")), `"version": 1`)
+	if !exists(filepath.Join(root, ".hyper", "verified-evidence")) {
+		t.Fatal("expected verified evidence directory to be created")
+	}
 }
 
 func TestInitRejectsInvalidPlanCurrentStageBeforeStateWrite(t *testing.T) {
@@ -169,6 +174,7 @@ func TestSubcommandHelpDoesNotError(t *testing.T) {
 	}{
 		{args("run", "--help"), "Usage:\n hyper run [--auto] [--until stage] [focus]"},
 		{args("status", "--help"), "Usage:\n hyper status\n hyper status --short"},
+		{args("verify", "--help"), "Usage:\n hyper verify [--axis axis] [--name name] -- [args...]"},
 		{args("update", "--help"), "Usage:\n hyper update [source]"},
 	} {
 		out, err := runCLI(tc.args, testRoot(t.TempDir()), fakeUpdater{})
@@ -179,6 +185,112 @@ func TestSubcommandHelpDoesNotError(t *testing.T) {
 	}
 }
 
+func TestVerifyCommandRecordsExecutionMetadata(t *testing.T) {
+	root := t.TempDir()
+	mustInitWithPlan(t, root, "Verified Evidence CLI", "Record real validation commands")
+	mustRun(t, root, "run", "Create a verified evidence record")
+
+	out, err := runCLI(args("verify", "--axis", "validation_coverage", "--name", "go version smoke", "--", "go", "version"), testRoot(root), fakeUpdater{})
+	if err != nil {
+		t.Fatalf("verify failed: %v", err)
+	}
+
+	assertContains(t, out.Stdout, "Verified evidence: VE-0001")
+	assertContains(t, out.Stdout, "Status: passed")
+	assertContains(t, out.Stdout, "Exit code: 0")
+	assertContains(t, out.Stdout, "Command: go version")
+	assertContains(t, out.Stdout, "Goal: GOAL-0001")
+	assertContains(t, out.Stdout, "Record: .hyper/verified-evidence/VE-0001.json")
+	recordBody := readFile(t, filepath.Join(root, hyperDir, "verified-evidence", "VE-0001.json"))
+	assertContains(t, recordBody, `"id": "VE-0001"`)
+	assertContains(t, recordBody, `"type": "verified_command"`)
+	assertContains(t, recordBody, `"status": "passed"`)
+	assertContains(t, recordBody, `"axis": "validation_coverage"`)
+	assertContains(t, recordBody, `"goal_id": "GOAL-0001"`)
+	assertContains(t, recordBody, `"run_id": "RUN-0001"`)
+	assertContains(t, recordBody, `"exit_code": 0`)
+	assertContains(t, recordBody, `"stdout_sha256"`)
+	assertContains(t, recordBody, `"stderr_sha256"`)
+	assertContains(t, recordBody, `"commit_sha"`)
+	assertContains(t, recordBody, `"worktree_status_sha256"`)
+	assertContains(t, recordBody, `"command": [`)
+	assertContains(t, readFile(t, filepath.Join(root, hyperDir, "verified-evidence", "VE-0001.stdout.txt")), "go version")
+	assertContains(t, readFile(t, filepath.Join(root, hyperDir, "logs", "verified-evidence.jsonl")), `"type":"verified_command"`)
+	assertContains(t, readFile(t, filepath.Join(root, hyperDir, "logs", "RUN-0001.jsonl")), `"type":"verified_command"`)
+}
+
+func TestFinishGateAcceptsVerifiedCommandEvidence(t *testing.T) {
+	root := t.TempDir()
+	mustInitWithPlan(t, root, "Verified Finish Gate", "Close packets with machine-recorded command proof")
+	mustRun(t, root, "run", "Record core CLI proof")
+	if _, err := runCLI(args("verify", "--axis", "core_ux", "--name", "primary CLI smoke", "--", "go", "version"), testRoot(root), fakeUpdater{}); err != nil {
+		t.Fatalf("verify failed: %v", err)
+	}
+	goalDir := filepath.Join(root, hyperDir, "goals", "GOAL-0001")
+	writeFile(t, filepath.Join(goalDir, "evidence.md"), strings.Join([]string{
+		"# GOAL-0001 Evidence",
+		"",
+		"## Validation",
+		"",
+		"Pending.",
+		"",
+		"## Readiness Evidence",
+		"",
+		"Core UX: Pending.",
+		"",
+		"## Blocker",
+		"",
+		"None blocking.",
+	}, "\n"))
+	writeFile(t, filepath.Join(goalDir, "next.md"), "# GOAL-0001 Next\n\n## Recommended Next Goal\n\nRun the next focused packet after verified evidence has been accepted.\n")
+
+	out, err := runCLI(args("complete"), testRoot(root), fakeUpdater{})
+	if err != nil {
+		t.Fatalf("complete should accept verified evidence: %v", err)
+	}
+	assertContains(t, out.Stdout, "Finish gate: passed")
+	assertContains(t, readFile(t, filepath.Join(goalDir, "review.md")), "Status: passed")
+}
+
+func TestStatusShowsVerifiedEvidenceForCurrentPacket(t *testing.T) {
+	root := t.TempDir()
+	mustInitWithPlan(t, root, "Verified Status", "Show verified evidence in status")
+	mustRun(t, root, "run", "Create a packet with verified evidence")
+	writeVerifiedEvidenceFixture(t, root, "VE-0001", "GOAL-0001", "passed", "go test ./...", 0)
+	writeVerifiedEvidenceFixture(t, root, "VE-0002", "GOAL-0001", "failed", "git diff --check", 2)
+
+	short, err := runCLI(args("status", "--short"), testRoot(root), fakeUpdater{})
+	if err != nil {
+		t.Fatalf("status --short failed: %v", err)
+	}
+	assertContains(t, short.Stdout, "Verified Evidence: GOAL-0001 2 record(s); passed 1, failed 1; newest VE-0002 failed `git diff --check` exit 2")
+
+	full, err := runCLI(args("status"), testRoot(root), fakeUpdater{})
+	if err != nil {
+		t.Fatalf("status failed: %v", err)
+	}
+	assertContains(t, full.Stdout, "Verified Evidence:")
+	assertContains(t, full.Stdout, " Current packet: GOAL-0001")
+	assertContains(t, full.Stdout, " Records: 2 total, 1 passed, 1 failed")
+	assertContains(t, full.Stdout, " Newest: VE-0002 failed `git diff --check` exit 2")
+	assertContains(t, full.Stdout, " Latest failure: VE-0002 failed `git diff --check` exit 2")
+}
+
+func TestDoctorWarnsOnFailedVerifiedEvidence(t *testing.T) {
+	root := t.TempDir()
+	mustInitWithPlan(t, root, "Verified Doctor", "Show verified evidence in doctor")
+	mustRun(t, root, "run", "Create a packet with failed verified evidence")
+	writeVerifiedEvidenceFixture(t, root, "VE-0001", "GOAL-0001", "passed", "go test ./...", 0)
+	writeVerifiedEvidenceFixture(t, root, "VE-0002", "GOAL-0001", "failed", "git diff --check", 2)
+
+	doctor, err := runCLI(args("doctor"), testRoot(root), fakeUpdater{})
+	if err != nil {
+		t.Fatalf("doctor failed: %v", err)
+	}
+	assertContains(t, doctor.Stdout, "[WARN] Verified Evidence: GOAL-0001 records=2 passed=1 failed=1; newest VE-0002 failed `git diff --check` exit 2")
+	assertContains(t, doctor.Stdout, "Inspect the failed Verified Evidence record, fix the command or implementation, then rerun `hyper verify -- `.")
+}
+
 func TestInitRejectsObjectiveArgument(t *testing.T) {
 	root := t.TempDir()
 	_, err := runCLI(args("init", "Build a tiny CRM MVP"), testRoot(root), fakeUpdater{})
@@ -335,6 +447,8 @@ func TestRunCreatesGoalAfterInit(t *testing.T) {
 	assertContains(t, evidence, "## Product Satisfaction Evidence")
 	assertContains(t, evidence, "- Target-user fit: Pending.")
 	assertContains(t, evidence, "- Verdict: Pending.
Use pass or fail.")
+	assertContains(t, evidence, "## Verified Evidence")
+	assertContains(t, evidence, "Prefer `hyper verify -- `")
 	assertContains(t, evidence, "## Readiness Evidence")
 	assertContains(t, evidence, "Core UX: Pending.")
 	tasks := readFile(t, filepath.Join(root, ".hyper", "goals", "GOAL-0001", "tasks.md"))
@@ -1615,6 +1729,24 @@ func TestCompleteAcceptsValidationOutputForActiveValidator(t *testing.T) {
 	assertContains(t, out.Stdout, "Finish gate: passed")
 }
 
+func TestCompleteAcceptsVerifiedEvidenceForActiveValidator(t *testing.T) {
+	root := t.TempDir()
+	mustInitWithPlan(t, root, "Tiny CLI", "Build a tiny CLI MVP")
+	mustRun(t, root, "run")
+	writeFile(t, filepath.Join(root, ".hyper", "capabilities", "active", "validator", "validator-go-version.md"), "# validator-go-version\n\nStatus: active\nKind: validator\nSignal: Run `go version` before completing packets.\n")
+	if _, err := runCLI(args("verify", "--axis", "core_ux", "--name", "go version smoke", "--", "go", "version"), testRoot(root), fakeUpdater{}); err != nil {
+		t.Fatalf("verify failed: %v", err)
+	}
+	writeFile(t, filepath.Join(root, ".hyper", "goals", "GOAL-0001", "evidence.md"), "# GOAL-0001 Evidence\n\n## Validation\n\nPending.\n\n## Readiness Evidence\n\nCore UX: Pending.\n\n## Active Capability Evidence\n\nvalidator-go-version: Pending. Required behavior: Run `go version` before completing packets.\n\n## Blocker\n\nNone blocking.\n")
+	writeFile(t, filepath.Join(root, ".hyper", "goals", "GOAL-0001", "next.md"), "# GOAL-0001 Next\n\n## Recommended Next Goal\n\nReview the next focused quality packet.\n")
+
+	out, err := runCLI(args("complete"), testRoot(root), fakeUpdater{})
+	if err != nil {
+		t.Fatalf("verified evidence should satisfy active validator proof: %v", err)
+	}
+	assertContains(t, out.Stdout, "Finish gate: passed")
+}
+
 func TestCompleteRejectsFailedValidationOutputForActiveValidator(t *testing.T) {
 	root := t.TempDir()
 	mustInitWithPlan(t, root, "Tiny CLI", "Build a tiny CLI MVP")
@@ -4600,19 +4732,21 @@ func TestNextPacketProgressGuardExplainsAutoActions(t *testing.T) {
 }
 
 func TestOpenFailureFinishGateAcceptsClosureEvidence(t *testing.T) {
+	root := t.TempDir()
 	evidence := "# GOAL-0003 Evidence\n\n## Validation\n\n`go test ./...` passed and covers file write failure handling.\n\n## Readiness Evidence\n\nError handling: File write failures are returned from `Store.Add`, failed writes are rolled back from memory, and API save failures return HTTP 500.\n\n## Blocker\n\nNone blocking.\n"
 	readiness := readinessState{NextPressure: readinessPressure{Axis: "open_failure", AxisName: "Open failure"}}
-	if finding := readinessFinishGateFinding(projectState{CurrentGoalID: "GOAL-0003"}, evidence, readiness); finding != "" {
+	if finding := readinessFinishGateFinding(root, projectState{CurrentGoalID: "GOAL-0003"}, evidence, readiness); finding != "" {
 		t.Fatalf("expected open failure closure evidence to pass, got %q", finding)
 	}
 
 	weak := "# GOAL-0003 Evidence\n\n## Validation\n\n`go test ./...` passed.\n\n## Readiness Evidence\n\nValidation coverage: tests passed.\n\n## Blocker\n\nNone blocking.\n"
-	if finding := readinessFinishGateFinding(projectState{CurrentGoalID: "GOAL-0003"}, weak, readiness); finding == "" {
+	if finding := readinessFinishGateFinding(root, projectState{CurrentGoalID: "GOAL-0003"}, weak, readiness); finding == "" {
 		t.Fatal("expected weak open failure closure evidence to fail")
 	}
 }
 
 func TestReadinessFinishGateFindingShowsOtherGateGaps(t *testing.T) {
+	root := t.TempDir()
 	evidence := "# GOAL-0003 Evidence\n\n## Validation\n\n`go test ./...` passed.\n\n## Readiness Evidence\n\nValidation coverage: `go test ./...` passed and is repeatable.\n\n## Blocker\n\nNone blocking.\n"
 	readiness := readinessState{
 		NextPressure: readinessPressure{Axis: "security_baseline", AxisName: "Security baseline"},
@@ -4623,7 +4757,7 @@ func TestReadinessFinishGateFindingShowsOtherGateGaps(t *testing.T) {
 		}},
 	}
 
-	finding := readinessFinishGateFinding(projectState{CurrentGoalID: "GOAL-0003"}, evidence, readiness)
+	finding := readinessFinishGateFinding(root, projectState{CurrentGoalID: "GOAL-0003"}, evidence, readiness)
 	assertContains(t, finding, "Add covered readiness evidence for `Security baseline`")
 	assertContains(t, finding, "Other current gate gaps:")
 	assertContains(t, finding, "Deployment readiness: The project is not yet proven runnable outside the local development path.")
@@ -6663,6 +6797,37 @@ func writeFile(t *testing.T, path, body string) {
 	}
 }
 
+func writeVerifiedEvidenceFixture(t *testing.T, root, id, goalID, status, command string, exitCode int) {
+	t.Helper()
+	dir := filepath.Join(root, hyperDir, "verified-evidence")
+	if err := os.MkdirAll(dir, 0755); err != nil {
+		t.Fatal(err)
+	}
+	record := verifiedEvidenceRecord{
+		ID: id,
+		Type: verifiedEvidenceEventType,
+		Status: status,
+		Axis: "validation_coverage",
+		Name: command,
+		Command: strings.Fields(command),
+		CommandLine: command,
+		GoalID: goalID,
+		RunID: strings.Replace(goalID, "GOAL", "RUN", 1),
+		ExitCode: exitCode,
+		RecordPath: displayRelPath(hyperDir, "verified-evidence", id+".json"),
+		StdoutPath: displayRelPath(hyperDir, "verified-evidence", id+".stdout.txt"),
+		StderrPath: displayRelPath(hyperDir, "verified-evidence", id+".stderr.txt"),
+		RecordedBy: "test",
+	}
+	body, err := json.MarshalIndent(record, "", " ")
+	if err != nil {
+		t.Fatal(err)
+	}
+	writeFile(t, filepath.Join(dir, id+".json"), string(body)+"\n")
+	writeFile(t, filepath.Join(dir, id+".stdout.txt"), "")
+	writeFile(t, filepath.Join(dir, id+".stderr.txt"), "")
+}
+
 func insertTestMemory(t *testing.T, db *sql.DB, kind, text string) {
 	t.Helper()
 	confidence := 0.8
diff --git a/internal/app/plan.go b/internal/app/plan.go
index 2e465a4..feea0f5 100644
--- a/internal/app/plan.go
+++ b/internal/app/plan.go
@@ -1325,7 +1325,7 @@ func buildTasksDoc(goalID, buildStyle, stage string, readiness readinessState, g
 }
 
 func buildEvidenceDoc(goalID, stage string, readiness readinessState, growth growthState) string {
-	return fmt.Sprintf("# %s Evidence\n\n## Decision Hierarchy Evidence\n\n- Safety boundary: Pending.\n- Product intent: Pending.\n- Evidence gap: Pending.\n- Smallest step: Pending.\n- Validation proof: Pending.\n- Learning signal: Pending.\n\n## Autonomous Work Evidence\n\n- Research questions: Pending.\n- Research evidence: Pending.\n- Chosen implementation step: Pending.\n- Validation plan: Pending.\n- Harness pressure: Pending.\n- Progress guard: Pending.\n\n## Autonomous Safety Evidence\n\n- Classification: Pending. Use self-directed, approval-required, or blocked.\n- Boundary: Pending.\n- Approval needed: Pending.\n- Fallback or stop condition: Pending.\n\n## Capability Expansion Evidence\n\n- Reused validation: Pending.\n- Pressure recorded: Pending.\n- Candidate status change: Pending.\n- Harness decision: Pending.\n- Active capability requirement: Pending.\n\n## Research Evidence Ledger\n\n- Question: Pending.\n- Source: Pending.\n- Finding: Pending.\n- Changed: Pending. State chosen step, validation plan, stop condition, safety boundary, readiness evidence, or capability pressure.\n- Stored as Learn signal: Pending. Use yes/no and explain only when durable.\n\n## Loop Progress Evidence\n\n- Progress signal: Pending.
Use code, validation evidence, readiness evidence, active capability signal, clearer blocker, or changed next step.\n- Repeated loop risk: Pending.\n- Continue decision: Pending. Use continue, complete-current, stop, or blocked.\n- Next-step change: Pending.\n\n## Product Satisfaction Evidence\n\n- Target-user fit: Pending.\n- Core loop quality: Pending.\n- Clarity and friction: Pending.\n- No drift: Pending.\n- Validation match: Pending.\n- Verdict: Pending. Use pass or fail.\n\n## Validation\n\nPending.\n\n## Readiness Evidence\n\n%s\n\n## Surface Proof Evidence\n\n- Target surface: Pending.\n- Primary user action: Pending.\n- States checked: Pending.\n- Viewports: Pending.\n- Evidence: Pending.\n- Surface risks or gaps: Pending.\n\n%s%s\n## Active Capability Evidence\n\n%s\n\n## Pressure Signals\n\nPending.\n\n## Changed Files\n\nPending.\n\n## Decisions\n\nPending.\n\n## Reusable Patterns\n\nPending.\n\n## Learn Quality Gate\n\n- Keep as memory only if it should change future work boundary, validation, stop conditions, readiness, or capability candidates.\n- Do not record one-off progress, file lists, generic summaries, or \"none\" statements as Learn signals.\n\n## Blocker\n\nPending.\n\n## Notes\n\nPending.\n", goalID, readinessEvidenceTemplate(readiness), referenceBenchmarkEvidenceTemplate(stage, readiness), selfReviewEvidenceTemplate(stage, readiness), activeCapabilityEvidenceTemplate(growth))
+	return fmt.Sprintf("# %s Evidence\n\n## Decision Hierarchy Evidence\n\n- Safety boundary: Pending.\n- Product intent: Pending.\n- Evidence gap: Pending.\n- Smallest step: Pending.\n- Validation proof: Pending.\n- Learning signal: Pending.\n\n## Autonomous Work Evidence\n\n- Research questions: Pending.\n- Research evidence: Pending.\n- Chosen implementation step: Pending.\n- Validation plan: Pending.\n- Harness pressure: Pending.\n- Progress guard: Pending.\n\n## Autonomous Safety Evidence\n\n- Classification: Pending. Use self-directed, approval-required, or blocked.\n- Boundary: Pending.\n- Approval needed: Pending.\n- Fallback or stop condition: Pending.\n\n## Capability Expansion Evidence\n\n- Reused validation: Pending.\n- Pressure recorded: Pending.\n- Candidate status change: Pending.\n- Harness decision: Pending.\n- Active capability requirement: Pending.\n\n## Research Evidence Ledger\n\n- Question: Pending.\n- Source: Pending.\n- Finding: Pending.\n- Changed: Pending. State chosen step, validation plan, stop condition, safety boundary, readiness evidence, or capability pressure.\n- Stored as Learn signal: Pending. Use yes/no and explain only when durable.\n\n## Loop Progress Evidence\n\n- Progress signal: Pending. Use code, validation evidence, readiness evidence, active capability signal, clearer blocker, or changed next step.\n- Repeated loop risk: Pending.\n- Continue decision: Pending. Use continue, complete-current, stop, or blocked.\n- Next-step change: Pending.\n\n## Product Satisfaction Evidence\n\n- Target-user fit: Pending.\n- Core loop quality: Pending.\n- Clarity and friction: Pending.\n- No drift: Pending.\n- Validation match: Pending.\n- Verdict: Pending. Use pass or fail.\n\n## Validation\n\nPending.\n\n## Verified Evidence\n\nPending. Prefer `hyper verify -- ` for repeatable command validation so exit code, log hashes, commit SHA, worktree status hash, and command metadata are recorded by the runtime.\n\n## Readiness Evidence\n\n%s\n\n## Surface Proof Evidence\n\n- Target surface: Pending.\n- Primary user action: Pending.\n- States checked: Pending.\n- Viewports: Pending.\n- Evidence: Pending.\n- Surface risks or gaps: Pending.\n\n%s%s\n## Active Capability Evidence\n\n%s\n\n## Pressure Signals\n\nPending.\n\n## Changed Files\n\nPending.\n\n## Decisions\n\nPending.\n\n## Reusable Patterns\n\nPending.\n\n## Learn Quality Gate\n\n- Keep as memory only if it should change future work boundary, validation, stop conditions, readiness, or capability candidates.\n- Do not record one-off progress, file lists, generic summaries, or \"none\" statements as Learn signals.\n\n## Blocker\n\nPending.\n\n## Notes\n\nPending.\n", goalID, readinessEvidenceTemplate(readiness), referenceBenchmarkEvidenceTemplate(stage, readiness), selfReviewEvidenceTemplate(stage, readiness), activeCapabilityEvidenceTemplate(growth))
 }
 
 func activeCapabilityEvidenceTemplate(growth growthState) string {
diff --git a/internal/app/readiness.go b/internal/app/readiness.go
index 508e90c..890a7ff 100644
--- a/internal/app/readiness.go
+++ b/internal/app/readiness.go
@@ -239,6 +239,7 @@ func loadReadinessEvidence(root string, defs []readinessDimensionDef) ([]readine
 		}
 		records = append(records, inferReadinessEvidenceFromReferenceBenchmark(goalID, usefulSectionLines(body, "Reference Benchmark Evidence"))...)
 	}
+	records = append(records, verifiedReadinessEvidenceRecords(root, "", defs)...)
 	return records, nil
 }
 
diff --git a/internal/app/verified_evidence.go b/internal/app/verified_evidence.go
new file mode 100644
index 0000000..d9752bd
--- /dev/null
+++ b/internal/app/verified_evidence.go
@@ -0,0 +1,534 @@
+package app
+
+import (
+	"bytes"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"os"
+	"os/exec"
+	"path/filepath"
+	"sort"
+	"strconv"
+	"strings"
+	"time"
+)
+
+const verifiedEvidenceEventType = "verified_command"
+
+type verifiedEvidenceRecord struct {
+	ID string `json:"id"`
+	Type string `json:"type"`
+	Status string `json:"status"`
+	Axis string `json:"axis,omitempty"`
+	Name string `json:"name,omitempty"`
+	Command []string `json:"command"`
+	CommandLine string `json:"command_line"`
+	CWD string `json:"cwd"`
+	RunID string `json:"run_id,omitempty"`
+	GoalID string `json:"goal_id,omitempty"`
+	StartedAt string `json:"started_at"`
+	FinishedAt string `json:"finished_at"`
+	DurationMillis int64 `json:"duration_millis"`
+	ExitCode int `json:"exit_code"`
+	CommitSHA string `json:"commit_sha"`
+	WorktreeStatusSHA256 string `json:"worktree_status_sha256"`
+	StdoutSHA256 string `json:"stdout_sha256"`
+	StderrSHA256 string `json:"stderr_sha256"`
+	StdoutBytes int `json:"stdout_bytes"`
+	StderrBytes int `json:"stderr_bytes"`
+	StdoutPath string `json:"stdout_path,omitempty"`
+	StderrPath string `json:"stderr_path,omitempty"`
+	RecordPath string `json:"record_path"`
+	RecordedBy string `json:"recorded_by"`
+	ReadinessEvidenceText string `json:"readiness_evidence_text"`
+}
+
+type verifyOptions struct {
+	Axis string
+	Name string
+	Command []string
+}
+
+type verifiedEvidenceGoalSummary struct {
+	GoalID string
+	Total int
+	Passed int
+	Failed int
+	Newest verifiedEvidenceRecord
+	LatestFailed verifiedEvidenceRecord
+}
+
+func verifyHyper(fsys fsRoot, args []string) (commandOutput, *hyperError) {
+	root := fsys.root()
+	if err := ensureProjectLayout(root); err != nil {
+		return commandOutput{}, err
+	}
+	opts, err := parseVerifyOptions(args)
+	if err != nil {
+		return commandOutput{}, err
+	}
+	state := readStateIfExists(root)
+	record, stdoutText, stderrText, runErr := runVerifiedCommand(root, state, opts)
+	if recordErr := persistVerifiedEvidence(root, state, record, stdoutText, stderrText); recordErr != nil {
+		return commandOutput{}, recordErr
+	}
+	out := renderVerifiedEvidenceOutput(record)
+	if runErr != nil {
+		return stdout(out), newError(fmt.Sprintf("Verified command failed with exit code %d. Record: %s", record.ExitCode, record.RecordPath), recordExitCode(record.ExitCode))
+	}
+	return stdout(out), nil
+}
+
+func parseVerifyOptions(args []string) (verifyOptions, *hyperError) {
+	opts := verifyOptions{Axis: "validation_coverage"}
+	commandIndex := -1
+	for i := 0; i < len(args); i++ {
+		arg := strings.TrimSpace(args[i])
+		if arg == "--" {
+			commandIndex = i + 1
+			break
+		}
+		switch arg {
+		case "--axis":
+			if i+1 >= len(args) {
+				return opts, newError("hyper verify requires a value after --axis.", 2)
+			}
+			opts.Axis = strings.TrimSpace(args[i+1])
+			i++
+		case "--name":
+			if i+1 >= len(args) {
+				return opts, newError("hyper verify requires a value after --name.", 2)
+			}
+			opts.Name = strings.TrimSpace(args[i+1])
+			i++
+		default:
+			return opts, newError("hyper verify options must appear before `--`.\n\n"+commandUsage("verify"), 2)
+		}
+	}
+	if commandIndex == -1 || commandIndex >= len(args) {
+		return opts, newError("hyper verify requires `-- [args...]`.", 2)
+	}
+	opts.Command = append([]string{}, args[commandIndex:]...)
+	if strings.TrimSpace(opts.Command[0]) == "" {
+		return opts, newError("hyper verify requires a non-empty command after `--`.", 2)
+	}
+	opts.Axis = normalizeVerifyAxis(opts.Axis)
+	if opts.Axis == "" {
+		return opts, newError("hyper verify --axis must match a readiness axis such as validation_coverage, core_ux, sustained_quality, operations_docs, or maintainability.", 2)
+	}
+	if strings.TrimSpace(opts.Name) == "" {
+		opts.Name = strings.Join(opts.Command, " ")
+	}
+	return opts, nil
+}
+
+func normalizeVerifyAxis(axis string) string {
+	axis = strings.TrimSpace(axis)
+	if axis == "" {
+		return "validation_coverage"
+	}
+	if match := readinessAxisForLabel(axis, readinessDimensionDefs()); match != "" {
+		return match
+	}
+	compact := compactReadinessLabel(strings.ReplaceAll(axis, "_", " "))
+	for _, def := range readinessDimensionDefs() {
+		if compact == compactReadinessLabel(def.ID) || compact == compactReadinessLabel(def.Name) {
+			return def.ID
+		}
+	}
+	return ""
+}
+
+func runVerifiedCommand(root string, state projectState, opts verifyOptions) (verifiedEvidenceRecord, string, string, error) {
+	start := time.Now()
+	startedAt := start.UTC().Format("2006-01-02T15:04:05.000Z")
+	var stdoutBuf bytes.Buffer
+	var stderrBuf bytes.Buffer
+	cmd := exec.Command(opts.Command[0], opts.Command[1:]...)
+	cmd.Dir = root
+	cmd.Stdout = &stdoutBuf
+	cmd.Stderr = &stderrBuf
+	runErr := cmd.Run()
+	finished := time.Now()
+	exitCode := 0
+	status := "passed"
+	if runErr != nil {
+		status = "failed"
+		exitCode = commandExitCode(runErr)
+	}
+	stdoutText := stdoutBuf.String()
+	stderrText := stderrBuf.String()
+	recordID := nextVerifiedEvidenceID(root)
+	recordRel := displayRelPath(hyperDir, "verified-evidence", recordID+".json")
+	stdoutRel := displayRelPath(hyperDir, "verified-evidence", recordID+".stdout.txt")
+	stderrRel := displayRelPath(hyperDir, "verified-evidence", recordID+".stderr.txt")
+	commandLine := strings.Join(opts.Command, " ")
+	record := verifiedEvidenceRecord{
+		ID: recordID,
+		Type: verifiedEvidenceEventType,
+		Status: status,
+		Axis: opts.Axis,
+		Name: opts.Name,
+		Command: append([]string{}, opts.Command...),
+		CommandLine: commandLine,
+		CWD: root,
+		RunID: state.ActiveRunID,
+		GoalID: state.CurrentGoalID,
+		StartedAt: startedAt,
+		FinishedAt: finished.UTC().Format("2006-01-02T15:04:05.000Z"),
+		DurationMillis: finished.Sub(start).Milliseconds(),
+		ExitCode: exitCode,
+		CommitSHA: gitCommitSHA(root),
+		WorktreeStatusSHA256: hashText(gitStatusShort(root)),
+		StdoutSHA256: hashText(stdoutText),
+		StderrSHA256: hashText(stderrText),
+		StdoutBytes: len([]byte(stdoutText)),
+		StderrBytes: len([]byte(stderrText)),
+		StdoutPath: stdoutRel,
+		StderrPath: stderrRel,
+		RecordPath: recordRel,
+		RecordedBy: "hyper verify",
+		ReadinessEvidenceText: verifiedReadinessEvidenceText(opts.Axis, commandLine, status, exitCode, recordID),
+	}
+	return record, stdoutText, stderrText, runErr
+}
+
+func persistVerifiedEvidence(root string, state projectState, record verifiedEvidenceRecord, stdoutText, stderrText string) *hyperError {
+	dir := filepath.Join(root, hyperDir, "verified-evidence")
+	if err := os.MkdirAll(dir, 0755); err != nil {
+		return ioError(err)
+	}
+	// Evidence must reflect the original execution, never a re-run, so write the
+	// stdout/stderr buffers captured during that execution to the paths in the record.
+	if err := writeText(filepath.Join(root, filepath.FromSlash(record.StdoutPath)), stdoutText); err != nil {
+		return err
+	}
+	if err := writeText(filepath.Join(root, filepath.FromSlash(record.StderrPath)), stderrText); err != nil {
+		return err
+	}
+	if err := writeJSON(filepath.Join(root, filepath.FromSlash(record.RecordPath)), record); err != nil {
+		return err
+	}
+	event := verifiedEvidenceEvent(record)
+	if err := appendJSONL(filepath.Join(root, hyperDir, "logs", "verified-evidence.jsonl"), event); err != nil {
+		return err
+	}
+	if strings.TrimSpace(state.ActiveRunID) != "" {
+		if err := appendJSONL(filepath.Join(root, hyperDir, "logs", state.ActiveRunID+".jsonl"), event); err != nil {
+			return err
+		}
+	}
+	db, err := openDB(root)
+	if err != nil {
+		return err
+	}
+	defer db.Close()
+	if err := ensureSchema(db); err != nil {
+		return err
+	}
+	return insertEvent(db, event)
+}
+
+func verifiedEvidenceEvent(record verifiedEvidenceRecord) map[string]any {
+	return map[string]any{
+		"type": verifiedEvidenceEventType,
+		"id": record.ID,
+		"status": record.Status,
+		"axis": record.Axis,
+		"name": record.Name,
+		"command": record.Command,
+		"command_line": record.CommandLine,
+		"run_id": record.RunID,
+		"goal_id": record.GoalID,
+		"created_at": record.FinishedAt,
+		"started_at": record.StartedAt,
+		"finished_at": record.FinishedAt,
+		"duration_millis": record.DurationMillis,
+		"exit_code": record.ExitCode,
+		"commit_sha": record.CommitSHA,
+		"worktree_status_sha256": record.WorktreeStatusSHA256,
+		"stdout_sha256": record.StdoutSHA256,
+		"stderr_sha256": record.StderrSHA256,
+		"stdout_bytes": record.StdoutBytes,
+		"stderr_bytes": record.StderrBytes,
+		"record_path": record.RecordPath,
+		"stdout_path": record.StdoutPath,
+		"stderr_path": record.StderrPath,
+		"readiness_evidence_text": record.ReadinessEvidenceText,
+	}
+}
+
+func renderVerifiedEvidenceOutput(record verifiedEvidenceRecord) string {
+	lines := []string{
+		"Verified evidence: " + record.ID,
+		"Status: " + record.Status,
+		fmt.Sprintf("Exit code: %d", record.ExitCode),
+		"Command: " + record.CommandLine,
+		"Axis: " + record.Axis,
+		"Goal: " + firstNonBlank(record.GoalID, "none"),
+		"Run: " + firstNonBlank(record.RunID, "none"),
+		"Record: " + record.RecordPath,
+		"Stdout: " + record.StdoutPath,
+		"Stderr: " + record.StderrPath,
+		"Stdout SHA256: " + record.StdoutSHA256,
+		"Stderr SHA256: " + record.StderrSHA256,
+		"Commit SHA: " + record.CommitSHA,
+		"Worktree status SHA256: " + record.WorktreeStatusSHA256,
+	}
+	return strings.Join(lines, "\n")
+}
+
+func nextVerifiedEvidenceID(root string) string {
+	records, _ := filepath.Glob(filepath.Join(root, hyperDir, "verified-evidence", "VE-*.json"))
+	maxID := 0
+	for _, path := range records {
+		base := strings.TrimSuffix(filepath.Base(path), ".json")
+		number := strings.TrimPrefix(base, "VE-")
+		value, err := strconv.Atoi(number)
+		if err == nil && value > maxID {
+			maxID = value
+		}
+	}
+	return fmt.Sprintf("VE-%04d", maxID+1)
+}
+
+func commandExitCode(err error) int {
+	if err == nil {
+		return 0
+	}
+	var exitErr *exec.ExitError
+	if ok := errors.As(err, &exitErr); ok {
+		return exitErr.ExitCode()
+	}
+	return 1
+}
+
+func recordExitCode(exitCode int) int {
+	if exitCode <= 0 {
+		return 1
+	}
+	if exitCode > 125 {
+		return 1
+	}
+	return exitCode
+}
+
+func gitCommitSHA(root string) string {
+	out, err := exec.Command("git", "-C", root, "rev-parse", "HEAD").Output()
+	if err != nil {
+		return "unknown"
+	}
+	return strings.TrimSpace(string(out))
+}
+
+func gitStatusShort(root string) string {
+	out, err := exec.Command("git", "-C", root, "status", "--short").Output()
+	if err != nil {
+		return "unknown"
+	}
+	return string(out)
+}
+
+func loadVerifiedEvidenceRecords(root string) []verifiedEvidenceRecord {
+	paths, err := filepath.Glob(filepath.Join(root, hyperDir, "verified-evidence", "VE-*.json"))
+	if err != nil {
+		return nil
+	}
+	sort.Strings(paths)
+	records := []verifiedEvidenceRecord{}
+	for _, path := range paths {
+		body, err := os.ReadFile(path)
+		if err != nil {
+			continue
+		}
+		var record verifiedEvidenceRecord
+		if err := json.Unmarshal(body, &record); err != nil {
+			continue
+		}
+		if record.ID == "" {
+			record.ID = strings.TrimSuffix(filepath.Base(path), ".json")
+		}
+		if record.RecordPath == "" {
+			record.RecordPath = displayRelPath(hyperDir, "verified-evidence", record.ID+".json")
+		}
+		records = append(records, record)
+	}
+	return records
+}
+
+func verifiedReadinessEvidenceRecords(root, goalID string, defs []readinessDimensionDef) []readinessEvidenceRecord {
+	records := []readinessEvidenceRecord{}
+	for _, record := range loadVerifiedEvidenceRecords(root) {
+		if !verifiedEvidenceGoalMatches(record, goalID) || record.Status != "passed" || record.ExitCode != 0 {
+			continue
+		}
+		axis := normalizeVerifyAxis(record.Axis)
+		if axis == "" {
+			axis = "validation_coverage"
+		}
+		if !readinessAxisExists(axis, defs) {
+			continue
+		}
+		text := firstNonBlank(record.ReadinessEvidenceText, verifiedReadinessEvidenceText(axis, record.CommandLine, record.Status, record.ExitCode, record.ID))
+		records = append(records, readinessEvidenceRecordForAxis(record.GoalID, axis, text))
+		if axis != "validation_coverage" {
+			validationText := verifiedReadinessEvidenceText("validation_coverage", record.CommandLine, record.Status, record.ExitCode, record.ID)
+			records = append(records, readinessEvidenceRecordForAxis(record.GoalID, "validation_coverage", validationText))
+		}
+	}
+	return records
+}
+
+func readinessAxisExists(axis string, defs []readinessDimensionDef) bool {
+	for _, def := range defs {
+		if def.ID == axis {
+			return true
+		}
+	}
+	return false
+}
+
+func verifiedEvidenceGoalMatches(record verifiedEvidenceRecord, goalID string) bool {
+	goalID = strings.TrimSpace(goalID)
+	return goalID == "" || strings.TrimSpace(record.GoalID) == goalID
+}
+
+func goalHasPassedVerifiedEvidence(root, goalID string) bool {
+	for _, record
:= range loadVerifiedEvidenceRecords(root) { + if verifiedEvidenceGoalMatches(record, goalID) && record.Status == "passed" && record.ExitCode == 0 { + return true + } + } + return false +} + +func activeValidatorVerifiedEvidenceCovers(root, goalID string, capability activeCapability) bool { + if capability.Kind != "validator" { + return false + } + expectedCommand := normalizeSentence(inferredCommandForSignal(capability.Signal)) + if expectedCommand == "" { + return false + } + for _, record := range loadVerifiedEvidenceRecords(root) { + if !verifiedEvidenceGoalMatches(record, goalID) || record.Status != "passed" || record.ExitCode != 0 { + continue + } + if strings.Contains(normalizeSentence(record.CommandLine), expectedCommand) { + return true + } + } + return false +} + +func verifiedEvidenceSummaryForGoal(root, goalID string) verifiedEvidenceGoalSummary { + summary := verifiedEvidenceGoalSummary{GoalID: strings.TrimSpace(goalID)} + for _, record := range loadVerifiedEvidenceRecords(root) { + if !verifiedEvidenceGoalMatches(record, goalID) { + continue + } + summary.Total++ + switch record.Status { + case "passed": + if record.ExitCode == 0 { + summary.Passed++ + } else { + summary.Failed++ + summary.LatestFailed = record + } + case "failed": + summary.Failed++ + summary.LatestFailed = record + } + summary.Newest = record + } + return summary +} + +func verifiedEvidenceShortLine(root, goalID string) string { + summary := verifiedEvidenceSummaryForGoal(root, goalID) + goal := firstNonBlank(summary.GoalID, "current packet") + if summary.Total == 0 { + return "Verified Evidence: " + goal + " has no records yet" + } + line := fmt.Sprintf("Verified Evidence: %s %d record(s); passed %d, failed %d; newest %s", + goal, + summary.Total, + summary.Passed, + summary.Failed, + verifiedEvidenceRecordStatusPhrase(summary.Newest), + ) + if summary.Failed > 0 && summary.LatestFailed.ID != summary.Newest.ID { + line += "; latest failed " + 
verifiedEvidenceRecordStatusPhrase(summary.LatestFailed) + } + return line +} + +func verifiedEvidenceDashboardLines(root, goalID string) []string { + summary := verifiedEvidenceSummaryForGoal(root, goalID) + goal := firstNonBlank(summary.GoalID, "current packet") + lines := []string{"Verified Evidence:", " Current packet: " + goal} + if summary.Total == 0 { + return append(lines, " Records: none yet") + } + lines = append(lines, + fmt.Sprintf(" Records: %d total, %d passed, %d failed", summary.Total, summary.Passed, summary.Failed), + " Newest: "+verifiedEvidenceRecordStatusPhrase(summary.Newest), + " Record: "+summary.Newest.RecordPath, + ) + if summary.Failed > 0 { + lines = append(lines, " Latest failure: "+verifiedEvidenceRecordStatusPhrase(summary.LatestFailed)) + } + return lines +} + +func doctorVerifiedEvidenceCheck(root string) doctorCheck { + state := readStateIfExists(root) + goalID := strings.TrimSpace(state.CurrentGoalID) + if goalID == "" { + return doctorCheck{"Verified Evidence", "OK", "no current packet"} + } + summary := verifiedEvidenceSummaryForGoal(root, goalID) + if summary.Total == 0 { + return doctorCheck{"Verified Evidence", "OK", "no records for " + goalID + " yet"} + } + detail := fmt.Sprintf("%s records=%d passed=%d failed=%d; newest %s", + goalID, + summary.Total, + summary.Passed, + summary.Failed, + verifiedEvidenceRecordStatusPhrase(summary.Newest), + ) + status := "OK" + if summary.Failed > 0 { + status = "WARN" + if summary.LatestFailed.ID != summary.Newest.ID { + detail += "; latest failed " + verifiedEvidenceRecordStatusPhrase(summary.LatestFailed) + } + } + return doctorCheck{"Verified Evidence", status, detail} +} + +func verifiedEvidenceRecordStatusPhrase(record verifiedEvidenceRecord) string { + if strings.TrimSpace(record.ID) == "" { + return "none" + } + command := compactText(firstNonBlank(record.CommandLine, strings.Join(record.Command, " ")), 90) + displayStatus := firstNonBlank(record.Status, "unknown") + if 
record.ExitCode != 0 { + displayStatus = "failed" + } + phrase := record.ID + " " + displayStatus + if command != "" { + phrase += " `" + command + "`" + } + if displayStatus == "failed" { + phrase += fmt.Sprintf(" exit %d", record.ExitCode) + } + return phrase +} + +func verifiedReadinessEvidenceText(axis, commandLine, status string, exitCode int, recordID string) string { + return fmt.Sprintf("Verified Evidence %s executed CLI command `%s` with status %s and exit code %d.", recordID, commandLine, status, exitCode) +} diff --git a/plan.md b/plan.md index c46d9fb..52773d6 100644 --- a/plan.md +++ b/plan.md @@ -49,8 +49,10 @@ Native Go CLI with project-local files, SQLite-backed event storage, Codex Deskt ## Constraints - User intervention should be minimized, but not removed for irreversible, destructive, credential, payment, publication, or high-risk operations. +- Human control should sit at policy boundaries: approval, credentials, product ownership, spending, publication, destructive actions, and scope changes. Humans should not need to micromanage ordinary task selection or validation execution. - The AI must work in small coherent loops so progress is inspectable. - Every loop needs evidence: command output, smoke proof, browser proof, artifact proof, benchmark proof, or a concrete blocker. +- Repeatable command validation should prefer `hyper verify -- ` so exit code, log hashes, commit SHA, worktree status hash, run ID, and goal ID are machine-recorded. Markdown evidence should summarize or cite those records, not replace them. - Harnesses, validators, skills, agents, and stricter workflows should be generated only after repeated pressure proves they are useful. - Project knowledge must live in `plan.md`, `.hyper/`, logs, evidence, and generated candidates, not in transient chat memory. - Auto continuation must include progress guards so repeated non-progress does not look like work. 
@@ -86,7 +88,7 @@ The loop exists to keep project direction intact while minimizing human attentio
    - Read `plan.md`, the generated runtime packet, recent evidence, active capabilities, and `.hyper/next-packet.md`.
    - Classify safety before action.
    - Choose one smallest coherent episode.
-   - Run active validators or record a concrete reason they cannot run.
+   - Run active validators through `hyper verify -- ` when possible, or record a concrete reason they cannot run.
    - Write evidence, next notes, durable Learn signals, and self review.
    - Run the finish gate internally before starting another packet.
 
@@ -106,12 +108,13 @@ The loop exists to keep project direction intact while minimizing human attentio
    - `.hyper/next-packet.md` says `Action: stop`, or `Action: advance` without authorized stage advancement.
 
 5. Service-quality evidence
-   - Functional proof: active Go validators, targeted tests, command smoke, or artifact proof.
+   - Functional proof: active Go validators, targeted tests, command smoke, or artifact proof. Repeatable command proof should be backed by Verified Evidence records under `.hyper/verified-evidence/`.
    - Operational proof: install/update/release/checksum/signature/rollback/setup evidence when those surfaces are touched.
    - Core UX proof: `hyper run`, `status`, `doctor`, and generated packet guidance keep users on the intended flow.
    - Security proof: secrets are not exposed; release artifacts have checksum proof and signature proof when tooling exists.
    - Maintainability proof: stale branches, dirty state, repeated friction, and unclear handoffs are closed or routed to the next packet.
    - Product satisfaction proof: the result remains useful, coherent, and aligned with delegated autonomy, not merely test-passing.
+   - Verified Evidence proof: `hyper verify` records become the source of truth for command execution metadata; `evidence.md` remains the human-readable summary and decision ledger.
 
 6. Validator and harness promotion
    - Active validators are required until evidence says otherwise: `GOCACHE=/private/tmp/hyper-go-cache go test ./...`, `go test ./...`, and `git diff --check`.