Add deep Codex skill loop evals #19

Merged
Grivn merged 1 commit into master from skill-loop-eval-optimization
May 15, 2026
@Grivn Grivn commented May 15, 2026

Summary

  • add skill_author for drafting reviewable SKILL.md content without activating it
  • project the skill-loop protocol surface through Codex and Claude Code adapters
  • add skill-deep Codex app-server regression coverage for evidence, curation, lifecycle moves, and author drafts

Validation

  • python3 -m py_compile scripts/codex_app_server_eval.py
  • make harness-validate
  • make codex-app-eval-suite
  • make codex-skill-deep-eval
  • go test ./...
  • go vet ./...
  • make test

Add a `skill-deep` Codex app-server suite for skill-loop behavior beyond the
existing single evidence scenario. The suite covers transient evidence skips,
missing-skill JSONL evidence, proposal-first curation, approved active skill
creation, unapproved lifecycle no-ops, stale moves, restore moves, host skill
surface preservation, and reviewable skill-author drafts.
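The lifecycle scenarios listed above (approved creation, unapproved no-ops, stale moves, restore moves) can be pictured with a small in-memory sketch. It is illustrative only: `SkillRegistry`, `move`, the state names `active` and `stale`, and the scenario keys are hypothetical and are not taken from `scripts/codex_app_server_eval.py` or the harness.

```python
from dataclasses import dataclass, field

# Hypothetical lifecycle states; the real suite may name them differently.
ACTIVE, STALE = "active", "stale"


@dataclass
class SkillRegistry:
    """Toy stand-in for skill lifecycle state, keyed by skill name."""

    states: dict = field(default_factory=dict)

    def move(self, skill: str, target: str, approved: bool) -> bool:
        """Apply a lifecycle move only when approved; return True if state changed."""
        if not approved:
            return False  # unapproved lifecycle requests are no-ops
        if self.states.get(skill) == target:
            return False  # already in the requested state
        self.states[skill] = target
        return True


def run_scenarios() -> dict:
    """Walk one skill through the lifecycle cases named in the description."""
    reg = SkillRegistry()
    results = {}
    results["approved_create"] = reg.move("demo", ACTIVE, approved=True)
    results["unapproved_noop"] = reg.move("demo", STALE, approved=False)
    results["stale_move"] = reg.move("demo", STALE, approved=True)
    results["restore_move"] = reg.move("demo", ACTIVE, approved=True)
    results["final_state"] = reg.states["demo"]
    return results
```

Each regression case in this sketch reduces to one assertion on whether state changed, so an unapproved request can be checked for having no side effect.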

Add `skill_author` as a skill-loop protocol skill for drafting `SKILL.md`
content under proposals without changing lifecycle state. Project it through
both Codex and Claude Code adapters, and keep lifecycle activation under
`skill_manage` approval.
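The split between drafting and activation described above might look like the following minimal sketch. The directory layout (`proposals/<name>/SKILL.md`, `active/<name>/SKILL.md`) and the helper names `draft_skill` and `activate_skill` are assumptions made for illustration; the actual `skill_author` and `skill_manage` skills may store and gate drafts differently.

```python
from pathlib import Path


def draft_skill(root: Path, name: str, body: str) -> Path:
    """Write a reviewable draft under proposals/ without touching active skills.

    Hypothetical helper standing in for what skill_author is described as doing.
    """
    draft = root / "proposals" / name / "SKILL.md"
    draft.parent.mkdir(parents=True, exist_ok=True)
    draft.write_text(body, encoding="utf-8")
    return draft


def activate_skill(root: Path, name: str, approved: bool) -> bool:
    """Copy a draft into active/ only with approval; otherwise do nothing.

    Hypothetical helper standing in for approval-gated skill_manage activation.
    """
    draft = root / "proposals" / name / "SKILL.md"
    if not approved or not draft.exists():
        return False
    target = root / "active" / name / "SKILL.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(draft.read_text(encoding="utf-8"), encoding="utf-8")
    return True
```

Drafting never creates anything under `active/` in this sketch, so a reviewer can inspect the proposal before any lifecycle change is approved.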

Validation: py_compile, harness-validate, codex-app-eval-suite,
codex-skill-deep-eval, go test ./..., go vet ./..., make test.
@Grivn Grivn merged commit 73702c0 into master May 15, 2026
1 check passed
@Grivn Grivn deleted the skill-loop-eval-optimization branch May 15, 2026 02:28