Add deep Codex skill loop evals #19
Merged
Add a `skill-deep` Codex app-server suite for skill-loop behavior beyond the existing single evidence scenario. The suite covers transient evidence skips, missing-skill JSONL evidence, proposal-first curation, approved active skill creation, unapproved lifecycle no-ops, stale moves, restore moves, host skill surface preservation, and reviewable skill-author drafts.

Add `skill_author` as a skill-loop protocol skill for drafting `SKILL.md` content under proposals without changing lifecycle state. Project it through both the Codex and Claude Code adapters, and keep lifecycle activation under `skill_manage` approval.

Validation: `py_compile`, `harness-validate`, `codex-app-eval-suite`, `codex-skill-deep-eval`, `go test ./...`, `go vet ./...`, `make test`.
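The split between drafting and activation described above can be illustrated with a minimal Python sketch. Everything here is hypothetical: `SkillStore`, the in-memory dictionaries, and the function signatures are assumptions made for illustration and are not taken from the repository. Only the behavior is drawn from the description: `skill_author` writes drafts under proposals, and activation happens only through an approved `skill_manage` call.

```python
from dataclasses import dataclass, field


@dataclass
class SkillStore:
    """Hypothetical in-memory stand-in for proposal and active skill storage."""
    proposals: dict = field(default_factory=dict)  # name -> draft SKILL.md text
    active: dict = field(default_factory=dict)     # name -> activated SKILL.md text


def skill_author(store: SkillStore, name: str, content: str) -> None:
    """Draft SKILL.md content under proposals; never touches lifecycle state."""
    store.proposals[name] = content


def skill_manage(store: SkillStore, name: str, approved: bool) -> bool:
    """Activate a proposed skill only when approved; otherwise a no-op.

    Returns True if the skill was activated, False if nothing changed.
    """
    if not approved or name not in store.proposals:
        return False
    store.active[name] = store.proposals.pop(name)
    return True
```

Under these assumptions, calling `skill_author(store, "demo", "# Demo skill\n")` leaves `store.active` empty, `skill_manage(store, "demo", approved=False)` changes nothing, and only `skill_manage(store, "demo", approved=True)` moves the draft into the active set.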
Summary
- `skill_author` for drafting reviewable `SKILL.md` content without activating it
- `skill-deep` Codex app-server regression coverage for evidence, curation, lifecycle moves, and author drafts

Validation
- `python3 -m py_compile scripts/codex_app_server_eval.py`
- `make harness-validate`
- `make codex-app-eval-suite`
- `make codex-skill-deep-eval`
- `go test ./...`
- `go vet ./...`
- `make test`
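For anyone who wants to reproduce the validation run in one step, the commands listed here could be chained with a small helper like the following sketch. The helper name `run_validation` and its `runner` parameter are assumptions for illustration and are not part of the repository. The commands themselves are copied from the list as given.

```python
import subprocess

# Commands copied from the Validation list, in the order given.
VALIDATION_COMMANDS = [
    ["python3", "-m", "py_compile", "scripts/codex_app_server_eval.py"],
    ["make", "harness-validate"],
    ["make", "codex-app-eval-suite"],
    ["make", "codex-skill-deep-eval"],
    ["go", "test", "./..."],
    ["go", "vet", "./..."],
    ["make", "test"],
]


def run_validation(commands=VALIDATION_COMMANDS, runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    Returns the failing command (a list of strings), or None if all passed.
    `runner` is injectable so the loop can be exercised without a checkout.
    """
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            return cmd
    return None
```

Calling `run_validation()` from the repository root would run the steps sequentially. A return value of `None` means every step exited with status 0.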