
Expand Codex app-server eval coverage #17

Merged: Grivn merged 3 commits into master from eval-loop-optimization on May 15, 2026


Grivn commented on May 15, 2026

Summary

  • extend the real Codex app-server eval runner with scenario suites, multi-turn support, reports, and focused assertions
  • add default and deep memory eval coverage for recall, writes, no-pollution, stale-memory replacement, secret rejection, and persisted continuity
  • document eval entrypoints and add project-level agent commit discipline while ignoring local Codex projection files
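
As an illustration of what a multi-turn scenario with focused assertions might look like, here is a minimal sketch. It is not taken from `scripts/codex_app_server_eval.py`: the `Scenario` class, its field names, the `check` helper, and the example prompts are all hypothetical, chosen only to show the stale-memory replacement case as a pair of "must contain" and "must not contain" checks on the final answer.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical scenario shape: one prompt per turn plus focused checks."""
    name: str
    prompts: list
    must_contain: list = field(default_factory=list)
    must_not_contain: list = field(default_factory=list)


def check(scenario: Scenario, final_answer: str) -> list:
    """Return failure messages; an empty list means the scenario passed."""
    failures = []
    text = final_answer.lower()
    for needle in scenario.must_contain:
        if needle.lower() not in text:
            failures.append(f"{scenario.name}: missing {needle!r}")
    for needle in scenario.must_not_contain:
        if needle.lower() in text:
            failures.append(f"{scenario.name}: unexpected {needle!r}")
    return failures


# Illustrative stale-memory case: the newer preference should win.
stale = Scenario(
    name="stale-memory-replacement",
    prompts=["I switched from Vim to Emacs.", "Which editor do I use?"],
    must_contain=["emacs"],
    must_not_contain=["vim"],
)

# A tiny report: scenario name mapped to its list of failures.
report = {stale.name: check(stale, "You use Emacs.")}
```

Returning a list of failures rather than a single boolean keeps the report readable when one scenario trips several checks at once.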

Validation

  • python3 -m py_compile scripts/codex_app_server_eval.py
  • make harness-validate
  • make codex-app-eval-suite
  • make codex-memory-deep-eval
  • go test ./...
  • go vet ./...
  • make test

Grivn added 3 commits May 14, 2026 17:58
Add a `memory-deep` Codex app-server suite covering noisy recall filtering,
stale-memory supersession, uncertain preference rejection, secret rejection,
transient no-pollution, and multi-turn continuity through persisted MEMORY.md.

The runner now supports multi-prompt scenarios, waits for turn completion from
the current notification boundary, and asserts against final answer text instead
of raw command output. Tighten memory-loop guidance so repeated safety policy
and skip-condition statements are not written as durable memory.

Validation: py_compile, harness-validate, codex-app-eval-suite,
codex-memory-deep-eval, go test ./..., go vet ./..., make test.

Add project-level agent guidance for build/test commands, local host projection
surfaces, commit splitting, and commit message style. The guidance makes commit
granularity and type selection part of the shared repo contract instead of
relying on a local Codex skill.

Also ignore `.codex/` alongside `.claude/` because both are generated host
projection directories, not canonical project state.

Grivn merged commit ed62cd7 into master on May 15, 2026
1 check passed
Grivn deleted the eval-loop-optimization branch on May 15, 2026 01:08