5 changes: 4 additions & 1 deletion Makefile
@@ -10,7 +10,7 @@ ifeq ($(GOBIN),)
GOBIN := $(shell go env GOPATH)/bin
endif

.PHONY: deps build install uninstall test unit vet harness-validate codex-app-eval codex-app-eval-suite codex-memory-deep-eval docker-build docker-run compose-up compose-down compose-dev release-snapshot clean help
.PHONY: deps build install uninstall test unit vet harness-validate codex-app-eval codex-app-eval-suite codex-memory-deep-eval codex-skill-deep-eval docker-build docker-run compose-up compose-down compose-dev release-snapshot clean help

.DEFAULT_GOAL := help

@@ -57,6 +57,9 @@ codex-app-eval-suite: ## Run real Codex app-server memory/skill scenario suite
codex-memory-deep-eval: ## Run deep real Codex app-server memory regression suite
python3 scripts/codex_app_server_eval.py --suite --suite-name memory-deep

codex-skill-deep-eval: ## Run deep real Codex app-server skill regression suite
python3 scripts/codex_app_server_eval.py --suite --suite-name skill-deep

# ── Containers / Deployment ──────────────────────────────────────────

docker-build: ## Build runtime Docker image
10 changes: 10 additions & 0 deletions docs/harness/eval/CODEX_APP_SERVER.md
@@ -37,6 +37,16 @@ The deep memory suite adds noisy recall filtering, stale-memory supersession,
uncertain-preference rejection, secret-like value rejection, and multi-turn
continuity through persisted `MEMORY.md`.

For longer skill-loop regression, run:

```bash
make codex-skill-deep-eval
```

The deep skill suite adds checks for skipping transient evidence, recording
missing-skill evidence, creating an approved active skill, preserving the host
skill surface, and proposal-first curation, plus drafting reviewable skill content.

To trigger a real Codex turn, opt in explicitly:

```bash
10 changes: 10 additions & 0 deletions docs/zh/harness/eval/CODEX_APP_SERVER.md
@@ -35,6 +35,16 @@ make codex-memory-deep-eval
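The deep memory suite additionally covers noisy relevant recall, stale-memory supersession,
uncertain-preference rejection, secret-like value rejection, and multi-turn continuity through persisted `MEMORY.md`.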

For longer skill-loop regression, run:

```bash
make codex-skill-deep-eval
```

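The deep skill suite additionally covers skipping transient evidence, recording missing-skill evidence,
applying an approved active skill creation, protecting the host skill surface, and proposal-first
curation that does not activate skills directly; it also verifies authoring of reviewable skill drafts.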

To trigger a real Codex turn, opt in explicitly:

```bash
15 changes: 15 additions & 0 deletions harness/eval/README.md
@@ -32,6 +32,12 @@ Run the longer memory regression suite with:
make codex-memory-deep-eval
```

Run the longer skill-loop regression suite with:

```bash
make codex-skill-deep-eval
```

To run an actual Codex turn, use:

```bash
@@ -72,3 +78,12 @@ The `memory-deep` suite extends memory coverage with:
- rejecting uncertain preference changes
- rejecting secret-like values and generic restatements of existing safety policy
- multi-turn continuity through persisted `MEMORY.md`

The `skill-deep` suite extends skill-loop coverage with:

- skipping transient one-off workflow evidence
- recording missing-skill evidence as JSONL
- applying an explicitly approved active skill creation
- preserving the host skill surface during canonical skill changes
- producing proposal-first curation output without activating skills
- drafting reviewable skill content without activating it
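
As a rough illustration of the "recording missing-skill evidence as JSONL" check, the sketch below appends one JSON object per line to the usage file. The helper name `record_missing_skill` and the field names `type`, `workflow`, and `ts` are assumptions made for this example; the diff does not show the real evidence schema. Only `MNEMON_SKILL_LOOP_USAGE_FILE` comes from the source.

```shell
# Sketch only: record_missing_skill and the JSON field names are hypothetical.
# Assumes the workflow description is a single line of text.
record_missing_skill() {
  usage_file="${MNEMON_SKILL_LOOP_USAGE_FILE:-./skills/.usage.jsonl}"
  workflow="$1"

  # Escape backslashes and double quotes so each line stays valid JSON.
  escaped="$(printf '%s' "$workflow" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"

  # Append exactly one record per call; JSONL means one object per line.
  mkdir -p "$(dirname "$usage_file")"
  printf '{"type":"missing_skill","workflow":"%s","ts":"%s"}\n' \
    "$escaped" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$usage_file"
}
```

A call such as `record_missing_skill 'publish weekly report'` would add one line to the file, which keeps the evidence log append-only and easy to review with line-oriented tools.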
5 changes: 4 additions & 1 deletion harness/hosts/claude-code/projector.sh
@@ -261,7 +261,7 @@ export MNEMON_SKILL_LOOP_USAGE_FILE="${CANONICAL_MODULE_DIR}/skills/.usage.jsonl
export MNEMON_SKILL_LOOP_PROPOSALS_DIR="${CANONICAL_MODULE_DIR}/proposals"
export MNEMON_SKILL_LOOP_HOST_SKILLS_DIR="${host_skills_dir}"
export MNEMON_SKILL_LOOP_REVIEW_MIN_EVENTS="\${MNEMON_SKILL_LOOP_REVIEW_MIN_EVENTS:-20}"
export MNEMON_SKILL_LOOP_PROTECTED_SKILLS="\${MNEMON_SKILL_LOOP_PROTECTED_SKILLS:-skill_observe,skill_curate,skill_manage,memory_get,memory_set}"
export MNEMON_SKILL_LOOP_PROTECTED_SKILLS="\${MNEMON_SKILL_LOOP_PROTECTED_SKILLS:-skill_observe,skill_curate,skill_author,skill_manage,memory_get,memory_set}"
EOF
chmod 0755 "${CONFIG_DIR}/mnemon-skill-loop/env.sh"
}
@@ -322,13 +322,15 @@ install_skill_loop() {
"${CANONICAL_MODULE_DIR}/reports" \
"${HOST_SKILLS_DIR}/skill_observe" \
"${HOST_SKILLS_DIR}/skill_curate" \
"${HOST_SKILLS_DIR}/skill_author" \
"${HOST_SKILLS_DIR}/skill_manage" \
"${CONFIG_DIR}/agents" \
"${CONFIG_DIR}/hooks/mnemon-skill-loop"
write_skill_projection_env

install_file "${MODULE_DIR}/skills/skill_observe.md" "${HOST_SKILLS_DIR}/skill_observe/SKILL.md" 0644
install_file "${MODULE_DIR}/skills/skill_curate.md" "${HOST_SKILLS_DIR}/skill_curate/SKILL.md" 0644
install_file "${MODULE_DIR}/skills/skill_author.md" "${HOST_SKILLS_DIR}/skill_author/SKILL.md" 0644
install_file "${MODULE_DIR}/skills/skill_manage.md" "${HOST_SKILLS_DIR}/skill_manage/SKILL.md" 0644
install_file "${MODULE_DIR}/subagents/curator.md" "${CONFIG_DIR}/agents/mnemon-skill-curator.md" 0644

@@ -398,6 +400,7 @@ uninstall_skill_loop() {
rm -rf "${CONFIG_DIR}/hooks/mnemon-skill-loop"
rm -rf "${host_skills_dir}/skill_observe"
rm -rf "${host_skills_dir}/skill_curate"
rm -rf "${host_skills_dir}/skill_author"
rm -rf "${host_skills_dir}/skill_manage"
rm -f "${CONFIG_DIR}/agents/mnemon-skill-curator.md"
rm -rf "${CONFIG_DIR}/mnemon-skill-loop"
5 changes: 5 additions & 0 deletions harness/hosts/codex/projector.sh
@@ -289,6 +289,7 @@ install_skill_loop() {
"${CANONICAL_MODULE_DIR}/reports" \
"${HOST_SKILLS_DIR}/skill_observe" \
"${HOST_SKILLS_DIR}/skill_curate" \
"${HOST_SKILLS_DIR}/skill_author" \
"${HOST_SKILLS_DIR}/skill_manage" \
"${CONFIG_DIR}/mnemon-skill-loop"
write_runtime_env "${CONFIG_DIR}/mnemon-skill-loop" "MNEMON_SKILL_LOOP_ENV" "MNEMON_SKILL_LOOP_DIR"
@@ -301,13 +302,16 @@ export MNEMON_SKILL_LOOP_ARCHIVED_DIR="${CANONICAL_MODULE_DIR}/skills/archived"
export MNEMON_SKILL_LOOP_USAGE_FILE="${CANONICAL_MODULE_DIR}/skills/.usage.jsonl"
export MNEMON_SKILL_LOOP_PROPOSALS_DIR="${CANONICAL_MODULE_DIR}/proposals"
export MNEMON_SKILL_LOOP_HOST_SKILLS_DIR="${HOST_SKILLS_DIR}"
export MNEMON_SKILL_LOOP_PROTECTED_SKILLS="${MNEMON_SKILL_LOOP_PROTECTED_SKILLS:-skill_observe,skill_curate,skill_author,skill_manage,memory_get,memory_set}"
EOF

install_file "${MODULE_DIR}/skills/skill_observe.md" "${HOST_SKILLS_DIR}/skill_observe/SKILL.md" 0644
install_file "${MODULE_DIR}/skills/skill_curate.md" "${HOST_SKILLS_DIR}/skill_curate/SKILL.md" 0644
install_file "${MODULE_DIR}/skills/skill_author.md" "${HOST_SKILLS_DIR}/skill_author/SKILL.md" 0644
install_file "${MODULE_DIR}/skills/skill_manage.md" "${HOST_SKILLS_DIR}/skill_manage/SKILL.md" 0644
append_codex_runtime_note "${HOST_SKILLS_DIR}/skill_observe/SKILL.md" "MNEMON_SKILL_LOOP_DIR" "${CONFIG_DIR}/mnemon-skill-loop/env.sh"
append_codex_runtime_note "${HOST_SKILLS_DIR}/skill_curate/SKILL.md" "MNEMON_SKILL_LOOP_DIR" "${CONFIG_DIR}/mnemon-skill-loop/env.sh"
append_codex_runtime_note "${HOST_SKILLS_DIR}/skill_author/SKILL.md" "MNEMON_SKILL_LOOP_DIR" "${CONFIG_DIR}/mnemon-skill-loop/env.sh"
append_codex_runtime_note "${HOST_SKILLS_DIR}/skill_manage/SKILL.md" "MNEMON_SKILL_LOOP_DIR" "${CONFIG_DIR}/mnemon-skill-loop/env.sh"

write_host_manifest "${CONFIG_DIR}"
@@ -356,6 +360,7 @@ uninstall_skill_loop() {
local host_skills_dir="${MNEMON_SKILL_LOOP_HOST_SKILLS_DIR:-${HOST_SKILLS_DIR:-${CONFIG_DIR}/skills}}"
rm -rf "${host_skills_dir}/skill_observe"
rm -rf "${host_skills_dir}/skill_curate"
rm -rf "${host_skills_dir}/skill_author"
rm -rf "${host_skills_dir}/skill_manage"
rm -rf "${CONFIG_DIR}/mnemon-skill-loop"
if [[ "${PURGE_LIBRARY}" == "1" ]]; then
4 changes: 3 additions & 1 deletion harness/modules/skill-loop/GUIDE.md
@@ -22,7 +22,9 @@ Record evidence when a session shows one of these signals:
- a skill should be protected, pinned, restored, staled, or archived

Skip evidence for one-off commands, transient progress, raw chat logs, secrets,
or facts better stored as memory.
or facts better stored as memory. Do not record evidence merely because a
single command succeeded or because the current prompt mentions the skill loop;
there must be a reusable workflow or lifecycle signal.

## Lifecycle

3 changes: 3 additions & 0 deletions harness/modules/skill-loop/README.md
@@ -20,6 +20,7 @@ harness/modules/skill-loop/
├── skills/
│ ├── skill_observe.md
│ ├── skill_curate.md
│ ├── skill_author.md
│ └── skill_manage.md
├── subagents/
│ └── curator.md
@@ -43,6 +44,7 @@ harness/modules/skill-loop/
| `hooks/*.md` | Four lifecycle reminders. Prime syncs active skills; Nudge records evidence; Compact may trigger review; Remind is no-op by default. |
| `skills/skill_observe.md` | Online evidence capture protocol. |
| `skills/skill_curate.md` | Protocol for starting a curator review. |
| `skills/skill_author.md` | Protocol for drafting reviewable `SKILL.md` content. |
| `skills/skill_manage.md` | Approved lifecycle mutation protocol. |
| `subagents/curator.md` | Background reviewer that proposes create, patch, consolidate, stale, archive, or restore actions. |
| Host adapter | Host-specific projection lives outside the module under `harness/hosts/<host>/`. |
@@ -90,6 +92,7 @@ The key split is:
GUIDE.md decides when skill evolution behavior is useful.
skill_observe.md records evidence only.
curator.md reviews evidence and proposes changes.
skill_author.md drafts skill content for review.
skill_manage.md applies approved changes to canonical state.
prime.sh projects active canonical skills into the host skill surface.
```
2 changes: 1 addition & 1 deletion harness/modules/skill-loop/env.sh
@@ -21,4 +21,4 @@ export MNEMON_SKILL_LOOP_USAGE_FILE="${MNEMON_SKILL_LOOP_USAGE_FILE:-${MNEMON_SK
export MNEMON_SKILL_LOOP_PROPOSALS_DIR="${MNEMON_SKILL_LOOP_PROPOSALS_DIR:-${MNEMON_SKILL_LOOP_DIR}/proposals}"
export MNEMON_SKILL_LOOP_HOST_SKILLS_DIR="${MNEMON_SKILL_LOOP_HOST_SKILLS_DIR:-${MNEMON_SKILL_LOOP_CONFIG_DIR}/skills}"
export MNEMON_SKILL_LOOP_REVIEW_MIN_EVENTS="${MNEMON_SKILL_LOOP_REVIEW_MIN_EVENTS:-20}"
export MNEMON_SKILL_LOOP_PROTECTED_SKILLS="${MNEMON_SKILL_LOOP_PROTECTED_SKILLS:-skill_observe,skill_curate,skill_manage,memory_get,memory_set}"
export MNEMON_SKILL_LOOP_PROTECTED_SKILLS="${MNEMON_SKILL_LOOP_PROTECTED_SKILLS:-skill_observe,skill_curate,skill_author,skill_manage,memory_get,memory_set}"
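
The `${VAR:-default}` form used above means an exported, non-empty `MNEMON_SKILL_LOOP_PROTECTED_SKILLS` overrides the built-in list, while an unset or empty value falls back to it. A minimal sketch of how a script might read the effective list follows; the helper name `protected_skills` is hypothetical and is not defined by the module.

```shell
# Sketch only: protected_skills is a hypothetical helper, not part of env.sh.
# Prints the effective protected skill ids, one per line.
protected_skills() {
  # Same fallback as env.sh: use the override when set and non-empty,
  # otherwise the default list that now includes skill_author.
  list="${MNEMON_SKILL_LOOP_PROTECTED_SKILLS:-skill_observe,skill_curate,skill_author,skill_manage,memory_get,memory_set}"
  printf '%s\n' "$list" | tr ',' '\n'
}
```

Because an override replaces the default rather than extending it, a custom list that omits `skill_author` would leave that skill unprotected.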
1 change: 1 addition & 0 deletions harness/modules/skill-loop/module.json
@@ -21,6 +21,7 @@
"skills": [
"skills/skill_observe.md",
"skills/skill_curate.md",
"skills/skill_author.md",
"skills/skill_manage.md"
],
"subagents": [
56 changes: 56 additions & 0 deletions harness/modules/skill-loop/skills/skill_author.md
@@ -0,0 +1,56 @@
---
name: skill_author
description: Draft or revise high-quality SKILL.md content for approved or proposed Mnemon skill-loop changes.
---

# skill_author

Use this skill when a curator proposal, user request, or approved lifecycle
change needs a concrete `SKILL.md` draft.

## Boundary

This skill authors skill content only. It does not decide lifecycle placement
and does not activate, stale, archive, restore, or delete skills.

Write drafts under:

```text
$MNEMON_SKILL_LOOP_PROPOSALS_DIR
```

Approved lifecycle placement is applied later with `skill_manage.md`.

## Procedure

1. Confirm the target skill id is hyphen-case: lowercase letters, numbers, and
`-`.
2. Confirm the skill captures a reusable procedure, not project facts,
preferences, credentials, raw transcripts, or one-off task context.
3. Draft a complete `SKILL.md` with:
- YAML frontmatter containing `name` and `description`
- a short trigger-oriented description
- a clear boundary section
- a concise procedure section
- safety or validation notes only when they change behavior
4. Keep the skill focused. Prefer one workflow per skill.
5. Use project-neutral language. Do not embed current branch names, temporary
tokens, credentials, private URLs, or task-specific facts.
6. Save the draft as a proposal artifact such as:

```text
$MNEMON_SKILL_LOOP_PROPOSALS_DIR/<skill-id>.SKILL.md
```

7. Leave `skills/active`, `skills/stale`, `skills/archived`, and host skill
surfaces unchanged unless the user explicitly asks to use `skill_manage.md`
after approval.

## Quality Checklist

- The description tells the host when to use the skill.
- The body teaches reusable judgment or procedure the model would not reliably
infer from the current task alone.
- The content is short enough to load on demand.
- The skill avoids duplicated policy already covered by `GUIDE.md`.
- The draft is safe to review before activation.
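
Steps 1 and 6 of the procedure above (validate the id, then save the draft under the proposals directory) could be sketched as follows. The helper names `is_hyphen_case_id` and `draft_path` are hypothetical, the `./proposals` fallback is an assumption for running the sketch standalone, and rejecting ids that start or end with `-` is a stricter reading than the text strictly requires.

```shell
# Sketch only: both helpers are hypothetical and not part of the module.

# Return 0 when the id is hyphen-case: lowercase letters, numbers, and "-".
# Assumption: an id may not be empty or start or end with "-".
# The character class is spelled out so the check does not depend on locale.
is_hyphen_case_id() {
  case "$1" in
    ''|-*|*-) return 1 ;;
    *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print the proposal artifact path for a draft, or fail on an invalid id.
draft_path() {
  skill_id="$1"
  proposals_dir="${MNEMON_SKILL_LOOP_PROPOSALS_DIR:-./proposals}"
  is_hyphen_case_id "$skill_id" || {
    echo "invalid skill id: $skill_id" >&2
    return 1
  }
  printf '%s/%s.SKILL.md\n' "$proposals_dir" "$skill_id"
}
```

Under these assumptions `draft_path release-notes` prints a path inside the proposals directory, while an underscore id such as `skill_author` is rejected, which matches the rule that only established protocol skills keep underscore names.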
4 changes: 3 additions & 1 deletion harness/modules/skill-loop/skills/skill_curate.md
@@ -29,7 +29,9 @@ It does not directly apply lifecycle changes. Approved changes are applied with
- `.usage.jsonl`
- existing proposals
3. Request proposals for create, patch, consolidate, stale, archive, or restore
actions only when evidence supports them.
actions only when evidence supports them. When a proposal needs concrete
skill content, use `skill_author.md` to draft reviewable `SKILL.md` content
under the proposals directory.
4. Keep the output proposal-first. Do not enable a new active skill in the
current session unless the user explicitly approves and the host supports it.

4 changes: 3 additions & 1 deletion harness/modules/skill-loop/skills/skill_manage.md
@@ -25,6 +25,7 @@ $MNEMON_SKILL_LOOP_ARCHIVED_DIR
## Allowed MVP Operations

- create an approved skill under `active/<skill-id>/SKILL.md`
- apply approved `SKILL.md` content drafted by `skill_author.md`
- patch an existing skill in its current lifecycle directory
- consolidate duplicated skills with an approved replacement
- move `active -> stale`
@@ -38,7 +39,8 @@ $MNEMON_SKILL_LOOP_ARCHIVED_DIR
1. Read the approved proposal and confirm the intended operation.
2. Check `MNEMON_SKILL_LOOP_PROTECTED_SKILLS`; do not modify protected skills
unless the approval explicitly covers the exception.
3. Keep skill ids filesystem-safe: lowercase letters, numbers, `_`, and `-`.
3. Keep new user-facing skill ids hyphen-case: lowercase letters, numbers, and
`-`. Existing protocol skill ids may keep their established underscore names.
4. Apply the smallest canonical change under the lifecycle directories.
5. Prefer moving to `archived` over deletion.
6. Do not edit the host skill surface directly. Let Prime regenerate it.
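
The protected-skill check and the "prefer moving to `archived` over deletion" rule from the steps above might look like this in shell. `archive_skill` and its three positional arguments are hypothetical; the real protocol is carried out by the agent following `skill_manage.md`, and the diff does not show a script for it.

```shell
# Sketch only: archive_skill and its arguments are hypothetical.
# Usage: archive_skill <source-lifecycle-dir> <archived-dir> <skill-id>
archive_skill() {
  src_dir="$1"
  archived_dir="$2"
  skill_id="$3"

  # Refuse protected ids (exact match in the comma-separated list).
  case ",${MNEMON_SKILL_LOOP_PROTECTED_SKILLS:-}," in
    *",$skill_id,"*)
      echo "protected skill: $skill_id" >&2
      return 1
      ;;
  esac

  # Only act on an existing skill, and never overwrite an archived copy.
  [ -f "$src_dir/$skill_id/SKILL.md" ] || { echo "no such skill: $skill_id" >&2; return 1; }
  [ ! -e "$archived_dir/$skill_id" ] || { echo "already archived: $skill_id" >&2; return 1; }

  # Move rather than delete, so the skill can be restored later.
  mkdir -p "$archived_dir" && mv "$src_dir/$skill_id" "$archived_dir/$skill_id"
}
```

The function touches only the canonical lifecycle directories passed to it and leaves the host skill surface alone, consistent with letting Prime regenerate that surface.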
4 changes: 3 additions & 1 deletion harness/modules/skill-loop/subagents/curator.md
@@ -4,6 +4,7 @@ description: Reviews Mnemon skill evidence and proposes skill lifecycle changes.
tools: Read, Write, Edit, Bash, Grep, Glob
skills:
- skill_observe
- skill_author
- skill_manage
---

@@ -44,7 +45,8 @@ Run curator review when:
2. Inspect active, stale, and archived skills.
3. Review usage evidence and existing proposals.
4. Identify only evidence-backed opportunities:
- create a skill for a repeated workflow
- create a skill for a repeated workflow, using `skill_author` for draft
`SKILL.md` content when useful
- patch a misleading, outdated, or incomplete skill
- consolidate duplicated skills
- move low-value active skills to stale