diff --git a/AGENTS.md b/AGENTS.md
index 105fda3..01bb783 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -25,26 +25,32 @@
 ### Gotcha
+
+* **AGENTS.md must be excluded from markdown linters**: AGENTS.md is auto-managed by lore and uses `*` list markers and long lines that violate typical remark-lint rules (unordered-list-marker-style, maximum-line-length). When a project uses remark with `--frail` (warnings become errors), AGENTS.md will fail CI. Fix: add `AGENTS.md` to `.remarkignore`. This applies to any lore-managed project with markdown linting.
+
-* **Calibration used DB message count instead of transformed window count — caused layer 0 false passthrough**: Lore gradient/context management bugs and fixes: (1) Used DB message count instead of transformed window count — delta ≈ 1 after compression → layer 0 passthrough → overflow. Fix: getLastTransformedCount(). (2) actualInput omitted cache.write — cold-cache showed ~3 tokens → layer 0. Fix: include cache.write. (3) Trailing pure-text assistant messages cause Anthropic prefill errors. Drop loop must run at ALL layers including 0 — at layer 0 result.messages === output.messages (same ref), so pop() trims in place. Messages with tool parts must NOT be dropped (hasToolParts) — dropping causes infinite tool-call loops. (4) Lore only protects projects registered in opencode.json — unregistered projects get zero context management → stuck compaction loops creating orphaned message pairs. Recovery: delete all messages after last good assistant message (has tokens, no error).
+* **Calibration used DB message count instead of transformed window count — caused layer 0 false passthrough**: Lore gradient/context bugs: (1) Used DB message count instead of transformed window count — delta ≈ 1 → layer 0 passthrough → overflow. Fix: getLastTransformedCount(). (2) actualInput omitted cache.write — cold-cache ~3 tokens → layer 0. Fix: include cache.write. (3) Trailing pure-text assistant messages cause Anthropic prefill errors. Drop loop must run at ALL layers (layer 0 shares ref with output). Never drop messages with tool parts (hasToolParts) — causes infinite loops. (4) Unregistered projects get zero context management → stuck compaction loops. Recovery: delete messages after last good assistant message.
+
+
+* **Consola prompt cancel returns truthy Symbol, not false**: When a user cancels a `consola` / `@clack/prompts` confirmation prompt (Ctrl+C), the return value is `Symbol(clack:cancel)`, not `false`. Since Symbols are truthy in JavaScript, `!confirmed` evaluates to `false` and the code falls through as if the user confirmed. Fix: use `confirmed !== true` (strict equality) instead of `!confirmed` to correctly handle cancel, false, and any other non-true values.
+
+
+* **Craft v2 GitHub App must be installed per-repo**: The Craft v2 release/publish workflows use `actions/create-github-app-token@v1`, which requires the GitHub App to be installed on the specific repository. If the app is configured for "Only select repositories", adding a new repo to the Craft pipeline requires manually adding it at GitHub Settings → Installations → [App] → Configure. The `APP_ID` variable and `APP_PRIVATE_KEY` secret are set in the `production` environment, not at repo level. Symptom: 404 on `GET /repos/{owner}/{repo}/installation`.
-
-* **mt7921e 3dBm tx power on desktop — disable CLC firmware table**: mt7921e/mt7922 PCIe WiFi cards in desktop PCs (no ACPI SAR tables like WRDS/EWRD) get stuck at ~3 dBm tx power because the CLC (Country Location Code) firmware power lookup falls back to a conservative default when no SAR table exists. Fix: set `options mt7921_common disable_clc=1` in /etc/modprobe.d/mt7921.conf. This lets the regulatory domain ceiling apply (e.g. 23 dBm on 5GHz ch44 in GB). Also set explicit tx power via `iw dev <iface> set txpower fixed 2000` in ExecStartPost, since the module param only takes effect on the next module load/reboot.
+
+* **Lore auto-recovery can infinite-loop without re-entrancy guard**: Three bugs in v0.5.2 caused excessive background LLM requests: (1) Auto-recovery infinite loop — the session.error overflow handler injected a recovery prompt via session.prompt, which could overflow again → another session.error → a loop of 2+ LLM calls per cycle. Fix: a recoveringSessions Set as a re-entrancy guard. (2) Curator ran on every idle — `onIdle || afterTurns` short-circuited because onIdle=true. Fix: change `||` to `&&`. Lesson: a boolean flag gating a numeric threshold needs AND, not OR. (3) shouldSkip() fell back to session.list() on every unknown session (short IDs fail session.get). Fix: remove the list fallback; cache in activeSessions after the first check.
-
-* **Pixel phones fail WPA group key rekey during doze — use 86400s interval**: Android Pixel devices in deep doze/sleep fail to respond to WPA group key handshake frames within hostapd's retry window. With wpa_group_rekey=3600, the phone gets deauthenticated every hour ('group key handshake failed (RSN) after 4 tries'). Other devices on the same AP complete the rekey fine. Fix: set wpa_group_rekey=86400 (24h) instead of 0 (disabled) for security balance. Also apply to Asus router: nvram set wpa_gtk_rekey=86400, wl0_wpa_gtk_rekey=86400, wl1_wpa_gtk_rekey=86400.
+
+* **Returning bare promises loses async function from error stack traces**: When an `async` function returns another promise without `await`, the calling function disappears from error stack traces if the inner promise rejects. A function that drops `async` and does `return someAsyncCall()` loses its frame entirely. Fix: keep the function `async` and use `return await someAsyncCall()`. This matters for debugging — the intermediate function name in the stack trace helps locate which code path triggered the failure. The ESLint rule `no-return-await` is outdated; modern engines optimize `return await` in async functions.
-
-* **sudo changes $HOME to /root — hardcode user home in scripts run with sudo**: When running a script with `sudo`, `$HOME` resolves to `/root`, not the invoking user's home. SSH key paths like `$HOME/.ssh/id_ed25519` break. Fix: use the `SUDO_USER` env var: `USER_HOME=$(eval echo ~${SUDO_USER:-$USER})` and reference `$USER_HOME/.ssh/id_ed25519`. This is a common trap in scripts that need both root privileges (systemctl, writing to /etc) and user-specific resources (SSH keys).
+
+* **sgdisk reserves 33 sectors for backup GPT, shrinking partition vs original layout**: When recreating a GPT partition entry with `sgdisk`, it sets `LastUsableLBA` conservatively — 33 sectors short of disk end to reserve space for the backup GPT table. If the original partition extended to the very last sector (common for factory-formatted exFAT SD cards), the recreated partition will be 33 sectors too small. Windows strictly validates that the exFAT VolumeLength in the VBR matches the GPT partition size and refuses to mount on mismatch ("drive not formatted" error). Fix: patch the exFAT VBR's VolumeLength to match the GPT partition size (PartitionLastLBA - PartitionFirstLBA + 1), then recalculate the exFAT boot region checksum (sector 11). Do NOT extend LastUsableLBA to the disk's last sector — that's where the backup GPT header lives, and Windows will reject the GPT as corrupt if the usable range overlaps it.
 
 * **Test DB isolation via LORE_DB_PATH and Bun test preload**: Lore test suite uses an isolated temp DB via a test/setup.ts preload (bunfig.toml). The preload sets LORE_DB_PATH to a mkdtempSync path before any imports of src/db.ts; afterAll cleans up. src/db.ts checks LORE_DB_PATH first. agents-file.test.ts needs beforeEach cleanup for intra-file isolation and TEST_UUIDS cleanup in afterAll (shared with ltm.test.ts). Individual test files don't need close() calls — the preload handles the DB lifecycle.
-
-* **Ubuntu packaged hostapd lacks 802.11r (CONFIG_IEEE80211R not compiled)**: Ubuntu 24.04 hostapd (2:2.10-21ubuntu0.x) lacks CONFIG_IEEE80211R. Using `ieee80211r=1`, `mobility_domain`, `FT-PSK` etc. causes 'unknown configuration item' and fails to start. 802.11k/v directives ARE compiled in. Verify: `strings /usr/sbin/hostapd | grep ieee80211r` — absence confirms no FT support. Build from source with CONFIG_IEEE80211R=y. Note: hostapd has NO config dry-run flag — `-t` just adds timestamps to debug output and fully starts the AP. Use grep-based validation for known-bad directives instead.
-
-
-* **Zod v4 .default({}) no longer applies inner field defaults**: Zod v4 changed `.default()` to short-circuit: when input is `undefined`, it returns the default value directly without parsing it through inner schema defaults. So `.object({ enabled: z.boolean().default(true) }).default({})` returns `{}` (no `enabled` key), not `{ enabled: true }`. Fix: provide fully-populated default objects — `.default({ enabled: true })`. This affected all nested config sections in src/config.ts during the v3→v4 upgrade. The import `import { z } from "zod"` is unchanged — Zod 4's main entry point is the v4 API.
+
+* **Zod z.coerce.number() converts null to 0 silently**: Zod gotchas in this codebase: (1) `z.coerce.number()` passes input through `Number()`, so `null` silently becomes `0`. Be aware if the `null` vs `0` distinction matters. (2) Zod v4 `.default({})` short-circuits — it returns the default value without parsing through inner schema defaults. So `.object({ enabled: z.boolean().default(true) }).default({})` returns `{}`, not `{ enabled: true }`. Fix: provide fully-populated default objects. This affected nested config sections in src/config.ts during the v3→v4 upgrade.
 
 ### Pattern
@@ -52,16 +58,13 @@
 * **Lore logging: LORE_DEBUG gating for info/warn, always-on for errors**: src/log.ts provides three levels: log.info() and log.warn() are suppressed unless LORE_DEBUG=1 or LORE_DEBUG=true; log.error() always emits. All write to stderr with a [lore] prefix. This exists because the OpenCode TUI renders all stderr as red error text — routine status messages (distillation counts, pruning stats, consolidation) were alarming users. Rule: use log.info() for successful operations and status, log.warn() for non-actionable oddities (e.g. dropping trailing messages), log.error() only in catch blocks for real failures. Never use console.error directly in plugin source files.
 
-* **Lore release process: craft + issue-label publish**: Release flow: (1) Create a release/X.Y.Z branch, bump the version in package.json, push and merge the PR. (2) Trigger release.yml via workflow_dispatch — uses getsentry/craft to create a GitHub issue titled 'publish: BYK/opencode-lore@X.Y.Z'. (3) Label that issue 'accepted' — triggers publish.yml (on issues:labeled), which runs craft publish with npm OIDC trusted publishing, then closes the issue. Auto-merge on release PRs requires squash merge (merge commits disallowed on this repo). The repo uses a GitHub App token (APP_ID + APP_PRIVATE_KEY) for checkout in both workflows.
+* **Lore release process: craft + issue-label publish**: Release flow: (1) Trigger release.yml via workflow_dispatch with version='auto' — uses getsentry/craft to determine the version from commits and create a GitHub issue titled 'publish: BYK/opencode-lore@X.Y.Z'. (2) Label that issue 'accepted' — triggers publish.yml, which runs craft publish with npm OIDC trusted publishing, then closes the issue. Do NOT create a release/X.Y.Z branch or bump package.json manually — craft handles versioning with 'auto'. The repo uses a GitHub App token (APP_ID + APP_PRIVATE_KEY) for checkout in both workflows.
 
 * **PR workflow for opencode-lore: branch → PR → auto-merge**: All changes (including minor fixes and test-only changes) must go through branch + PR + auto-merge, never pushed directly to main. Workflow: (1) git checkout -b <type>/<name>, (2) commit, (3) git push -u origin HEAD, (4) gh pr create --title "..." --body "..." --base main, (5) gh pr merge --auto --squash <number>. Branch name conventions follow merged PR history: fix/<name>, feat/<name>, chore/<name>. Auto-merge with squash is required (merge commits disallowed). Never push directly to main even for trivial changes.
 
 ### Preference
-
-* **Always dry-run before bulk DB deletes**: Never execute bulk DELETE/destructive operations without first running the equivalent SELECT to verify row count and inspect affected rows. A hardcoded timestamp off by one year caused deletion of all 1638 messages + 5927 parts instead of 5 debris rows. Pattern: (1) SELECT with the same WHERE, (2) verify count, (3) then DELETE. Applies to any destructive op — DB mutations, git reset, file deletion.
-
-* **Code style**: User prefers no backwards-compat shims — fix callers directly. Prefer explicit error handling over silent failures. Derive thresholds from existing constants rather than hardcoding magic numbers (e.g., use `raw.length <= COL_COUNT` instead of `n < 10_000`). In CI, define shared env vars at workflow level, not per-job.
+* **Code style**: User prefers no backwards-compat shims — fix callers directly. Prefer explicit error handling over silent failures. Derive thresholds from existing constants rather than hardcoding magic numbers (e.g., use `raw.length <= COL_COUNT` instead of `n < 10_000`). In CI, define shared env vars at workflow level, not per-job. Always dry-run before bulk destructive operations (SELECT before DELETE to verify row count).
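The consola cancel gotcha above is easy to reproduce in isolation. A minimal sketch, using a locally created Symbol as a stand-in for the library's real cancel sentinel (neither `consola` nor `@clack/prompts` is imported here):

```typescript
// Stand-in for the cancel sentinel a @clack/prompts-style confirm() returns
// on Ctrl+C. The Symbol here is illustrative, not the library's real export.
const CANCEL: unknown = Symbol("clack:cancel");

function handleNaive(confirmed: unknown): string {
  // BUG: Symbols are truthy, so `!confirmed` is false on cancel
  // and the code falls through as if the user confirmed.
  if (!confirmed) return "aborted";
  return "proceeded";
}

function handleStrict(confirmed: unknown): string {
  // FIX: only a literal `true` counts as confirmation;
  // cancel Symbols, false, and undefined all abort.
  if (confirmed !== true) return "aborted";
  return "proceeded";
}

console.log(handleNaive(CANCEL)); // "proceeded" (the bug: cancel falls through)
console.log(handleStrict(CANCEL)); // "aborted"
console.log(handleStrict(false)); // "aborted"
console.log(handleStrict(true)); // "proceeded"
```

The strict check also future-proofs against any other non-boolean sentinel a prompt library might return.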
diff --git a/README.md b/README.md
index 14392cf..57cf1ec 100644
--- a/README.md
+++ b/README.md
@@ -14,11 +14,11 @@ Lore uses a three-tier memory architecture (following [Nuum's design](https://ww
 1. **Temporal storage** — every message is stored in a local SQLite FTS5 database, searchable on demand via the `recall` tool.
-2. **Distillation** — messages are incrementally distilled into an observation log (dated, timestamped, priority-tagged entries), following [Mastra's observer/reflector pattern](https://mastra.ai/research/observational-memory). When segments accumulate, older distillations are recursively merged to prevent unbounded growth. The observer prompt is tuned to preserve exact numbers, bug fixes, file paths, and assistant-generated content.
+2. **Distillation** — messages are incrementally distilled into an observation log (dated, timestamped, priority-tagged entries), following [Mastra's observer/reflector pattern](https://mastra.ai/research/observational-memory). When segments accumulate, older distillations are consolidated into structured context documents optimized for diverse downstream queries (current state, key decisions, technical changes, timeline) — a [context-distillation objective](https://arxiv.org/abs/2501.17390) that generalizes better than flat summarization. Consolidated entries are archived rather than deleted, preserving a searchable detail layer for the `recall` tool. The observer prompt is tuned to preserve exact numbers, bug fixes, file paths, and assistant-generated content.
 3. **Long-term knowledge** — a curated knowledge base of facts, patterns, decisions, and gotchas that matter across projects, maintained by a background curator agent.
 
-A **gradient context manager** decides how much of each tier to include in each turn, using a 4-layer safety system that calibrates overhead dynamically from real API token counts. This handles the unpredictable context consumption of coding agents (large tool outputs, system prompts, injected instructions) better than a fixed-budget approach.
+A **gradient context manager** decides how much of each tier to include in each turn, using a 4-layer safety system that calibrates overhead dynamically from real API token counts. When tool outputs are stripped for compression, [loss-annotated metadata](https://arxiv.org/abs/2602.16284) preserves key signals (tool name, size, error presence, file paths) so the model can make informed decisions about whether to recall the full content. This handles the unpredictable context consumption of coding agents (large tool outputs, system prompts, injected instructions) better than a fixed-budget approach.
 
 ## Benchmarks
@@ -78,6 +78,8 @@ This plugin was built in a few intense sessions. Some highlights:
 **v3 — gradient fixes, caching, and proper eval.** A month of fixes (per-session gradient state, current-turn protection, cache.write calibration, prefix caching, LTM relevance scoring) shipped alongside a new self-contained eval harness. The old coding eval used DB-resident sessions that degraded over time as temporal pruning deleted messages. The new eval extracts full session transcripts into portable JSON files, distills on the fly with the current production prompt, seeds the DB for recall tool access, and compares against OpenCode's actual compaction behavior. This moved the coding eval from 15 questions on degraded data to 20 questions on clean 113K-353K token sessions — and confirmed the +35pp accuracy gap and 7x cost efficiency advantage.
 
+**v4 — research-informed compaction improvements.** Three changes informed by the KV cache compression literature ([Zweiger et al. 2025](https://arxiv.org/abs/2602.16284), [Eyuboglu et al. 2025](https://arxiv.org/abs/2501.17390)): (1) *Loss-annotated tool stripping* — when tool outputs are compressed away at higher gradient layers, the replacement now includes metadata (tool name, line count, error presence, file paths) instead of a static placeholder, helping the model decide whether to recall the full content. (2) *Context-distillation meta-distillation* — the reflector prompt was restructured to produce a working context document with sections for current state, key decisions, technical changes, and timeline, rather than a flat re-organized event log — an objective that generalizes better to diverse downstream queries. (3) *Multi-resolution composable distillations* — gen-0 observations are now archived instead of deleted during meta-distillation, preserving a searchable detail layer for the recall tool while the compressed gen-1 serves as the in-context summary.
+
 ## Installation
 
 ### Prerequisites
@@ -214,6 +216,8 @@ The assistant gets a `recall` tool that searches across stored messages and know
 - [How we solved the agent memory problem](https://www.sanity.io/blog/how-we-solved-the-agent-memory-problem) — Simen Svale at Sanity on the Nuum memory architecture: three-tier storage, distillation not summarization, recursive compression. The foundation this plugin is built on.
 - [Mastra Observational Memory](https://mastra.ai/research/observational-memory) — the observer/reflector architecture and the switch from structured JSON to timestamped observation logs that made v2 work.
 - [Mastra Memory source](https://github.com/mastra-ai/mastra/tree/main/packages/memory) — reference implementation.
+- [Fast KV Compaction via Attention Matching](https://arxiv.org/abs/2602.16284) — Adam Zweiger, Xinghong Fu, Han Guo, Yoon Kim on preserving attention mass when compressing KV caches. Inspired the loss-annotated tool stripping approach: when content is removed during compression, preserving metadata about what was lost helps the model compensate — analogous to the per-token scalar bias β that preserves attention mass when token count is reduced.
+- [Cartridges: Compact Representations of Context for LLMs](https://arxiv.org/abs/2501.17390) — Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Judd, Christopher Ré on offline compressed context representations. Two key ideas adopted: (1) the context-distillation objective for meta-distillation — optimizing compressed context for downstream query-answering rather than faithful summarization, following the Self-Study finding that memorization objectives don't generalize; (2) composable multi-resolution distillations — archiving detailed observations instead of deleting them during consolidation, preserving a searchable detail layer beneath the compressed summary.
 - [OpenCode](https://opencode.ai) — the coding agent this plugin extends.
 
 ## License
diff --git a/src/db.ts b/src/db.ts
index d73a486..361517f 100644
--- a/src/db.ts
+++ b/src/db.ts
@@ -2,7 +2,7 @@ import { Database } from "bun:sqlite";
 import { join, dirname } from "path";
 import { mkdirSync } from "fs";
 
-const SCHEMA_VERSION = 4;
+const SCHEMA_VERSION = 5;
 
 const MIGRATIONS: string[] = [
   `
@@ -141,6 +141,18 @@ const MIGRATIONS: string[] = [
     updated_at INTEGER NOT NULL
   );
   `,
+  `
+  -- Version 5: Multi-resolution composable distillations.
+  -- Instead of deleting gen-0 distillations during meta-distillation,
+  -- mark them as archived. Archived entries are excluded from the in-context
+  -- prefix but remain searchable via the recall tool, providing a detailed
+  -- "zoom-in" layer beneath the compressed gen-1 summary.
+  -- Inspired by Cartridges (Eyuboglu et al., 2025) composability: independently
+  -- compressed representations can be concatenated and queried without retraining.
+  -- Reference: https://arxiv.org/abs/2501.17390
+  ALTER TABLE distillations ADD COLUMN archived INTEGER NOT NULL DEFAULT 0;
+  CREATE INDEX IF NOT EXISTS idx_distillation_archived ON distillations(archived);
+  `,
 ];
 
 function dataDir() {
diff --git a/src/distillation.ts b/src/distillation.ts
index 51a565c..453e5bf 100644
--- a/src/distillation.ts
+++ b/src/distillation.ts
@@ -175,22 +175,25 @@ function storeDistillation(input: {
   return id;
 }
 
+// Count non-archived gen-0 distillations — these are the ones awaiting
+// meta-distillation. Archived gen-0 entries have already been consolidated.
 function gen0Count(projectPath: string, sessionID: string): number {
   const pid = ensureProject(projectPath);
   return (
     db()
       .query(
-        "SELECT COUNT(*) as count FROM distillations WHERE project_id = ? AND session_id = ? AND generation = 0",
+        "SELECT COUNT(*) as count FROM distillations WHERE project_id = ? AND session_id = ? AND generation = 0 AND archived = 0",
       )
      .get(pid, sessionID) as { count: number }
   ).count;
 }
 
+// Load non-archived gen-0 distillations for meta-distillation input.
 function loadGen0(projectPath: string, sessionID: string): Distillation[] {
   const pid = ensureProject(projectPath);
   const rows = db()
     .query(
-      "SELECT id, project_id, session_id, observations, source_ids, generation, token_count, created_at FROM distillations WHERE project_id = ? AND session_id = ? AND generation = 0 ORDER BY created_at ASC",
+      "SELECT id, project_id, session_id, observations, source_ids, generation, token_count, created_at FROM distillations WHERE project_id = ? AND session_id = ? AND generation = 0 AND archived = 0 ORDER BY created_at ASC",
     )
     .all(pid, sessionID) as Array<{
       id: string;
@@ -208,11 +211,20 @@ function loadGen0(projectPath: string, sessionID: string): Distillation[] {
   }));
 }
 
-function removeDistillations(ids: string[]) {
+// Archive distillations instead of deleting them. Archived entries are excluded
+// from the in-context prefix (loadDistillations filters them out) but remain
+// searchable via the recall tool (searchDistillations includes them). This
+// preserves a detailed "zoom-in" layer beneath the compressed gen-1 summary.
+// Inspired by Cartridges (Eyuboglu et al., 2025): independently compressed
+// representations remain composable and queryable after consolidation.
+// Reference: https://arxiv.org/abs/2501.17390
+function archiveDistillations(ids: string[]) {
   if (!ids.length) return;
   const placeholders = ids.map(() => "?").join(",");
   db()
-    .query(`DELETE FROM distillations WHERE id IN (${placeholders})`)
+    .query(
+      `UPDATE distillations SET archived = 1 WHERE id IN (${placeholders})`,
+    )
     .run(...ids);
 }
@@ -446,8 +458,9 @@ async function metaDistill(input: {
     generation: maxGen + 1,
   });
 
-  // Remove the gen-0 distillations that were merged
-  removeDistillations(existing.map((d) => d.id));
+  // Archive the gen-0 distillations that were merged into gen-1+.
+  // They remain searchable via recall but excluded from the in-context prefix.
+  archiveDistillations(existing.map((d) => d.id));
 
   return result;
 }
diff --git a/src/gradient.ts b/src/gradient.ts
index 5994c67..3a61473 100644
--- a/src/gradient.ts
+++ b/src/gradient.ts
@@ -254,14 +254,17 @@ type Distillation = {
   session_id: string;
 };
 
+// Load non-archived distillations for the in-context prefix.
+// Archived gen-0 entries (preserved after meta-distillation) are excluded here
+// but remain searchable via the recall tool's searchDistillations().
 function loadDistillations(
   projectPath: string,
   sessionID?: string,
 ): Distillation[] {
   const pid = ensureProject(projectPath);
   const query = sessionID
-    ? "SELECT id, observations, generation, token_count, created_at, session_id FROM distillations WHERE project_id = ? AND session_id = ? ORDER BY created_at ASC"
-    : "SELECT id, observations, generation, token_count, created_at, session_id FROM distillations WHERE project_id = ? ORDER BY created_at ASC";
+    ? "SELECT id, observations, generation, token_count, created_at, session_id FROM distillations WHERE project_id = ? AND session_id = ? AND archived = 0 ORDER BY created_at ASC"
+    : "SELECT id, observations, generation, token_count, created_at, session_id FROM distillations WHERE project_id = ? AND archived = 0 ORDER BY created_at ASC";
   const params = sessionID ? [pid, sessionID] : [pid];
   return db()
     .query(query)
@@ -311,6 +314,28 @@ function cleanParts(parts: Part[]): Part[] {
   return filtered.length > 0 ? filtered : parts;
 }
 
+// Build a metadata annotation for a stripped tool output, preserving key signals
+// about what was lost without requiring an LLM call. Inspired by the per-token
+// scalar bias β from "Fast KV Compaction via Attention Matching" (Zweiger et al.,
+// 2025) — when tokens are removed, preserving metadata about the removed content
+// helps the model compensate for information loss and decide whether to recall.
+// Reference: https://arxiv.org/abs/2602.16284
+function toolStripAnnotation(toolName: string, output: string): string {
+  const lines = output.split("\n").length;
+  const chars = output.length;
+
+  // Detect key signals via lightweight heuristics — no LLM call
+  const hasError = /\b(?:error|fail(?:ed|ure)?|exception|panic|traceback)\b/i.test(output);
+  const paths = output.match(/(?:[\w.-]+\/)+[\w.-]+\.\w{1,5}/g);
+  const uniquePaths = paths ? [...new Set(paths)].slice(0, 5) : [];
+
+  let annotation = `[output omitted — ${toolName}: ${lines} lines`;
+  if (hasError) annotation += ", contained errors";
+  if (uniquePaths.length > 0) annotation += `, paths: ${uniquePaths.join(", ")}`;
+  annotation += " — use recall for details]";
+  return annotation;
+}
+
 function stripToolOutputs(parts: Part[]): Part[] {
   return parts.map((part) => {
     if (part.type !== "tool") return part;
@@ -319,7 +344,7 @@ function stripToolOutputs(parts: Part[]): Part[] {
       ...part,
       state: {
         ...part.state,
-        output: "[output omitted — use recall for details]",
+        output: toolStripAnnotation(part.tool, part.state.output),
       },
     } as Part;
   });
diff --git a/src/prompt.ts b/src/prompt.ts
index b5575d9..2e4fc90 100644
--- a/src/prompt.ts
+++ b/src/prompt.ts
@@ -140,13 +140,38 @@ ${input.messages}
 Extract new observations. Output ONLY an block.`;
 }
 
-export const RECURSIVE_SYSTEM = `You are a memory reflector. You are given a set of observations from multiple conversation segments. Your job is to reorganize, streamline, and compress them into a single refined observation log that will become the agent's entire memory going forward.
+// Meta-distillation prompt using a context-distillation objective: instead of
+// reorganizing observations into another event log (which Eyuboglu et al. 2025
+// showed is a memorization objective that fails to generalize), produce a
+// structured working context optimized for diverse downstream queries.
+// This mirrors the Self-Study approach from "Cartridges" (Eyuboglu et al.,
+// 2025) where diverse seed prompt types ensure the compressed representation
+// supports varied information needs, not just chronological recall.
+// Reference: https://arxiv.org/abs/2501.17390
+export const RECURSIVE_SYSTEM = `You are a memory reflector. You are given a set of observations from multiple conversation segments. Your job is to consolidate them into a structured working context that will become the agent's entire memory going forward.
 
 IMPORTANT: Your reflections ARE the entirety of the assistant's memory. Any information you omit is permanently forgotten. Do not leave out anything important.
 
-REFLECTION RULES:
+STRUCTURE your output into these sections — each section supports a different type of downstream query:
+
+### Current State
+What is in progress right now? Active branches, open files, current task, blockers.
+This section answers: "What was I working on?"
+
+### Key Decisions
+What was decided and why? Include the alternatives considered and rationale.
+This section answers: "Why did we choose approach X?" and "What alternatives were rejected?"
+
+### Technical Changes
+Bugs found, root causes, fixes applied, files modified, tests added/fixed.
+Preserve exact file paths, line numbers, error messages, and commit references.
+This section answers: "What bugs were fixed?" and "What files were changed?"
+
+### Session Timeline
+Condensed chronological events with timestamps. Older events compressed more aggressively; recent events retain detail. This section answers: "When did X happen?" and "What was the sequence of events?"
+
+CONSOLIDATION RULES:
 - Preserve ALL dates and timestamps — temporal context is critical
-- Condense older observations more aggressively; retain more detail for recent ones
 - Combine related items (e.g., "agent called view tool 5 times on file x" → single line)
 - Merge duplicate facts, keeping the most specific version
 - Drop observations superseded by later info (if value changed, keep only final value)
@@ -159,8 +184,6 @@ EXACT NUMBERS: When two segments report different numbers for what seems like th
 EARLY-SESSION CONTENT: Bug fixes, code changes, and decisions from the start of a session are just as important as later work. Never drop them just because the segment is short or old. If the first segment contains a specific bug fix with file paths and root cause, it MUST survive into the reflection.
 
-Keep the same format: dated sections with priority-tagged observations.
-
 Output ONLY an block with the consolidated observations.`;
 
 export function recursiveUser(
diff --git a/src/temporal.ts b/src/temporal.ts
index a882987..da71f24 100644
--- a/src/temporal.ts
+++ b/src/temporal.ts
@@ -317,5 +317,14 @@ export function prune(input: {
     }
   }
 
+  // Pass 3: Prune archived distillations older than the retention window.
+  // Archived gen-0 distillations are kept for recall search but don't need
+  // to live forever — they follow the same retention policy as temporal messages.
+  database
+    .query(
+      "DELETE FROM distillations WHERE project_id = ? AND archived = 1 AND created_at < ?",
+    )
+    .run(pid, cutoff);
+
   return { ttlDeleted, capDeleted };
 }
diff --git a/test/db.test.ts b/test/db.test.ts
index b9bb326..aab0e2c 100644
--- a/test/db.test.ts
+++ b/test/db.test.ts
@@ -21,7 +21,7 @@ describe("db", () => {
     const row = db().query("SELECT version FROM schema_version").get() as {
       version: number;
     };
-    expect(row.version).toBe(4);
+    expect(row.version).toBe(5);
   });
 
   test("ensureProject creates and returns id", () => {
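The `toolStripAnnotation` heuristic added in src/gradient.ts can be exercised standalone. This sketch reproduces the function from the diff (minus the unused `chars` local) and runs it on an invented sample tool output:

```typescript
// Standalone copy of toolStripAnnotation from the src/gradient.ts hunk,
// showing the annotation format produced for a stripped tool output.
function toolStripAnnotation(toolName: string, output: string): string {
  const lines = output.split("\n").length;

  // Detect key signals via lightweight heuristics, no LLM call
  const hasError =
    /\b(?:error|fail(?:ed|ure)?|exception|panic|traceback)\b/i.test(output);
  const paths = output.match(/(?:[\w.-]+\/)+[\w.-]+\.\w{1,5}/g);
  const uniquePaths = paths ? [...new Set(paths)].slice(0, 5) : [];

  let annotation = `[output omitted — ${toolName}: ${lines} lines`;
  if (hasError) annotation += ", contained errors";
  if (uniquePaths.length > 0) annotation += `, paths: ${uniquePaths.join(", ")}`;
  annotation += " — use recall for details]";
  return annotation;
}

// Sample tool output invented for illustration.
const sample = "error TS2345 in src/gradient.ts\nsee also test/db.test.ts";
console.log(toolStripAnnotation("bash", sample));
// → [output omitted — bash: 2 lines, contained errors, paths: src/gradient.ts, test/db.test.ts — use recall for details]
```

Note that the diff's version also computes `const chars = output.length` but never uses it in the annotation; that local could be dropped upstream or folded into the message.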