37 changes: 20 additions & 17 deletions AGENTS.md
@@ -25,43 +25,46 @@

### Gotcha

<!-- lore:019cc484-f0e1-7016-a851-177fb9ad2cc4 -->
* **AGENTS.md must be excluded from markdown linters**: AGENTS.md is auto-managed by lore and uses \`\*\` list markers and long lines that violate typical remark-lint rules (unordered-list-marker-style, maximum-line-length). When a project uses remark with \`--frail\` (warnings become errors), AGENTS.md will fail CI. Fix: add \`AGENTS.md\` to \`.remarkignore\`. This applies to any lore-managed project with markdown linting.

<!-- lore:019c91d6-04af-7334-8374-e8bbf14cb43d -->
* **Calibration used DB message count instead of transformed window count — caused layer 0 false passthrough**: Lore gradient/context management bugs and fixes: (1) Used DB message count instead of transformed window count — delta ≈ 1 after compression → layer 0 passthrough → overflow. Fix: getLastTransformedCount(). (2) actualInput omitted cache.write — cold-cache showed ~3 tokens → layer 0. Fix: include cache.write. (3) Trailing pure-text assistant messages cause Anthropic prefill errors. Drop loop must run at ALL layers including 0 — at layer 0 result.messages === output.messages (same ref), so pop() trims in place. Messages with tool parts must NOT be dropped (hasToolParts) — dropping causes infinite tool-call loops. (4) Lore only protects projects registered in opencode.json — unregistered projects get zero context management → stuck compaction loops creating orphaned message pairs. Recovery: delete all messages after last good assistant message (has tokens, no error).
* **Calibration used DB message count instead of transformed window count — caused layer 0 false passthrough**: Lore gradient/context bugs: (1) Used DB message count instead of transformed window count — delta ≈ 1 → layer 0 passthrough → overflow. Fix: getLastTransformedCount(). (2) actualInput omitted cache.write — cold-cache ~3 tokens → layer 0. Fix: include cache.write. (3) Trailing pure-text assistant messages cause Anthropic prefill errors. Drop loop must run at ALL layers (layer 0 shares ref with output). Never drop messages with tool parts (hasToolParts) — causes infinite loops. (4) Unregistered projects get zero context management → stuck compaction loops. Recovery: delete messages after last good assistant message.

<!-- lore:019cc40e-e56e-71e9-bc5d-545f97df732b -->
* **Consola prompt cancel returns truthy Symbol, not false**: When a user cancels a \`consola\` / \`@clack/prompts\` confirmation prompt (Ctrl+C), the return value is \`Symbol(clack:cancel)\`, not \`false\`. Since Symbols are truthy in JavaScript, checking \`!confirmed\` will be \`false\` and the code falls through as if the user confirmed. Fix: use \`confirmed !== true\` (strict equality) instead of \`!confirmed\` to correctly handle cancel, false, and any other non-true values.

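The truthiness trap described in the entry above can be sketched without the prompt library itself. In this minimal illustration, `CANCEL`, `PromptResult`, `proceedsLoose`, and `proceedsStrict` are hypothetical names standing in for the real prompt return value and the calling code:

```typescript
// Stand-in for the cancel sentinel; the entry above reports Symbol(clack:cancel).
// All names in this block are hypothetical and exist only for illustration.
const CANCEL = Symbol("clack:cancel");

type PromptResult = boolean | symbol;

// Loose check: a Symbol is truthy, so a cancelled prompt is treated as "yes".
function proceedsLoose(confirmed: PromptResult): boolean {
  if (!confirmed) return false;
  return true;
}

// Strict check: only the literal `true` counts as confirmation.
function proceedsStrict(confirmed: PromptResult): boolean {
  if (confirmed !== true) return false;
  return true;
}

console.log(proceedsLoose(CANCEL));  // true: cancel slips through
console.log(proceedsStrict(CANCEL)); // false: cancel is rejected
```

The strict form also rejects `false` and any other non-`true` value, which is the behavior the fix asks for.
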
<!-- lore:019cc484-f0e7-7a64-bea1-f3f98e9c56c1 -->
* **Craft v2 GitHub App must be installed per-repo**: The Craft v2 release/publish workflows use \`actions/create-github-app-token@v1\` which requires the GitHub App to be installed on the specific repository. If the app is configured for "Only select repositories", adding a new repo to the Craft pipeline requires manually adding it at GitHub Settings → Installations → \[App] → Configure. The \`APP\_ID\` variable and \`APP\_PRIVATE\_KEY\` secret are set in the \`production\` environment, not at repo level. Symptom: 404 on \`GET /repos/{owner}/{repo}/installation\`.

<!-- lore:019cb171-c0ea-75cf-bf65-b081373f136b -->
* **mt7921e 3dBm tx power on desktop — disable CLC firmware table**: mt7921e/mt7922 PCIe WiFi cards in desktop PCs (no ACPI SAR tables like WRDS/EWRD) get stuck at ~3 dBm tx power because the CLC (Country Location Code) firmware power lookup falls back to a conservative default when no SAR table exists. Fix: set \`options mt7921\_common disable\_clc=1\` in /etc/modprobe.d/mt7921.conf. This lets the regulatory domain ceiling apply (e.g. 23 dBm on 5GHz ch44 in GB). Also set explicit tx power via \`iw dev \<iface> set txpower fixed 2000\` in ExecStartPost since the module param only takes effect on next module load/reboot.
<!-- lore:019cb615-0b10-7bbc-a7db-50111118c200 -->
* **Lore auto-recovery can infinite-loop without re-entrancy guard**: Three bugs in v0.5.2 caused excessive background LLM requests: (1) Auto-recovery infinite loop — session.error overflow handler injected recovery prompt via session.prompt, which could overflow again → another session.error → loop of 2+ LLM calls/cycle. Fix: recoveringSessions Set as re-entrancy guard. (2) Curator ran every idle — \`onIdle || afterTurns\` short-circuited because onIdle=true. Fix: change \`||\` to \`&&\`. Lesson: boolean flag gating numeric threshold needs AND not OR. (3) shouldSkip() fell back to session.list() on every unknown session (short IDs fail session.get). Fix: remove list fallback, cache in activeSessions after first check.

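A rough sketch of the first two fixes in the entry above, not the actual Lore source. Only the `recoveringSessions` name comes from the entry; `recoverOnce`, `shouldRunCurator`, and the `turns >= afterTurns` comparison are assumptions made for illustration:

```typescript
// Sessions that currently have a recovery in flight (name taken from the entry above).
const recoveringSessions = new Set<string>();

// Hypothetical wrapper: runs `recover` unless the same session is already
// recovering, so a recovery that triggers another error cannot re-enter itself.
async function recoverOnce(
  sessionID: string,
  recover: () => Promise<void>,
): Promise<boolean> {
  if (recoveringSessions.has(sessionID)) return false; // re-entrant call: skip
  recoveringSessions.add(sessionID);
  try {
    await recover();
    return true;
  } finally {
    recoveringSessions.delete(sessionID); // always release the guard
  }
}

// Hypothetical gate: a boolean flag combined with a numeric threshold needs AND.
// With OR, `onIdle === true` alone would make this fire on every idle event.
function shouldRunCurator(onIdle: boolean, turns: number, afterTurns: number): boolean {
  return onIdle && turns >= afterTurns;
}
```

The `finally` block matters here: if the guard were released only on success, a failed recovery would leave the session permanently marked as recovering.
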
<!-- lore:019cb171-c0fa-74b0-a9a6-847901efa907 -->
* **Pixel phones fail WPA group key rekey during doze — use 86400s interval**: Android Pixel devices in deep doze/sleep fail to respond to WPA group key handshake frames within hostapd's retry window. With wpa\_group\_rekey=3600, the phone gets deauthenticated every hour ('group key handshake failed (RSN) after 4 tries'). Other devices on the same AP complete the rekey fine. Fix: set wpa\_group\_rekey=86400 (24h) instead of 0 (disabled) for security balance. Also apply to Asus router: nvram set wpa\_gtk\_rekey=86400, wl0\_wpa\_gtk\_rekey=86400, wl1\_wpa\_gtk\_rekey=86400.
<!-- lore:019cb3e6-da66-7534-a573-30d2ecadfd53 -->
* **Returning bare promises loses async function from error stack traces**: When an \`async\` function returns another promise without \`await\`, the calling function disappears from error stack traces if the inner promise rejects. A function that drops \`async\` and does \`return someAsyncCall()\` loses its frame entirely. Fix: keep the function \`async\` and use \`return await someAsyncCall()\`. This matters for debugging — the intermediate function name in the stack trace helps locate which code path triggered the failure. ESLint rule \`no-return-await\` is outdated; modern engines optimize \`return await\` in async functions.

<!-- lore:019cb171-c0fe-78a8-a5f8-4ae8e2980a70 -->
* **sudo changes $HOME to /root — hardcode user home in scripts run with sudo**: When running a script with \`sudo\`, \`$HOME\` resolves to \`/root\`, not the invoking user's home. SSH key paths like \`$HOME/.ssh/id\_ed25519\` break. Fix: use \`SUDO\_USER\` env var: \`USER\_HOME=$(eval echo ~${SUDO\_USER:-$USER})\` and reference \`$USER\_HOME/.ssh/id\_ed25519\`. This is a common trap in scripts that need both root privileges (systemctl, writing to /etc) and user-specific resources (SSH keys).
<!-- lore:019cd20d-f42c-71bf-9da5-b2dd52c5014d -->
* **sgdisk reserves 33 sectors for backup GPT, shrinking partition vs original layout**: When recreating a GPT partition entry with \`sgdisk\`, it sets \`LastUsableLBA\` conservatively — 33 sectors short of disk end to reserve space for the backup GPT table. If the original partition extended to the very last sector (common for factory-formatted exFAT SD cards), the recreated partition will be 33 sectors too small. Windows strictly validates that the exFAT VolumeLength in the VBR matches the GPT partition size and refuses to mount on mismatch ("drive not formatted" error). Fix: patch the exFAT VBR's VolumeLength to match the GPT partition size (PartitionLastLBA - PartitionFirstLBA + 1), then recalculate the exFAT boot region checksum (sector 11). Do NOT extend LastUsableLBA to the disk's last sector — that's where the backup GPT header lives, and Windows will reject the GPT as corrupt if usable range overlaps it.

<!-- lore:019c8f4f-67ca-7212-a8c4-8a75b230ceea -->
* **Test DB isolation via LORE\_DB\_PATH and Bun test preload**: Lore test suite uses isolated temp DB via test/setup.ts preload (bunfig.toml). Preload sets LORE\_DB\_PATH to mkdtempSync path before any imports of src/db.ts; afterAll cleans up. src/db.ts checks LORE\_DB\_PATH first. agents-file.test.ts needs beforeEach cleanup for intra-file isolation and TEST\_UUIDS cleanup in afterAll (shared with ltm.test.ts). Individual test files don't need close() calls — preload handles DB lifecycle.

<!-- lore:019cb171-c0f5-741f-96cc-e0862c846202 -->
* **Ubuntu packaged hostapd lacks 802.11r (CONFIG\_IEEE80211R not compiled)**: Ubuntu 24.04 hostapd (2:2.10-21ubuntu0.x) lacks CONFIG\_IEEE80211R. Using \`ieee80211r=1\`, \`mobility\_domain\`, \`FT-PSK\` etc. causes 'unknown configuration item' and fails to start. 802.11k/v directives ARE compiled in. Verify: \`strings /usr/sbin/hostapd | grep ieee80211r\` — absence confirms no FT support. Build from source with CONFIG\_IEEE80211R=y. Note: hostapd has NO config dry-run flag — \`-t\` just adds timestamps to debug output and fully starts the AP. Use grep-based validation for known-bad directives instead.

<!-- lore:019cb286-7c85-7039-aecf-25781892c9da -->
* **Zod v4 .default({}) no longer applies inner field defaults**: Zod v4 changed \`.default()\` to short-circuit: when input is \`undefined\`, it returns the default value directly without parsing it through inner schema defaults. So \`.object({ enabled: z.boolean().default(true) }).default({})\` returns \`{}\` (no \`enabled\` key), not \`{ enabled: true }\`. Fix: provide fully-populated default objects — \`.default({ enabled: true })\`. This affected all nested config sections in src/config.ts during the v3→v4 upgrade. The import \`import { z } from "zod"\` is unchanged — Zod 4's main entry point is the v4 API.
<!-- lore:019cc303-e397-75b9-9762-6f6ad108f50a -->
* **Zod z.coerce.number() converts null to 0 silently**: Zod gotchas in this codebase: (1) \`z.coerce.number()\` passes input through \`Number()\`, so \`null\` silently becomes \`0\`. Be aware if \`null\` vs \`0\` distinction matters. (2) Zod v4 \`.default({})\` short-circuits — it returns the default value without parsing through inner schema defaults. So \`.object({ enabled: z.boolean().default(true) }).default({})\` returns \`{}\`, not \`{ enabled: true }\`. Fix: provide fully-populated default objects. This affected nested config sections in src/config.ts during the v3→v4 upgrade.

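The `null` to `0` coercion noted above comes from plain `Number()` semantics, so it can be shown without importing Zod. The helper `toNumberOrNull` below is hypothetical and only illustrates one way to keep `null` distinct from `0` before a value reaches a coercing schema:

```typescript
// Plain JavaScript behavior that the coercion inherits.
console.log(Number(null)); // 0

// Hypothetical helper (not part of Zod): keep null/undefined distinct from 0
// and map unparseable input to null instead of NaN.
function toNumberOrNull(input: unknown): number | null {
  if (input === null || input === undefined) return null;
  const n = Number(input);
  return Number.isNaN(n) ? null : n;
}

console.log(toNumberOrNull(null)); // null
console.log(toNumberOrNull("42")); // 42
```

Whether `null` should be preserved, rejected, or mapped to a default depends on the field, so this is a sketch of the distinction rather than a recommended replacement.
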
### Pattern

<!-- lore:019cb050-ef48-7cbe-8e58-802f17c34591 -->
* **Lore logging: LORE\_DEBUG gating for info/warn, always-on for errors**: src/log.ts provides three levels: log.info() and log.warn() are suppressed unless LORE\_DEBUG=1 or LORE\_DEBUG=true; log.error() always emits. All write to stderr with \[lore] prefix. This exists because OpenCode TUI renders all stderr as red error text — routine status messages (distillation counts, pruning stats, consolidation) were alarming users. Rule: use log.info() for successful operations and status, log.warn() for non-actionable oddities (e.g. dropping trailing messages), log.error() only in catch blocks for real failures. Never use console.error directly in plugin source files.

<!-- lore:019cb12a-c957-7e24-b3f5-6869f3429d13 -->
* **Lore release process: craft + issue-label publish**: Release flow: (1) Create release/X.Y.Z branch, bump version in package.json, push and merge PR. (2) Trigger release.yml via workflow\_dispatch — uses getsentry/craft to create a GitHub issue titled 'publish: BYK/opencode-lore@X.Y.Z'. (3) Label that issue 'accepted' — triggers publish.yml (on issues:labeled) which runs craft publish with npm OIDC trusted publishing, then closes the issue. Auto-merge on release PRs requires squash merge (merge commits disallowed on this repo). The repo uses a GitHub App token (APP\_ID + APP\_PRIVATE\_KEY) for checkout in both workflows.
* **Lore release process: craft + issue-label publish**: Release flow: (1) Trigger release.yml via workflow\_dispatch with version='auto' — uses getsentry/craft to determine version from commits and create a GitHub issue titled 'publish: BYK/opencode-lore@X.Y.Z'. (2) Label that issue 'accepted' — triggers publish.yml which runs craft publish with npm OIDC trusted publishing, then closes the issue. Do NOT create a release/X.Y.Z branch or bump package.json manually — craft handles versioning with 'auto'. The repo uses a GitHub App token (APP\_ID + APP\_PRIVATE\_KEY) for checkout in both workflows.

<!-- lore:019cb200-0001-7000-8000-000000000001 -->
* **PR workflow for opencode-lore: branch → PR → auto-merge**: All changes (including minor fixes and test-only changes) must go through a branch + PR + auto-merge, never pushed directly to main. Workflow: (1) git checkout -b \<type>/\<slug>, (2) commit, (3) git push -u origin HEAD, (4) gh pr create --title "..." --body "..." --base main, (5) gh pr merge --auto --squash \<PR#>. Branch name conventions follow merged PR history: fix/\<slug>, feat/\<slug>, chore/\<slug>. Auto-merge with squash is required (merge commits disallowed). Never push directly to main even for trivial changes.

### Preference

<!-- lore:019ca190-0001-7000-8000-000000000001 -->
* **Always dry-run before bulk DB deletes**: Never execute bulk DELETE/destructive operations without first running the equivalent SELECT to verify row count and inspect affected rows. A hardcoded timestamp off by one year caused deletion of all 1638 messages + 5927 parts instead of 5 debris rows. Pattern: (1) SELECT with same WHERE, (2) verify count, (3) then DELETE. Applies to any destructive op — DB mutations, git reset, file deletion.

<!-- lore:019ca19d-fc02-7657-b2e9-7764658c01a5 -->
* **Code style**: User prefers no backwards-compat shims — fix callers directly. Prefer explicit error handling over silent failures. Derive thresholds from existing constants rather than hardcoding magic numbers (e.g., use \`raw.length <= COL\_COUNT\` instead of \`n < 10\_000\`). In CI, define shared env vars at workflow level, not per-job.
* **Code style**: User prefers no backwards-compat shims — fix callers directly. Prefer explicit error handling over silent failures. Derive thresholds from existing constants rather than hardcoding magic numbers (e.g., use \`raw.length <= COL\_COUNT\` instead of \`n < 10\_000\`). In CI, define shared env vars at workflow level, not per-job. Always dry-run before bulk destructive operations (SELECT before DELETE to verify row count).
<!-- End lore-managed section -->
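The "SELECT before DELETE" preference above can be sketched with an in-memory array standing in for a table. Everything here is hypothetical (`guardedDelete`, `Row`, the sample data); a real database would run the same `WHERE` clause as a `SELECT` first:

```typescript
interface Row {
  id: number;
  createdAt: number;
}

// Hypothetical helper: count the matching rows first (the dry run) and refuse
// to delete when the count exceeds what the caller expects.
function guardedDelete<T>(
  rows: T[],
  matches: (row: T) => boolean,
  maxExpected: number,
): T[] {
  const affected = rows.filter(matches);
  if (affected.length > maxExpected) {
    throw new Error(
      `refusing to delete ${affected.length} rows; expected at most ${maxExpected}`,
    );
  }
  return rows.filter((row) => !matches(row));
}

const table: Row[] = [
  { id: 1, createdAt: 100 },
  { id: 2, createdAt: 200 },
  { id: 3, createdAt: 300 },
];

// A cutoff that is too large matches every row, so the guard throws.
try {
  guardedDelete(table, (r) => r.createdAt < 1000, 1);
} catch (err) {
  console.log((err as Error).message);
}
```

The expected-count ceiling is the design choice that would have caught the off-by-one-year timestamp described above: the predicate was wrong, but the row count made that visible before anything was removed.
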