This repository was archived by the owner on Feb 27, 2026. It is now read-only.

Add documents table, session-init hook, and updated memory-extract hook #2

Open
auriwren wants to merge 4 commits into NOVA-Openclaw:main from auriwren:auri/document-registry-and-hooks

auriwren commented Feb 9, 2026

Why

AI agents using nova-memory currently face three gaps that this PR closes:

1. File-based memory is invisible across sessions. If an agent creates memory/research-humidifiers.md during one session, future sessions have no way to know it exists unless a keyword search happens to match. There is no registry, no index, no discoverability layer. The documents table solves this: every workspace file gets registered with a path, title, type, and description. An agent can query "what reference docs do I have about the Forge?" instead of guessing filenames.

2. Session context depends on manual file reads. Today, agents load recent context by reading memory/YYYY-MM-DD.md daily log files at session start. This is fragile (files can be stale, missing, or bloated) and puts context hydration outside the database where the structured memories actually live. The session-init hook replaces this by querying PostgreSQL directly for recent events, decisions, and lessons, then injecting them as a bootstrap file. Context comes from the source of truth, not a copy of it.

3. Memory extraction requires manual discipline. Agents are supposed to call memory-db after meaningful conversations to store facts and events. In practice, this gets forgotten. The memory-extract hook automates it: after each assistant response, a heuristic gate checks whether the conversation contains extractable content (new facts about people, events, decisions, lessons). Only when the heuristics match does it call Claude Haiku for structured extraction. A 5-minute cooldown per session prevents extraction storms. The result: memories flow into PostgreSQL automatically without the agent remembering to do it.

Together, these three changes move nova-memory closer to a fully automated memory pipeline where agents write to and read from PostgreSQL as the single source of truth, with files serving as human-readable reference material rather than the memory system itself.


What Changed

1. Documents Table (schema.sql)

Added a documents table for tracking workspace files and knowledge documents across sessions. This enables document discovery without filesystem scanning.

  • path (unique): Relative file path
  • title: Human-readable name
  • doc_type: Category (config, memory, tool, hook, skill, etc.)
  • description: What the document contains
  • tags: GIN-indexed array for fast tag search
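As a rough illustration of how an agent might query this registry, the sketch below builds a parameterized lookup by tag and type. The table and column names come from the list above; the `text[]` type for `tags`, the helper name `buildDocumentSearch`, and its options are assumptions made for the example, not part of this PR.

```javascript
// Sketch only: `documents` and its columns come from the PR description;
// the text[] type for `tags` and this helper's name are assumptions.
function buildDocumentSearch({ tags = [], docType = null, limit = 20 } = {}) {
  const clauses = [];
  const values = [];

  if (tags.length > 0) {
    values.push(tags);
    // Array containment (@>) is an operator that a GIN index on an
    // array column can serve.
    clauses.push(`tags @> $${values.length}::text[]`);
  }
  if (docType !== null) {
    values.push(docType);
    clauses.push(`doc_type = $${values.length}`);
  }

  // Every user-supplied value travels as a bind parameter, never as SQL text.
  values.push(limit);
  const where = clauses.length > 0 ? `WHERE ${clauses.join(" AND ")} ` : "";
  const text =
    "SELECT path, title, doc_type, description, tags FROM documents " +
    where +
    `ORDER BY path LIMIT $${values.length}`;

  return { text, values };
}
```

The returned `{ text, values }` object has the shape that node-postgres accepts as a query config, so it could be handed to `client.query(...)` directly if the hook used that driver.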

2. Session Init Hook (hooks/session-init/)

New hook that fires on agent:bootstrap to inject recent activity context from PostgreSQL:

  • Queries events (last 48h), decisions (7d), and lessons (7d)
  • Formats as SESSION_CONTEXT.md bootstrap file
  • Falls back silently if PostgreSQL is unavailable
  • Skips isolated/spawn sessions
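The behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the hook's actual code: the row fields (`created_at`, `summary`), the `isolated` option, and both function names are hypothetical, and the database query is represented by a caller-supplied `loadRows` function.

```javascript
// Sketch only: row field names and the `isolated` flag are hypothetical
// stand-ins for whatever the real hook receives.
function buildSessionContext(
  { events = [], decisions = [], lessons = [] },
  { isolated = false } = {}
) {
  // Isolated/spawn sessions get no injected context.
  if (isolated) return null;

  const sections = [
    ["Recent events (last 48h)", events],
    ["Recent decisions (last 7d)", decisions],
    ["Recent lessons (last 7d)", lessons],
  ].filter(([, rows]) => rows.length > 0);

  // Nothing recent to report: skip the bootstrap file entirely.
  if (sections.length === 0) return null;

  const body = sections
    .map(([heading, rows]) => {
      const items = rows.map((r) => `- ${r.created_at}: ${r.summary}`);
      return [`## ${heading}`, ...items].join("\n");
    })
    .join("\n\n");

  return { name: "SESSION_CONTEXT.md", content: `# Session Context\n\n${body}\n` };
}

// Wraps the (hypothetical) row loader so that a database failure
// produces no context instead of an error.
function safeSessionContext(loadRows, options) {
  try {
    return buildSessionContext(loadRows(), options);
  } catch {
    return null; // PostgreSQL unavailable: fall back silently
  }
}
```

Returning `null` for every "nothing to inject" case keeps the caller's logic to a single check before it appends a bootstrap file.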

3. Memory Extract Hook (hooks/memory-extract/)

Rewritten for OpenClaw compatibility, now using LLM-based extraction:

  • Heuristic gate: Regex patterns detect extractable content before calling LLM
  • Cooldown: 5-minute minimum between extractions per session
  • Extraction: Claude Haiku parses facts, events, decisions, and lessons
  • Storage: Uses memory-db CLI for structured storage
  • Safety: Never blocks message delivery; all errors caught and logged
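To make the gate and cooldown concrete, here is a minimal sketch of how the two checks might combine before any LLM call is made. The regex patterns, the in-memory `Map`, and the function name `shouldExtract` are illustrative assumptions; the hook's real patterns and state handling are not shown in this PR description.

```javascript
// Sketch only: these patterns are illustrative, not the hook's real ones.
const EXTRACTABLE_PATTERNS = [
  /\b(decided|decision|agreed|we will)\b/i,
  /\b(learned|lesson|next time)\b/i,
  /\b(my|his|her|their) (name|birthday|email|address) is\b/i,
  /\b(yesterday|today|tomorrow|last week|scheduled)\b/i,
];

const COOLDOWN_MS = 5 * 60 * 1000; // 5-minute minimum between extractions
const lastExtraction = new Map(); // sessionId -> timestamp of last extraction

function shouldExtract(sessionId, text, now = Date.now()) {
  // Heuristic gate: cheap regex check before spending an API call.
  if (!EXTRACTABLE_PATTERNS.some((p) => p.test(text))) return false;

  // Cooldown: at most one extraction per session per window.
  const last = lastExtraction.get(sessionId);
  if (last !== undefined && now - last < COOLDOWN_MS) return false;

  lastExtraction.set(sessionId, now);
  return true;
}
```

Checking the heuristics first means a non-matching message never touches the cooldown state, so a burst of small talk cannot delay the next real extraction.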

Testing

All three components are running in production on the Auri workspace. Code review findings (command injection fix, schema query corrections, path portability) have been addressed in follow-up commits.
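For readers unfamiliar with this class of command injection fix, the sketch below shows why passing arguments as an array through `execFileSync` keeps LLM-generated text from being interpreted by a shell. It uses the running Node binary (`process.execPath`) as a stand-in child process, because the memory-db CLI's actual arguments are not shown here; `runWithLiteralArg` is a hypothetical helper name.

```javascript
const { execFileSync } = require("node:child_process");

// Sketch only: `process.execPath` (the running Node binary) stands in for
// the memory-db CLI, whose real arguments are not shown in this PR.
function runWithLiteralArg(untrusted) {
  // execFileSync passes each array element as one argv entry and does not
  // spawn a shell by default, so quotes, semicolons, and $() stay literal.
  return execFileSync(
    process.execPath,
    ["-e", "process.stdout.write(process.argv[1])", untrusted],
    { encoding: "utf8" }
  );
}

// Text that would break out of a quoted argument if it were interpolated
// into a shell command string passed to execSync.
const hostile = 'met Sam"; echo INJECTED; #';
const echoed = runWithLiteralArg(hostile);
```

Here the child process receives the hostile string unchanged as a single argument, which is the property that matters when the argument text comes from a model.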

- schema.sql: Add documents table for workspace file/knowledge registry
  - Tracks path, title, doc_type, description, tags
  - Unique constraint on path, GIN index on tags
- hooks/session-init: Updated to query PostgreSQL for recent events/decisions/lessons
  and inject as SESSION_CONTEXT.md bootstrap file (OpenClaw format)
- hooks/memory-extract: Rewritten for OpenClaw with LLM-based extraction
  - Heuristic gate to avoid unnecessary API calls
  - 5-minute cooldown per session
  - Extracts facts, events, decisions, lessons via Claude Haiku
  - Stores via memory-db CLI
- README.md: Document the documents table and session-init hook
- memory-extract: Replace execSync with execFileSync to prevent command
  injection from LLM-generated content (CVE-grade fix)
- session-init: Switch from node -e/require(pg) to psql for queries,
  reducing bootstrap overhead from ~1s to ~150ms
- Both hooks: Use HOME/WORKSPACE env vars instead of hardcoded paths
- Remove redundant idx_documents_path (UNIQUE constraint already indexes)
- memory-extract: Use tail for efficient transcript reading

Explains the fundamental problems with file-based memory (scaling,
queryability, dual-write, structure enforcement, discoverability)
and why PostgreSQL is the right foundation for AI assistant memory.