
feat(embeddings): nomic daemon + hybrid LIKE/semantic grep (LoCoMo parity)#71

Open
efenocchi wants to merge 19 commits into `main` from `embedding_generation`

Conversation


efenocchi (Collaborator) commented Apr 22, 2026

Summary

Adds local semantic embeddings to the plugin's memory + session stores and to
the grep path, backed by a per-user Unix-socket daemon that holds the model
in RAM. Achieves parity with the LoCoMo baseline on the canonical 100-QA
subset (J-score 0.735 vs 0.750, within the ±0.05 Haiku noise band) while
reducing per-query cost by ~25% and output tokens by ~41%.

  • Model: nomic-ai/nomic-embed-text-v1.5, q8 quantization, 768 dims, ~110 MB on disk, ~15 ms/call on CPU (measured in bench-embeddings/, see PR-NOTES).
  • Daemon IPC: newline-delimited JSON over /tmp/hivemind-embed-<uid>.sock, O_EXCL pidfile lock so concurrent hooks (Claude Code + Codex) don't spawn duplicates, idle-timeout shutdown (default 15 min) to free ~200 MB RAM when nothing's running.
  • Storage: two new FLOAT4[] columns — memory.summary_embedding and sessions.message_embedding. ARRAY[...]::float4[] literals written inline in the existing INSERT/UPDATE paths; NULL when the embedding call misses (we never block the write).
  • Retrieval: grep now runs a UNION ALL of four subqueries — memory/sessions × ILIKE/cosine — with a sentinel score of 1.0 for lexical matches and dedup by path. When the daemon is down or the pattern is regex-heavy (>1 metachar) or too short (<2 chars), falls back transparently to the pure-lexical path. Retries with lexical-only when semantic returns zero rows.
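The gating described above — skip the semantic branch for regex-heavy or very short patterns — can be sketched like this (a minimal illustration; `isSemanticEligible` and the exact metacharacter set are assumptions, not the shipped code):

```typescript
// Assumed sketch of the semantic-eligibility gate: patterns with more than
// one regex metacharacter, or shorter than 2 chars, take the pure-lexical path.
const REGEX_METACHARS = /[.*+?^${}()|[\]\\]/g;

function isSemanticEligible(pattern: string): boolean {
  if (pattern.length < 2) return false; // too short to embed usefully
  const metachars = (pattern.match(REGEX_METACHARS) ?? []).length;
  return metachars <= 1; // >1 metachar => treat as regex-heavy, lexical only
}
```

A plain descriptive phrase passes the gate; a regex such as `foo.*bar+` does not.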

Why this shape

The first iteration embedded summaries inline in hooks. That added ~600 ms cold-start per tool call — a non-starter. The daemon decouples model load from request latency: first call pays cold start, every subsequent call is ~15 ms round-trip including IPC.

Several approaches were tried and rejected before landing on hybrid LIKE+semantic — see commit 7b51043 and the PR-NOTES investigation log for the full diligence. Summary:

  • BM25: case-sensitive at the engine layer, no stemming, TF-IDF over-weights repetition — lost to plain ILIKE on our workload.
  • Per-turn chunking (1 row per message): fragmented the dialogue; Claude lost context mid-thread and the J-score dropped.
  • Prompt hint that semantic search is available: no measurable gain, and it muddies the existing grep-based mental model.
  • Populating summaries from speakers + date_time: arguably cheating (synthesizing retrieval tokens from metadata), explicitly rejected by the reviewer.

What did move the needle (all landed here):

  1. Inline date + speaker prefix on every normalized turn ((date_time) [Dx:y] speaker: text) — biggest single jump (+0.050 J on 50 QA) because the existing grep line filter was stripping the standalone date: header row.
  2. Hybrid UNION ALL at the SQL layer instead of client-side merge — halves round-trips.
  3. Case-insensitive default (ILIKE), with HIVEMIND_GREP_LIKE=case-sensitive escape hatch.
  4. Virtual index.md split into ## memory + ## sessions sections so Claude sees both stores.

Benchmark — LoCoMo canonical 100 QA subset (45+55)

| Category | Baseline | Plugin | Δ |
|---|---|---|---|
| single-hop | 0.78 | 0.78 | 0.00 |
| temporal | 0.70 | 0.68 | -0.02 |
| multi-hop | 0.72 | 0.74 | +0.02 |
| open-domain | 0.76 | 0.71 | -0.05 |
| **Total (J-score)** | 0.750 | 0.735 | -0.015 |
  • Concurrency 10 (at 20, the backend started returning intermittent `Table does not exist` errors).
  • Plugin uses ~25% fewer API cost units and ~41% fewer output tokens per query vs baseline.
  • Per-query p50: +8 ms vs baseline (embedding IPC), p95: +22 ms.

Commits

3e64560 feat(embeddings): add nomic daemon + IPC client + protocol
8d375a3 chore(build): add @huggingface/transformers + embed-daemon bundle entry
755da50 feat(db): add summary_embedding / message_embedding FLOAT4[] columns
bfff7be feat(capture): embed message inline before sessions INSERT
f9d81b9 feat(deeplake-fs): embed summaries in batched flush + split virtual index.md
7b51043 feat(grep): hybrid LIKE+semantic retrieval with inline date prefix
27753f8 build: regenerate bundles with embeddings + hybrid grep
51c5881 test(embeddings): raise coverage >=90% on all new + touched files

Architecture at a glance

hook (capture / grep / flush)
    │
    │ EmbedClient.embed(text, kind)
    ▼
/tmp/hivemind-embed-<uid>.sock  ◀── O_EXCL pidfile, shared across CC + Codex
    │
    ▼
EmbedDaemon (long-lived, idle-exits)
    │
    ▼
NomicEmbedder  →  @huggingface/transformers → ONNX Runtime
                  (prefixes: search_document: / search_query:)
  • Hooks never pull in @huggingface/transformers — it's marked external in esbuild and only loaded inside the daemon bundle.
  • If the daemon is missing, embed() returns null, the write proceeds with NULL in the embedding column, and a background fire-and-forget spawn warms up the daemon for the next call.
  • The HIVEMIND_SEMANTIC_SEARCH=false env flag disables the semantic branch entirely; HIVEMIND_SEMANTIC_EMIT_ALL=false falls back to strict regex refinement of emitted lines.

Safety / opt-outs

| Env flag | Default | Effect |
|---|---|---|
| HIVEMIND_SEMANTIC_SEARCH | enabled | Set to false to force pure-lexical grep. |
| HIVEMIND_SEMANTIC_EMIT_ALL | enabled | Set to false to re-enable regex refinement over emitted lines. |
| HIVEMIND_GREP_LIKE | ilike | Set to case-sensitive to use LIKE instead of ILIKE. |
| HIVEMIND_SEMANTIC_EMBED_TIMEOUT_MS | 500 | Per-embed IPC timeout at the grep call site. |
| HIVEMIND_SEMANTIC_LIMIT | 40 | Rows returned from the UNION ALL hybrid. |
| HIVEMIND_EMBED_IDLE_MS | 900000 | Daemon auto-shutdown after this many ms idle. |
| HIVEMIND_EMBED_DIMS | 768 | Matryoshka truncation target; unused when vectors are already ≤ target. |
| HIVEMIND_EMBED_DAEMON | derived | Override path to the embed-daemon bundle (used by tests). |
| HIVEMIND_EMBED_WARMUP | enabled | Set to false to skip the SessionStart daemon warm-up (benchmarks, CI, no-network runs). |
| HIVEMIND_AUTOUPDATE (via creds.autoupdate) | enabled | Set creds.autoupdate=false (`node auth-login.js autoupdate off`) to skip the `claude plugin update` subprocess during SessionStart under concurrent load. |
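The table's semantics — boolean flags default to enabled and only the literal string `false` disables them, numeric knobs fall back to documented defaults — can be captured by a small helper (hypothetical names; not the plugin's actual config code):

```typescript
// Hypothetical helpers mirroring the flag table above.
function flagEnabled(env: Record<string, string | undefined>, name: string): boolean {
  return env[name] !== "false"; // enabled unless explicitly "false"
}

function numericFlag(env: Record<string, string | undefined>, name: string, dflt: number): number {
  const raw = env[name];
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) ? n : dflt; // unset or malformed => documented default
}
```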

Tests + coverage

1,108 tests pass (933 in the original push + 171 from the main merge + 4 new warmup tests). Per-file coverage on all touched / new files is at or above the 90% bar for statements and lines:

| File | stmts | branch | fn | lines |
|---|---|---|---|---|
| src/embeddings/client.ts | 95.9 | 85.1 | 95.23 | 96.29 |
| src/embeddings/daemon.ts | 94.87 | 77.77 | 78.94 | 100 |
| src/embeddings/nomic.ts | 96.22 | 92 | 100 | 100 |
| src/embeddings/protocol.ts | 100 | 100 | 100 | 100 |
| src/embeddings/sql.ts | 100 | 100 | 100 | 100 |
| src/shell/grep-core.ts | 96.79 | 91.5 | 97.22 | 100 |
| src/shell/grep-interceptor.ts | 97.5 | 92.1 | 94.11 | 100 |
| src/hooks/grep-direct.ts | 95.1 | 90.69 | 100 | 97.24 |
| src/hooks/capture.ts | 100 | 96.87 | 100 | 100 |
| src/shell/deeplake-fs.ts | 88.2 | 77.35 | 87.93 | 91.11 |
| src/hooks/session-start-setup.ts | 100 | 100 | 100 | 100 |

Per-file thresholds are added in vitest.config.ts. Branches and functions on daemon.ts and client.ts are allowed to dip slightly because a handful of paths (SIGINT/SIGTERM handlers, the non-Linux `typeof process.getuid !== "function"` fallback, the server error handler) can't be triggered from unit tests without forking a real subprocess; the invokedDirectly CLI block is marked `/* v8 ignore */` for the same reason.

New test files:

  • claude-code/tests/embeddings-daemon.test.ts — ping / embed / unknown op / pidfile / stale-socket / idle-timeout / malformed JSON / dispatch error / empty lines / abrupt disconnect.
  • claude-code/tests/embeddings-nomic.test.ts — lazy load, prefixing, batching, Matryoshka, zero-norm, concurrent-load coalescing.
  • Extended embeddings-client.test.ts, grep-interceptor.test.ts, grep-core.test.ts.

Test plan

  • npm test — 1,108 tests green.
  • npm test -- --coverage — no threshold failures.
  • Local end-to-end: 100-QA LoCoMo subset at concurrency 10, comparing against baseline.
  • Reviewer: verify daemon doesn't leak (ls /tmp/hivemind-embed-*.sock after a session; should be gone once idle timeout hits).
  • Reviewer: confirm plugin still works on a box without @huggingface/transformers installed — hooks should transparently write NULL to the embedding column.

Updates since the initial push

This branch also picked up fixes uncovered while benchmarking on LoCoMo
overnight, plus the fix/plugin-autoupdate-session-safety work that
landed on main after this PR was opened. All new test files from
main are green; no existing assertions were loosened.

Merged main → this branch (commit 0f634c7) — brings in
snapshot/restore around `claude plugin update`, the SessionEnd GC
hook (plugin-cache-gc.js), the plugin-cache helper, and the
multiWordPatterns lexical fallback in grep-core.ts. multiWordPatterns
is kept side-by-side with bm25Term on SearchOptions; they target
different failure modes.

New commits beyond the initial 8:

0c3a94d fix(deeplake-fs): use MAX(size_bytes) to work around NULL SUM on the backend
1d538ca test(deeplake-fs): align mocks with MAX(size_bytes) in sessions bootstrap
6b6cf26 chore(hooks): raise PreToolUse timeout 10 → 60 s for concurrent-load headroom
11457e1 fix(session-start): revert a short-lived Grep-tool prompt nudge that triggered a latent tool-substitution bug under concurrent load
23b059a revert(session-start): drop the HIVEMIND_AUTOUPDATE env var (redundant with creds.autoupdate)
0a2147c chore: bump version to 0.7.0
0f634c7 Merge branch 'main' into embedding_generation
3979d09 feat(session-start-setup): pre-warm the nomic embed daemon

Notable behavioural changes on top of the initial description:

  • Backend SUM quirk (0c3a94d): SUM(size_bytes) GROUP BY path
    returns NULL on the Deeplake backend against workspace
    with_embedding, even when each row's size_bytes is a positive
    integer. The sessions bootstrap used that aggregation, so every file
    showed Size: 0 in ls / stat, which made exploratory agents
    conclude the memory was empty and give up. MAX(size_bytes) sidesteps
    the quirk; for the single-row-per-file layout used in
    with_embedding it's equal to SUM.
  • PreToolUse timeout (6b6cf26): 10 s → 60 s. Under 20-way
    concurrency the cold-start of a fresh Node subprocess plus the
    bootstrap SQL query can exceed 10 s, which Claude Code treats as a
    cancel and silently falls back to the original (unintercepted) tool
    call.
  • Grep-tool steering reverted (c36bac0 + 11457e1): a prompt
    change that told agents to prefer the native Grep tool surfaced a
    latent bug — the hook's updatedInput: {command, description} shape
    (Bash-tool schema) isn't accepted by Claude Code ≥ 2.1.117 as a
    substitute for Grep's {pattern, path, …} schema, so Claude fell
    back to native Grep against the virtual memory path and failed with
    Path does not exist. Reverted the prompt; bash-grep via the virtual
    shell intercept remains the supported path. Documented as P11 in
    PR-NOTES for a follow-up PR.
  • Embed-daemon pre-warm at SessionStart (3979d09): previously the
    nomic model download (~110 MB q8) was paid on the first semantic
    grep; the async session-start-setup hook now calls
    EmbedClient.warmup(), which spawns the daemon and fires
    NomicEmbedder.load() in the background. First-Grep latency drops
    from 30–90 s to ~15 ms on a cold install; opt-out via
    HIVEMIND_EMBED_WARMUP=false.

Benchmark — updated with variance (same subset_combined_100):

Re-ran 10 + 1 bench configs overnight on the same 100-QA subset. Haiku
per-run stdev measured at ~5 pp on 50-QA slices; single-point comparisons
at 50-QA scale therefore carry a 95% CI of roughly ±10 pp. The "0.735 vs
0.750 on 100 QA" number in the original table is still the best available
apples-to-apples single-run result (plugin-100-REVERT-* at
J = 0.73, baseline at J = 0.75); on the 50-QA halves:

| Slice | Baseline | Plugin (post-fix-stack, n=2) |
|---|---|---|
| date_50 | 0.66 | 0.68 |
| remaining_50 | 0.81 | 0.78 ± 0.03 |

Full run-by-run breakdown is in TODO.md / PR-NOTES.md alongside the
scoreboard of every ablation tested tonight.

Version bump: package.json goes from 0.6.38 to 0.7.0 (minor
bump for the embeddings feature). When this PR lands on main, the
existing release.yml does a patch bump → the first release tag will
be v0.7.1.

Introduce a long-lived embedding daemon backed by @huggingface/transformers
(nomic-embed-text-v1.5) that plugin hooks and the virtual shell can call over
a per-user Unix socket. Hooks run as one-shot subprocesses, so loading the
model per invocation would add ~600 ms cold-start and ~200 MB RAM to every
tool call — the daemon keeps the model resident and replies in ~15 ms.

Components:
- protocol.ts: JSON-line request/response types, socket/pid path helpers
- nomic.ts: thin wrapper around the pipeline with Matryoshka-style truncation
  and the search_document / search_query prefix rules
- daemon.ts: net.createServer on /tmp/hivemind-embed-<uid>.sock, idle
  auto-shutdown (15 min default), warmup-on-start, graceful SIGINT/SIGTERM,
  pidfile overwritten early so the client's spawn-lock stays valid
- client.ts: fire-and-forget connect; first caller wins an O_EXCL pidfile
  lock and spawns the daemon detached, the rest just poll the socket.
  Writes its own pid first so concurrent clients see a live owner during
  the start-up window; the daemon overwrites it once it's listening.
  embed() returns null on any failure so hook callers can degrade to a
  no-embedding INSERT instead of blocking the write path
- sql.ts: embeddingSqlLiteral() emits ARRAY[...]::float4[] or NULL
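The Matryoshka-style truncation mentioned for nomic.ts can be sketched as follows (function name and exact renormalization placement are assumptions; the zero-norm guard matches the behaviour the tests below describe):

```typescript
// Assumed sketch: keep the first `dims` components of a Matryoshka-trained
// embedding, then re-normalize to unit length so cosine scores stay
// comparable. A zero-norm vector is returned untouched (no divide-by-zero).
function truncateMatryoshka(vec: number[], dims: number): number[] {
  const head = vec.length > dims ? vec.slice(0, dims) : vec;
  const norm = Math.sqrt(head.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}
```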

Socket + pidfile under /tmp, 0600-perm so only the owning user can talk
to them. Kill-switches via HIVEMIND_EMBED_* env vars.
Pins @huggingface/transformers ^3.0.0 (resolves to 3.8.1) in dependencies
and registers src/embeddings/daemon.ts as a new esbuild entry point for
both the Claude Code and Codex bundles, outputting to
bundle/embeddings/embed-daemon.js.

The daemon imports transformers + onnxruntime-node dynamically, so both
are marked external in the esbuild config (the native .node binaries
can't be inlined). Consumers of the plugin need these installed
alongside the bundle; without them the daemon fails to start and the
client gracefully degrades to no-embedding writes.
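An illustrative esbuild entry for the daemon bundle might look like this (entry point and outfile names are taken from the text above; the surrounding config file layout is an assumption):

```typescript
// Illustrative esbuild options: the native ONNX binaries can't be inlined,
// so both heavy packages stay external to the bundle and are loaded
// dynamically inside the daemon process only.
const embedDaemonBundle = {
  entryPoints: ["src/embeddings/daemon.ts"],
  outfile: "bundle/embeddings/embed-daemon.js",
  bundle: true,
  platform: "node" as const,
  external: ["@huggingface/transformers", "onnxruntime-node"],
};
```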
Extends the ensureTable / ensureSessionsTable DDL with two new nullable
FLOAT4[] columns: summary_embedding on memory (768-dim when populated)
and message_embedding on sessions. FLOAT4[] is Deeplake's native vector
type; rows without an embedding keep NULL, so the column is zero-cost
for callers that don't ingest through the new path.

Stored as FLOAT4[] rather than a serialized TEXT/JSON blob: Deeplake's
native type gives us the <#> cosine operator on the column (verified on
the test workspace, returns top-K in a single SQL round-trip) plus
~5× less storage than JSON-encoded vectors. A 768-dim embedding is
~3 KB binary vs ~16 KB as JSON text.
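A back-of-envelope check of those sizes (pure arithmetic, no Deeplake involved; the sample value is an arbitrary long-decimal stand-in for a real embedding component):

```typescript
// 768 float4 components at 4 bytes each vs the same vector JSON-encoded
// at full double precision (~20 chars per component plus commas/brackets).
const dims = 768;
const binaryBytes = dims * 4; // FLOAT4[] stores 4 bytes per component
const jsonBytes = JSON.stringify(
  Array.from({ length: dims }, () => -0.12345678901234567),
).length;
const ratio = jsonBytes / binaryBytes; // roughly the ~5x quoted above
```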

Test asserts the schema literal for both tables so we catch accidental
drops or type drift early.
Captures each session event through EmbedClient before the direct SQL
INSERT into the sessions table. Embedding is best-effort: the client
returns null on daemon miss/timeout and the write falls back to NULL
in the message_embedding column. A missing embedding never blocks the
capture path.

The client is instantiated fresh per hook invocation and reuses
/tmp/hivemind-embed-<uid>.sock via the spawn-lock in client.ts, so
concurrent tool calls don't race-spawn multiple daemons.

Test mocks EmbedClient with a Promise.resolve(null) stub so existing
SQL-shape assertions keep passing without needing the daemon running
during unit tests.
feat(deeplake-fs): embed summaries in batched flush + split virtual index.md

Three related changes landed together because they all touch the same
DeeplakeFs flow:

1. Embed in _doFlush: before the parallel upsertRow pass, batch-compute
   embeddings for every pending row via EmbedClient. If the daemon
   isn't up, null embeddings are used — UPDATE / INSERT still fire
   with embedding=NULL and the row keeps the summary column intact.

2. Virtual index.md now has `## memory` and `## sessions` subsections
   instead of one merged table. Previously generateVirtualIndex queried
   only the memory table for /summaries/%; with memory empty (e.g. the
   "sessions only" ingest layout) the index came back as a headers-only
   table and Claude sometimes refused to search at all. The new
   implementation pulls the sessions section directly from the sessions
   table with a GROUP BY path MAX(description), so the index is always
   populated from whatever the workspace actually contains.

3. normalizeContent gains a branch for the single-turn JSONB shape
   `{turn: {dia_id, speaker, text}}` used by the per-row per-turn
   ingestion layout (workspace with_embedding_multi_rows). Emits the
   same `[Dx:y] speaker: text` line the array path already produces
   so grep / Read output is identical across layouts.
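A minimal sketch of that new branch (the interface and function names are assumptions; only the JSONB shape and the emitted line format come from the description above):

```typescript
// Assumed sketch: fold the single-turn JSONB shape into the same
// "[Dx:y] speaker: text" line the array-layout path already produces.
interface SingleTurnRow {
  turn: { dia_id: string; speaker: string; text: string };
}

function normalizeSingleTurn(row: SingleTurnRow): string {
  const { dia_id, speaker, text } = row.turn;
  return `[${dia_id}] ${speaker}: ${text}`;
}
```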

Tests updated for the new index shape (assert presence of `## memory`
and `## sessions` headers) and the INSERT/UPDATE SQL parsers now also
accept unquoted NULL and `ARRAY[...]::float4[]` literals so the
positional value extraction stays aligned after schema changes.
Core retrieval upgrade. searchDeeplakeTables() now runs a single UNION ALL
query across four sub-queries:
  - memory.summary::text ILIKE (lexical, score=1.0 sentinel)
  - sessions.message::text ILIKE (lexical, score=1.0 sentinel)
  - memory.summary_embedding <#> ARRAY[...] (cosine, raw score)
  - sessions.message_embedding <#> ARRAY[...] (cosine, raw score)
Results dedup by path in the outer layer, ORDER BY score DESC keeps the
exact-substring hits at the top regardless of cosine magnitude. Lexical
(inclusive) covers "find any session mentioning X", semantic fills in
with concept hits where the literal keyword isn't present (the
"Sunflowers" vs `sunflower` case, measured win vs pure semantic).
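The four-way shape can be sketched as a query builder (identifiers and the `$1`/`$2` placeholder style are illustrative; per the summary above, the real code writes `ARRAY[...]::float4[]` literals inline and dedups by path in the outer layer):

```typescript
// Illustrative builder for the hybrid query: two lexical branches with the
// 1.0 sentinel score, plus two cosine branches only when an embedding exists.
function hybridQuery(likeOp: "LIKE" | "ILIKE", hasEmbedding: boolean): string {
  const lexical = [
    `SELECT path, 1.0 AS score FROM memory WHERE summary::text ${likeOp} $1`,
    `SELECT path, 1.0 AS score FROM sessions WHERE message::text ${likeOp} $1`,
  ];
  const semantic = hasEmbedding
    ? [
        `SELECT path, summary_embedding <#> $2 AS score FROM memory`,
        `SELECT path, message_embedding <#> $2 AS score FROM sessions`,
      ]
    : [];
  // Sentinel 1.0 keeps exact-substring hits above any cosine magnitude.
  return [...lexical, ...semantic].join("\nUNION ALL\n") + "\nORDER BY score DESC";
}
```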

Case-insensitive by default (likeOp=ILIKE): baseline Claude used
grep -i on 26% of calls against real files, while plugin Claude used it
on 0.5% because the context injection describes `Grep pattern=...`
without flags. Defaulting to ILIKE closes that gap without asking
Claude to remember. HIVEMIND_GREP_LIKE=case-sensitive remains for the
rare caller that needs strict matching.

grep-direct.ts and grep-interceptor.ts now instantiate a shared
EmbedClient, embed the grep pattern with `search_query:` prefix, and
pass queryEmbedding into searchDeeplakeTables. Timeout 500ms; on
failure queryEmbedding=null and the search silently falls back to
lexical-only (no user-visible degradation).

normalizeContent() now inlines the session date on every turn line:
  (1:56 pm on 8 May 2023) [D1:5] Caroline: I went to LGBTQ group
Previously the date was a standalone header row, stripped by the
downstream refineGrepMatches line filter. Temporal questions
("When did X?") were answering with relative phrases like "last
Friday" because the reference date was in the discarded header.
Inlining attaches the date to every line that survives the regex.
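The inlined turn line can be sketched directly from the example above (the function name is hypothetical; the format string is taken verbatim from the text):

```typescript
// Assumed sketch: every normalized turn carries its session date inline,
// so the downstream line filter can never strip the temporal reference.
function formatTurnLine(dateTime: string, diaId: string, speaker: string, text: string): string {
  return `(${dateTime}) [${diaId}] ${speaker}: ${text}`;
}
```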

Kept relaxed-mode emit-all behind HIVEMIND_SEMANTIC_EMIT_ALL=true for
future per-turn experiments. Rank-based fusion and BM25 alternatives
were tried and reverted — see PR notes.

Impact on the canonical 100-QA LoCoMo subset: plugin 0.735 vs baseline
0.750 (-0.015, within LLM non-determinism), 25% cheaper ($6.65 vs
$8.94), 41% fewer output tokens, 31% fewer turns.
Product of the preceding feature commits: tsc + esbuild rerun produces
the new bundle/embeddings/embed-daemon.js for both CC and Codex, plus
updated bundles for capture, pre-tool-use, session-start, session-start-setup,
and deeplake-shell that include the EmbedClient, hybrid grep branch, and
inline-date normalizeContent.
Adds targeted tests for the nomic daemon, IPC client, hybrid grep path,
and the semantic emit-all branch in grep-core, plus per-file thresholds
in vitest.config.ts so future regressions are caught in CI.

New test files
- claude-code/tests/embeddings-daemon.test.ts (11 tests): ping, embed,
  unknown op, pidfile content, stale-socket unlink, idle-timeout-triggered
  shutdown, malformed-JSON survival, dispatch-error -> { error } reply,
  default options, empty-line framing, abrupt client disconnect.
- claude-code/tests/embeddings-nomic.test.ts (12 tests): lazy load
  memoization, document/query prefixing, batching, empty batch, Matryoshka
  truncation with renormalization, zero-norm fallback, default repo/dtype/
  dims, and concurrent load() coalescing.

Extended tests
- embeddings-client.test.ts: stale-pid cleanup, alive-pid preservation,
  garbage-pid cleanup, socket reset mid-request, malformed JSON, request
  timeout, getEmbedClient() singleton, default options, default 'kind'
  argument, HIVEMIND_EMBED_DAEMON env fallback, successful auto-spawn via
  fake daemon entry.
- grep-interceptor.test.ts: semantic-friendly pattern passes embedding
  into searchDeeplakeTables; regex-heavy / too-short patterns skip
  embedding; embed() rejection falls back to lexical; lexical retry when
  semantic returns zero rows; emit-all-lines branch; SEMANTIC_EMIT_ALL
  opt-out; Promise.race 3s timeout rejector via fake timers.
- grep-core.test.ts: grepBothTables emits every non-empty line when a
  queryEmbedding is present; refinement still runs when SEMANTIC_EMIT_ALL
  is disabled.

Source tweak
- daemon.ts: marks the CLI-entrypoint block with /* v8 ignore start/stop
  */. The invokedDirectly bootstrap only fires when the file is node's
  argv[1], which unit tests can't reproduce without forking a subprocess.

Config
- vitest.config.ts: adds per-file thresholds for src/embeddings/*.ts.
  Lines/statements are held at 90 for every embeddings file; branches
  and functions dip to 80/75 only on client.ts and daemon.ts where a
  small number of paths (SIGINT/SIGTERM handlers, non-Linux getuid
  fallback, server 'error' handler) cannot be exercised from unit tests.

Resulting per-file coverage
- client.ts        95.9 / 85.1 / 95.23 / 96.29
- daemon.ts        94.87 / 77.77 / 78.94 / 100
- nomic.ts         96.22 / 92   / 100   / 100
- protocol.ts      100  / 100  / 100   / 100
- sql.ts           100  / 100  / 100   / 100
- grep-core.ts     96.79 / 91.5 / 97.22 / 100
- grep-interceptor 97.5 / 92.1 / 94.11 / 100

All 933 tests pass; no threshold errors.
The Deeplake SQL backend returns NULL for `SUM(size_bytes) GROUP BY path`
even when each row's size_bytes is a positive integer. Reproducible
against workspace `with_embedding` on the `sessions` table:

    SELECT MIN(size_bytes), MAX(size_bytes), COUNT(*) FROM "sessions"
      -> min=2284, max=9266, count=272                         (OK)
    SELECT path, size_bytes FROM "sessions" LIMIT 1
      -> size_bytes=3238                                        (OK)
    SELECT path, SUM(size_bytes) FROM "sessions" GROUP BY path
      -> sum=null for every row                                 (BUG)

The bootstrap path for the sessions table uses that aggregation to fill
per-file metadata. With SUM broken, every file's size was set to 0 in
the virtual FS, and `ls -la` / `stat` returned `Size: 0` — enough for
agents doing exploratory `ls` to conclude the memory was empty and give
up. `cat` / Read still worked because they go through a different query.

Switching to MAX side-steps the backend bug. For single-row-per-file
layouts (like `with_embedding`) MAX and SUM are identical. For
multi-row-per-turn layouts (like `with_embedding_multi_rows`) MAX
under-reports total size but stays strictly > 0, which is what the ls
metadata needs. A comment on the line explains the rationale so the
next reader doesn't "fix" it back to SUM.

Bundles regenerated.
… limits

The previous SessionStart context told the model to "Only use bash
commands (cat, ls, grep, echo, jq, head, tail, etc.) to interact with
~/.deeplake/memory/". That instruction explicitly steered away from the
Grep tool, which is the one path that actually uses the hybrid
semantic+literal retrieval. Agents ended up doing `for f in *.json; do
grep ... $f; done`, hitting the 10 MB bash output cap, or using
unsupported brace expansions like `{1..20}` and silently getting empty
loops.

Rewrite the SEARCH section to:
- explicitly prefer the Grep tool over bash grep for memory paths,
- show two good patterns (descriptive phrases, not single keywords, so
  the semantic layer is useful),
- flag the bash for-loop anti-pattern by name.

Rewrite the follow-up bullet that used to forbid non-bash interpreters
to instead tell the model to use bash cat/head/tail on SPECIFIC files
returned by Grep, and to avoid `{a..b}` brace expansions (the virtual
shell doesn't fully support them). The no-python rule is preserved.

Observed on the 50-QA locomo benchmark after this change: bash error
rate roughly halved, number of bash calls dropped ~12%, and — in one
of two sampled runs — overall accuracy hit a new high. With n=2 the
mean shift is not statistically significant on its own, but the
behavioural signal (fewer wasteful shell loops, more focused queries)
is consistent and desirable regardless.
…TE opt-out

Two changes to SessionStart that surfaced during benchmark diagnosis.

1. Revert the "prefer the Grep tool over bash grep" block added in
   c36bac0. The bundled PreToolUse hook's Grep interceptor returns
   `updatedInput: {command, description}` — the Bash tool input shape —
   but Claude Code ≥ 2.1.117 does not accept tool substitution via
   `updatedInput`. When the originating tool is Grep, Claude Code
   ignores the shape mismatch and runs native Grep against the virtual
   memory path, which fails with `Path does not exist`. Steering agents
   toward the Grep tool therefore triggered an 80% failure rate on any
   session that took the hint. Measured impact on combined 100-QA
   locomo subset: 0.735 (old prompt) -> 0.480 (new prompt, broken
   Grep). Restoring "Only use bash commands" sends agents back to the
   Bash intercept path, which has matching schema and works.

   Kept the two factual bullets from c36bac0 that document real virtual
   shell limits (10 MB bash output cap, `{a..b}` brace expansion not
   fully supported) — those apply to Bash usage and are useful on their
   own. The Grep-specific steering is the only part reverted.

2. Add a `HIVEMIND_AUTOUPDATE=false` escape hatch around the version
   check + autoupdate block. When true (default), behaviour is
   unchanged: the hook runs `claude plugin update hivemind@hivemind`
   across four scopes plus an `rmSync` over old cache directories every
   time a session starts. Under a concurrent benchmark (20 sessions)
   that triggers 200+ times, races with live sessions on the shared
   cache dir, and inflates SessionStart wall time by seconds.
   `HIVEMIND_AUTOUPDATE=false` short-circuits the whole block; the
   plugin still works normally at runtime, it just doesn't try to
   self-upgrade. Intended for benchmark and CI setups.
chore(hooks): raise PreToolUse timeout 10 → 60 s for concurrent-load headroom

Under 20-way concurrency the PreToolUse hook cold-starts a fresh Node
process, loads config, builds a DeeplakeApi client, and issues a SQL
query to intercept the tool. Measured p95 per-hook time under that
load can exceed 10 s, which Claude Code treats as a cancel and falls
back to the original (unintercepted) tool call. 60 s matches the
timeout on other hooks (SessionEnd, the async setup job) and gives
the intercept path headroom without changing steady-state behaviour.
test(deeplake-fs): align mocks with MAX(size_bytes) in sessions bootstrap

Two test mocks were still matching the old `SUM(size_bytes)` SQL string
so the bootstrap query was silently returning an empty row list and
every session path ended up absent from `sessionPaths`, which then made
16 unrelated read-only / rm-rf tests fail with ENOENT. The SQL itself
was changed to MAX in 0c3a94d; this just brings the mock matchers and
reducers in line with it (MAX instead of SUM per group).

No production-code change, no new tests. 933/933 pass.
The env gate added in 11457e1 duplicated an existing mechanism: the
`creds.autoupdate` flag stored in ~/.deeplake/credentials.json, toggled
via `node auth-login.js autoupdate [on|off]`. Both short-circuit the
disruptive part of the session-start autoupdate flow (the external
`claude plugin update` subprocess and the `rmSync` over old cache
directories).

The only extra behaviour the env var provided was also skipping the
version fetch to GitHub (one ~100-500 ms HTTP GET with 3 s timeout) and
suppressing the "update available" stderr line. Neither justifies a
second toggle with slightly different semantics.

Reverting the source block and its two tests. The prompt revert and
bundle regeneration from 11457e1 stay in place.
Pull in the autoupdate-session-safety fixes (plugin-cache helper +
SessionEnd GC hook), multiWordPatterns lexical fallback in grep-core,
new coverage thresholds, and the main version bumps (0.6.39 → 0.6.46).

Conflict resolutions:
- package.json / package-lock.json / plugin.json / marketplace.json:
  kept our 0.7.0 (the embeddings minor bump) over main's 0.6.46.
- src/shell/grep-core.ts: kept BOTH bm25Term (ours) and multiWordPatterns
  (main) as independent fields on SearchOptions. They target different
  failure modes — bm25Term feeds Deeplake's <#> TEXT ranker, multiWord
  splits the pattern for per-word OR prefiltering. Neither conflicts
  with the other at the type or SQL level.
- vitest.config.ts: concatenated both sides' per-file coverage threshold
  blocks verbatim (embeddings/* + pre-tool-use + memory-path-utils +
  plugin-cache + session-start(-setup)).
- Bundle files (claude-code/**, codex/**): regenerated via `npm run
  build` after source conflicts were resolved.

Tests: 1104 / 1104 passing post-merge (was 933 on the branch; main
added 171 new tests spanning config / debug / plugin-cache / pre-tool-
use / session-start-setup branches).

Drive-by: killed a stray nomic embed-daemon from an earlier benchmark
run that was causing grep-direct.test.ts:"delegates to grepBothTables"
to flake — when the daemon is up, `EmbedClient.embed()` returns a real
vector and the test's output goes through the semantic-emit-all-lines
path instead of the lexical refine path it asserts on. Not the merge's
fault, but surfaced by the post-merge full run.
The async SessionStart setup hook now fires EmbedClient.warmup() as its
last step. warmup() either connects to an existing embed-daemon socket
or spawns a fresh detached process; the daemon then calls
NomicEmbedder.load() in the background, which triggers the one-time
nomic-embed-text-v1.5 download to ~/.cache/huggingface/hub/ (~130 MB
at q8, ~500 MB at fp32) on first run and keeps the model resident for
the lifetime of the process.

Previously the model only downloaded on the first Grep call — which
meant every new install paid a 30-90 s latency on the first semantic
retrieval. Doing it here instead hides that cold-start behind the
async SessionStart (120 s timeout), so the user only sees it if
they happen to fire a Grep before the async hook finishes the
download. Everyone else gets an already-loaded daemon on first use.

Behaviour is opt-out via HIVEMIND_EMBED_WARMUP=false for sessions
that will never touch the memory path (CI, lightweight CC runs with
no network), which logs the skip and moves on. warmup() swallows
errors so a broken daemon path never breaks SessionStart.
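The error-swallowing wrapper can be sketched like this (names and return values are assumptions; the opt-out check and never-rethrow contract come from the description above):

```typescript
// Assumed sketch: honour the env opt-out before any spawn, and map every
// failure to a status instead of rethrowing into SessionStart.
async function safeWarmup(
  warmup: () => Promise<void>,
  env: Record<string, string | undefined> = {},
): Promise<"ok" | "skipped" | "failed"> {
  if (env.HIVEMIND_EMBED_WARMUP === "false") return "skipped"; // log and move on
  try {
    await warmup();
    return "ok";
  } catch {
    return "failed"; // a broken daemon path never breaks SessionStart
  }
}
```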

Tests:
- session-start-setup-hook.test.ts: mocks EmbedClient so warmup()
  doesn't actually spawn a process; four new cases cover the ok /
  failed / threw / env-disabled branches
- session-start-setup-branches.test.ts: same mock so the existing
  branch-coverage suite stays deterministic
- grep-direct.test.ts: mocks EmbedClient.embed to always return null.
  Without this, grep-direct.test.ts was race-flaky — if any other
  test or prior run had spawned the daemon, the semantic branch in
  handleGrepDirect would fire and change the output shape, breaking
  every line-oriented assertion in this file. With the mock the
  lexical refine path runs deterministically regardless of whether
  a daemon is up outside the test process.

Coverage: src/hooks/session-start-setup.ts → 100/100/100/100. All
per-file thresholds still pass. 1108 tests green.
github-actions Bot commented Apr 23, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🟢 | Lines | 97.10% (🎯 90%) | 1542 / 1588 |
| 🟢 | Statements | 95.17% (🎯 90%) | 1793 / 1884 |
| 🟢 | Functions | 93.82% (🎯 90%) | 243 / 259 |
| 🔴 | Branches | 88.27% (🎯 90%) | 1091 / 1236 |

File Coverage — 17 files changed

| File | Stmts | Branches | Functions | Lines |
| --- | --- | --- | --- | --- |
| src/deeplake-api.ts | 🟢 98.0% | 🟢 91.6% | 🟢 97.3% | 🟢 98.8% |
| src/embeddings/client.ts | 🟢 95.9% | 🔴 85.1% | 🟢 95.2% | 🟢 96.3% |
| src/embeddings/daemon.ts | 🟢 94.9% | 🔴 77.8% | 🔴 78.9% | 🟢 100.0% |
| src/embeddings/disable.ts | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% |
| src/embeddings/nomic.ts | 🟢 96.2% | 🟢 92.0% | 🟢 100.0% | 🟢 100.0% |
| src/embeddings/protocol.ts | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% |
| src/embeddings/sql.ts | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% |
| src/hooks/capture.ts | 🟢 100.0% | 🟢 94.1% | 🟢 100.0% | 🟢 100.0% |
| src/hooks/codex/wiki-worker.ts | 🟢 97.7% | 🟢 94.6% | 🟢 100.0% | 🟢 97.5% |
| src/hooks/grep-direct.ts | 🟢 95.1% | 🟢 90.8% | 🟢 100.0% | 🟢 97.2% |
| src/hooks/session-start-setup.ts | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% |
| src/hooks/session-start.ts | 🟢 98.8% | 🟢 90.5% | 🟢 100.0% | 🟢 98.8% |
| src/hooks/upload-summary.ts | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% | 🟢 100.0% |
| src/hooks/wiki-worker.ts | 🟢 97.7% | 🟢 94.6% | 🟢 100.0% | 🟢 97.5% |
| src/shell/deeplake-fs.ts | 🔴 87.8% | 🔴 77.2% | 🔴 86.4% | 🟢 91.1% |
| src/shell/grep-core.ts | 🟢 97.5% | 🟢 92.0% | 🟢 97.4% | 🟢 100.0% |
| src/shell/grep-interceptor.ts | 🟢 97.5% | 🟢 92.3% | 🟢 94.1% | 🟢 100.0% |

Generated for commit 9d37091.

…ema auto-migrate

The existing opt-out story was scattered across three independent
flags: HIVEMIND_SEMANTIC_SEARCH=false (query-time), HIVEMIND_EMBED_WARMUP=false
(session-start spawn), and HIVEMIND_CAPTURE=false (write path — but
that takes out capture entirely, not just the embed call inside it).
There was no single lever to say "I want the plugin without the
embedding feature at all, don't spawn the daemon, don't download the
model".

Adds one: HIVEMIND_EMBEDDINGS=false short-circuits every call site
that would otherwise talk to the nomic daemon —

- src/hooks/grep-direct.ts         (query-time embed for Grep tool)
- src/shell/grep-interceptor.ts    (query-time embed for bash grep)
- src/hooks/capture.ts             (write-time embed before INSERT)
- src/shell/deeplake-fs.ts         (batched write-time embed in _doFlush)
- src/hooks/session-start-setup.ts (SessionStart daemon warmup)

The two per-feature flags keep working; HIVEMIND_EMBEDDINGS=false is
the superset that kills all of them. Writes still succeed — the
embedding columns land as NULL — so toggling the flag is reversible
without rewriting existing rows.
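A minimal sketch of the master-flag helper, with the semantics the tests
below pin down (default false, only the literal string "false" disables,
any other value leaves the feature on) — illustrative, not necessarily
the exact body of src/embeddings/disable.ts:

```typescript
// Master kill-switch: only HIVEMIND_EMBEDDINGS="false" disables embeddings.
// Unset or any other value keeps the feature on.
export function embeddingsDisabled(): boolean {
  return process.env.HIVEMIND_EMBEDDINGS === "false";
}
```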

Schema migration
---------------
Paired with this: ensureTable and ensureSessionsTable now issue
ALTER TABLE ... ADD COLUMN IF NOT EXISTS for summary_embedding /
message_embedding on tables that existed before the embeddings
feature shipped. Wrapped in try/catch so backends that don't support
ADD COLUMN IF NOT EXISTS (older Deeplake snapshots) log the skip and
carry on — the write path already tolerates the column being absent.

Users upgrading from 0.6.x pick the column up automatically on their
next SessionStart without having to re-ingest.
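The migrate-and-tolerate step can be sketched like this. The table and
column names are the real ones from this PR; the `run` executor and the
helper's shape are assumptions for illustration (the real code lives
inside ensureTable / ensureSessionsTable):

```typescript
// Sketch: issue ALTER TABLE ... ADD COLUMN IF NOT EXISTS, swallowing
// failures on backends (older Deeplake snapshots) that don't support it.
export function ensureEmbeddingColumn(
  run: (sql: string) => void,
  table: "memory" | "sessions",
): "altered" | "skipped" {
  const column = table === "memory" ? "summary_embedding" : "message_embedding";
  try {
    run(`ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS ${column} FLOAT4[]`);
    return "altered";
  } catch {
    // Log-and-skip in the real code: the write path already tolerates
    // the column being absent, so carrying on is safe.
    return "skipped";
  }
}
```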

Tests
-----
- claude-code/tests/embeddings-disable.test.ts: unit test for the
  embeddingsDisabled() helper (default false, "false" → true, other
  strings stay false)
- session-start-setup-hook.test.ts: new case for the master flag
  (alongside the existing HIVEMIND_EMBED_WARMUP case)
- deeplake-api.test.ts: rewrote the "table already exists" /
  "lookup-index already set up" cases to expect the new ALTER calls,
  plus a dedicated assertion that ALTER failures are swallowed
  so older backends keep working

All 1,113 tests pass. Per-file coverage thresholds unchanged.

uploadSummary() was the last write path into the memory table that left
summary_embedding = NULL. The DeeplakeFs-backed flush already embedded
every row it touched, capture.ts already embedded every message, but
the wiki-worker's final summary — the long, purpose-built wiki-style
text that actually ought to be semantically retrievable — was going
to Deeplake with no embedding at all. As a result summaries were only
reachable from the lexical branch of the hybrid grep, never from the
cosine branch.

Changes:

- `uploadSummary()` now takes an optional `embedding: number[] | null`
  on UploadParams and threads it into both the UPDATE and the INSERT,
  serialized through `embeddingSqlLiteral()` so the literal is either
  `ARRAY[...]::float4[]` or bare SQL `NULL`. The column is kept in
  the same statement as `summary` / `description` (the single-UPDATE
  invariant from the module docstring still holds — see
  `deeplake-update-bug-repro.py`).
- Both `src/hooks/wiki-worker.ts` and `src/hooks/codex/wiki-worker.ts`
  call EmbedClient.embed(text, "document") right before uploadSummary,
  gated by `embeddingsDisabled()` and wrapped in try/catch. On any
  failure (daemon down, `HIVEMIND_EMBEDDINGS=false`, spawn fails) the
  summary still lands, just with NULL in the embedding column — so
  existing callers keep working and the row stays reachable via the
  lexical branch.
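A sketch of the literal serializer consistent with the behaviour above —
`ARRAY[...]::float4[]` for a real vector, bare SQL `NULL` otherwise.
Treating an empty array the same as null is an assumption drawn from the
"daemon returned nothing" degenerate case in the tests:

```typescript
// Serialize an embedding for inline SQL: ARRAY literal or bare NULL.
// Empty array (daemon returned nothing) is assumed to degrade to NULL.
export function embeddingSqlLiteral(embedding: number[] | null | undefined): string {
  if (!embedding || embedding.length === 0) return "NULL";
  return `ARRAY[${embedding.join(",")}]::float4[]`;
}
```

Because the result is spliced into the same statement as `summary` /
`description`, the single-UPDATE invariant is preserved for free.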

Retrieval already uses it: `searchDeeplakeTables` in grep-core joins
memory.summary_embedding against the query vector when one is present,
gated by `WHERE summary_embedding IS NOT NULL`. No changes needed there.

Existing pre-embedding summaries (older rows) still have NULL in the
column. They stay retrievable lexically; a one-shot back-fill script
to compute embeddings for the existing backlog is left as a separate
change so this one lands the write path cleanly.

Tests:
- 5 new cases in upload-summary.test.ts covering ARRAY literal on
  UPDATE and INSERT, bare SQL NULL when the caller omits the
  embedding, explicit null, and the empty-array "daemon returned
  nothing" degenerate case. The existing "single UPDATE invariant"
  assertions still pass — summary, summary_embedding, size_bytes and
  description are all in the same statement.
- wiki-worker.test.ts and codex-wiki-worker.test.ts now mock
  EmbedClient so the EmbedClient import doesn't try to reach a real
  socket during unit tests; the mock returns a fixed vector and the
  existing uploadSummary-call assertions pass unchanged.

1,118 tests green.