fix: close 48-pt LoCoMo accuracy gap with 5 plugin hook fixes by efenocchi · Pull Request #63 · activeloopai/hivemind

efenocchi · 2026-04-20T23:31:12Z

Summary

Five bug fixes to the pre-tool-use hook and shell bundle that close the 48-point accuracy gap between the local-files LoCoMo baseline and the cloud baseline on a sessions-only Deeplake workspace.

Stacked on top of optimizations (PR #61). Keeps Davit's grep refactor, adds five independent fixes plus 44 unit/integration tests that fail if any fix regresses.

Headline numbers

100-QA LoCoMo run: the first 100 QAs of locomo10.json (a deterministic selection), Haiku as the answering model, and a Gemini judge via openrouter/google/gemini-2.5-flash. Workspace: the sessions-only locomo_benchmark/baseline (272 raw sessions in the sessions table, memory table dropped).

| Run | Fixes | Accuracy | Gap to local (75.0%) |
|---|---|---|---|
| `baseline_cloud` (original, reference) | none | 27.0% | −48.0 pt |
| `baseline_cloud_100qa_fix123` | #1 + #2 + #3 | 67.5% | −7.5 pt |
| `baseline_cloud_100qa_fix12345` | #1 + #2 + #3 + #4 + #5 | 68.0% | −7.0 pt |
| `baseline-100-subset` (local, no plugin) | n/a | 75.0% | n/a |

Per-category accuracy (full 100-QA runs)

| Cat | Label | n | Pre-fix | +fix 1–3 | +fix 1–5 |
|---|---|---|---|---|---|
| 1 | single-hop | 32 | 15.6% | 42.2% | 50.0% |
| 2 | temporal | 37 | 36.5% | 81.1% | 85.1% |
| 3 | multi-hop | 13 | 23.1% | 69.2% | 61.5% |
| 4 | open-domain | 18 | 30.6% | 83.3% | 69.4% |

Signal quality at the tool boundary

| Signal | Pre-fix | +fix 1–3 | +fix 1–5 |
|---|---|---|---|
| `[deeplake-sql]` lines in tool_result | 35+ | 0 | 0 |
| `"path must be of type string"` errors | 60+ | 0 | 0 |
| Read-tool calls that succeeded | 0 | 201 | ~250 |
| `(no matches)` on `conv_N_session_*.json` globs | many | 19 QA × 3+ | much lower |

The five fixes

#1 — `/index.md` lists session files too (`4271baf`)

Bug: the virtual /index.md was generated from the memory table only (WHERE path LIKE '/summaries/%'). In workspaces where that table was empty or had been dropped (e.g. locomo_benchmark/baseline), the index header read "0 sessions:" or "1 sessions:" even when the sessions table held 272 rows. Claude concluded memory was empty and gave up.

Fix: buildVirtualIndexContent(summaryRows, sessionRows) now renders both under ## Summaries / ## Sessions sections with a combined header (273 entries (1 summaries, 272 sessions):). The fallback path in readVirtualPathContents queries both tables in parallel and passes both sets to the builder.

Files: src/hooks/virtual-table-query.ts.
Tests: claude-code/tests/virtual-table-query.test.ts adds four cases covering the regression, the backwards-compatible single-arg call, and the empty case. claude-code/tests/pre-tool-use-baseline-cloud.test.ts drives the full processPreToolUse flow against a 272-row fixture and asserts the synthesized index contains every real session path.
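A minimal TypeScript sketch of the builder change described above. The `VirtualRowSketch` type (only a `path` field), the `- <path>` entry format, and the demo session paths are assumptions for illustration; the real `buildVirtualIndexContent` may format rows differently. It shows the two properties the tests pin down: one combined header over both tables, and a default empty `sessionRows` so single-argument callers keep working.

```typescript
// Hypothetical row shape; real rows carry more columns.
interface VirtualRowSketch {
  path: string;
}

// Renders summaries and sessions under separate sections with one combined header.
// sessionRows defaults to [] so older single-argument callers keep working.
function buildVirtualIndexContentSketch(
  summaryRows: VirtualRowSketch[],
  sessionRows: VirtualRowSketch[] = [],
): string {
  const total = summaryRows.length + sessionRows.length;
  const lines: string[] = [
    `${total} entries (${summaryRows.length} summaries, ${sessionRows.length} sessions):`,
  ];
  if (summaryRows.length > 0) {
    lines.push("", "## Summaries", ...summaryRows.map((r) => `- ${r.path}`));
  }
  if (sessionRows.length > 0) {
    lines.push("", "## Sessions", ...sessionRows.map((r) => `- ${r.path}`));
  }
  return lines.join("\n") + "\n";
}

// Usage: sessions-only shape (memory table empty, 272 session rows; paths are made up).
const demoSessions: VirtualRowSketch[] = Array.from({ length: 272 }, (_, i) => ({
  path: `/sessions/demo_session_${i + 1}.json`,
}));
const demoIndex = buildVirtualIndexContentSketch([], demoSessions);
```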

#2 — Read-tool intercepts return `file_path`, not `command` (`4c5d50b`)

Bug: the hook's updatedInput was always Bash-shaped ({command, description}). When the incoming tool was Read, Claude Code's Read implementation looked for updatedInput.file_path, found undefined, and crashed with "The 'path' property must be of type string, got undefined". On the pre-fix sessions-only 100-QA run, every memory-path Read call hit this error (9 / 9 tracked cases); on the plugin-v8-optimizations-100 run, 60 / 100 transcripts contained the error.

Fix: extended ClaudePreToolDecision with an optional file_path field. For Read-tool intercepts, the plugin materializes the fetched content via writeReadCacheFile(sessionId, virtualPath, content) into ~/.deeplake/query-cache/<sessionId>/read/<virtualPath> and returns a decision with file_path set. main() dispatches on file_path: if present, emits updatedInput: {file_path}; otherwise keeps the historical {command, description}. Bash / Grep / Glob paths are unchanged.

Files: src/hooks/pre-tool-use.ts.
Tests: pre-tool-use-baseline-cloud.test.ts now asserts Read intercepts produce a decision with file_path, with content captured through a stubbed writeReadCacheFileFn. hooks-source.test.ts cases updated to match.
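A sketch of the Read-tool dispatch, assuming a simplified decision type. `PreToolDecisionSketch`, `writeReadCacheFileSketch` and `toUpdatedInputSketch` are hypothetical stand-ins for `ClaudePreToolDecision`, `writeReadCacheFile` and the branch in `main()`; the surrounding hook-output envelope is omitted. The optional `root` parameter exists only so the demo can write into a temp directory instead of `~/.deeplake/query-cache`.

```typescript
import { mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { homedir, tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical mirror of ClaudePreToolDecision with the new optional field.
interface PreToolDecisionSketch {
  command?: string;
  description?: string;
  file_path?: string;
}

const DEFAULT_CACHE_ROOT = join(homedir(), ".deeplake", "query-cache");

// Materializes fetched content at <root>/<sessionId>/read/<virtualPath>; returns the real path.
function writeReadCacheFileSketch(
  sessionId: string,
  virtualPath: string,
  content: string,
  root: string = DEFAULT_CACHE_ROOT,
): string {
  const filePath = join(root, sessionId, "read", virtualPath);
  mkdirSync(dirname(filePath), { recursive: true });
  writeFileSync(filePath, content, "utf8");
  return filePath;
}

// main() dispatch: Read-shaped input when file_path is set, Bash-shaped otherwise.
function toUpdatedInputSketch(decision: PreToolDecisionSketch): Record<string, string> {
  if (decision.file_path !== undefined) {
    return { file_path: decision.file_path };
  }
  return { command: decision.command ?? "", description: decision.description ?? "" };
}

// Usage against a throwaway root instead of ~/.deeplake/query-cache.
const demoRoot = mkdtempSync(join(tmpdir(), "read-cache-"));
const demoFile = writeReadCacheFileSketch(
  "session-1",
  "/index.md",
  "272 entries (0 summaries, 272 sessions):\n",
  demoRoot,
);
const readInput = toUpdatedInputSketch({ file_path: demoFile, description: "Read index" });
```

Keeping `file_path` optional means every existing Bash, Grep and Glob decision still serializes to the historical `{command, description}` shape unchanged.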

#3 — Shell bundle silences `[deeplake-sql]` trace in one-shot mode (`35a7e87`)

Bug: Claude Code's Bash tool merges a child process's stderr into the tool_result string the model sees. The shell bundle, when invoked as node shell-bundle -c "…" from the pre-tool-use hook, wrote [deeplake-sql] query start: … lines to stderr whenever HIVEMIND_TRACE_SQL / HIVEMIND_DEBUG was set — which in CI / dev shells was frequently the case. The model saw SQL log noise instead of grep output; on the original baseline_cloud-100 run, 35+ trace lines leaked across the transcripts.

Fix: two parts.

Move the TRACE_SQL / DEBUG_FILE_LOG checks in src/deeplake-api.ts out of module-level constants and into the traceSql function body so callers can flip the env vars at runtime.
In src/shell/deeplake-shell.ts, when the bundle detects one-shot mode (-c in argv), delete process.env[…] for HIVEMIND_TRACE_SQL, DEEPLAKE_TRACE_SQL, HIVEMIND_DEBUG, DEEPLAKE_DEBUG before opening any SQL connection. Interactive REPL mode keeps the env untouched.

Files: src/deeplake-api.ts, src/shell/deeplake-shell.ts.
Tests: claude-code/tests/shell-bundle-sql-trace-silence.test.ts spawns the shipped bundle with the trace vars set, points it at an unreachable API, and asserts stderr is free of [deeplake-sql]. Source-level check confirms traceSql reads env at call time, not at module load.
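A sketch of both halves of the fix. The function names are hypothetical, and treating only the value `"1"` as enabled is an assumption; the real `traceSql` may accept other values. The point is the ordering: because the check runs inside the function, deleting the variables in one-shot mode still takes effect even though the API module was imported first.

```typescript
// Env names from the PR; treating "1" as "on" is an assumption.
const TRACE_ENV_VARS = ["HIVEMIND_TRACE_SQL", "DEEPLAKE_TRACE_SQL", "HIVEMIND_DEBUG", "DEEPLAKE_DEBUG"];

// Reads process.env on every call. A module-level `const TRACE = ...` would be
// captured at import time, so deleting the vars later would have no effect.
function traceSqlSketch(
  message: string,
  write: (line: string) => void = (line) => {
    process.stderr.write(line);
  },
): void {
  const enabled = TRACE_ENV_VARS.some((name) => process.env[name] === "1");
  if (enabled) write(`[deeplake-sql] ${message}\n`);
}

// One-shot mode (`-c` in argv): scrub trace vars before any SQL runs.
// Interactive REPL mode leaves the environment untouched.
function scrubTraceEnvForOneShot(argv: readonly string[]): void {
  if (!argv.includes("-c")) return;
  for (const name of TRACE_ENV_VARS) {
    delete process.env[name];
  }
}
```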

#4 — `LIKE` clauses that consume `sqlLike()` output use `ESCAPE '\'` (`3d15454`)

Bug: sqlLike(value) escapes _ and % by prefixing them with \ so callers can safely interpolate user-controlled strings into LIKE 'pattern' literals. But the Deeplake backend does not treat backslash as the LIKE escape character by default. Without an explicit ESCAPE '\' clause, \_ is not read as an escaped underscore: the backslash is matched literally, so the pattern requires a backslash that no real path contains. Every query whose path or filename contained _ (e.g. /sessions/conv_0_session_N.json) silently returned zero rows.

Observed in the wild: grep "adoption agency" ~/.deeplake/memory/sessions/conv_0_session_*.json returned (no matches) even though "adoption agency" is in the file — the LIKE pattern /sessions/conv\_0\_session\_%.json never matched any real path.

Fix: append ESCAPE '\' to every LIKE '...' clause that is fed from sqlLike(). Covers:

src/shell/grep-core.ts:buildPathCondition (wildcard-path and directory-prefix branches).
src/hooks/virtual-table-query.ts:buildDirFilter (per-directory filters used by listVirtualPathRowsForDirs).
src/hooks/virtual-table-query.ts:findVirtualPaths (memory- and sessions-table branches, path and filename clauses).

The Codex and Claude Code find fallbacks and bash-command-compiler's find_grep segment call through to findVirtualPaths and inherit the fix without a local change.

Files: src/shell/grep-core.ts, src/hooks/virtual-table-query.ts.
Validation: focused 15-QA subset run (baseline_cloud_15qa_fix4) on the 15 regressions where local=1 and cloud<1 after fix #1+#2+#3. Pre-fix-4: 1.5 / 15 pts. Post-fix-4: 13.0 / 15 pts, ties the local score exactly. 14 / 15 QAs improved, 1 stayed partial, 0 regressed.
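A sketch that reproduces the bug and the fix in-process. `sqlLikeSketch`, `globToLikeSketch` and `pathConditionSketch` are hypothetical stand-ins for `sqlLike()` and `buildPathCondition`; escaping the backslash itself and doubling single quotes are assumptions added so the pattern stays safe under `ESCAPE '\'`. `likeMatchesSketch` emulates standard SQL `LIKE` matching (`%` any run, `_` one character, no escape character unless `ESCAPE` is given) and stands in for the backend here.

```typescript
// Hypothetical stand-in for sqlLike(): prefixes LIKE metacharacters with a backslash.
// Escaping "\" itself is an assumption; it keeps a literal backslash literal under ESCAPE '\'.
function sqlLikeSketch(value: string): string {
  return value.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Shell glob -> LIKE pattern: literal segments are escaped, each "*" becomes "%".
function globToLikeSketch(glob: string): string {
  return glob.split("*").map(sqlLikeSketch).join("%");
}

// The fix: every LIKE fed by sqlLike() carries an explicit ESCAPE '\'.
// Single quotes are doubled so the pattern is a valid SQL string literal.
function pathConditionSketch(glob: string): string {
  const pattern = globToLikeSketch(glob).replace(/'/g, "''");
  return `path LIKE '${pattern}' ESCAPE '\\'`;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Local emulation of SQL LIKE; an escape character exists only when one is passed.
function likeMatchesSketch(text: string, pattern: string, escapeChar?: string): boolean {
  let re = "^";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (escapeChar !== undefined && ch === escapeChar && i + 1 < pattern.length) {
      i++;
      re += escapeRegExp(pattern[i]); // escaped char matches itself
    } else if (ch === "%") {
      re += "[\\s\\S]*";
    } else if (ch === "_") {
      re += "[\\s\\S]";
    } else {
      re += escapeRegExp(ch);
    }
  }
  return new RegExp(re + "$").test(text);
}
```

With no escape character, the `\` in `conv\_0` must match a literal backslash, which is why the glob from the bug report never matched a real path.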

#5 — Cap plugin tool output at 8 KB (`2c0d65d`)

Bug: Claude Code's Bash tool silently persists any tool_result larger than ~16 KB to disk and replaces it with a 2 KB preview plus a path to the persisted file. In the baseline_cloud_100qa_fix123 run, 11 of 14 losing QAs that hit this path never read the persisted file — the 2 KB preview was too small to carry the answer and the model gave up.

Typical triggers: grep -r Caroline /home/.deeplake/memory/ (Caroline appears in nearly every session → 66 KB of dialogue), for f in /…/sessions/conv_0_session_*.json; do grep …; done (926 KB of concatenated output through the slow-path shell bundle).

Fix: src/utils/output-cap.ts exports capOutputForClaude(output, {kind}). If the output fits under 8 KB (CLAUDE_OUTPUT_CAP_BYTES) it is returned unchanged; otherwise it is truncated at the last line boundary that fits under the cap, and a short footer is appended:

... [grep truncated: 313 more lines (58.4 KB) elided — refine with '| head -N' or a tighter pattern]

The footer names the operation (grep / cat / ls / find / bash) and gives the model a concrete next step. The cap is applied on every plugin exit path that can produce a Bash-tool result:

grep-direct.ts:handleGrepDirect (grep output)
bash-command-compiler.ts:executeCompiledBashCommand (final concatenation of compiled segments)
pre-tool-use.ts direct read (cat / head / tail), ls, and find fallbacks

Read-tool intercepts are unaffected: they write content to disk and return a file_path, so no Claude Code preview truncation applies.

Files: src/utils/output-cap.ts, src/hooks/grep-direct.ts, src/hooks/bash-command-compiler.ts, src/hooks/pre-tool-use.ts.
Tests: claude-code/tests/output-cap.test.ts (8 cases) covers the no-op path, line-boundary truncation, single-oversized-line path, custom maxBytes, the default footer kind, and a realistic 400-line grep fixture that exceeds 16 KB and gets capped strictly between 4 KB and 8 KB.
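A minimal sketch of the capping logic, assuming UTF-8 byte counting via Node's `Buffer.byteLength`. The name `capOutputForClaudeSketch`, the 200-byte footer reserve, and the choice to emit only the footer when even the first line is too large are assumptions; the footer wording approximates the one quoted above. Reserving room for the footer keeps body plus footer under the cap.

```typescript
const CAP_BYTES_SKETCH = 8 * 1024;
const FOOTER_RESERVE_BYTES = 200; // assumed upper bound on footer size

function formatKB(bytes: number): string {
  return `${(bytes / 1024).toFixed(1)} KB`;
}

function capOutputForClaudeSketch(output: string, kind = "bash", maxBytes = CAP_BYTES_SKETCH): string {
  if (Buffer.byteLength(output, "utf8") <= maxBytes) return output; // no-op path
  const body = output.endsWith("\n") ? output.slice(0, -1) : output;
  const lines = body.split("\n");
  const budget = Math.max(0, maxBytes - FOOTER_RESERVE_BYTES);
  let used = 0;
  let kept = 0;
  // Keep whole lines only: stop at the last line boundary that fits the budget.
  for (const line of lines) {
    const size = Buffer.byteLength(line, "utf8") + 1; // +1 for the newline after it
    if (used + size > budget) break;
    used += size;
    kept++;
  }
  const elided = lines.slice(kept);
  const elidedBytes = Buffer.byteLength(elided.join("\n"), "utf8");
  const footer = `... [${kind} truncated: ${elided.length} more lines (${formatKB(elidedBytes)}) elided; refine with '| head -N' or a tighter pattern]`;
  // Single oversized first line: nothing fits, so only the footer is returned.
  return kept > 0 ? `${lines.slice(0, kept).join("\n")}\n${footer}` : footer;
}
```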

Test coverage

44 unit and integration tests across four files, all passing:

claude-code/tests/virtual-table-query.test.ts: 21 tests. Covers fix #1 at the builder level and the readVirtualPathContents fallback (both branches of the "memory empty" / "sessions empty" matrix). Asserts the exact SQL shape per branch.
claude-code/tests/pre-tool-use-baseline-cloud.test.ts: 13 tests. Real-QA-anchored integration tests driving processPreToolUse against a workspace mock with 272 session rows. Every case mirrors a concrete LoCoMo QA from the benchmark (conv 0 / qa 3, 6, 25, 29, 46). Asserts fix #1 and fix #2 at the entry point; one dedicated test covers Read against a /sessions/<file>.json path (not just /index.md).
claude-code/tests/shell-bundle-sql-trace-silence.test.ts: 2 tests. Bundle-level regression guard for fix #3.
claude-code/tests/output-cap.test.ts: 8 tests. Byte-accurate truncation assertions for fix #5.

Each fix was independently verified by stashing the source change and re-running the relevant test file — every source-stash produces a failing test that pinpoints the regression. Verification notes live in the individual commit messages.

Non-determinism caveat (honest)

Haiku and the Gemini judge introduce run-to-run variance that is much larger than the signal we're measuring at 100-QA scale. On the full 100-QA set, the fix 1+2+3+4+5 run scored 68.0 % vs 67.5 % for fix 1+2+3, a 0.5-point net delta. That small delta should not be read as evidence that fixes #4 and #5 have no effect at that scale.

The decisive evidence is in the focused subsets:

Fix #4's 15-QA regression subset: 1.5 → 13.0 / 15 pts, a +76.7 pp swing. Every improved QA used an underscored glob that had previously been silently returning zero rows.
Fix #5 verified empirically: grep -r Caroline /home/.deeplake/memory/ went from ~66 KB of output truncated to a 2 KB preview (Claude rarely recovered) to a capped 7.9 KB chunk with a footer reporting 313 elided lines.

A 14-QA re-run on an identical fix state produced 14.3 % in one run and 53.6 % in another, a 39-point swing from Haiku + judge non-determinism alone. The swing shrinks as the QA count grows, but a single 100-QA run still carries roughly ±3–4 points of noise, and the +0.5 pt cross-run delta is well inside that band. The true effect of fixes #4 and #5 on the full 100 is masked by the noise; the per-fix subset tests are the honest measurement.
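A rough ceiling for that band, under a simplifying assumption that is not part of the benchmark itself: treat each QA score as an independent value in [0, 1] (partial credit allowed) with mean accuracy p ≈ 0.68 over n = 100 QAs. A score bounded in [0, 1] with mean p has variance at most p(1 − p), so the run-level standard deviation is at most the binomial value, and the difference between two independent runs is at most √2 times that:

```math
\sigma_{\text{run}} \le \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.68 \times 0.32}{100}} \approx 0.047 \quad (\approx 4.7\ \text{pt}),
\qquad
\sigma_{\Delta} \le \sqrt{2} \times 0.047 \approx 0.066 \quad (\approx 6.6\ \text{pt})
```

The ±3–4 pt estimate sits below this ceiling, which is plausible when many QAs are answered the same way in every run; either way, a +0.5 pt delta is far inside it.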

Plugin workspace cross-check

Same build, same 100-QA set, but against locomo_benchmark/plugin (272 summaries + 272 sessions) instead of the sessions-only baseline workspace:

| Workspace | Fixes | Accuracy |
|---|---|---|
| `locomo_benchmark/baseline` (sessions only) | #1–#5 | 68.0 % |
| `locomo_benchmark/plugin` (summaries + sessions) | #1–#5 | 70.5 % |

Davit's pre-fix plugin-v8-optimizations-100 run scored 71.0 % on the canonical 45+55 subset against the same plugin workspace. The fix 1–5 build's 70.5 %, measured on a different subset (the first 100 QAs), is statistically indistinguishable from it.

The +2.5 pt from adding summaries is smaller than Davit's observed +11 pt (v8 baseline-workspace → v8 optimizations) because fixes #1–#3 already close most of the sessions-only gap: with the raw-session path working, the summaries are no longer compensating for retrieval-path bugs.

Full analysis lives in deeplake-cli-locomo-benchmark_plugin/results/ablation_cloud_plugin_fixes.md (benchmark repo).

Reproduction

```bash
# Point the Hivemind CLI at the sessions-only baseline workspace.
hivemind org switch locomo_benchmark
hivemind workspace baseline

# Benchmark.
cd deeplake-cli-locomo-benchmark_plugin
rm -rf ~/.deeplake/query-cache/
env -u HIVEMIND_TRACE_SQL -u DEEPLAKE_TRACE_SQL -u HIVEMIND_DEBUG -u DEEPLAKE_DEBUG \
  PLUGIN_DIR=/…/worktrees/davit_pr/claude-code \
  DEEPLAKE_CAPTURE=false HIVEMIND_CAPTURE=false \
  bun scripts/run-benchmark.ts \
    --table sessions --limit 100 --run-id <run-id> \
    --concurrency 10 --timeout 180 --log-tools
```

Test plan

npm run build succeeds (8 Claude Code + 8 Codex + 1 OpenClaw bundles).
npx vitest run claude-code/tests/virtual-table-query.test.ts claude-code/tests/pre-tool-use-baseline-cloud.test.ts claude-code/tests/shell-bundle-sql-trace-silence.test.ts claude-code/tests/output-cap.test.ts — 44 / 44 pass.
End-to-end benchmark on locomo_benchmark/baseline reaches 68.0 % (vs 27.0 % pre-fix, vs 75.0 % local).
Plugin-workspace cross-check reaches 70.5 %.
Every fix has a regression test that fails if the source change is stashed.
Reviewer smoke-check: reproduce the fix4 15-QA regression subset run on a fresh checkout.
Optional follow-ups tracked separately (not in this PR): BM25 via content_text <#> for pattern-variation robustness; ORDER BY inside grep-core.ts:searchDeeplakeTables subqueries to remove the remaining plugin-side non-determinism from LIMIT 100.

If a periodic worker is already running for the session, spawning a second worker from SessionEnd causes two concurrent UPDATEs on the same summary row. The Deeplake backend silently drops one of two rapid UPDATEs on the same row (see CLAUDE.md quirk), so the second worker's write can erase the first. SessionEnd now calls tryAcquireLock and bails out with a log line when the periodic worker holds the lock. The running worker releases it in its finally block, so there's no coordination gap. Also drops the DEEPLAKE_WIKI_WORKER / DEEPLAKE_CAPTURE fallbacks here (purely internal flags, no user reach).
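A sketch of a file-based advisory lock with the semantics the summary-state tests describe (mutual exclusion, stale reclaim after `maxAgeMs`, non-numeric contents stale, future timestamps fresh). `tryAcquireLockSketch` and `releaseLockSketch` are hypothetical; the real `tryAcquireLock` may store or reclaim differently. Exclusive creation (`"wx"`, i.e. `O_CREAT | O_EXCL`) is what makes the fresh-lock case race-free; the stale-reclaim path in this sketch is not atomic.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, unlinkSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// 10-minute stale window, matching the reclaim window mentioned in the PR.
const LOCK_MAX_AGE_MS = 10 * 60 * 1000;

function tryAcquireLockSketch(lockPath: string, maxAgeMs: number = LOCK_MAX_AGE_MS): boolean {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" fails if the file exists, so only one process can create a fresh lock.
      const fd = openSync(lockPath, "wx");
      writeSync(fd, String(Date.now()));
      closeSync(fd);
      return true;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
    let stamp = Number.NaN;
    try {
      stamp = Number(readFileSync(lockPath, "utf8").trim());
    } catch {
      continue; // released between our open and read; try again
    }
    // Non-numeric contents count as stale; future stamps (clock skew) count as fresh.
    const stale = !Number.isFinite(stamp) || Date.now() - stamp > maxAgeMs;
    if (!stale) return false;
    // Not atomic: two reclaimers could race here; a real implementation needs more care.
    try {
      unlinkSync(lockPath);
    } catch {
      // another process already removed it
    }
  }
  return false;
}

function releaseLockSketch(lockPath: string): void {
  try {
    unlinkSync(lockPath);
  } catch (err) {
    // ENOENT just means the lock was not held.
    if ((err as { code?: string }).code !== "ENOENT") throw err;
  }
}

// Usage: a lock file in a throwaway directory.
const demoLockPath = join(mkdtempSync(join(tmpdir(), "summary-lock-")), "session-1.lock");
```

In this shape, SessionEnd would call `tryAcquireLockSketch` before spawning and bail out with a log line on `false`, and the worker would call `releaseLockSketch` in its `finally` block.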

Same race as SessionEnd on the Claude Code side: Codex has no SessionEnd event, so Stop plays that role. If the capture hook has already spawned a periodic worker for this session, letting Stop spawn a second one causes two concurrent UPDATEs on the same summary row, and the Deeplake backend silently drops one of them. Stop now calls tryAcquireLock; if the lock is held, it logs and returns. The capture step (event insert) still runs unconditionally; only the wiki spawn is gated. Also drops the DEEPLAKE_WIKI_WORKER / DEEPLAKE_CAPTURE fallbacks.

summary-state.ts carries the periodic-summary trigger logic and the file-based RMW / advisory locks, but had no direct test coverage. The locking code in particular is the kind that regresses silently when untouched. Pins down:

- bumpTotalCount: fresh seed + increment on existing state
- shouldTrigger: first-at-10 rule only active while lastSummaryCount=0, exact cadence boundary, time trigger guarded by msgsSince > 0, custom everyNMessages
- tryAcquireLock: mutual exclusion, custom maxAgeMs (short TTL reclaim), timestamp exactly at Date.now() is fresh, clock-skew future timestamps treated as fresh, non-numeric contents treated as stale
- finalizeSummary: sets lastSummaryCount, preserves totalCount when higher, handles missing prior state
- loadTriggerConfig: defaults, valid env overrides, invalid values ignored, fractional hours accepted
- Full cycle: bump x9 no-trigger, bump#10 triggers, acquire, finalize + release, bump#11 below 50-message cadence, bump#60 re-triggers
- Cross-process concurrency via spawned subprocesses: N bumps yield totalCount=N (lock prevents lost updates), N racers on tryAcquireLock yield exactly one winner

Redirects $HOME to a tmpdir before importing so the module's homedir()-derived STATE_DIR points at an isolated directory; no pollution of ~/.claude/hooks/summary-state during test runs.
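A sketch of the trigger rules as the test list states them. The field names, the `finalizeSummarySketch` helper, and the exact time-trigger formula are assumptions; the 50-message cadence is inferred from the full-cycle test (bump #10 triggers, #11 does not, #60 re-triggers).

```typescript
// Field names and time-trigger details are assumptions for illustration.
interface SummaryStateSketch {
  totalCount: number; // messages seen in the session
  lastSummaryCount: number; // totalCount at the last finalized summary (0 = none yet)
  lastSummaryAtMs: number; // wall-clock time of that summary (0 = none yet)
}

interface TriggerConfigSketch {
  everyNMessages: number;
  everyHours: number;
}

const FIRST_SUMMARY_AT = 10;

function shouldTriggerSketch(s: SummaryStateSketch, cfg: TriggerConfigSketch, nowMs: number): boolean {
  const msgsSince = s.totalCount - s.lastSummaryCount;
  // First-at-10 rule, active only while no summary has been finalized.
  if (s.lastSummaryCount === 0 && s.totalCount >= FIRST_SUMMARY_AT) return true;
  // Message cadence, inclusive at the boundary.
  if (msgsSince >= cfg.everyNMessages) return true;
  // Time trigger, guarded so an idle session never re-summarizes with nothing new.
  const hoursSince = (nowMs - s.lastSummaryAtMs) / 3_600_000;
  return s.lastSummaryAtMs > 0 && msgsSince > 0 && hoursSince >= cfg.everyHours;
}

function finalizeSummarySketch(s: SummaryStateSketch, nowMs: number): SummaryStateSketch {
  return { ...s, lastSummaryCount: s.totalCount, lastSummaryAtMs: nowMs };
}
```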

Source tests prove summary-state is correct; this suite scans the shipped claude-code/bundle/*.js and codex/bundle/*.js to confirm the build didn't drop the call sites or re-inline the old patterns. Asserts, for both the CC and Codex shipping paths:

- session-end / stop bundles call tryAcquireLock and contain the "periodic worker already running" bail-out line (the race fix is wired to the actual spawn site)
- capture bundles reference tryAcquireLock, shouldTrigger, bumpTotalCount and loadTriggerConfig (periodic trigger wired end-to-end)
- No bundle contains DEEPLAKE_WIKI_WORKER or DEEPLAKE_CAPTURE anymore (pure back-compat aliases, removed)
- tryAcquireLock is inlined into every hook that needs it; releaseLock only in capture bundles (session-end releases via the worker's finally, so esbuild tree-shakes it out, which is expected)

HIVEMIND_CAPTURE is the single source of truth. The DEEPLAKE_CAPTURE alias was migration back-compat for an already-shipped rename; it's a plugin-internal flag (the wiki worker sets it on itself) so the chance of a stale user env var keeping the old name alive is low and the ergonomic cost of carrying two names is not worth it. Also removes a dead `homedir` import.

Mirror of the Claude Code capture cleanup: converge on HIVEMIND_CAPTURE as the only env flag the hook honors.

The rename PR left a latent bug in session-start.ts: the capture guard at line 179 read only the old name. A user setting HIVEMIND_CAPTURE=false had capture suppressed everywhere else (capture.ts, session-end.ts) but the placeholder summary INSERT still fired from session-start — the one path the env flag silently didn't reach. Capture intent and behavior drifted. Converge on HIVEMIND_CAPTURE in the session-start guard and drop the DEEPLAKE_WIKI_WORKER fallback at the top for symmetry with the other hooks.

Same rationale as the other hook files: the wiki-worker flag is internal, set by the plugin on itself, and the migration alias adds noise without buying anything.

Mirror of the CC cleanup on the Codex side.

…LAKE_CAPTURE fallbacks Final hook in the cleanup. After this, no DEEPLAKE_WIKI_WORKER or DEEPLAKE_CAPTURE reference survives in the shipped hook surface.


Covers every branch of src/hooks/session-end.ts:

- HIVEMIND_WIKI_WORKER=1 early return (nested worker)
- HIVEMIND_CAPTURE=false opt-out
- empty session_id / null loadConfig
- lock held → skip with wiki log line
- happy path: tryAcquireLock + spawn with correct args
- cwd fallback to "" when missing
- outer catch on readStdin throw → process.exit(0)

Mocks at the network-boundary seams (readStdin, loadConfig, spawn helper, summary-state.tryAcquireLock, debug log). The rest of the hook body runs for real. Raises session-end.ts to 100%/100%/100%/100%.

Covers the full state machine of src/hooks/codex/stop.ts:

- HIVEMIND_WIKI_WORKER=1 / empty session_id / null loadConfig guards
- HIVEMIND_CAPTURE=false skips both capture and spawn
- INSERT shape assertions (SQL count=1, Stop marker, codex agent, jsonb)
- INSERT failure is swallowed and spawn still runs (the capture hook and the wiki spawn are independent code paths)
- Transcript parsing branches: string content, array of output_text/text blocks, malformed JSONL skip, missing file, non-string/non-array content falls back to assistant_stop
- Lock held vs free (the race fix)
- Fatal catch on readStdin throw

Raises codex/stop.ts to 98.3%/90.5%/100%/98%.

src/hooks/capture.ts had no direct test: capture.test.ts duplicated buildSessionPath inline and tested the copy. This imports the real module and exercises every branch:

- CAPTURE guard / null config
- Event-type branches: user_message / tool_call / assistant_message (with and without agent_transcript_path) / unknown event skip
- INSERT fallback: table-missing triggers ensureSessionsTable + retry (both 'does not exist' and 'permission denied' variants)
- Unrelated SQL errors re-throw and bubble to the outer main().catch
- Periodic trigger helper: bumpTotalCount + shouldTrigger branches, lock held vs free, spawn failure releases the lock, release failure is swallowed, outer catch on bumpTotalCount throw
- Defensive fallbacks: undefined workspaceId → 'default', missing cwd → projectName='unknown'

Raises capture.ts to 100%/97%/100%/100%.
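A sketch of the INSERT fallback, assuming errors surface as `Error` objects whose message contains the backend text. `QuerySketch` and `insertWithTableFallbackSketch` are hypothetical; the two matched phrases come from the test list above. Only table-missing errors trigger `ensureSessionsTable` and a single retry; anything else is re-thrown so it reaches the outer `main().catch`.

```typescript
type QuerySketch = (sql: string) => Promise<unknown>;

// Backend messages treated as "sessions table missing" (both variants from the tests).
const TABLE_MISSING_PATTERNS = [/does not exist/i, /permission denied/i];

async function insertWithTableFallbackSketch(
  query: QuerySketch,
  ensureSessionsTable: () => Promise<void>,
  insertSql: string,
): Promise<void> {
  try {
    await query(insertSql);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Unrelated SQL errors bubble up unchanged.
    if (!TABLE_MISSING_PATTERNS.some((re) => re.test(message))) throw err;
    await ensureSessionsTable();
    await query(insertSql); // single retry; a second failure propagates
  }
}
```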

Same structure as the claude-code capture test, mirrored for src/hooks/codex/capture.ts. Codex capture gates on hook_event_name matching UserPromptSubmit / PostToolUse exactly, so the branch coverage includes the 'UserPromptSubmit without prompt' and 'PostToolUse without tool_name' defensive skips alongside the happy paths. Raises codex/capture.ts to 100%/93.75%/100%/100%.

Covers src/hooks/session-start.ts and session-start-setup.ts. Both hooks share the version-check + autoupdate flow (fetch GitHub package.json, compare, execSync plugin update, clean old cache entries). Tests mock global.fetch + child_process.execSync + node:fs.readdirSync/rmSync so the whole path runs without touching the network or the real plugin cache. Also exercises the placeholder branch in session-start.ts: the hook does a direct SQL SELECT for the summary row, then either skips (row exists → resumed session) or INSERTs a new placeholder. Both branches are asserted on SQL count and shape (CLAUDE.md rule #3).

session-start.ts → 95.9% / 84.1% / 100% / 98.2%
session-start-setup.ts → 95.4% / 82.0% / 100% / 97.3%

Covers the two Codex-side hooks that ran at 0%:

- codex/session-start.ts (fast path): synchronous stdin + creds + spawn of the detached session-start-setup process. Tests mock child_process.spawn with a fake stdin + unref so we can assert the hook fed the right input, detached correctly, and still emitted the developer-context line on stdout.
- codex/session-start-setup.ts: table creation, placeholder SELECT + INSERT, version check + git-clone autoupdate (branch-safe tag regex verified), tolerant version-check on GitHub unreachable, fatal catch.

codex/session-start.ts → 93.5% / 84.0% / 100% / 97.4%
codex/session-start-setup.ts → 94.2% / 77.6% / 100% / 97.8%

Flagged by the claude-bot review: after tryAcquireLock succeeds, the spawnWikiWorker / spawnCodexWikiWorker call is synchronous and can throw before the detached worker takes ownership of the lock. Without a catch at the call site, the lock is leaked for up to 10 minutes (the stale-reclaim window) and --resume on the same session cannot retrigger periodic summaries in that window. capture.ts already had the correct pattern. Apply the same guard to session-end.ts and codex/stop.ts: wrap the spawn call, releaseLock on failure, re-throw so the outer main().catch reports fatal. Tests cover the new branch in both hooks:

- spawn throws → releaseLock called → main().catch sees the throw
- releaseLock itself also throws → swallowed, original fatal preserved

Bundles rebuilt.
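A sketch of the guard, with the lock and spawn operations injected as functions so the control flow is visible on its own. `spawnWithLockGuardSketch` is hypothetical and its log wording is approximate.

```typescript
type LogFnSketch = (msg: string) => void;

function spawnWithLockGuardSketch(
  tryAcquire: () => boolean,
  release: () => void,
  spawnWorker: () => void,
  log: LogFnSketch,
): "spawned" | "skipped" {
  if (!tryAcquire()) {
    log("periodic worker already running, skipping wiki spawn");
    return "skipped";
  }
  try {
    // Synchronous; can throw before the detached worker owns the lock.
    spawnWorker();
  } catch (err) {
    try {
      release(); // avoid leaking the lock for the whole stale window
    } catch (releaseErr) {
      log(`releaseLock failed after spawn error: ${String(releaseErr)}`);
    }
    throw err; // the original spawn error reaches main().catch
  }
  // From here on, the worker releases the lock in its own finally block.
  return "spawned";
}
```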

… paths

Real invariants, not coverage-chasing. Each test ties to a behavior a future refactor could plausibly break:

- fetch ok:false short-circuits getLatestVersion to null (GitHub outage → no autoupdate attempt)
- response missing the 'version' field falls through cleanly instead of passing `undefined` into isNewer
- latest == current hits the "up to date" branch; covers the > vs >= boundary in isNewer
- codex/session-start-setup.ts rejects an unsafe version tag (`v0.0.0;rm -rf`) before reaching execSync. Security guard: removing the regex breaks the build.
- codex/session-start.ts forwards the full CodexSessionStartInput JSON to the detached setup process stdin. A silent subset re-serialization would corrupt the placeholder row.
- creds without workspaceId fall back to "default" in the additional context so users never see "workspace: undefined".

Aggregate branch coverage on the 8 PR files: 266/296 (89.86%).
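A sketch of the version helpers these tests exercise. `isNewerSketch` and `safeVersionTagSketch` are hypothetical, and the `v<major>.<minor>.<patch>` pattern is an assumed stand-in for the real branch-safe tag regex. The comparison is strict, so equal versions take the "up to date" path, and anything that is not a plain version string is rejected before it could reach a shell command.

```typescript
// Assumed safe-tag shape: "v" plus three numeric components and nothing else.
const SAFE_VERSION_TAG = /^v\d+\.\d+\.\d+$/;

function parseVersionSketch(v: string): [number, number, number] | null {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Strictly greater: equal versions are "up to date", not newer.
function isNewerSketch(latest: string, current: string): boolean {
  const a = parseVersionSketch(latest);
  const b = parseVersionSketch(current);
  if (a === null || b === null) return false;
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return false;
}

// Returns a tag safe to pass on, or null (missing field, wrong type, or injection attempt).
function safeVersionTagSketch(version: unknown): string | null {
  if (typeof version !== "string") return null;
  const tag = version.startsWith("v") ? version : `v${version}`;
  return SAFE_VERSION_TAG.test(tag) ? tag : null;
}
```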

The virtual /index.md served from the Deeplake-backed memory path was only listing rows from the `memory` table (summaries), so in workspaces where the memory table is empty or has been dropped (e.g. locomo_benchmark/baseline) the index falsely reported "0 sessions" / "1 sessions" even when the `sessions` table held hundreds of rows. Agents reading the index would conclude memory was empty and give up on retrieval. Extend `buildVirtualIndexContent` to accept both summary and session rows and render them under `## Summaries` and `## Sessions` sections, with a combined header like `273 entries (1 summaries, 272 sessions):`. Update the fallback branch in `readVirtualPathContents` to query both tables in parallel and pass the results to the new builder. Verified against the locomo baseline benchmark: the same three QAs that previously saw a 1-entry index (conv 0 / qa 6, 25, 46) now receive the full listing on the fast-path cat index.md call, and the generated index matches the 272 sessions ingested into the baseline workspace.

Lock in the fix that made `buildVirtualIndexContent` aware of session rows and the fallback path in `readVirtualPathContents` query both tables when /index.md has no physical row.

New unit tests for `buildVirtualIndexContent`:

- renders both sections with a combined "N entries (X summaries, Y sessions):" header when both tables have rows, with Summaries listed before Sessions
- renders only sessions when the memory table is empty (guards the baseline_cloud regression where the old output reported "0 sessions:" despite 272 rows in the sessions table)
- stays backwards-compatible for callers that pass only summary rows
- produces a well-formed empty index when both inputs are empty

New integration tests for `readVirtualPathContents`:

- when /index.md has no physical row, the fallback issues three queries (union for exact paths + two parallel fallback queries) and each fallback targets the correct table and LIKE filter
- the synthesized index still renders summaries if the sessions-table fallback query rejects

One existing test (`reads multiple exact paths in a single query and synthesizes /index.md when needed`) was updated to expect three calls instead of two, matching the new dual-table fallback behavior.

…al QAs

Adds integration coverage for the three LoCoMo QAs that cloud baseline got wrong before the /index.md fix landed (conv_0 questions 6, 25, 46):

- qa_6: "When is Melanie planning on going camping?" (gold: June 2023)
- qa_25: "When did Caroline go to the LGBTQ conference?" (10 July 2023)
- qa_46: "Would Melanie be considered an ally..." (Yes, she is supportive)

Each QA is driven through `processPreToolUse` twice, once via the Read-tool intercept (`Read /home/.deeplake/memory/index.md`) and once via the Bash intercept (`cat /home/.deeplake/memory/index.md`), against a DeeplakeApi mock that mirrors the real sessions-only baseline workspace at the time of the regression (memory table empty, 272 rows across conv_0..9 in the sessions table). The assertions verify the synthesized index reports "272 entries (0 summaries, 272 sessions):", contains the specific session file each QA needed (conv_0_session_2 for the camping date, conv_0_session_7 for the conference, conv_0_session_10 for the ally question), and does not regress to "0 sessions:" or "1 sessions:" headers.

The suite also exercises the pure builder and the `readVirtualPathContents` fallback against the same 272-row fixture so the regression is caught at the unit, integration, and entry-point boundaries. Tests run hermetically by stubbing the disk-backed session cache so they do not read or write ~/.deeplake/query-cache/. Verified by temporarily reverting the fix on virtual-table-query.ts: all eight assertions fail without the fix (0 sessions: header, missing session paths), then pass cleanly once the fix is restored.

Claude Code hooks replace the tool input with whatever `updatedInput` they emit. The pre-tool-use hook was always emitting `{command, description}` (the Bash-tool shape) even when the incoming tool was Read. The Read implementation then read `updatedInput.file_path`, found `undefined`, and crashed with:

"The 'path' property must be of type string, got undefined"

Claude wasted a turn (or more) recovering by re-issuing the read as a Bash `cat`. In the plugin-v8-optimizations-100 run (memory table populated, 272 summaries), 60 / 100 transcripts contained this error. In the sessions-only baseline_cloud run it was even worse because the recovery path hit fix #1's `/index.md` bug on top.

The fix teaches the hook to materialize Read intercepts into a real file on disk and return the path:

- Add an optional `file_path` field to ClaudePreToolDecision. When present, main() emits `updatedInput: {file_path}` instead of the Bash-shaped `{command, description}`.
- Add `writeReadCacheFile(sessionId, virtualPath, content)` which writes into `~/.deeplake/query-cache/<sessionId>/read/<virtualPath>`, mirroring the per-session cache the index already uses. Cleanup reuses the existing session-end path.
- Add `buildReadDecision(file_path, description)` so the call site is explicit about the Read-tool shape.
- Branch in the direct-read code path: when `input.tool_name === "Read"`, write the fetched content via `writeReadCacheFile` and return `buildReadDecision(...)`. Bash cat / head / tail / wc keep their existing `echo <content>` shape.
- Thread `writeReadCacheFileFn` through the existing deps so tests can stub it and stay hermetic.

Test updates:

- `hooks-source.test.ts > reuses cached /index.md content ...` now asserts `directDecision?.file_path` instead of `.command` for the Read variant, with a stubbed cache writer that captures the written content.
- `hooks-source.test.ts > uses direct grep, direct reads, listings ...` updated the Read assertion the same way.
- `pre-tool-use-baseline-cloud-3qa.test.ts` Read cases now assert that the decision carries `file_path` (bug #2 guard) while the Bash cases confirm `command` still exists (bash shape preserved).

Verified: stashing the fix causes all three Read-tool per-QA tests to fail; restoring the fix makes them pass.

End-to-end verified against locomo_benchmark/baseline (272 sessions, memory dropped) on a 5-QA subset spanning conv 0 questions 6 / 25 / 29 / 46 / 62: five QAs that baseline-local answered correctly and the original baseline_cloud run got wrong. Post-fix run: 5 / 5 correct, 0 occurrences of "property must be of type string" across the five transcripts. (Haiku happened to pick Bash over Read for each QA in this run, so the Read intercept didn't fire in-flight; the unit tests and the earlier fix1b transcript where Read was attempted cover that path.)

…IND_DEBUG

The race fix and worker lifecycle have several silent try/catch blocks around releaseLock and tmpdir cleanup. Silent failures there mean a lock-leak or a leftover /tmp/deeplake-wiki-<id>-<ts> directory would be impossible to diagnose from the user side. Converts every such catch to a debug-gated log instead of a silent swallow. The `_log` helper in src/utils/debug.ts only writes to ~/.deeplake/hook-debug.log when HIVEMIND_DEBUG=1, so this adds zero noise in normal runs but full traceability when the user opts in.

Covered paths:

- session-end.ts / codex/stop.ts: the spawn-wrapping catch that releases the lock on spawn failure (flagged by the PR review)
- capture.ts / codex/capture.ts: the same pattern in the periodic trigger helper
- wiki-worker.ts / codex/wiki-worker.ts: the finally-block releaseLock AND the tmpdir cleanup; uses a new dlog() helper so we do not pollute deeplake-wiki.log (which is unconditional and user-visible)
- summary-state.ts: the RMW lock cleanup paths in withRmwLock (both the stale-reclaim unlink and the finally unlink) and tryAcquireLock / releaseLock (ENOENT is filtered, since that is the normal "lock wasn't held" case; everything else is worth seeing)

Manually verified: with HIVEMIND_DEBUG unset, a forced EACCES on the lock unlink produces no log. With HIVEMIND_DEBUG=1, the same failure lands in hook-debug.log.
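A sketch of the debug-gated pattern, with the log sink injected so it can be checked without touching `~/.deeplake/hook-debug.log`. `makeDebugLogSketch` and `releaseLockWithDebugLogSketch` are hypothetical stand-ins for `_log` / `dlog()` and the release paths; reading `HIVEMIND_DEBUG` on every call rather than once at import is an assumption.

```typescript
type SinkSketch = (line: string) => void;

// Returns a logger that writes only when HIVEMIND_DEBUG=1, checked on every call.
function makeDebugLogSketch(sink: SinkSketch): (msg: string) => void {
  return (msg: string) => {
    if (process.env.HIVEMIND_DEBUG !== "1") return; // zero noise unless opted in
    sink(`${new Date().toISOString()} ${msg}`);
  };
}

// ENOENT is the normal "lock wasn't held" case and stays silent; any other
// failure goes to the debug log instead of being swallowed.
function releaseLockWithDebugLogSketch(unlink: () => void, debugLog: (msg: string) => void): void {
  try {
    unlink();
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "ENOENT") return;
    debugLog(`releaseLock failed: ${code ?? String(err)}`);
  }
}
```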

…ons/* Read

Extends the integration test suite for fix #1 and fix #2 with two more QAs, qa_3 (Caroline's research) and qa_29 (Melanie's pottery workshop), bringing the REAL_QAS pool to five. qa_3 specifically maps to the Read calls that fired in the `baseline_cloud_9qa_read_candidates_fix2` benchmark run (three Read calls, all against memory paths), so including it anchors the test suite to live behavior observed on the sessions-only `locomo_benchmark/baseline` workspace.

Adds a dedicated test for the other Read-tool regression surface: a Read against a /sessions/<file>.json path (not only /index.md). The same benchmark run showed Haiku calling `Read /home/.deeplake/memory/sessions/conv_0_session_{1,2}.json` directly; the new test feeds that exact shape through `processPreToolUse`, asserts that the decision carries `file_path` (not `command`), and verifies that the session JSON body is materialized to the read cache at the expected virtual path.

Renames the test file from `pre-tool-use-baseline-cloud-3qa.test.ts` to `pre-tool-use-baseline-cloud.test.ts` now that it covers more than three QAs.

Verification: 13 / 13 tests pass. Temporarily stashing the fix #2 source change makes the new per-QA Read assertions and the /sessions Read assertion all fail (decision.file_path is undefined); restoring the source brings them back to green.

Three helpers were copy-pasted across the CC + Codex session-start surface and the capture hooks. They are now extracted into dedicated modules, so there is one place to change and one place to test:
- src/utils/version-check.ts: getInstalledVersion, getLatestVersion, isNewer. Previously duplicated across session-start.ts, session-start-setup.ts and their Codex twins. The CC variant reads .claude-plugin/plugin.json and the Codex variant reads .codex-plugin/plugin.json; callers now pass the manifest dir as a parameter.
- src/utils/wiki-log.ts: makeWikiLogger factory. Four files had an identical wikiLog body differing only by the ~/.claude vs ~/.codex hook dir. spawn-wiki-worker.ts (CC + Codex) and the two session-start files now take the logger from the factory. The user-visible log path and content format are unchanged.
- src/utils/session-path.ts: buildSessionPath. Identical in capture.ts, codex/capture.ts and codex/stop.ts.

No behavior change: typecheck + 657 tests + build all clean. Removed now-unused imports (readFileSync, mkdirSync, appendFileSync, utcTimestamp) from the consumer files.
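The factory extraction can be sketched as below, assuming the only difference between the four copies was the hook directory. The signature, the injectable `append` parameter, and the `deeplake-wiki.log` file name are assumptions for illustration, not the contents of src/utils/wiki-log.ts.

```typescript
import { appendFileSync, mkdirSync } from "node:fs";
import { dirname, join } from "node:path";

export type WikiLog = (msg: string) => void;
type AppendFn = (file: string, line: string) => void;

// Default sink: make sure the hook dir exists, then append one line.
const appendToFile: AppendFn = (file, line) => {
  mkdirSync(dirname(file), { recursive: true });
  appendFileSync(file, line);
};

// Hypothetical signature. The CC and Codex copies differed only in the hook
// directory, so it becomes a parameter and the body exists once.
export function makeWikiLogger(
  hooksDir: string,
  append: AppendFn = appendToFile,
  fileName = "deeplake-wiki.log", // assumed file name
): WikiLog {
  const logPath = join(hooksDir, fileName);
  return (msg) => append(logPath, `[${new Date().toISOString()}] ${msg}\n`);
}
```

A Claude Code caller would pass something under `~/.claude` and a Codex caller the `~/.codex` equivalent; the exact subdirectory is not stated in the PR.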

Adds a jscpd step to the CI pipeline that fails if code duplication exceeds 7% of the src/ tree. 7% is the current baseline: the number exists to catch a new clone, not to force a rewrite of the residual duplication in hooks/capture.ts vs hooks/codex/capture.ts (those two handle different event shapes, and unifying them would be a separate refactor).
- npm run dup runs jscpd on src/ with min 10 lines / 60 tokens
- CI uploads jscpd-report/*.md on every run so reviewers can see the exact clone locations when the check fails
- .gitignore excludes the report dir
- package.json's ci script now chains typecheck → dup → test

The markdown report pinpoints every clone with file:line ranges, so when a future PR pushes duplication above the threshold, the reviewer immediately sees which block was copy-pasted where.

Claude Code's Bash tool merges the child process's stderr into the tool_result string the model sees. When a user or CI had HIVEMIND_TRACE_SQL=1 or HIVEMIND_DEBUG=1 exported, every SQL query issued by the shell bundle during `node shell-bundle -c "..."` wrote a `[deeplake-sql] query start:` line to stderr, and all of it landed in Claude's view of the command output, drowning out the real data. Confirmed on the original baseline_cloud-100 run: 35+ trace lines across the transcripts, interleaved with the bash command results Claude was trying to parse. In several QAs the SQL noise replaced the useful output entirely (exit code 1 + trace lines → Claude concluded "no matches").

Two-part fix:
1. Move the TRACE_SQL / DEBUG_FILE_LOG env checks out of the top-level module constants in `src/deeplake-api.ts` and into the `traceSql` function body. The check now runs on every call, so callers that import the SDK can still flip the env vars at runtime. (Previously the constants were frozen at module load, so any later `delete` had no effect.)
2. In `src/shell/deeplake-shell.ts`, detect one-shot mode (`-c` in argv) up front and `delete process.env[...]` the four trace variables before doing anything else. Interactive REPL mode keeps the env untouched, so developers still get `[deeplake-sql]` lines when they set the vars intentionally.

Test coverage in `claude-code/tests/shell-bundle-sql-trace-silence.test.ts`:
- Spawns the built `claude-code/bundle/shell/deeplake-shell.js` with fake creds and HIVEMIND_TRACE_SQL / DEEPLAKE_TRACE_SQL / HIVEMIND_DEBUG / DEEPLAKE_DEBUG all set to "1", pointed at an unreachable API URL with a 200ms query timeout. After the SQL query fails (expected), asserts that stderr is free of `[deeplake-sql]` lines.
- A source-level check confirms that `traceSql` reads the env vars inside the function body (at runtime) rather than via a frozen top-level `const TRACE_SQL`.

Regression verified: stashing both source changes causes the bundle test to fail with the expected `[deeplake-sql] query fail:` line in stderr and the source-level test to report the reintroduced top-level const; restoring the source brings both back to green. End-to-end verified against `locomo_benchmark/baseline` on a 6-QA subset (conv 0 QAs 3 / 11 / 27 / 32 / 59 / 65). Before the fix, 2–4 SQL trace lines leaked into each QA's tool_result stream; after the fix, zero leaks across all six transcripts. qa_3 and qa_11 (already correct with fix #1 + fix #2) stay correct; the hard QAs (27, 32, 59, 65) still show judge-score variance under Haiku non-determinism, but they are no longer reading SQL noise as their "retrieval result".
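Both parts of the fix can be sketched together. The assumptions: the trace is gated on exactly the four variables the bundle test sets, and `traceSql` / `silenceTraceInOneShot` take injectable `write` and `env` parameters so the behavior is testable; the real signatures in `src/deeplake-api.ts` and `src/shell/deeplake-shell.ts` may differ.

```typescript
type Env = Record<string, string | undefined>;

// Assumed set of trace flags; the PR's bundle test sets exactly these four.
const TRACE_VARS = ["HIVEMIND_TRACE_SQL", "DEEPLAKE_TRACE_SQL", "HIVEMIND_DEBUG", "DEEPLAKE_DEBUG"];

// Reads the flags on every call. A top-level `const TRACE_SQL = ...` would be
// evaluated once at import time, so a later `delete process.env[...]` could
// not turn tracing off.
export function traceSql(
  msg: string,
  write: (line: string) => void = (line) => { process.stderr.write(line); },
  env: Env = process.env,
): void {
  if (!TRACE_VARS.some((name) => env[name] === "1")) return;
  write(`[deeplake-sql] ${msg}\n`);
}

// One-shot mode (`-c` in argv): strip the trace flags before anything else
// runs, because Claude Code merges stderr into the tool_result. Interactive
// mode leaves the environment untouched.
export function silenceTraceInOneShot(argv: string[], env: Env = process.env): boolean {
  if (!argv.includes("-c")) return false;
  for (const name of TRACE_VARS) delete env[name];
  return true;
}
```

The ordering matters: the scrub has to run before the first query, and it only works because `traceSql` consults the environment lazily.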

Both wiki-worker.ts files sat at 0% branch coverage after the refactor brought them into the PR diff (via the debug-log observability additions). This source-level test suite drives each worker through every significant branch:

Early exit:
- zero session events → log + exit, still releaseLock in finally
- null rows / null columns treated as empty

Happy path:
- fetch events + reconstruct JSONL (string rows)
- JSONB object rows serialize correctly via JSON.stringify
- existing summary on a resumed session → parse **JSONL offset**
- empty path SELECT falls back to /sessions/unknown/<sid>.jsonl
- prompt template expands all 7 placeholders
- agent label: "claude_code" for CC, "codex" for Codex
- execFileSync options include HIVEMIND_WIKI_WORKER=1 + HIVEMIND_CAPTURE=false to prevent the child from recursing

claude -p / codex exec failure:
- err.status on Error → logged and upload skipped
- err.message fallback when there is no .status

query retry logic:
- retries on 500 + the full Cloudflare class (401/403/429/502/503)
- non-retryable 400 → throws → main catches and logs fatal
- retry exhaustion → throws after the loop
- setTimeout spy stops real sleeps

finalize + release edges:
- finalizeSummary throw is logged, releaseLock still runs
- releaseLock throw in finally is swallowed, worker completes
- whitespace-only summary file skips upload AND finalize

Structural note: wiki.log now lives in a separate hooksDir from the worker's tmpDir, so the finally block's `rmSync(tmpDir)` does NOT delete the log file before tests can read it back. Aggregate branch coverage on the 14 PR files: 336/365 (92.05%).
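The retry cases listed above can be sketched as a small wrapper. The retryable status set comes from the test list; the attempt count, the doubling backoff, and the names `queryWithRetry` / `HttpError` are assumptions for illustration, not the worker's actual code. The injectable `sleep` plays the role the setTimeout spy plays in the tests.

```typescript
// Statuses the test list treats as retryable: 500 plus 401/403/429/502/503.
const RETRYABLE = new Set([401, 403, 429, 500, 502, 503]);

export class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

export interface RetryOptions {
  attempts?: number;                     // assumed default: 3
  baseDelayMs?: number;                  // assumed default: 500 ms, doubled per retry
  sleep?: (ms: number) => Promise<void>; // injectable so tests skip real waits
}

export async function queryWithRetry<T>(run: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const attempts = opts.attempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await run();
    } catch (err) {
      // Non-HTTP errors and non-retryable statuses (e.g. 400) propagate at once.
      if (!(err instanceof HttpError) || !RETRYABLE.has(err.status)) throw err;
      lastErr = err;
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  // Retries exhausted: surface the last retryable error to the caller.
  throw lastErr;
}
```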

`sqlLike(value)` escapes `_` and `%` in the value by prefixing them with backslashes, so callers can interpolate user-controlled strings inside `LIKE 'pattern'` literals. But the Deeplake SQL backend does not treat backslash as the LIKE escape character by default: without an explicit `ESCAPE '\'` clause, `\_` becomes two literal characters in the pattern instead of a literal `_`, so queries whose paths contain underscores silently return nothing.

Empirically reproduced on the `locomo_benchmark/baseline` workspace:
- `grep -l Caroline /home/.deeplake/memory/sessions/*.json` → returns 20+ session paths. This works because the glob has no underscores past the final slash, so sqlLike produces '/sessions/%.json'.
- `grep -i hike /home/.deeplake/memory/sessions/conv_0_session_*.json`
  - before this fix → returns (no matches), because the SQL becomes `path LIKE '/sessions/conv\_0\_session\_%.json'` and Deeplake matches `\_` literally against `_` → zero rows
  - after this fix → returns real matches (`ESCAPE '\'` added, so `\_` is now interpreted as a literal `_` and matches the underscored paths)

Same symptom in the 100-QA post-fix baseline_cloud run: 15 / 100 QAs that the local baseline answered correctly came back wrong or partial in the cloud, and the tool-call transcripts show repeated `(no matches)` on grep commands whose glob mentions `conv_<c>_session_*.json`.

The fix appends ` ESCAPE '\'` to every `LIKE '...'` clause that is fed from `sqlLike()`:
- src/shell/grep-core.ts:buildPathCondition: both the wildcard path branch and the directory-prefix branch.
- src/hooks/virtual-table-query.ts:buildDirFilter: the per-dir `path LIKE '<dir>/%'` clauses used by listVirtualPathRowsForDirs.
- src/hooks/virtual-table-query.ts:findVirtualPaths: both the memoryTable and sessionsTable branches, on both the path and the filename LIKE clauses.

The Codex/Claude Code find fallbacks and `bash-command-compiler`'s `find_grep` path ultimately call `findVirtualPaths`, so they inherit the fix without a local change.

The rebuild updates the 8 Claude Code and 8 Codex bundles. Verified via a targeted reproducer that drives `processPreToolUse` with the same glob commands against the real baseline workspace: all three underscored-glob greps return real matches after the fix, where previously they returned `(no matches)`.
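A sketch of why the explicit `ESCAPE '\'` matters. `escapeLike`, `globToLikeClause` and `likeMatches` are hypothetical helpers, not the PR's `sqlLike()`. `likeMatches` is a toy LIKE evaluator: with `escape` undefined it models a backend that has no default escape character, treating `\` as an ordinary character and the following `_` as the usual single-character wildcard. Escaping the backslash itself is an addition beyond the PR text, which mentions only `_` and `%`.

```typescript
const REGEX_META = /[.*+?^${}()|[\]\\]/g;

// Escape LIKE metacharacters with a backslash (backslash included).
export function escapeLike(value: string): string {
  return value.replace(/[\\_%]/g, (ch) => "\\" + ch);
}

// Turn a shell glob that only uses "*" into a LIKE clause with an explicit
// escape character, so "\_" means a literal underscore.
export function globToLikeClause(column: string, glob: string): string {
  const pattern = glob.split("*").map(escapeLike).join("%");
  const quoted = pattern.replace(/'/g, "''"); // SQL string-literal quoting
  return `${column} LIKE '${quoted}' ESCAPE '\\'`;
}

// Toy LIKE evaluator; `escape` undefined models "no escape character".
export function likeMatches(text: string, pattern: string, escape?: string): boolean {
  let re = "^";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (escape !== undefined && ch === escape && i + 1 < pattern.length) {
      re += pattern[++i].replace(REGEX_META, "\\$&"); // escaped char is literal
    } else if (ch === "%") {
      re += "[\\s\\S]*";
    } else if (ch === "_") {
      re += "[\\s\\S]";
    } else {
      re += ch.replace(REGEX_META, "\\$&");
    }
  }
  return new RegExp(re + "$").test(text);
}
```

Without the escape character the pattern demands a backslash the stored path never contains, which is the zero-row result reported above.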

…-cleanup fix: SessionEnd/Stop race with periodic worker + drop DEEPLAKE_ env var aliases

…review truncation

Claude Code's Bash tool silently persists any tool_result larger than ~16 KB to disk and replaces it with a 2 KB preview plus a path to the persisted file. The model almost never recovers from that replacement: in the locomo `baseline_cloud_100qa_fix123` run (100 QA, fixes #1 / #2 / #3 applied), 11 / 14 losing QAs that hit the persist path never read the persisted file even once and finished on the truncated 2 KB preview, which was rarely enough to carry the answer. Typical triggers from that run:
- `grep -r Caroline /home/.deeplake/memory/` → 66 KB of dialogue lines, because the name appears in nearly every session.
- `for f in /.../sessions/conv_0_session_*.json; do grep ...; done` → 926 KB of concatenated grep output (slow-path shell bundle).
- `cat /.../sessions/conv_0_session_*.json` (glob over many files) → tens of KB of JSON.

This fix introduces `src/utils/output-cap.ts` with `capOutputForClaude(output, {kind})` and applies it on the plugin's exit paths before Claude Code sees the result:
- `grep-direct.ts:handleGrepDirect`: caps grep's combined output.
- `bash-command-compiler.ts:executeCompiledBashCommand`: caps the final concatenation of compiled segments (cat / ls / find / grep / find_grep, incl. `&&` and `;` pipelines).
- `pre-tool-use.ts` direct read path: caps `cat` / `head` / `tail` Bash intercepts. Read-tool intercepts are unaffected: they write content to disk and return a `file_path`, so Claude Code's preview truncation puts no size pressure on them.
- `pre-tool-use.ts` direct `ls` and `find` fallbacks: capped too.

The cap is 8 KB (CLAUDE_OUTPUT_CAP_BYTES), comfortably under Claude Code's ~16 KB persist threshold and 4× the 2 KB preview the model used to get. When the cap fires, the output is truncated at a line boundary and the tail gets a short footer:

...
[grep truncated: 313 more lines (58.4 KB) elided — refine with '| head -N' or a tighter pattern]

The footer names the operation (grep / cat / ls / find / bash) and gives the model an actionable next step.

Unit tests in `claude-code/tests/output-cap.test.ts` (8 tests):
- No-op for inputs that fit the cap, including empty strings.
- Byte size after the cap is ≤ CLAUDE_OUTPUT_CAP_BYTES.
- Truncation aligns to line boundaries; footer line counts add up to the original total.
- A single oversized line (no newline) is byte-sliced with a footer.
- Custom `maxBytes` is honoured (no silent 1 KB floor).
- Default footer kind is "output" when no kind is passed.
- A realistic 400-line grep fixture that exceeds 16 KB gets capped to more than 4 KB and less than the cap, strictly more useful than the 2 KB preview.

The bundle rebuild propagates the change to the 8 Claude Code and 8 Codex bundles. Verified empirically via `processPreToolUse` against the real `locomo_benchmark/baseline` workspace:
- `grep -r Caroline /home/.deeplake/memory/`: before fix #5, ~66 KB of output, truncated by Claude Code to 2 KB; after fix #5, ~7.9 KB (313 lines kept, 313 more elided, footer).
- `grep -r 'Caroline|Melanie' /home/.deeplake/memory/`: before, ~70 KB; after, ~7.9 KB with a footer reporting 391 lines elided.
- `cat /home/.deeplake/memory/sessions/conv_0_session_1.json`: ~2 KB, unchanged, well under the cap.

Expected impact on the 100-QA baseline_cloud benchmark: the 11 QAs that lost points purely because of the 2 KB preview now see up to 8 KB of the same grep output. Combined with fix #4 (19 QAs with (no matches) from SQL LIKE under-escaping), the plugin should close the remaining ~7.5 pt gap to the local-files baseline (75.0 %) and likely match or exceed it.
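A minimal sketch of a line-aligned byte cap with the behaviors the unit tests list: no-op under the cap, line-boundary truncation with a footer whose counts add up, trailing-newline handling, and a byte-sliced single oversized line. `capOutputSketch` is a hypothetical stand-in for `capOutputForClaude`; the footer wording only approximates the one quoted above, and reserving room for a worst-case footer is this sketch's choice, not necessarily the PR's.

```typescript
import { Buffer } from "node:buffer";

// Mirrors the 8 KB budget the PR calls CLAUDE_OUTPUT_CAP_BYTES.
export const CAP_BYTES = 8 * 1024;

export function capOutputSketch(
  output: string,
  opts: { kind?: string; maxBytes?: number } = {},
): string {
  const kind = opts.kind ?? "output";
  const maxBytes = opts.maxBytes ?? CAP_BYTES;
  const totalBytes = Buffer.byteLength(output, "utf8");
  if (totalBytes <= maxBytes) return output;

  // "a\nb\n".split("\n") is ["a", "b", ""]: drop the empty terminator so the
  // elided-line count is not off by one.
  const lines = output.split("\n");
  if (lines.length > 1 && lines[lines.length - 1] === "") lines.pop();

  const footer = (elidedLines: number, elidedBytes: number): string =>
    `[${kind} truncated: ${elidedLines} more lines (${(elidedBytes / 1024).toFixed(1)} KB) elided; ` +
    `refine with '| head -N' or a tighter pattern]`;

  // Reserve room for the largest footer this input could need.
  const budget = maxBytes - Buffer.byteLength(footer(lines.length, totalBytes), "utf8");

  const kept: string[] = [];
  let used = 0;
  for (const line of lines) {
    const cost = Buffer.byteLength(line, "utf8") + 1; // +1 for the newline
    if (used + cost > budget) break;
    kept.push(line);
    used += cost;
  }

  if (kept.length === 0) {
    // A single oversized first line: byte-slice it rather than drop it all.
    const head = Buffer.from(lines[0], "utf8")
      .subarray(0, Math.max(0, budget - 1))
      .toString("utf8")
      .replace(/\uFFFD+$/, ""); // drop a partially cut multi-byte character
    const headBytes = Buffer.byteLength(head, "utf8");
    return `${head}\n${footer(lines.length - 1, totalBytes - headBytes)}`;
  }

  return `${kept.join("\n")}\n${footer(lines.length - kept.length, totalBytes - used)}`;
}
```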

claude · 2026-04-20T23:31:30Z

Claude finished @efenocchi's task in 3m 3s

PR Review

Read CLAUDE.md and gather context
Review PR diff
Check key changed files
Post inline comments

3 issues found (1 security, 1 correctness, 1 dead code):

Security: writeReadCacheFile (src/hooks/pre-tool-use.ts:76) has no path-traversal guard. path.join resolves ../ sequences in virtualPath, so a DB entry with a traversal path could write outside ~/.deeplake/query-cache/<sessionId>/read/.
Dead/stale code: the inline /index.md fallback at src/hooks/pre-tool-use.ts:334-347 is now unreachable (fix #1 guarantees readVirtualPathContentFn returns non-null for /index.md), but if somehow reached it generates the old broken index: it queries only the memory table, uses the wrong header "sessions:", and omits ## Sessions. It should be removed or replaced with a call to buildVirtualIndexContent.
Dead code: the cut variable in src/utils/output-cap.ts:50,59 mirrors running but is never read; in addition, output.split("\n") inflates totalLines by 1 when the output has a trailing newline.

Append per-file thresholds in vitest.config.ts for the two source files that materially changed in this PR, holding them to the same 90 / 90 / 90 / 90 bar already applied to the grep-dual-table files from PR #60:
- src/utils/output-cap.ts: new file, fix #5. Currently at 100 / 100 / 100 / 100 under the tests in claude-code/tests/output-cap.test.ts.
- src/hooks/virtual-table-query.ts: rewritten for fix #1 (dual-table index generation) and fix #4 (ESCAPE '\' on LIKE clauses). Currently at 98.9 / 93.2 / 95.8 / 98.9 under claude-code/tests/virtual-table-query.test.ts and claude-code/tests/pre-tool-use-baseline-cloud.test.ts.

Files left without new thresholds because their changes in this PR are small and localized:
- src/hooks/pre-tool-use.ts: added a Read-intercept branch and a writeReadCacheFile helper; the broader file is covered by hooks-source.test.ts, which is pre-failing on this branch (unrelated to the fixes in this PR).
- src/deeplake-api.ts: moved TRACE_SQL from a module-level const into the traceSql function body (fix #3).
- src/shell/deeplake-shell.ts: three env-var deletes in the one-shot entry (fix #3).
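A sketch of what the per-file bars could look like in vitest.config.ts. Vitest accepts glob patterns as keys under `coverage.thresholds`, and each matching file must meet its own bar. Only the two files named above are shown; a real config would also carry the existing PR #60 entries and would usually wrap the object in `defineConfig()` from `vitest/config`, omitted here to keep the snippet dependency-free.

```typescript
// Same 90 / 90 / 90 / 90 bar for every tracked file.
const perFileBar = { lines: 90, functions: 90, branches: 90, statements: 90 };

const config = {
  test: {
    coverage: {
      // Glob keys: each file matching the pattern is checked on its own.
      thresholds: {
        "src/utils/output-cap.ts": perFileBar,
        "src/hooks/virtual-table-query.ts": perFileBar,
      } as Record<string, typeof perFileBar>,
    },
  },
};

export default config;
```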

…sessions

# Conflicts:
# claude-code/bundle/capture.js
# claude-code/bundle/session-end.js
# claude-code/bundle/session-start-setup.js
# claude-code/bundle/session-start.js
# codex/bundle/capture.js
# codex/bundle/session-start-setup.js
# codex/bundle/session-start.js
# codex/bundle/stop.js
# src/hooks/capture.ts
# src/hooks/codex/capture.ts
# src/hooks/codex/session-start-setup.ts
# src/hooks/codex/session-start.ts
# src/hooks/codex/stop.ts
# src/hooks/session-end.ts
# src/hooks/session-start-setup.ts
# src/hooks/session-start.ts

…ix #4 Fix #4 (`3d15454`) appended `ESCAPE '\'` to every LIKE clause fed by `sqlLike()` so backslash-escaped `_` / `%` match their literal characters on the Deeplake backend. The existing buildPathFilter glob test still asserted the pre-fix SQL. Update the literal string and the regex so the assertion matches the new SQL shape, and annotate the case with a comment explaining why the ESCAPE clause is required.

The `pull_request.branches:` filter matches on the base branch of a PR. With `[main, dev]`, the CI workflow (typecheck + jscpd duplication check + coverage report) silently skipped any PR targeting a long-lived feature branch like `optimizations`. Only "PR Checks" and "Claude PR Review" ran on those PRs, so the coverage and dup report comments never showed up. Dropping the filter runs CI on every PR; the push side stays limited to main/dev so we don't double-run on personal branch pushes.

The merge of `origin/main` pulled in the canonical source refactors for the Codex hooks (session-start / session-start-setup / stop), but the corresponding tests on Davit's `optimizations` branch were written against an intermediate refactor state in which helpers like `runCodexSessionStartSetup`, `extractLastAssistantMessage`, `buildCodexStopEntry`, `runCodexStopHook`, and the matching `claude-code/tests/hooks-source.test.ts` imports never made it into the exported surface. CI was failing with 39 `TypeError: X is not a function` errors.

Two broken test files are deleted (they never existed on `origin/main`, and their coverage is already provided by the canonical suites added by PR #62, which landed on `main` and came in with this merge):
- `claude-code/tests/hooks-source.test.ts` (894 LOC, 19 / 30 failing)
- `codex/tests/codex-source-hooks.test.ts` (1126 LOC, 20 / 28 failing)

The canonical replacements from `main` cover the same ground:
- `claude-code/tests/capture-hook.test.ts`
- `claude-code/tests/session-start-hook.test.ts`
- `claude-code/tests/session-start-setup-hook.test.ts`
- `claude-code/tests/session-end-hook.test.ts`
- `claude-code/tests/codex-capture-hook.test.ts`
- `claude-code/tests/codex-session-start-hook.test.ts`
- `claude-code/tests/codex-session-start-setup-hook.test.ts`
- `claude-code/tests/codex-stop-hook.test.ts`
- `claude-code/tests/codex-wiki-worker.test.ts`

Two test files had also merged in Davit-branch test blocks that asserted stale session-start prompt wording. Restored to main's version:
- `claude-code/tests/session-start.test.ts`: dropped the "steers recall tasks toward index-first exact file reads" block; main's session-start prompt uses different phrasing.
- `codex/tests/codex-integration.test.ts`: restored main's assertions ("Do NOT jump straight to JSONL" instead of "Do NOT jump straight to raw session files").

Verified: `npx vitest run` passes 837 / 837 tests across 39 files. Per-file coverage thresholds are unaffected (output-cap.ts 100%, virtual-table-query.ts 98.9% lines, grep-core.ts / grep-direct.ts / grep-interceptor.ts / session-queue.ts all above their bars).

github-actions · 2026-04-21T00:00:14Z

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status	Category	Percentage	Covered / Total
🟢	Lines	93.52% (🎯 90%)	1718 / 1837
🟢	Statements	92.24% (🎯 90%)	1997 / 2165
🟢	Functions	92.89% (🎯 90%)	222 / 239
🔴	Branches	88.27% (🎯 90%)	1377 / 1560

File Coverage — 24 files changed

File	Stmts	Branches	Functions	Lines
`src/deeplake-api.ts`	🟢 96.9%	🔴 88.9%	🟢 97.3%	🟢 98.2%
`src/hooks/bash-command-compiler.ts`	🟢 94.1%	🔴 87.4%	🟢 96.2%	🟢 99.0%
`src/hooks/capture.ts`	🟢 100.0%	🟢 96.9%	🟢 100.0%	🟢 100.0%
`src/hooks/codex/capture.ts`	🟢 100.0%	🟢 93.8%	🟢 100.0%	🟢 100.0%
`src/hooks/codex/session-start-setup.ts`	🟢 98.5%	🔴 86.7%	🟢 100.0%	🟢 98.3%
`src/hooks/codex/session-start.ts`	🟢 100.0%	🟢 93.8%	🟢 100.0%	🟢 100.0%
`src/hooks/codex/spawn-wiki-worker.ts`	🔴 0.0%	🔴 0.0%	🔴 0.0%	🔴 0.0%
`src/hooks/codex/stop.ts`	🟢 98.5%	🟢 90.5%	🟢 100.0%	🟢 98.2%
`src/hooks/codex/wiki-worker.ts`	🟢 98.8%	🟢 97.1%	🟢 100.0%	🟢 98.7%
`src/hooks/grep-direct.ts`	🟢 97.0%	🟢 92.9%	🟢 100.0%	🟢 98.4%
`src/hooks/pre-tool-use.ts`	🟢 98.6%	🟢 90.0%	🟢 93.8%	🟢 98.9%
`src/hooks/session-end.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/hooks/session-start-setup.ts`	🟢 98.0%	🟢 90.0%	🟢 100.0%	🟢 97.8%
`src/hooks/session-start.ts`	🟢 98.9%	🟢 93.2%	🟢 100.0%	🟢 98.8%
`src/hooks/spawn-wiki-worker.ts`	🔴 0.0%	🔴 0.0%	🔴 0.0%	🔴 0.0%
`src/hooks/summary-state.ts`	🔴 77.6%	🔴 84.1%	🟢 92.8%	🔴 78.7%
`src/hooks/virtual-table-query.ts`	🟢 98.1%	🟢 93.2%	🟢 95.8%	🟢 98.9%
`src/hooks/wiki-worker.ts`	🟢 98.8%	🟢 97.1%	🟢 100.0%	🟢 98.7%
`src/shell/deeplake-shell.ts`	🔴 0.0%	🔴 0.0%	🔴 0.0%	🔴 0.0%
`src/shell/grep-core.ts`	🟢 96.2%	🟢 90.3%	🟢 97.0%	🟢 100.0%
`src/utils/output-cap.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/utils/session-path.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%
`src/utils/version-check.ts`	🟢 90.3%	🔴 85.0%	🟢 100.0%	🟢 96.0%
`src/utils/wiki-log.ts`	🟢 100.0%	🟢 100.0%	🟢 100.0%	🟢 100.0%

_Generated for commit 1e0dd69._

…ine count

Three issues flagged by the automated review on PR #63:
1. `writeReadCacheFile` (src/hooks/pre-tool-use.ts) had no containment guard: `path.join(cacheRoot, session, "read", rel)` resolves `..` segments in `rel`, so a DB-controlled `virtualPath` could escape the per-session cache dir. Added a check that `absPath` stays under `expectedRoot = join(cacheRoot, session, "read")`; otherwise it throws `"writeReadCacheFile: path escapes cache root: <abs>"`. Uses `path.sep` so the boundary check is correct on any platform.
2. The inline `/index.md` fallback in `processPreToolUse` (pre-tool-use.ts:334-347) was unreachable after fix #1 landed, and if somehow reached it would regenerate the old broken single-table index (queries only `memory`, uses the header "${n} sessions:", omits `## Sessions`). Removed; the dual-table builder in `virtual-table-query.ts` now owns index generation exclusively.
3. `src/utils/output-cap.ts` had a dead `cut += lineBytes` accumulator (which would trigger `noUnusedLocals` under a strict TS config) and a trailing-newline off-by-one: `output.split("\n")` on `"a\nb\n"` returns `["a", "b", ""]`, so `totalLines` over-counted by 1 whenever the input ended with a newline, which grep and cat output both do. The footer reported one extra "elided line" that was the empty terminator, not a real content line. Dropped the dead accumulator and adjusted totalLines to subtract the trailing empty entry.

Test coverage:
- `claude-code/tests/pre-tool-use-baseline-cloud.test.ts`: 4 new cases on `writeReadCacheFile`: happy path, `../../../etc/passwd` traversal refused (with no file landing anywhere under cacheRoot), absolute-root escape refused, and a path that normalizes back inside the cache (`/sessions/foo/../bar.json`) still accepted. Plus one integration test that pins the removal of the inline /index.md fallback: `processPreToolUse` must materialize the dual-table builder's content and must NOT issue its own `FROM "memory" WHERE path LIKE '/summaries/%'` SELECT.
- `claude-code/tests/output-cap.test.ts`: 2 new cases on line counting: with a trailing newline the kept-lines + elided-lines sum matches the original line count exactly (no off-by-one), and without a trailing newline the count is still exact.

Full suite: 844 / 844 tests passing.
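A sketch of the containment guard described in item 1. `resolveReadCachePath` and `writeReadCacheFileSketch` are hypothetical names; the guard is split into a pure function so it can be checked without writing files. Comparing against `expectedRoot + sep` rather than `expectedRoot` alone also rejects a sibling directory such as `.../read-evil` and the cache root itself.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { dirname, join, resolve, sep } from "node:path";

// Resolve where a virtual path would land inside the per-session read cache,
// refusing anything that normalizes to a location outside it.
export function resolveReadCachePath(cacheRoot: string, sessionId: string, virtualPath: string): string {
  const expectedRoot = resolve(cacheRoot, sessionId, "read");
  // join() collapses "..", so "/sessions/foo/../bar.json" stays inside while
  // "../../../etc/passwd" climbs out; the prefix check below catches the latter.
  const absPath = resolve(join(expectedRoot, virtualPath));
  if (!absPath.startsWith(expectedRoot + sep)) {
    throw new Error(`writeReadCacheFile: path escapes cache root: ${absPath}`);
  }
  return absPath;
}

// Hypothetical wrapper: apply the guard, then perform the actual write.
export function writeReadCacheFileSketch(
  cacheRoot: string,
  sessionId: string,
  virtualPath: string,
  content: string,
): string {
  const absPath = resolveReadCachePath(cacheRoot, sessionId, virtualPath);
  mkdirSync(dirname(absPath), { recursive: true });
  writeFileSync(absPath, content);
  return absPath;
}
```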

…ed row The jscpd duplication check used to run as a step inside the "Typecheck and Test" job, so the PR checks table only showed a single aggregate row for both. Reviewers couldn't tell at a glance whether duplication passed without opening the combined log. Move jscpd into its own `duplication` job named "Duplication check". Small installation cost (extra `npm install`, runs in parallel with the test job) in exchange for clear attribution on the PR checks table. Artifact upload and the jscpd config stay the same.

PR #63 bot review flagged several source files as under-covered. Added a dedicated branch-coverage suite for the pre-tool-use hook and registered the two now-sufficient files in `vitest.config.ts` so their thresholds are enforced on every run.

`claude-code/tests/pre-tool-use-branches.test.ts`, 46 test cases:
- Pure helpers: buildAllowDecision, buildReadDecision, rewritePaths, touchesMemory, isSafe (positive + negative paths).
- getShellCommand: Grep hit + miss, Read on file + directory, Bash safe + unsafe + non-memory, Glob hit + miss, unknown tool → null.
- extractGrepParams: Grep output_mode=count, empty path → "/", Bash delegating to parseBashGrep, non-grep Bash → null, unknown tool → null.
- processPreToolUse end-to-end:
  - returns null for non-memory Bash
  - returns `[RETRY REQUIRED]` guidance for unsupported commands
  - falls back to the shell bundle when no config is loaded
  - Glob + Bash `ls` + Bash `ls -la` long format
  - ls with both file-level (-rw-) and directory (drwx) entries; also empty-name rows skipped by the `if (!name) continue` guard
  - cat / head / tail / wc -l / cat | head pipeline
  - find / find | wc -l
  - Grep tool delegates to handleGrepDirect; a null result falls through to the read/ls branch instead of short-circuiting
  - direct query throws → shell bundle fallback
- Index cache short-circuit: three cases covering the inline readVirtualPathContentsWithCache callback that the bash compiler passes into executeCompiledBashCommand: cache hit, cache miss (writes fresh index), and the empty cachePaths edge case.

Coverage after this suite (measured on pre-tool-use-branches + pre-tool-use-baseline-cloud):

File	Lines	Branches	Funcs	Stmts
src/hooks/pre-tool-use.ts	98.9	90.0	93.8	98.6
src/hooks/memory-path-utils.ts	100	90.9	100	100

Both are now registered under `coverage.thresholds` at 90 / 90 / 90 / 90 in `vitest.config.ts`, alongside the five existing PR-tracked files. Full suite: 890 / 890 passing (was 844 before this commit).

… paths

CI (HOME=/home/runner) reported two failures on the just-added branch coverage suite:

AssertionError: expected '/home/emanuele/.deeplake/memory/...' to be '/sessions/a.json'

The `rewritePaths` and `touchesMemory` assertions hardcoded my local home path. The real MEMORY_PATH in production is join(homedir(), ".deeplake", "memory"), so hardcoded absolute paths in tests don't survive anywhere except my workstation: not on CI, and not on another developer's machine.

Import `homedir` + `join` from node:os / node:path and build MEM_ABS once at the top of the file. The two affected cases now use template strings, so the values match whatever home the test runner is using. The other tests in the suite already use ~-prefixed literals, matched by the TILDE_PATH branch independently of homedir.

Verified: `env -i HOME=/home/runner PATH=$PATH npx vitest run` passes 46 / 46.
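A sketch of the portable setup: derive the absolute memory root from `homedir()` once, and build expected values from it instead of hardcoding a workstation path. `toVirtualPath` is a hypothetical stand-in for the prefix handling in `rewritePaths`, assuming both the absolute and the `~`-prefixed root map to the virtual `/`.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Built from the runner's home, never hardcoded, so the same assertions hold
// on a workstation and on CI (HOME=/home/runner).
export const MEM_ABS = join(homedir(), ".deeplake", "memory");
const TILDE_PATH = "~/.deeplake/memory";

// Hypothetical: map an absolute or ~-prefixed memory path to its virtual
// path, or return null when the path is outside memory.
export function toVirtualPath(p: string): string | null {
  for (const prefix of [MEM_ABS, TILDE_PATH]) {
    if (p === prefix) return "/";
    if (p.startsWith(prefix + "/")) return p.slice(prefix.length);
  }
  return null;
}
```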

…-sessions" This reverts commit b590669, reversing changes made to 21aff84.

efenocchi added 30 commits April 20, 2026 19:10

refactor(codex-capture): drop DEEPLAKE_CAPTURE fallback

f2ed8f2

Mirror of the Claude Code capture cleanup: converge on HIVEMIND_CAPTURE as the only env flag the hook honors.

refactor(session-start-setup): drop DEEPLAKE_WIKI_WORKER fallback

05faeb5

Same rationale as the other hook files: the wiki-worker flag is internal, set by the plugin on itself, and the migration alias adds noise without buying anything.

refactor(codex-session-start): drop DEEPLAKE_WIKI_WORKER fallback

7360c50

Mirror of the CC cleanup on the Codex side.

refactor(codex-session-start-setup): drop DEEPLAKE_WIKI_WORKER / DEEP…

5d8a0e8

…LAKE_CAPTURE fallbacks Final hook in the cleanup. After this, no DEEPLAKE_WIKI_WORKER or DEEPLAKE_CAPTURE reference survives in the shipped hook surface.

Merge remote-tracking branch 'origin/main' into fix/session-end-race-…

bc9cc30

…and-env-cleanup

efenocchi and others added 3 commits April 20, 2026 15:27

Merge pull request #62 from activeloopai/fix/session-end-race-and-env…

b382625

…-cleanup fix: SessionEnd/Stop race with periodic worker + drop DEEPLAKE_ env var aliases

chore: bump version to 0.6.38

df78072

claude Bot reviewed Apr 20, 2026


Comment thread src/hooks/pre-tool-use.ts Outdated

claude Bot reviewed Apr 20, 2026


Comment thread src/utils/output-cap.ts Outdated

claude Bot reviewed Apr 20, 2026


Comment thread src/utils/output-cap.ts Outdated

efenocchi added 5 commits April 20, 2026 23:36

efenocchi added 4 commits April 21, 2026 00:05

davidbuniat approved these changes Apr 21, 2026


efenocchi merged commit b590669 into optimizations Apr 21, 2026
3 checks passed

efenocchi added a commit that referenced this pull request Apr 21, 2026

Revert "Merge pull request #63 from activeloopai/fix/index-md-include…

033b6ed

…-sessions" This reverts commit b590669, reversing changes made to 21aff84.


fix: close 48-pt LoCoMo accuracy gap with 5 plugin hook fixes #63

efenocchi merged 42 commits into optimizations from fix/index-md-include-sessions


2 participants

Conversation

efenocchi commented Apr 20, 2026

Summary

Headline numbers

Per-category accuracy (full 100-QA runs)

Signal quality at the tool boundary

The five fixes

#1 — `/index.md` lists session files too (`4271baf`)

#2 — Read-tool intercepts return `file_path`, not `command` (`4c5d50b`)

#3 — Shell bundle silences `[deeplake-sql]` trace in one-shot mode (`35a7e87`)

#4 — `LIKE` clauses that consume `sqlLike()` output use `ESCAPE '\'` (`3d15454`)

#5 — Cap plugin tool output at 8 KB (`2c0d65d`)

Test coverage

Non-determinism caveat (honest)

Plugin workspace cross-check

Reproduction

Test plan


claude Bot commented Apr 20, 2026


PR Review


github-actions Bot commented Apr 21, 2026


Coverage Report
