Feat/locomo accuracy improvements by kaghni · Pull Request #55 · activeloopai/hivemind

kaghni · 2026-04-16T18:27:32Z

Summary

Version Bump

To trigger a release, bump "version" in package.json before merging.

Change type Version bump Example

Bug fix patch (1.2.0 → 1.2.1) "version": "1.2.1"

New feature minor (1.2.0 → 1.3.0) "version": "1.3.0"

Breaking change major (1.2.0 → 2.0.0) "version": "2.0.0"

If you don't bump the version, no release will be created.

Test plan

Tests pass locally (npm test)
Relevant new tests added
Version bumped in package.json, or no release needed for this change

When the primary table name ends with _sessions, automatically search the companion _memory table (wiki-style summaries) and prepend matches. This gives Claude structured context alongside raw session data. LoCoMo benchmark (20 questions, conv 0): Before: 45.0% (stock 0.6.32, sessions-only grep) After: 62.5% (cross-table enrichment) Delta: +17.5% Improvements: Q7 Q8 Q9 Q15 Q17 flipped WRONG->CORRECT Regressions: Q14 Q18 (2 questions regressed) Net: +5 correct, -2 regressed = +3 net correct answers Still below Zep (75.1%) and Memobase (75.8%) targets.

The virtual index.md was querying paths LIKE /summaries/% which only matched the default hivemind table layout. For benchmark tables like locomo_sessions, paths are /locomo_sessions/sessions/... so the index was always empty. Now auto-detects companion memory table (X_sessions -> X_memory) and falls back to the primary table. Descriptions from the memory table provide richer index entries. LoCoMo benchmark (30 questions): Before: 38.3% (cross-table only, broken index) After: 41.7% (cross-table + working index) Delta: +3.4% (6 improved, 4 regressed)

When Claude tries an unsupported command (python3, node, etc.) on a specific memory file path, automatically convert it to a cat/read instead of returning a hard RETRY error. This reduces retrieval failures since haiku often gives up after seeing the retry message. LoCoMo benchmark (30 questions): Before: 41.7% (cross-table + index fix) After: 45.0% (+ auto-read conversion) Delta: +3.3% Retrieval failures: 8 -> 5

When grep matches entries in the companion memory table, return the full summary content (not just matching lines). Summaries are compact structured wiki entries (Key Facts, People, Entities sections) that give Claude complete context in a single grep call, reducing the need for multiple follow-up reads. Also moved output array declaration before cross-table section to fix scope issue. LoCoMo benchmark (30 questions): Before: 45.0% (auto-read conversion) After: 51.7% (full summaries in grep) Delta: +6.7% (5 improved, 3 regressed) Retrieval failures: 5 (unchanged) Passed baseline (64.4% flat files). Still below Zep (75.1%).

When grep pattern contains multiple words (e.g. "Caroline birthday"), split into individual words and use SQL OR to match any word. This casts a wider net and finds sessions that mention any of the search terms, improving recall for complex queries. Single-word patterns remain unchanged (exact LIKE match). LoCoMo benchmark (30 questions): Before: 51.7% (full summaries in grep) After: 56.7% (+ smart multi-word OR grep) Delta: +5.0% Cumulative improvement from stock 0.6.32: 38.3% -> 56.7% (+18.4%) Approaching Zep (75.1%) and baseline flat files (64.4%).

Extract date_time from matched session JSON and prepend to the first matching line per file. This gives Claude temporal context without needing to read the full session, significantly helping time-based questions like "When did X happen?". LoCoMo benchmark (30 questions): Before: 56.7% (smart multi-word OR grep) After: 61.7% (+ session date in grep output) Delta: +5.0% Cumulative from stock 0.6.32: 38.3% -> 61.7% (+23.4%) Approaching baseline flat files (64.4%). Closing on Zep (75.1%).

claude · 2026-04-16T18:27:57Z

Claude finished @kaghni's task in 1m 41s —— View job

Review

Command injection (pre-tool-use.ts:202): cat ${cleanPath} is unsafe — \S+ can capture ;cmd or $(cmd) suffixes, enabling shell injection. Suggest quoting the path.
Inconsistent multi-word search (grep-direct.ts:160): companion memory table uses full-phrase LIKE while primary table uses per-word OR expansion — multi-word queries won't benefit from OR matching in the companion table.
Spurious @ts-ignore (grep-direct.ts:207): the expression is valid TS; the suppressor could silently mask a real error.

claude · 2026-04-16T18:29:40Z

+          hookEventName: "PreToolUse",
+          permissionDecision: "allow",
+          updatedInput: {
+            command: `cat ${cleanPath}`,


Command injection: cleanPath is extracted via \S+ which matches non-whitespace including ;, |, $(...) — a crafted command like ~/.deeplake/memory/foo;touch /tmp/x would become cat /foo;touch /tmp/x. Wrap in single quotes and escape embedded single quotes.

Suggested change

command: `cat ${cleanPath}`,

command: `cat '${cleanPath.replace(/'/g, "'\\''")}'`,

claude · 2026-04-16T18:29:49Z

+      : (sessionsTable !== table ? sessionsTable : null);
+    if (memoryTable && memoryTable !== table) {
+      try {
+        const contentFilter = ` AND summary ${likeOp} '%${escapedLike}%'`;


Inconsistent search logic: the primary table uses per-word OR filters for multi-word queries (line 131), but the companion memory table always uses escapedLike (full phrase), so multi-word searches won't find matches in the companion table that the primary search would catch. Apply the same word-filter logic here.

Suggested change

const contentFilter = ` AND summary ${likeOp} '%${escapedLike}%'`;

const words2 = pattern.split(/\s+/).filter((w: string) => w.length > 2);

const contentFilter = words2.length > 1

? ` AND (${words2.slice(0, 4).map((w: string) => `summary ${likeOp} '%${sqlLike(w)}%'`).join(" OR ")})`

: ` AND summary ${likeOp} '%${escapedLike}%'`;

claude · 2026-04-16T18:29:54Z

        const prefix = multi ? `${p}:` : "";
        const ln = lineNumber ? `${i + 1}:` : "";
-        matched.push(`${prefix}${ln}${lines[i]}`);
+        matched.push(`${prefix}${sessionDate}${ln}${lines[i]}`);


The @ts-ignore is unnecessary — sessionDate && (sessionDate = "") is valid TypeScript; the suppressor could hide a real type error on the next line. Use a plain if instead.

Suggested change

matched.push(`${prefix}${sessionDate}${ln}${lines[i]}`);

if (sessionDate) sessionDate = "";

Two fixes from PR #55 review: 1. SECURITY: Quote cleanPath in auto-read conversion to prevent command injection. Previously cleanPath (from user-provided memory path) was interpolated raw into `cat ${cleanPath}`, allowing crafted paths like "foo;touch /tmp/x" to execute arbitrary commands. Now wraps in single quotes and escapes embedded quotes. 2. CORRECTNESS: Apply same multi-word OR logic to companion memory table search. Primary table split multi-word queries into per-word OR filters, but companion memory table always used the full phrase, causing multi-word searches to miss matches in the companion table. LoCoMo benchmark impact: within variance (63-70% on 30q). Security fix essential; consistency fix makes search behavior uniform.

@ts-ignore

- Replace @ts-ignore comment with a plain if guard (no need to suppress the diagnostic — assigning to a let is valid TypeScript). - Rebuild claude-code and codex bundles so they match the committed source. CI was failing on bundle/ drift — the accuracy commits landed source changes but never re-ran npm run build.

The earlier auto-read change was too broad: it converted any unsupported command containing a memory-path into a cat, even when the path was embedded inside command substitution, backticks, or attached to a network request such as curl -d @<path> or wget. That hid the RETRY guidance for commands that are genuinely unsafe to run and broke 4 pre-tool-use tests. Tightened the check to two conditions: - Command starts with a known read-like interpreter (python/python3/ node/deno/bun/ruby/perl) — not curl, wget, or arbitrary tools. - No shell metacharacters (dollar-paren, backtick, semicolon, pipe, ampersand, angle brackets, parens, backslash). Also tightened the path regex from non-whitespace to word/dot/slash/ underscore/dash so trailing metacharacters cannot glue onto the extracted path. Tests: 353/353 passing (was 349/353 with 4 pre-tool-use failures).

kaghni · 2026-04-21T17:48:27Z

Superseded by a fresh branch on top of main. The first 3 commits of this branch (dual-table grep, index.md fix) have been effectively merged via #64 and the grep-core.ts refactor. Only 4 unique accuracy improvements remain, which we'll re-apply to main's new architecture in a new PR.

Balanced 60-question benchmark:

origin/main (latest): 13.3%
this branch: 40.8% (+27.5%)

See deeplake-blitz PR #1 for the full comparison and methodology.

kaghni added 6 commits April 16, 2026 09:20

claude Bot reviewed Apr 16, 2026

View reviewed changes

kaghni added 3 commits April 17, 2026 10:50

kaghni closed this Apr 21, 2026

kaghni mentioned this pull request Apr 21, 2026

feat: LoCoMo accuracy improvements — auto-read + multi-word OR #67

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/locomo accuracy improvements#55

Feat/locomo accuracy improvements#55
kaghni wants to merge 9 commits intomainfrom
feat/locomo-accuracy-improvements

kaghni commented Apr 16, 2026

Uh oh!

claude Bot commented Apr 16, 2026 •

edited

Loading

Uh oh!

claude Bot Apr 16, 2026

Uh oh!

claude Bot Apr 16, 2026

Uh oh!

claude Bot Apr 16, 2026

Uh oh!

kaghni commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Change type	Version bump	Example
Bug fix	patch (1.2.0 → 1.2.1)	`"version": "1.2.1"`
New feature	minor (1.2.0 → 1.3.0)	`"version": "1.3.0"`
Breaking change	major (1.2.0 → 2.0.0)	`"version": "2.0.0"`

	command: `cat ${cleanPath}`,
	command: `cat '${cleanPath.replace(/'/g, "'\\''")}'`,

-        const contentFilter = ` AND summary ${likeOp} '%${escapedLike}%'`;
+        const words2 = pattern.split(/\s+/).filter((w: string) => w.length > 2);
+        const contentFilter = words2.length > 1
+          ? ` AND (${words2.slice(0, 4).map((w: string) => `summary ${likeOp} '%${sqlLike(w)}%'`).join(" OR ")})`
+          : ` AND summary ${likeOp} '%${escapedLike}%'`;

	matched.push(`${prefix}${sessionDate}${ln}${lines[i]}`);
	if (sessionDate) sessionDate = "";

Conversation

kaghni commented Apr 16, 2026

Summary

Version Bump

Test plan

Uh oh!

claude Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review

Uh oh!

claude Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

kaghni commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Apr 16, 2026 •

edited

Loading