
fix(code_index): stabilize digest against runtime churn to stop rebuild thrash#5

Open
lanmower wants to merge 1 commit into main from fix/codeinsight-digest-stability

lanmower (Member) commented Jun 4, 2026

Problem

current_digest() hashed the full `git status --porcelain` output, so any uncommitted change flipped the digest -- including transient runtime emissions under .gm/ (exec-spool files, prd.yml, mem-*.json) that concurrent sessions churn constantly. Every flip forced index() to call clear_codeinsight() and re-embed every chunk of all ~98 files (~280s of in-wasm BGE embedding), wedging the watcher heartbeat for minutes. Because the tree was never clean, the digest never stabilized, and the full rebuild ran on nearly every codesearch.
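A minimal sketch of the pre-fix staleness trigger, assuming index() compares a stored digest against the current one and does a full clear-and-re-embed on any mismatch. The CodeIndex class, naive_digest(), the SHA-256 choice, and the example paths are all hypothetical; the real indexer is not part of this PR.

```python
import hashlib
from typing import Optional


def naive_digest(porcelain: str) -> str:
    # Pre-fix behaviour (sketch): hash the raw `git status --porcelain` text,
    # so ANY change to it, including .gm/ runtime files, yields a new digest.
    return hashlib.sha256(porcelain.encode("utf-8")).hexdigest()


class CodeIndex:
    """Hypothetical stand-in for the indexer: counts full rebuilds instead of embedding."""

    def __init__(self) -> None:
        self.stored_digest: Optional[str] = None
        self.full_rebuilds = 0

    def index(self, porcelain: str) -> None:
        digest = naive_digest(porcelain)
        if digest != self.stored_digest:
            # Stands in for clear_codeinsight() followed by re-embedding every chunk.
            self.full_rebuilds += 1
            self.stored_digest = digest


idx = CodeIndex()
idx.index(" M src/search.py\n?? .gm/exec-spool/a.json\n")
idx.index(" M src/search.py\n?? .gm/exec-spool/b.json\n")  # only runtime churn changed
idx.index(" M src/search.py\n?? .gm/exec-spool/b.json\n")  # nothing changed
print(idx.full_rebuilds)  # 2
```

The second call pays for a full rebuild even though no source file changed; with sessions writing under .gm/ continuously, that happens on nearly every call.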

Observed repeatedly on 2026-06-04: the .status.json ts was frozen for 200s+, busy_until was pushed minutes out, and the watcher log was stuck at code_index: indexing root=. files=98 while a concurrent loop session held it.

Fix

Filter the porcelain output to indexable source paths before hashing in current_digest():

  • drop anything under .gm/ (transient runtime state, never indexed)
  • drop any path whose extension has no tree-sitter language (lang_for_ext returns None)
  • sort the kept lines so ordering noise does not flip the hash
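The three filtering rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the patch itself: lang_for_ext() is stubbed with a made-up extension table, porcelain v1 parsing is simplified (C-style quoted paths are unquoted but not unescaped), SHA-256 is an arbitrary hash choice, and keeping a rename line when either side is indexable is an assumption the PR does not state.

```python
import hashlib
import os
from typing import List, Optional

# Hypothetical extension table standing in for lang_for_ext(); the real
# mapping is whatever tree-sitter grammars the indexer actually bundles.
_TREE_SITTER_LANGS = {".py": "python", ".rs": "rust", ".js": "javascript", ".ts": "typescript"}


def lang_for_ext(ext: str) -> Optional[str]:
    """Return a language name for an extension, or None if not indexable (stub)."""
    return _TREE_SITTER_LANGS.get(ext.lower())


def _paths_of(line: str) -> List[str]:
    # Porcelain v1 lines are "XY PATH"; renames/copies are "XY ORIG -> NEW".
    rest = line[3:]
    parts = rest.split(" -> ", 1) if " -> " in rest else [rest]
    return [p.strip('"') for p in parts]


def _indexable(path: str) -> bool:
    if path == ".gm" or path.startswith(".gm/"):
        return False  # transient runtime state, never indexed
    return lang_for_ext(os.path.splitext(path)[1]) is not None


def current_digest(porcelain: str) -> str:
    kept = [
        line
        for line in porcelain.splitlines()
        if len(line) > 3 and any(_indexable(p) for p in _paths_of(line))
    ]
    kept.sort()  # ordering noise must not flip the hash
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()


before = current_digest(" M src/search.py\n?? .gm/exec-spool/a.json\n")
after = current_digest("?? .gm/mem-1.json\n?? .gm/prd.yml\n M src/search.py\n")
print(before == after)  # True: only .gm/ churn and ordering changed
edited = current_digest(" M src/search.py\n M src/rank.py\n")
print(edited == before)  # False: a real source edit still flips the digest
```

Filtering whole porcelain lines, rather than just collecting paths, keeps the status codes in the hash, so a file going from modified to staged still counts as a change. Obtaining the porcelain text from git is left out of the sketch.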

A real source edit still flips the digest (sync-before-emit staleness preserved -- a stale index is never served after a genuine source change); runtime churn no longer does, so the full re-embed stops firing on every tick.

Smallest safe option: no schema change, no index-format change, no incremental-reindex rewrite -- only the staleness trigger is narrowed.

fix(code_index): stabilize digest against runtime churn to stop rebuild thrash

current_digest() hashed the full `git status --porcelain`, so ANY uncommitted
change flipped the digest -- including transient runtime emissions (.gm/exec-spool,
prd.yml, mem-*.json) that concurrent sessions churn constantly. Each flip forced
index() to clear_codeinsight() and re-embed every chunk of all ~98 files (~280s
in-wasm bge), wedging the watcher heartbeat for minutes; the digest never
stabilized so the full rebuild ran on nearly every codesearch.

Filter the porcelain to indexable-source paths only: drop anything under .gm/ and
any path whose extension has no tree-sitter language (lang_for_ext). A real source
edit still invalidates the index (sync-before-emit preserved); runtime churn no
longer does. Sort the kept lines before hashing so ordering noise does not flip it.