fix(worker): register HNSW companion tablets with Zero on schema mutation #9712
shaunpatterson wants to merge 2 commits into
…tion

When a schema mutation creates or rebuilds a `float32vector` predicate with an HNSW index, the indexer writes three companion tablets to Badger: `<pred>__vector_`, `<pred>__vector_entry`, and `<pred>__vector_dead`. These hold the HNSW graph structure.

The base predicate's tablet is registered with Zero via `groups().Tablet(pred)` at the top of `runSchemaMutation`, but the companion tablets are never registered. They exist only in Badger, not in Zero's raft membership state.

Two consequences:

1. Zero's 5-minute orphan reconciliation sweep (`zero.go::deletePredicates`) sees the tablets in Alpha's tablet-size reports, does not find them in its membership state, and emits delete instructions for them on every sweep. Alpha silently ignores those instructions via the `__vector_` filter in `worker/predicate_move.go`, but the log noise is permanent.
2. More seriously, Alpha refuses to serve queries against tablets that Zero has not acknowledged. `similar_to` queries against the affected predicate hang and time out, even though the HNSW data exists on disk.

The fix is to inform Zero of the companion tablets in the same loop that handles the base predicate. `groups().Inform()` is idempotent (it checks `ServingTablet` first), so this is safe on initial creation and on rebuilds. The registration is proposed through Zero's raft log, so it survives Zero restarts.

This was discovered while debugging a tenant where HNSW search returned 30 s timeouts after a successful schema reapply. After calling `pb.Zero/Inform` manually for the three companion tablet predicates, the tablets appeared in Zero's `/state`, the orphan sweep stopped emitting delete instructions for them, and `similar_to` queries returned results immediately.

Co-Authored-By: Claude <noreply@anthropic.com>
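The registration step described in this commit message can be illustrated with a minimal, self-contained Go sketch. It is not the patch itself: `registry`, `inform`, and `registerVectorPredicate` are hypothetical stand-ins for Zero's membership state and for `groups().Inform()`, and the boolean flags are an assumed simplification of the vector-type and index checks. Only the three companion suffixes come from the description above.

```go
package main

import "fmt"

// Suffixes of the three HNSW companion tablets named in the commit message.
var companionSuffixes = []string{"__vector_", "__vector_entry", "__vector_dead"}

// companionTablets returns the companion tablet names for a base predicate.
func companionTablets(pred string) []string {
	out := make([]string, 0, len(companionSuffixes))
	for _, s := range companionSuffixes {
		out = append(out, pred+s)
	}
	return out
}

// registry is a hypothetical in-memory stand-in for Zero's membership state.
type registry struct {
	serving   map[string]bool
	proposals int // number of registrations that were actually proposed
}

func newRegistry() *registry {
	return &registry{serving: map[string]bool{}}
}

// inform registers a tablet only if it is not already being served,
// so repeated calls for the same tablet are no-ops.
func (r *registry) inform(tablet string) {
	if r.serving[tablet] {
		return
	}
	r.serving[tablet] = true
	r.proposals++
}

// registerVectorPredicate registers the base predicate and, when it is a
// vector predicate with an index, its three companion tablets as well.
func registerVectorPredicate(r *registry, pred string, isVector, hasIndex bool) {
	r.inform(pred)
	if !isVector || !hasIndex {
		return
	}
	for _, t := range companionTablets(pred) {
		r.inform(t)
	}
}

func main() {
	r := newRegistry()
	registerVectorPredicate(r, "embedding", true, true) // initial creation
	registerVectorPredicate(r, "embedding", true, true) // rebuild: nothing new is proposed
	fmt.Println(len(r.serving), r.proposals)            // prints "4 4"
}
```

Because `inform` checks the serving set first, running the same schema mutation twice leaves the proposal count unchanged, which is the property that makes the call safe on rebuilds.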
After registering HNSW companions with Zero in `runSchemaMutation`, the backup builder now sees them via `state.Groups[gid].Tablets` and via the pre-existing schema-scan workaround, producing duplicate entries in the manifest predicate list and breaking `TestVectorBackupManifestPredicates`. The schema-scan path stays as a safety net for legacy clusters whose companions were never registered, but the merge is now idempotent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
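An idempotent merge of the two predicate sources could look roughly like the sketch below. This is an illustration under assumptions, not the backup builder's code: `mergePredicates` and its parameter names are hypothetical, and preserving first-seen order is an arbitrary choice made here.

```go
package main

import "fmt"

// mergePredicates returns the union of two predicate lists in first-seen
// order, dropping any name that has already been emitted.
func mergePredicates(fromTablets, fromSchemaScan []string) []string {
	seen := make(map[string]struct{}, len(fromTablets)+len(fromSchemaScan))
	merged := make([]string, 0, len(fromTablets)+len(fromSchemaScan))
	for _, list := range [][]string{fromTablets, fromSchemaScan} {
		for _, p := range list {
			if _, ok := seen[p]; ok {
				continue // already present: merging again changes nothing
			}
			seen[p] = struct{}{}
			merged = append(merged, p)
		}
	}
	return merged
}

func main() {
	// Companions now arrive from both sources once they are registered with Zero.
	tablets := []string{"embedding", "embedding__vector_", "embedding__vector_entry", "embedding__vector_dead"}
	scan := []string{"embedding__vector_", "embedding__vector_entry", "embedding__vector_dead"}
	fmt.Println(len(mergePredicates(tablets, scan))) // prints 4
}
```

On a legacy cluster the first list would hold only `embedding`, and the schema-scan list would still contribute the three companions, so the result is the same four entries either way.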
shiva-istari left a comment
Hi @shaunpatterson, thanks for the PR. We tried to reproduce this with a multi-tenant test that applies an HNSW float32vector schema via Alter() in two separate namespaces and then runs similar_to against both.
- Zero's /state shows only the base predicates (1-embedding_v, 2-embedding_w); the `__vector_`, `__vector_entry`, and `__vector_dead` companions are missing for every tenant, not just the second one.
- similar_to returned results immediately in both tenants, no hang.
The companions being absent from /state is documented behavior: worker/online_restore.go:308-321 states these are never registered during normal operation, and Zero's checkPreds strips the `__vector_` suffix (dgraph/cmd/zero/oracle.go:370-372) to resolve them to the base predicate's group. So similar_to doesn't depend on those entries being present.
Since we couldn't reproduce the similar_to hang, could you add a test to this PR that reproduces the bug and fails on main? That way we can confirm we're fixing the actual issue rather than the symptom, and the fix is guarded against regression.
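For readers following the review, the suffix-stripping resolution mentioned above can be sketched as follows. This is only an illustration of the idea, not the code at the cited lines of `oracle.go`: `basePredicate`, `groupFor`, and the map-based group table are hypothetical, and truncating at the last `__vector_` marker is an assumption about how all three companion names map back to one base predicate.

```go
package main

import (
	"fmt"
	"strings"
)

// vecMarker is the marker shared by all three companion tablet names.
const vecMarker = "__vector_"

// basePredicate maps a companion tablet name back to its base predicate by
// truncating at the last occurrence of the marker. Names without the marker
// are returned unchanged.
func basePredicate(tablet string) string {
	if i := strings.LastIndex(tablet, vecMarker); i > 0 {
		return tablet[:i]
	}
	return tablet
}

// groupFor looks up the serving group through the base predicate, so a
// companion resolves even when only the base predicate is registered.
func groupFor(tablet string, groups map[string]uint32) (uint32, bool) {
	gid, ok := groups[basePredicate(tablet)]
	return gid, ok
}

func main() {
	groups := map[string]uint32{"1-embedding_v": 1} // only the base predicate is known
	gid, ok := groupFor("1-embedding_v__vector_entry", groups)
	fmt.Println(gid, ok) // prints "1 true"
}
```

Under this model a lookup for a companion succeeds whenever the base predicate is registered, which matches the reviewer's observation that `similar_to` served without the companions appearing in /state.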
Summary
When a schema mutation creates or rebuilds a `float32vector` predicate with an HNSW index, the indexer writes three companion tablets to Badger: `<pred>__vector_`, `<pred>__vector_entry`, and `<pred>__vector_dead`. These hold the HNSW graph structure.
The base predicate's tablet is registered with Zero via `groups().Tablet(pred)` at the top of `runSchemaMutation`, but the companion tablets are never registered. They exist only in Badger, not in Zero's raft membership state.
Symptoms
1. Log noise. Zero's 5-minute orphan reconciliation sweep (`zero.go::deletePredicates`) sees the companion tablets in Alpha's tablet-size reports, doesn't find them in its membership state, and emits delete instructions for them on every sweep. Alpha silently ignores those instructions via the `__vector_` filter in `worker/predicate_move.go`, but the noise is permanent for the lifetime of the cluster.
2. `similar_to` queries hang. More seriously, Alpha refuses to serve queries against tablets that Zero has not acknowledged. `similar_to` returns 30-second timeouts on the affected predicate even though the HNSW data exists on disk.
Reproduction
In a multi-tenant cluster, apply an HNSW schema in tenant A through a clean code path that does register companion tablets (e.g. via an admin tool), then apply the same HNSW schema in tenant B via `Alter()`. Diff Zero's `/state`: `similar_to(B-embedding, ...)` hangs on tenant B; tenant A serves immediately.
Fix
Inform Zero of the companion tablets in the same loop that handles the base predicate, gated on `VFLOAT + non-empty IndexSpecs`. `groups().Inform()` is idempotent (it checks `ServingTablet` first), so this is safe on both initial creation and rebuilds. The registration is proposed through Zero's raft log, so it survives Zero restarts.
Validation
In an affected cluster I called `pb.Zero/Inform` manually for the three companion tablet predicates on a broken tenant. After the call:
- The companion tablets appeared in `/state` immediately.
- `similar_to` queries that had been timing out for 30 s started returning results.
This is the same RPC the patch invokes, just triggered automatically on the schema-mutation path instead of after the fact.
Test plan
- Companion tablets appear in `/state` immediately (not after a sweep)
- `similar_to` against the new namespace serves on first query
🤖 Generated with Claude Code