fix(worker): register HNSW companion tablets with Zero on schema mutation #9712

Open
shaunpatterson wants to merge 2 commits into dgraph-io:main from shaunpatterson:sp/hnsw-companion-tablet-register

@shaunpatterson
Contributor

Summary

When a schema mutation creates or rebuilds a float32vector predicate with an HNSW index, the indexer writes three companion tablets to Badger: <pred>__vector_, <pred>__vector_entry, and <pred>__vector_dead. These hold the HNSW graph structure.

The base predicate's tablet is registered with Zero via groups().Tablet(pred) at the top of runSchemaMutation, but the companion tablets are never registered. They exist only in Badger, not in Zero's raft membership state.
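The naming scheme above can be sketched as follows. This is a minimal illustration, not code from the patch: `companionTablets` is a hypothetical helper, and the suffix constants are redeclared locally from the names listed in this description (the patch itself uses `hnsw.VecKeyword`, `hnsw.VecEntry`, and `hnsw.VecDead`, whose exact values are assumed here).

```go
package main

import "fmt"

// Suffixes as listed in the PR description. Assumed to match
// hnsw.VecKeyword, hnsw.VecEntry, and hnsw.VecDead; redeclared here
// only so the sketch is self-contained.
const (
	vecKeyword = "__vector_"
	vecEntry   = "__vector_entry"
	vecDead    = "__vector_dead"
)

// companionTablets (hypothetical helper) returns the three companion
// tablet names that accompany an HNSW-indexed predicate.
func companionTablets(pred string) []string {
	return []string{
		pred + vecKeyword,
		pred + vecEntry,
		pred + vecDead,
	}
}

func main() {
	for _, name := range companionTablets("embedding") {
		fmt.Println(name)
	}
}
```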

Symptoms

  1. Log noise. Zero's 5-minute orphan reconciliation sweep (zero.go::deletePredicates) sees the companion tablets in Alpha's tablet-size reports, doesn't find them in its membership state, and emits delete instructions for them on every sweep. Alpha silently ignores those instructions via the __vector_ filter in worker/predicate_move.go, but the noise is permanent for the lifetime of the cluster.

  2. similar_to queries hang. More seriously, Alpha refuses to serve queries against tablets that Zero has not acknowledged. similar_to returns 30-second timeouts on the affected predicate even though the HNSW data exists on disk.
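The first symptom amounts to a set difference that is recomputed on every sweep. The sketch below is a simplified model under stated assumptions, not Zero's or Alpha's actual code: `orphans` and `alphaIgnores` are hypothetical names, and the filter is assumed to be a substring match on `__vector_`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// orphans models the sweep: every tablet that Alpha reports but that is
// absent from Zero's membership state becomes a delete candidate.
func orphans(reported []string, registered map[string]bool) []string {
	var out []string
	for _, t := range reported {
		if !registered[t] {
			out = append(out, t)
		}
	}
	sort.Strings(out)
	return out
}

// alphaIgnores models the filter on the Alpha side (assumed to be a
// substring match): delete instructions for companion tablets are dropped.
func alphaIgnores(tablet string) bool {
	return strings.Contains(tablet, "__vector_")
}

func main() {
	reported := []string{
		"embedding",
		"embedding__vector_",
		"embedding__vector_entry",
		"embedding__vector_dead",
	}
	registered := map[string]bool{"embedding": true}

	// The same three candidates reappear on every sweep because nothing
	// ever registers them and Alpha never deletes them.
	for _, t := range orphans(reported, registered) {
		fmt.Printf("delete %s (ignored by Alpha: %v)\n", t, alphaIgnores(t))
	}
}
```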

Reproduction

In a multi-tenant cluster, apply an HNSW schema in tenant A through a clean code path that does register companion tablets (e.g. via an admin tool), then apply the same HNSW schema in tenant B via Alter(). Diff Zero's /state:

ns=A: A-embedding, A-embedding__vector_, A-embedding__vector_entry, A-embedding__vector_dead   # OK
ns=B: B-embedding                                                                              # missing 3

similar_to(B-embedding, ...) hangs on tenant B; tenant A serves immediately.

Fix

Inform Zero of the companion tablets in the same loop that handles the base predicate, gated on the value type being VFLOAT and IndexSpecs being non-empty. groups().Inform() is idempotent (it checks ServingTablet first), so this is safe on both initial creation and rebuilds. The registration is proposed through Zero's raft log, so it survives Zero restarts.

if su.GetValueType() == pb.Posting_VFLOAT && len(su.GetIndexSpecs()) > 0 {
    vecPreds := []string{
        hnsw.ConcatStrings(su.Predicate, hnsw.VecKeyword),
        hnsw.ConcatStrings(su.Predicate, hnsw.VecEntry),
        hnsw.ConcatStrings(su.Predicate, hnsw.VecDead),
    }
    if _, err := groups().Inform(vecPreds); err != nil {
        glog.Warningf("Failed to inform Zero about HNSW companion "+
            "tablets for %s: %v", su.Predicate, err)
    }
}
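The idempotency claim is what makes it safe to call the registration on every schema mutation. The toy model below illustrates that property only; `registry` and `inform` are hypothetical stand-ins and do not reproduce `groups().Inform()` or Zero's raft proposal path.

```go
package main

import "fmt"

// registry is an in-memory stand-in for Zero's tablet membership state.
type registry struct {
	serving map[string]uint32 // tablet name -> owning group ID
}

// inform registers each tablet only if nothing serves it yet and returns
// how many new registrations were made. Calling it again with the same
// input is a no-op.
func (r *registry) inform(gid uint32, tablets []string) int {
	added := 0
	for _, t := range tablets {
		if _, ok := r.serving[t]; ok {
			continue // already served: leave the existing owner untouched
		}
		r.serving[t] = gid
		added++
	}
	return added
}

func main() {
	r := &registry{serving: map[string]uint32{}}
	companions := []string{
		"embedding__vector_",
		"embedding__vector_entry",
		"embedding__vector_dead",
	}
	fmt.Println("initial creation:", r.inform(1, companions))
	fmt.Println("index rebuild:", r.inform(1, companions))
}
```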

Validation

In an affected cluster I called pb.Zero/Inform manually for the three companion tablet predicates on a broken tenant. After the call:

  • The 3 tablets appeared in Zero's /state immediately.
  • The 5-minute orphan sweep stopped emitting delete instructions for them.
  • similar_to queries that had been timing out for 30 s started returning results.

This is the same RPC the patch invokes, just triggered automatically on the schema-mutation path instead of after-the-fact.

Test plan

  • CI passes
  • Existing HNSW schema tests still pass
  • Manual: apply HNSW schema in a new namespace, confirm companion tablets appear in Zero's /state immediately (not after a sweep)
  • Manual: similar_to against the new namespace serves on first query

🤖 Generated with Claude Code

…tion

When a schema mutation creates or rebuilds a `float32vector` predicate
with an HNSW index, the indexer writes three companion tablets to
Badger: `<pred>__vector_`, `<pred>__vector_entry`, and
`<pred>__vector_dead`. These hold the HNSW graph structure.

The base predicate's tablet is registered with Zero via
`groups().Tablet(pred)` at the top of `runSchemaMutation`, but the
companion tablets are never registered. They exist only in Badger, not
in Zero's raft membership state. Two consequences:

1. Zero's 5-minute orphan reconciliation sweep
   (`zero.go::deletePredicates`) sees the tablets in Alpha's
   tablet-size reports, does not find them in its membership state,
   and emits delete instructions for them on every sweep. Alpha
   silently ignores those instructions via the `__vector_` filter in
   `worker/predicate_move.go`, but the log noise is permanent.

2. More seriously, Alpha refuses to serve queries against tablets
   that Zero has not acknowledged. `similar_to` queries against the
   affected predicate hang and time out, even though the HNSW data
   exists on disk.

The fix is to inform Zero of the companion tablets in the same loop
that handles the base predicate. `groups().Inform()` is idempotent
(it checks `ServingTablet` first) so this is safe on initial creation
and on rebuilds. The registration is proposed through Zero's raft log,
so it survives Zero restarts.

This was discovered while debugging a tenant where HNSW search
returned 30 s timeouts after a successful schema reapply. After
calling `pb.Zero/Inform` manually for the three companion tablet
predicates, the tablets appeared in Zero's `/state`, the orphan
sweep stopped emitting delete instructions for them, and `similar_to`
queries returned results immediately.

Co-Authored-By: Claude <noreply@anthropic.com>

@shaunpatterson requested a review from a team as a code owner May 26, 2026 01:07
@shiva-istari self-assigned this May 27, 2026

After registering HNSW companions with Zero in runSchemaMutation, the
backup builder now sees them via state.Groups[gid].Tablets and via the
pre-existing schema-scan workaround, producing duplicate entries in the
manifest predicate list and breaking TestVectorBackupManifestPredicates.

The schema-scan path stays as a safety net for legacy clusters whose
companions were never registered, but the merge is now idempotent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
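
An idempotent merge of the two predicate sources could look like the sketch below. It is an illustration under assumptions, not the backup builder's code: `mergePredicates` and its parameter names are hypothetical, and sorting the result is a choice made here for deterministic output.

```go
package main

import (
	"fmt"
	"sort"
)

// mergePredicates combines predicates seen in Zero's state with those
// found by the schema scan, keeping each name exactly once.
func mergePredicates(fromState, fromSchemaScan []string) []string {
	seen := make(map[string]struct{}, len(fromState)+len(fromSchemaScan))
	merged := make([]string, 0, len(fromState)+len(fromSchemaScan))
	for _, list := range [][]string{fromState, fromSchemaScan} {
		for _, p := range list {
			if _, ok := seen[p]; ok {
				continue // already present from the other source
			}
			seen[p] = struct{}{}
			merged = append(merged, p)
		}
	}
	sort.Strings(merged)
	return merged
}

func main() {
	// Registered cluster: both sources report the companions.
	state := []string{"embedding", "embedding__vector_", "embedding__vector_entry", "embedding__vector_dead"}
	scan := []string{"embedding__vector_", "embedding__vector_entry", "embedding__vector_dead"}
	fmt.Println(mergePredicates(state, scan))

	// Legacy cluster: only the schema scan knows about the companions.
	fmt.Println(mergePredicates([]string{"embedding"}, scan))
}
```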
Contributor

@shiva-istari shiva-istari left a comment

Hi @shaunpatterson, thanks for the PR. We tried to reproduce this with a multi-tenant test that applies an HNSW float32vector schema via Alter() in two separate namespaces and then runs similar_to against both. We observed:

  • Zero's /state shows only the base predicates (1-embedding_v, 2-embedding_w); the __vector_, __vector_entry, and __vector_dead companions are missing for every tenant, not just the second one.
  • similar_to returned results immediately in both tenants, no hang.

The companions being absent from /state is documented behavior: worker/online_restore.go:308-321 states that these are never registered during normal operation, and Zero's checkPreds strips the __vector_ suffix (dgraph/cmd/zero/oracle.go:370-372) to resolve them to the base predicate's group. So similar_to doesn't depend on those entries being present.

Since we couldn't reproduce the similar_to hang, could you add a test to this PR that reproduces the bug and fails on main? That way we can confirm we're fixing the actual issue rather than the symptom, and the fix is guarded against regression.
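
For readers following the thread, the suffix resolution described in this review can be sketched roughly as below. This is an assumption about the general shape of the logic, not a copy of `checkPreds`: `basePredicate` is a hypothetical name, and matching on the `__vector_` marker is assumed from the companion names used in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// vecKeyword is the marker shared by all three companion suffixes
// (assumed to match hnsw.VecKeyword).
const vecKeyword = "__vector_"

// basePredicate maps a companion tablet name back to the predicate that
// owns it, so a lookup can use the base predicate's group. Names without
// the marker are returned unchanged.
func basePredicate(tablet string) string {
	if i := strings.Index(tablet, vecKeyword); i >= 0 {
		return tablet[:i]
	}
	return tablet
}

func main() {
	for _, t := range []string{
		"1-embedding_v",
		"1-embedding_v__vector_",
		"2-embedding_w__vector_entry",
		"2-embedding_w__vector_dead",
	} {
		fmt.Printf("%s -> %s\n", t, basePredicate(t))
	}
}
```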