Refactor wake registry sync and fix child wake delivery #4632
Open
KyleAMathews wants to merge 22 commits into
Contributor
Codecov Report
@@ Coverage Diff @@
## main #4632 +/- ##
==========================================
+ Coverage 59.46% 59.50% +0.04%
==========================================
Files 385 385
Lines 43039 43121 +82
Branches 12383 12403 +20
==========================================
+ Hits 25591 25658 +67
- Misses 17371 17387 +16
+ Partials 77 76 -1
Contributor
Electric Agents Mobile BuildLocal mobile checks ran for commit The EAS Android preview build was skipped because the |
Refactor the agents-server wake registry onto TanStack DB/Electric collections and fix child wake delivery across the server and runtime. The user-visible impact is that parent agents reliably receive child completion wakes, cron wake setup is safer under concurrent tests, and wake-registry sync behavior is covered by isolated CI-stable integration tests.
Root Cause
Child wake delivery had multiple loss and flake points across the end-to-end path.
On the runtime side, the pull-wake runner intentionally avoids claiming multiple wakes for the same stream concurrently. When another wake arrived while that stream already had an active claim or handler, it was deferred, but deferred wakes were stored as a single event per stream path.
That meant later deferred wakes for the same parent stream overwrote earlier ones.
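The overwrite can be illustrated with a minimal sketch. The names below (`WakeEvent`, `SingleSlotDeferred`, `QueuedDeferred`, `deferredIds`) are hypothetical and are not taken from `pull-wake-runner.ts`; the sketch only contrasts one slot per stream path with a queue per stream path.

```typescript
// Hypothetical shapes for illustration; not the runtime's actual types.
type WakeEvent = { streamPath: string; wakeId: string }

interface DeferredStore {
  defer(event: WakeEvent): void
  take(streamPath: string): WakeEvent[]
}

// One slot per stream path: a later deferral replaces the earlier one.
class SingleSlotDeferred implements DeferredStore {
  private slots = new Map<string, WakeEvent>()

  defer(event: WakeEvent): void {
    this.slots.set(event.streamPath, event)
  }

  take(streamPath: string): WakeEvent[] {
    const event = this.slots.get(streamPath)
    this.slots.delete(streamPath)
    return event === undefined ? [] : [event]
  }
}

// One queue per stream path: every deferral is retained in arrival order.
class QueuedDeferred implements DeferredStore {
  private queues = new Map<string, WakeEvent[]>()

  defer(event: WakeEvent): void {
    const queue = this.queues.get(event.streamPath) ?? []
    queue.push(event)
    this.queues.set(event.streamPath, queue)
  }

  take(streamPath: string): WakeEvent[] {
    const queue = this.queues.get(streamPath) ?? []
    this.queues.delete(streamPath)
    return queue
  }
}

// Defers two wakes for the same parent stream, then reports what survived.
function deferredIds(store: DeferredStore): string[] {
  store.defer({ streamPath: "/parent", wakeId: "child-a" })
  store.defer({ streamPath: "/parent", wakeId: "child-b" })
  return store.take("/parent").map((event) => event.wakeId)
}
```

With these definitions, `deferredIds(new SingleSlotDeferred())` returns only `child-b`, while `deferredIds(new QueuedDeferred())` returns both ids in arrival order.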
On the server side, the wake registry was a manual ShapeStream-backed cache. Registration mutations, Electric visibility, and cache lifecycle had to be coordinated by hand, which made terminal `runFinished` wake evaluation vulnerable to stale or missing in-memory state. Follow-up CI failures also exposed a test isolation problem: integration tests that reset the shared Postgres/Electric backend can delete schema/data while other concurrently running tests are polling Electric or evaluating wakes.
Approach
The branch fixes both the runtime delivery issue and the server registry implementation.
Queued same-stream wake notifications are used to trigger serialized claim attempts after the active stream claim drains. The next successful claim is expected to drain the stream’s available pending wake rows together, while the queue prevents losing trigger notifications and avoids concurrent claims for the same stream.
Heartbeat state now records the `AbortSignal` associated with the in-flight heartbeat. A stale heartbeat from an aborted runner cannot suppress heartbeat startup after restart.
The server wake registry now uses TanStack DB collections and optimistic actions over the `wake_registrations` table, backed by Electric sync. This removes the custom ShapeStream cache and the stale-cache reload fallback.
Agents server startup now requires an Electric URL for the wake-registry runtime instead of silently falling back to a non-syncing local load path.
Cron stream creation now tolerates a concurrent `409` if another caller created the stream after the existence check.
Sensitive integration tests now either avoid shared-backend resets or run on isolated Docker Compose backends/ports, preventing one test from dropping schema/data while another test is still using it.
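The cron stream change above follows a common check-then-create pattern that treats a conflict as success. The sketch below is an assumption-laden illustration: `StreamClient`, `ensureStream`, `conflictError`, and `FakeStreamClient` are hypothetical names, and the real stream API in the agents server may look different.

```typescript
// Hypothetical client interface; the real stream API may differ.
interface StreamClient {
  exists(path: string): Promise<boolean>
  create(path: string): Promise<void>
}

// Builds an error carrying an HTTP-like status, used here to model a 409 Conflict.
function conflictError(message: string): Error & { status: number } {
  const err = new Error(message) as Error & { status: number }
  err.status = 409
  return err
}

function isConflict(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 409
  )
}

// Returns "created" if this caller created the stream, "existing" otherwise.
async function ensureStream(
  client: StreamClient,
  path: string
): Promise<"created" | "existing"> {
  if (await client.exists(path)) return "existing"
  try {
    await client.create(path)
    return "created"
  } catch (err) {
    // Another caller created the stream between exists() and create().
    if (isConflict(err)) return "existing"
    throw err
  }
}

// In-memory stand-in used to exercise the race without a server.
class FakeStreamClient implements StreamClient {
  private paths = new Set<string>()

  async exists(path: string): Promise<boolean> {
    return this.paths.has(path)
  }

  async create(path: string): Promise<void> {
    if (this.paths.has(path)) {
      throw conflictError(`stream ${path} already exists`)
    }
    this.paths.add(path)
  }
}
```

Two concurrent `ensureStream` calls for the same path against one `FakeStreamClient` both resolve: one reports `created` and the other `existing`. Errors other than the conflict are still rethrown.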
Key Invariants
`wake_registrations` collection.
Non-goals
`wake_registrations` schema.
Trade-offs
The main implementation choice was to replace the custom ShapeStream cache with TanStack DB rather than continue adding cache-reload fallbacks. TanStack DB adds explicit package dependencies to `@electric-ax/agents-server`, but it gives the registry a collection/effect model that better matches the rest of the agents stack and removes bespoke cache mutation logic.
For CI stabilization, isolating reset-heavy tests costs some additional Docker Compose setup and ports. That is preferable to serializing the whole package test suite or weakening assertions, because it keeps the tests representative while preventing cross-file data deletion races.
The runtime queue remains per stream path, matching the existing concurrency guard. This preserves the “one active claim per stream” behavior while fixing the trigger-notification loss from storing only one same-stream wake notification while a claim was active. It does not mean wake rows are handled one at a time; the next successful claim can still drain the stream’s available pending wake rows together.
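A rough sketch of that behavior, under stated assumptions: `PerStreamClaimRunner` and the `ClaimPending` callback are hypothetical and do not mirror the actual runner. Notifications that arrive during an active claim are counted rather than dropped, and each claim attempt takes whatever rows are pending for the stream.

```typescript
// Hypothetical callback: claims and returns every pending wake row id for a stream.
type ClaimPending = (streamPath: string) => Promise<string[]>

class PerStreamClaimRunner {
  // Present key = claim active; value = notifications queued during that claim.
  private queuedNotifications = new Map<string, number>()
  private claimPending: ClaimPending

  // Each entry is the set of rows drained by one successful claim.
  readonly handledBatches: string[][] = []

  constructor(claimPending: ClaimPending) {
    this.claimPending = claimPending
  }

  // Called once per wake notification for a stream path.
  async notify(streamPath: string): Promise<void> {
    const queued = this.queuedNotifications.get(streamPath)
    if (queued !== undefined) {
      // A claim is active: queue the notification instead of claiming concurrently.
      this.queuedNotifications.set(streamPath, queued + 1)
      return
    }
    try {
      do {
        // Mark the claim active and reset the counter before each attempt.
        this.queuedNotifications.set(streamPath, 0)
        const rows = await this.claimPending(streamPath)
        if (rows.length > 0) this.handledBatches.push(rows)
        // Retry only if notifications arrived while this claim was in flight.
      } while ((this.queuedNotifications.get(streamPath) ?? 0) > 0)
    } finally {
      this.queuedNotifications.delete(streamPath)
    }
  }
}
```

In this sketch, two notifications queued during an active claim lead to a single follow-up claim that drains both rows together, so claims stay serialized per stream without processing rows one at a time.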
Relation to the previous child wake fix
The previous fix in #4613 addressed a later runtime acking bug: when multiple `wake` rows were already present in one pending handler window, `processWake` selected one wake but acknowledged the whole window, effectively consuming sibling wake rows. That fix batches coalesced wake rows into one `wake_batch` payload.
This PR fixes earlier loss points:
Verification
Targeted runtime verification:
Targeted agents-server verification run during the branch:
Changeset validation:
Result:
CI status for the latest pushed commit is green for all active checks.
Files changed
- `.changeset/fix-deferred-pull-wakes.md`: `@electric-ax/agents-runtime` and `@electric-ax/agents-server`.
- `packages/agents-runtime/src/pull-wake-runner.ts`
- `packages/agents-runtime/test/pull-wake-runner.test.ts`
- `packages/agents-server/package.json` and `pnpm-lock.yaml`
- `packages/agents-server/src/wake-registry.ts`: `wake_registrations`.
- `packages/agents-server/src/entity-manager.ts`
- `packages/agents-server/src/host.ts`
- `packages/agents-server/test/*.test.ts`
- `docs/superpowers/plans/2026-06-19-wake-registry-tanstack-db.md`
- `docs/superpowers/specs/2026-03-23-wake-registry-tanstack-db-design.md`
Related: #4613