fix(sync-service): Handle errors when collecting statistics #4002
magnetised wants to merge 1 commit into main from
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff            @@
##             main    #4002      +/-  ##
===========================================
- Coverage   88.74%   75.75%   -12.99%
===========================================
  Files          25       11       -14
  Lines        2425      693     -1732
  Branches      611      171      -440
===========================================
- Hits         2152      525     -1627
+ Misses        271      167      -104
+ Partials        2        1        -1
```
I'm pretty sure we had at some point moved the telemetry to the top of the tree for some reason - can we verify that this doesn't conflict with any previous decisions and end up causing more problems? We should have left comments on it
Found the relevant PR #2722 @icehaunter - it was moved to the top because we wanted telemetry to shut down last, not to start first. Otherwise there are various telemetry operations happening during shutdown that don't find the telemetry processes alive to complete, I think.
Shouldn't we just keep it at the top but make the polling resilient to the stack not being present?
msfstef
left a comment
Since all of these functions exist explicitly for the purpose of collecting telemetry, I would rather we make them return appropriate tuples, so that in case of actual errors we report them as normal instead of almost silently stopping reporting (I get there's a warning log, but we don't want to make it an error log because that path also includes errors that are expected due to boot order).

- For `ShapeDb.statistics` it's a GenServer call, so I suppose we do need to try-catch - although we could make the function itself encapsulate the missing GenServer in its output as an error tuple
- For `ShapeDb.pending_buffer_size` it's an ETS query, so we should just guard for missing ETS internally
- For `ShapeCache.count_shapes` it drills down to `Query.count_shapes`, which already returns a tuple we handle, and a `WriteBuffer.pending_count_diff`, which is an ETS check - although I'm assuming that if `Query.count_shapes` passes then the ETS table should exist, so we would just need to return the error tuple to handle it

I think this would be cleaner and still allows us to catch unexpected errors in our logic? The GenServer one is probably the more annoying one.
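For the GenServer case, a rough sketch of what "encapsulate the missing GenServer as an error tuple" could look like (the `name/1` registry helper and the `:statistics` message are assumptions, not the actual module internals):

```elixir
# Hypothetical sketch: a missing or dying process becomes an expected
# {:error, _} tuple instead of an exit that the caller has to try-catch.
def statistics(stack_id) do
  case GenServer.whereis(name(stack_id)) do
    nil ->
      {:error, :not_started}

    pid ->
      try do
        {:ok, GenServer.call(pid, :statistics)}
      catch
        :exit, {:noproc, _} -> {:error, :not_started}
        :exit, {:timeout, _} -> {:error, :timeout}
      end
  end
end
```

The `whereis` check covers the boot-order race, while the `catch :exit` clauses cover the process dying between the lookup and the call.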
| ) | ||
|
|
||
| _ -> | ||
| try do |
why not make these `ShapeDb` functions return tuples and handle them appropriately? I would rather we don't swallow all errors in the telemetry, since some of these are exclusively called here, and any actual errors we want to catch and log would now be lost and we'd silently stop collecting telemetry
@msfstef I wrote a long disagreement with you but actually I think you're right. As you say, these functions are purely for telemetry and there should be a small range of "expected" errors from them.
msfstef
left a comment
Very nice! Minor comment to confirm that the non-error reporting stat is actually correct (sanity check)
```elixir
def pending_operations_count(stack_id) do
  :ets.info(operations_table_name(stack_id), :size)
rescue
  ArgumentError -> 0
```
I don't remember the exact semantics of this buffer, but I assume this is an in-memory buffer, so if the ETS table is not present then the operations count is indeed 0 - is this correct, or is there a chance of some persisted data not having been loaded and 0 being wrong?
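As an aside, the `rescue` could also be avoided by looking the table up first, since `:ets.whereis/1` returns `:undefined` for a missing named table instead of raising (a sketch, assuming the existing `operations_table_name/1` helper):

```elixir
# Hypothetical alternative: guard on table existence rather than
# rescuing the ArgumentError raised by :ets.info/2 on a missing table.
def pending_operations_count(stack_id) do
  case :ets.whereis(operations_table_name(stack_id)) do
    :undefined -> 0
    tid -> :ets.info(tid, :size)
  end
end
```

This still has a small window where the table can disappear between the lookup and the `info` call, so the `rescue` version is arguably the more robust of the two.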
Telemetry is currently started before the rest of the stack, which means that it starts trying to poll information after a few seconds, but there's no guarantee that the rest of the stack is available within the given timeframe.
Wrap all telemetry collection calls in try..catch to avoid spurious, transient errors.
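The shape of the wrapping described here could look roughly like this (the `safe_measure/2` helper name is hypothetical, not the actual code in the commit):

```elixir
defmodule Electric.Telemetry.Safe do
  require Logger

  # Hypothetical helper: run a single telemetry probe, returning a
  # default value if the underlying process or ETS table is not up
  # yet (boot-order races) so one failure doesn't break the cycle.
  def safe_measure(fun, default) do
    try do
      fun.()
    catch
      kind, reason ->
        Logger.warning("telemetry probe failed: #{inspect({kind, reason})}")
        default
    end
  end
end
```

A `catch kind, reason` clause traps errors, exits, and throws alike, which matches the "avoid spurious, transient errors" goal at the cost of also hiding genuine bugs - the trade-off discussed in the review above.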
https://electricsql-04.sentry.io/issues/102382605/?alert_rule_id=130196&alert_timestamp=1773223876687&alert_type=email¬ification_uuid=8fa23ea6-1e04-4a07-98bd-bd0765895424&project=4508410462404688&referrer=digest_email