API specs for offline support by camillecroci · Pull Request #1743 · Financial-Times/dotcom-reliability-kit

camillecroci · 2026-04-09T15:50:04Z

Description

This PR is meant to communicate the changes needed in the client-metrics API to support offline use.
Sorry, there are more code changes in the PR than I originally wanted because I didn't want to forget them.

Current implementation

sendEvent pulls the first x events from a queue and calls fetch to POST them to the client-metrics server. Three things call sendEvent:

recordEvent: every time an event is added to the queue, it calls sendEvent if the queue is big enough
startTimer: creates a setInterval that calls sendEvent every 10 seconds
sendEvent: calls itself if there are still events in the queue after sending a first batch
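
For reviewers who want the three triggers side by side, here is a minimal sketch of the behaviour described above. It is not the code in metrics-client.js: the class name, the `send` option (a stand-in for the fetch POST), the `batchSize` default and the array-backed queue are all assumptions made for illustration, and the send is synchronous here to keep the example short.

```javascript
// Sketch only. QueueSketch, send, batchSize and queueLength are hypothetical names;
// sendEvent, recordEvent and startTimer are the names used in the description.
class QueueSketch {
	#queue = [];
	#batchSize;
	#send;
	#timer = null;

	constructor({ send, batchSize = 20 }) {
		this.#send = send; // stands in for the fetch POST to the client-metrics server
		this.#batchSize = batchSize;
	}

	get queueLength() {
		return this.#queue.length;
	}

	recordEvent(event) {
		this.#queue.push(event);
		// Trigger 1: the queue is big enough
		if (this.#queue.length >= this.#batchSize) {
			this.sendEvent();
		}
	}

	startTimer(intervalMs = 10_000) {
		// Trigger 2: the timer fires every 10 seconds
		this.#timer = setInterval(() => this.sendEvent(), intervalMs);
	}

	stopTimer() {
		clearInterval(this.#timer);
	}

	sendEvent() {
		if (this.#queue.length === 0) {
			return;
		}
		// Pull the first x events off the front of the queue
		const batch = this.#queue.splice(0, this.#batchSize);
		this.#send(batch);
		// Trigger 3: sendEvent calls itself while events remain
		if (this.#queue.length > 0) {
			this.sendEvent();
		}
	}
}
```

Nothing in this version handles a failed send, which is the gap the rest of this description is about.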

Stuff we want

if we try to send events and fail, we don't want to lose those events
we want some retry mechanism to try sending them again
we don't want a queue that grows indefinitely
we don't want infinite recursion of sendEvent
no breaking change

Suggested changes

To avoid losing events: we add them back to the queue when a fetch fails
We don't really need a separate retry mechanism because we have a timer that sends events every 10 seconds if there are any in the queue
We already have a mechanism within the queue that drops old events when the queue reaches a certain size
We can keep a counter of failed fetch attempts. If we reach a certain number of failed attempts, we block the size-based sending from sendEvent and from recordEvent. Those two functions follow the logic 'if the queue is big enough, try to send'. But once many attempts have failed, the size of the queue should no longer be a trigger to try to send. We rely only on the timer to retry.
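
A rough sketch of how these points could fit together, under some assumptions: `#fetchFailed` and `#maxFetchAttempt` are the field names from the diff, while the class name, the `send` option (standing in for the fetch call), the default values, the array-backed queue and the counter reset on success are hypothetical choices made for this example.

```javascript
// Sketch only, not the implementation in this PR.
class RetryingQueueSketch {
	#queue = [];
	#fetchFailed = 0;
	#maxFetchAttempt;
	#batchSize;
	#maxQueueSize;
	#send;

	constructor({ send, batchSize = 20, maxFetchAttempt = 3, maxQueueSize = 100 }) {
		this.#send = send; // hypothetical stand-in for the fetch POST
		this.#batchSize = batchSize;
		this.#maxFetchAttempt = maxFetchAttempt;
		this.#maxQueueSize = maxQueueSize;
	}

	get queueLength() {
		return this.#queue.length;
	}

	get fetchFailed() {
		return this.#fetchFailed;
	}

	recordEvent(event) {
		this.#queue.push(event);
		// Drop the oldest events so the queue cannot grow indefinitely
		while (this.#queue.length > this.#maxQueueSize) {
			this.#queue.shift();
		}
		// Queue size only triggers a send while attempts are not exhausted
		if (this.#queue.length >= this.#batchSize && this.#fetchFailed < this.#maxFetchAttempt) {
			return this.sendEvent();
		}
	}

	async sendEvent() {
		if (this.#queue.length === 0) {
			return;
		}
		const batch = this.#queue.splice(0, this.#batchSize);
		try {
			await this.#send(batch);
			this.#fetchFailed = 0; // assumption: a success resets the counter
		} catch {
			// Put the batch back at the front so the events are not lost
			this.#queue.unshift(...batch);
			this.#fetchFailed += 1;
		}
		// The guard bounds the recursion: it stops once attempts are exhausted
		if (this.#queue.length > 0 && this.#fetchFailed < this.#maxFetchAttempt) {
			await this.sendEvent();
		}
	}
}
```

A timer calling sendEvent still gets one attempt per tick even when the counter is exhausted, which is what makes it the retry path. One caveat for a real implementation: fetch only rejects on network failure, so HTTP error statuses would need an explicit `response.ok` check if they should count as failed attempts.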

Extra

Problem

If we think about the app going offline, it might be a lot to keep trying to send events every 10 seconds. This might have an impact on the mobile performance (battery, radio usage, and generally, the app doing an extra action when it could be doing nothing).

Suggested mitigation: exponential backoff

If we detect some failures, we can increase the timeout by 20% (for example) after each failed attempt until we reach a max time (2 minutes).
So if it's offline for a long time, the app only tries to send metrics every 2 minutes. As soon as a fetch succeeds, the timeout goes back to its initial 10 seconds.
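
As a sketch of the delay calculation, assuming the example figures above (10 second start, 20% growth per failure, 2 minute cap). The function and constant names are hypothetical and not part of the PR.

```javascript
// Sketch only: nextDelay and these constants are illustrative names.
const INITIAL_DELAY_MS = 10_000; // initial 10 seconds
const MAX_DELAY_MS = 120_000; // cap at 2 minutes
const GROWTH_FACTOR = 1.2; // increase by 20% after a failure

function nextDelay(currentDelayMs, lastFetchSucceeded) {
	if (lastFetchSucceeded) {
		// A successful fetch puts the timeout back to its initial value
		return INITIAL_DELAY_MS;
	}
	// Grow the delay, but never beyond the maximum
	return Math.min(Math.round(currentDelayMs * GROWTH_FACTOR), MAX_DELAY_MS);
}
```

Because setInterval keeps a fixed period, a variable delay like this would probably mean rescheduling with setTimeout after each attempt (or clearing and recreating the interval) rather than keeping the current timer as it is.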

Breaking change

I don't think there are any with the suggested changes

I confirm that the code in this PR has not been generated by AI

rowanmanning

Looks good to me, don't want to nitpick much because we're still early-stages but can see a few optimisations for the repeated logic 🙂

rowanmanning · 2026-04-13T14:00:09Z

packages/client-metrics-web/lib/metrics-client.js


-		if (this.#queue.size) {
+		if (this.#queue.size 
+			&& (this.#fetchFailed < this.#maxFetchAttempt)) {


nitpick: it's an implementation detail so fine if this isn't the right time to think about it, but I see the same conditions a bunch. Maybe this could be a getter on the class?

class MetricsClient {
	get #fetchTriesExhausted() {
		return this.#fetchFailed >= this.#maxFetchAttempt;
	}
}

feat: add specs for offline support

edb33c4

camillecroci requested a review from a team as a code owner April 9, 2026 15:50

rowanmanning reviewed Apr 13, 2026

feat: adds back event in the queue if fetch has failed

6e99444

camillecroci wants to merge 2 commits into main from cc/offline-support-specs
