[Core] Main loop() is blocked for seconds during contact save operations, causing packet loss on NRF52 and ESP32 client platforms #2397

@terminalvelocity23

Description

This issue describes a serious performance bottleneck where the main loop() is blocked for up to 5.8 seconds on nRF52-based clients (like T114) when a new advertisement is received. This is caused by heavy, synchronous Flash write operations. Tests on different platforms (T114 as client, T114 as repeater, and V4/ESP32 as client) confirm that this problem consistently affects all client nodes with a large contact list.

Testing Methodology & Results

All tests were conducted by adding millis() timestamps to putBlobByKey and saveContacts to measure their execution time. Identical contact lists of 0/144/350 entries were used to show the direct correlation between dataset size and execution time.
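The instrumentation described above could look roughly like the following sketch. It is not the code used for the measurements: `ScopedTimer`, `millis_now()` and `fake_save_contacts()` are hypothetical names, and `millis_now()` is a host-side stand-in for Arduino's `millis()` so the example can be compiled off-device.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Host-side stand-in for Arduino's millis(); on the device, call millis() directly.
uint32_t millis_now() {
    using namespace std::chrono;
    static const steady_clock::time_point start = steady_clock::now();
    return static_cast<uint32_t>(
        duration_cast<milliseconds>(steady_clock::now() - start).count());
}

// Writes the elapsed time of the enclosing scope into *out_ms on scope exit.
class ScopedTimer {
public:
    explicit ScopedTimer(uint32_t* out_ms) : out_ms_(out_ms), start_(millis_now()) {}
    ~ScopedTimer() { *out_ms_ = millis_now() - start_; }  // unsigned math survives wrap-around
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    uint32_t* out_ms_;
    uint32_t start_;
};

// Hypothetical placeholder for a blocking flash write: busy-waits about 50 ms.
void fake_save_contacts() {
    const uint32_t begin = millis_now();
    while (millis_now() - begin < 50) {
        // spin, imitating a synchronous write that starves loop()
    }
}

// Returns how long the placeholder call blocked, in milliseconds.
uint32_t measure_fake_save() {
    uint32_t elapsed_ms = 0;
    {
        ScopedTimer timer(&elapsed_ms);
        fake_save_contacts();
    }
    return elapsed_ms;
}
```

On the device, the same pattern would wrap the real putBlobByKey and saveContacts calls and print the elapsed value over serial.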

| Platform (Role) | Contacts | putBlobByKey Time | saveContacts Time | Total Blocking Time (Worst Case) |
|---|---|---|---|---|
| T114 / nRF52840 (Client) | ~0 | ~1535 ms | ~960 ms | ~2.5 sec |
| T114 / nRF52840 (Client) | 144 | ~1050 ms | ~2300 ms | ~3.3 sec |
| T114 / nRF52840 (Client) | 350 (Full) | ~1650 ms | ~4200 ms | ~5.8 sec |
| V4 / ESP32 (Client) | ~0 | ~1010 ms | ~30-100 ms | ~1.1 sec |
| V4 / ESP32 (Client) | 350 (Full) | ~1020 ms | ~1480 ms | ~2.5 sec |
| T114 / nRF52840 (Repeater) | N/A | Not called | Not called | 0 ms |

Key Findings

  1. Execution time is data-dependent. On the T114, saveContacts() time grows roughly linearly with the number of contacts (about 9 ms per contact on top of a fixed cost of about 1 second), so the size of the list matters, not just the Flash memory technology.
  2. All client nodes are affected. Both nRF52 and ESP32 clients suffer from multi-second blocking periods when their contact list is full.
  3. Repeaters are immune. The bottleneck is specific to the client's saveContacts() implementation, which repeaters do not execute.
  4. putBlobByKey is a universal bottleneck. This operation for the "Share" feature takes roughly 1.0 to 1.7 seconds on every client tested, regardless of platform or contact count.
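The data dependence in finding 1 can be checked with simple arithmetic on the T114 client rows of the table. The sketch below computes the slope between two measurements. The `Sample` struct and function name are illustrative, the inputs are the approximate table values, and the "~0" row is treated as exactly 0 contacts.

```cpp
#include <cassert>
#include <cmath>

// One measurement: contact count and saveContacts() time in milliseconds.
struct Sample {
    int contacts;
    double save_ms;
};

// Slope between two measurements, in milliseconds per contact.
double ms_per_contact(const Sample& a, const Sample& b) {
    assert(a.contacts != b.contacts);  // avoid division by zero
    return (b.save_ms - a.save_ms) / (b.contacts - a.contacts);
}

// T114 / nRF52840 client rows from the table (approximate values).
const Sample kT114[] = {{0, 960.0}, {144, 2300.0}, {350, 4200.0}};
```

Both segments (0 to 144 and 144 to 350 contacts) give a slope of a little over 9 ms per contact, which is consistent with a roughly linear cost on top of a fixed overhead of about 1 second.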

Consequences

  • Silent Packet Loss: The SX1262 radio can buffer 1-2 packets during the block, but after its RX FIFO is full, all subsequent packets are irreversibly lost until the main loop is released. In a busy mesh, this leads to significant message loss.
  • Degraded User Experience: The UI becomes unresponsive during the lockout.
  • Poor Network Scalability: A growing mesh makes the problem worse, as a larger active network fills up the contact list faster.
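To get a feel for the packet loss described above, here is a back-of-the-envelope estimate. It assumes evenly spaced arrivals, and that the radio holds a fixed number of packets while the loop is blocked. The arrival interval is a made-up parameter, not a measured value.

```cpp
#include <cassert>
#include <cstdint>

// Packets arriving while loop() is blocked, assuming evenly spaced arrivals.
uint32_t packets_during_block(uint32_t block_ms, uint32_t interval_ms) {
    assert(interval_ms > 0);
    return block_ms / interval_ms;
}

// Packets lost once the radio's buffer (holding `buffered` packets) is full.
uint32_t packets_lost(uint32_t block_ms, uint32_t interval_ms, uint32_t buffered) {
    const uint32_t arrived = packets_during_block(block_ms, interval_ms);
    return arrived > buffered ? arrived - buffered : 0;
}
```

For example, with a hypothetical one packet per second, the 5.8 second worst case and room for 2 buffered packets, `packets_lost(5800, 1000, 2)` returns 3.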

Expected Behavior

Contact saving operations should be asynchronous or optimized to be non-blocking, allowing the node to continue receiving packets without interruption.
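One possible shape for such a non-blocking save is to split the write into small chunks and do one chunk per loop() iteration. The sketch below illustrates that idea only. It is not the project's implementation: `IncrementalSaver`, `Contact` and `write_record()` are hypothetical, and the in-memory vector stands in for the real flash write.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Contact {
    int id;  // hypothetical record; the real contact structure is larger
};

class IncrementalSaver {
public:
    explicit IncrementalSaver(std::size_t chunk) : chunk_(chunk) { assert(chunk > 0); }

    // Snapshot the list and start a save pass (restarts any pass in progress).
    void request(const std::vector<Contact>& contacts) {
        pending_ = contacts;
        written_.clear();
        next_ = 0;
        active_ = true;
    }

    // Call once per loop() iteration. Writes at most chunk_ records and
    // returns true while more records remain to be written.
    bool step() {
        if (!active_) return false;
        const std::size_t end = std::min(next_ + chunk_, pending_.size());
        for (; next_ < end; ++next_) write_record(pending_[next_]);
        if (next_ == pending_.size()) active_ = false;
        return active_;
    }

    std::size_t written() const { return written_.size(); }

private:
    // Stand-in for a single flash write; a real version would append to a file.
    void write_record(const Contact& c) { written_.push_back(c); }

    std::size_t chunk_;
    std::size_t next_ = 0;
    bool active_ = false;
    std::vector<Contact> pending_;
    std::vector<Contact> written_;
};
```

With a chunk size of 100, a full list of 350 contacts is written over four loop() iterations instead of one long blocking call, leaving time between chunks to service the radio. A real implementation would also need to keep the on-flash file consistent if power is lost partway through a pass, for example by writing to a temporary file first.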
