Summary
In lib/clients/metaapi/client_api_client.py (PyPI metaapi-cloud-sdk==29.1.1, latest at the time of writing), the error-retry backoff of refresh_ignored_field_lists never grows from zero, producing an unthrottled retry loop for the whole duration of a server-side outage.
The bug
    except Exception as err:
        self._logger.error(f'Failed to update hashing ignored field list {format_error(err)}')
        self._ignored_field_lists_caches[region]['retryIntervalInSeconds'] = min(
            self._ignored_field_lists_caches[region].get('retryIntervalInSeconds', 0) * 2, 300
        )
        await asyncio.sleep(self._ignored_field_lists_caches[region]['retryIntervalInSeconds'])
retryIntervalInSeconds is only written with its base value (self._retry_interval_in_seconds) after a successful refresh. When the cache entry is fresh (created in the same call) and the request fails, get('retryIntervalInSeconds', 0) returns 0, and 0 * 2 = 0 — forever. The except branch then does asyncio.sleep(0) and retries immediately, in a tight loop, until the endpoint recovers.
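The stuck-at-zero behavior can be reproduced outside the SDK. This is a minimal sketch, not the SDK's code: next_retry_interval_buggy is a hypothetical helper that mirrors the doubling expression from the except branch, and a plain dict stands in for the per-region cache entry.

```python
def next_retry_interval_buggy(cache_entry: dict, cap: int = 300) -> int:
    """Hypothetical helper mirroring the doubling expression in the except branch."""
    cache_entry['retryIntervalInSeconds'] = min(
        cache_entry.get('retryIntervalInSeconds', 0) * 2, cap
    )
    return cache_entry['retryIntervalInSeconds']


# A fresh cache entry has no 'retryIntervalInSeconds' key yet,
# so the default of 0 is doubled on every failure.
fresh_entry = {}
buggy_intervals = [next_retry_interval_buggy(fresh_entry) for _ in range(5)]
print(buggy_intervals)  # [0, 0, 0, 0, 0]
```

Every value passed to asyncio.sleep is therefore 0, which yields to the event loop but introduces no delay.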
Each loop iteration also goes through HttpClient.request's own 5 internal retries, so the net effect during an outage is a continuous stream of requests plus one ERROR log per ~30s cycle, with zero pause between cycles.
Observed impact
During the mt-client-api-v1.london.agiliumtrade.ai 503 outage on 2026-06-10 (~22:00–23:00 UTC), a bot using the streaming API logged ~5,800 error lines/hour from this loop. The constant churn saturated the asyncio event loop enough that APScheduler jobs were skipped (Run time of job ... was missed by 0:01:02) and unrelated outbound HTTP calls timed out. It also flooded our error tracker.
Suggested fix
Seed the backoff with the base interval instead of 0:
    previous = self._ignored_field_lists_caches[region].get('retryIntervalInSeconds', 0)
    self._ignored_field_lists_caches[region]['retryIntervalInSeconds'] = min(
        max(previous * 2, self._retry_interval_in_seconds), 300
    )
This yields the intended 1 → 2 → 4 → … → 300s progression on consecutive failures (and keeps the existing reset-to-base on success).
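The progression can be checked in isolation with a small sketch of the proposed expression. The helper name next_retry_interval_fixed is hypothetical, and the base interval of 1 second is an assumption inferred from the 1 → 2 → 4 progression above; the real value comes from self._retry_interval_in_seconds.

```python
def next_retry_interval_fixed(cache_entry: dict, base: int = 1, cap: int = 300) -> int:
    """Hypothetical helper applying the proposed seed-with-base expression.

    base=1 is an assumed stand-in for self._retry_interval_in_seconds.
    """
    previous = cache_entry.get('retryIntervalInSeconds', 0)
    cache_entry['retryIntervalInSeconds'] = min(max(previous * 2, base), cap)
    return cache_entry['retryIntervalInSeconds']


# Simulate eleven consecutive failures starting from a fresh cache entry.
entry = {}
fixed_intervals = [next_retry_interval_fixed(entry) for _ in range(11)]
print(fixed_intervals)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 300, 300]
```

One thing this sketch makes visible: if the entry already holds the base value from an earlier successful refresh, the first failure afterwards sleeps for twice the base rather than the base itself.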
Happy to provide more logs/details if useful. Thanks!