fix: re-register service when keepalive ping returns 404 #11
Merged
When the orchestrator loses a service entry (e.g. Redis TTL expiry during restart), keepalive pings return 404 forever with no recovery. The only workaround was recreating the service container. Now _keepalive_loop detects 404 responses, waits a configurable grace period (default 30s) to avoid thundering herd, then calls register_service() with the original parameters. On success, the ping URL is updated and keepalive resumes normally. On failure, the loop retries on the next cycle. Adds RegistrationConfig pydantic model to carry re-registration parameters with type safety. Exposes re_register_grace_period on BaseServiceBuilder.with_registration(). Bumps version to 0.9.0.
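The recovery flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `keepalive_loop` signature, the injected `ping` and `register` callables, the `cycles` bound, and the field names on `RegistrationConfig` are all assumptions, and a plain dataclass stands in for the pydantic model so the sketch runs on the standard library alone.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Set, Tuple


@dataclass
class RegistrationConfig:
    # Stand-in for the PR's pydantic model; field names are assumptions.
    service_name: str
    service_url: str
    re_register_grace_period: float = 30.0


async def keepalive_loop(
    ping: Callable[[str], Awaitable[int]],  # returns an HTTP status code
    register: Callable[[RegistrationConfig], Awaitable[str]],  # returns a ping URL
    ping_url: str,
    config: Optional[RegistrationConfig],
    interval: float,
    cycles: int,
) -> str:
    """Run a bounded number of keepalive cycles and return the final ping URL."""
    for _ in range(cycles):
        status = await ping(ping_url)
        if status == 404 and config is not None:
            # The orchestrator no longer knows this service: wait out the
            # grace period, then re-register with the original parameters.
            await asyncio.sleep(config.re_register_grace_period)
            try:
                ping_url = await register(config)
            except Exception:
                # Keep the old URL; the next cycle's 404 triggers a retry.
                pass
        await asyncio.sleep(interval)
    return ping_url


async def demo() -> Tuple[str, List[int]]:
    live_urls: Set[str] = set()  # orchestrator state after losing the entry
    statuses: List[int] = []

    async def ping(url: str) -> int:
        status = 200 if url in live_urls else 404
        statuses.append(status)
        return status

    async def register(cfg: RegistrationConfig) -> str:
        new_url = f"http://orchestrator.example/ping/{cfg.service_name}-2"
        live_urls.add(new_url)
        return new_url

    cfg = RegistrationConfig("svc", "http://svc.example", re_register_grace_period=0.01)
    final_url = await keepalive_loop(
        ping, register, "http://orchestrator.example/ping/svc-1", cfg,
        interval=0.01, cycles=2,
    )
    return final_url, statuses
```

Running `asyncio.run(demo())` exercises one lost-entry cycle followed by one healthy cycle: the first ping sees 404 and re-registers, and the second ping goes to the updated URL.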
Summary
- When the orchestrator loses a service entry (e.g. Redis TTL expiry during restart), keepalive pings return 404 forever with no recovery. The only workaround was recreating the service container (`docker compose up -d --force-recreate`).
- `_keepalive_loop` detects 404 responses, waits a configurable grace period (default 30s, avoids thundering herd), then re-registers with the original parameters. On success the ping URL updates and keepalive resumes; on failure it retries next cycle.
- `RegistrationConfig` pydantic model to carry re-registration parameters with type safety instead of a loose `dict[str, Any]`.
- `re_register_grace_period` parameter on `BaseServiceBuilder.with_registration()`.

Test plan
- `test_keepalive_reregisters_on_404` -- 404 triggers re-registration, ping_url updates
- `test_keepalive_grace_period_respected` -- re-registration waits for grace period
- `test_keepalive_reregistration_failure_retries` -- failed re-registration retries on next cycle
- `test_keepalive_non_404_does_not_reregister` -- 500 errors don't trigger re-registration
- `test_keepalive_no_registration_kwargs_skips` -- without config, 404 just logs (backward compat)
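For readers who want to see what the test plan checks, here is one way the five cases could be expressed against a single keepalive iteration. Only the test names come from this PR; `keepalive_cycle`, `Config`, and `Recorder` are hypothetical stand-ins, and the real tests presumably run against `_keepalive_loop` itself. Injecting `sleep` lets the grace-period case assert on ordering without actually waiting.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Config:
    grace: float = 30.0  # hypothetical stand-in for RegistrationConfig


def keepalive_cycle(ping: Callable, register: Callable, sleep: Callable,
                    ping_url: str, config: Optional[Config]) -> str:
    """One keepalive iteration (illustrative); returns the ping URL to use next."""
    if ping(ping_url) != 404 or config is None:
        return ping_url  # healthy, non-404 error, or nothing to re-register with
    sleep(config.grace)  # grace period before re-registering
    try:
        return register(config)
    except Exception:
        return ping_url  # retry on the next cycle


class Recorder:
    """Fake collaborators that log every call in order."""

    def __init__(self, status: int, register_results: list):
        self.status = status
        self.events: List[Tuple[str, object]] = []
        self._results = list(register_results)

    def ping(self, url: str) -> int:
        self.events.append(("ping", url))
        return self.status

    def sleep(self, seconds: float) -> None:
        self.events.append(("sleep", seconds))

    def register(self, config: Config) -> str:
        self.events.append(("register", None))
        result = self._results.pop(0)
        if isinstance(result, Exception):
            raise result
        return result


def run(rec: Recorder, url: str, config: Optional[Config]) -> str:
    return keepalive_cycle(rec.ping, rec.register, rec.sleep, url, config)


def test_keepalive_reregisters_on_404():
    assert run(Recorder(404, ["new"]), "old", Config()) == "new"


def test_keepalive_grace_period_respected():
    rec = Recorder(404, ["new"])
    run(rec, "old", Config(grace=30.0))
    assert rec.events == [("ping", "old"), ("sleep", 30.0), ("register", None)]


def test_keepalive_reregistration_failure_retries():
    rec = Recorder(404, [RuntimeError("orchestrator down"), "new"])
    url = run(rec, "old", Config())
    assert url == "old"  # first attempt failed, URL unchanged
    assert run(rec, url, Config()) == "new"  # next cycle succeeds


def test_keepalive_non_404_does_not_reregister():
    rec = Recorder(500, [])
    assert run(rec, "old", Config()) == "old"
    assert ("register", None) not in rec.events


def test_keepalive_no_registration_kwargs_skips():
    rec = Recorder(404, [])
    assert run(rec, "old", None) == "old"
    assert rec.events == [("ping", "old")]
```

The functions follow pytest naming, so a runner such as pytest would collect them as written; they can also be called directly.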