fix: re-register service when keepalive ping returns 404 #11

Merged
mortenoh merged 2 commits into main from fix/keepalive-reregister-on-404
Apr 13, 2026

@mortenoh
Contributor

Summary

  • When the orchestrator loses a service entry (Redis TTL expiry, restart), keepalive pings return 404 forever with no recovery path. The only workaround was recreating the service container (docker compose up -d --force-recreate).
  • Now _keepalive_loop detects 404 responses, waits a configurable grace period (default 30s, avoids thundering herd), then re-registers with the original parameters. On success the ping URL updates and keepalive resumes; on failure it retries next cycle.
  • Adds RegistrationConfig pydantic model to carry re-registration parameters with type safety instead of a loose dict[str, Any].
  • Exposes re_register_grace_period parameter on BaseServiceBuilder.with_registration().
  • Bumps version to 0.9.0.

Test plan

  • test_keepalive_reregisters_on_404 -- 404 triggers re-registration, ping_url updates
  • test_keepalive_grace_period_respected -- re-registration waits for grace period
  • test_keepalive_reregistration_failure_retries -- failed re-registration retries on next cycle
  • test_keepalive_non_404_does_not_reregister -- 500 errors don't trigger re-registration
  • test_keepalive_no_registration_kwargs_skips -- without config, 404 just logs (backward compat)
  • All 375 existing tests pass, lint and type checks clean
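
For readers without the test suite at hand, the cases above might look something like this. It is only a sketch: `handle_ping_status` is a hypothetical single-cycle helper, not a function from servicekit, the sleep and register calls are injected fakes, and `grace_period=None` stands in for "no registration config was supplied".

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


async def handle_ping_status(
    status: int,
    grace_period: Optional[float],
    sleep: Callable[[float], Awaitable[None]],
    register: Callable[[], Awaitable[str]],
) -> Optional[str]:
    """Return the new ping URL if re-registration ran, else None (hypothetical)."""
    if status != 404:
        return None  # non-404 responses never trigger re-registration
    if grace_period is None:
        return None  # no config: keep the old log-only behaviour
    await sleep(grace_period)
    return await register()


def run_case(status: int, grace_period: Optional[float]) -> Tuple[Optional[str], List[str]]:
    """Run one cycle with fakes and record the order of side effects."""
    events: List[str] = []

    async def fake_sleep(seconds: float) -> None:
        events.append(f"sleep:{seconds}")

    async def fake_register() -> str:
        events.append("register")
        return "http://orch/ping/new"

    result = asyncio.run(
        handle_ping_status(status, grace_period, fake_sleep, fake_register)
    )
    return result, events


def test_404_waits_then_reregisters() -> None:
    result, events = run_case(404, 30.0)
    assert result == "http://orch/ping/new"
    assert events == ["sleep:30.0", "register"]  # grace period comes first


def test_500_does_not_reregister() -> None:
    result, events = run_case(500, 30.0)
    assert result is None and events == []


def test_404_without_config_skips() -> None:
    result, events = run_case(404, None)
    assert result is None and events == []
```

Injecting the sleep function lets the grace-period ordering be asserted without actually waiting 30 seconds.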

When the orchestrator loses a service entry (e.g. Redis TTL expiry
during restart), keepalive pings return 404 forever with no recovery.
The only workaround was recreating the service container.

Now _keepalive_loop detects 404 responses, waits a configurable grace
period (default 30s) to avoid thundering herd, then calls
register_service() with the original parameters. On success, the
ping URL is updated and keepalive resumes normally. On failure, the
loop retries on the next cycle.

Adds RegistrationConfig pydantic model to carry re-registration
parameters with type safety. Exposes re_register_grace_period on
BaseServiceBuilder.with_registration().
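
As a rough illustration of what a typed model buys over a loose dict, here is a sketch. The field names `orchestrator_url` and `service_name` and the `try_build` helper are assumptions made for the example; the actual fields of RegistrationConfig are not listed in this description. Only `re_register_grace_period` and its 30s default come from the text above.

```python
from typing import Any, Optional

from pydantic import BaseModel, Field, ValidationError


class RegistrationConfig(BaseModel):
    """Illustrative shape only; the real model's fields may differ."""
    orchestrator_url: str  # assumed field name
    service_name: str  # assumed field name
    # Seconds to wait after a 404 before re-registering; must not be negative.
    re_register_grace_period: float = Field(default=30.0, ge=0)


def try_build(**kwargs: Any) -> Optional[RegistrationConfig]:
    """Return a validated config, or None if the parameters are invalid."""
    try:
        return RegistrationConfig(**kwargs)
    except ValidationError:
        return None


if __name__ == "__main__":
    ok = try_build(orchestrator_url="http://orch", service_name="svc")
    print(ok.re_register_grace_period if ok else "invalid")
    # A negative grace period or a missing field is rejected up front,
    # instead of surfacing later inside the keepalive loop.
    print(try_build(orchestrator_url="http://orch", service_name="svc",
                    re_register_grace_period=-1))
    print(try_build(orchestrator_url="http://orch"))
```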

Bumps version to 0.9.0.
@codecov

codecov Bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 86.48649% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/servicekit/api/registration.py | 91.17% | 2 Missing and 1 partial ⚠️ |
| src/servicekit/api/service_builder.py | 33.33% | 2 Missing ⚠️ |

@mortenoh mortenoh merged commit 8cc4263 into main Apr 13, 2026
1 check passed
@mortenoh mortenoh deleted the fix/keepalive-reregister-on-404 branch April 13, 2026 09:47