Skip to content

fix: defer service registration until app is serving requests#12

Merged
mortenoh merged 7 commits intomainfrom
fix/defer-registration-until-ready
Apr 13, 2026
Merged

fix: defer service registration until app is serving requests#12
mortenoh merged 7 commits intomainfrom
fix/defer-registration-until-ready

Conversation

@mortenoh
Copy link
Copy Markdown
Contributor

@mortenoh mortenoh commented Apr 13, 2026

Summary

  • Defers service registration to a background task that waits for the app to be fully ready before announcing to the orchestrator
  • Fixes the race condition where the orchestrator calls back to fetch configs before uvicorn is accepting connections
  • Registration always defers (both fail_on_error=True and False); the background task is awaited during shutdown so exceptions still propagate
  • Readiness check uses the actual health endpoint path from the builder (not hardcoded /health); falls back to TCP connect check when no health endpoint is configured
  • Registration aborts if the app never becomes ready (no silent 30s-delayed registration)
  • Registration call is shielded from task cancellation so shutdown always has app.state.registration_info for proper deregistration
  • Registration state stored on app.state instead of module globals, safe for sequential app instances
  • Version bump to 0.10.0

Context

with_registration() previously called register_service() inside FastAPI's lifespan startup phase (before yield), which fires before uvicorn mounts routes. When chap-core received the registration and immediately called back to /api/v1/configs, the service wasn't ready yet -- the callback failed and the configured model row was not created.

Workaround in chap-core: retry config fetch up to 3x with 2s gaps (PR #278). This PR is the proper fix on the servicekit side.

Test plan

  • make lint passes
  • make test passes (391 passed, 21 skipped)
  • 16 new tests covering deferred registration lifecycle:
    • _wait_until_ready: health 200, TCP fallback, timeout, custom path
    • _register_after_ready: skip when not ready, app.state storage, cancellation safety
    • _resolve_port: from options, from env, default, invalid env
    • _register_and_start_keepalive: success, with keepalive, failure
    • Lifespan integration: full register/deregister cycle, skip when not ready
  • Manual test with examples/registration/ against a mock orchestrator

Registration previously happened during FastAPI lifespan startup, before
uvicorn began accepting connections. When the orchestrator called back to
fetch configs, the service could not respond.

Now registration runs in a background task that polls the local /health
endpoint first, ensuring the app is fully ready before announcing itself
to the orchestrator.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 89.39394% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/servicekit/api/service_builder.py 89.39% 3 Missing and 4 partials ⚠️

📢 Thoughts on this report? Let us know!

- fail_on_error=True registers inline (blocking startup) to preserve
  existing fail-fast semantics
- fail_on_error=False (default) defers registration to background task
- readiness check uses the actual health endpoint path from the builder
  instead of hardcoded /health; falls back to TCP connect check when no
  health endpoint is configured
- registration info stored on app.state instead of module global,
  preventing stale state across sequential app instances
- bump version to 0.10.0
Do not register with the orchestrator if the app never became reachable.
Previously the timeout was logged as a warning but registration proceeded
anyway, defeating the purpose of the readiness gate.
- fail_on_error=True no longer registers inline during lifespan startup;
  registration always waits for the app to be serving. The background task
  is awaited during shutdown so exceptions still propagate.
- Shield _register_and_start_keepalive from task cancellation so that a
  successful POST always writes app.state.registration_info, ensuring
  shutdown can deregister properly instead of leaking the registration.
Cover readiness check (health path, TCP fallback, timeout), registration
abort on readiness failure, app.state storage, custom health prefix, and
cancellation-safe registration via asyncio.shield.
Add tests for _resolve_port (env, default, invalid), _register_and_start_keepalive
(success, keepalive, failure), and lifespan integration (deferred registration +
deregistration on shutdown, skip registration when readiness fails).
@mortenoh mortenoh merged commit 968fc85 into main Apr 13, 2026
2 checks passed
@mortenoh mortenoh deleted the fix/defer-registration-until-ready branch April 13, 2026 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant