Add instance health checks #234
✱ Stainless preview builds for hypeman
This PR will update the
Monitoring Plan: Instance Health Checks (PR #234)
This PR adds a new health-check subsystem to The main risks are: (1) validation errors in Key risks to watch:
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 36c8c34.
# Conflicts:
#	lib/oapi/oapi.go

Summary
- `health_check` policy and `health_status` response fields for http, tcp, and exec probes
- `Initializing` or `Running`, while keeping public health status `starting` until the instance reaches `Running`
- `TestCreateInstanceWithNetwork` so the VM-starting network path waits for persisted `healthy` status
- `lib/healthcheck/README.md`

Tests
- `go test ./lib/healthcheck`
- `go test ./lib/instances -run TestCreateInstanceWithNetwork -count=0`
- `go test ./lib/instances -run 'TestHealthCheck|TestValidateCreateRequestHealthCheck|TestValidateUpdateInstanceRequest|TestManagerUpdateInstanceHealthCheckOnlyPublishesLifecycleUpdate|TestLifecycleEventMetrics_ObserveSubscribersQueueDepthAndDrops|TestLifecycleSubscribers'`
- `go test ./cmd/api/api -run 'TestCreateInstance_MapsHealthCheckPolicy|TestUpdateInstance_MapsHealthCheckPatch|TestCreateInstance_MapsAutoStandbyPolicy|TestUpdateInstance_MapsAutoStandbyPatch'`
- `go test ./cmd/api -run TestDoesNotExist`
- `go test ./lib/providers`

Notes
- `go test ./lib/instances -run TestCreateInstanceWithNetwork -count=1` was attempted twice; both runs failed before instance creation because the existing nginx image readiness wait still saw image status `pending` after 60s.
- `go test ./cmd/api/api` is currently blocked by Docker Hub unauthenticated pull rate limits and local network bridge permissions in existing integration tests.
- `make generate-wire` is currently blocked because the checked-in wire binary was built with Go 1.24 and this package now requires Go 1.25; `wire_gen.go` was updated in the same small shape and `go test ./cmd/api -run TestDoesNotExist` passes.

Note
Medium Risk
Adds a new health-check policy surface area, background controller, and metadata persistence path; bugs could impact instance API behavior and metadata writes, but lifecycle state is intentionally unchanged.
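The metadata-write risk noted above is commonly reduced by writing to a temporary file in the same directory and renaming it into place, so readers see either the old file or the new one. A minimal Go sketch of that pattern follows. `writeFileAtomic` and `demoRoundTrip` are hypothetical names for illustration; this is not the PR's `saveMetadata`, whose code is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic is a hypothetical helper: it writes data to a temp file
// next to path, flushes it, then renames it over path.
func writeFileAtomic(path string, data []byte, perm os.FileMode) (err error) {
	// The temp file must live in the same directory so the rename
	// stays on one filesystem.
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer func() {
		// On any failure, close and remove the temp file; errors here are ignored.
		if err != nil {
			tmp.Close()
			os.Remove(tmpName)
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		return err
	}
	if err = tmp.Chmod(perm); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil {
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// demoRoundTrip writes two versions of a file in a scratch directory and
// returns the content read back after the second write.
func demoRoundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "metadata.json")
	if err := writeFileAtomic(path, []byte(`{"v":1}`), 0o644); err != nil {
		return "", err
	}
	if err := writeFileAtomic(path, []byte(`{"v":2}`), 0o644); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := demoRoundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

The sketch does not fsync the parent directory after the rename; whether that extra step is needed depends on the durability guarantees required.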
Overview
Adds first-class instance health checks via new `health_check` policy (HTTP/TCP/exec) and `health_status` fields in the API/OpenAPI, including request validation, defaulting/normalization, and bidirectional mapping between OAPI and domain types.

Introduces a new `lib/healthcheck` package plus an `instances.HealthCheckController` that subscribes to lifecycle events, schedules probes with interval/timeout/threshold/start-period semantics, and persists per-instance runtime status; the controller is wired into the API process startup. Instance metadata now stores `health_check_runtime`, and `saveMetadata` switches to atomic temp-file + rename writes; update flows reset health runtime when the health check policy changes, and tests/integration tests are expanded to cover health-check behavior and lifecycle metrics labeling.

Reviewed by Cursor Bugbot for commit e0e1dda. Bugbot is set up for automated code reviews on this repo.
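The threshold and start-period semantics mentioned in the overview can be illustrated with a small state tracker. The following Go sketch is an assumption about how such semantics typically work (consecutive successes or failures flip the status, and failures during the start period are not counted); the `Tracker` type, its fields, and the `unhealthy` status name are hypothetical and not taken from `lib/healthcheck`.

```go
package main

import "fmt"

// Status mirrors the public health status. "starting" and "healthy" appear
// in the PR text; "unhealthy" is an assumed name.
type Status string

const (
	Starting  Status = "starting"
	Healthy   Status = "healthy"
	Unhealthy Status = "unhealthy"
)

// Tracker is a hypothetical type that applies consecutive-result
// thresholds to a stream of probe outcomes.
type Tracker struct {
	SuccessThreshold int
	FailureThreshold int

	status    Status
	successes int
	failures  int
}

// NewTracker returns a tracker that begins in the "starting" status.
func NewTracker(successThreshold, failureThreshold int) *Tracker {
	return &Tracker{
		SuccessThreshold: successThreshold,
		FailureThreshold: failureThreshold,
		status:           Starting,
	}
}

// Observe records one probe result and returns the resulting status.
// Failures seen while inStartPeriod is true reset the success streak
// but do not count toward the failure threshold.
func (t *Tracker) Observe(ok, inStartPeriod bool) Status {
	if ok {
		t.successes++
		t.failures = 0
		if t.successes >= t.SuccessThreshold {
			t.status = Healthy
		}
		return t.status
	}
	t.successes = 0
	if inStartPeriod {
		return t.status
	}
	t.failures++
	if t.failures >= t.FailureThreshold {
		t.status = Unhealthy
	}
	return t.status
}

func main() {
	tr := NewTracker(2, 3)
	fmt.Println(tr.Observe(false, true)) // failure inside the start period
	fmt.Println(tr.Observe(true, false))
	fmt.Println(tr.Observe(true, false))
}
```

A real controller would additionally run each probe on an interval with a per-probe timeout and persist the resulting status; those parts are omitted here.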