-
Notifications
You must be signed in to change notification settings - Fork 235
improve: health probes showcase & docs to operations #3291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
927b222
5c51639
9fa9dc5
c527bf1
cc3131d
3e18152
8df7ed5
2c9eafd
be40a72
57fadb0
1ae2702
ddde815
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| --- | ||
| title: Health Probes | ||
| weight: 85 | ||
| --- | ||
|
|
||
| Operators running in Kubernetes should expose health probe endpoints so that the kubelet can detect startup | ||
| failures and runtime degradation. JOSDK provides the building blocks through its | ||
| [`RuntimeInfo`](https://github.com/java-operator-sdk/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/RuntimeInfo.java) | ||
| API. | ||
|
|
||
| ## RuntimeInfo | ||
|
|
||
| `RuntimeInfo` is available via `operator.getRuntimeInfo()` and exposes: | ||
|
|
||
| | Method | Purpose | | ||
| |---|---| | ||
| | `isStarted()` | `true` once the operator and all its controllers have fully started | | ||
| | `allEventSourcesAreHealthy()` | `true` when every registered event source (informers, polling sources, etc.) reports a healthy status | | ||
| | `unhealthyEventSources()` | returns a map of controller name → unhealthy event sources, useful for diagnostics | | ||
| | `unhealthyInformerWrappingEventSourceHealthIndicator()` | returns a map of controller name → unhealthy informer-wrapping event sources, each exposing per-informer details via `InformerHealthIndicator` (`hasSynced()`, `isWatching()`, `isRunning()`, `getTargetNamespace()`) | | ||
|
|
||
| In most cases a single readiness probe backed by `allEventSourcesAreHealthy()` is sufficient: before the | ||
| operator has fully started the informers will not have synced yet, so the check naturally covers the startup | ||
| case as well. Once running, it detects runtime degradation such as a lost watch connection. | ||
|
|
||
| ### Fine-Grained Informer Diagnostics | ||
|
|
||
| For advanced use cases — such as exposing per-informer health in a diagnostic endpoint or logging which | ||
| specific namespace lost its watch — `unhealthyInformerWrappingEventSourceHealthIndicator()` gives access to | ||
| individual `InformerHealthIndicator` instances. Each indicator exposes `hasSynced()`, `isWatching()`, | ||
| `isRunning()`, and `getTargetNamespace()`. This is typically not needed for a standard health probe but can | ||
| be valuable for operational dashboards or troubleshooting. | ||
|
|
||
| ## Setting Up a Probe Endpoint | ||
|
|
||
| The example below uses [Jetty](https://eclipse.dev/jetty/) to expose a `/healthz` endpoint. Any HTTP | ||
| server library works — the key is calling the `RuntimeInfo` methods to determine the response code. | ||
|
|
||
| ```java | ||
| import org.eclipse.jetty.server.Server; | ||
| import org.eclipse.jetty.server.handler.ContextHandler; | ||
|
|
||
| Operator operator = new Operator(); | ||
| operator.register(new MyReconciler()); | ||
|
|
||
| // start the health server before the operator so probes can be queried during startup | ||
| var health = new ContextHandler(new HealthHandler(operator), "/healthz"); | ||
| Server server = new Server(8080); | ||
| server.setHandler(health); | ||
| server.start(); | ||
|
|
||
| operator.start(); | ||
| ``` | ||
|
|
||
| Where `HealthHandler` extends `org.eclipse.jetty.server.Handler.Abstract` and checks | ||
| `operator.getRuntimeInfo().allEventSourcesAreHealthy()`. | ||
|
|
||
| See the | ||
| [`operations` sample operator](https://github.com/java-operator-sdk/java-operator-sdk/tree/main/sample-operators/operations) | ||
| for a complete working example. | ||
|
|
||
| ## Kubernetes Deployment Configuration | ||
|
|
||
| Once your operator exposes the probe endpoint, configure probes in your Deployment manifest. Both the | ||
| startup and readiness probes can point to the same `/healthz` endpoint — the startup probe simply uses a | ||
| higher `failureThreshold` to give the operator time to initialize: | ||
|
|
||
| ```yaml | ||
| containers: | ||
| - name: operator | ||
| ports: | ||
| - name: probes | ||
| containerPort: 8080 | ||
| startupProbe: | ||
| httpGet: | ||
| path: /healthz | ||
| port: probes | ||
| initialDelaySeconds: 1 | ||
| periodSeconds: 3 | ||
| failureThreshold: 20 | ||
| readinessProbe: | ||
| httpGet: | ||
| path: /healthz | ||
| port: probes | ||
| initialDelaySeconds: 5 | ||
| periodSeconds: 5 | ||
| failureThreshold: 3 | ||
| ``` | ||
|
|
||
| The startup probe gives the operator time to start (up to ~60 s with the settings above). Once the startup | ||
| probe succeeds, the readiness probe takes over and will mark the pod as not-ready if any event source | ||
| becomes unhealthy. | ||
|
|
||
| ## Helm Chart Support | ||
|
|
||
| The [generic Helm chart](/docs/documentation/operations/helm-chart) supports health probes out of the box. | ||
| Enable them in your `values.yaml`: | ||
|
|
||
| ```yaml | ||
| probes: | ||
| port: 8080 | ||
| startup: | ||
| enabled: true | ||
| path: /healthz | ||
| readiness: | ||
| enabled: true | ||
| path: /healthz | ||
| ``` | ||
|
|
||
| All probe timing parameters (`initialDelaySeconds`, `periodSeconds`, `failureThreshold`) have sensible | ||
| defaults and can be overridden. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -103,9 +103,9 @@ observability sample (see below). | |
| #### Exploring metrics end-to-end | ||
|
|
||
| The | ||
| [`metrics-processing` sample operator](https://github.com/java-operator-sdk/java-operator-sdk/tree/main/sample-operators/metrics-processing) | ||
| [`operations` sample operator](https://github.com/java-operator-sdk/java-operator-sdk/tree/main/sample-operators/operations) | ||
| includes a full end-to-end test, | ||
| [`MetricsHandlingE2E`](https://github.com/java-operator-sdk/java-operator-sdk/blob/main/sample-operators/metrics-processing/src/test/java/io/javaoperatorsdk/operator/sample/metrics/MetricsHandlingE2E.java), | ||
| [`MetricsHandlingE2E`](https://github.com/java-operator-sdk/java-operator-sdk/blob/main/sample-operators/operations/src/test/java/io/javaoperatorsdk/operator/sample/metrics/MetricsHandlingE2E.java), | ||
|
Comment on lines
+106
to
+108
|
||
| that: | ||
|
|
||
| 1. Installs a local observability stack (Prometheus, Grafana, OpenTelemetry Collector) via | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -103,6 +103,9 @@ private static Map<String, Object> load(Path path) { | |
| try (InputStream in = Files.newInputStream(path)) { | ||
| Map<String, Object> result = MAPPER.readValue(in, Map.class); | ||
| return result != null ? result : Map.of(); | ||
| } catch (com.fasterxml.jackson.databind.exc.MismatchedInputException e) { | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will extract this into separate PR. |
||
| log.warn("{} contains no configuration data", path); | ||
| return Map.of(); | ||
| } catch (IOException e) { | ||
| throw new UncheckedIOException("Failed to load config YAML from " + path, e); | ||
| } | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.