docs: add health check, readiness, and liveness endpoints guide #281

Open
sriramveeraghanta wants to merge 1 commit into master from docs/health-check-endpoints

@sriramveeraghanta (Member) commented Jun 1, 2026

Summary

Adds a self-hosting guide documenting the liveness, readiness, and health check endpoints exposed by Plane services, sourced from the plane-ee repo and verified against the actual code, Helm charts, and Docker Compose files.

Operators currently have no documentation for wiring up uptime monitors, load-balancer health checks, or Kubernetes/Docker probes against a self-hosted instance. This page closes that gap.

What's included

  • Liveness vs. readiness vs. health primer, with guidance on which to point each tool at (and why pointing liveness at a dependency-checking endpoint causes restart storms).
  • Primary API probes: /api/live/, /api/ready/ (DB + cache), and /api/health/ (detailed), with curl examples, exact JSON responses, status codes (200/503), and the 5s per-process result cache.
  • All-services reference table marking each endpoint external vs. internal-only.
  • Per-service detail for pi, live (incl. secret-key memory endpoint), silo (incl. its non-standard 201 codes), flux, node-runner, and the Go monitor prober.
  • Infrastructure examples — Kubernetes livenessProbe/readinessProbe YAML, Docker Compose healthcheck, and external uptime/LB guidance, based on Plane's actual Helm and Compose probe configs.
  • Troubleshooting section for common 503/401/500 responses.
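
To make the probe semantics in the list above concrete, here is a minimal Python sketch of a dependency-free liveness check next to a readiness check whose result is cached for 5 seconds per process. It is not Plane's implementation: the class name `ReadinessProbe`, the check names, and the response payload shape are all hypothetical. Only the 200/503 status codes and the 5-second cache window come from this PR.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 5.0  # the PR describes a 5s per-process result cache

Result = Tuple[int, Dict[str, str]]  # (HTTP status, hypothetical payload)


def liveness() -> Result:
    # Liveness deliberately checks no dependencies: a database outage
    # must not make the orchestrator restart an otherwise healthy process.
    return 200, {"status": "ok"}


class ReadinessProbe:
    """Hypothetical readiness check that caches its result in-process."""

    def __init__(
        self,
        checks: Dict[str, Callable[[], bool]],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._checks = checks
        self._ttl = ttl
        self._clock = clock
        self._cached: Optional[Tuple[float, Result]] = None

    def __call__(self) -> Result:
        now = self._clock()
        # Serve the cached result while it is younger than the TTL, so
        # frequent probes do not hit the dependencies on every request.
        if self._cached is not None and now - self._cached[0] < self._ttl:
            return self._cached[1]
        payload: Dict[str, str] = {}
        for name, check in self._checks.items():
            try:
                ok = check()
            except Exception:
                ok = False  # a raising check counts as a failed dependency
            payload[name] = "ok" if ok else "error"
        status = 200 if all(v == "ok" for v in payload.values()) else 503
        result: Result = (status, payload)
        self._cached = (now, result)
        return result
```

The split is the point of the sketch: if the liveness function ran the same dependency checks, a brief database or cache outage would fail liveness on every replica at once, which is the restart-storm scenario the guide warns about.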

Edition note

Community Edition exposes only the basic root / health check; the dedicated probes are Commercial Edition features — the page is badged and the distinction is called out.

Other changes

  • Wires the page into the Self-hosting → Manage sidebar in docs/.vitepress/config.mts.

Validation

  • pnpm check:format (Prettier) passes
  • pnpm build (VitePress) succeeds with no dead-link errors

Summary by CodeRabbit

  • Documentation
    • Added comprehensive health checks documentation for self-hosted deployments, covering health probe endpoints, expected HTTP responses, and configuration guidance for Kubernetes, Docker Compose, and external monitoring systems.
    • Updated navigation to include new health checks documentation reference.

Document the liveness, readiness, and detailed health probes exposed by
self-hosted Plane (Commercial Edition) services, and how to consume them.

- New page: self-hosting/manage/health-checks.md covering the primary API
  probes (/api/live/, /api/ready/, /api/health/), per-service endpoints
  (pi, live, silo, flux, node-runner) and the Go monitor prober, with curl
  examples, exact JSON responses, status codes, and the 5s result cache.
- Add "use in your infrastructure" examples: Kubernetes liveness/readiness
  probes, Docker Compose healthchecks, and external uptime/LB guidance,
  based on Plane's actual Helm and Compose probe configs.
- Note the edition split: Community Edition exposes only the root / check;
  the dedicated probes are Commercial Edition.
- Wire the page into the Self-hosting > Manage sidebar.

vercel Bot commented Jun 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| developer-docs | Ready | Preview, Comment | Jun 1, 2026 9:37pm |


coderabbitai Bot commented Jun 1, 2026

Walkthrough

This PR introduces comprehensive documentation for health check probes in self-hosted Plane installations. It adds a new documentation page that covers probe endpoint semantics, service-specific endpoint details across the Plane architecture, and operational guidance for Kubernetes, Docker Compose, and external monitoring with troubleshooting.

Changes

Health checks documentation for self-hosted Plane

| Layer / File(s) | Summary |
| --- | --- |
| Documentation navigation wiring (docs/.vitepress/config.mts) | Adds a "Health checks" sidebar entry under /self-hosting/manage/ linking to the new health-checks page. |
| Introduction and core concepts (docs/self-hosting/manage/health-checks.md) | Introduces page scope, edition applicability, and defines liveness/readiness/health probe semantics with guidance on avoiding restart storms. |
| API probe endpoint documentation (docs/self-hosting/manage/health-checks.md) | Documents Django API probe endpoints (/api/live/, /api/ready/, /api/health/), response shapes, caching behavior, trailing-slash requirements, and reverse-proxy routing warnings. Includes a reference table of all service endpoints (external vs internal-only). |
| Service-specific endpoint documentation (docs/self-hosting/manage/health-checks.md) | Details health endpoints for live, silo, pi (Plane AI), flux, node-runner, and monitor services, including response payloads, non-standard status codes, deprecated legacy endpoints, secret-key protection, and internal-only service warnings. |
| Operational configuration and guidance (docs/self-hosting/manage/health-checks.md) | Provides Kubernetes probe configuration examples with startup/retry timing, Docker Compose healthcheck examples including pi base-path support and dependency gating, and external monitoring guidance targeting the HTTPS readiness endpoint with caching and auth-free expectations. |
| Troubleshooting guide (docs/self-hosting/manage/health-checks.md) | Maps common probe failures (HTTP 503/401/500 and restart loops) to likely root causes in dependency startup or configuration with mitigation guidance. |
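
As an illustration of the external-monitoring guidance summarized in the table above, the following Python sketch polls a readiness endpoint and maps the status codes named in this PR (200, 503, 401, 500) to coarse labels. The labels, the function names, and the example hostname are assumptions made for illustration; the actual response bodies and troubleshooting causes are documented in the new page, not here.

```python
import urllib.error
import urllib.request


def interpret_status(code: int) -> str:
    """Map an HTTP status code to a coarse label (labels are hypothetical)."""
    if code == 200:
        return "ready"
    if code == 503:
        return "not ready"
    if code == 401:
        return "unauthorized"
    if code == 500:
        return "server error"
    return "unexpected"


def probe(url: str, timeout: float = 3.0) -> str:
    """Issue one GET against a probe URL and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return interpret_status(response.status)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx; the code is still meaningful.
        return interpret_status(exc.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical hostname; note the trailing slash on the path.
    print(probe("https://plane.example.com/api/ready/"))
```

A monitor built this way should target the readiness endpoint rather than the detailed health endpoint, in line with the guidance the table attributes to the new page.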

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A health check hops through every Plane,
From API probes to pi's domain—
With Kubernetes configs and Docker delight,
The docs guide ops through the monitoring night! 🌙✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a guide for health check, readiness, and liveness endpoints, which is exactly what the new health-checks.md page provides. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/self-hosting/manage/health-checks.md`:
- Line 452: The text uses non-Kubernetes term "start period" which is
misleading; update the sentence to use Kubernetes probe terminology only by
replacing "start period" with either an explicit reference to startupProbe
behavior or by stating the combined effect of initialDelaySeconds plus
startupProbe settings, and clarify that the ~90s comes from initialDelaySeconds:
30 plus any configured startupProbe delays; also change "live`/`silo` readiness
probe" to reference the actual probe type (readinessProbe or startupProbe) and
replace "period" with Kubernetes field name periodSeconds, e.g., note that a
readinessProbe with failureThreshold: 30 and periodSeconds: 10 tolerates ~300s
before marking unhealthy; ensure the paragraph mentions startupProbe if that was
intended.
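
The timing arithmetic in the comment above can be checked with a short calculation. This is a rough sketch under stated assumptions: it treats the tolerated startup window as `initialDelaySeconds + periodSeconds * failureThreshold`, ignoring `timeoutSeconds` and the exact scheduling of the first probe, and the function name is hypothetical.

```python
def failure_budget_seconds(
    period_seconds: int,
    failure_threshold: int,
    initial_delay_seconds: int = 0,
) -> int:
    """Approximate seconds a probe tolerates failures before it is marked failed.

    Simplified model: the initial delay, plus one probe period for each
    consecutive failure the threshold allows.
    """
    return initial_delay_seconds + period_seconds * failure_threshold


# Values quoted in the review comment: failureThreshold: 30, periodSeconds: 10.
print(failure_budget_seconds(period_seconds=10, failure_threshold=30))  # 300
```

Under this model, `failureThreshold: 30` at `periodSeconds: 10` gives 300 seconds, and any configured initial delay is added on top. A 5-second initial delay would produce the ~305 seconds quoted in the page text, but that value is an assumption here and should be confirmed against the Helm chart.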

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1988d15d-ffef-46ea-ae66-3ad8bcd6c2ec

📥 Commits

Reviewing files that changed from the base of the PR and between 3b1d721 and 2259bd5.

📒 Files selected for processing (2)
  • docs/.vitepress/config.mts
  • docs/self-hosting/manage/health-checks.md

```

::: info Timing implications
With `initialDelaySeconds: 30` plus the start period, the API and pi-api pods take roughly 90 seconds before they are considered ready — this is intentional, giving migrations and warm-up time to complete. The `live`/`silo` readiness probe with `failureThreshold: 30` at a 10s period tolerates up to ~305 seconds of startup before marking the pod unhealthy.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Kubernetes timing note uses Docker Compose terminology.

“start period” is not a Kubernetes probe field, so this guidance can mislead probe tuning. Rephrase using Kubernetes terms only (or explicitly reference startupProbe if that’s what you mean).

Suggested doc fix
-With `initialDelaySeconds: 30` plus the start period, the API and pi-api pods take roughly 90 seconds before they are considered ready — this is intentional, giving migrations and warm-up time to complete. The `live`/`silo` readiness probe with `failureThreshold: 30` at a 10s period tolerates up to ~305 seconds of startup before marking the pod unhealthy.
+With `initialDelaySeconds: 30`, readiness checks begin after 30 seconds for API and pi-api. Combined with `periodSeconds` and `failureThreshold`, this gives enough warm-up time for migrations and startup. The `live`/`silo` readiness probe with `failureThreshold: 30` at a 10s period tolerates up to ~305 seconds of startup before marking the pod unhealthy.
