docs: restructure self-hosting sidebar; add Production subsection and Support #705
abhijaisrivastava15 wants to merge 3 commits into
… Support Sidebar reshaped to the agreed tree: Configuration and Production as nested groups. Production split from the flat page into Checklist, Security & TLS, Backups & restore, Monitoring, and Upgrades & rollback. New Support page. Production overview slimmed to route into the subsection.
| ] | ||
| }, | ||
| { | ||
| title: 'Production', |
Production folder needs an Overview child pointing at `/docs/self-hosting/production`, same as the Explore dashboard folder in Observe. Right now the overview page isn't reachable from the sidebar.
| description: "The go-live pass before a self-hosted instance takes real traffic" | ||
| --- | ||
| Run through this once before the stack is reachable by anyone else. It covers the three things that separate a laptop trial from a real deployment: replacing the shipped secrets, switching the backend into production mode, and swapping compose-managed data stores for managed ones. |
Make the three things bullet points
| ## Replace the shipped secrets | ||
| The stack boots with `CHANGEME` placeholders. Replace every one before the instance is reachable, and generate each value rather than making one up: |
There's no `CHANGEME` in the repo. The compose defaults are `local-dev-only-...-replace-me` (and `futureagi` for the passwords), and the real guard is `deploy/docker-compose.production.yml`, which re-binds these with `${VAR:?}` so prod refuses to boot until they're set. Rewrite this section around that overlay. Its required list is also longer than these four: `SECRET_KEY`, `AGENTCC_INTERNAL_API_KEY`, `AGENTCC_ADMIN_TOKEN`, `PG_PASSWORD`, `MINIO_ROOT_PASSWORD`, `RABBITMQ_USER`/`PASSWORD`, `FRONTEND_URL`.
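A minimal pre-flight sketch of the same idea, in case it helps the rewrite: it fails when any variable from the list above is unset or empty, which is the condition `${VAR:?}` rejects. `check_required` and `REQUIRED_VARS` are hypothetical names, not something taken from the repo, and `RABBITMQ_PASSWORD` is an assumed expansion of `RABBITMQ_USER/PASSWORD`.

```bash
# Names come from the review comment above. RABBITMQ_PASSWORD is an
# assumed expansion of "RABBITMQ_USER/PASSWORD".
REQUIRED_VARS="SECRET_KEY AGENTCC_INTERNAL_API_KEY AGENTCC_ADMIN_TOKEN PG_PASSWORD MINIO_ROOT_PASSWORD RABBITMQ_USER RABBITMQ_PASSWORD FRONTEND_URL"

# check_required (hypothetical helper): print every missing name to stderr
# and return 1 if at least one variable is unset or empty.
check_required() {
  missing=0
  for name in $REQUIRED_VARS; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_required` after loading `.env` into the shell reports every missing name in one pass, whereas the `${VAR:?}` guard stops at the first.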
| ``` | ||
| <Warning> | ||
| `PG_PASSWORD` and `MINIO_ROOT_PASSWORD` are written to their volumes on first boot only. Set them before your first `docker compose up`. Changing them after the volume exists locks you out. The full field list is in [Environment variables](/docs/self-hosting/environment). |
True for Postgres, not MinIO. MinIO reads `MINIO_ROOT_PASSWORD` from env on every boot, so changing it and restarting just works. Scope this to `PG_PASSWORD`.
| Compose-managed Postgres, ClickHouse, Redis, and MinIO are fine for a trial. For production, point the stack at managed services so data outlives the containers: | ||
| | Replace | With | Set | |
These don't work from `.env`. `CH_HOST`/`CH_PORT`, `REDIS_URL`, `S3_ENDPOINT_URL` (and `PG_HOST`) are hardcoded in the backend env block of `docker-compose.yml`, so setting them in `.env` does nothing. You have to edit the compose file, and the page should say so. Also, for S3 the actual switch is `STORAGE_BACKEND=s3` per the compose comments, not the endpoint URL.
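One way to confirm which values the backend actually receives is to filter the rendered compose config for the variables named above. This is a sketch, assuming it runs from the directory that holds `docker-compose.yml`; `show_store_env` is a hypothetical helper name.

```bash
# show_store_env (hypothetical helper): read text on stdin and keep only
# the lines that mention the data-store settings named in this review.
show_store_env() {
  grep -E 'PG_HOST|CH_HOST|CH_PORT|REDIS_URL|S3_ENDPOINT_URL|STORAGE_BACKEND'
}

# Usage (needs Docker; prints the values after variable interpolation):
#   docker compose config | show_store_env
```

If changing a value in `.env` leaves this output unchanged, the compose file is not reading it from there.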
| ## Scrape the backend | ||
| The backend serves Prometheus metrics at `http://localhost:8000/metrics`. Add it as a scrape target: |
This endpoint doesn't exist. The backend has no `/metrics` route (nothing in `tfc/urls.py`, and granian isn't started with metrics), and fi-collector's admin port (9464) only serves `/healthz`; its README still lists the metrics exporter as a TODO. There's nothing to scrape yet. Rework the page around what actually exists (`/healthz` checks, the PeerDB UI on 3001, container-level monitoring), or hold it until a metrics endpoint lands.
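Until a metrics endpoint lands, a health probe is the kind of check the page can show. Below is a sketch against the fi-collector admin port mentioned above; `healthz_ok` is a hypothetical helper, and `localhost` assumes the port is published on the host.

```bash
# healthz_ok (hypothetical helper): succeed only if the URL responds
# without an HTTP error status within 5 seconds.
healthz_ok() {
  # -f: non-zero exit on HTTP errors; -sS: quiet, but still print errors;
  # -o /dev/null: discard the response body.
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# Usage, assuming the admin port is published on localhost:
#   healthz_ok http://localhost:9464/healthz && echo "fi-collector is up"
```

A passing probe only shows that the collector process is up, which is all `/healthz` can tell you.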
| <Step title="Re-run PeerDB init if the release notes say so"> | ||
| When a release changes the replication setup, re-run init so ClickHouse stays in sync: | ||
| ```bash | ||
| docker compose run --rm peerdb-init bash /setup.sh |
Plain `docker compose run --rm peerdb-init` is enough. The entrypoint is already `bash /setup.sh`, so this passes them in again as args.
| </Steps> | ||
| <Note> | ||
| Because ClickHouse replicates from Postgres through PeerDB, re-running PeerDB init also rebuilds ClickHouse from scratch. That's the recovery path when the [ClickHouse store](/docs/self-hosting/production/backups-restore) is lost. |
Same problem as the backups note. PeerDB init only rebuilds the mirrored PG tables. `spans` and `traces` are written to ClickHouse directly and won't come back, so this can't be the recovery path.
| ## Where to get help | ||
| <CardGroup cols={2}> | ||
| <Card title="Discord" icon="comments" href="https://discord.gg/QDVvTgA8Xp"> |
The repo README and our homepage use `discord.com/invite/n2tCUKBkAw`. Use that one.
| For managed hosting, an SLA, or help with a production rollout, reach out at [sales@futureagi.com](mailto:sales@futureagi.com). | ||
| ## Dive deeper |
Both cards point backward and this is the last page of the section. I'd drop the footer
Addresses Khushal's #705 review. Every claim re-grounded in the product repo:
- Checklist: real dev-only defaults + deploy/docker-compose.production.yml ${VAR:?} guard, full required-secrets list; PG (first-boot) vs MinIO (every-boot); managed stores are edited in compose (hardcoded hosts), not .env; S3 via STORAGE_BACKEND
- Backups: Redis is a cache (RabbitMQ holds the queue); pg_dump -T; futureagi_ volume prefix + full list; ClickHouse needs its own backup post-CH25 (spans don't return from PeerDB)
- Monitoring: no /metrics endpoint exists; reworked around docker health, fi-collector :9464, PeerDB UI
- Upgrades: peerdb-init needs no args; PeerDB init is not the ClickHouse recovery path
- Support: correct Discord invite; dropped the backward-pointing footer
- Nav: Production Overview child so /production is reachable
…ring card/claim
- Split the overview into 'Before you go live' (2) and 'Operating it' (3) so no lone orphan card
- Overview Monitoring card no longer promises Prometheus metrics (the page was reworked away from a non-existent /metrics)
- fi-collector healthz confirms the collector is up, not that it's 'accepting spans'
| Point the proxy at the frontend on `localhost:3000` and the backend on `localhost:8000`. The full port list is in [Requirements](/docs/self-hosting/requirements#ports-reference). | ||
| </Step> | ||
| <Step title="Point the frontend at HTTPS"> | ||
| Set `VITE_HOST_API=https://api.yourcompany.com` in `.env`. This is a build-time value, so the frontend has to be rebuilt for it to take effect. |
`VITE_HOST_API` is actually runtime: the frontend image's entrypoint writes it into `config.js` on every container start (`frontend/docker-entrypoint.sh`), and compose passes it as plain environment. There's also no `build:` on the frontend service, so `docker compose build frontend` doesn't work on the pulled image at all. Fold steps 2-3 and the warning and rewrite as: set it in `.env`, then `docker compose up -d frontend` to recreate the container. No rebuild.
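For context, the runtime-config pattern looks roughly like the sketch below. It is illustrative only and is not the contents of `frontend/docker-entrypoint.sh`: `write_runtime_config` and the `window.__RUNTIME_CONFIG__` global are hypothetical names.

```bash
# write_runtime_config (hypothetical helper): rewrite a config.js from the
# current environment. An entrypoint would call something like this on
# every container start, before launching the web server.
write_runtime_config() {
  # $1: path of the config.js file to (re)write
  printf 'window.__RUNTIME_CONFIG__ = { VITE_HOST_API: "%s" };\n' \
    "${VITE_HOST_API:-}" > "$1"
}
```

Because the file is regenerated from the environment on each start, recreating the container is enough to pick up a new value.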
| ## Container health | ||
| Every service ships a Docker health check, so `docker compose ps` is the fastest read on what's up: |
Not every service. Only the data stores (postgres, clickhouse, redis, rabbitmq, minio, temporal) define healthchecks in compose. frontend, backend, worker, code-executor and fi-collector don't, so `ps` shows them as plain running, never healthy/unhealthy. Say the data stores report health and the app tier just shows running.
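To make the distinction concrete, here is a sketch that classifies the STATUS text `docker compose ps` prints for a container. `classify_status` is a hypothetical helper, and the patterns assume Docker's usual status wording, such as `Up 3 minutes (healthy)`.

```bash
# classify_status (hypothetical helper): map one STATUS string to a label.
classify_status() {
  case "$1" in
    *"(healthy)"*)          echo "healthy" ;;
    *"(unhealthy)"*)        echo "unhealthy" ;;
    *"(health: starting)"*) echo "starting" ;;
    # Running, but the service defines no healthcheck (the app tier here).
    Up*)                    echo "running, no healthcheck" ;;
    *)                      echo "not running" ;;
  esac
}

# classify_status "Up 3 minutes (healthy)"   # prints: healthy
# classify_status "Up 3 minutes"             # prints: running, no healthcheck
```

For the app tier, "running" only means the process has not exited, so the page should not present it as a health signal.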
| ## In this page | ||
| ## Hardening checklist | ||
| Production readiness for a self-hosted instance breaks into five steps. Do them in order the first time, then keep each page as a runbook. |
'keep each page as a runbook' is already in the hero four lines up. Say it once
What
Restructures the self-hosting sidebar to the agreed tree, and fills in the pages that were missing behind it: the Production subsection (5 pages) and Support.

Sidebar (navigation.ts)
Self-Hosting → Overview · Requirements · Install · Configuration{Environment variables, System configuration} · Production{Checklist, Security & TLS, Backups & restore, Monitoring, Upgrades & rollback} · Troubleshooting & FAQs · Support
Dropped the flat `User management` entry from the sidebar (page left in place).

Pages added
- `production/checklist.mdx` - go-live pass: secrets, prod runtime flags, managed data stores
- `production/security-tls.mdx` - reverse-proxy TLS termination and secret handling
- `production/backups-restore.mdx` - Postgres, ClickHouse, MinIO backup and restore
- `production/monitoring.mdx` - Prometheus scrape and the signals to watch
- `production/upgrades-rollback.mdx` - upgrade flow, migrations, and rollback
- `support.mdx` - Discord, GitHub issues, commercial contact

`production.mdx` is slimmed from the flat page into a short overview that routes into the subsection.

Scope / coordination
This PR owns only the sidebar + Production + Support. The rest of the tree is handled by in-flight PRs and is untouched here: env/system (#704, #695), Requirements (#696), Install (#697). When #704/#695 land the nested Configuration paths, the two Configuration child hrefs repoint from `/environment` and `/configuration` to the nested paths.

Self-review (against review-docs-like-khushal.md)
`<TLDR>`/`<Warning>`/`<Note>` used for their real purposes; `## Dive deeper` footers; forward-flowing cards; valid Card icons only

Preview
Then visit `/docs/self-hosting/production` and the five subpages.

TH-6391