
docs: restructure self-hosting sidebar; add Production subsection and Support #705

Open
abhijaisrivastava15 wants to merge 3 commits into docs/observe-concepts from docs/self-hosting-restructure



@abhijaisrivastava15


What

Restructures the self-hosting sidebar to the agreed tree, and fills in the pages that were missing behind it: the Production subsection (5 pages) and Support.

Sidebar (navigation.ts)

Self-Hosting → Overview · Requirements · Install · Configuration{Environment variables, System configuration} · Production{Checklist, Security & TLS, Backups & restore, Monitoring, Upgrades & rollback} · Troubleshooting & FAQs · Support

Dropped the flat User management entry from the sidebar (page left in place).

Pages added

  • production/checklist.mdx — go-live pass: secrets, prod runtime flags, managed data stores
  • production/security-tls.mdx — reverse-proxy TLS termination and secret handling
  • production/backups-restore.mdx — Postgres, ClickHouse, MinIO backup and restore
  • production/monitoring.mdx — Prometheus scrape and the signals to watch
  • production/upgrades-rollback.mdx — upgrade flow, migrations, and rollback
  • support.mdx — Discord, GitHub issues, commercial contact

production.mdx is slimmed from the flat page into a short overview that routes into the subsection.

Scope / coordination

This PR owns only the sidebar, Production, and Support. The rest of the tree is handled by in-flight PRs and is untouched here: env/system (#704, #695), Requirements (#696), Install (#697). Once #704/#695 land the nested Configuration paths, the two Configuration child hrefs will be repointed from /environment and /configuration to those nested paths.

Self-review (against review-docs-like-khushal.md)

  • Diataxis: How-to · Depth: 3 · scannable operational runbooks
  • No em-dashes; trailing periods dropped in table cells and card bodies; <TLDR> / <Warning> / <Note> used for their real purposes; ## Dive deeper footers; forward-flowing cards; valid Card icons only
  • Rendered locally on :4321 — every page returns 200, the sidebar matches the tree, zero 404s

Preview

npm run dev

Then visit /docs/self-hosting/production and the five subpages.

TH-6391

… Support

Sidebar reshaped to the agreed tree: Configuration and Production as nested
groups. Production split from the flat page into Checklist, Security & TLS,
Backups & restore, Monitoring, and Upgrades & rollback. New Support page.
Production overview slimmed to route into the subsection.
@khushalsonawat khushalsonawat changed the base branch from dev to docs/self-hosting July 2, 2026 16:16
@khushalsonawat khushalsonawat changed the base branch from docs/self-hosting to dev July 2, 2026 16:33
@khushalsonawat khushalsonawat changed the base branch from dev to docs/observe-concepts July 2, 2026 16:33

@khushalsonawat khushalsonawat left a comment

Contributor


Check comments

Comment thread src/lib/navigation.ts
]
},
{
title: 'Production',


Production folder needs an Overview child pointing at /docs/self-hosting/production, same as the Explore dashboard folder in Observe. Right now the overview page isn't reachable from the sidebar

description: "The go-live pass before a self-hosted instance takes real traffic"
---

Run through this once before the stack is reachable by anyone else. It covers the three things that separate a laptop trial from a real deployment: replacing the shipped secrets, switching the backend into production mode, and swapping compose-managed data stores for managed ones.


Make the three things bullet points


## Replace the shipped secrets

The stack boots with `CHANGEME` placeholders. Replace every one before the instance is reachable, and generate each value rather than making one up:


There's no CHANGEME in the repo. The compose defaults are local-dev-only-...-replace-me (and futureagi for the passwords), and the real guard is deploy/docker-compose.production.yml, which re-binds these with ${VAR:?} so prod refuses to boot until they're set. Rewrite this section around that overlay. Its required list is also longer than these four: SECRET_KEY, AGENTCC_INTERNAL_API_KEY, AGENTCC_ADMIN_TOKEN, PG_PASSWORD, MINIO_ROOT_PASSWORD, RABBITMQ_USER/PASSWORD, FRONTEND_URL
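The overlay's `${VAR:?}` guard only fires at `docker compose up`. A pre-flight check can surface the same gaps earlier and also flag values that still hold the shipped defaults. The sketch below is hypothetical and not part of the repo: `check_secrets` and `REQUIRED_VARS` are illustrative names, the variable list is the one from the comment above, and the default patterns (a `local-dev-only-` prefix, `futureagi` for passwords) come from that same comment.

```bash
# Hypothetical pre-flight check, not part of the repo. Prints one line per
# problem and returns non-zero if any required secret is unset, empty, or
# still a shipped dev default. POSIX sh syntax, so it also runs under sh.

# Variable list taken from the review comment on the production overlay.
REQUIRED_VARS="SECRET_KEY AGENTCC_INTERNAL_API_KEY AGENTCC_ADMIN_TOKEN
PG_PASSWORD MINIO_ROOT_PASSWORD RABBITMQ_USER RABBITMQ_PASSWORD FRONTEND_URL"

check_secrets() {
  status=0
  for name in $REQUIRED_VARS; do
    # Indirect lookup of the variable named by $name (empty if unset).
    # eval is safe here because every name comes from the fixed list above.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    elif [ "${value#local-dev-only-}" != "$value" ]; then
      # Compose dev defaults start with local-dev-only-.
      echo "dev default: $name"
      status=1
    elif [ "$value" = futureagi ] && [ "${name%_PASSWORD}" != "$name" ]; then
      # futureagi is the dev default for the *_PASSWORD variables.
      echo "dev default: $name"
      status=1
    fi
  done
  return "$status"
}
```

To check the values in `.env`, assuming it uses plain `KEY=value` lines, export them first with `set -a; . ./.env; set +a` and then run `check_secrets`. The overlay remains the real guard; this only gives an earlier, readable report.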


<Warning>
`PG_PASSWORD` and `MINIO_ROOT_PASSWORD` are written to their volumes on first boot only. Set them before your first `docker compose up`. Changing them after the volume exists locks you out. The full field list is in [Environment variables](/docs/self-hosting/environment).


True for Postgres, not MinIO. MinIO reads MINIO_ROOT_PASSWORD from env on every boot, so changing it and restarting just works. Scope this to PG_PASSWORD
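Since Postgres keeps the `PG_PASSWORD` it saw on first boot, the safe pattern is to generate the value once, before the first `docker compose up`, and never regenerate it by accident. A minimal sketch, assuming secrets live in a `.env` file next to the compose file; `gen_secret` and `ensure_pg_password` are hypothetical helpers, and 32 random bytes encoded as hex is an assumed strength, not a project requirement.

```bash
# Hypothetical helpers, not part of the repo.

# Print 32 random bytes as 64 lowercase hex characters (no newline).
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Append a generated PG_PASSWORD to the env file only if none is present,
# so re-running never rotates the value Postgres baked into its volume.
ensure_pg_password() {
  env_file="${1:-.env}"
  if grep -q '^PG_PASSWORD=' "$env_file" 2>/dev/null; then
    echo "PG_PASSWORD already set in $env_file; leaving it alone" >&2
    return 0
  fi
  printf 'PG_PASSWORD=%s\n' "$(gen_secret)" >> "$env_file"
}
```

`ensure_pg_password .env` is safe to re-run: once a `PG_PASSWORD` line exists it is left untouched, including one that still holds the `futureagi` dev default, so replace that by hand before the first boot.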


Compose-managed Postgres, ClickHouse, Redis, and MinIO are fine for a trial. For production, point the stack at managed services so data outlives the containers:

| Replace | With | Set |


These don't work from .env. CH_HOST/CH_PORT, REDIS_URL, S3_ENDPOINT_URL (and PG_HOST) are hardcoded in the backend env block of docker-compose.yml, so setting them in .env does nothing. You have to edit the compose file, and the page should say so. Also for S3 the actual switch is STORAGE_BACKEND=s3 per the compose comments, not the endpoint URL
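Because those hosts are set in the backend's env block of `docker-compose.yml` rather than read from `.env`, the first step is finding every line to change. A small hypothetical helper that lists them with line numbers; the variable names come from the comment above, and the default file name `docker-compose.yml` assumes you run it from the compose directory.

```bash
# Hypothetical helper, not part of the repo: print (with line numbers) every
# compose line that sets a data-store connection or the storage switch,
# i.e. the settings that .env cannot override.
find_store_settings() {
  compose_file="${1:-docker-compose.yml}"
  grep -nE '(PG_HOST|CH_HOST|CH_PORT|REDIS_URL|S3_ENDPOINT_URL|STORAGE_BACKEND)' "$compose_file"
}
```

Edit the reported lines to point at the managed endpoints. Per the compose comments, object storage switches to S3 through `STORAGE_BACKEND=s3`, not through the endpoint URL alone.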


## Scrape the backend

The backend serves Prometheus metrics at `http://localhost:8000/metrics`. Add it as a scrape target:


This endpoint doesn't exist. The backend has no /metrics route (nothing in tfc/urls.py, and granian isn't started with metrics), and fi-collector's admin port (9464) only serves /healthz; its README still lists the metrics exporter as a TODO. There's nothing to scrape yet. Rework the page around what actually exists (/healthz checks, the PeerDB UI on 3001, container-level monitoring), or hold it until a metrics endpoint lands
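Until a metrics endpoint lands, a liveness probe against the `/healthz` checks that do exist is the closest substitute for a scrape. A minimal sketch, assuming `curl` is installed and fi-collector's admin port is reachable on `localhost:9464`; `probe` and `probe_all` are hypothetical helpers, not part of the repo.

```bash
# Hypothetical liveness probe, not part of the repo. Prints "up" or "down"
# per URL and returns non-zero if any endpoint fails.

probe() {
  url="$1"
  # -f: treat HTTP >= 400 as failure; -s: no progress output.
  # --max-time caps how long a hung endpoint can stall the check.
  # stderr is discarded because the up/down line is the report.
  if curl -fs --max-time 5 -o /dev/null "$url" 2>/dev/null; then
    echo "up   $url"
    return 0
  fi
  echo "down $url"
  return 1
}

probe_all() {
  overall=0
  for url in "$@"; do
    probe "$url" || overall=1
  done
  return "$overall"
}
```

For example, `probe_all http://localhost:9464/healthz` checks fi-collector's admin port. A passing check confirms the collector process is up, not that spans are flowing.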

<Step title="Re-run PeerDB init if the release notes say so">
When a release changes the replication setup, re-run init so ClickHouse stays in sync:
```bash
docker compose run --rm peerdb-init bash /setup.sh
```


Plain docker compose run --rm peerdb-init is enough. The entrypoint is already bash /setup.sh; this passes the same command in again as args.

</Steps>

<Note>
Because ClickHouse replicates from Postgres through PeerDB, re-running PeerDB init also rebuilds ClickHouse from scratch. That's the recovery path when the [ClickHouse store](/docs/self-hosting/production/backups-restore) is lost.


Same problem as the backups note. PeerDB init only rebuilds the mirrored PG tables. Spans and traces are written to ClickHouse directly and won't come back, so this can't be the recovery path.

Comment thread src/pages/docs/self-hosting/support.mdx Outdated
## Where to get help

<CardGroup cols={2}>
<Card title="Discord" icon="comments" href="https://discord.gg/QDVvTgA8Xp">


The repo README and our homepage use discord.com/invite/n2tCUKBkAw. Use that one

Comment thread src/pages/docs/self-hosting/support.mdx Outdated

For managed hosting, an SLA, or help with a production rollout, reach out at [sales@futureagi.com](mailto:sales@futureagi.com).

## Dive deeper


Both cards point backward and this is the last page of the section. I'd drop the footer

Addresses Khushal's #705 review. Every claim re-grounded in the product repo:
- Checklist: real dev-only defaults + deploy/docker-compose.production.yml ${VAR:?} guard, full required-secrets list; PG (first-boot) vs MinIO (every-boot); managed stores are edited in compose (hardcoded hosts), not .env; S3 via STORAGE_BACKEND
- Backups: Redis is a cache (RabbitMQ holds the queue); pg_dump -T; futureagi_ volume prefix + full list; ClickHouse needs its own backup post-CH25 (spans don't return from PeerDB)
- Monitoring: no /metrics endpoint exists; reworked around docker health, fi-collector :9464, PeerDB UI
- Upgrades: peerdb-init needs no args; PeerDB init is not the ClickHouse recovery path
- Support: correct Discord invite; dropped the backward-pointing footer
- Nav: Production Overview child so /production is reachable
…ring card/claim

- Split the overview into 'Before you go live' (2) and 'Operating it' (3) so no lone orphan card
- Overview Monitoring card no longer promises Prometheus metrics (the page was reworked away from a non-existent /metrics)
- fi-collector healthz confirms the collector is up, not that it's 'accepting spans'

@khushalsonawat khushalsonawat left a comment


Check comments

Point the proxy at the frontend on `localhost:3000` and the backend on `localhost:8000`. The full port list is in [Requirements](/docs/self-hosting/requirements#ports-reference).
</Step>
<Step title="Point the frontend at HTTPS">
Set `VITE_HOST_API=https://api.yourcompany.com` in `.env`. This is a build-time value, so the frontend has to be rebuilt for it to take effect.

@khushalsonawat khushalsonawat Jul 3, 2026


VITE_HOST_API is actually runtime: the frontend image's entrypoint writes it into config.js on every container start (frontend/docker-entrypoint.sh), and compose passes it as plain environment. There's also no build: on the frontend service, so docker compose build frontend doesn't work on the pulled image at all. Fold steps 2-3 and the warning and rewrite as: set it in .env, then docker compose up -d frontend to recreate the container. No rebuild
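The folded-down step the review asks for (edit `.env`, then recreate the frontend container) is small enough to script. A minimal sketch with a hypothetical `set_env_var` helper that replaces or appends a single `KEY=value` line; the `.env` path and the `api.yourcompany.com` host are placeholders carried over from the quoted page.

```bash
# Hypothetical helper, not part of the repo: set KEY=VALUE in an env file,
# replacing any existing KEY= line or appending one if absent.
set_env_var() {
  key="$1"
  value="$2"
  file="${3:-.env}"
  tmp_file="$(mktemp)" || return 1
  if [ -f "$file" ]; then
    # Copy every line except the old KEY= line (grep exits 1 if none remain).
    grep -v "^${key}=" "$file" > "$tmp_file" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
  # Replace the original; the result keeps mktemp's 0600 permissions.
  mv "$tmp_file" "$file"
}
```

Run `set_env_var VITE_HOST_API https://api.yourcompany.com .env`, then `docker compose up -d frontend`. Compose recreates the container and the entrypoint writes the new value into `config.js` on start; no image rebuild is involved.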


## Container health

Every service ships a Docker health check, so `docker compose ps` is the fastest read on what's up:


Not every service. Only the data stores (postgres, clickhouse, redis, rabbitmq, minio, temporal) define healthchecks in compose. frontend, backend, worker, code-executor and fi-collector don't, so ps shows them as plain running, never healthy/unhealthy. Say the data stores report health and the app tier just shows running

## In this page

## Hardening checklist
Production readiness for a self-hosted instance breaks into five steps. Do them in order the first time, then keep each page as a runbook.


'keep each page as a runbook' is already in the hero four lines up. Say it once


2 participants