feat(docker): add self-hosting support with docker-compose configuration #375
@sekhar08 is attempting to deploy a commit to the Databuddy OSS Team on Vercel. A member of the Team first needs to authorize it. |
Greptile Summary
This PR adds a production-ready
Key changes:
Issues found:
Confidence Score: 4/5
Safe to merge after addressing the
One P1 issue remains: the
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph infra["Infrastructure (127.0.0.1 bound)"]
PG["postgres:17-alpine\n:5432"]
CH["clickhouse:25.5.1-alpine\n:8123"]
RD["redis:7-alpine\n:6379"]
end
subgraph apps["Application Services (0.0.0.0 bound)"]
API["databuddy-api\n:3001"]
BASKET["databuddy-basket\n:4000\nSELFHOST=true"]
LINKS["databuddy-links\n:2500"]
end
PG -- "service_healthy" --> API
CH -- "service_healthy" --> API
RD -- "service_healthy" --> API
PG -- "service_healthy" --> BASKET
CH -- "service_healthy" --> BASKET
RD -- "service_healthy" --> BASKET
PG -- "service_healthy" --> LINKS
RD -- "service_healthy" --> LINKS
API -- "DATABASE_URL\nREDIS_URL\nCLICKHOUSE_URL" --> PG & CH & RD
BASKET -- "DATABASE_URL\nREDIS_URL\nCLICKHOUSE_URL" --> PG & CH & RD
LINKS -- "DATABASE_URL\nREDIS_URL" --> PG & RD
CDN["cdn.databuddy.cc\n(GEOIP_DB_URL)"] -.->|"startup download"| LINKS
Reviews (2): Last reviewed commit: "feat(docker): update docker-compose for ..."
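The health-gated startup order in the flowchart maps onto Compose's long-form `depends_on`. The following is a minimal sketch rather than the PR's actual file: the `api` image path, the healthcheck command and timings, and making `DB_PASSWORD` mandatory are assumptions for illustration.

```yaml
services:
  postgres:
    image: postgres:17-alpine
    environment:
      POSTGRES_USER: databuddy
      POSTGRES_DB: databuddy
      # Assumption: fail fast instead of shipping a default password.
      POSTGRES_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set}
    ports:
      # Bound to loopback only, as in the "Infrastructure" subgraph.
      - "127.0.0.1:5432:5432"
    healthcheck:
      # Assumed healthcheck; pg_isready ships in the postgres image.
      test: ["CMD-SHELL", "pg_isready -U databuddy -d databuddy"]
      interval: 10s
      timeout: 5s
      retries: 5

  api:
    # Hypothetical image reference; substitute the real GHCR image.
    image: example.invalid/databuddy-api:latest
    environment:
      PORT: "3001"
      DATABASE_URL: postgres://databuddy:${DB_PASSWORD}@postgres:5432/databuddy
    ports:
      - "3001:3001"
    depends_on:
      postgres:
        # api starts only once the postgres healthcheck passes.
        condition: service_healthy
```

The same `condition: service_healthy` pattern would repeat for ClickHouse and Redis on each service that depends on them.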
  nofile:
    soft: 262144
    hard: 262144
healthcheck:
Missing scripts/clickhouse-init.sql breaks ClickHouse init mount
The volume ./scripts/clickhouse-init.sql:/docker-entrypoint-initdb.d/clickhouse-init.sql references a file that does not exist in the repository (confirmed: no scripts/ directory is tracked in git). When Docker encounters a bind-mount where the host path is missing, it creates an empty directory at that path instead of a file. ClickHouse's init entrypoint then sees a directory named clickhouse-init.sql rather than a SQL file and silently skips or errors on it.
The same line exists in docker-compose.yaml (dev), so this appears to be a copy from there without the file ever being committed. Since the README correctly tells users to run bun run clickhouse:init via the API for first-run initialization, the cleanest fix is to remove this volume bind-mount from the selfhost compose to avoid the misleading/broken entry:
| healthcheck: | |
| - clickhouse_data:/var/lib/clickhouse |
If the intent is to seed ClickHouse automatically on first start, the SQL file needs to be created and committed to scripts/clickhouse-init.sql.
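A sketch of the two options described above, assuming the image is `clickhouse/clickhouse-server` (the flowchart only shows the tag) and that the named volume is declared at the top level:

```yaml
services:
  clickhouse:
    # Assumed image name; the tag matches the flowchart.
    image: clickhouse/clickhouse-server:25.5.1-alpine
    volumes:
      # Option A: keep only the data volume and initialize via
      # `bun run clickhouse:init`, as the README describes.
      - clickhouse_data:/var/lib/clickhouse
      # Option B: re-enable this line only after scripts/clickhouse-init.sql
      # has been created and committed; otherwise Docker creates a
      # directory at the host path.
      # - ./scripts/clickhouse-init.sql:/docker-entrypoint-initdb.d/clickhouse-init.sql:ro

volumes:
  clickhouse_data:
```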
docker-compose.selfhost.yml
Outdated
CLICKHOUSE_USER: ${CLICKHOUSE_USER:-default}
CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD:-}
ClickHouse exposed with empty password by default
CLICKHOUSE_PASSWORD defaults to an empty string (${CLICKHOUSE_PASSWORD:-}), and the HTTP port (8123) is published to the host via ${CLICKHOUSE_PORT:-8123}. With CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1 enabled, the ClickHouse HTTP API is accessible to anyone who can reach the host port — no credentials required.
For a production self-hosting scenario, consider using the :? syntax (same pattern used for BETTER_AUTH_SECRET) to require users to explicitly set a password, or remove the ClickHouse ports: mapping so it is only reachable within the Docker network (the app services connect over the internal network anyway). At minimum, document prominently that CLICKHOUSE_PASSWORD must be set before going to production.
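A minimal sketch of both mitigations combined, assuming the `clickhouse/clickhouse-server` image name and an illustrative error message:

```yaml
services:
  clickhouse:
    # Assumed image name; the tag matches the flowchart.
    image: clickhouse/clickhouse-server:25.5.1-alpine
    environment:
      CLICKHOUSE_USER: ${CLICKHOUSE_USER:-default}
      # `:?` makes `docker compose up` abort with this message when the
      # variable is unset or empty, instead of falling back to no password.
      CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD:?CLICKHOUSE_PASSWORD must be set}
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    # No `ports:` mapping: other services reach ClickHouse at
    # clickhouse:8123 over the Compose network.
    # For local debugging only, a loopback binding could be added:
    # ports:
    #   - "127.0.0.1:${CLICKHOUSE_PORT:-8123}:8123"
```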
  - databuddy

redis:
Redis exposed to the host without authentication
The Redis service publishes port 6379 to the host (0.0.0.0:6379) with no --requirepass or ACL configuration. In a production self-hosted deployment where the host has a public IP (e.g., a VPS), this means Redis is accessible to the internet without credentials.
Consider either:
- Removing the ports mapping for Redis (it only needs to be reachable inside the Docker network by the app services), or
- Adding a password via --requirepass ${REDIS_PASSWORD} and updating all REDIS_URL env vars to include the password.
If the port is needed for local debug access, document the security risk clearly in the compose file comments.
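A sketch of the password option, assuming a new `REDIS_PASSWORD` variable and a hypothetical image reference for `api`; neither is confirmed by the PR:

```yaml
services:
  redis:
    image: redis:7-alpine
    # Compose interpolates the variable before starting the container;
    # `:?` aborts startup if REDIS_PASSWORD is unset or empty.
    command: ["redis-server", "--requirepass", "${REDIS_PASSWORD:?REDIS_PASSWORD must be set}"]
    # No `ports:` mapping, so Redis is reachable only on the Compose network.

  api:
    # Hypothetical image reference; substitute the real GHCR image.
    image: example.invalid/databuddy-api:latest
    environment:
      # Standard redis:// URL form with an empty username and a password.
      REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379
```

Any Redis healthcheck based on `redis-cli ping` would also need the password once `--requirepass` is set, and a password containing URL-reserved characters would need percent-encoding in `REDIS_URL`.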
  # SELFHOST=true → basket writes directly to ClickHouse (no Kafka/Redpanda needed)
  SELFHOST: "true"
depends_on:
  postgres:
    condition: service_healthy
  clickhouse:
basket service missing CLICKHOUSE_USER and CLICKHOUSE_PASSWORD env vars
The api service explicitly sets both CLICKHOUSE_USER and CLICKHOUSE_PASSWORD as separate env vars (in addition to embedding them in CLICKHOUSE_URL). The basket service only sets CLICKHOUSE_URL and omits these individual vars. If basket's ClickHouse client reads the credentials from individual env vars (as many Node ClickHouse clients can), it will fall back to unauthenticated access or use incorrect credentials once a CLICKHOUSE_PASSWORD is set.
For consistency with the api service, the basket service should also declare CLICKHOUSE_USER and CLICKHOUSE_PASSWORD as separate environment variables, mirroring the pattern in the api service block.
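A sketch of the mirrored environment block. The image reference and the exact `CLICKHOUSE_URL` shape are assumptions; whether basket actually reads the separate variables depends on its client code.

```yaml
services:
  basket:
    # Hypothetical image reference; substitute the real GHCR image.
    image: example.invalid/databuddy-basket:latest
    environment:
      PORT: "4000"
      SELFHOST: "true"
      # Assumed URL shape with embedded credentials.
      CLICKHOUSE_URL: http://${CLICKHOUSE_USER:-default}:${CLICKHOUSE_PASSWORD}@clickhouse:8123
      # Declared separately as well, mirroring the api service.
      CLICKHOUSE_USER: ${CLICKHOUSE_USER:-default}
      CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD}
```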
docker-compose.selfhost.yml
Outdated
  DATABASE_URL: postgres://databuddy:${DB_PASSWORD:-CHANGE_ME_in_production}@postgres:5432/databuddy
  REDIS_URL: redis://redis:6379
  APP_URL: ${APP_URL:-https://app.databuddy.cc}
  LINKS_ROOT_REDIRECT_URL: ${LINKS_ROOT_REDIRECT_URL:-https://databuddy.cc}
  GEOIP_DB_URL: ${GEOIP_DB_URL:-https://cdn.databuddy.cc/mmdb/GeoLite2-City.mmdb}
depends_on:
  postgres:
    condition: service_healthy
links service missing explicit PORT env var
Both api and basket explicitly set PORT to match their internal container port. The links service omits this, relying entirely on a hardcoded default inside the container image. If the links image ever changes its default port, the 2500:2500 port mapping would silently break without an obvious error.
For consistency, consider adding PORT: "2500" to the links environment block alongside the other env vars.
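A sketch of the suggested change. The image reference and the `LINKS_PORT` override name are assumptions, modeled on the `API_PORT` / `BASKET_PORT` pattern mentioned in the PR description.

```yaml
services:
  links:
    # Hypothetical image reference; substitute the real GHCR image.
    image: example.invalid/databuddy-links:latest
    environment:
      # Pin the in-container port so it always matches the mapping below.
      PORT: "2500"
    ports:
      # Host port is overridable; the container port stays fixed at 2500.
      - "${LINKS_PORT:-2500}:2500"
```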
…rity enhancements
@greptileai can you review the PR after the latest commit?
…DB_URL configurations
Have you tested this, either locally or deployed somewhere, and seen it work in a stable manner? Also, ideally it shouldn't deploy uptime & links unless the user explicitly wants those deployed. The self-host should be as minimal as possible, yet composable, so those features are easy to add.
@izadoesdev, I tested the current compose locally and confirmed the default services I ran (api, basket, and links) came up successfully. That said, looking at the file again, it only partially matches your concern: uptime is optional already, but links is still in the default self-host stack, so it’s not as minimal/composable as it should be. I agree the base self-host compose should probably default to the core analytics services only, with links and uptime as explicit opt-ins.
Hey @izadoesdev, I switched self-host to a single docker-compose.selfhost.yml using Compose profiles, so default is minimal (api + basket + infra) and links / uptime are explicit opt-ins via --profile. This seems cleaner than adding more compose files and matches the “minimal but composable” goal. The only caveat is Compose still interpolates env vars for profiled services, so I’m thinking of handling required env validation at service startup for links and uptime instead.
I think links is easy enough to keep as part of it; uptime is a different service, so let's keep that more separate.
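For reference, a rough sketch of where this thread lands: links stays in the default stack and uptime is opt-in through a Compose profile. The image references, the profile name, and the `UPTIME_SECRET` variable are all hypothetical.

```yaml
services:
  api:
    image: example.invalid/databuddy-api:latest     # hypothetical image
  basket:
    image: example.invalid/databuddy-basket:latest  # hypothetical image
  links:
    # No `profiles:` key, so links starts with the default stack.
    image: example.invalid/databuddy-links:latest   # hypothetical image

  uptime:
    image: example.invalid/databuddy-uptime:latest  # hypothetical image
    # Started only when the "uptime" profile is enabled.
    profiles: ["uptime"]
    environment:
      # Hypothetical variable. It uses `:-` rather than `:?` so that a
      # default run does not abort, given the interpolation caveat
      # mentioned earlier in the thread; the service validates it at startup.
      UPTIME_SECRET: ${UPTIME_SECRET:-}
```

With a layout like this, `docker compose -f docker-compose.selfhost.yml up -d` would start the default services, and adding `--profile uptime` would include the uptime service as well.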
Description
Adds a production-ready docker-compose.selfhost.yml for self-hosting Databuddy with pre-built GHCR images.

What's included:
- SELFHOST=true to write directly to ClickHouse (no Kafka/Redpanda needed)
- API_PORT, BASKET_PORT, etc.)

Documentation updates:
- README.md: new "Self-Hosting" section with quick start commands and a table distinguishing docker-compose.yaml (dev) vs docker-compose.selfhost.yml (production)
- CONTRIBUTING.md: note clarifying the dev compose file with a link to the self-hosting guide

Why two compose files?
The existing docker-compose.yaml starts only infrastructure for local dev. The new docker-compose.selfhost.yml is a complete production stack using GHCR images; keeping them separate avoids confusion.

Checklist