feat: Add file-based dashboard provisioner #1962

Open
ZeynelKoca wants to merge 1 commit into hyperdxio:main from ZeynelKoca:feature/dashboard-provisioner

@ZeynelKoca ZeynelKoca commented Mar 22, 2026

Summary

Add a provision-dashboards task that reads .json files from a directory and upserts dashboards into MongoDB, following the existing task system pattern (same as check-alerts).

Provisioned dashboards are flagged with provisioned: true so they never overwrite user-created dashboards with the same name. Files are validated against DashboardWithoutIdSchema. Removing a file does not delete the dashboard (safe by default, same as Grafana). The task is deployment-agnostic: it reads from a directory, regardless of how files get there.
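
The upsert rules in this paragraph can be sketched with an in-memory array standing in for MongoDB. This is a minimal illustration, not the PR's code: the `Dashboard` shape and the `syncProvisioned` helper are hypothetical names, and matching on team, name, and `provisioned: true` is an assumption inferred from the description.

```typescript
// Hypothetical in-memory model; the real task persists to MongoDB.
interface Dashboard {
  team: string;
  name: string;
  tiles: unknown[];
  tags: string[];
  provisioned?: boolean;
}

type DashboardFile = Pick<Dashboard, "name" | "tiles" | "tags">;

// Upsert every file for one team, matching only provisioned documents.
function syncProvisioned(
  store: Dashboard[],
  team: string,
  files: DashboardFile[],
): void {
  for (const file of files) {
    const existing = store.find(
      (d) => d.team === team && d.name === file.name && d.provisioned === true,
    );
    if (existing) {
      // Re-sync: overwrite the provisioned copy with the file contents.
      existing.tiles = file.tiles;
      existing.tags = file.tags;
    } else {
      // First sync: insert next to any user-created dashboard of that name.
      store.push({ ...file, team, provisioned: true });
    }
  }
  // No delete pass: a dashboard whose file disappeared stays in the store.
}

const store: Dashboard[] = [
  // User-created dashboard that shares its name with a provisioned file.
  { team: "t1", name: "Test Dashboard", tiles: [], tags: ["mine"] },
];
syncProvisioned(store, "t1", [{ name: "Test Dashboard", tiles: [], tags: [] }]);
syncProvisioned(store, "t1", [{ name: "Test Dashboard", tiles: [], tags: ["v2"] }]);
syncProvisioned(store, "t1", []); // the file was removed
console.log(store.length); // 2
```

The user-created entry keeps its `mine` tag throughout, the provisioned copy picks up the second file's contents, and the final empty sync deletes nothing.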

When DASHBOARD_PROVISIONER_DIR is set, entry.prod.sh automatically starts the task as an additional process alongside the API, App, and check-alerts.

Note: Users can currently edit provisioned dashboards through the UI, but changes will be overwritten on the next sync cycle. Grafana handles this by blocking saves on provisioned dashboards. Adding a similar guard would be a good follow-up to improve UX.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DASHBOARD_PROVISIONER_DIR | Yes | | Directory to read .json files from |
| DASHBOARD_PROVISIONER_TEAM_ID | No* | | Scope to a specific team ID |
| DASHBOARD_PROVISIONER_ALL_TEAMS | No* | false | Set to true to provision to all teams |

*One of DASHBOARD_PROVISIONER_TEAM_ID or DASHBOARD_PROVISIONER_ALL_TEAMS=true is required.
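
The requirement rules for these variables could be checked roughly as follows. This is a sketch under assumptions: `readProvisionerConfig` and `ProvisionerConfig` are hypothetical names, the error messages are invented, and the PR may validate differently (for example, it may reject setting both scoping variables, which this sketch allows).

```typescript
interface ProvisionerConfig {
  dir: string;
  teamId: string | undefined;
  allTeams: boolean;
}

type Env = Record<string, string | undefined>;

// Reads the three variables from an env-like object and enforces that the
// directory is set and that at least one scoping option is present.
function readProvisionerConfig(env: Env): ProvisionerConfig {
  const dir = env["DASHBOARD_PROVISIONER_DIR"];
  if (!dir) {
    throw new Error("DASHBOARD_PROVISIONER_DIR is required");
  }
  const teamId = env["DASHBOARD_PROVISIONER_TEAM_ID"] || undefined;
  const allTeams = env["DASHBOARD_PROVISIONER_ALL_TEAMS"] === "true";
  if (!teamId && !allTeams) {
    throw new Error(
      "Set DASHBOARD_PROVISIONER_TEAM_ID or DASHBOARD_PROVISIONER_ALL_TEAMS=true",
    );
  }
  return { dir, teamId, allTeams };
}

const example = readProvisionerConfig({
  DASHBOARD_PROVISIONER_DIR: "/tmp/dashboards",
  DASHBOARD_PROVISIONER_ALL_TEAMS: "true",
});
console.log(example.allTeams); // true
```

In a Node.js process the same function could be called with `process.env`; taking the environment as a parameter keeps the check easy to test.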

How to test locally or on Vercel

  1. Create a directory with a dashboard JSON file:
    mkdir /tmp/dashboards
    echo '{"name":"Test Dashboard","tiles":[],"tags":[]}' > /tmp/dashboards/test.json
  2. Run the task:
    DASHBOARD_PROVISIONER_DIR=/tmp/dashboards DASHBOARD_PROVISIONER_ALL_TEAMS=true \
    ./packages/api/bin/hyperdx task provision-dashboards
  3. Verify the dashboard appears in the UI
  4. Modify the JSON file, run again, verify it updates
  5. Delete the JSON file, run again, verify the dashboard persists
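
The sample file in step 1 carries three fields: `name`, `tiles`, and `tags`. A rough structural check for such a file is sketched below. It is only an illustration: the PR validates against `DashboardWithoutIdSchema`, whose actual rules are not shown here, and `parseDashboardFile` is a hypothetical helper that looks at nothing beyond those three fields.

```typescript
interface DashboardFile {
  name: string;
  tiles: unknown[];
  tags: string[];
}

// Returns the parsed file, or null when the text is not valid JSON or does
// not have the three fields used in the sample above.
function parseDashboardFile(text: string): DashboardFile | null {
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch {
    return null; // malformed JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const name = record["name"];
  const tiles = record["tiles"];
  const tags = record["tags"];
  if (typeof name !== "string") return null;
  if (!Array.isArray(tiles)) return null;
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
    return null;
  }
  return { name, tiles, tags: tags as string[] };
}

const parsed = parseDashboardFile(
  '{"name":"Test Dashboard","tiles":[],"tags":[]}',
);
console.log(parsed?.name); // Test Dashboard
```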

References

changeset-bot bot commented Mar 22, 2026

🦋 Changeset detected

Latest commit: 65f5509

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
| --- | --- |
| @hyperdx/api | Minor |
| @hyperdx/app | Minor |
| @hyperdx/otel-collector | Minor |

vercel bot commented Mar 22, 2026

@ZeynelKoca is attempting to deploy a commit to the HyperDX Team on Vercel.

A member of the Team first needs to authorize it.

github-actions bot commented Mar 22, 2026

PR Review

The implementation is well-structured and follows existing task patterns closely. A few items worth addressing:

  • ⚠️ asyncDispose() closes the mongoose connection after every cron tick → In built-in scheduler mode (RUN_SCHEDULED_TASKS_EXTERNALLY=false), asyncDispose() is called in the finally block after each tick, closing the connection, then connectDB() re-opens it on the next tick. Verify connectDB() handles reconnection after explicit close (pre-existing pattern for other tasks, but worth confirming it works here).

  • ⚠️ No API-level guard for provisioned dashboards → Users can freely edit or delete provisioned dashboards via the API. The provisioner will overwrite their edits on the next sync. Consider either blocking edits to provisioned: true dashboards in the update/delete API, or at minimum surfacing the provisioned flag in the UI so users understand the behavior.

  • ⚠️ Partial unique index added to an existing collection → The new { name, team } unique partial index on documents with provisioned: true will be built against existing data on first deployment. Since no existing documents have provisioned: true, this is safe, but worth verifying the index creation is non-blocking for large deployments (consider { background: true } if the schema DSL supports it).

  • ✅ Shell script changes look safe — DASHBOARD_PROVISIONER_DIR is only used in a conditional check; no injection risk.

  • ✅ asyncDispose() matches the HdxTask interface contract correctly.

  • ✅ Test coverage is comprehensive and uses real DB (consistent with project pattern of no DB mocks).
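
The partial unique index mentioned in the review constrains only documents that match its filter. The sketch below simulates that rule in memory to show why user-created dashboards are unaffected. `Doc` and `violatesPartialUnique` are hypothetical names; in MongoDB itself the filter is expressed with the `partialFilterExpression` index option.

```typescript
interface Doc {
  name: string;
  team: string;
  provisioned?: boolean;
}

// Returns true when inserting `doc` would violate a unique index on
// { name, team } that covers only documents with provisioned: true.
function violatesPartialUnique(existing: Doc[], doc: Doc): boolean {
  if (doc.provisioned !== true) {
    return false; // outside the partial filter, so never constrained
  }
  return existing.some(
    (d) =>
      d.provisioned === true && d.name === doc.name && d.team === doc.team,
  );
}

const docs: Doc[] = [
  { name: "Latency", team: "t1" }, // user-created
  { name: "Latency", team: "t1", provisioned: true }, // provisioned
];

// Another user-created "Latency" is allowed.
console.log(violatesPartialUnique(docs, { name: "Latency", team: "t1" })); // false
// A second provisioned "Latency" for the same team is rejected.
console.log(
  violatesPartialUnique(docs, { name: "Latency", team: "t1", provisioned: true }),
); // true
```

On the `{ background: true }` suggestion: MongoDB 4.2 and later ignore the `background` option, so on those versions it should not change how the index is built.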

@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch 17 times, most recently from 4c24c45 to abdafb3 March 22, 2026 23:40
ZeynelKoca added a commit to ZeynelKoca/ClickStack-helm-charts that referenced this pull request Mar 22, 2026
k8s-sidecar watches ConfigMaps labeled "hyperdx.io/dashboard: true"
across all namespaces and writes dashboard JSON to a shared volume.
HyperDX reads and upserts them natively via file-based provisioner.

Requires hyperdxio/hyperdx#1962
@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch 5 times, most recently from 3e9be8a to ffe6158 March 23, 2026 12:43
@ZeynelKoca ZeynelKoca changed the title from "Add file-based dashboard provisioner" to "featu: Add file-based dashboard provisioner" Mar 23, 2026
@ZeynelKoca ZeynelKoca changed the title from "featu: Add file-based dashboard provisioner" to "feat: Add file-based dashboard provisioner" Mar 23, 2026
@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch from ffe6158 to 10ae4d2 March 24, 2026 08:21
@dhable dhable (Contributor) left a comment

Thanks for the well-thought-out contribution. Overall this looks good, but we don't typically have long-running processes running as tasks inside the API process. We have put background processing like this as a task.

This approach provides a flexible deployment. Using our full stack image, we start the tasks as separate processes and then have a CronJob library manage scheduling of execution. For more advanced deployments, these scheduled tasks can run outside the main process, like using a k8s CronJob or other scheduling system.

I think this implementation could be easily adapted to that design since the startDashboardProvisioner() function is almost a direct fit for the task system. The check-alerts task also needs to access Mongo, so you should be able to connect in the same way.

@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch 3 times, most recently from 122266f to 32daf0f March 29, 2026 17:51
@ZeynelKoca ZeynelKoca force-pushed the feature/dashboard-provisioner branch from 32daf0f to 65f5509 March 29, 2026 18:01
@ZeynelKoca ZeynelKoca (Author) commented

> PR Review
>
> The implementation is well-structured and follows existing task patterns closely. A few items worth addressing:
>
> * ⚠️ **`asyncDispose()` closes the mongoose connection after every cron tick** → In built-in scheduler mode (`RUN_SCHEDULED_TASKS_EXTERNALLY=false`), `asyncDispose()` is called in the `finally` block after each tick, closing the connection, then `connectDB()` re-opens it on the next tick. Verify `connectDB()` handles reconnection after explicit close (pre-existing pattern for other tasks, but worth confirming it works here).
>
> * ⚠️ **No API-level guard for provisioned dashboards** → Users can freely edit or delete provisioned dashboards via the API. The provisioner will overwrite their edits on the next sync. Consider either blocking edits to `provisioned: true` dashboards in the update/delete API, or at minimum surfacing the `provisioned` flag in the UI so users understand the behavior.
>
> * ⚠️ **Partial unique index added to an existing collection** → The new `{ name, team }` unique partial index on documents with `provisioned: true` will be built against existing data on first deployment. Since no existing documents have `provisioned: true`, this is safe, but worth verifying the index creation is non-blocking for large deployments (consider `{ background: true }` if the schema DSL supports it).
>
> * ✅ Shell script changes look safe: `DASHBOARD_PROVISIONER_DIR` is only used in a conditional check; no injection risk.
>
> * ✅ `asyncDispose()` matches the `HdxTask` interface contract correctly.
>
> * ✅ Test coverage is comprehensive and uses real DB (consistent with project pattern of no DB mocks).

  1. asyncDispose reconnection: tested and works as intended. This is the same pattern as check-alerts, which also closes and reconnects on each tick via its provider.
  2. API guard: this was already mentioned in the PR description. I recommend a follow-up PR so this one is not bloated with further code changes, since the guard also involves frontend work.
  3. Index creation: as mentioned, no existing documents have provisioned: true, so the index builds instantly.
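
The per-tick lifecycle discussed in item 1 (connect, run, then dispose in a `finally` block, reconnecting on the next tick) can be sketched as follows. Everything here is a stand-in: `SketchTask`, `FakeConnection`, `makeTask`, and `runTick` are hypothetical names, and the real `HdxTask` interface and `connectDB()` in the HyperDX codebase may differ.

```typescript
// Hypothetical task shape, loosely modelled on the review's description.
interface SketchTask {
  execute(): Promise<void>;
  asyncDispose(): Promise<void>;
}

// Stand-in for a database connection that counts opens and closes.
class FakeConnection {
  opens = 0;
  closes = 0;
  isOpen = false;

  async connect(): Promise<void> {
    if (!this.isOpen) {
      this.isOpen = true;
      this.opens += 1;
    }
  }

  async close(): Promise<void> {
    if (this.isOpen) {
      this.isOpen = false;
      this.closes += 1;
    }
  }
}

function makeTask(conn: FakeConnection, work: () => void): SketchTask {
  return {
    async execute(): Promise<void> {
      await conn.connect(); // reconnects if a previous tick closed it
      work();
    },
    async asyncDispose(): Promise<void> {
      await conn.close();
    },
  };
}

// One scheduler tick: dispose always runs, even if execute() throws.
async function runTick(task: SketchTask): Promise<void> {
  try {
    await task.execute();
  } finally {
    await task.asyncDispose();
  }
}

async function demo(): Promise<{ conn: FakeConnection; runs: number }> {
  const conn = new FakeConnection();
  let runs = 0;
  const task = makeTask(conn, () => {
    runs += 1;
  });
  await runTick(task);
  await runTick(task);
  return { conn, runs };
}

demo().then(({ conn, runs }) => console.log(runs, conn.opens, conn.closes)); // 2 2 2
```

Each tick opens and closes the connection once, so the pattern only works if reconnecting after an explicit close is supported, which is the property the author reports having tested.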

ZeynelKoca added a commit to ZeynelKoca/ClickStack-helm-charts that referenced this pull request Mar 29, 2026
k8s-sidecar watches ConfigMaps labeled "hyperdx.io/dashboard: true"
across all namespaces and writes dashboard JSON to a shared volume.
HyperDX reads and upserts them natively via file-based provisioner.

Requires hyperdxio/hyperdx#1962
@ZeynelKoca ZeynelKoca (Author) commented

> Thanks for the well-thought-out contribution. Overall this looks good, but we don't typically have long-running processes running as tasks inside the API process. We have put background processing like this as a task.
>
> This approach provides a flexible deployment. Using our full stack image, we start the tasks as separate processes and then have a CronJob library manage scheduling of execution. For more advanced deployments, these scheduled tasks can run outside the main process, like using a k8s CronJob or other scheduling system.
>
> I think this implementation could be easily adapted to that design since the startDashboardProvisioner() function is almost a direct fit for the task system. The check-alerts task also needs to access Mongo, so you should be able to connect in the same way.

Reimplemented with the existing concurrent task system. The relevant ClickStack Helm PR has also been updated.

@ZeynelKoca ZeynelKoca requested a review from dhable March 29, 2026 18:25