ci(gha): adopt chart-testing for chart CI (lint + install) #95

Open
etgraylog wants to merge 16 commits into Graylog2:main from etgraylog:ci/chart-testing

etgraylog commented Jun 1, 2026

Summary

Replace the bare helm lint workflow with a CI flow built on ct, the Helm chart-testing tool, that runs:

  • ct lint (Chart.yaml schema, maintainer/icon checks, version-bump enforcement, yamllint over Chart.yaml + values.yaml + ci/overlays)
  • ct install (kind cluster + MongoDB Kubernetes Operator + helm install + helm test), with a Kubernetes version matrix covering v1.32.11, v1.33.7, and v1.34.3 — the chart's stated minimum and recent stable versions per docs/TESTING.md.

Every PR against main now automatically validates that the chart renders cleanly and deploys correctly on each supported Kubernetes version, with all 5 existing helm test pods passing. Two pre-existing yamllint violations surfaced by the new linter are also fixed so the new workflow lands green.

Details

Two-job workflow architecture

  • helm-ct-lint job — runs ct lint --all against the chart. ~10–15 seconds.
  • helm-ct-install job — runs ct install --all against a kind cluster with the MongoDB Operator pre-installed. Matrix over three Kubernetes versions (v1.32.11, v1.33.7, v1.34.3), each in parallel. ~5–6 minutes wall time per matrix variant. Gated on helm-ct-lint passing (needs: helm-ct-lint) so the longer install only runs when lint is clean.

Multi-Kubernetes version testing

The helm-ct-install job uses a strategy.matrix over Kubernetes versions, each variant spinning up its own kind cluster on a different kindest/node image. fail-fast: false ensures all three versions run to completion even if one fails, so reviewers see which versions are affected. The matrix covers the range stated in docs/TESTING.md ("Kubernetes 1.32+"):

  • v1.32.11 — stated minimum supported version
  • v1.33.7 — mid-range stable
  • v1.34.3 — recent stable

v1.35.0 is intentionally not included to keep the matrix at three parallel runners. Adding it later is a one-line change.
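
As a rough sketch of the job wiring described above (not a copy of the workflow file in this PR): the job names, needs:, fail-fast: false, the three Kubernetes versions, and the pinned action versions come from this description. The workflow name, trigger, runs-on label, the matrix key k8s, and the step layout are assumptions made for illustration.

```yaml
# Illustrative sketch only; the real file is .github/workflows/lint-and-test.yaml.
name: lint-and-test              # assumed name
on:
  pull_request:
    branches: [main]             # assumed trigger

jobs:
  helm-ct-lint:
    runs-on: ubuntu-latest       # assumed runner label
    steps:
      - uses: actions/checkout@v6
      - uses: azure/setup-helm@v5
      - uses: helm/chart-testing-action@v2.8.0
      - run: ct lint --config .github/ct.yaml --all

  helm-ct-install:
    needs: helm-ct-lint          # install runs only when lint is clean
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false           # let every version finish, even if one fails
      matrix:
        k8s: [v1.32.11, v1.33.7, v1.34.3]   # "k8s" is an assumed key name
    steps:
      - uses: actions/checkout@v6
      - uses: azure/setup-helm@v5
      - uses: helm/chart-testing-action@v2.8.0
      - uses: helm/kind-action@v1.14.0
        with:
          node_image: kindest/node:${{ matrix.k8s }}
      # MongoDB Operator install and rootPassword masking steps omitted here.
      - run: ct install --config .github/ct.yaml --all
```

With this shape, adding v1.35.0 would mean appending one entry to the k8s list.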

What ct install covers

On every PR, each matrix variant's kind cluster gets a fresh chart install using the minimal-resource overlay at charts/graylog/ci/ci-values.yaml. After install, ct runs helm test automatically, exercising all 5 existing test pods at charts/graylog/templates/tests/:

  • test-credentials-secret
  • test-mongodb-connectivity
  • test-datanode-registration
  • test-graylog-api-health
  • test-graylog-cluster-status

These test pods were previously exercised only manually per docs/TESTING.md Phase 3 — they are now enforced on every PR against three Kubernetes versions in parallel.

ct.yaml validation policy

.github/ct.yaml explicitly enables ct's validation knobs (rather than relying on default values) so the chart's CI policy is self-documenting:

  • check-version-increment: true — enforce Chart.yaml version bumps on chart changes (gated by ct's --all flag in our workflow, so currently informational; takes effect if --all is dropped)
  • validate-chart-schema: true (Chart.yaml schema validation)
  • validate-yaml: true — yamllint over Chart.yaml, values.yaml, and ci/*-values.yaml
  • upgrade: false — in-place upgrade testing is disabled. ct's --upgrade doesn't exercise helm upgrade against an existing deployed release the way users would expect; it installs the previous revision into an ephemeral namespace, upgrades to the current revision within that same namespace, then tears it down. Given that and the roughly 2x CI runtime cost, it is not worth enabling right now. The setting is kept visible so it is easy to re-enable later if ct semantics change or maintainers want to opt in.
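
For orientation, a config along these lines would express the policy above. This is a sketch, not the file from this PR: the four validation keys and their values are taken from the list, while the chart-dirs entry, the validate-maintainers value, and the timeout duration are assumptions (the description only says those keys are set).

```yaml
# .github/ct.yaml (sketch; lines marked "assumed" are not from the PR description)
chart-dirs:
  - charts                        # assumed from the charts/graylog path
target-branch: main
validate-maintainers: false       # assumed value
check-version-increment: true     # informational while the workflow passes --all
validate-chart-schema: true
validate-yaml: true
upgrade: false                    # see rationale above
helm-extra-args: --timeout 600s   # assumed duration
```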

Ephemeral rootPassword handling

The chart generates a random rootPassword when none is provided in values. To avoid leaking that random value into the public CI log (via NOTES.txt rendering), the workflow:

  1. Generates a per-run random password via openssl rand -hex 16.
  2. Registers it via ::add-mask:: BEFORE ct install runs, so all subsequent log output masks the value.
  3. Passes it to ct install --helm-extra-set-args "--set graylog.config.rootPassword=<value>".

Result: NOTES.txt's EXTERNAL ACCESS → password: <value> renders as *** in logs, and the "ADDITIONAL NOTES → randomly generated password" warning block is suppressed (because rootPassword is explicitly set via --set, the chart's {{- else if empty }} branch doesn't render).
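
The three steps could look roughly like the following workflow fragment. It is a sketch under assumptions: the step names, the ROOT_PASSWORD variable, and the hand-off through GITHUB_ENV are illustrative choices, not necessarily what the workflow in this PR does.

```yaml
# Fragment of a job's steps list (illustrative).
- name: Generate and mask ephemeral rootPassword
  run: |
    # 1. Per-run random value (32 hex characters).
    ROOT_PASSWORD="$(openssl rand -hex 16)"
    # 2. Register the mask before anything can print the value.
    echo "::add-mask::${ROOT_PASSWORD}"
    # Assumed hand-off: expose the value to later steps in this job.
    echo "ROOT_PASSWORD=${ROOT_PASSWORD}" >> "$GITHUB_ENV"

- name: Run ct install
  run: |
    # 3. Pass the value explicitly so the chart does not generate its own.
    ct install --config .github/ct.yaml --all \
      --helm-extra-set-args "--set graylog.config.rootPassword=${ROOT_PASSWORD}"
```

The ordering is the important part: the mask only applies to output emitted after the ::add-mask:: command is processed.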

New files / modifications

  • .github/ct.yaml (new) — chart-testing config (chart-dirs, target-branch, validate-maintainers, explicit validation defaults, upgrade: false with documented rationale, helm timeout).
  • charts/graylog/ci/ci-values.yaml (new) — minimal-resource overlay for the helm-ct-install job. Scales the chart down to fit a default GitHub-hosted runner (~7 GB RAM, 4 vCPU): single replica of each tier, reduced JVM heaps, MongoDB v8.0.23.
  • .github/workflows/lint-and-test.yaml — replaced. Two-job workflow with kind cluster setup, MongoDB Operator install, ephemeral rootPassword generation + masking, ct lint + ct install steps, and Kubernetes version matrix.

MongoDB version override (workaround for #89)

The ci-values overlay pins mongodb.version: "8.0.23". The chart's default is "7.0.25", which is the exact version affected by issue #89 ("MongoDB Version 7.0 silently fails"). Pinning to 8.0.23 in CI ensures the install completes cleanly; the chart's default value is unchanged and remains in scope for the #89 fix.
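
In values form, the pin amounts to the excerpt below. Only the mongodb.version key and its value come from this description; the rest of the overlay (replica counts, JVM heaps) is omitted rather than guessed.

```yaml
# charts/graylog/ci/ci-values.yaml (excerpt; other overlay keys omitted)
mongodb:
  version: "8.0.23"   # chart default "7.0.25" is the version affected by #89
```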

Action version pinning (Node 24 readiness)

  • actions/checkout@v6 (was v4)
  • azure/setup-helm@v5 (was v4; deprecated token: input dropped)
  • helm/chart-testing-action@v2.8.0 (defaults to ct 3.14.0)
  • helm/kind-action@v1.14.0

All on Node 24-supporting versions, ahead of the June 2026 Node 20 runtime sunset.

yamllint violations fixed

  • charts/graylog/Chart.yaml — added missing trailing newline (rule: new-line-at-end-of-file).
  • charts/graylog/values.yaml — RBAC role-rules block (lines 344-351): removed inner spaces in brackets ([ "" ] → [""]) per rule brackets, added missing trailing newline.
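
To illustrate the brackets fix, here is a hypothetical RBAC rule in the corrected style. The key names and values are assumed for the example and are not copied from values.yaml.

```yaml
# Hypothetical rule; not the actual content of values.yaml lines 344-351.
rules:
  # Before (flagged by the yamllint "brackets" rule): apiGroups: [ "" ]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
```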

Linked issues

None directly. This converts docs/TESTING.md Phases 1-3 (Static Validation, Installation, Automated Test Suite) from documented-manual to enforced-by-CI, and adds Kubernetes version-matrix testing for the chart's stated supported range.

PR Checklist

  • Tests added/updated
  • Documentation updated
  • This PR includes a new feature
  • This PR includes a bugfix
  • This PR includes a refactor

Testing Checklist

Static Validation

  • Linter check passes: helm lint ./charts/graylog
  • Helm renders local template successfully: helm template graylog ./charts/graylog --validate

Installation

  • Fresh installation completes successfully: helm install graylog ./charts/graylog
  • All pods reach Running state
  • Helm tests pass: helm test graylog

Functional (if applicable)

  • Web UI accessible and login works
  • DataNodes visible in System > Cluster Configuration
  • Inputs can be created and receive data

Upgrade (if applicable)

  • Upgrade from previous release succeeds
  • Scaling up/down works correctly
  • Configuration changes apply correctly

Specific to this PR

  • ct lint --config .github/ct.yaml --all passes locally against this branch.
  • ct install --config .github/ct.yaml --all validated end-to-end via the PR-attached CI run, covering the chart install + all 5 helm test pods against Kubernetes v1.32.11, v1.33.7, and v1.34.3 in parallel.
  • Password masking verified: NOTES.txt's EXTERNAL ACCESS section renders password: *** in workflow logs, and the "ADDITIONAL NOTES → randomly generated password" warning block is correctly suppressed.
  • Final CI validation run on PR #95: run 26732173551, conclusion success. All four jobs green: helm-ct-lint plus the helm-ct-install matrix variants for K8s v1.32.11, v1.33.7, and v1.34.3.

Notes for reviewers

  • Verify all applicable tests above pass
  • Validate that the linked issues are no longer reproducible, if applicable
  • Sync up with the author before merging
  • The commit history should be preserved — recommend rebase-merge to keep the 13-commit split (3 yamllint fixes + 10 CI infrastructure commits).
  • Test pods at charts/graylog/templates/tests/ remain in place; they're now exercised by ct install on every PR, against three Kubernetes versions. No changes to those files in this PR.
  • MongoDB version pin in ci-values.yaml (mongodb.version: "8.0.23") is a workaround for #89 ("MongoDB Version 7.0 silently fails"); it can be removed once #89 is resolved and the chart's default (7.0.25) is updated. The chart's actual default value is unchanged.
  • upgrade: true was considered and explicitly disabled. ct's --upgrade doesn't exercise helm upgrade against an existing deployed release in the way users/maintainers would expect: it installs the previous revision into an ephemeral namespace, upgrades to the current revision within that same namespace, and tears it down. Given that and the roughly 2x CI runtime cost, it is not worth enabling now. The configuration line is kept visible in .github/ct.yaml so re-enabling it is a one-line change (false to true) if maintainers want it later.
  • Functional checklist left unchecked: ct install covers helm install + helm test, but not browser-driven UI verification. UI functional testing remains manual per docs/TESTING.md Phase 4 and could be addressed in a follow-up PR.
  • v1.35.0 is not in the K8s matrix. We chose three versions (1.32 minimum, 1.33 mid, 1.34 recent) to keep the matrix at three parallel runners. It is easy to extend with v1.35.0 (and beyond) as new kindest/node images are released.
  • Known cosmetic annotation in CI runs: astral-sh/setup-uv@v7.3.0 (invoked internally by helm/chart-testing-action@v2.8.0) emits "No file matched to ... The cache will never get invalidated" because this repo has no Python dependency files for the action's default cache-dependency-glob to hash. Informational only, doesn't affect functionality. Not addressable from our workflow without forking chart-testing-action or replacing it with manual setup. Could file an upstream issue against helm/chart-testing-action to expose enable-cache (or similar) as a forwarded input.
