diff --git a/chartvalidator/README.md b/chartvalidator/README.md
index 511076e..08c054e 100644
--- a/chartvalidator/README.md
+++ b/chartvalidator/README.md
@@ -1,3 +1,161 @@
-## Chart Validator
+# Chart Validator
 
-A image for assisting in validation of Kubernetes charts. Has tools for rendering charts and validating them using KubeConform
\ No newline at end of file
+A CI tool that validates Kubernetes Helm charts end-to-end: it renders each chart with its real values, validates the resulting manifests against Kubernetes API schemas, and confirms that every referenced Docker image actually exists in its registry.
+
+It is designed to run in CI against [ArgoCD ApplicationSet](https://argo-cd.readthedocs.io/en/stable/user-guide/application-set/) files, so problems are caught before a deployment is attempted.
+
+---
+
+## How it works
+
+Chart Validator runs a five-stage pipeline. Each stage is backed by a pool of concurrent workers (10 by default) and passes results to the next stage via Go channels.
+
+```
+ApplicationSet YAML files
+        │
+        ▼
+  1. Parse ApplicationSets   — find every chart referenced in each environment
+        │
+        ▼
+  2. Render charts           — helm template with the chart's real values files
+        │
+        ▼
+  3. Validate manifests      — kubeconform against Kubernetes API schemas
+        │
+        ▼
+  4. Extract images          — parse container image references from the manifests
+        │
+        ▼
+  5. Validate images         — docker manifest inspect for every unique image
+```
+
+A failure at any stage is reported immediately; the tool exits non-zero if any check fails.
+
+### Stage 1 — Parse ApplicationSets
+
+Scans `{envdir}/{env}/appsets/*.appset.yaml` for entries of the form:
+
+```yaml
+spec:
+  generators:
+    - list:
+        elements:
+          - chartName: my-chart
+            repoURL: https://charts.example.com
+            chartVersion: "1.2.3"
+            baseValuesFile: env/base/values.yaml
+            valuesOverride: env/production/values.yaml
+```
+
+Each element becomes one unit of work flowing through the rest of the pipeline. Paths in `baseValuesFile` and `valuesOverride` are relative to the parent directory of `envdir` (i.e. prefixed with `../`).
+
+### Stage 2 — Render charts
+
+Runs `helm template` for each chart, combining the base values file and the override values file. Repository URLs that look like OCI registries but lack a scheme (e.g. `europe-west4-docker.pkg.dev/my-project/charts`) are automatically prefixed with `oci://` because ArgoCD accepts scheme-less URLs but the Helm CLI requires the explicit prefix.
+
+Flags passed to Helm:
+- `-f baseValuesFile -f valuesOverride` — layered values
+- `--version chartVersion`
+- `--include-crds`
+- `--kube-version` — the target Kubernetes version (defaults to `1.33.0`)
+- `--api-versions` — any additional API groups passed via `-api-versions`
+
+Rendered manifests are written to the output directory and passed to the next stage by file path.
+
+### Stage 3 — Validate manifests
+
+Runs `kubeconform` in strict mode against the rendered YAML. CRD schemas are resolved from three sources in order:
+
+1. **Local schemas** — JSON schema files in the `schemas/` directory bundled with the image (covers Traefik, 1Password Connect, Prometheus Operator).
+2. **CRDs catalog** — remote fallback at `https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/…` for any CRD not in the local set.
+3. **Built-in schemas** — kubeconform's own upstream Kubernetes schemas.
+
+`CustomResourceDefinition` resources themselves are skipped during validation (kubeconform cannot validate a CRD against itself).
+
+### Stage 4 — Extract images
+
+Parses the rendered YAML and collects the `image` field from every `containers` and `initContainers` entry in Pods, Deployments, DaemonSets, and StatefulSets. Duplicate image references are deduplicated before being sent to the next stage.
+
+### Stage 5 — Validate images
+
+Runs `docker manifest inspect {image}` for each unique image. Results are cached so the same image is only checked once even if it appears across multiple charts.
+
+If a manifest inspect fails, the check is retried up to **3 times** with a random backoff of **1–15 seconds** between attempts. Failures that are clearly permanent — image not found (`manifest unknown`), auth errors (`unauthorized`, `denied`), or a bad image reference — are not retried.
+
+---
+
+## Usage
+
+```
+chart-checker [flags]
+
+Commands:
+  run-checks     Render charts and run all validation checks.
+  render-only    Render charts only; skip all validation.
+  help           Show this help.
+```
+
+### Flags
+
+All flags apply to both commands.
+
+| Flag | Default | Description |
+|---|---|---|
+| `-env` | _(all environments)_ | Process only this environment (the folder name under `-envdir`). |
+| `-envdir` | `../env` | Directory that contains per-environment subdirectories. |
+| `-output` | `manifests` | Directory where rendered manifests are written. Cleared on each run. |
+| `-api-versions` | _(none)_ | Comma-separated additional Kubernetes API versions passed to `helm template`. |
+| `-k8s-version` | `1.33.0` | Kubernetes version used for Helm rendering and kubeconform validation. |
+| `-v` | `false` | Enable verbose (debug-level) logging. |
+
+### Examples
+
+```bash
+# Validate all environments
+chart-checker run-checks
+
+# Validate a single environment
+chart-checker run-checks -env sandbox
+
+# Validate with a specific Kubernetes version
+chart-checker run-checks -env production -k8s-version 1.30.0
+
+# Render charts only (useful for debugging values or template issues)
+chart-checker render-only -env staging -v
+```
+
+---
+
+## Registry authentication
+
+Image validation uses the Docker CLI's credential store, so any registry reachable by `docker pull` on the host is also reachable by the validator.
+
+The container image ships with `docker-credential-gcr` pre-configured for Google Artifact Registry endpoints (`gcr.io`, `*.pkg.dev`). For other registries, mount or inject a pre-authenticated `~/.docker/config.json`, or configure the appropriate credential helper in the Docker config before invoking the tool.
+
+---
+
+## Updating CRD schemas
+
+Local CRD schemas must be regenerated whenever a new CRD-bearing chart version is adopted. The schemas live in `schemas/` and are committed to the repository.
+
+From `tmp/ci/`:
+
+```bash
+make update-schemas
+```
+
+This renders the CRD charts (Traefik, 1Password Connect, Prometheus Operator), converts their OpenAPI specs to JSON Schema format, and writes the output to `schemas/`. To target a specific chart version:
+
+```bash
+make update-schemas TRAEFIK_VERSION=35.2.0 PROMETHEUS_OPERATOR_VERSION=76.3.0
+```
+
+---
+
+## Building the image
+
+```bash
+docker build -t chartvalidator .
+```
+
+Unit tests run as part of the build. The resulting image contains `chart-checker`, `helm`, `kubeconform`, `kustomize`, and the `docker` CLI.
diff --git a/chartvalidator/checker/engine_app_checker.go b/chartvalidator/checker/engine_app_checker.go
index adf46ad..104b184 100644
--- a/chartvalidator/checker/engine_app_checker.go
+++ b/chartvalidator/checker/engine_app_checker.go
@@ -5,6 +5,7 @@ import (
 	"fmt"
 	"path/filepath"
 	"sync"
+	"time"
 )
 
 type AppCheckInstruction struct {
@@ -86,6 +87,7 @@ func NewAppCheckerEngine(context context.Context, outputDir string, apiVersions
 			cache:           map[string]DockerImageValidationResult{},
 			pending:         map[string]*sync.WaitGroup{},
 			cacheLock:       sync.RWMutex{},
+			retrySleepFn:    time.Sleep,
 			workerWaitGroup: sync.WaitGroup{},
 		}
 
diff --git a/chartvalidator/checker/engine_docker_validation.go b/chartvalidator/checker/engine_docker_validation.go
index 47cc05c..0782801 100644
--- a/chartvalidator/checker/engine_docker_validation.go
+++ b/chartvalidator/checker/engine_docker_validation.go
@@ -5,6 +5,7 @@ import (
 	"encoding/json"
 	"fmt"
 	"io/fs"
+	"math/rand"
 	"os"
 	"os/exec"
 	"path/filepath"
@@ -25,12 +26,13 @@ type DockerImageValidationEngine struct {
 	executor CommandExecutor
 	context  context.Context
 
-	cache           map[string]DockerImageValidationResult
+	cache     map[string]DockerImageValidationResult
 	pending   map[string]*sync.WaitGroup
 	cacheLock sync.RWMutex
 
 	name string
 
+	retrySleepFn    func(time.Duration)
 	workerWaitGroup sync.WaitGroup
 }
 
@@ -127,32 +129,69 @@ func (engine *DockerImageValidationEngine) waitForPending(chart ChartRenderParam
 }
 
 func (engine *DockerImageValidationEngine) validateSingleDockerImage(chart ChartRenderParams, image string, workerId int) DockerImageValidationResult {
-	ctx, cancel := context.WithTimeout(engine.context, 2*time.Minute)
-	defer cancel()
+	const maxRetries = 3
 
 	args := []string{"manifest", "inspect", image}
-	cmd := engine.executor.CommandContext(ctx, "docker", args...)
+	cmdStr := fmt.Sprintf("docker %s", strings.Join(args, " "))
+
+	var err error
+	var output []byte
+	for attempt := 0; attempt <= maxRetries; attempt++ {
+		if attempt > 0 {
+			sleepSecs := 1 + rand.Intn(15)
+			logEngineWarning(engine.name, workerId, fmt.Sprintf("retrying %s (attempt %d/%d) after %ds", cmdStr, attempt+1, maxRetries+1, sleepSecs))
+			engine.retrySleepFn(time.Duration(sleepSecs) * time.Second)
+			select {
+			case <-engine.context.Done():
+				return DockerImageValidationResult{Image: image, Exists: false, Error: engine.context.Err(), Chart: chart}
+			default:
+			}
+		}
 
-	// Print the command being executed using interface methods
-	cmdStr := fmt.Sprintf("%s %s", filepath.Base(cmd.GetPath()), strings.Join(cmd.GetArgs()[1:], " "))
-	logEngineDebug(engine.name, workerId, fmt.Sprintf("executing: %s", cmdStr))
+		ctx, cancel := context.WithTimeout(engine.context, 2*time.Minute)
+		cmd := engine.executor.CommandContext(ctx, "docker", args...)
+		logEngineDebug(engine.name, workerId, fmt.Sprintf("executing: %s", cmdStr))
+		output, err = cmd.CombinedOutput()
+		cancel()
 
-	err := cmd.Run()
+		if err == nil {
+			logEngineDebug(engine.name, workerId, fmt.Sprintf("completed: %s", cmdStr))
+			break
+		}
 
-	exists := err == nil
-	if err != nil {
-		logEngineWarning(engine.name, workerId, fmt.Sprintf("failed: %s", cmdStr))
-	} else {
-		logEngineDebug(engine.name, workerId, fmt.Sprintf("completed: %s", cmdStr))
+		logEngineWarning(engine.name, workerId, fmt.Sprintf("failed: %s: %s", cmdStr, strings.TrimSpace(string(output))))
+
+		if isPermanentDockerError(output) {
+			logEngineWarning(engine.name, workerId, fmt.Sprintf("not retrying %s: permanent failure detected", cmdStr))
+			break
+		}
 	}
 
 	return DockerImageValidationResult{
 		Image:  image,
-		Exists: exists,
+		Exists: err == nil,
 		Error:  err,
-		Chart: chart,
+		Chart:  chart,
 	}
+}
 
+// isPermanentDockerError returns true if the command output indicates a failure
+// that will not resolve on retry, such as an image not existing or an auth error.
+func isPermanentDockerError(output []byte) bool {
+	s := strings.ToLower(string(output))
+	for _, pattern := range []string{
+		"manifest unknown",
+		"no such manifest",
+		"unauthorized",
+		"denied",
+		"invalid reference format",
+		"name unknown",
+	} {
+		if strings.Contains(s, pattern) {
+			return true
+		}
+	}
+	return false
 }
 
 // findJSONFiles recursively finds all JSON files in the given directory
diff --git a/chartvalidator/checker/engine_docker_validation_test.go b/chartvalidator/checker/engine_docker_validation_test.go
index d64db16..c9cb99a 100644
--- a/chartvalidator/checker/engine_docker_validation_test.go
+++ b/chartvalidator/checker/engine_docker_validation_test.go
@@ -15,13 +15,14 @@ import (
 // Helper function to create a Docker validation engine
 func createDockerValidationEngine(mockExecutor *MockCommandExecutor) *DockerImageValidationEngine {
 	return &DockerImageValidationEngine{
-		inputChan:  make(chan ImageExtractionResult),
-		outputChan: make(chan DockerImageValidationResult),
-		executor:   mockExecutor,
-		context:    createTestContext(),
-		cache:      make(map[string]DockerImageValidationResult),
-		pending:    make(map[string]*sync.WaitGroup),
-		name:       "DockerImageValidationEngine",
+		inputChan:    make(chan ImageExtractionResult),
+		outputChan:   make(chan DockerImageValidationResult),
+		executor:     mockExecutor,
+		context:      createTestContext(),
+		cache:        make(map[string]DockerImageValidationResult),
+		pending:      make(map[string]*sync.WaitGroup),
+		name:         "DockerImageValidationEngine",
+		retrySleepFn: func(time.Duration) {},
 	}
 }
 
@@ -65,6 +66,7 @@ func sendImagesToEngine(engine *DockerImageValidationEngine, images []string) {
 				Image: img,
 			}
 		}
+		close(engine.inputChan)
 	}()
 }
 
@@ -112,9 +114,8 @@ func TestDockerImageValidationEngine(t *testing.T) {
 	img := "nginx:1.20"
 
 	go func(s string) {
-		engine.inputChan <- ImageExtractionResult{
-			Image: s,
-		}
+		engine.inputChan <- ImageExtractionResult{Image: s}
+		close(engine.inputChan)
 	}(img)
 
 	result := <-engine.outputChan
@@ -126,7 +127,6 @@ func TestDockerImageValidationEngine(t *testing.T) {
 	}
 
 	assertCommandExecution(t, mockExecutor, "docker manifest inspect nginx:1.20")
-	engine.context.Done()
 }
 
 func TestDockerImageValidationCache(t *testing.T) {
@@ -419,24 +419,100 @@ func TestValidateSingleDockerImage(t *testing.T) {
 	}
 }
 
-func TestDockerValidationError(t *testing.T) {
+func TestDockerValidationRetry(t *testing.T) {
+	callCount := 0
+	mockExecutor := createMockExecutorWithBehavior(func() error {
+		callCount++
+		if callCount < 3 {
+			return fmt.Errorf("transient docker error")
+		}
+		return nil
+	})
+
+	engine := createDockerValidationEngine(mockExecutor)
+	engine.Start(1)
+
+	img := "nginx:1.20"
+	go func() {
+		engine.inputChan <- ImageExtractionResult{Image: img}
+		close(engine.inputChan)
+	}()
+
+	result := <-engine.outputChan
+	assert.Equal(t, img, result.Image)
+	assert.Nil(t, result.Error, "expected success after retries")
+	assert.True(t, result.Exists)
+	assert.Equal(t, 3, callCount, "expected 3 attempts (2 failures then 1 success)")
+}
+
+func TestDockerValidationRetriesExhausted(t *testing.T) {
+	callCount := 0
 	mockExecutor := createMockExecutorWithBehavior(func() error {
-		return fmt.Errorf("mocked docker error")
+		callCount++
+		return fmt.Errorf("transient docker error")
 	})
 
 	engine := createDockerValidationEngine(mockExecutor)
 	engine.Start(1)
 
+	img := "nginx:1.20"
+	go func() {
+		engine.inputChan <- ImageExtractionResult{Image: img}
+		close(engine.inputChan)
+	}()
+
+	result := <-engine.outputChan
+	assert.Equal(t, img, result.Image)
+	assert.NotNil(t, result.Error)
+	assert.False(t, result.Exists)
+	assert.Equal(t, 4, callCount, "expected 4 attempts (1 initial + 3 retries)")
+}
+
+func TestDockerValidationPermanentError(t *testing.T) {
+	callCount := 0
+	mockExecutor := &MockCommandExecutor{
+		Output: []byte("unauthorized: authentication required"),
+		BehaviorOnRun: func() error {
+			callCount++
+			return fmt.Errorf("exit status 1")
+		},
+	}
+
+	engine := createDockerValidationEngine(mockExecutor)
+	engine.Start(1)
+
+	img := "private.registry.io/app:v1.0"
+	go func() {
+		engine.inputChan <- ImageExtractionResult{Image: img}
+		close(engine.inputChan)
+	}()
+
+	result := <-engine.outputChan
+	assert.Equal(t, img, result.Image)
+	assert.NotNil(t, result.Error)
+	assert.False(t, result.Exists)
+	assert.Equal(t, 1, callCount, "expected exactly 1 attempt: permanent errors should not be retried")
+}
+
+func TestDockerValidationError(t *testing.T) {
+	mockExecutor := &MockCommandExecutor{
+		Output: []byte("manifest unknown: manifest unknown"),
+		BehaviorOnRun: func() error {
+			return fmt.Errorf("exit status 1")
+		},
+	}
+
+	engine := createDockerValidationEngine(mockExecutor)
+	engine.Start(1)
+
 	img := "nonexistent:image"
 	go func(s string) {
-		engine.inputChan <- ImageExtractionResult{
-			Image: s,
-		}
+		engine.inputChan <- ImageExtractionResult{Image: s}
+		close(engine.inputChan)
 	}(img)
 
 	result := <-engine.outputChan
 	assert.Equal(t, result.Image, img)
 	assert.NotNil(t, result.Error)
 	assertCommandExecution(t, mockExecutor, "docker manifest inspect nonexistent:image")
-	engine.context.Done()
 }
\ No newline at end of file
diff --git a/chartvalidator/checker/exec_mock.go b/chartvalidator/checker/exec_mock.go
index e1bcc97..e5cbd95 100644
--- a/chartvalidator/checker/exec_mock.go
+++ b/chartvalidator/checker/exec_mock.go
@@ -45,6 +45,10 @@ func (m *MockCommand) SetDir(dir string) {
 }
 
 func (m *MockCommand) CombinedOutput() ([]byte, error) {
+	if m.executor.BehaviorOnRun != nil {
+		err := m.executor.BehaviorOnRun()
+		return m.output, err
+	}
 	return m.output, m.err
 }