Add database upgrade and rollback compatibility tests#2084
Conversation
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
There was a problem hiding this comment.
Pull request overview
Adds end-to-end upgrade/rollback coverage for the database migration system by exercising real Helm upgrades from prior releases (including a rolling-upgrade compatibility window) and validating schema/data invariants across forward and reverse migrations.
Changes:
- Add Go e2e tests to validate upgrade round-trip (seed → upgrade → schema equivalence → rollback) and rolling upgrade compatibility (old pods against new schema).
- Add scripts/Makefile targets and CI jobs to resolve upgrade-from versions and run the new tests in a version matrix.
- Update database-migrations reference docs to reflect that the round-trip upgrade testing is now enforced.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| scripts/upgrade-from-version.sh | Resolves the adjacent prior release tag reachable from HEAD for upgrade testing. |
| scripts/prev-stable-version.sh | Resolves the latest patch tag from the previous stable release line for rollback-floor coverage. |
| Makefile | Adds targets to install the prior release and run upgrade/rolling-upgrade e2e suites. |
| go/core/test/e2e/upgrade/upgrade_test.go | Implements the upgrade round-trip test (seed, upgrade, schema equivalence, rollback assertions). |
| go/core/test/e2e/upgrade/roundtrip_test.go | Adds shared helpers for schema dumping, clean-install schema construction, port-forward, and migration driving. |
| go/core/test/e2e/upgrade/rolling_upgrade_test.go | Implements rolling-upgrade compatibility checks (old-code/new-schema window). |
| .github/workflows/ci.yaml | Adds CI jobs to run upgrade tests across adjacent + previous-stable version targets. |
| .github/actions/upgrade-test-setup/action.yaml | Composite action to resolve target versions, dedupe legs, and set up the toolchain/cluster. |
| .claude/skills/kagent-dev/references/database-migrations.md | Updates migration policy docs to reflect enforced upgrade/rollback testing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
EItanya
left a comment
There was a problem hiding this comment.
My main comment about these tests is that they sidestep the actual kagent DB usage. They interact directly with the DB rather than testing how the system behaves. For example they pre-seed the DB with data by directly inserting it rather than performing kagent operations. How much more work would it be to access the data that way?
| buildx-builder-name: | ||
| description: Name for the docker/setup-buildx-action builder. | ||
| required: true | ||
| buildx-version: | ||
| description: Buildx version for docker/setup-buildx-action. | ||
| required: true |
There was a problem hiding this comment.
Why are these required inputs? Can't they be static?
| $(MAKE) build | ||
| $(MAKE) install-previous-release |
There was a problem hiding this comment.
Why are these not dependency targets? calling make inside of make seems weird
| func TestRollingUpgradeCompatibility(t *testing.T) { | ||
| if os.Getenv("RUN_ROLLING_UPGRADE_TESTS") != "true" { | ||
| t.Skip("set RUN_ROLLING_UPGRADE_TESTS=true to run rolling upgrade tests") | ||
| } |
There was a problem hiding this comment.
This feels weird, can't we have this in a separate package or something?
| $(MAKE) build | ||
| $(MAKE) install-previous-release UPGRADE_PREV_EXTRA_ARGS="--set controller.replicas=2" |
There was a problem hiding this comment.
Same comment about calling make
|
|
||
| .PHONY: install-previous-release | ||
| install-previous-release: ## Install the previous released kagent + kagent-crds charts from the public OCI registry | ||
| test -n "$(UPGRADE_FROM_VERSION)" || { echo "UPGRADE_FROM_VERSION is empty; set it explicitly or ensure git tags are fetched." >&2; exit 1; } |
| UPGRADE_FROM_VERSION ?= $(shell ./scripts/upgrade-from-version.sh) | ||
|
|
||
| # The bundled-Postgres image is selected by the install target's --set flags, not | ||
| # by the chart defaults (the chart ships a non-vector image). So the previous | ||
| # install must use the exact pins the upgrade-from release shipped — otherwise the | ||
| # baseline DB would differ from how that release actually runs, and the upgrade | ||
| # would conflate a DB swap with the migration change under test. Read those flags | ||
| # straight from that release's own helm-install-provider target (via its tagged | ||
| # Makefile) instead of hardcoding values that drift as the bundled image changes. | ||
| # Assumes the flags are literal (no make/env variables); the guard in | ||
| # install-previous-release fails loudly if they can't be read. | ||
| PREV_DB_SET_FLAGS = $(shell git show v$(UPGRADE_FROM_VERSION):Makefile 2>/dev/null | \ |
There was a problem hiding this comment.
This means that these will be called every time make is used. Should we instead add this specifically to the targets that need it?
## Description Adds a static check that a migration on this branch doesn't break the queries a prior release still runs — the backward-compatibility property that makes a one-step rollback (and a rolling deploy) safe. It compiles a prior release's sqlc queries against the current schema (derived from the migration files) and fails if a migration dropped, renamed, or retyped a column or table an older query still references. Runs in normal CI on every PR — no database, no cluster. ## Targets Checks two prior versions (deduplicated when they coincide): - the latest release reachable from HEAD — the adjacent previous release, and - the previous stable line's latest patch (the `release/vX.Y.x` tip) — the rollback-window floor. ## Running ``` make -C go check-query-contraction ``` Override the targets with `TARGET_VERSIONS` (space-separated, no leading `v`). ## Notes - Verified it passes today and fails on a simulated column drop. - Catches column/table/type-shape contractions; it does not catch data rewrites (no DDL to inspect) or semantic changes a query still compiles against. - Complements the contraction-declaration check in #2067: that one requires destructive DDL to be acknowledged; this one verifies a prior release's queries actually survive the new schema. - Also documents the existing `sqlc generate` gate (schema derived from migrations) as the forward-direction equivalent for current-release queries. - This branch carries its own copies of the helper scripts, which are shared with #2084, so that will require a trivial merge conflict resolution. --------- Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Description
Closes: #1637
Adds automated coverage that a Helm upgrade from a prior release to the current build is safe: migrations apply cleanly, existing data survives, the resulting schema matches a fresh install, and the migrations reverse cleanly. Also covers the rolling-deploy window where prior-release pods serve against an already-migrated schema.
Tests
TestUpgrade, gated byRUN_UPGRADE_TESTS): installs the prior release, seeds representative data, upgrades to the current build, then asserts the controller rolls out without crashing, migrations reach the target version cleanly, the seeded data survives, the upgraded schema is identical to a clean install, and reversing the migrations restores the prior schema with data intact.TestRollingUpgradeCompatibility, gated byRUN_ROLLING_UPGRADE_TESTS): runs the upgrade with two controller replicas and verifies a prior-release pod can still read and write while the new schema is live. Skips when there is no migration delta to exercise.Targets
Both tests run against two prior versions (a CI matrix, deduplicated when they coincide):
release/vX.Y.xtip) — the supported rollback floor.Override locally with
UPGRADE_FROM_VERSION.Running locally
Requires a kind cluster, then:
or
A run leaves the database migrated, so recreate the cluster between runs.
Notes