Skip to content

Add database upgrade and rollback compatibility tests#2084

Open
iplay88keys wants to merge 9 commits into
mainfrom
iplay88keys/db-upgrade-compatibility-tests
Open

Add database upgrade and rollback compatibility tests#2084
iplay88keys wants to merge 9 commits into
mainfrom
iplay88keys/db-upgrade-compatibility-tests

Conversation

@iplay88keys

@iplay88keys iplay88keys commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Description

Closes: #1637

Adds automated coverage that a Helm upgrade from a prior release to the current build is safe: migrations apply cleanly, existing data survives, the resulting schema matches a fresh install, and the migrations reverse cleanly. Also covers the rolling-deploy window where prior-release pods serve against an already-migrated schema.

Tests

  • Upgrade round-trip (TestUpgrade, gated by RUN_UPGRADE_TESTS): installs the prior release, seeds representative data, upgrades to the current build, then asserts the controller rolls out without crashing, migrations reach the target version cleanly, the seeded data survives, the upgraded schema is identical to a clean install, and reversing the migrations restores the prior schema with data intact.
  • Rolling compatibility (TestRollingUpgradeCompatibility, gated by RUN_ROLLING_UPGRADE_TESTS): runs the upgrade with two controller replicas and verifies a prior-release pod can still read and write while the new schema is live. Skips when there is no migration delta to exercise.

Targets

Both tests run against two prior versions (a CI matrix, deduplicated when they coincide):

  • the latest release reachable from HEAD — the adjacent previous release, and
  • the previous stable line's latest patch (the release/vX.Y.x tip) — the supported rollback floor.

Override locally with UPGRADE_FROM_VERSION.

Running locally

Requires a kind cluster, then:

make create-kind-cluster
make run-upgrade-tests

or

make create-kind-cluster
make run-rolling-upgrade-tests

A run leaves the database migrated, so recreate the cluster between runs.

Notes

  • When the current and prior releases share the same migration set, the migration and reversal assertions are no-ops (the data-survival, schema-equivalence, and no-crash checks still run); the tests engage fully on the first new migration. The two targets also coincide today and diverge once a new minor releases or the stable line gets a backport patch.
  • Query-level backward compatibility (a prior release's queries against the new schema) is covered separately by a static check in Add query-contraction check against the previous release #2085.
  • This branch carries its own copies of the helper scripts, which are shared with Add query-contraction check against the previous release #2085, so that will require a trivial merge conflict resolution.

Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
@iplay88keys iplay88keys marked this pull request as ready for review June 25, 2026 20:28
Copilot AI review requested due to automatic review settings June 25, 2026 20:28

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds end-to-end upgrade/rollback coverage for the database migration system by exercising real Helm upgrades from prior releases (including a rolling-upgrade compatibility window) and validating schema/data invariants across forward and reverse migrations.

Changes:

  • Add Go e2e tests to validate upgrade round-trip (seed → upgrade → schema equivalence → rollback) and rolling upgrade compatibility (old pods against new schema).
  • Add scripts/Makefile targets and CI jobs to resolve upgrade-from versions and run the new tests in a version matrix.
  • Update database-migrations reference docs to reflect that the round-trip upgrade testing is now enforced.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
scripts/upgrade-from-version.sh Resolves the adjacent prior release tag reachable from HEAD for upgrade testing.
scripts/prev-stable-version.sh Resolves the latest patch tag from the previous stable release line for rollback-floor coverage.
Makefile Adds targets to install the prior release and run upgrade/rolling-upgrade e2e suites.
go/core/test/e2e/upgrade/upgrade_test.go Implements the upgrade round-trip test (seed, upgrade, schema equivalence, rollback assertions).
go/core/test/e2e/upgrade/roundtrip_test.go Adds shared helpers for schema dumping, clean-install schema construction, port-forward, and migration driving.
go/core/test/e2e/upgrade/rolling_upgrade_test.go Implements rolling-upgrade compatibility checks (old-code/new-schema window).
.github/workflows/ci.yaml Adds CI jobs to run upgrade tests across adjacent + previous-stable version targets.
.github/actions/upgrade-test-setup/action.yaml Composite action to resolve target versions, dedupe legs, and set up the toolchain/cluster.
.claude/skills/kagent-dev/references/database-migrations.md Updates migration policy docs to reflect enforced upgrade/rollback testing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread go/core/test/e2e/upgrade/roundtrip_test.go
Comment thread go/core/test/e2e/upgrade/rolling_upgrade_test.go Outdated
Comment thread go/core/test/e2e/upgrade/upgrade_test.go Outdated
Comment thread go/core/test/e2e/upgrade/upgrade_test.go Outdated
Comment thread Makefile
EItanya and others added 2 commits June 26, 2026 11:40

@EItanya EItanya left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main comment about these tests is that they sidestep the actual kagent DB usage. They interact directly with the DB rather than testing how the system behaves. For example they pre-seed the DB with data by directly inserting it rather than performing kagent operations. How much more work would it be to access the data that way?

Comment on lines +15 to +20
buildx-builder-name:
description: Name for the docker/setup-buildx-action builder.
required: true
buildx-version:
description: Buildx version for docker/setup-buildx-action.
required: true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these required inputs? Can't they be static?

Comment thread Makefile
Comment on lines +563 to +564
$(MAKE) build
$(MAKE) install-previous-release

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these not dependency targets? calling make inside of make seems weird

func TestRollingUpgradeCompatibility(t *testing.T) {
if os.Getenv("RUN_ROLLING_UPGRADE_TESTS") != "true" {
t.Skip("set RUN_ROLLING_UPGRADE_TESTS=true to run rolling upgrade tests")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels weird, can't we have this in a separate package or something?

Comment thread Makefile
Comment on lines +578 to +579
$(MAKE) build
$(MAKE) install-previous-release UPGRADE_PREV_EXTRA_ARGS="--set controller.replicas=2"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about calling make

Comment thread Makefile

.PHONY: install-previous-release
install-previous-release: ## Install the previous released kagent + kagent-crds charts from the public OCI registry
test -n "$(UPGRADE_FROM_VERSION)" || { echo "UPGRADE_FROM_VERSION is empty; set it explicitly or ensure git tags are fetched." >&2; exit 1; }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is test?

Comment thread Makefile
Comment on lines +514 to +525
UPGRADE_FROM_VERSION ?= $(shell ./scripts/upgrade-from-version.sh)

# The bundled-Postgres image is selected by the install target's --set flags, not
# by the chart defaults (the chart ships a non-vector image). So the previous
# install must use the exact pins the upgrade-from release shipped — otherwise the
# baseline DB would differ from how that release actually runs, and the upgrade
# would conflate a DB swap with the migration change under test. Read those flags
# straight from that release's own helm-install-provider target (via its tagged
# Makefile) instead of hardcoding values that drift as the bundled image changes.
# Assumes the flags are literal (no make/env variables); the guard in
# install-previous-release fails loudly if they can't be read.
PREV_DB_SET_FLAGS = $(shell git show v$(UPGRADE_FROM_VERSION):Makefile 2>/dev/null | \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that these will be called every time make is used. Should we instead add this specifically to the targets that need it?

EItanya pushed a commit that referenced this pull request Jun 26, 2026
## Description

Adds a static check that a migration on this branch doesn't break the
queries a prior release still runs — the backward-compatibility property
that makes a one-step rollback (and a rolling deploy) safe.

It compiles a prior release's sqlc queries against the current schema
(derived from the migration files) and fails if a migration dropped,
renamed, or retyped a column or table an older query still references.
Runs in normal CI on every PR — no database, no cluster.

## Targets

Checks two prior versions (deduplicated when they coincide):

- the latest release reachable from HEAD — the adjacent previous
release, and
- the previous stable line's latest patch (the `release/vX.Y.x` tip) —
the rollback-window floor.

## Running

```
make -C go check-query-contraction
```

Override the targets with `TARGET_VERSIONS` (space-separated, no leading
`v`).

## Notes

- Verified it passes today and fails on a simulated column drop.
- Catches column/table/type-shape contractions; it does not catch data
rewrites (no DDL to inspect) or semantic changes a query still compiles
against.
- Complements the contraction-declaration check in #2067: that one
requires destructive DDL to be acknowledged; this one verifies a prior
release's queries actually survive the new schema.
- Also documents the existing `sqlc generate` gate (schema derived from
migrations) as the forward-direction equivalent for current-release
queries.
- This branch carries its own copies of the helper scripts, which are
shared with #2084, so that will
require a trivial merge conflict resolution.

---------

Signed-off-by: Jeremy Alvis <jeremy.alvis@solo.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add upgrade compatibility tests for database migrations

4 participants