
feat: auto-backup and migrate DuckLake metadata on version upgrade #346

Open
fuziontech wants to merge 3 commits into main from jh/ducklake-auto-migration


@fuziontech

Summary

  • Before attaching DuckLake, checks the metadata store's spec version by connecting to it directly via pgx
  • If the version is older than expected (e.g. 0.3 when 0.4 is expected), dumps all ducklake_* tables to a SQL backup file, then attaches with AUTOMATIC_MIGRATION TRUE
  • Centralizes the ATTACH statement builder (was duplicated in server.go, checkpoint.go, querylog.go)
  • Backup failure blocks migration — no upgrade without a safety net

Context

DuckLake 0.4 (shipping with DuckDB 1.5.x) adds SET SORTED BY support, which we need for optimizing Parquet row-group pruning on large tables. The 0.3→0.4 migration is irreversible (it drops columns and restructures schema_versions), so we need a backup before upgrading.

Since pg_dump is not installed on duckgres instances, the backup is implemented entirely in Go: it connects to the metadata PostgreSQL database via pgx (already a dependency) and writes CREATE TABLE and INSERT statements.

Test plan

  • go build compiles cleanly
  • go test ./server/... passes
  • go vet clean
  • Test against a real DuckLake 0.3 instance to verify version detection and backup file contents
  • Test that AUTOMATIC_MIGRATION TRUE successfully upgrades to 0.4 (requires DuckDB 1.5.x driver upgrade in a follow-up)

🤖 Generated with Claude Code

fuziontech and others added 3 commits March 23, 2026 17:51
When DuckLake spec version is older than expected (e.g. 0.3 → 0.4),
automatically backup all ducklake_* metadata tables to a SQL file
before attaching with AUTOMATIC_MIGRATION TRUE. This ensures safe
rollback if migration fails.

- Add server/ducklake_migration.go with version detection, backup,
  and shared ATTACH statement builder
- Centralize ATTACH statement construction (was duplicated in 3 places)
- Backup is written to <dataDir>/ducklake-backup-<timestamp>-v<version>.sql
- Migration check runs once per process via sync.Once
- Backup failure blocks migration (fail-safe)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Quote all SQL identifiers with double quotes to handle reserved words
  (e.g. "key", "value" in ducklake_metadata table)
- Replace string comparison with numeric version parsing to handle
  versions like "0.10" correctly
- Move migration check before the DuckLake semaphore so backup doesn't
  block other connections for up to 10 minutes
- Add fsync before closing backup file for crash safety
- Fix double-close on backup file using closed flag
- Add comment explaining sync.Once is correct for multitenant mode
  (each worker process serves one tenant)
- Add comment on []byte assumption in formatSQLValue
- Log backup path at INFO before starting (not just after)
- Add unit tests for buildDuckLakeAttachStmt, formatSQLValue,
  quoteIdent, versionLessThan, and duckLakeMigrationNeeded

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
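The reason for numeric parsing: compared as strings, "0.10" sorts before "0.3". A sketch of a component-wise comparison follows; parseVersion and the (bool, error) return of versionLessThan are assumptions, since the PR's exact signature is not shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a dotted version such as "0.10" or "0.3.1" into
// integer components, returning an error for parts that are not integers.
func parseVersion(v string) ([]int, error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	out := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// versionLessThan reports whether a < b, comparing component by component
// and treating missing trailing components as zero ("0.4" equals "0.4.0").
func versionLessThan(a, b string) (bool, error) {
	pa, err := parseVersion(a)
	if err != nil {
		return false, err
	}
	pb, err := parseVersion(b)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(pa) || i < len(pb); i++ {
		var x, y int
		if i < len(pa) {
			x = pa[i]
		}
		if i < len(pb) {
			y = pb[i]
		}
		if x != y {
			return x < y, nil
		}
	}
	return false, nil
}

func main() {
	// Lexical comparison gets this wrong: '1' < '3' at the third byte.
	fmt.Println("string compare 0.10 < 0.3:", "0.10" < "0.3")
	lt, err := versionLessThan("0.10", "0.3")
	fmt.Println("numeric compare 0.10 < 0.3:", lt, err)
}
```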
Handle return values from rows.Close(), fmt.Fprintf(), fmt.Fprintln(),
and dataRows.Close() to satisfy the errcheck linter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
