feat(skills): add databricks-lakebase-migration skill #513

Open
dgokeeffe wants to merge 1 commit into databricks-solutions:main from dgokeeffe:feature/databricks-lakebase-migration

@dgokeeffe

Summary

Adds a new skill databricks-lakebase-migration that captures the
Provisioned → Autoscaling migration mechanics. Direct in-place migration
isn't supported as of 2026-05; the sanctioned path is pg_dump /
pg_restore, but the docs leave out several Lakebase-specific gotchas
that I hit while migrating a real workspace app this week. This skill
codifies them.

What's in the skill

  • End-to-end runbook for the snapshot migration path (the only
    one that works today): pg_dump -Fc -n <schema> --no-owner --no-acl
    → bootstrap destination → pg_restore --role=<sp-uuid> → re-point
    the app's database resource.
  • Five gotchas that are not in the public docs:
    1. Raw CREATE DATABASE skips installation of the databricks_auth and
      neon extensions; app SPs then fail OAuth with "password
      authentication failed". Recovery is CREATE EXTENSION IF NOT EXISTS ….
    2. App SPs need databricks_create_role(<sp-uuid>, 'SERVICE_PRINCIPAL');
      a vanilla CREATE ROLE produces a role that OAuth cannot resolve.
    3. pg_restore --role=<sp-uuid> is the trick that puts the SP in
      ownership of restored tables, so the app's startup migrations can
      later run DDL.
    4. databricks bundle deploy can't change database.instance_name
      on an existing app (update mask rejected); workaround is direct
      apps update --json with the full resources array.
    5. UC sync pipelines do not auto-follow a re-pointed app: synced
      tables remain a frozen snapshot until you re-wire the pipeline.
  • Common-issues table mapping every error message I saw during the
    migration to its root cause and fix.
  • Capacity mapping reference between Provisioned CU_1/2/4/8 and
    Autoscaling min/max CU ranges.
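
As a rough illustration of the dump and restore steps in the runbook above, here is a minimal dry-run sketch that only builds the command lines; it does not connect to anything. The flags `-Fc`, `-n`, `--no-owner`, `--no-acl`, and `--role` are the ones named in this PR. Everything else is an assumption for illustration: the function names, the `public` schema, the dump file name, the placeholder UUID, and passing the database with `-d` while relying on the usual `PGHOST`/`PGUSER`/`PGPASSWORD` environment variables for the connection. The sketch also repeats `--no-owner` and `--no-acl` on `pg_restore`, since the pg_dump documentation describes `--no-owner` as ignored for archive-format output; the skill itself may do this differently.

```python
import shlex


def build_dump_cmd(schema: str, source_db: str, dump_file: str) -> list[str]:
    """Build argv for a custom-format, single-schema dump (hypothetical helper)."""
    return [
        "pg_dump",
        "-Fc",          # custom archive format, the input pg_restore expects
        "-n", schema,   # dump only this schema
        "--no-owner",   # do not emit ownership commands
        "--no-acl",     # do not emit GRANT/REVOKE commands
        "-d", source_db,
        "-f", dump_file,
    ]


def build_restore_cmd(sp_uuid: str, dest_db: str, dump_file: str) -> list[str]:
    """Build argv for a restore that runs as the app service principal's role."""
    return [
        "pg_restore",
        f"--role={sp_uuid}",  # restored objects end up owned by this role
        "--no-owner",         # assumption: repeated here, see lead-in
        "--no-acl",
        "-d", dest_db,
        dump_file,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the real schema, database, and SP UUID.
    sp = "00000000-0000-0000-0000-000000000000"
    print(shlex.join(build_dump_cmd("public", "lakemeter_v2", "lakemeter_v2.dump")))
    print(shlex.join(build_restore_cmd(sp, "lakemeter_v2", "lakemeter_v2.dump")))
```

Printing the commands before running them makes it easy to review the role and target database, which matters because the restore decides table ownership.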

Why now

databricks-lakebase-autoscale/SKILL.md lists "Direct migration from
Lakebase Provisioned" as a current limitation but doesn't link to a
recipe. This PR fills that gap so the next field engineer doing this
doesn't burn an afternoon discovering the extension and role-registration
gotchas independently.
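
To make the extension and role-registration gotchas concrete, the following sketch generates the SQL one might run in the destination database before restoring. It rests on several assumptions: that the recovery statement is issued once per extension named in this PR (`databricks_auth` and `neon`; the description elides the exact statement), that `databricks_create_role` is invoked through a plain `SELECT`, and that the helper name `bootstrap_sql` is purely illustrative. Check the skill's runbook for the authoritative statements.

```python
import uuid


def bootstrap_sql(sp_uuid: str) -> list[str]:
    """Return bootstrap statements for a destination database (hypothetical helper)."""
    # Parse and normalise the UUID so it is safe to inline as a SQL literal;
    # uuid.UUID raises ValueError for anything that is not a UUID.
    sp = str(uuid.UUID(sp_uuid))
    return [
        # Assumption: one CREATE EXTENSION per extension named in the PR.
        "CREATE EXTENSION IF NOT EXISTS databricks_auth;",
        "CREATE EXTENSION IF NOT EXISTS neon;",
        # Assumption: the function is called via SELECT.
        f"SELECT databricks_create_role('{sp}', 'SERVICE_PRINCIPAL');",
    ]


if __name__ == "__main__":
    # Placeholder UUID; use the app service principal's real one.
    for stmt in bootstrap_sql("00000000-0000-0000-0000-000000000000"):
        print(stmt)
```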

Other changes

  • databricks-skills/README.md: adds the new skill, plus
    databricks-lakebase-autoscale (the existing skill was already in the
    repo but missing from the README skills list).
  • databricks-skills/install_skills.sh: registers the new skill in
    DATABRICKS_SKILLS and get_skill_description. No extra files
    beyond SKILL.md, so no get_skill_extra_files entry needed.

Testing

The runbook in this skill was validated end-to-end against the
lakemeter workspace app on AWS:

  • Source: lakemeter-db (Provisioned, CU_1) / lakemeter_v2 database, ~99 MB
  • Destination: lakemeter-v2-autoscale (Autoscaling, CU_1 → 4-8 CU range) / lakemeter_v2 database
  • Outcome: App auth works, all 26 tables restored with correct
    ownership, 111k VM-pricing rows verified, downtime ~10 min, total
    effort ~30 min once the gotchas were known.

None of the five gotchas could be predicted from the docs; each
surfaced as a specific failure during the run. The skill captures both
the symptom and the fix.
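
One way to check the "correct ownership" outcome after a restore is to list table owners from the standard `pg_tables` view and flag any table not owned by the app SP role. This is a sketch rather than the check used in the run above: the helper name, the `%s` placeholder style (as used by psycopg), and the sample rows are assumptions for illustration.

```python
# Query against the standard pg_tables catalog view; %s is a driver placeholder.
OWNERSHIP_QUERY = (
    "SELECT tablename, tableowner FROM pg_tables "
    "WHERE schemaname = %s ORDER BY tablename"
)


def misowned_tables(rows, expected_owner):
    """Return the names of tables whose owner differs from expected_owner.

    rows is an iterable of (tablename, tableowner) pairs, e.g. the result of
    executing OWNERSHIP_QUERY with the restored schema name.
    """
    return [name for name, owner in rows if owner != expected_owner]


if __name__ == "__main__":
    # Made-up rows for illustration only; not data from the migration.
    sp = "00000000-0000-0000-0000-000000000000"
    sample = [("table_a", sp), ("table_b", "postgres")]
    print(misowned_tables(sample, sp))
```

An empty result means every table in the schema is owned by the SP role, which is the precondition for the app's startup migrations to run DDL.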

Test plan

  • Runbook executed end-to-end against a real Provisioned + Autoscaling instance pair
  • Every gotcha in the skill maps to a real error message I observed in app logs / psql output
  • install_skills.sh shellcheck-clean (no new syntax)
  • databricks-skills/README.md skill list ordering preserved (alphabetical within section)

This pull request and its description were written by Isaac.

Captures the Provisioned → Autoscaling migration mechanics that aren't in
the public docs as of 2026-05. Covers:

- pg_dump/pg_restore workflow with the --role flag for SP ownership
- Why raw CREATE DATABASE breaks app SP OAuth (missing databricks_auth
  + neon extensions) and how to recover
- databricks_create_role(<sp-uuid>, 'SERVICE_PRINCIPAL') as the proper
  way to register an app SP for OAuth-token resolution
- The Apps API update-mask limitation when re-pointing database
  resources via bundle deploy, and the direct apps update --json
  workaround
- Synced-table snapshot semantics (UC sync pipelines do not auto-follow)
- Step-by-step runbook plus a common-issues table covering every
  failure mode hit during a real lakemeter migration
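
For the update-mask workaround, the JSON body passed to apps update --json has to carry the full resources array with only the instance name changed. The sketch below shows one way to derive that body from an app's existing resources. The `database.instance_name` field and the instance names come from this PR; the other field names in the sample (`name`, `database_name`, `sql_warehouse`) and the helper names are assumptions, so compare against the actual output of the Apps API before using it.

```python
import copy
import json


def repoint_database_resources(resources, new_instance):
    """Return a copy of resources with every database.instance_name replaced."""
    updated = copy.deepcopy(resources)  # leave the caller's data untouched
    for res in updated:
        db = res.get("database")
        if isinstance(db, dict) and "instance_name" in db:
            db["instance_name"] = new_instance
    return updated


def update_payload(resources, new_instance):
    """Serialise the full resources array as a JSON request body."""
    body = {"resources": repoint_database_resources(resources, new_instance)}
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Sample shape is an assumption; start from the app's real resources.
    current = [
        {"name": "database",
         "database": {"instance_name": "lakemeter-db",
                      "database_name": "lakemeter_v2"}},
        {"name": "warehouse", "sql_warehouse": {"id": "placeholder"}},
    ]
    print(update_payload(current, "lakemeter-v2-autoscale"))
```

Non-database resources pass through unchanged, which matters because the workaround replaces the whole array rather than patching one field.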

Also updates databricks-skills/README.md to list the existing
databricks-lakebase-autoscale skill (was missing) and the new migration
skill, and adds them to install_skills.sh.

Co-authored-by: Isaac