Fix registry backfill with per-provider versions and Docker extraction#65223

Open
kaxil wants to merge 2 commits into apache:main from astronomer:fix-registry-backfill-per-provider-docker


@kaxil kaxil commented Apr 14, 2026

The registry backfill workflow had two problems preventing it from backfilling intermediate provider versions:

  1. Flat version list applied to all providers: Input like providers="amazon google" versions="9.24.0 21.0.0" tried all versions against all providers, failing because amazon doesn't have 21.0.0 and google doesn't have 9.24.0.

  2. Missing system dependencies: Extraction ran uv run --with on a bare GitHub runner, which lacks C libraries (krb5-dev, libxml2-dev) needed by providers like amazon and google.

Changes

Workflow (registry-backfill.yml):

  • Replace separate providers + versions inputs with provider-versions accepting provider/version pairs (e.g. amazon/9.24.0 google/21.0.0 amazon/9.23.0)
  • jq-based matrix builder groups pairs by provider, so each job gets only its relevant versions
  • Add build-ci-image job and prepare_breeze_and_image step (same pattern as registry-build.yml)
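The grouping done by the matrix builder can be sketched in Python. This is only an illustration of the logic: the PR implements it with jq inside the workflow, and the function name `build_matrix` and the output keys `provider` and `versions` are assumptions made for this example, not names taken from the workflow.

```python
from collections import defaultdict


def build_matrix(provider_versions: str) -> list[dict[str, str]]:
    """Group whitespace-separated provider/version pairs into one entry per provider.

    Hypothetical helper: the output shape is assumed for illustration only.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for pair in provider_versions.split():
        provider, sep, version = pair.partition("/")
        if not sep or not provider or not version:
            raise ValueError(f"expected provider/version, got {pair!r}")
        # Skip duplicates so a repeated pair does not schedule the same version twice.
        if version not in grouped[provider]:
            grouped[provider].append(version)
    # One matrix entry per provider, carrying only that provider's versions.
    return [{"provider": p, "versions": " ".join(v)} for p, v in grouped.items()]


matrix = build_matrix("amazon/9.24.0 google/21.0.0 amazon/9.23.0")
```

With the input above, both amazon versions land in a single entry, so no job ever tries a version that belongs to a different provider.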

Breeze command (registry_commands.py):

  • Add _backfill_docker() that runs extraction inside the Breeze CI container via pip install + execute_command_in_shell
  • Refactor existing host-based extraction into _backfill_uv() as --no-docker fallback
  • Default is Docker; local dev can use --no-docker for faster iteration
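A minimal sketch of the Docker-by-default dispatch described above. Only the names `_backfill_docker`, `_backfill_uv`, and the `--no-docker` flag come from this PR; the function signatures, the `--provider` and `--version` options, and the use of `argparse` are assumptions made to keep the example self-contained, and the two backends are placeholders that only report which path was chosen.

```python
import argparse


def _backfill_docker(provider: str, versions: list[str]) -> str:
    # Placeholder: the real function runs extraction inside the Breeze CI container.
    return f"docker:{provider}:{','.join(versions)}"


def _backfill_uv(provider: str, versions: list[str]) -> str:
    # Placeholder: the real function runs extraction on the host via uv.
    return f"uv:{provider}:{','.join(versions)}"


def backfill(argv: list[str]) -> str:
    """Parse arguments and pick the extraction backend (Docker unless --no-docker)."""
    parser = argparse.ArgumentParser(prog="backfill")
    parser.add_argument("--provider", required=True)  # hypothetical option
    parser.add_argument("--version", action="append", required=True, dest="versions")  # hypothetical option
    parser.add_argument("--no-docker", action="store_true")
    args = parser.parse_args(argv)
    runner = _backfill_uv if args.no_docker else _backfill_docker
    return runner(args.provider, args.versions)
```

Keeping both backends behind one entry point means CI and local runs share the same argument handling and differ only in where extraction executes.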

Example usage

# Backfill multiple providers with different versions
gh workflow run "Registry Backfill" \
  -f destination=live \
  -f provider-versions="amazon/9.24.0 google/21.0.0 celery/3.17.2"

This creates parallel jobs: one for amazon (9.24.0), one for google (21.0.0), one for celery (3.17.2). Multiple versions per provider are grouped into a single job.


rebuild_or_pull_ci_image_if_needed(command_params=shell_params)

# Place isolated providers.json under dev/registry/ so it's visible inside the container

This should most likely be "files", not "dev".


@potiuk potiuk left a comment


My comments are not blocking, but I have a feeling things could be simplified quite a bit by leveraging uv.

Chain both extraction scripts in a single uv run invocation to avoid
creating two ephemeral environments per version.

Labels

area:dev-tools, area:registry, backport-to-v3-2-test

Projects

Status: In review

2 participants