feat: Git-backed model registry and automatic package installation #9

@profsergiocosta

Overview

Currently the model registry reads TOML files from a local directory (./configs/models/), which works for development but does not support the full researcher workflow: publishing a model as a Python package and registering it via a pull request to dissmodel-configs.

This issue tracks the two pieces of infrastructure needed to close that gap.


Context

The intended researcher workflow is:

1. Develop model locally (coastal-flood/)
2. Test via CLI:
       python -m coastal_flood.flood_executor run --input data/grid.zip
3. Publish package to PyPI or GitHub
4. Open PR to dissmodel-configs adding flood_vector.toml with package field
5. PR merged → server syncs configs → worker installs package → job runs

Steps 1–2 already work. Steps 3–5 require the two features below.


Feature 1 — Git-backed registry sync

Current state

api_registry.py reads TOMLs from a bind-mounted local directory:

CONFIGS_PATH = Path("/configs")   # bind mount: ./configs:/configs

The configs-sync service and sync_configs() function exist but are commented out in docker-compose.yml and api_registry.py to allow local directory testing.

What needs to be done

1a. Re-enable configs-sync service in docker-compose.yml

configs-sync:
  image: alpine/git
  container_name: dissmodel-configs-sync
  restart: unless-stopped
  volumes:
    - configs-data:/configs
  environment:
    - CONFIGS_REPO=${CONFIGS_REPO}
    - SYNC_INTERVAL=${CONFIGS_SYNC_INTERVAL:-900}
  entrypoint: |
    sh -c '
      if [ ! -d /configs/.git ]; then
        git clone $$CONFIGS_REPO /configs
      fi
      while true; do
        git -C /configs pull --ff-only
        sleep $$SYNC_INTERVAL
      done
    '
  networks:
    - dissmodel-net

1b. Switch api and worker from bind mount to named volume

# api and worker services
volumes:
  - configs-data:/configs:ro   # was: ./configs:/configs:ro

1c. Uncomment sync_configs() scheduler in api_registry.py

@asynccontextmanager
async def lifespan(app: FastAPI):
    _scheduler = start_sync_scheduler(
        interval_seconds=int(os.getenv("CONFIGS_SYNC_INTERVAL", 900))
    )
    yield
    _scheduler.shutdown()
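
The issue does not show start_sync_scheduler() itself. As a rough idea of the shape it might take, here is a thread-based stand-in that returns an object with the shutdown() method used above. Everything in it is hypothetical: the names SyncScheduler and start_periodic_sync, and the extra sync argument, which is injected here only so the sketch can run without git.

```python
import threading
from typing import Callable


class SyncScheduler:
    """Runs `sync` every `interval_seconds` on a daemon thread until shutdown()."""

    def __init__(self, sync: Callable[[], None], interval_seconds: float) -> None:
        self._sync = sync
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self) -> None:
        # Event.wait returns False on timeout (time to sync) and True once
        # shutdown() has set the event (time to exit).
        while not self._stop.wait(self._interval):
            try:
                self._sync()
            except Exception as exc:  # keep the loop alive if one sync fails
                print(f"config sync failed: {exc}")

    def start(self) -> "SyncScheduler":
        self._thread.start()
        return self

    def shutdown(self) -> None:
        self._stop.set()
        self._thread.join()


def start_periodic_sync(sync: Callable[[], None], interval_seconds: float) -> SyncScheduler:
    """Hypothetical stand-in for start_sync_scheduler()."""
    return SyncScheduler(sync, interval_seconds).start()
```

Waiting on an Event instead of calling time.sleep() lets shutdown() return promptly rather than after a full 900-second interval.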

1d. Add CONFIGS_REPO to .env.example

CONFIGS_REPO=https://github.com/lambdageo/dissmodel-configs
CONFIGS_SYNC_INTERVAL=900

1e. Create dissmodel-configs repository

Public repository at github.com/lambdageo/dissmodel-configs with structure:

dissmodel-configs/
  models/
    flood_vector.toml
  catalog.yaml
  README.md

Acceptance criteria

  • docker compose up clones dissmodel-configs automatically on first run
  • POST /admin/sync triggers an immediate git pull
  • Editing a TOML and pushing to dissmodel-configs is reflected in the API within one sync interval (15 minutes with the default CONFIGS_SYNC_INTERVAL=900)
  • GET /models lists models from the git-backed registry
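
For the immediate-pull criterion, the git step behind POST /admin/sync could mirror the clone-or-pull logic of the configs-sync entrypoint. The sketch below is an assumption about how that might be written; sync_command and run_git_sync are hypothetical names, and the runner is injectable so the logic can be checked without network access. Because 1b mounts the volume read-only in api and worker, this step would need to run somewhere with write access to /configs, which this issue does not specify.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def sync_command(configs_path: Path, repo_url: str) -> list[str]:
    """Pick the git command that brings configs_path up to date."""
    if (configs_path / ".git").is_dir():
        # Already cloned: fast-forward only, as in the configs-sync loop.
        return ["git", "-C", str(configs_path), "pull", "--ff-only"]
    # First run: clone into the (empty) directory.
    return ["git", "clone", repo_url, str(configs_path)]


def run_git_sync(
    configs_path: Path,
    repo_url: str,
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> list[str]:
    """Run the clone or pull and return the command that was executed."""
    cmd = sync_command(configs_path, repo_url)
    run(cmd)  # check_call raises CalledProcessError on a non-zero exit
    return cmd
```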

Feature 2 — Automatic package installation in worker

Current state

The worker loads executors from services/worker/executors/ — classes that are bundled directly in the platform repository. There is no mechanism to install external packages declared in a TOML.

What needs to be done

2a. Add package field to TOML spec

# dissmodel-configs/models/flood_vector.toml

[model]
name    = "flood_vector"
class   = "flood_vector"
package = "coastal-flood==0.1.0"        # PyPI
# package = "git+https://github.com/lambdageo/coastal-flood@v0.1.0"  # GitHub
# package = "/opt/coastal-flood"         # local volume (development only)

[model.parameters]
taxa_elevacao = 0.5
end_time      = 10

2b. Implement _ensure_package() in runner.py

import subprocess
import sys


def _ensure_package(spec: dict) -> None:
    """
    Install the model package declared in the spec.

    Supports three URI schemes:
      PyPI:    "coastal-flood==0.1.0"
      GitHub:  "git+https://github.com/org/repo@tag"
      Local:   "/opt/coastal-flood"  (editable install — development only)
    """
    package = spec.get("model", {}).get("package")
    if not package:
        return

    if package.startswith("/"):
        # Local volume — editable install for development
        cmd = [sys.executable, "-m", "pip", "install",
               "-e", package, "--quiet"]
    else:
        # PyPI or GitHub
        cmd = [sys.executable, "-m", "pip", "install",
               package, "--quiet", "--no-cache-dir"]

    subprocess.check_call(cmd)


def run_experiment(record: ExperimentRecord) -> ExperimentRecord:
    _ensure_package(record.resolved_spec)   # ← add this line
    executor_cls = _resolve_executor(record)
    ...

2c. Ensure the researcher's package registers its executor on import

The package's __init__.py must import the executor module so that the class definition runs and __init_subclass__ registers it:

# coastal_flood/__init__.py
from coastal_flood.flood_executor import FloodVectorExecutor  # noqa: F401
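
Why a bare import is enough: with __init_subclass__, defining a subclass is what performs the registration. The sketch below shows the pattern in isolation. Only ExecutorRegistry.get() and FloodVectorExecutor appear in this issue; the ModelExecutor base class, the name attribute, and register() are assumptions made for illustration.

```python
class ExecutorRegistry:
    """Maps a model class name (the TOML `class` field) to an executor class."""

    _executors: dict[str, type] = {}

    @classmethod
    def register(cls, name: str, executor_cls: type) -> None:
        cls._executors[name] = executor_cls

    @classmethod
    def get(cls, name: str) -> type:
        try:
            return cls._executors[name]
        except KeyError:
            raise LookupError(f"no executor registered for {name!r}") from None


class ModelExecutor:
    """Hypothetical base class; subclasses register themselves when defined."""

    name: str = ""

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        if cls.name:  # skip intermediate classes that set no name
            ExecutorRegistry.register(cls.name, cls)


class FloodVectorExecutor(ModelExecutor):
    name = "flood_vector"


# Defining the class above was enough to register it.
assert ExecutorRegistry.get("flood_vector") is FloodVectorExecutor
```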

2d. _resolve_executor() must import the package after installation

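import importlib
import re


def _resolve_executor(record: ExperimentRecord):
    model       = record.resolved_spec.get("model", {})
    model_class = model.get("class")
    package     = model.get("package", "")

    if package:
        # Take the last path segment and drop any "==version" or "@tag" suffix,
        # so PyPI specs, git+https URLs, and local paths all yield "coastal-flood".
        # Assumes the repo or directory name matches the package name.
        pkg_name = re.split(r"==|@", package.rstrip("/").rsplit("/", 1)[-1])[0]
        importlib.invalidate_caches()  # see packages installed after startup
        # Import the package so __init_subclass__ fires and registers the executor
        importlib.import_module(pkg_name.replace("-", "_"))

    import worker.executors  # noqa: F401 (registers built-in executors)
    return ExecutorRegistry.get(model_class)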

Acceptance criteria

  • TOML with package = "coastal-flood==0.1.0" causes worker to pip install before running
  • TOML with package = "git+https://..." installs from GitHub
  • TOML with package = "/opt/coastal-flood" installs in editable mode (dev workflow)
  • TOML without package field runs built-in executors unchanged (no regression)
  • Second job with same package skips installation (already installed)
  • Failed install sets record.status = "failed" with a clear error message
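
The skip-if-installed criterion is not covered by _ensure_package() as written in 2b, which always invokes pip. One possible pre-check for pinned PyPI specs uses importlib.metadata from the standard library. The helper name is hypothetical, and the comparison is a plain string match, so "0.1" and "0.1.0" would count as different versions.

```python
from importlib import metadata


def is_pinned_version_installed(package: str) -> bool:
    """True when `package` is a 'name==version' pin that is already installed.

    Git URLs, local paths, and unpinned names return False, so the caller
    falls back to running pip for them.
    """
    name, sep, wanted = package.partition("==")
    if not sep or "/" in name or name.startswith("git+"):
        return False
    try:
        return metadata.version(name.strip()) == wanted.strip()
    except metadata.PackageNotFoundError:
        return False
```

A caller could return early from _ensure_package() when this check passes and run pip otherwise.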

Files to change

File                               Change
docker-compose.yml                 Re-enable configs-sync; switch to named volume
services/worker/runner.py          Add _ensure_package(), call before _resolve_executor()
services/api/api_registry.py       Uncomment scheduler startup
.env.example                       Add CONFIGS_REPO, CONFIGS_SYNC_INTERVAL
configs/models/flood_vector.toml   Add package field
dissmodel-configs repo             Create with initial TOMLs

Out of scope

  • Package signature verification (post-MVP security)
  • Dependency conflict resolution between packages
  • Automatic uninstall of old package versions

Status

Feature 2 — Automatic package installation ✅ DONE

Implemented with a subprocess approach (job_runner.py) instead of the
_resolve_executor approach originally proposed.

Reason: running pip install at runtime does not update sys.modules in the
parent process; an isolated subprocess solves the problem cleanly.

Pending (post-MVP):

  • Skip reinstallation when the installed version already matches

Feature 1 — Git-backed registry ⏳ PENDING

No change to the original scope.
Repository: github.com/lambdageo/dissmodel-configs
The PR flow for accepting new models remains as described.
