Foss4geu helmchart#27

Merged
lhoupert merged 25 commits into foss4geu from foss4geu-helmchart
Jul 2, 2026

@lhoupert lhoupert commented Jul 2, 2026

Contributor

No description provided.

lhoupert and others added 24 commits June 30, 2026 17:20
Adds charts/eoapi-workshop/, a thin umbrella over the upstream eoapi chart
(0.13.1) that deploys only the workshop's compose-aligned components — pgstac
database, stac, raster, vector, stac-browser, and stac-auth-proxy + mock-oidc —
with no observability/monitoring stack (multidim, docServer, eoapi-notifier,
knative, prometheus/grafana/metrics-server, and autoscaling are all disabled).

Why an umbrella rather than a values profile: it is self-contained and
versioned in this repo, installable with one `helm install`, and pins the
upstream chart version via Chart.lock.

Auth mirrors docker-compose: stac-auth-proxy with DEFAULT_PUBLIC=true (public
reads, protected writes), fronted by a mock OIDC server. Discovery is split into
an external OIDC_DISCOVERY_URL (matching the token issuer) and an in-cluster
OIDC_DISCOVERY_INTERNAL_URL for JWKS fetching, so token validation works from a
pod. The internal URL is pinned to the `eoapi-mock-oidc-server.eoapi` Service
DNS, so the release name and namespace must both be `eoapi`.

README documents the deployment procedure (ingress-nginx + Crunchy PGO
prerequisites), a 5-step auth test flow, and the known /vector empty-data gap
(loadSamples seeds STAC items, not the compose features-loader ecoregions layer).

Verified via helm lint + helm template rendering (enabled components present,
disabled ones absent, auth env vars resolve correctly). Not yet run on a live
cluster.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fix three issues that blocked a working deploy of the umbrella chart on a
real (non-localhost) Kubernetes cluster, and add reproducible tooling.

Chart fixes (values.yaml + new templates/passthrough-ingress.yaml):
- Set stac-auth-proxy UPSTREAM_URL to the in-cluster STAC Service
  (http://eoapi-stac:8080). The upstream chart leaves it at the
  unreachable localhost:8080 default, so the proxy crash-loops.
- Route /stac and /browser through a dedicated, rewrite-free Ingress.
  The upstream chart's global rewrite-target: /$2 strips the prefix that
  both stac-auth-proxy (ROOT_PATH=/stac) and the prefixed stac-browser
  image require, returning 404. Disable their upstream ingress paths and
  serve them via passthrough-ingress.yaml instead.
- Treat the external host as a deploy-time input: localhost is now only a
  documented local-dev fallback; nothing cluster-specific is committed.

Tooling:
- deploy.sh: idempotent, reproducible deploy/verify/teardown. Installs
  prerequisites (ingress-nginx + PGO), auto-discovers the ingress
  LoadBalancer host (<IP>.nip.io) or honors $INGRESS_HOST, generates
  gitignored overrides, installs, waits for rollout, and verifies the
  endpoints plus the auth flow.
- .gitignore / .helmignore: keep generated overrides and tooling out of
  git and out of the packaged chart.

Also relocate the chart from charts/ to infrastructure/charts/.

Verified by three full teardown+rebuild cycles (each got a different LB
IP, handled automatically); all services return 200 and writes are
auth-enforced (401 without a token, 201 with one).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a workflow that builds Dockerfile.local (the workshop conda env +
baked-in docs notebooks) and pushes it to
ghcr.io/<owner>/eoapi-workshop:{latest,sha,branch}. The Helm chart's
per-participant Labs consume this image (values key `jupyter.image`);
until now the image was only built locally by docker-compose, so a
K8s deployment had nothing to pull.

Image name is derived from github.repository_owner so it stays correct
in forks. Runs on push to main (Dockerfile/env/docs changes) and via
manual workflow_dispatch (to build from the workshop branch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds templates/jupyter.yaml: one isolated JupyterLab Deployment+Service+PVC
per entry in values `jupyter.participants` (default 5), served at /lab/<name>
through the existing no-rewrite passthrough ingress (extended here). Each pod:

- runs the published GHCR workshop image via `args` only, preserving the
  image ENTRYPOINT that activates the conda env;
- gets its own RWO PVC at /home/jovyan, seeded once from the image by an
  initContainer so the empty volume doesn't shadow the baked-in notebooks;
- uses strategy Recreate (RWO can't roll) and TCP probes (JupyterLab exposes
  no unauthenticated HTTP health endpoint), with a generous startupProbe for
  the large first image pull;
- injects the same eoAPI endpoints as the docker-compose jupyterhub service
  (in-cluster Service DNS) plus DB creds from the PGO secret
  eoapi-pguser-eoapi using the DIRECT primary keys (not pgbouncer-*, whose
  transaction pooling breaks 02-database's DDL/COPY).

Tokens are injected per participant by deploy.sh (next commit); a bare
helm install leaves them empty. Adds tests/render-checks.sh as the chart's
render-level regression suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Declares the published stac-manager chart (1.0.3, https://stac-manager.ds.io/)
as a dependency and wires it to the workshop STAC API + mock OIDC:
- fullnameOverride pins the Service to `eoapi-stac-manager` so the passthrough
  ingress can reference it by a stable name;
- stacApi/publicUrl/stacBrowser/oidc default to the localhost paths (deploy.sh
  rewrites the host at deploy time);
- resources raised (1-4Gi) because the container runs a build at startup;
- /manager added to the passthrough ingress (prefix preserved, PUBLIC_URL=/manager).

Also fixes a false-negative in tests/render-checks.sh: with `set -o pipefail`,
`printf | grep -q` reports failure on large inputs because grep short-circuits
and printf takes SIGPIPE — switched the check helpers to here-strings.
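
A minimal sketch of a here-string check helper of the kind described above. The helper name `contains` and the sample input are hypothetical; the actual helpers in tests/render-checks.sh are not shown in this PR. Because the here-string hands the text to grep directly, there is no upstream `printf` process whose SIGPIPE exit `pipefail` could report when `grep -q` stops reading early.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed when the fixed string $2 occurs in $1.
# The input arrives via a here-string, so grep is the only process involved
# and an early exit from `grep -q` cannot surface as a pipeline failure.
contains() {
  local haystack=$1 needle=$2
  grep -qF -- "$needle" <<<"$haystack"
}

# Large multi-line input with the match on the very first line, which is the
# case where `printf ... | grep -q` can fail under pipefail.
rendered=$(printf 'kind: Ingress\n'; seq 1 200000)

if contains "$rendered" 'kind: Ingress'; then
  echo "check passed"
fi
```

The same helper returns non-zero for a missing string, so it can be used directly in `if` or `||` constructs without disabling `pipefail`.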

NOTE: stac-manager OIDC login uses PKCE (needs a secure context), so editing
works on localhost but requires TLS on a remote http host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Switches the workshop stack from path-based to host-based routing: every
service is served at the ROOT of its own subdomain under a wildcard domain
(requires *.<baseDomain> DNS → ingress LB).

- new templates/subdomain-ingress.yaml owns all routing; removes the
  path-based templates/passthrough-ingress.yaml and disables the upstream
  eoapi ingress (eoapi.ingress.enabled=false);
- new top-level `routing` values block (baseDomain, className, tls);
- stac/raster/vector served at root (ingress.path="" → --root-path=);
- stac-auth-proxy ROOT_PATH="" with /healthz probes;
- browser swapped to the root-serving radiantearth/stac-browser image
  (the upstream custom image bakes a /browser prefix that breaks at a
  subdomain root) and pointed at stac.<baseDomain>;
- mock-oidc ISSUER + all cross-service URLs (proxy discovery, browser,
  stac-manager stacApi/publicUrl/stacBrowser/oidc) moved to subdomains;
- JupyterLabs drop --ServerApp.base_url and serve at lab-NN.<baseDomain> root.

Hosts: stac. raster. vector. browser. manager. mock-oidc. lab-01..05.
Tests updated to assert the subdomain hosts/backends and absence of rewrites.
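
For illustration, the host list above can be derived from a single base domain. This is a sketch only: the function name `workshop_hosts` is hypothetical, and the default domain is the one named elsewhere in this PR (`eoapi-workshop.ds.io`). The chart itself does this in templates/subdomain-ingress.yaml, which is not reproduced here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print one hostname per service plus one per Lab.
# $1 = base domain, $2 = number of Labs (defaults to 5, as in the chart values)
workshop_hosts() {
  local base=$1 labs=${2:-5} svc i
  for svc in stac raster vector browser manager mock-oidc; do
    printf '%s.%s\n' "$svc" "$base"
  done
  for ((i = 1; i <= labs; i++)); do
    printf 'lab-%02d.%s\n' "$i" "$base"
  done
}

workshop_hosts "${BASE_DOMAIN:-eoapi-workshop.ds.io}"
```

Every name it prints must resolve to the ingress load balancer, which is why the wildcard `*.<baseDomain>` DNS record is a prerequisite.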

Known limits: the browser OIDC redirect_uri still derives from the apex host,
and OIDC login (PKCE) needs TLS. Read/browse works over http.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
deploy.sh now targets the subdomain model:
- BASE_DOMAIN (default eoapi-workshop.ds.io) drives a generated
  .deploy/overrides.yaml holding every per-subdomain URL (browser catalog,
  proxy/browser OIDC discovery, mock ISSUER, stac-manager stacApi/publicUrl/
  stacBrowser/oidc) plus a per-participant JupyterLab token;
- tokens are reused from the existing overrides on re-deploy, so participant
  URLs stay stable (idempotent);
- participant names are read from the rendered chart (single source = values);
- rollout wait now covers the Labs + stac-manager; verify() curls each service
  subdomain (stac/raster/vector/browser/manager/mock-oidc), runs the 401-vs-token
  auth check on stac.<domain>, and prints the participant URLs;
- new `overrides` and `urls` subcommands; dropped the obsolete nip.io host
  discovery (a fixed wildcard domain replaces it).
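
The token-reuse behaviour in the list above could look roughly like the following sketch. It assumes, purely for illustration, a flat store with one `name: token` line per participant; the real layout of .deploy/overrides.yaml is not shown in this PR, and `token_for` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# token_for FILE NAME
# Print the token stored for NAME in FILE; if none exists yet, generate one,
# append it to FILE, and print it. Re-running therefore yields the same token.
token_for() {
  local file=$1 name=$2 token=""
  if [ -f "$file" ]; then
    token=$(awk -v key="$name:" '$1 == key { print $2; exit }' "$file")
  fi
  if [ -z "$token" ]; then
    # 16 random bytes rendered as 32 hex characters
    token=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s: %s\n' "$name" "$token" >>"$file"
  fi
  printf '%s\n' "$token"
}

store=$(mktemp)                       # stand-in for the generated overrides
first=$(token_for "$store" lab-01)    # first deploy: token is generated
second=$(token_for "$store" lab-01)   # re-deploy: token is read back
[ "$first" = "$second" ] && echo "token stable across re-deploys"
```

Reading before generating is what keeps participant URLs stable: a token is only minted when the store has no entry for that name.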

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the path-based/localhost/nip.io docs with the current architecture:
subdomain-per-service (*.eoapi-workshop.ds.io) with the wildcard-DNS and
release-name contracts up top, the deploy.sh workflow (BASE_DOMAIN, tokens,
urls/overrides subcommands), a routing-model section, per-subdomain verify +
auth-test commands, a Participant JupyterLabs section, and updated known
limitations (vector empty, TLS-for-editing, browser OIDC, capacity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The workshop Lab image needs publishing before the branch merges to main.
Add foss4geu-helmchart to the publish workflow's push trigger so CI builds
and pushes ghcr.io/<owner>/eoapi-workshop from this branch. Editing the
workflow file matches its own path filter, so this push triggers the build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… URL

Live testing showed authenticated requests failing with 401 /
PyJWKClientConnectionError. Root cause: stac-auth-proxy builds its JWKS client
from OIDC_DISCOVERY_URL and rewrites the jwks_uri to that same origin, then
fetches JWKS from it. With OIDC_DISCOVERY_URL set to the external LB subdomain,
the pod tried to fetch JWKS over a LoadBalancer hairpin and failed. The proxy
does NOT validate the token issuer, so the mock's external ISSUER need not match.

Fix: set OIDC_DISCOVERY_URL to the in-cluster Service DNS. deploy.sh no longer
overrides it per-domain. Verified live: POST without token -> 401, with a minted
token -> 400 (past the auth gate). Added a render-check for the in-cluster URL.
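
The verification above boils down to comparing two status codes. A sketch under stated assumptions: `auth_gate_ok` is a hypothetical helper, and the curl lines in the comments use placeholder variables (`STAC_URL`, `TOKEN`), not values taken from deploy.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# auth_gate_ok ANON_STATUS TOKEN_STATUS
# Passes when the anonymous write is rejected with 401 and the write with a
# token is NOT rejected by the proxy. A 400 still counts as a pass: the request
# got past the auth gate and was rejected later by the STAC API itself.
auth_gate_ok() {
  local anon=$1 authed=$2
  [ "$anon" = "401" ] || return 1
  case "$authed" in
    401|403|000|5??) return 1 ;;   # still blocked, unreachable, or server error
    *) return 0 ;;
  esac
}

# Hypothetical live usage (placeholders, not run here):
#   anon=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$STAC_URL/collections")
#   authed=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#     -H "Authorization: Bearer $TOKEN" "$STAC_URL/collections")
#   auth_gate_ok "$anon" "$authed"

auth_gate_ok 401 400 && echo "auth gate enforced"
```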

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Non-default branches previously only got branch/sha tags, but the chart
references :latest. Tag :latest unconditionally so the workshop image is
pullable at the tag the chart expects while it's built from the workshop branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cp -a tries to preserve timestamps on the PVC mount root /home/jovyan, which
is owned by root, so it fails with 'Operation not permitted' and the seed
initContainer exits 1 (Init:Error/CrashLoopBackOff). The files do copy; only
the attribute-preservation on the root-owned mount point fails. cp -R copies
the contents (owned by the runtime user) without touching the mount root.
Verified live: all 5 Labs reach 1/1 Running with notebooks seeded.
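
A small sketch of the copy form described above, using temporary directories in place of the image home and the PVC mount. The directory and file names are hypothetical. It shows only the `cp -R` behaviour (contents copied, dotfiles included, no attempt to set timestamps or ownership on the destination root); it does not reproduce the root-owned mount.

```shell
#!/usr/bin/env bash
set -euo pipefail

# seed_dir SRC DEST
# Copy the contents of SRC, dotfiles included, into an existing DEST.
# Plain -R (unlike -a) does not try to preserve timestamps or ownership,
# so it makes no attribute changes on DEST itself.
seed_dir() {
  local src=$1 dest=$2
  cp -R "$src/." "$dest/"
}

demo=$(mktemp -d)
mkdir -p "$demo/image-home/docs" "$demo/pvc"
touch "$demo/image-home/docs/example.ipynb" "$demo/image-home/.bashrc"

seed_dir "$demo/image-home" "$demo/pvc"
ls -A "$demo/pvc"
```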

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The seed-once design (PVC over /home/jovyan, .seeded marker) meant new
notebooks added to the image never reached already-provisioned Labs, e.g.
06-stac_transactions_auth.ipynb was invisible after it landed. Switch to:
notebooks come from the image at /home/jovyan/docs on every start (fresh, so
updates always appear); only /home/jovyan/work is a persistent PVC. Removes
the seed initContainer and sets imagePullPolicy: Always so :latest rebuilds
are picked up. Verified live: all 5 Labs Running with notebooks 00-06.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… vendored tgz

- jupyter.yaml/values.yaml: fix stale comments that still described the old
  path-based /lab routing + "persistent home" (it's subdomain-root serving with
  only /home/jovyan/work persisted; notebooks come fresh from the image).
- deploy.sh: add optional imagePullSecret setup (GHCR_USER/GHCR_TOKEN) attached
  to the default ServiceAccount before the Labs start, so a fresh deploy of the
  PRIVATE image isn't stuck in ImagePullBackOff; and register the dependency
  repos before `helm dependency build` so it resolves Chart.lock on a fresh
  machine (previously failed with "no repository definition").
- .gitignore: ignore the vendored charts/*.tgz (rebuilt by deploy.sh) instead of
  accidentally committing them; Chart.lock stays tracked. Untracked the two tgz.
- README: document the GHCR_TOKEN pull-secret deploy path.

Verified: removing charts/*.tgz then `helm dependency build` rebuilds them;
lint + render-checks pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Best-practices pass (/ai-engineering: test at every boundary, own the change):
- ci.yml: add a `helm` job that registers the dependency repos, builds
  dependencies, runs `helm lint`, and executes tests/render-checks.sh — so the
  chart's render invariants (subdomain hosts/backends, in-cluster OIDC URL, PG
  direct-primary keys, no base_url) are enforced automatically on every PR.
- templates/NOTES.txt: after install/upgrade, print each service subdomain and
  the per-participant JupyterLab URLs (with tokens when deployed via deploy.sh),
  plus a pointer to `./deploy.sh verify`.

Verified: NOTES renders correctly; helm lint + render-checks pass; ci.yml parses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove duplication (wildcard DNS, TLS-for-editing, and test-only-auth each
appeared in 2-3 places) and tighten prose: merge the read-first contracts,
condense the routing/verify/JupyterLabs sections, and trim the auth-test
walkthrough to the essential steps. No facts dropped.
…schema)

Running the notebooks in a Lab surfaced two cluster-only failures (compose was
fine). Both fixed with chart/data changes only — no notebook edits, so compose
is untouched:

- 03: `pgstacBootstrap.loadSamples` is now false. The upstream sample collection
  `noaa-emergency-response` is stored without a STAC `type` field, which breaks
  pystac_client's get_all_collections(). The notebooks create their own STAC
  data, and compose ships no STAC sample loader either.
- 05: add templates/features-loader-job.yaml — a post-install/upgrade hook Job
  (k8s equivalent of the compose features-loader) that loads the NA CEC Level III
  Ecoregions shapefile into features.ecoregions (idempotent, superuser secret,
  grants read to all). Also set tipg TIPG_DB_SCHEMAS=["features","public"] so it
  serves the layer.

04 needed no change: titiler-pgstac 3.0.0's /searches/register returns `id`
(what the notebook reads) and collections carry `extent`; its only failure was
the empty username widget under headless execution.

Verified live: applied to the cluster and re-ran 03/04/05 headless → all PASS
(0 errors); features.ecoregions is served by tipg. render-checks + lint pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In-cluster, the Lab pod and the participant's browser need different URLs:
the pod can only reach in-cluster Service DNS, the browser only the external
subdomains — so display cells built from the server-side *_ENDPOINT vars
handed the browser unreachable addresses.

Inject STAC_API_BROWSER_URL / TITILER_BROWSER_URL / TIPG_BROWSER_URL
(derived from routing.baseDomain) into the Lab pods and use them in the
IFrame/viewer cells only, falling back to the previous .replace() behaviour
when unset (docker-compose, 2i2c). Server-side httpx cells are untouched.
Notebook 02 now prefers the stack's own STAC Browser (native #/collections
route — both compose and the chart point SB_catalogUrl at this API), avoiding
the mixed-content block of the public https browser over an http API.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The chart (via eoapi-k8s 6.3.1) deploys titiler-pgstac 3.0.0 / tipg 1.4.0 /
stac-fastapi-pgstac 6.2.2, but compose pinned 1.9.0 / 1.1.2 / 6.0.2 — and the
notebooks bake in routes that changed between those releases, so they could
only be correct in one environment. Align compose to the chart's versions and
update the notebooks to the current routes:

- /collections|searches/…/{tms}/map  →  …/map.html   (titiler-pgstac 3.x)
- /collections/…/tiles/{tms}/viewer  →  …/map.html   (tipg 1.4)
- NDVI map cell: expression now requires explicit assets= (titiler 3.x)

Enable TITILER_PGSTAC_API_ENABLE_EXTERNAL_DATASET_ENDPOINTS in the chart
(compose already had it) for notebook 04 §4.4, and add a lockstep render
check that asserts the chart runs the exact image:tag pinned in
docker-compose.yml so the versions cannot drift apart again.
Also run stac-manager under amd64 emulation (no arm64 manifest published).
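
One way such a lockstep check could be written, as a sketch only: the function names, the file layout, and the `ghcr.io/example/...` repository are hypothetical, and the real check in tests/render-checks.sh may work differently. The idea is to extract the `image:` reference for a component from docker-compose.yml and from the rendered chart, then require an exact match.

```shell
#!/usr/bin/env bash
set -euo pipefail

# pinned_image FILE NAME
# Print the first "image:" reference in FILE containing NAME, quotes stripped.
# awk reads to the end of its input instead of exiting early, so no upstream
# command in the pipeline is cut off under pipefail.
pinned_image() {
  local file=$1 name=$2
  sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' "$file" \
    | tr -d "\"'" \
    | awk -v n="$name" '!seen && index($0, n) { img = $1; seen = 1 }
                        END { if (seen) print img }'
}

# lockstep_ok COMPOSE RENDERED NAME
# Succeed only when both files pin the same, non-empty image:tag for NAME.
lockstep_ok() {
  local compose=$1 rendered=$2 name=$3 want got
  want=$(pinned_image "$compose" "$name")
  got=$(pinned_image "$rendered" "$name")
  [ -n "$want" ] && [ "$want" = "$got" ]
}

tmp=$(mktemp -d)
printf 'services:\n  raster:\n    image: ghcr.io/example/titiler-pgstac:3.0.0\n' \
  >"$tmp/compose.yml"
printf '      containers:\n        - image: "ghcr.io/example/titiler-pgstac:3.0.0"\n' \
  >"$tmp/rendered.yaml"

lockstep_ok "$tmp/compose.yml" "$tmp/rendered.yaml" titiler-pgstac && echo "in lockstep"
```

In practice the second file would be the output of `helm template`, so a version bump on either side fails the check until the other side is updated.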

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Notebook 04 §4.5 renders glad-global-forest-change-1.11, but only the 2i2c
deploy workflow loaded it — in compose and on Kubernetes the collection was
missing (404). Mirror that deploy step in both environments:

- chart: add a stac-loader container to the features-loader Job (idempotent,
  pypgstac upsert from stac.maap-project.org, pinned to the deployed pgstac
  version 0.9.10)
- compose: add an equivalent one-shot stac-loader service (script shared via
  a compose config), and bump the pgstac image v0.9.8 → v0.9.10 to match
- both: set AWS_NO_SIGN_REQUEST=YES on the raster service — the glad assets
  are s3:// URIs in a public bucket and GDAL refuses unsigned reads otherwise

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The username widget defaulted to None, so notebook 04 only worked if a human
retyped the exact username used in notebook 02 — and headless runs always
failed (collection "None-sentinel-2-c1-l2a" → KeyError on 'extent').
Prefill the widget with the username of the most recent *-sentinel-2-c1-l2a
collection in the STAC catalog; participants can still overwrite it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Same failure mode as notebook 04 (431d835): the username widget defaulted to
None and notebook 02 generates a random Haikunator username, so the collection
search always came back empty — and silently, since the cell displayed nothing
on no match. Re-running the widget cell also wiped anything the user had typed.

Prefill the widget with the username of the most recent *-sentinel-2-c1-l2a
collection in the STAC catalog, and print the searched value plus available
collection ids when the filter matches nothing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ub container

Running the notebooks in the compose jupyterhub service (rather than a
host-side Jupyter) exposed three issues:

- titiler-pgstac 3.x dropped the POSTGRES_* settings for PG* / DATABASE_URL;
  compose still passed the old names, so the service crashed at startup
  (quote_from_bytes(None)). Rename the env vars.
- notebook 04 rewrote its server-side endpoint to localhost at definition
  time, breaking every in-container httpx call. Keep the server endpoint
  as-is and apply the localhost rewrite only in the browser-URL fallback.
- notebook 04 cells 21/23 used httpx's 5s default timeout for remote-COG
  info/preview — too short under emulation or on slow networks. Use
  timeout=None like the sibling cells.

Verified: notebooks 02-05 execute with 0 errors inside the compose jupyterhub
container (03's catalog-prefill from the previous commit supplies a coherent
username headless); all rendered IFrame URLs use the host-reachable localhost
ports.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Run the repo-pinned ruff (0.11.5) over the notebooks so CI's
'ruff format --check' passes. No behavioural change — notebooks 02-05
re-verified green in the compose jupyterhub container after formatting.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@lhoupert lhoupert marked this pull request as ready for review July 2, 2026 07:52
@lhoupert lhoupert changed the base branch from main to foss4geu July 2, 2026 07:54
@lhoupert lhoupert merged commit 6b93fa8 into foss4geu Jul 2, 2026
3 checks passed
@lhoupert lhoupert deleted the foss4geu-helmchart branch July 2, 2026 07:54
pantierra pushed a commit that referenced this pull request Jul 2, 2026
* feat(helm): add minimal docker-compose-aligned eoapi umbrella chart

Adds charts/eoapi-workshop/, a thin umbrella over the upstream eoapi chart
(0.13.1) that deploys only the workshop's compose-aligned components — pgstac
database, stac, raster, vector, stac-browser, and stac-auth-proxy + mock-oidc —
with no observability/monitoring stack (multidim, docServer, eoapi-notifier,
knative, prometheus/grafana/metrics-server, and autoscaling are all disabled).

Why an umbrella rather than a values profile: it is self-contained and
versioned in this repo, installable with one `helm install`, and pins the
upstream chart version via Chart.lock.

Auth mirrors docker-compose: stac-auth-proxy with DEFAULT_PUBLIC=true (public
reads, protected writes), fronted by a mock OIDC server. Discovery is split into
an external OIDC_DISCOVERY_URL (matching the token issuer) and an in-cluster
OIDC_DISCOVERY_INTERNAL_URL for JWKS fetching, so token validation works from a
pod. The internal URL is pinned to the `eoapi-mock-oidc-server.eoapi` Service
DNS, so the release name and namespace must both be `eoapi`.

README documents the deployment procedure (ingress-nginx + Crunchy PGO
prerequisites), a 5-step auth test flow, and the known /vector empty-data gap
(loadSamples seeds STAC items, not the compose features-loader ecoregions layer).

Verified via helm lint + helm template rendering (enabled components present,
disabled ones absent, auth env vars resolve correctly). Not yet run on a live
cluster.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(helm): make eoapi-workshop chart deployable on remote clusters

Fix three issues that blocked a working deploy of the umbrella chart on a
real (non-localhost) Kubernetes cluster, and add reproducible tooling.

Chart fixes (values.yaml + new templates/passthrough-ingress.yaml):
- Set stac-auth-proxy UPSTREAM_URL to the in-cluster STAC Service
  (http://eoapi-stac:8080). The upstream chart leaves it at the
  unreachable localhost:8080 default, so the proxy crash-loops.
- Route /stac and /browser through a dedicated, rewrite-free Ingress.
  The upstream chart's global rewrite-target: /$2 strips the prefix that
  both stac-auth-proxy (ROOT_PATH=/stac) and the prefixed stac-browser
  image require, returning 404. Disable their upstream ingress paths and
  serve them via passthrough-ingress.yaml instead.
- Treat the external host as a deploy-time input: localhost is now only a
  documented local-dev fallback; nothing cluster-specific is committed.

Tooling:
- deploy.sh: idempotent, reproducible deploy/verify/teardown. Installs
  prerequisites (ingress-nginx + PGO), auto-discovers the ingress
  LoadBalancer host (<IP>.nip.io) or honors $INGRESS_HOST, generates
  gitignored overrides, installs, waits for rollout, and verifies the
  endpoints plus the auth flow.
- .gitignore / .helmignore: keep generated overrides and tooling out of
  git and out of the packaged chart.

Also relocate the chart from charts/ to infrastructure/charts/.

Verified by three full teardown+rebuild cycles (each got a different LB
IP, handled automatically); all services return 200 and writes are
auth-enforced (401 without a token, 201 with one).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci(workshop): publish JupyterLab image to GHCR

Adds a workflow that builds Dockerfile.local (the workshop conda env +
baked-in docs notebooks) and pushes it to
ghcr.io/<owner>/eoapi-workshop:{latest,sha,branch}. The Helm chart's
per-participant Labs consume this image (values key `jupyter.image`);
until now the image was only built locally by docker-compose, so a
K8s deployment had nothing to pull.

Image name is derived from github.repository_owner so it stays correct
in forks. Runs on push to main (Dockerfile/env/docs changes) and via
manual workflow_dispatch (to build from the workshop branch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(helm): per-participant JupyterLab environments

Adds templates/jupyter.yaml: one isolated JupyterLab Deployment+Service+PVC
per entry in values `jupyter.participants` (default 5), served at /lab/<name>
through the existing no-rewrite passthrough ingress (extended here). Each pod:

- runs the published GHCR workshop image via `args` only, preserving the
  image ENTRYPOINT that activates the conda env;
- gets its own RWO PVC at /home/jovyan, seeded once from the image by an
  initContainer so the empty volume doesn't shadow the baked-in notebooks;
- uses strategy Recreate (RWO can't roll) and TCP probes (JupyterLab exposes
  no unauthenticated HTTP health endpoint), with a generous startupProbe for
  the large first image pull;
- injects the same eoAPI endpoints as the docker-compose jupyterhub service
  (in-cluster Service DNS) plus DB creds from the PGO secret
  eoapi-pguser-eoapi using the DIRECT primary keys (not pgbouncer-*, whose
  transaction pooling breaks 02-database's DDL/COPY).

Tokens are injected per participant by deploy.sh (next commit); a bare
helm install leaves them empty. Adds tests/render-checks.sh as the chart's
render-level regression suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(helm): add stac-manager UI (routed at /manager)

Declares the published stac-manager chart (1.0.3, https://stac-manager.ds.io/)
as a dependency and wires it to the workshop STAC API + mock OIDC:
- fullnameOverride pins the Service to `eoapi-stac-manager` so the passthrough
  ingress can reference it by a stable name;
- stacApi/publicUrl/stacBrowser/oidc default to the localhost paths (deploy.sh
  rewrites the host at deploy time);
- resources raised (1-4Gi) because the container runs a build at startup;
- /manager added to the passthrough ingress (prefix preserved, PUBLIC_URL=/manager).

Also fixes a false-negative in tests/render-checks.sh: with `set -o pipefail`,
`printf | grep -q` reports failure on large inputs because grep short-circuits
and printf takes SIGPIPE — switched the check helpers to here-strings.

NOTE: stac-manager OIDC login uses PKCE (needs a secure context), so editing
works on localhost but requires TLS on a remote http host.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(helm): subdomain-per-service routing (*.eoapi-workshop.ds.io)

Switches the workshop stack from path-based to host-based routing: every
service is served at the ROOT of its own subdomain under a wildcard domain
(requires *.<baseDomain> DNS → ingress LB).

- new templates/subdomain-ingress.yaml owns all routing; removes the
  path-based templates/passthrough-ingress.yaml and disables the upstream
  eoapi ingress (eoapi.ingress.enabled=false);
- new top-level `routing` values block (baseDomain, className, tls);
- stac/raster/vector served at root (ingress.path="" → --root-path=);
- stac-auth-proxy ROOT_PATH="" with /healthz probes;
- browser swapped to the root-serving radiantearth/stac-browser image
  (the upstream custom image bakes a /browser prefix that breaks at a
  subdomain root) and pointed at stac.<baseDomain>;
- mock-oidc ISSUER + all cross-service URLs (proxy discovery, browser,
  stac-manager stacApi/publicUrl/stacBrowser/oidc) moved to subdomains;
- JupyterLabs drop --ServerApp.base_url and serve at lab-NN.<baseDomain> root.

Hosts: stac. raster. vector. browser. manager. mock-oidc. lab-01..05.
Tests updated to assert the subdomain hosts/backends and absence of rewrites.

Known limits: the browser OIDC redirect_uri still derives from the apex host,
and OIDC login (PKCE) needs TLS. Read/browse works over http.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(helm): rework deploy.sh for subdomain routing + participant tokens

deploy.sh now targets the subdomain model:
- BASE_DOMAIN (default eoapi-workshop.ds.io) drives a generated
  .deploy/overrides.yaml holding every per-subdomain URL (browser catalog,
  proxy/browser OIDC discovery, mock ISSUER, stac-manager stacApi/publicUrl/
  stacBrowser/oidc) plus a per-participant JupyterLab token;
- tokens are reused from the existing overrides on re-deploy, so participant
  URLs stay stable (idempotent);
- participant names are read from the rendered chart (single source = values);
- rollout wait now covers the Labs + stac-manager; verify() curls each service
  subdomain (stac/raster/vector/browser/manager/mock-oidc), runs the 401-vs-token
  auth check on stac.<domain>, and prints the participant URLs;
- new `overrides` and `urls` subcommands; dropped the obsolete nip.io host
  discovery (a fixed wildcard domain replaces it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(helm): rewrite README for subdomain routing + Labs + stac-manager

Replaces the path-based/localhost/nip.io docs with the current architecture:
subdomain-per-service (*.eoapi-workshop.ds.io) with the wildcard-DNS and
release-name contracts up top, the deploy.sh workflow (BASE_DOMAIN, tokens,
urls/overrides subcommands), a routing-model section, per-subdomain verify +
auth-test commands, a Participant JupyterLabs section, and updated known
limitations (vector empty, TLS-for-editing, browser OIDC, capacity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(workshop): also build the image on the workshop branch

The workshop Lab image needs publishing before the branch merges to main.
Add foss4geu-helmchart to the publish workflow's push trigger so CI builds
and pushes ghcr.io/<owner>/eoapi-workshop from this branch. Editing the
workflow file matches its own path filter, so this push triggers the build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(helm): point stac-auth-proxy OIDC_DISCOVERY_URL at the in-cluster URL

Live testing showed authenticated requests failing with 401 /
PyJWKClientConnectionError. Root cause: stac-auth-proxy builds its JWKS client
from OIDC_DISCOVERY_URL and rewrites the jwks_uri to that same origin, then
fetches JWKS from it. With OIDC_DISCOVERY_URL set to the external LB subdomain,
the pod tried to fetch JWKS over a LoadBalancer hairpin and failed. The proxy
does NOT validate the token issuer, so the mock's external ISSUER need not match.

Fix: set OIDC_DISCOVERY_URL to the in-cluster Service DNS. deploy.sh no longer
overrides it per-domain. Verified live: POST without token -> 401, with a minted
token -> 400 (past the auth gate). Added a render-check for the in-cluster URL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(workshop): tag the image :latest on every build

Non-default branches previously only got branch/sha tags, but the chart
references :latest. Tag :latest unconditionally so the workshop image is
pullable at the tag the chart expects while it's built from the workshop branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(helm): seed Lab home with cp -R (not cp -a)

cp -a tries to preserve timestamps on the PVC mount root /home/jovyan, which
is owned by root, so it fails with 'Operation not permitted' and the seed
initContainer exits 1 (Init:Error/CrashLoopBackOff). The files do copy; only
the attribute-preservation on the root-owned mount point fails. cp -R copies
the contents (owned by the runtime user) without touching the mount root.
Verified live: all 5 Labs reach 1/1 Running with notebooks seeded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(helm): serve Lab notebooks fresh from the image (persist only work/)

The seed-once design (PVC over /home/jovyan, .seeded marker) meant new
notebooks added to the image never reached already-provisioned Labs, e.g.
06-stac_transactions_auth.ipynb was invisible after it landed. Switch to:
notebooks come from the image at /home/jovyan/docs on every start (fresh, so
updates always appear); only /home/jovyan/work is a persistent PVC. Removes
the seed initContainer and sets imagePullPolicy: Always so :latest rebuilds
are picked up. Verified live: all 5 Labs Running with notebooks 00-06.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(helm): address review — stale comments, pull-secret, dep repos, vendored tgz

- jupyter.yaml/values.yaml: fix stale comments that still described the old
  path-based /lab routing + "persistent home" (it's subdomain-root serving with
  only /home/jovyan/work persisted; notebooks come fresh from the image).
- deploy.sh: add optional imagePullSecret setup (GHCR_USER/GHCR_TOKEN) attached
  to the default ServiceAccount before the Labs start, so a fresh deploy of the
  PRIVATE image isn't stuck in ImagePullBackOff; and register the dependency
  repos before `helm dependency build` so it resolves Chart.lock on a fresh
  machine (previously failed with "no repository definition").
- .gitignore: ignore the vendored charts/*.tgz (rebuilt by deploy.sh) instead of
  accidentally committing them; Chart.lock stays tracked. Untracked the two tgz
  files.
- README: document the GHCR_TOKEN pull-secret deploy path.

Verified: removing charts/*.tgz then `helm dependency build` rebuilds them;
lint + render-checks pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(helm)+chore: validate chart in CI and print URLs via NOTES.txt

Best-practices pass (/ai-engineering: test at every boundary, own the change):
- ci.yml: add a `helm` job that registers the dependency repos, builds
  dependencies, runs `helm lint`, and executes tests/render-checks.sh — so the
  chart's render invariants (subdomain hosts/backends, in-cluster OIDC URL, PG
  direct-primary keys, no base_url) are enforced automatically on every PR.
- templates/NOTES.txt: after install/upgrade, print each service subdomain and
  the per-participant JupyterLab URLs (with tokens when deployed via deploy.sh),
  plus a pointer to `./deploy.sh verify`.

Verified: NOTES renders correctly; helm lint + render-checks pass; ci.yml parses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(helm): simplify README (~230→152 lines)

Remove duplication (wildcard DNS, TLS-for-editing, and test-only-auth each
appeared in 2-3 places) and tighten prose: merge the read-first contracts,
condense the routing/verify/JupyterLabs sections, and trim the auth-test
walkthrough to the essential steps. No facts dropped.

* fix(helm): make workshop notebooks 03/05 run in-cluster (data + tipg schema)

Running the notebooks in a Lab surfaced two cluster-only failures (compose was
fine). Both fixed with chart/data changes only — no notebook edits, so compose
is untouched:

- 03: `pgstacBootstrap.loadSamples` is now false. The upstream sample collection
  `noaa-emergency-response` is stored without a STAC `type` field, which breaks
  pystac_client's get_all_collections(). The notebooks create their own STAC
  data, and compose ships no STAC sample loader either.
- 05: add templates/features-loader-job.yaml — a post-install/upgrade hook Job
  (k8s equivalent of the compose features-loader) that loads the NA CEC Level III
  Ecoregions shapefile into features.ecoregions (idempotent, superuser secret,
  grants read to all). Also set tipg TIPG_DB_SCHEMAS=["features","public"] so it
  serves the layer.

04 needed no change: titiler-pgstac 3.0.0's /searches/register returns `id`
(what the notebook reads) and collections carry `extent`; its only failure was
the empty username widget under headless execution.

Verified live: applied to the cluster and re-ran 03/04/05 headless → all PASS
(0 errors); features.ecoregions is served by tipg. render-checks + lint pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(helm)+docs: hand browser-reachable URLs to the notebook IFrame cells

In-cluster, the Lab pod and the participant's browser need different URLs:
the pod can only reach in-cluster Service DNS, the browser only the external
subdomains — so display cells built from the server-side *_ENDPOINT vars
handed the browser unreachable addresses.

Inject STAC_API_BROWSER_URL / TITILER_BROWSER_URL / TIPG_BROWSER_URL
(derived from routing.baseDomain) into the Lab pods and use them in the
IFrame/viewer cells only, falling back to the previous .replace() behaviour
when unset (docker-compose, 2i2c). Server-side httpx cells are untouched.
Notebook 02 now prefers the stack's own STAC Browser (native #/collections
route — both compose and the chart point SB_catalogUrl at this API), avoiding
the mixed-content block of the public https browser over an http API.
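
A minimal sketch of the selection logic for the viewer cells, assuming the fallback is a hostname-to-localhost `.replace()`. The helper name, its parameters, and the example hostnames are hypothetical; only the `*_BROWSER_URL` variable names come from this change.

```python
import os


def browser_url(server_endpoint, env_name, fallback_from, fallback_to="localhost", environ=None):
    """Pick the URL a participant's browser should load in an IFrame cell."""
    env = os.environ if environ is None else environ
    injected = env.get(env_name)
    if injected:
        # Chart deployment: the pod was handed the external subdomain.
        return injected.rstrip("/")
    # docker-compose / 2i2c: no variable set, so rewrite the server-side host
    # (assumed shape of the previous .replace() behaviour).
    return server_endpoint.replace(fallback_from, fallback_to).rstrip("/")


# Server-side calls keep using server_endpoint unchanged; only display cells
# go through browser_url().
print(browser_url("http://raster:8082", "TITILER_BROWSER_URL", "raster", environ={}))
```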

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(compose)+docs+helm: run the same service versions everywhere

The chart (via eoapi-k8s 6.3.1) deploys titiler-pgstac 3.0.0 / tipg 1.4.0 /
stac-fastapi-pgstac 6.2.2, but compose pinned 1.9.0 / 1.1.2 / 6.0.2 — and the
notebooks bake in routes that changed between those releases, so they could
only be correct in one environment. Align compose to the chart's versions and
update the notebooks to the current routes:

- /collections|searches/…/{tms}/map  →  …/map.html   (titiler-pgstac 3.x)
- /collections/…/tiles/{tms}/viewer  →  …/map.html   (tipg 1.4)
- NDVI map cell: expression now requires explicit assets= (titiler 3.x)

Enable TITILER_PGSTAC_API_ENABLE_EXTERNAL_DATASET_ENDPOINTS in the chart
(compose already had it) for notebook 04 §4.4, and add a lockstep render
check that asserts the chart runs the exact image:tag pinned in
docker-compose.yml so the versions cannot drift apart again.
Also run stac-manager under amd64 emulation (no arm64 manifest published).
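
A rough Python sketch of what a lockstep check of this kind can look like. The real check lives in tests/render-checks.sh and may work differently; the function names and the `example.org` image paths here are hypothetical, and the regex only handles plain `image:` lines.

```python
import re

# Matches `image: repo:tag` lines, optionally quoted or prefixed by a list dash.
IMAGE_RE = re.compile(r"^\s*(?:-\s*)?image:\s*[\"']?([^\s\"']+)", re.MULTILINE)


def split_ref(ref):
    """Split 'repo:tag' into (repo, tag); untagged refs default to 'latest'."""
    repo, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:  # the colon belonged to a registry port
        return ref, "latest"
    return repo, tag


def tags_by_repo(yaml_text):
    """Collect the set of tags used for each image repository in the text."""
    found = {}
    for ref in IMAGE_RE.findall(yaml_text):
        repo, tag = split_ref(ref)
        found.setdefault(repo, set()).add(tag)
    return found


def drift(compose_text, rendered_text):
    """Return repos present in both inputs whose pinned tags differ."""
    compose, chart = tags_by_repo(compose_text), tags_by_repo(rendered_text)
    return {
        repo: (compose[repo], chart[repo])
        for repo in compose.keys() & chart.keys()
        if compose[repo] != chart[repo]
    }


compose_yaml = "services:\n  raster:\n    image: example.org/titiler-pgstac:1.9.0\n"
rendered_yaml = 'containers:\n  - name: raster\n    image: "example.org/titiler-pgstac:3.0.0"\n'
print(drift(compose_yaml, rendered_yaml))
```

An empty result means the two environments pin the same tags for every image they share; a CI step would fail on anything else.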

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(data): load the glad STAC collection in compose and the chart

Notebook 04 §4.5 renders glad-global-forest-change-1.11, but only the 2i2c
deploy workflow loaded it — in compose and on Kubernetes the collection was
missing (404). Mirror that deploy step in both environments:

- chart: add a stac-loader container to the features-loader Job (idempotent,
  pypgstac upsert from stac.maap-project.org, pinned to the deployed pgstac
  version 0.9.10)
- compose: add an equivalent one-shot stac-loader service (script shared via
  a compose config), and bump the pgstac image v0.9.8 → v0.9.10 to match
- both: set AWS_NO_SIGN_REQUEST=YES on the raster service: the glad assets
  are s3:// URIs in a public bucket, and GDAL otherwise tries to sign requests
  and fails when no AWS credentials are configured

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(docs): prefill notebook 04's username from the catalog

The username widget defaulted to None, so notebook 04 only worked if a human
retyped the exact username used in notebook 02 — and headless runs always
failed (collection "None-sentinel-2-c1-l2a" → KeyError on 'extent').
Prefill the widget with the username of the most recent *-sentinel-2-c1-l2a
collection in the STAC catalog; participants can still overwrite it.
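
A minimal sketch of the prefill, assuming recency is judged by a `created` timestamp on each collection. How the notebook actually orders collections is not stated here, so that field, the function name, and the example ids are hypothetical.

```python
SUFFIX = "-sentinel-2-c1-l2a"


def default_username(collections):
    """Return the username prefix of the newest matching collection, or None.

    `collections` is an iterable of dicts with an 'id' and, by assumption,
    an ISO 8601 'created' string that sorts lexicographically.
    """
    matches = [
        c for c in collections
        if c["id"].endswith(SUFFIX) and len(c["id"]) > len(SUFFIX)
    ]
    if not matches:
        return None  # the widget stays empty and the participant types a name
    newest = max(matches, key=lambda c: c.get("created", ""))
    return newest["id"][: -len(SUFFIX)]


catalog = [
    {"id": "glad-global-forest-change-1.11"},
    {"id": "quiet-dew-1234-sentinel-2-c1-l2a", "created": "2026-06-30T10:00:00Z"},
    {"id": "bold-sun-5678-sentinel-2-c1-l2a", "created": "2026-07-01T09:00:00Z"},
]
print(default_username(catalog))
```

The returned value would only seed the widget's initial value, so a participant can still overwrite it.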

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(docs): prefill notebook 03's username from the catalog

Same failure mode as notebook 04 (431d835): the username widget defaulted to
None and notebook 02 generates a random Haikunator username, so the collection
search always came back empty — and silently, since the cell displayed nothing
on no match. Re-running the widget cell also wiped anything the user had typed.

Prefill the widget with the username of the most recent *-sentinel-2-c1-l2a
collection in the STAC catalog, and print the searched value plus available
collection ids when the filter matches nothing.
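
A small sketch of the no-match diagnostic, assuming the filter is a prefix match on the collection id. The function name and the matching rule are assumptions, not the notebook's code.

```python
def find_user_collections(collection_ids, username):
    """Return ids belonging to `username`; explain the miss when there are none."""
    matches = [cid for cid in collection_ids if cid.startswith(f"{username}-")]
    if not matches:
        # Say what was searched and what exists, instead of showing nothing.
        print(f"No collection matches username {username!r}.")
        print("Available collection ids:", ", ".join(sorted(collection_ids)) or "(none)")
    return matches


ids = ["bold-sun-5678-sentinel-2-c1-l2a", "glad-global-forest-change-1.11"]
find_user_collections(ids, "None")
```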

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(compose)+docs: make the notebooks run inside the compose jupyterhub container

Running the notebooks in the compose jupyterhub service (rather than a
host-side Jupyter) exposed three issues:

- titiler-pgstac 3.x dropped the POSTGRES_* settings in favour of PG* /
  DATABASE_URL; compose still passed the old names, so the service crashed at
  startup (quote_from_bytes(None)). Rename the env vars.
- notebook 04 rewrote its server-side endpoint to localhost at definition
  time, breaking every in-container httpx call. Keep the server endpoint
  as-is and apply the localhost rewrite only in the browser-URL fallback.
- notebook 04 cells 21/23 used httpx's 5s default timeout for remote-COG
  info/preview — too short under emulation or on slow networks. Use
  timeout=None like the sibling cells.
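
A sketch of the env-var rename from the first bullet. The exact old and new variable names below are an assumption based on the libpq-style PG* convention, so check them against the titiler-pgstac settings for the deployed version; the helper itself is hypothetical.

```python
# Assumed mapping: old titiler-pgstac names -> libpq-style names.
RENAMES = {
    "POSTGRES_USER": "PGUSER",
    "POSTGRES_PASS": "PGPASSWORD",
    "POSTGRES_DBNAME": "PGDATABASE",
    "POSTGRES_HOST": "PGHOST",
    "POSTGRES_PORT": "PGPORT",
}


def rename_env(env):
    """Return a copy of `env` with old-style database variables renamed."""
    renamed = {k: v for k, v in env.items() if k not in RENAMES}
    for old, new in RENAMES.items():
        if old in env:
            # A value already given under the new name wins over the old one.
            renamed.setdefault(new, env[old])
    return renamed


old_env = {"POSTGRES_USER": "username", "POSTGRES_HOST": "database", "WORKERS": "1"}
print(rename_env(old_env))
```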

Verified: notebooks 02-05 execute with 0 errors inside the compose jupyterhub
container (03's catalog-prefill from the previous commit supplies a coherent
username headless); all rendered IFrame URLs use the host-reachable localhost
ports.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* style: ruff-format the notebooks and stac_auth helper

Run the repo-pinned ruff (0.11.5) over the notebooks so CI's
'ruff format --check' passes. No behavioural change — notebooks 02-05
re-verified green in the compose jupyterhub container after formatting.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>