
feat(k8s): make app-tier object storage configurable via storage.s3#5932

Open
aicam wants to merge 1 commit into apache:main from aicam:aws-eks/02-pluggable-object-storage

aicam commented Jun 24, 2026

Contributor

What changes were proposed in this PR?

This makes the application tier's object-storage target configurable, as the first step toward supporting an external S3 store alongside the in-cluster MinIO. It is non-breaking: the default (on-prem / in-cluster MinIO) install renders an identical set of resources.

Today file-service and workflow-computing-unit-manager hardcode the S3 endpoint and credentials to the in-cluster MinIO Service ({{ .Release.Name }}-minio) and its auto-generated Secret. This PR routes both through a new storage.s3 values block:

  • values.yaml — new storage.s3 block: endpoint, region, existingSecret, accessKeyId, secretAccessKey. All default to empty.
  • templates/base/_helpers.tpl (new) — helpers that resolve the S3 endpoint and the credentials Secret name/keys. When storage.s3.endpoint is empty they fall back to the in-cluster MinIO Service and its {{ .Release.Name }}-minio Secret (keys root-user/root-password), so the default render is unchanged.
  • templates/aws/s3-credentials-secret.yaml (new) — a {{ .Release.Name }}-s3-credentials Secret, rendered only when an external endpoint is set and no existingSecret is supplied. Renders nothing on the default install.
  • file-service / workflow-computing-unit-manager deployments — the STORAGE_S3_* env now comes from the helpers; STORAGE_S3_REGION is added only in external mode.
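
As a rough sketch of the values.yaml addition described above: the five key names come from this PR's description, while the comments are paraphrased for illustration and the exact layout in the chart may differ.

```yaml
# bin/k8s/values.yaml (sketch; comments are illustrative, not the chart's own)
storage:
  s3:
    # Empty endpoint => helpers fall back to the in-cluster MinIO Service
    # ({{ .Release.Name }}-minio) and its Secret (keys root-user/root-password).
    endpoint: ""
    # Only used in external mode; surfaced as STORAGE_S3_REGION.
    region: ""
    # Name of a pre-created Secret holding the credentials; when set,
    # the chart does not render <release>-s3-credentials.
    existingSecret: ""
    # Inline credentials, used to build <release>-s3-credentials when
    # endpoint is set and existingSecret is empty.
    accessKeyId: ""
    secretAccessKey: ""
```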

How it behaves:

  • Default (no config): services point at the in-cluster MinIO exactly as before.
  • External S3: set storage.s3.endpoint + region + credentials (or existingSecret) → both services use that store; the chart materializes the credentials Secret unless you bring your own.
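
For illustration, an external-S3 overlay might look like the following. The file name my-external-s3.yaml, the region value, and the placeholder credentials are assumptions made for this sketch; only the key names under storage.s3 are taken from the PR.

```yaml
# my-external-s3.yaml (hypothetical overlay, passed with `helm install -f`)
storage:
  s3:
    endpoint: https://s3.us-west-2.amazonaws.com
    region: us-west-2            # assumed to match the endpoint above
    # Option A: inline keys; the chart renders <release>-s3-credentials.
    accessKeyId: "REPLACE_ME"
    secretAccessKey: "REPLACE_ME"
    # Option B: bring your own Secret and leave the inline keys empty.
    # existingSecret: my-creds
```

Omitting the storage.s3 overrides entirely keeps the default in-cluster MinIO behavior.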

Out of scope (intentionally deferred to keep this PR small and atomic): LakeFS blockstore + Lakekeeper warehouse external-S3 wiring, the minio.enabled switch to drop the in-cluster MinIO entirely, and a values-aws.yaml example overlay. Those are the LakeFS/Lakekeeper half of "make object storage pluggable" and will land as a follow-up.

Any related issues, documentation, discussions?

Closes #5931 (app-tier storage.s3 task).
Part of #5891 — unify AWS (EKS) and on-premise Kubernetes deployment under bin/k8s (parent feature).
Follows #5757 (Helm template reorg) and the design discussion in #5641.

How was this PR tested?

Verified the default install is unchanged and the external path renders correctly:

  1. No-op render proof: helm template texera bin/k8s on this branch vs on main renders the same 102 resources — identical after ignoring comments. The only textual artifact is one empty --- document from the gated-off s3-credentials-secret.yaml (its # license header is emitted by Helm even though the {{- if }} body is empty); it produces no Kubernetes object. helm lint passes.
  2. External S3 render: helm template ... --set storage.s3.endpoint=https://s3.us-west-2.amazonaws.com --set storage.s3.accessKeyId=… --set storage.s3.secretAccessKey=… renders the *-s3-credentials Secret, repoints both deployments' STORAGE_S3_ENDPOINT/credentials at it (keys access-key-id/secret-access-key), and adds STORAGE_S3_REGION.
  3. Bring-your-own Secret: with --set storage.s3.existingSecret=my-creds, the deployments reference my-creds and the chart generates no Secret.
  4. helm lint bin/k8s passes for both the default and the external value sets.
  5. Ran helm install locally on Minikube, then tested creating a dataset and running workflows.
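
As a hedged illustration of item 2, the generated Secret would look roughly like this for a release named texera. The name pattern and the two keys come from the description above; the use of stringData, the Opaque type, and the placeholder values are assumptions, and the chart's actual template may encode the data differently.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: texera-s3-credentials    # {{ .Release.Name }}-s3-credentials
type: Opaque                     # assumed
stringData:                      # assumed; the chart may use base64 `data`
  access-key-id: "REPLACE_ME"
  secret-access-key: "REPLACE_ME"
```

With storage.s3.existingSecret=my-creds (item 3), no such object is rendered and the deployments reference my-creds instead.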

No unit tests were added — the change is limited to Helm chart values/templates, validated by the render diff and helm lint above.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Route the file-service and workflow-computing-unit-manager S3 access
(endpoint + credentials) through a new `storage.s3` values block instead of
hardcoding the in-cluster MinIO Service/Secret. When storage.s3.endpoint is
empty the helpers fall back to the in-cluster MinIO, so the default install is
unchanged (no-op render). Setting storage.s3.endpoint + credentials points
those services at an external S3-compatible store; a `<release>-s3-credentials`
Secret is generated from the inline keys unless storage.s3.existingSecret is
provided.

LakeFS/Lakekeeper storage and the minio.enabled off-switch are intentionally
left for the next step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @mengw15, @xuang7
    You can notify them by mentioning @mengw15, @xuang7 in a comment.

aicam requested a review from bobbai00 June 25, 2026 16:49

bobbai00 left a comment

Contributor

Left a comment for clarification

Comment thread bin/k8s/values.yaml
# S3-compatible endpoint URL -- together with region and credentials -- to
# point the services at an external store (e.g. AWS S3) instead.
storage:
  s3:

Contributor

We currently have two places that use S3:

  • datasetS3: LakeFS uses S3 as its underlying storage
  • executionS3: workflow execution results are stored in S3

Which one is this PR introducing?

It seems both are touched in this PR. But should LakeFS's S3 connection also be updated, since it should use datasetS3?
