Pass KE workload via mounted secret to workers #62129
amoghrajesh wants to merge 6 commits into apache:main
Conversation
jedcunningham left a comment:
We should also add secrets cleanup into cleanup-pods (maybe rename it?). With ownerreferences the window is small, but it does still exist.
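The orphan-detection side of that cleanup job could be sketched as below. This is a hypothetical helper (the function name and signature are assumptions, not code from this PR), relying on the `airflow-workload-{pod_name}` naming that appears later in the diff; a real cleanup CronJob would list secrets by the `airflow-workload-secret=true` label, compare against live pods, and delete the leftovers.

```python
def orphaned_workload_secrets(secret_names, live_pod_names, prefix="airflow-workload-"):
    """Return names of workload secrets whose owning pod no longer exists.

    With ownerReferences in place, k8s garbage collection handles most of
    this; the cleanup job only needs to catch the small remaining window.
    """
    live = set(live_pod_names)
    return [
        name
        for name in secret_names
        if name.startswith(prefix) and name[len(prefix):] not in live
    ]
```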
```python
metadata=client.V1ObjectMeta(
    name=secret_name,
    namespace=self.namespace,
    labels={"airflow-workload-secret": "true"},
```
We should add more labels here, like dag_id, run_id, task_id, map_index. And/or the ti id.
Yeah for cleanup reasons a label with the task UUID at least would be great!
Good call, handled it and added dag_id, task_id, run_id, map_index and even ti_id: b4a137b
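A minimal sketch of what that label set could look like (the helper name is hypothetical; the label keys follow the fields named in the comments above, not the exact keys in commit b4a137b):

```python
def workload_secret_labels(dag_id, task_id, run_id, map_index, ti_id):
    """Identifying labels for a workload secret, so cleanup jobs and
    operators can find it with a label selector.

    Note: Kubernetes label values must be <= 63 chars and match
    [A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?, so values such as run_id
    may need sanitizing in practice.
    """
    return {
        "airflow-workload-secret": "true",
        "dag_id": dag_id,
        "task_id": task_id,
        "run_id": run_id,
        "map_index": str(map_index),
        "ti_id": ti_id,
    }
```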
```python
if isinstance(command[0], ExecuteTask):
    workload = command[0]
    command = workload_to_command_args(workload)
    secret_name = f"{WORKLOAD_SECRET_VOLUME_NAME}-{pod_id}"
```
Using a "volume name" here is a bit odd...
Yes, semantically you are right. I extracted that as a constant, WORKLOAD_SECRET_NAME, and used it instead.
```python
if secret_name:
    if pod.spec.volumes is None:
        pod.spec.volumes = []
    pod.spec.volumes.append(
```
We should do this in construct_pod so that the final pod is sent to pod_mutation_hook.
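The shape of that suggestion, sketched with plain dicts rather than the `kubernetes.client` models the executor actually uses (the helper name and mount path are assumptions for illustration):

```python
def attach_workload_secret(pod: dict, secret_name: str,
                           mount_path: str = "/opt/airflow/workload") -> dict:
    """Wire the workload secret into the pod spec.

    Doing this inside construct_pod means pod_mutation_hook later sees
    the final pod, volume and mount included.
    """
    spec = pod.setdefault("spec", {})
    volumes = spec.setdefault("volumes", [])
    volumes.append({"name": "workload", "secret": {"secretName": secret_name}})
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": "workload", "mountPath": mount_path, "readOnly": True}
        )
    return pod
```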
```python
            raise
        self._delete_workload_secret(f"airflow-workload-{pod_name}", namespace)

    def _delete_workload_secret(self, secret_name: str, namespace: str) -> None:
```
We should instead patch the secret and set ownerReferences to the pod that is using it. k8s will then automatically delete the secret when the pod is deleted.
Umm interesting take, let me try and do that.
That actually made sense. I had to look up ownerRefs and it was totally worth it; handled it in 0cb0b19 by making a patch API call for the secret.
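The patch body for that approach could look roughly like this (a sketch; the helper name is hypothetical and the exact fields in 0cb0b19 may differ):

```python
def owner_reference_patch(pod_name: str, pod_uid: str) -> dict:
    """Patch body that ties the secret's lifetime to its pod.

    Once the pod is deleted, Kubernetes garbage collection removes the
    secret automatically, so no explicit delete call is needed.
    """
    return {
        "metadata": {
            "ownerReferences": [
                {
                    "apiVersion": "v1",
                    "kind": "Pod",
                    "name": pod_name,
                    "uid": pod_uid,
                }
            ]
        }
    }
```

This would be applied with something like `CoreV1Api.patch_namespaced_secret(secret_name, namespace, body)`; note the patch can only happen after the pod is created, since the pod UID does not exist before then.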
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py (outdated)
```python
    workload = command[0]
    command = workload_to_command_args(workload)
    secret_name = f"{WORKLOAD_SECRET_VOLUME_NAME}-{pod_id}"
    self.kube_client.create_namespaced_secret(
```
There needs to be a clear error message, especially in the transitional phase when the provider has been upgraded but the credentials still lack the permissions to create/delete secrets.
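One way to surface that clearly, sketched as a small helper that maps the API status code to an actionable message (this helper is hypothetical, not part of the PR):

```python
def secret_permission_hint(status: int) -> str:
    """Turn an API error status into an actionable message for the
    upgrade/transition phase."""
    if status == 403:
        return (
            "Failed to create workload secret: the service account lacks "
            "RBAC permissions on 'secrets'. If you recently upgraded the "
            "provider, grant create/get/patch on secrets "
            "(e.g. in the pod-launcher role)."
        )
    return f"Failed to create workload secret (HTTP {status})."
```

In the executor this would sit in an `except ApiException` handler around `create_namespaced_secret`, logging the hint before re-raising or falling back.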
Thanks for the PR. I really think this is a good improvement. Nevertheless, it also adds a bit of additional complexity for cases where one or multiple remote clusters are used to distribute workload:

1. When upgrading the provider, existing installs might run into a pitfall and need to upgrade permissions to allow creating/deleting secrets. That is something to consider when upgrading, especially for distributed setups.
2. Remote clusters might not grant additional permissions, and we'd likely get a lot of trouble reports.
3. For people using a distributed K8s setup, if such a "remote K8s admin" is reluctant to grant create/delete secret permissions, would there be a way to force-configure the legacy secret sharing via the pod manifest?
Thanks, I had missed that flow (there are so many of them sometimes :)), added it in e7b3b5f
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/cli/kubernetes_command.py
@jscheffl all valid concerns, I wonder what's the best way to handle it here. In such cases, maybe the best option would be to fall back to the legacy way (using CLI args), perhaps behind a flag or a new configuration option, to keep migration smooth and not break usage? Any thoughts @jedcunningham @jscheffl @potiuk?
Regarding (2) I have no strong opinion. Just weighing the arguments: an automated fallback with a logged warning might be the "nicest", though a security researcher might then complain that such an error starts dropping secrets to the CLI. Maybe a configurable fallback? Regarding (1), in theory it could follow whatever we decide for (2)?
Yeah, I think a configurable fallback to the "cli" way of doing things might be the safest path forward in terms of compat. We should make it clear in warnings that RBAC needs to be updated, though. Hmm, let me wait for others to chime in here too.
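The configurable fallback being discussed could be sketched like this (the function name, flag, and messages are assumptions for illustration, not a decided design):

```python
import logging

log = logging.getLogger(__name__)


def choose_workload_transport(secrets_enabled: bool, can_manage_secrets: bool) -> str:
    """Pick how the workload reaches the worker pod.

    Falls back to the legacy CLI-args path with a loud warning when the
    secret-based transport is configured but RBAC permissions are missing.
    """
    if secrets_enabled and can_manage_secrets:
        return "secret"
    if secrets_enabled:
        log.warning(
            "Workload secrets are enabled but the executor cannot "
            "create/patch secrets; falling back to legacy CLI args. "
            "Update RBAC to use the secret-based transport."
        )
    return "cli-args"
```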
We used to pass the workload to a K8s worker using command line args, which is not a good practice.
Through this PR, I am creating a K8s Secret to pass in the task workload: https://kubernetes.io/docs/concepts/configuration/secret/. The secret contains the ExecuteTask workload JSON and is mounted into the worker pod at a fixed path. The pod reads the workload using --json-path instead of --json-string. The secret's lifecycle is tied to the pod via k8s ownerReferences, so it is automatically garbage collected when the pod is deleted. The cleanup CronJob acts as a fallback for any orphaned secrets.
Sizing implications?
Each Secret will be about 1 KB or less in size, considering the standard fields it will have and the structure we form, making the overhead negligible even at high concurrency.
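A quick way to sanity-check that claim (hypothetical helper; Kubernetes caps a Secret's data at roughly 1 MiB, so a sub-1 KB workload payload is far below the limit):

```python
import json


def workload_secret_size(workload: dict) -> int:
    """Size in bytes of the serialized workload JSON that would be
    stored in the Secret."""
    return len(json.dumps(workload).encode("utf-8"))
```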
Since the scheduler now requires creating a K8s Secret for the worker to mount, the helm chart `pod-launcher` RBAC role has been updated to grant the scheduler permission to create, get, and patch secrets. This is needed to create the workload secret and to set the `ownerReference` on it after the pod is created. This doesn't seem too bad since the scheduler is a trusted component and already had the same verbs for the `pods` resource.

Ran a few examples by deploying the change on K8s and this is what we see now:
Screenshots: the pod `args` and one of the created secrets.

Newsfragments are named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in `airflow-core/newsfragments`.
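The RBAC change described above might look roughly like this (a sketch, not the exact chart diff):

```yaml
# Addition to the pod-launcher Role rules (sketch; actual chart diff may differ)
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "get", "patch"]
```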