Pass KE workload via mounted secret to workers #62129
amoghrajesh wants to merge 6 commits into apache:main
Conversation
jedcunningham left a comment:
We should also add secrets cleanup into cleanup-pods (maybe rename it?). With ownerreferences the window is small, but it does still exist.
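The orphan-detection side of that cleanup job could be sketched as below. This is a hypothetical helper (the function name and signature are assumptions, not code from this PR), relying on the `airflow-workload-{pod_name}` naming that appears later in the diff; a real cleanup CronJob would list secrets by the `airflow-workload-secret=true` label, compare against live pods, and delete the leftovers.

```python
def orphaned_workload_secrets(secret_names, live_pod_names, prefix="airflow-workload-"):
    """Return names of workload secrets whose owning pod no longer exists.

    With ownerReferences in place, k8s garbage collection handles most of
    this; the cleanup job only needs to catch the small remaining window.
    """
    live = set(live_pod_names)
    return [
        name
        for name in secret_names
        if name.startswith(prefix) and name[len(prefix):] not in live
    ]
```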
```python
metadata=client.V1ObjectMeta(
    name=secret_name,
    namespace=self.namespace,
    labels={"airflow-workload-secret": "true"},
```
We should add more labels here, like dag_id, run_id, task_id, map_index. And/or the ti id.
Yeah for cleanup reasons a label with the task UUID at least would be great!
Good call, handled it and added dag_id, task_id, run_id, map_index and even ti_id: b4a137b
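A minimal sketch of what that label set could look like (the helper name is hypothetical; the label keys follow the fields named in the comments above, not the exact keys in commit b4a137b):

```python
def workload_secret_labels(dag_id, task_id, run_id, map_index, ti_id):
    """Identifying labels for a workload secret, so cleanup jobs and
    operators can find it with a label selector.

    Note: Kubernetes label values must be <= 63 chars and match
    [A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?, so values such as run_id
    may need sanitizing in practice.
    """
    return {
        "airflow-workload-secret": "true",
        "dag_id": dag_id,
        "task_id": task_id,
        "run_id": run_id,
        "map_index": str(map_index),
        "ti_id": ti_id,
    }
```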
```python
if isinstance(command[0], ExecuteTask):
    workload = command[0]
    command = workload_to_command_args(workload)
    secret_name = f"{WORKLOAD_SECRET_VOLUME_NAME}-{pod_id}"
```
Using a "volume name" here is a bit odd...
Yes, semantically you are right. I extracted that as a constant, WORKLOAD_SECRET_NAME, and used it instead.
```python
if secret_name:
    if pod.spec.volumes is None:
        pod.spec.volumes = []
    pod.spec.volumes.append(
```
We should do this in construct_pod so that the final pod is sent to pod_mutation_hook.
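The shape of that suggestion, sketched with plain dicts rather than the `kubernetes.client` models the executor actually uses (the helper name and mount path are assumptions for illustration):

```python
def attach_workload_secret(pod: dict, secret_name: str,
                           mount_path: str = "/opt/airflow/workload") -> dict:
    """Wire the workload secret into the pod spec.

    Doing this inside construct_pod means pod_mutation_hook later sees
    the final pod, volume and mount included.
    """
    spec = pod.setdefault("spec", {})
    volumes = spec.setdefault("volumes", [])
    volumes.append({"name": "workload", "secret": {"secretName": secret_name}})
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": "workload", "mountPath": mount_path, "readOnly": True}
        )
    return pod
```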
```python
            raise
        self._delete_workload_secret(f"airflow-workload-{pod_name}", namespace)

    def _delete_workload_secret(self, secret_name: str, namespace: str) -> None:
```
We should instead patch the secret and set ownerReferences to the pod that is using it. k8s will then automatically delete the secret when the pod is deleted.
Umm interesting take, let me try and do that.
That actually made sense. I had to look up ownerRefs and it was totally worth it; handled it in 0cb0b19 by making a patch API call for the secret.
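The patch body for that approach could look roughly like this (a sketch; the helper name is hypothetical and the exact fields in 0cb0b19 may differ):

```python
def owner_reference_patch(pod_name: str, pod_uid: str) -> dict:
    """Patch body that ties the secret's lifetime to its pod.

    Once the pod is deleted, Kubernetes garbage collection removes the
    secret automatically, so no explicit delete call is needed.
    """
    return {
        "metadata": {
            "ownerReferences": [
                {
                    "apiVersion": "v1",
                    "kind": "Pod",
                    "name": pod_name,
                    "uid": pod_uid,
                }
            ]
        }
    }
```

This would be applied with something like `CoreV1Api.patch_namespaced_secret(secret_name, namespace, body)`; note the patch can only happen after the pod is created, since the pod UID does not exist before then.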
...cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py (outdated)
```python
    workload = command[0]
    command = workload_to_command_args(workload)
    secret_name = f"{WORKLOAD_SECRET_VOLUME_NAME}-{pod_id}"
    self.kube_client.create_namespaced_secret(
```
There needs to be a clear error message, especially in the transitional phase when the provider has been upgraded but the credentials still lack the permissions to create/delete secrets.
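One way to surface that clearly, sketched as a small helper that maps the API status code to an actionable message (this helper is hypothetical, not part of the PR):

```python
def secret_permission_hint(status: int) -> str:
    """Turn an API error status into an actionable message for the
    upgrade/transition phase."""
    if status == 403:
        return (
            "Failed to create workload secret: the service account lacks "
            "RBAC permissions on 'secrets'. If you recently upgraded the "
            "provider, grant create/get/patch on secrets "
            "(e.g. in the pod-launcher role)."
        )
    return f"Failed to create workload secret (HTTP {status})."
```

In the executor this would sit in an `except ApiException` handler around `create_namespaced_secret`, logging the hint before re-raising or falling back.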
Thanks for the PR. I really think this is a good improvement. Nevertheless, it also adds a bit of additional complexity for cases where one or multiple remote clusters are used to distribute workload:

1. When upgrading the provider, existing installs might run into a pitfall and need to upgrade permissions to allow creating/deleting secrets. That is something to consider when upgrading, especially for distributed setups.
2. Remote clusters might not grant additional permissions, and we'd likely get a lot of trouble reports.
3. For people using a distributed K8s setup, if such a "remote K8s admin" is reluctant to grant create/delete secret permissions, would there be a way to force-configure the legacy secret sharing via the pod manifest?
Thanks, I had missed that flow (there are so many of them sometimes :)), added it in e7b3b5f
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/cli/kubernetes_command.py
@jscheffl all valid concerns, I wonder what's the best way to handle it here. In such cases, maybe the best option would be to fall back to the legacy way (using CLI args), perhaps behind a flag or a new configuration option, to keep migration smooth and not break usage? Any thoughts @jedcunningham @jscheffl @potiuk?
Regarding (2) I have no strong opinion. Just weighing the arguments: an automated fallback with a logged warning might be the "nicest", though a security researcher might then complain that such an error starts dropping secrets to the CLI. Maybe a configurable fallback? Regarding (1), in theory it could follow whatever we decide for (2)?
Yeah, I think a configurable fallback to the "cli" way of doing things might be the safest path forward in terms of compat. We should make it clear in warnings that RBAC needs to be updated, though. Hmm, let me wait for others to chime in here too.
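The configurable fallback being discussed could be sketched like this (the function name, flag, and messages are assumptions for illustration, not a decided design):

```python
import logging

log = logging.getLogger(__name__)


def choose_workload_transport(secrets_enabled: bool, can_manage_secrets: bool) -> str:
    """Pick how the workload reaches the worker pod.

    Falls back to the legacy CLI-args path with a loud warning when the
    secret-based transport is configured but RBAC permissions are missing.
    """
    if secrets_enabled and can_manage_secrets:
        return "secret"
    if secrets_enabled:
        log.warning(
            "Workload secrets are enabled but the executor cannot "
            "create/patch secrets; falling back to legacy CLI args. "
            "Update RBAC to use the secret-based transport."
        )
    return "cli-args"
```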
We used to pass the workload to a K8s worker using command line args, which is not a good practice.
Through this PR, I am creating a K8s Secret to pass in the task workload: https://kubernetes.io/docs/concepts/configuration/secret/. The secret contains the ExecuteTask workload JSON and is mounted into the worker pod at a fixed path. The pod reads the workload using --json-path instead of --json-string. The secret's lifecycle is tied to the pod via k8s ownerReferences, so it is automatically garbage collected when the pod is deleted. The cleanup CronJob acts as a fallback for any orphaned secrets.
Sizing implications?
Each Secret will be about 1 KB or less in size, considering the standard fields it will have and the structure we form, making the overhead negligible even at high concurrency.
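A quick way to sanity-check that claim (hypothetical helper; Kubernetes caps a Secret's data at roughly 1 MiB, so a sub-1 KB workload payload is far below the limit):

```python
import json


def workload_secret_size(workload: dict) -> int:
    """Size in bytes of the serialized workload JSON that would be
    stored in the Secret."""
    return len(json.dumps(workload).encode("utf-8"))
```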
Since the scheduler now requires creating a K8s Secret for the worker to mount, the helm chart `pod-launcher` RBAC role has been updated to grant the scheduler permission to create, get, and patch secrets. This is needed to create the workload secret and to set the `ownerReference` on it after the pod is created. This doesn't seem too bad since the scheduler is a trusted component and already had the same verbs for the `pods` resource.

Ran a few examples by deploying the change on K8s and this is what we see now:
Screenshots: the pod `args` and one of the created secrets.

Newsfragments are named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in `airflow-core/newsfragments`.
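The RBAC change described above might look roughly like this (a sketch, not the exact chart diff):

```yaml
# Addition to the pod-launcher Role rules (sketch; actual chart diff may differ)
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create", "get", "patch"]
```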