
feat(helm): add api-server rollout restart cronjob#67569

Draft
Subham-KRLX wants to merge 1 commit into
apache:mainfrom
Subham-KRLX:feat/api-server-rollout-restart-cronjob

@Subham-KRLX
Contributor

@Subham-KRLX Subham-KRLX commented May 26, 2026

This PR adds Helm chart support for periodic API server rollout restarts on Kubernetes.

Problem:
Long-running uvicorn processes in the API server can accumulate stale resources. Kubernetes provides kubectl rollout restart as a native mechanism, but the Airflow Helm chart has no built-in support for scheduling this.

Fix:
Adds a new apiServer.rolloutRestart configuration section with a CronJob, ServiceAccount, Role, and RoleBinding following the exact same pattern as the existing databaseCleanup CronJob.

closes: #61432

Was generative AI tooling used to co-author this PR?
  • Yes, Claude (for the PR description)

@Subham-KRLX Subham-KRLX force-pushed the feat/api-server-rollout-restart-cronjob branch from 4e4c5a6 to 7e10d46 Compare May 26, 2026 17:40
@Subham-KRLX Subham-KRLX changed the title Feat/api server rollout restart cronjob feat(helm): add api-server rollout restart cronjob May 26, 2026
Contributor

@jscheffl jscheffl left a comment

We had a similar PR recently. I did not like it.

If a long-running process has a problem, there should be other means of addressing it. For me, a rolling restart is only a workaround and therefore should not become a permanent feature of the Helm chart.

See also https://airflow.apache.org/docs/apache-airflow/stable/faq.html#how-to-prevent-api-server-memory-growth

@Subham-KRLX
Contributor Author

> We had a similar PR recently. I did not like it.
>
> If a long-running process has a problem, there should be other means of addressing it. For me, a rolling restart is only a workaround and therefore should not become a permanent feature of the Helm chart.
>
> See also https://airflow.apache.org/docs/apache-airflow/stable/faq.html#how-to-prevent-api-server-memory-growth

I have marked this PR as a draft for now while I look into how we can better support or document Gunicorn rolling restarts within the Helm chart instead of forcing a full pod rollout. I will update the PR if there is a better native Helm integration we can add.

@Subham-KRLX Subham-KRLX marked this pull request as draft May 26, 2026 17:49
@Miretpl
Contributor

Miretpl commented May 26, 2026

> We had a similar PR recently. I did not like it.
>
> If a long-running process has a problem, there should be other means of addressing it. For me, a rolling restart is only a workaround and therefore should not become a permanent feature of the Helm chart.
>
> See also https://airflow.apache.org/docs/apache-airflow/stable/faq.html#how-to-prevent-api-server-memory-growth

+1. Handling of worker restarts should be implemented in the API server itself if it is needed.

@Subham-KRLX
Contributor Author

> We had a similar PR recently. I did not like it.
>
> If a long-running process has a problem, there should be other means of addressing it. For me, a rolling restart is only a workaround and therefore should not become a permanent feature of the Helm chart.
>
> See also https://airflow.apache.org/docs/apache-airflow/stable/faq.html#how-to-prevent-api-server-memory-growth

> +1. Handling of worker restarts should be implemented in the API server itself if it is needed.

Agreed, a rolling restart is a workaround and should not be a permanent chart feature. This PR adds an opt-in CronJob, disabled by default, so teams can use it as a short-term mitigation.

@Miretpl
Contributor

Miretpl commented May 27, 2026

> Agreed, a rolling restart is a workaround and should not be a permanent chart feature. This PR adds an opt-in CronJob, disabled by default, so teams can use it as a short-term mitigation.

I would be really against adding workaround features to the chart. We have, e.g., PostgreSQL within the chart currently, which was meant only for development purposes, and there are teams which are using it for production. I believe that we should not encourage users to use this particular workaround by implementing it and making it easy to use. If there is a team which will need to do it, they could just create the CronJob definition and apply it to the Kubernetes cluster.
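For illustration, a standalone CronJob of the kind described here could look roughly like the sketch below. It is not the manifest from this PR. The namespace airflow, the Deployment name airflow-api-server, the resource name api-server-rollout-restart, the schedule, and the container image are all hypothetical placeholders that would need to match the actual release. The Role grants only get and patch on that one Deployment, which is what kubectl rollout restart needs.

```yaml
# Sketch only: every name, the schedule, and the image are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-server-rollout-restart
  namespace: airflow
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-server-rollout-restart
  namespace: airflow
rules:
  # `kubectl rollout restart` reads the Deployment and patches its pod template.
  - apiGroups: ["apps"]
    resources: ["deployments"]
    resourceNames: ["airflow-api-server"]  # hypothetical Deployment name
    verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-server-rollout-restart
  namespace: airflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: api-server-rollout-restart
subjects:
  - kind: ServiceAccount
    name: api-server-rollout-restart
    namespace: airflow
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: api-server-rollout-restart
  namespace: airflow
spec:
  schedule: "0 3 * * *"        # assumed: once a day at 03:00
  concurrencyPolicy: Forbid     # never run two restarts at the same time
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          serviceAccountName: api-server-rollout-restart
          restartPolicy: Never
          containers:
            - name: kubectl
              # Placeholder image: substitute any image that ships kubectl.
              image: example.com/tools/kubectl:1.30
              command:
                - kubectl
                - rollout
                - restart
                - deployment/airflow-api-server
                - --namespace=airflow
```

Saved to a file, this could be applied with kubectl apply -f and removed again once the underlying problem is fixed, without touching the chart.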

@jscheffl
Contributor

> Agreed, a rolling restart is a workaround and should not be a permanent chart feature. This PR adds an opt-in CronJob, disabled by default, so teams can use it as a short-term mitigation.

> I would be really against adding workaround features to the chart. We have, e.g., PostgreSQL within the chart currently, which was meant only for development purposes, and there are teams which are using it for production. I believe that we should not encourage users to use this particular workaround by implementing it and making it easy to use. If there is a team which will need to do it, they could just create the CronJob definition and apply it to the Kubernetes cluster.

Then, if somebody really needs it, I would propose adding this as a Kustomize layer example in the repo, with some docs on how to apply it, rather than adding it to the main chart.
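As a rough idea of what such a Kustomize layer might look like, the sketch below layers an extra manifest on top of the rendered chart output. Both file names are hypothetical: airflow-rendered.yaml stands for the output of helm template, and api-server-rollout-restart.yaml stands for a file holding the CronJob and its RBAC objects. The namespace is likewise an assumption.

```yaml
# kustomization.yaml (sketch only; file names and namespace are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: airflow
resources:
  - airflow-rendered.yaml             # hypothetical: output of `helm template`
  - api-server-rollout-restart.yaml   # hypothetical: CronJob plus ServiceAccount, Role, RoleBinding
```

A layer like this would be applied with kubectl apply -k on the directory that contains it, which keeps the workaround outside the chart and makes it easy to drop later.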


Labels

area:helm-chart Airflow Helm Chart

Development

Successfully merging this pull request may close these issues.

Helm chart support for periodic API server rollout restarts on Kubernetes

3 participants