OCPBUGS-91663: Clean up old temp directories in downloads pod#1176
hongkailiu wants to merge 1 commit into
Add cleanup logic to remove old download-* temp directories before creating a new one. This prevents accumulation of stale temp directories from previous pod instances.

Changes:
- Define TEMP_DIR_PREFIX constant for 'download-' prefix
- Add cleanup logic to remove existing download-* directories on startup
- Use prefix parameter in tempfile.mkdtemp() for easier identification

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
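The cleanup described in this PR can be sketched roughly as follows. This is a minimal illustration, not the script from the PR: only TEMP_DIR_PREFIX, the 'download-' prefix, and the use of tempfile.mkdtemp() with a prefix come from the description. The function names cleanup_old_temp_dirs and create_serving_dir, the base_dir parameter, and the exact log wording are assumptions.

```python
import os
import shutil
import tempfile

# Prefix named in the PR description; everything else here is illustrative.
TEMP_DIR_PREFIX = 'download-'


def cleanup_old_temp_dirs(base_dir):
    """Remove leftover download-* directories in base_dir (hypothetical helper)."""
    removed = []
    for name in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, name)
        # Only touch real directories carrying the prefix; skip files and symlinks.
        if (name.startswith(TEMP_DIR_PREFIX)
                and os.path.isdir(path)
                and not os.path.islink(path)):
            print('Removing old directory: {}'.format(path))
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed


def create_serving_dir(base_dir):
    """Clean up stale directories, then create a fresh prefixed temp directory."""
    cleanup_old_temp_dirs(base_dir)
    return tempfile.mkdtemp(prefix=TEMP_DIR_PREFIX, dir=base_dir)


if __name__ == '__main__':
    # A scratch directory stands in for /tmp so the demo is safe to run anywhere.
    with tempfile.TemporaryDirectory() as scratch:
        create_serving_dir(scratch)           # first container start
        current = create_serving_dir(scratch)  # simulates the next container start
        print('Serving from: {}'.format(current))
        print(os.listdir(scratch))
```

Matching on a dedicated prefix keeps the cleanup from touching unrelated entries in the same directory, which is presumably why the description pairs the prefix with the cleanup step.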
Walkthrough
The embedded Python startup script inside
Changes
Downloads Deployment Startup Script: Temp Dir Cleanup
Estimated code review effort: 2 (Simple), ~5 minutes
Pre-merge checks: 13 passed, 2 failed (2 warnings)
/retest-required Maybe flaky tests? |
|
/retest-required |
|
The fix seems to be working. Cluster-bot (logs). Then:
$ downloads_pod="$(oc get pod -n openshift-console -l app=console,component=downloads -o wide -o jsonpath='{.items[0].metadata.name}')"
$ oc -n openshift-console exec $downloads_pod -- kill 1
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
3.0G /tmp/download-7o_q2ztx
3.0G /tmp
$ oc -n openshift-console exec $downloads_pod -- kill 1
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
error: Internal error occurred: unable to upgrade connection: container not found ("download-server")
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
1.9G /tmp/download-bqem1kf_
1.9G /tmp
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
3.1G /tmp/download-bqem1kf_
3.1G /tmp
$ oc -n openshift-console exec $downloads_pod -- kill 1
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
error: Internal error occurred: unable to upgrade connection: container not found ("download-server")
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
error: Internal error occurred: unable to upgrade connection: container not found ("download-server")
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
3.1G /tmp/download-mrmmn4gw
3.1G /tmp
$ oc -n openshift-console logs $downloads_pod
Starting downloads server...
Cleaning up old temp directories in /tmp...
Removing old directory: /tmp/download-bqem1kf_
Serving from: /tmp/download-mrmmn4gw
Creating arch directories...
Creating license symlink...
Creating oc binary symlinks (archives will be created asynchronously)...
... |
|
@hongkailiu: This pull request references Jira Issue OCPBUGS-91663, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@hongkailiu: This pull request references Jira Issue OCPBUGS-91663, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
|
/retest-required |
|
/test e2e-aws-console |
|
/test e2e-aws-console |
|
@hongkailiu: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/payload-job periodic-ci-openshift-release-master-nightly-5.0-e2e-aws-ovn-upgrade-fips |
|
@jhadvig: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hongkailiu, jhadvig
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips |
|
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e6c311d0-707e-11f1-9bb8-7c12c4bd56bc-0 |
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-no-nat-instance |
|
@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9f990840-707f-11f1-9314-d0d109c503cc-0 |
Follow-up to openshift/console-operator#1176. The Go implementation stores the files in a fixed directory, `defaultArtifactsDir`, and cleans that directory up [1], so it should not accumulate files across startups. The existing unit test already ensures that the same `defaultArtifactsDir` is used [2]; this pull request adds logic to ensure that files left by the previous container are removed in the new container after startup.
[1] https://github.com/openshift/console/blob/71f7e03bcfe04f9842f24a8f628660f8f2d4b892/cmd/downloads/config/downloads_config.go#L119-L122
[2] https://github.com/openshift/console/blob/28f9ca40e41a5b221bcd562194fcc1ff930cc6f4/cmd/downloads/config/downloads_config_test.go#L266-L268
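For comparison, the fixed-directory idea attributed to the Go implementation can be illustrated in Python, the language of the operator's startup script. This is an analog under assumptions, not a translation of the Go code: reset_artifacts_dir is a hypothetical name, the 'artifacts' path is a stand-in for `defaultArtifactsDir`, and the real cleanup may differ in detail.

```python
import os
import shutil
import tempfile


def reset_artifacts_dir(artifacts_dir):
    """Remove a fixed directory if present and recreate it empty (hypothetical helper)."""
    shutil.rmtree(artifacts_dir, ignore_errors=True)
    os.makedirs(artifacts_dir)
    return artifacts_dir


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as scratch:
        # Stand-in for defaultArtifactsDir; the real path is not assumed here.
        artifacts = os.path.join(scratch, 'artifacts')
        reset_artifacts_dir(artifacts)
        # A file left behind by the "previous container".
        with open(os.path.join(artifacts, 'stale.tar'), 'w') as f:
            f.write('old')
        reset_artifacts_dir(artifacts)  # the "new container" starts
        print(os.listdir(artifacts))
```

Because the directory name never changes, every start overwrites the same location, so there is nothing to accumulate; the random suffix from tempfile.mkdtemp() is what makes an explicit prefix-based cleanup necessary in the Python script.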
Add cleanup logic to remove old download-* temp directories before creating a new one. This prevents accumulation of stale temp directories from previous pod instances.
Changes:
Analysis / Root cause:
Killing the main process of the downloads pod with
oc -n openshift-console exec <downloads-pod> -- kill 1
simulates a container crash. As the Kubernetes documentation states (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir): "A container crashing does not remove a Pod from a node. The data in an emptyDir volume is safe across container crashes." The data from the previous container is therefore preserved, and repeating this action in a loop could eventually cause an out-of-disk condition.
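The accumulation can be reproduced locally with a small simulation. This is a sketch under assumptions: a scratch directory stands in for the volume that survives container restarts, start_container is a hypothetical stand-in for the startup script, and each "start" only creates an empty directory rather than downloading anything.

```python
import os
import shutil
import tempfile

TEMP_DIR_PREFIX = 'download-'


def start_container(volume, cleanup):
    """Hypothetical stand-in for one container start on a persistent volume."""
    if cleanup:
        for name in os.listdir(volume):
            path = os.path.join(volume, name)
            if name.startswith(TEMP_DIR_PREFIX) and os.path.isdir(path):
                shutil.rmtree(path)
    return tempfile.mkdtemp(prefix=TEMP_DIR_PREFIX, dir=volume)


def count_after_restarts(restarts, cleanup):
    """Return how many download-* directories exist after the given number of starts."""
    with tempfile.TemporaryDirectory() as volume:
        for _ in range(restarts):
            start_container(volume, cleanup)
        return len([n for n in os.listdir(volume) if n.startswith(TEMP_DIR_PREFIX)])


if __name__ == '__main__':
    print('without cleanup:', count_after_restarts(3, cleanup=False))
    print('with cleanup:   ', count_after_restarts(3, cleanup=True))
```

Without cleanup, the number of directories grows by one with every restart; with cleanup, it stays at one no matter how many restarts occur.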
Solution description:
See the PR description above.
Test setup:
Test cases:
I will verify manually with the steps in the associated bug.
Browser conformance:
Additional info:
Reviewers and assignees: