Skip to content

OCPBUGS-91663: Clean up old temp directories in downloads pod#1176

Open
hongkailiu wants to merge 1 commit into
openshift:mainfrom
hongkailiu:OCPBUGS-91663
Open

OCPBUGS-91663: Clean up old temp directories in downloads pod#1176
hongkailiu wants to merge 1 commit into
openshift:mainfrom
hongkailiu:OCPBUGS-91663

Conversation

@hongkailiu

@hongkailiu hongkailiu commented Jun 23, 2026

Copy link
Copy Markdown
Member

Add cleanup logic to remove old download-* temp directories before creating a new one. This prevents accumulation of stale temp directories from previous pod instances.

Changes:

  • Define TEMP_DIR_PREFIX constant for 'download-' prefix
  • Add cleanup logic to remove existing download-* directories on startup
  • Use prefix parameter in tempfile.mkdtemp() for easier identification

Analysis / Root cause:

When the pod is killed with oc -n openshift-console exec <downloads-pod> -- kill 1, it is like simulating A container crashing does not remove a Pod from a node. The data in an emptyDir volume is safe across container crashes. as in https://kubernetes.io/docs/concepts/storage/volumes/#emptydir.

The data from the previous container are reserved and thus could cause out-of-disk issue after a loop of such actions.

Solution description:

See the PR description above.

Test setup:

Test cases:

I will verify manually with the steps in the associated bug.

Browser conformance:

  • Chrome
  • Firefox
  • Safari (or Epiphany on Linux)

Additional info:

Reviewers and assignees:

Summary by CodeRabbit

  • Bug Fixes
    • Downloads service now includes automatic cleanup of temporary directories from previous operations, improving resource management and preventing disk space accumulation.

Add cleanup logic to remove old download-* temp directories before
creating a new one. This prevents accumulation of stale temp directories
from previous pod instances.

Changes:
- Define TEMP_DIR_PREFIX constant for 'download-' prefix
- Add cleanup logic to remove existing download-* directories on startup
- Use prefix parameter in tempfile.mkdtemp() for easier identification

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Walkthrough

The embedded Python startup script inside downloads-deployment.yaml is updated to define a TEMP_DIR_PREFIX constant (download-), scan the system temp directory for stale directories matching that prefix, remove them via shutil.rmtree (logging warnings on failure), and create the new working directory using that prefix.

Changes

Downloads Deployment Startup Script — Temp Dir Cleanup

Layer / File(s) Summary
Prefixed temp dir cleanup on startup
bindata/assets/deployments/downloads-deployment.yaml
Import line is updated to include shutil. TEMP_DIR_PREFIX = 'download-' is introduced. On startup, any existing download-* directories in the system temp dir are removed via shutil.rmtree with per-item warning logs on exceptions. The working directory creation is switched from unprefixed tempfile.mkdtemp() to tempfile.mkdtemp(prefix=TEMP_DIR_PREFIX).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description includes Analysis/Root cause, Solution description, and Test cases sections. However, it lacks the required Jira issue prefix in the title and leaves Test setup and Browser conformance sections incomplete. Add Jira issue prefix (OCPBUGS-91663) to the PR title and complete the Test setup and Browser conformance sections of the description template.
Microshift Test Compatibility ⚠️ Warning PR adds 26 new Ginkgo e2e test files. Tests use config.openshift.io/v1 APIs (proxy, infrastructure, ingress, featuregate, etc.) which are NOT available on MicroShift. No [Skipped:MicroShift], [apig... Add [apigroup:config.openshift.io] tags to tests using config APIs, or guard with IsMicroShiftCluster() checks with g.Skip(). Tests: custom_url_test.go, proxy_test.go, brand_customization_test.go, and affected framework components.
✅ Passed checks (13 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR only modifies downloads-deployment.yaml (Python script in YAML), not any Ginkgo test files. No test code changes present in this PR.
Test Structure And Quality ✅ Passed The PR modifies only a deployment YAML file with embedded Python script, not Ginkgo test code. The custom check for test structure and quality is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR does not add any Ginkgo e2e tests; it only modifies the downloads deployment manifest to add temp directory cleanup logic. SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only the embedded Python startup script for temp directory cleanup; no Kubernetes scheduling constraints (affinity, replicas, nodeSelector, tolerations, PDB, topology spread) are introd...
Ote Binary Stdout Contract ✅ Passed The PR modifies a Kubernetes deployment YAML for a downloads pod (not an OTE test binary). The Python script's print statements are pod startup logs, not test output that communicates with openshif...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. All e2e tests use standard Go testing.T package, not Ginkgo. The custom check is not applicable.
No-Weak-Crypto ✅ Passed PR modifies only temp directory cleanup logic with no weak crypto (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or insecure comparisons.
Container-Privileges ✅ Passed The manifest enforces restrictive security: runAsNonRoot, allowPrivilegeEscalation=false, all capabilities dropped, read-only root filesystem. No privileged, hostPID/Network/IPC, or SYS_ADMIN setti...
No-Sensitive-Data-In-Logs ✅ Passed The logging statements in the cleanup logic only log filesystem paths and generic OS exceptions (e.g., permission denied), not passwords, tokens, keys, PII, session IDs, or sensitive data.
Title check ✅ Passed The title clearly and specifically describes the main change: adding cleanup logic for old temp directories in the downloads pod, matching the PR's primary objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested review from TheRealJon and jhadvig June 23, 2026 19:14
@hongkailiu

Copy link
Copy Markdown
Member Author

/retest-required

Maybe flaky tests?

@hongkailiu

Copy link
Copy Markdown
Member Author

/retest-required

@hongkailiu

Copy link
Copy Markdown
Member Author

The fix seems working.

Cluster-bot (logs)

launch 5.0.0-0.nightly,console-operator#1176 aws

Then

$ downloads_pod="$(oc get pod -n openshift-console -l app=console,component=downloads -o wide -o jsonpath='{.items[0].metadata.name}')"

$ oc -n openshift-console exec $downloads_pod -- kill 1
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
3.0G	/tmp/download-7o_q2ztx
3.0G	/tmp
$ oc -n openshift-console exec $downloads_pod -- kill 1
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
error: Internal error occurred: unable to upgrade connection: container not found ("download-server")
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
1.9G	/tmp/download-bqem1kf_
1.9G	/tmp
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
3.1G	/tmp/download-bqem1kf_
3.1G	/tmp
$ oc -n openshift-console exec $downloads_pod -- kill 1
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
error: Internal error occurred: unable to upgrade connection: container not found ("download-server")
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
error: Internal error occurred: unable to upgrade connection: container not found ("download-server")
$ oc -n openshift-console exec $downloads_pod -- du -h -d 1 /tmp
3.1G	/tmp/download-mrmmn4gw
3.1G	/tmp
$ oc  -n openshift-console logs $downloads_pod
Starting downloads server...
Cleaning up old temp directories in /tmp...
  Removing old directory: /tmp/download-bqem1kf_
Serving from: /tmp/download-mrmmn4gw
Creating arch directories...
Creating license symlink...
Creating oc binary symlinks (archives will be created asynchronously)...
...

@hongkailiu hongkailiu changed the title Clean up old temp directories in downloads pod OCPBUGS-91663: Clean up old temp directories in downloads pod Jun 23, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 23, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@hongkailiu: This pull request references Jira Issue OCPBUGS-91663, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Add cleanup logic to remove old download-* temp directories before creating a new one. This prevents accumulation of stale temp directories from previous pod instances.

Changes:

  • Define TEMP_DIR_PREFIX constant for 'download-' prefix
  • Add cleanup logic to remove existing download-* directories on startup
  • Use prefix parameter in tempfile.mkdtemp() for easier identification

Analysis / Root cause:

When the pod is killed with oc -n openshift-console exec <downloads-pod> -- kill 1, it is like simulating A container crashing does not remove a Pod from a node. The data in an emptyDir volume is safe across container crashes. as in https://kubernetes.io/docs/concepts/storage/volumes/#emptydir.

The data from the previous container are reserved and thus could cause out-of-disk issue after a loop of such actions.

Solution description:

See the PR description above.

Test setup:

Test cases:

I will verify manually with the steps in the associated bug.

Browser conformance:

  • Chrome
  • Firefox
  • Safari (or Epiphany on Linux)

Additional info:

Reviewers and assignees:

Summary by CodeRabbit

  • Bug Fixes
  • Downloads service now includes automatic cleanup of temporary directories from previous operations, improving resource management and preventing disk space accumulation.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@hongkailiu

Copy link
Copy Markdown
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Jun 23, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@hongkailiu: This pull request references Jira Issue OCPBUGS-91663, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jun 23, 2026
@hongkailiu

Copy link
Copy Markdown
Member Author

/retest-required

@hongkailiu

Copy link
Copy Markdown
Member Author

/test e2e-aws-console

1 similar comment
@hongkailiu

Copy link
Copy Markdown
Member Author

/test e2e-aws-console

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@hongkailiu: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jhadvig

jhadvig commented Jun 25, 2026

Copy link
Copy Markdown
Member

/payload-job periodic-ci-openshift-release-master-nightly-5.0-e2e-aws-ovn-upgrade-fips
/payload-job periodic-ci-openshift-release-master-nightly-5.0-e2e-aws-ovn-upgrade-fips-nat-instance

@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@jhadvig: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@jhadvig

jhadvig commented Jun 25, 2026

Copy link
Copy Markdown
Member

/lgtm
/approve

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 25, 2026
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, jhadvig

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2026
@hongkailiu

Copy link
Copy Markdown
Member Author

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips

@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e6c311d0-707e-11f1-9bb8-7c12c4bd56bc-0

@hongkailiu

Copy link
Copy Markdown
Member Author
$ rg "5.0.*upgrade-fips-.+-instance"
ci-operator/jobs/openshift/release/openshift-release-main-periodics.yaml
273293:  name: periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-no-nat-instance

core-services/release-controller/_releases/release-ocp-5.0.json
187:        "name": "periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-no-nat-instance"

core-services/release-controller/_releases/priv/release-ocp-5.0.json
223:                "name": "periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-no-nat-instance-priv"

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-no-nat-instance

@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@hongkailiu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-upgrade-fips-no-nat-instance

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9f990840-707f-11f1-9314-d0d109c503cc-0

hongkailiu added a commit to hongkailiu/console that referenced this pull request Jun 25, 2026
Follow up openshift/console-operator#1176

The GoLang implementation uses a fixed folder name `defaultArtifactsDir` to store the files and cleans the folder up [1]. So it should not suffer the issue of accumulating the files over startups.

The current unit test ensures already the same `defaultArtifactsDir` is used [2], and we add the logic in the pull to ensure that the file from previous container is removed in the new container after startup.

[1]. https://github.com/openshift/console/blob/71f7e03bcfe04f9842f24a8f628660f8f2d4b892/cmd/downloads/config/downloads_config.go#L119-L122

[2]. https://github.com/openshift/console/blob/28f9ca40e41a5b221bcd562194fcc1ff930cc6f4/cmd/downloads/config/downloads_config_test.go#L266-L268
hongkailiu added a commit to hongkailiu/console that referenced this pull request Jun 25, 2026
Follow up openshift/console-operator#1176

The GoLang implementation uses a fixed folder name `defaultArtifactsDir` to store the files and cleans the folder up [1]. So it should not suffer the issue of accumulating the files over startups.

The current unit test ensures already the same `defaultArtifactsDir` is used [2], and we add the logic in the pull to ensure that the file from previous container is removed in the new container after startup.

[1]. https://github.com/openshift/console/blob/71f7e03bcfe04f9842f24a8f628660f8f2d4b892/cmd/downloads/config/downloads_config.go#L119-L122

[2]. https://github.com/openshift/console/blob/28f9ca40e41a5b221bcd562194fcc1ff930cc6f4/cmd/downloads/config/downloads_config_test.go#L266-L268
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants