HDDS-13108. Refactor StorageVolume to use SlidingWindow #8843

Open
ptlrs wants to merge 16 commits into apache:master from
ptlrs:HDDS-13108-Migrate-failed-volume-checks-to-one-sliding-window

Conversation

@ptlrs
Contributor

@ptlrs ptlrs commented Jul 22, 2025

Please describe your PR in detail:

This PR uses the new sliding window implementation.
It migrates all existing failed-volume detection checks to the new time-based sliding window utility.
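For readers unfamiliar with the idea, a time-based sliding window for failure detection could look roughly like the sketch below. This is an illustration only, not the SlidingWindow class from the Ozone codebase: the class name `FailureWindow`, its constructor arguments, and its method names are all assumptions made for this example.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a time-based sliding window.
// Not Ozone's SlidingWindow; all names here are assumptions.
final class FailureWindow {
  private final int tolerated;        // failures tolerated inside the window
  private final long windowMillis;    // how long a failure is remembered
  private final Deque<Long> failures = new ArrayDeque<>();

  FailureWindow(int tolerated, Duration window) {
    this.tolerated = tolerated;
    this.windowMillis = window.toMillis();
  }

  // Record a failed check observed at the given timestamp (milliseconds).
  synchronized void add(long nowMillis) {
    expire(nowMillis);
    failures.addLast(nowMillis);
  }

  // True once more than `tolerated` failures fall inside the window.
  synchronized boolean isExceeded(long nowMillis) {
    expire(nowMillis);
    return failures.size() > tolerated;
  }

  // Drop failures that are at least one full window old.
  private void expire(long nowMillis) {
    while (!failures.isEmpty()
        && nowMillis - failures.peekFirst() >= windowMillis) {
      failures.removeFirst();
    }
  }
}
```

In this sketch, timestamps are passed in explicitly so the behavior is easy to test; a real implementation would more likely read a monotonic clock internally.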

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13108

How was this patch tested?

CI: https://github.com/ptlrs/ozone/actions/runs/16436635030

@ptlrs ptlrs marked this pull request as draft July 22, 2025 06:38
@ptlrs
Contributor Author

ptlrs commented Jul 22, 2025

Hi @errose28 @Tejaskriya @adoroszlai can you please review this PR?

@errose28 errose28 added the scanners label (Changes related to datanode container and volume scanners) Jul 22, 2025
@Tejaskriya Tejaskriya self-requested a review July 23, 2025 08:31
@errose28 errose28 self-requested a review July 23, 2025 18:46
Contributor

@Tejaskriya Tejaskriya left a comment

Thanks for working on this, @ptlrs. Please find a suggestion below.

@ptlrs
Contributor Author

ptlrs commented Jul 31, 2025

Thanks for the review @Tejaskriya. I have added the configuration.

@ptlrs ptlrs requested a review from Tejaskriya July 31, 2025 06:38
Contributor

@Tejaskriya Tejaskriya left a comment

Looks good to me. @errose28, could you please take a look?

@github-actions

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Nov 11, 2025
@github-actions

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

@github-actions github-actions bot closed this Nov 25, 2025
@ptlrs
Contributor Author

ptlrs commented Feb 9, 2026

Hi @errose28, could you please reopen this PR?

@errose28 errose28 reopened this Feb 9, 2026
@github-actions github-actions bot removed the stale label Feb 10, 2026
…e-failed-volume-checks-to-one-sliding-window

# Conflicts:
#	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
#	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolume.java
@ptlrs
Contributor Author

ptlrs commented Feb 12, 2026

Hi @errose28, the conflicts have been resolved for this PR. Could you please take a look?

@github-actions

github-actions bot commented Mar 6, 2026

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Mar 6, 2026
Contributor

@adoroszlai adoroszlai left a comment

Thanks @ptlrs for the patch.

@adoroszlai adoroszlai removed the stale label Mar 9, 2026
@adoroszlai adoroszlai changed the title from "HDDS-13108. Migrate failed volume checks to one sliding window" to "HDDS-13108. Refactor StorageVolume to use SlidingWindow" Mar 9, 2026
ptlrs and others added 2 commits March 25, 2026 08:30
Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
@ptlrs ptlrs requested review from ChenSammi and adoroszlai March 27, 2026 18:24
@ptlrs
Contributor Author

ptlrs commented Mar 27, 2026

Hi @errose28 @ChenSammi @adoroszlai, I have updated this PR. Could you please take another look?

Contributor

@adoroszlai adoroszlai left a comment

Thanks @ptlrs for updating the patch, LGTM.

)
private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;

@Config(key = "hdds.datanode.disk.check.sliding.window.timeout",
Contributor

With this sliding window introduced, the function of the "hdds.datanode.disk.check.io.test.count" property is half removed. We should consider deprecating "hdds.datanode.disk.check.io.test.count" and introducing a new boolean property with a name like "hdds.datanode.disk.check.io.test.enabled".

Contributor Author

Done

Contributor

To avoid breaking existing users, it is recommended to add a new property, "hdds.datanode.disk.check.io.test.enabled", instead of renaming the current property "hdds.datanode.disk.check.io.test.count" to "hdds.datanode.disk.check.io.test.enabled".

Contributor Author

Ok, I reverted the change which removed the config and updated the deprecated config list.
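One way the backward-compatible approach discussed in this thread could be resolved at read time is sketched below. This is not the code in the PR: `java.util.Properties` stands in for Ozone's configuration classes, the `IoTestConfig` class and `isIoTestEnabled` helper are hypothetical, and both the rule that a legacy count of 0 means "disabled" and the default of `true` are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical helper; not part of the Ozone codebase.
final class IoTestConfig {
  static final String ENABLED_KEY =
      "hdds.datanode.disk.check.io.test.enabled";
  static final String LEGACY_COUNT_KEY =
      "hdds.datanode.disk.check.io.test.count";

  private IoTestConfig() { }

  static boolean isIoTestEnabled(Properties conf) {
    // The new boolean property wins whenever it is set explicitly.
    String enabled = conf.getProperty(ENABLED_KEY);
    if (enabled != null) {
      return Boolean.parseBoolean(enabled.trim());
    }
    // Otherwise fall back to the deprecated count so existing configs keep working.
    String legacyCount = conf.getProperty(LEGACY_COUNT_KEY);
    if (legacyCount != null) {
      // Assumption: a legacy count of 0 meant "I/O tests disabled".
      return Integer.parseInt(legacyCount.trim()) > 0;
    }
    // Assumed default when neither property is set.
    return true;
  }
}
```

Keeping the deprecated key readable means a cluster that previously disabled the test through the old count would not silently re-enable it after an upgrade.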

@ptlrs ptlrs requested a review from ChenSammi March 31, 2026 03:13

Labels

scanners Changes related to datanode container and volume scanners

5 participants