HDDS-13108. Refactor StorageVolume to use SlidingWindow #8843
ptlrs wants to merge 16 commits into apache:master
… scanner failures
…e-failed-volume-checks-to-one-sliding-window
…g window mechanism
Hi @errose28 @Tejaskriya @adoroszlai, can you please review this PR?
Tejaskriya left a comment:
Thanks for working on this, @ptlrs. Please find a suggestion below.
...ner-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/StorageVolume.java
Thanks for the review @Tejaskriya. I have added the configuration.
Tejaskriya left a comment:
Looks good to me. @errose28, could you please take a look?
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.
Hi @errose28, could you please reopen this PR?
…e-failed-volume-checks-to-one-sliding-window
# Conflicts:
#   hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
#   hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolume.java
Hi @errose28, the conflicts have been resolved for this PR. Could you please take a look?
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.
…failed-volume-checks-to-one-sliding-window
adoroszlai left a comment:
Thanks @ptlrs for the patch.
...tainer-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolume.java
...test/java/org/apache/hadoop/ozone/container/common/volume/TestStorageVolumeHealthChecks.java
...c/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
...ner-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/StorageVolume.java
...ner-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/StorageVolume.java
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/SlidingWindow.java
Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
Hi @errose28 @ChenSammi @adoroszlai, I have updated this PR. Could you please take another look?
adoroszlai left a comment:
Thanks @ptlrs for updating the patch, LGTM.
...test/java/org/apache/hadoop/ozone/container/common/volume/TestStorageVolumeHealthChecks.java
...c/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
    )
    private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;

    @Config(key = "hdds.datanode.disk.check.sliding.window.timeout",
With the sliding window introduced, the function of the "hdds.datanode.disk.check.io.test.count" property is half removed. We should consider deprecating "hdds.datanode.disk.check.io.test.count" and introducing a new boolean property with a name like "hdds.datanode.disk.check.io.test.enabled".
To avoid breaking existing users, it's recommended to add the new property "hdds.datanode.disk.check.io.test.enabled" rather than rename the current property "hdds.datanode.disk.check.io.test.count" to "hdds.datanode.disk.check.io.test.enabled".
OK, I reverted the change that removed the config and updated the deprecated config list.
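For readers following the compatibility discussion above, here is a rough sketch of how a new boolean property could coexist with the deprecated count property. It is only an illustration of the reviewer's suggestion, not what this PR implements: the class `IoTestConfigSketch`, the method `isIoTestEnabled`, the plain `Map` standing in for the real configuration object, and the rule "a count of 0 or less means disabled" are all assumptions made for this example.

```java
import java.util.Map;

/** Hypothetical helper; not part of DatanodeConfiguration or this PR. */
final class IoTestConfigSketch {
  // Property names as quoted in the review thread.
  static final String ENABLED_KEY = "hdds.datanode.disk.check.io.test.enabled";
  static final String DEPRECATED_COUNT_KEY = "hdds.datanode.disk.check.io.test.count";

  private IoTestConfigSketch() {
  }

  /**
   * Resolves whether disk IO tests should run.
   * The new boolean key wins when present; otherwise the deprecated count key
   * is interpreted (assumption: a count of 0 or less disables the tests).
   */
  static boolean isIoTestEnabled(Map<String, String> conf) {
    String enabled = conf.get(ENABLED_KEY);
    if (enabled != null) {
      return Boolean.parseBoolean(enabled.trim());
    }
    String count = conf.get(DEPRECATED_COUNT_KEY);
    if (count != null) {
      return Integer.parseInt(count.trim()) > 0;
    }
    // Assumed default for this sketch when neither key is set.
    return true;
  }
}
```

With this kind of fallback, a deployment that only sets the old count key keeps its current behaviour, while new deployments can use the boolean key directly.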
…ust configurations for volume health checks
Please describe your PR in detail:
This PR uses the new sliding window implementation.
It migrates all existing failed-volume detection checks to the new time-based sliding window utility.
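To make the idea of a time-based sliding window concrete, here is a minimal sketch. It is not the `SlidingWindow` class from this PR: the class name `FailureWindowSketch`, its methods, and the choice to pass timestamps explicitly (instead of reading a clock) are assumptions made for illustration. The window remembers failure timestamps, forgets those older than the window length, and reports whether more failures than tolerated remain.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a time-based failure window; not the PR's SlidingWindow. */
final class FailureWindowSketch {
  private final int tolerated;      // failures tolerated inside the window
  private final long windowMillis;  // length of the window in milliseconds
  private final Deque<Long> failureTimes = new ArrayDeque<>();

  FailureWindowSketch(int tolerated, long windowMillis) {
    if (tolerated < 0 || windowMillis <= 0) {
      throw new IllegalArgumentException("tolerated must be >= 0 and windowMillis > 0");
    }
    this.tolerated = tolerated;
    this.windowMillis = windowMillis;
  }

  /** Records one failure observed at the given time. */
  synchronized void recordFailure(long nowMillis) {
    evictExpired(nowMillis);
    failureTimes.addLast(nowMillis);
  }

  /** Returns true when more than the tolerated number of failures are still in the window. */
  synchronized boolean isExceeded(long nowMillis) {
    evictExpired(nowMillis);
    return failureTimes.size() > tolerated;
  }

  /** Drops failures that are at least windowMillis old. */
  private void evictExpired(long nowMillis) {
    while (!failureTimes.isEmpty()
        && nowMillis - failureTimes.peekFirst() >= windowMillis) {
      failureTimes.removeFirst();
    }
  }
}
```

A caller would typically pass a monotonic clock reading for `nowMillis`; taking the time as a parameter keeps the sketch easy to test without sleeping.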
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-13108
How was this patch tested?
CI: https://github.com/ptlrs/ozone/actions/runs/16436635030