HDDS-15208. OM should learn to finalize from SCM#10236
Conversation
errose28
left a comment
There was a problem hiding this comment.
Thanks for adding this. Overall looks good, just left some minor comments inline.
errose28
left a comment
There was a problem hiding this comment.
Thanks for the updates. My understanding is that this is being done in two PRs, where this one adds the OM server side handling of finalization and a follow-up PR will add the client command that starts the background service and writes the finalizing key. Can you update the PR description?
| appender.console.name = STDOUT | ||
| appender.console.layout.type = PatternLayout | ||
| appender.console.layout.pattern = %d{DEFAULT} | %-5level | %c{1} | %msg | %throwable{3} %n | ||
| appender.console.layout.pattern = %-5level | %c{1} | %msg%n |
There was a problem hiding this comment.
Why was the pattern for both audit loggers changed? Can we use the original pattern for both?
There was a problem hiding this comment.
This is removing the exception and timestamp from being logged to the console. IMO we should restore this to what it was before since it is easer to trace log messages in a test env from the console than opening dedicated files. additivity was also set to false for the audit logs which means they won't go to the console anymore. I think we should remove that so all log messages are searchable in one place.
Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/DatanodeStorage.java
errose28
left a comment
There was a problem hiding this comment.
Just two comments left. Looks like there's some new test failures to address as well.
| appender.console.name = STDOUT | ||
| appender.console.layout.type = PatternLayout | ||
| appender.console.layout.pattern = %d{DEFAULT} | %-5level | %c{1} | %msg | %throwable{3} %n | ||
| appender.console.layout.pattern = %-5level | %c{1} | %msg%n |
There was a problem hiding this comment.
This is removing the exception and timestamp from being logged to the console. IMO we should restore this to what it was before since it is easer to trace log messages in a test env from the console than opening dedicated files. additivity was also set to false for the audit logs which means they won't go to the console anymore. I think we should remove that so all log messages are searchable in one place.
There was a problem hiding this comment.
Since request processing is mocked in this test, we should have a unit test for OMFinalizeUpgradeRequest/Response classes as well, similar to other OM requests. Looks like we didn't have those previously but they would be good to add.
What changes were proposed in this pull request?
When OM is started, it checks if finalization is needed. If so, it spawns a background service which polls SCM to see if it is time to finalize or not. If SCM indicates it should finalize, then it will finalize the OMs via Ratis and then shutdown the background service to avoid any further polling.
The poll interval defaults to 60 seconds, but its configurable via ozone.om.upgrade.finalization.check.interval.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15208
How was this patch tested?
Existing integration tests were modified to remove the calls to the old OM finalize command, and instead trigger finalize via SCM. New unit tests have been added for the new polling service.