HDDS-14949 — Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack#10030
Draft
yandrey321 wants to merge 13 commits into apache:master
adoroszlai (Contributor) reviewed on Apr 2, 2026
Thanks @yandrey321 for working on this.
- The patch is way too large. In addition to migration to Ratis-shaded Netty/etc., it includes other build changes (introduction of new modules, Mac-specific workarounds, etc.), which we may do separately.
- Build is failing in your fork: https://github.com/yandrey321/ozone/actions/runs/23906863482
Contributor
I don't think .class file should be committed.
Author
my bad, removed it from version control
… grpc-client modules
The pre-verify-refresh-classes-from-jar antrun execution (bound to
pre-integration-test) did an unconditional rm -rf target/classes/ followed
by unzip from the module's local JAR. On CI with mvn clean verify (not
install), the JAR is present in target/, but on pom-packaging modules or in
any edge case where the JAR is absent, the directory was left empty, causing
downstream reactor modules to fail with "class file for
ContainerProtos$ContainerCommandRequestProto not found" when they tried to
compile against the now-empty target/classes/.
Two-part fix:
1. Root pom.xml: replace the unconditional rm+unzip pair with a single
/bin/sh one-liner that only runs if the JAR file actually exists, so
target/classes/ is left intact when no JAR was produced.
2. hdds-datanode-grpc-client and hdds-scm-grpc-client: override
pre-verify-refresh-classes-from-jar with phase=none, because both modules
set mdep.analyze.skip=true (no dependency analysis runs), so wiping
target/classes/ at pre-integration-test serves no purpose and only risks
leaving the directory empty for downstream compilation.
Made-with: Cursor
… of .m2/
The pre-compile-delete-classes and pre-test-compile-refresh-classes antrun
executions were restoring hdds-datanode-grpc-client/target/classes/ by
unzipping from the .m2/ repository JAR. On a CI clean-build (mvn clean
verify) no Ozone artifacts are pre-installed to .m2/, so those unzip
operations silently failed, leaving no safety net if anything clears the
directory after compilation.
Switch the source paths from ${settings.localRepository}/org/apache/ozone/...
to ${maven.multiModuleProjectDirectory}/hadoop-hdds/.../target/X.jar.
These local JARs are created during each foundation module's own package
phase (which runs before any downstream module's process-resources), so
they are always present on CI. The unzip still silently no-ops when the
JAR is not yet built (e.g. the foundation module itself hasn't been packaged
yet in the current reactor pass). This makes the class-file restoration
reliable on both CI and local machines without requiring a prior mvn install.
Made-with: Cursor
…repo root
ContainerProtos.class was extracted into the project root (org/apache/...)
by an unzip operation running in the wrong directory during a debugging
session and was accidentally included in commit c075c90. Generated bytecode
has no place in version control.
Also add *.class to .gitignore so compiled class files at the project root
can never be staged again.
Made-with: Cursor
Three dependencies in hdds-server-scm were incorrectly changed to test scope
during a prior dependency:analyze-only cleanup:
- com.fasterxml.jackson.core:jackson-databind
- org.apache.commons:commons-compress
- org.apache.ozone:hdds-client
Although none of them are directly imported in SCM main sources, they are
loaded indirectly at runtime (jackson-databind is used via reflection in
StorageContainerManager; commons-compress and hdds-client are loaded
transitively). Declaring them at test scope caused the generated
hdds-server-scm.classpath to omit them, resulting in a NoClassDefFoundError
for jackson-databind when the SCM process started, crashing all Docker-based
acceptance tests.
Restore the three dependencies to compile (default) scope so build-classpath
includes them in the runtime classpath descriptor. Add matching
ignoredUnusedDeclaredDependency entries to the root POM so
dependency:analyze-only no longer flags them as unused.
Made-with: Cursor
What changes were proposed in this pull request?
Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack
Summary
Migrates Ozone's gRPC transport layer from the standalone io.grpc/io.netty/com.google.protobuf libraries to the Ratis-shaded equivalents (org.apache.ratis.thirdparty.*), eliminating duplicate copies of these libraries on the classpath and resolving version conflicts with Ratis at runtime.
Zero-copy marshalling is preserved: all generated stubs use the shaded MessageLite-based ProtoUtils.marshaller() from ratis-thirdparty.
New modules
- hadoop-hdds/datanode-grpc-client (artifact hdds-datanode-grpc-client): DatanodeClientProtocol.proto; generates shaded gRPC stubs for Datanode/Container RPC
- hadoop-hdds/scm-grpc-client (artifact hdds-scm-grpc-client): InterSCMProtocol.proto, SCMRatisProtocol.proto, SCMUpdateProtocol.proto; generates shaded gRPC stubs for inter-SCM and SCM-Ratis RPC
Both modules use protobuf-maven-plugin to generate Java from the proto files and then maven-antrun-plugin to rewrite the generated sources in-place before compilation:
- com.google.protobuf → org.apache.ratis.thirdparty.com.google.protobuf
- com.google.common → org.apache.ratis.thirdparty.com.google.common
- io.grpc → org.apache.ratis.thirdparty.io.grpc
Generating directly into the shaded source root (target/generated-sources/proto-java-ratis/) ensures both Maven and the IDE compile from the same tree, preventing stale-class interference.
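To make the three package rewrites concrete, here is a minimal Java sketch of the same text substitution. It is an illustration only: the PR performs the rewrite with maven-antrun-plugin, and the class name ShadedImportRewriter and its rewrite method are hypothetical names that do not appear in the patch. The sketch assumes the rewrite should be idempotent, so a package root is replaced only when it starts a qualified name and already-shaded names are left alone.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper illustrating the package rewrite described above.
 * The PR itself does this with maven-antrun-plugin, not with this class.
 */
final class ShadedImportRewriter {
  private static final String SHADED_PREFIX = "org.apache.ratis.thirdparty.";

  // Match the three unshaded package roots only when they begin a qualified
  // name: not preceded by an identifier character or a dot, and not followed
  // by another identifier character. Already-shaded names therefore do not
  // match, which keeps a second pass from double-prefixing them.
  private static final Pattern UNSHADED = Pattern.compile(
      "(?<![\\w.])(com\\.google\\.protobuf|com\\.google\\.common|io\\.grpc)(?!\\w)");

  private ShadedImportRewriter() { }

  /** Returns the source text with unshaded package roots prefixed. */
  static String rewrite(String source) {
    return UNSHADED.matcher(source).replaceAll(SHADED_PREFIX + "$1");
  }

  public static void main(String[] args) {
    System.out.println(rewrite("import io.grpc.stub.StreamObserver;"));
    System.out.println(rewrite("com.google.protobuf.ByteString data;"));
  }
}
```

A plain text substitution like this also rewrites matches inside string literals and comments, which is usually acceptable for generated stubs but worth keeping in mind if the same approach is applied to hand-written sources.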
Proto file migrations
- DatanodeClientProtocol.proto: hdds-interface-client → hdds-datanode-grpc-client
- InterSCMProtocol.proto: hdds-interface-server → hdds-scm-grpc-client
- SCMRatisProtocol.proto: hdds-interface-server → hdds-scm-grpc-client
- SCMUpdateProtocol.proto: hdds-interface-server → hdds-scm-grpc-client
proto.lock files updated in all four affected modules.
Source changes: io.grpc → shaded imports
- hadoop-ozone/common (GrpcOmTransport, GrpcOMFailoverProxyProvider, ClientAddressClientInterceptor, ClientAddressServerInterceptor, GrpcClientConstants): io.grpc.* replaced with org.apache.ratis.thirdparty.io.grpc.*
- hadoop-ozone/ozone-manager (GrpcOzoneManagerServer, OzoneManagerServiceGrpc)
- hadoop-hdds/framework (GrpcMetricsServerRequestInterceptor, GrpcMetricsServerResponseInterceptor, GrpcMetricsServerTransportFilter)
- hadoop-ozone/csi (CsiServer, ControllerService, IdentityService, NodeService)
- hadoop-hdds/common (HddsUtils): ByteString usage aligned with shaded protobuf
Build changes
Root pom.xml
hdds-datanode-grpc-client and hdds-scm-grpc-client added to <dependencyManagement> and dependency-analysis ignore lists.
maven-antrun-plugin executions added globally (inherited by every module):
- pre-clean-force-delete-target (phase pre-clean): runs /bin/rm -rf to delete target/ before maven-clean-plugin runs, working around macOS com.apple.provenance extended attributes that prevent java.nio.file.Files.delete() from removing IDE-written class files
- pre-compile-delete-classes (phase process-resources): deletes .class files from the current module, then restores hdds-common, hdds-datanode-grpc-client, and hdds-scm-grpc-client from their installed .m2/ JARs to prevent Java Language Server corruption of dependency class files during reactor builds
- pre-test-compile-refresh-classes (phase process-test-resources): refreshes the classes again before test-compile, closing the LSP interference window between compile and test-compile within a single module lifecycle
- pre-verify-refresh-classes-from-jar (phase pre-integration-test): refreshes target/classes/ from the module's own freshly-built JAR before dependency:analyze-only runs at verify
pre-verify-delete-test-classes execution phase corrected from prepare-package to pre-integration-test, fixing a bug where test-jars were being packaged empty (compiled test classes were deleted before maven-jar-plugin:test-jar could include them).
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14949
How was this patch tested?
Build, unit and integration tests