
HDDS-14949 — Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack #10030

Draft
yandrey321 wants to merge 13 commits into apache:master from yandrey321:HDDS-14949

yandrey321 commented Apr 2, 2026

What changes were proposed in this pull request?

Migrate gRPC transport to the Ratis-shaded gRPC/Netty/Protobuf stack

Summary

Migrates Ozone's gRPC transport layer from the standalone io.grpc / io.netty / com.google.protobuf
libraries to the Ratis-shaded equivalents (org.apache.ratis.thirdparty.*), eliminating duplicate copies
of these libraries on the classpath and resolving version conflicts with Ratis at runtime.
Zero-copy marshalling is preserved: all generated stubs use the shaded MessageLite-based
ProtoUtils.marshaller() from ratis-thirdparty.

New modules

| Module | Artifact | Purpose |
|---|---|---|
| hadoop-hdds/datanode-grpc-client | hdds-datanode-grpc-client | Owns DatanodeClientProtocol.proto; generates shaded gRPC stubs for Datanode/Container RPC |
| hadoop-hdds/scm-grpc-client | hdds-scm-grpc-client | Owns InterSCMProtocol.proto, SCMRatisProtocol.proto, SCMUpdateProtocol.proto; generates shaded gRPC stubs for inter-SCM and SCM-Ratis RPC |

Both modules use protobuf-maven-plugin to generate Java from the proto files, then
maven-antrun-plugin to rewrite the generated sources in place before compilation:
  • com.google.protobuf → org.apache.ratis.thirdparty.com.google.protobuf
  • com.google.common → org.apache.ratis.thirdparty.com.google.common
  • io.grpc → org.apache.ratis.thirdparty.io.grpc
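The relocation amounts to a textual package-prefix substitution. A minimal Java sketch of the same three mappings follows. ShadedImportRewriter is a hypothetical helper; the PR performs this step with maven-antrun-plugin, not with this class. The negative lookbehind skips names that already carry the org.apache.ratis.thirdparty. prefix, so running it twice is harmless.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: mirrors the three package relocations listed above.
// It is an illustration, not code from the PR (which uses maven-antrun-plugin).
final class ShadedImportRewriter {
  private static final String SHADED_PREFIX = "org.apache.ratis.thirdparty.";

  // Match an unshaded root only when it is not preceded by a word character
  // or a dot (so already-relocated names are left alone) and only when it
  // ends on a word boundary (so look-alike packages are skipped).
  private static final Pattern UNSHADED = Pattern.compile(
      "(?<![\\w.])(com\\.google\\.protobuf|com\\.google\\.common|io\\.grpc)\\b");

  private ShadedImportRewriter() {
  }

  /** Rewrites every unshaded package reference in the given Java source text. */
  static String rewrite(String source) {
    Matcher matcher = UNSHADED.matcher(source);
    return matcher.replaceAll(Matcher.quoteReplacement(SHADED_PREFIX) + "$1");
  }
}
```

Because the pattern requires a word boundary after the package root, a look-alike name such as io.grpcx (hypothetical) is not touched.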

Generating directly into the shaded source root (target/generated-sources/proto-java-ratis/)
ensures both Maven and the IDE compile from the same tree, preventing stale-class interference.


Proto file migrations

| File | Moved from | Moved to |
|---|---|---|
| DatanodeClientProtocol.proto | hdds-interface-client | hdds-datanode-grpc-client |
| InterSCMProtocol.proto | hdds-interface-server | hdds-scm-grpc-client |
| SCMRatisProtocol.proto | hdds-interface-server | hdds-scm-grpc-client |
| SCMUpdateProtocol.proto | hdds-interface-server | hdds-scm-grpc-client |

The proto.lock files were updated in all four affected modules.

Source changes — io.grpc → shaded imports

| Module | Files | Change |
|---|---|---|
| hadoop-ozone/common | GrpcOmTransport, GrpcOMFailoverProxyProvider, ClientAddressClientInterceptor, ClientAddressServerInterceptor, GrpcClientConstants | All io.grpc.* imports replaced with org.apache.ratis.thirdparty.io.grpc.* |
| hadoop-ozone/ozone-manager | GrpcOzoneManagerServer, OzoneManagerServiceGrpc | Same migration |
| hadoop-hdds/framework | GrpcMetricsServerRequestInterceptor, GrpcMetricsServerResponseInterceptor, GrpcMetricsServerTransportFilter | Same migration |
| hadoop-ozone/csi | CsiServer, ControllerService, IdentityService, NodeService | Same migration |
| hadoop-hdds/common | HddsUtils | ByteString usage aligned with shaded protobuf |

All corresponding test files were updated to match.
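To confirm that no hand-written file was missed, a check along these lines could flag remaining unshaded imports. UnshadedImportCheck is a hypothetical sketch, not part of the PR. It inspects import statements only, so fully qualified references inside code bodies would need a separate pass, and it deliberately ignores com.google.common because hand-written code may legitimately use unshaded Guava.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check: reports import lines that still target an unshaded
// gRPC, Netty, or Protobuf package after the migration.
final class UnshadedImportCheck {
  // Plain and static imports whose target starts with an unshaded root.
  private static final Pattern UNSHADED_IMPORT = Pattern.compile(
      "^\\s*import\\s+(static\\s+)?(io\\.grpc|io\\.netty|com\\.google\\.protobuf)\\..*;\\s*$");

  private UnshadedImportCheck() {
  }

  /** Returns the 1-based line numbers of offending import statements. */
  static List<Integer> offendingLines(String source) {
    List<Integer> hits = new ArrayList<>();
    String[] lines = source.split("\n", -1);
    for (int i = 0; i < lines.length; i++) {
      if (UNSHADED_IMPORT.matcher(lines[i]).matches()) {
        hits.add(i + 1);
      }
    }
    return hits;
  }
}
```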

Build changes

Root pom.xml

  • hdds-datanode-grpc-client and hdds-scm-grpc-client added to <dependencyManagement> and
    dependency-analysis ignore lists.
  • Four new maven-antrun-plugin executions added globally (inherited by every module):
    | Execution | Phase | Purpose |
    |---|---|---|
    | pre-clean-force-delete-target | pre-clean | Uses /bin/rm -rf to delete target/ before maven-clean-plugin runs, working around macOS com.apple.provenance extended attributes that prevent java.nio.file.Files.delete() from removing IDE-written class files |
    | pre-compile-delete-classes | process-resources | Deletes stale .class files from the current module, then restores hdds-common, hdds-datanode-grpc-client, and hdds-scm-grpc-client from their installed .m2/ JARs to prevent Java Language Server corruption of dependency class files during reactor builds |
    | pre-test-compile-refresh-classes | process-test-resources | Same foundation-module refresh before test-compile, closing the LSP interference window between compile and test-compile within a single module lifecycle |
    | pre-verify-refresh-classes-from-jar | pre-integration-test | Repopulates target/classes/ from the module's own freshly built JAR before dependency:analyze-only runs at verify |
  • pre-verify-delete-test-classes execution phase corrected from prepare-package to
    pre-integration-test, fixing a bug where test-jars were being packaged empty (compiled
    test classes were deleted before maven-jar-plugin:test-jar could include them).
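The phase fix relies on the ordering of Maven's default lifecycle. The sketch below (a hypothetical LifecycleOrder helper) encodes that order and shows why the old binding produced empty test-jars: maven-jar-plugin:test-jar binds to package by default, prepare-package runs immediately before package, and pre-integration-test runs after package but still before verify. It compares phases only; ordering among executions bound to the same phase, and the separate clean lifecycle where pre-clean lives, are not modeled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: Maven's default-lifecycle phases in execution order.
final class LifecycleOrder {
  static final List<String> DEFAULT_PHASES = Arrays.asList(
      "validate", "initialize", "generate-sources", "process-sources",
      "generate-resources", "process-resources", "compile", "process-classes",
      "generate-test-sources", "process-test-sources", "generate-test-resources",
      "process-test-resources", "test-compile", "process-test-classes", "test",
      "prepare-package", "package", "pre-integration-test", "integration-test",
      "post-integration-test", "verify", "install", "deploy");

  private LifecycleOrder() {
  }

  /** True if a goal bound to {@code first} runs before one bound to {@code second}. */
  static boolean runsBefore(String first, String second) {
    int a = DEFAULT_PHASES.indexOf(first);
    int b = DEFAULT_PHASES.indexOf(second);
    if (a < 0 || b < 0) {
      throw new IllegalArgumentException(
          "Not a default-lifecycle phase: " + (a < 0 ? first : second));
    }
    return a < b;
  }
}
```

For example, runsBefore("prepare-package", "package") is true, which is exactly the window in which the old binding deleted test classes before test-jar could package them.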

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14949

How was this patch tested?

Build, unit and integration tests

adoroszlai marked this pull request as draft April 2, 2026 15:17

adoroszlai (Contributor) left a comment

Thanks @yandrey321 for working on this.

Contributor:

I don't think .class files should be committed.

Author:

My bad, removed it from version control.

yandrey321 and others added 12 commits April 2, 2026 12:18
… grpc-client modules

The pre-verify-refresh-classes-from-jar antrun execution (bound to
pre-integration-test) did an unconditional rm -rf target/classes/ followed
by unzip from the module's local JAR.  On CI with mvn clean verify (not
install), the JAR is present in target/ but on pom-packaging modules or in
any edge-case where the JAR is absent the directory was left empty, causing
downstream reactor modules to fail with
  "class file for ContainerProtos$ContainerCommandRequestProto not found"
when they tried to compile against the now-empty target/classes/.

Two-part fix:
1. Root pom.xml: replace the unconditional rm+unzip pair with a single
   /bin/sh one-liner that only runs if the JAR file actually exists, so
   target/classes/ is left intact when no JAR was produced.
2. hdds-datanode-grpc-client and hdds-scm-grpc-client: override
   pre-verify-refresh-classes-from-jar with phase=none, because both
   modules set mdep.analyze.skip=true (no dependency analysis runs), so
   wiping target/classes/ at pre-integration-test serves no purpose and
   only risks leaving the directory empty for downstream compilation.
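The guarded refresh from part 1 can be sketched in Java as follows. ClassesRefresher is hypothetical (the PR itself uses a /bin/sh one-liner in the root pom.xml), and the caller is assumed to supply the module JAR path and the target/classes directory. The property that matters is the early return: a missing JAR leaves target/classes/ exactly as it was.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.Enumeration;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: "rm -rf target/classes && unzip module.jar", but only
// when the JAR actually exists.
final class ClassesRefresher {
  private ClassesRefresher() {
  }

  /** Returns false, leaving classesDir untouched, when no JAR was produced. */
  static boolean refreshFromJarIfPresent(Path jar, Path classesDir) throws IOException {
    if (!Files.isRegularFile(jar)) {
      return false; // e.g. pom packaging: keep the existing classes
    }
    Path root = classesDir.toAbsolutePath().normalize();
    deleteRecursively(root);
    Files.createDirectories(root);
    try (ZipFile zip = new ZipFile(jar.toFile())) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        Path out = root.resolve(entry.getName()).normalize();
        // Refuse entries such as "../x" that would escape the target directory.
        if (!out.startsWith(root)) {
          throw new IOException("Entry escapes target directory: " + entry.getName());
        }
        if (entry.isDirectory()) {
          Files.createDirectories(out);
          continue;
        }
        Files.createDirectories(out.getParent());
        try (InputStream in = zip.getInputStream(entry)) {
          Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
    return true;
  }

  // Deletes a directory tree children-first; a missing directory is a no-op.
  private static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> walk = Files.walk(dir)) {
      walk.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}
```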

Made-with: Cursor
… of .m2/

The pre-compile-delete-classes and pre-test-compile-refresh-classes antrun
executions were restoring hdds-datanode-grpc-client/target/classes/ by
unzipping from the .m2/ repository JAR.  On a CI clean-build (mvn clean
verify) no Ozone artifacts are pre-installed to .m2/, so those unzip
operations silently failed, leaving no safety net if anything clears the
directory after compilation.

Switch the source paths from ${settings.localRepository}/org/apache/ozone/...
to ${maven.multiModuleProjectDirectory}/hadoop-hdds/.../target/X.jar.
These local JARs are created during each foundation module's own package
phase (which runs before any downstream module's process-resources), so
they are always present on CI.  The unzip still silently no-ops when the
JAR is not yet built (e.g. the foundation module itself hasn't been packaged
yet in the current reactor pass).  This makes the class-file restoration
reliable on both CI and local machines without requiring a prior mvn install.

Made-with: Cursor
…repo root

ContainerProtos.class was extracted into the project root (org/apache/...)
by an unzip operation running in the wrong directory during a debugging
session and was accidentally included in commit c075c90.  Generated
bytecode has no place in version control.

Also add *.class to .gitignore so compiled class files at the project root
can never be staged again.

Made-with: Cursor
Three dependencies in hdds-server-scm were incorrectly changed to
test scope during a prior dependency:analyze-only cleanup:
- com.fasterxml.jackson.core:jackson-databind
- org.apache.commons:commons-compress
- org.apache.ozone:hdds-client

Although none of them are directly imported in SCM main sources,
they are loaded indirectly at runtime (jackson-databind is used via
reflection in StorageContainerManager; commons-compress and hdds-client
are loaded transitively).  Declaring them at test scope caused the
generated hdds-server-scm.classpath to omit them, resulting in a
NoClassDefFoundError for jackson-databind when the SCM process started,
crashing all Docker-based acceptance tests.

Restore the three dependencies to compile (default) scope so
build-classpath includes them in the runtime classpath descriptor.
Add matching ignoredUnusedDeclaredDependency entries to the root POM
so dependency:analyze-only no longer flags them as unused.
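Why a test-scoped dependency disappears at runtime follows from Maven's scope rules. This hypothetical sketch encodes the documented visibility of the compile, provided, runtime, and test scopes (system is omitted). It assumes, as the commit message states, that the generated hdds-server-scm.classpath descriptor is built from the runtime classpath.

```java
import java.util.EnumSet;
import java.util.Set;

// The classpaths Maven assembles for a module.
enum Classpath { COMPILE, TEST, RUNTIME }

// Hypothetical model of which classpaths each dependency scope reaches.
enum DependencyScope {
  COMPILE(EnumSet.of(Classpath.COMPILE, Classpath.TEST, Classpath.RUNTIME)),
  PROVIDED(EnumSet.of(Classpath.COMPILE, Classpath.TEST)),
  RUNTIME(EnumSet.of(Classpath.TEST, Classpath.RUNTIME)),
  TEST(EnumSet.of(Classpath.TEST));

  private final Set<Classpath> classpaths;

  DependencyScope(Set<Classpath> classpaths) {
    this.classpaths = classpaths;
  }

  /** Whether a dependency declared in this scope appears on the given classpath. */
  boolean isOn(Classpath classpath) {
    return classpaths.contains(classpath);
  }
}
```

Under this model only COMPILE and RUNTIME reach the runtime classpath, which is consistent with the NoClassDefFoundError for jackson-databind once it had been moved to test scope.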

Made-with: Cursor