xds: Trust Manager fix for when SAN validation against SNI sent doesn't apply#12775

Merged
kannanjgithub merged 7 commits into grpc:master from kannanjgithub:trustmanager-wrong-merge-fix
Apr 29, 2026

@kannanjgithub
Contributor

@kannanjgithub kannanjgithub commented Apr 27, 2026

Fixes a bug in the propagation of autoSniSanValidationDoesNotApply (introduced in PR #12422). That PR added an autoSniSanValidationDoesNotApply argument to SslContextProviderSupplier.updateSslContext, which set the value on the DynamicSslContextProvider. Because UpstreamTlsContext did not implement equals, the provider was replaced by a new instance and the flag was lost. The issue was found while fixing an error in CertProviderClientSslContextProvider, caused by an incorrect merge, that recreated the trust manager without taking autoSniSanValidationDoesNotApply into account. That error should have made the test XdsSecurityClientServerTest.tlsClientServer_autoSniValidation_noSniApplicable_usesMatcherFromCmnVdnCtx fail, but it did not: although autoSniSanValidationDoesNotApply was false because the true value was never propagated, the SAN matcher fallback still happened because no server SNI was sent.

In addition to fixing the equals method, the new changes move the decision about autoSniSanValidationDoesNotApply to TlsContextManagerImpl.findOrCreateClientSslContextProvider. This eliminates the need for a deferred setting of that decision via DynamicSslContextProvider.setAutoSniSanValidationDoesNotApply, which was called from SslContextProviderSupplier.updateSslContext.

Summary of Changes:

  1. Enhanced UpstreamTlsContext (EnvoyServerProtoData.java):

    • Modified Caching Behavior: Implemented full equals() and hashCode() overrides for UpstreamTlsContext. Previously, it relied on the base class which only compared the commonTlsContext, causing different SNI or auto-validation settings to incorrectly share the same cache entry.
    • Normalization: Updated constructors to normalize the sni field to an empty string ("") if null. This prevents equality mismatches between context objects created from different sources (e.g., test helpers vs. Envoy protos).
  2. Centralized Validation Logic in TlsContextManagerImpl

    • API Update: Modified findOrCreateClientSslContextProvider to accept the autoSniSanValidationDoesNotApply flag.
    • Effective Key Generation: If the flag is true, the manager now creates a specialized UpstreamTlsContext for the cache lookup where autoSniSanValidation is forced to false.
      This ensures the cache key reflects the effective validation behavior, allowing subchannels with different hostname/IP types to correctly share or separate their SslContextProvider instances.
  3. Propagated Flag through SslContextProviderSupplier:

    • Updated the supplier to pass the validation flag from the ClientSecurityHandler down to the manager and provider.
    • Simplified the lifecycle by removing the need for a separate setter on the provider.
  4. Refined DynamicSslContextProvider and CertProviderClientSslContextProvider:

    • Removed the redundant setAutoSniSanValidationDoesNotApply method and state.
    • The provider now retrieves the final validation state directly from its immutable UpstreamTlsContext during context computation.
  5. Strengthened Integration Tests (XdsSecurityClientServerTest.java)

    • Modified the test tlsClientServer_autoSniValidation_noSniApplicable_usesMatcherFromCmnVdnCtx to enable CertificateUtils.useChannelAuthorityIfNoSniApplicable.
    • This forces the client to use the channel's authority as a non-empty SNI that mismatches the server certificate. This ensures the test rigorously asserts that SNI validation is truly disabled when it "does not apply" (e.g., when no specific SNI was requested by the user).
  6. Updated Test Suite & API Alignment

    • SecurityProtocolNegotiatorsTest: Updated test cases to pass true for the validation flag. This was required because the new strict cache key equality exposed that the test was previously using a state that didn't match the ClientSecurityHandler it was testing.
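
As a rough illustration of the equality and normalization changes in item 1, here is a minimal sketch. `TlsKey` is a hypothetical stand-in for UpstreamTlsContext, not the real class; its field names and the plain string used in place of commonTlsContext are assumptions made only for this example.

```java
import java.util.Objects;

// Entry point first so the file can be launched directly with `java`.
class TlsKeyDemo {
  public static void main(String[] args) {
    TlsKey fromTestHelper = new TlsKey("common", null, true);
    TlsKey fromProto = new TlsKey("common", "", true);
    TlsKey otherSni = new TlsKey("common", "foo.example.com", true);

    // A null sni and an empty sni normalize to the same key.
    System.out.println(fromTestHelper.equals(fromProto));
    // A different sni yields a different key.
    System.out.println(fromTestHelper.equals(otherSni));
  }
}

// Hypothetical stand-in for UpstreamTlsContext (assumed fields, not the real class).
final class TlsKey {
  private final String commonTlsContext; // placeholder for the real proto-backed field
  private final String sni;
  private final boolean autoSniSanValidation;

  TlsKey(String commonTlsContext, String sni, boolean autoSniSanValidation) {
    this.commonTlsContext = Objects.requireNonNull(commonTlsContext);
    this.sni = sni == null ? "" : sni; // normalize so null and "" are the same key
    this.autoSniSanValidation = autoSniSanValidation;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof TlsKey)) {
      return false;
    }
    TlsKey that = (TlsKey) o;
    // Compare every field that affects the resulting SslContext.
    return commonTlsContext.equals(that.commonTlsContext)
        && sni.equals(that.sni)
        && autoSniSanValidation == that.autoSniSanValidation;
  }

  @Override
  public int hashCode() {
    return Objects.hash(commonTlsContext, sni, autoSniSanValidation);
  }
}
```

Including every SNI-related field in both equals() and hashCode() is what lets such an object act as a map key without two differently configured contexts sharing one entry.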

grpc#12742)"

This reverts commit ef35313 because it breaks Google internal build. Error Prone adds a lot of restrictions, which we may or may not want. We'd need to look at each case and decide what to do.
grpc#12422). It added an argument autoSniSanValidationDoesNotApply to SslContextProviderSupplier.updateSslContext that sets it on the DynamicSslContextProvider, but it was not calling DynamicSslContextProvider.updateSslcontext, so the trust manager didn't see the changed value for autoSniSanValidationDoesNotApply. This issue was identified when fixing an error in CertProviderClientSslContextProvider.java, caused by an incorrect merge, that recreated the trust manager without taking autoSniSanValidationDoesNotApply into account. It ought to have caused a failure in the test XdsSecurityClientServerTest.tlsClientServer_autoSniValidation_noSniApplicable_usesMatcherFromCmnVdnCtx, but it did not: even though autoSniSanValidationDoesNotApply was false because the true value was never propagated, the SAN matcher fallback was still happening because no server SNI was sent.

With the existing plumbing, calling DynamicSslContextProvider.updateSslcontext when autoSniSanValidationDoesNotApply changes will also require not calling clearKeysAndCerts in CertProviderSslContextProvider.updateSslContextWhenReady, which would cause redundant updates. This made us revisit the [reasoning](grpc#12422 (comment)) for plumbing autoSniSanValidationDoesNotApply separately instead of making it a part of the SslContextProvider map TlsContextManagerImpl.mapForClient.
This change makes the SNI-related fields part of the cache key, so the cache holds an SslContextProvider that is already built with the SNI-related information set into the TrustManager.

Summary of Changes:

   1. Enhanced UpstreamTlsContext (EnvoyServerProtoData.java):
       * Modified Caching Behavior: Implemented full equals() and hashCode() overrides for UpstreamTlsContext. Previously, it relied on the base class which only compared the commonTlsContext, causing different SNI or auto-validation settings to incorrectly share the same cache entry.
       * Normalization: Updated constructors to normalize the sni field to an empty string ("") if null. This prevents equality mismatches between context objects created from different sources (e.g., test helpers vs. Envoy protos).

   2. Centralized Validation Logic in TlsContextManagerImpl
       * API Update: Modified findOrCreateClientSslContextProvider to accept the autoSniSanValidationDoesNotApply flag.
       * Effective Key Generation: If the flag is true, the manager now creates a specialized UpstreamTlsContext for the cache lookup where autoSniSanValidation is forced to false.
       This ensures the cache key reflects the effective validation behavior, allowing subchannels with different hostname/IP types to correctly share or separate their SslContextProvider instances.

   3. Propagated Flag through SslContextProviderSupplier:
       * Updated the supplier to pass the validation flag from the ClientSecurityHandler down to the manager and provider.
       * Simplified the lifecycle by removing the need for a separate setter on the provider.

   4. Refined DynamicSslContextProvider and CertProviderClientSslContextProvider:
       * Removed the redundant setAutoSniSanValidationDoesNotApply method and state.
       * The provider now retrieves the final validation state directly from its immutable UpstreamTlsContext during context computation.

   5. Strengthened Integration Tests (XdsSecurityClientServerTest.java)
       * Modified the test tlsClientServer_autoSniValidation_noSniApplicable_usesMatcherFromCmnVdnCtx to enable CertificateUtils.useChannelAuthorityIfNoSniApplicable.
       * This forces the client to use the channel's authority as a non-empty SNI that mismatches the server certificate. This ensures the test rigorously asserts that SNI validation is truly disabled when it "does not apply" (e.g., when no specific SNI was requested by the user).

   6. Updated Test Suite & API Alignment
       * SecurityProtocolNegotiatorsTest: Updated test cases to pass true for the validation flag. This was required because the new strict cache key equality exposed that the test was previously using a state that didn't match the ClientSecurityHandler it was testing.
       * API Compatibility: Updated SslContextProviderSupplierTest, TlsContextManagerTest, and ClusterImplLoadBalancerTest to support the new TlsContextManager method signature.
@kannanjgithub kannanjgithub requested a review from ejona86 April 27, 2026 11:27
@ejona86
Member

ejona86 commented Apr 27, 2026

> calling DynamicSslContextProvider.updateSslcontext when autoSniSanValidationDoesNotApply changes

FWIW, I don't think we need to support the value changing. The only time it could change is during tests. How were you thinking it would change?

Member

@ejona86 ejona86 left a comment

I'm not wild about this, but the existing code is definitely broken and this looks not broken.

}

@Override
public boolean equals(Object o) {
Member

Looks like this is good to add no matter what else happens.

  /** Creates a SslContextProvider. Used for retrieving a client-side SslContext. */
  SslContextProvider findOrCreateClientSslContextProvider(
-      UpstreamTlsContext upstreamTlsContext);
+      UpstreamTlsContext upstreamTlsContext, boolean autoSniSanValidationDoesNotApply);
Member

I'm not wild about exposing this flag in the API. Part of the point of the flag was for it to be easy to maintain and obvious.

Contributor Author

I missed this comment by oversight; I mistakenly thought this was the other "I'm not wild about" comment I had read, which didn't require action. I will look into this again.

Member

It isn't clear to me why we need plumbing at all for the value. Why not check the flag directly in CertProviderClientSslContextProvider?

Contributor Author

autoSniSanValidationDoesNotApply is not just the flag value; it is a field computed in the ClientSecurityHandler constructor and passed down to here. I have now moved the change that recreates the UpstreamTlsContext when autoSniSanValidationDoesNotApply is true out of TlsContextManager.findOrCreateClientSslContextProvider and into its caller in SslContextProviderSupplier. This removes the need to modify the signature of TlsContextManager.findOrCreateClientSslContextProvider.
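
To make the caller-side approach in this reply concrete, here is a minimal sketch, assuming simplified types. `DemoContext`, `DemoSupplier`, and `DemoManager` are hypothetical stand-ins for UpstreamTlsContext, SslContextProviderSupplier, and TlsContextManagerImpl; none of the method names below are taken from the gRPC code. The point is only that the caller derives the effective context, so the manager's lookup keeps a single-argument signature.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EffectiveKeyDemo {
  public static void main(String[] args) {
    DemoManager manager = new DemoManager();
    DemoContext configured = new DemoContext("foo.example.com", true);

    // Caller-side decision: when auto SNI SAN validation does not apply,
    // look up with a context whose autoSniSanValidation is forced to false.
    Object a = manager.findOrCreate(DemoSupplier.effectiveContext(configured, true));
    Object b = manager.findOrCreate(DemoSupplier.effectiveContext(configured, true));
    Object c = manager.findOrCreate(DemoSupplier.effectiveContext(configured, false));

    System.out.println(a == b); // same effective key, shared provider
    System.out.println(a == c); // different effective key, separate provider
  }
}

// Hypothetical, simplified context with value-based equality.
final class DemoContext {
  final String sni;
  final boolean autoSniSanValidation;

  DemoContext(String sni, boolean autoSniSanValidation) {
    this.sni = sni == null ? "" : sni;
    this.autoSniSanValidation = autoSniSanValidation;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof DemoContext)) {
      return false;
    }
    DemoContext that = (DemoContext) o;
    return sni.equals(that.sni) && autoSniSanValidation == that.autoSniSanValidation;
  }

  @Override
  public int hashCode() {
    return Objects.hash(sni, autoSniSanValidation);
  }
}

// Hypothetical caller that computes the effective context before the lookup.
final class DemoSupplier {
  static DemoContext effectiveContext(DemoContext ctx, boolean autoSniSanValidationDoesNotApply) {
    if (autoSniSanValidationDoesNotApply && ctx.autoSniSanValidation) {
      return new DemoContext(ctx.sni, false);
    }
    return ctx;
  }
}

// Hypothetical manager whose lookup signature takes only the context.
final class DemoManager {
  private final Map<DemoContext, Object> providers = new HashMap<>();

  Object findOrCreate(DemoContext ctx) {
    return providers.computeIfAbsent(ctx, k -> new Object());
  }
}
```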

@kannanjgithub
Contributor Author

kannanjgithub commented Apr 28, 2026

> > calling DynamicSslContextProvider.updateSslcontext when autoSniSanValidationDoesNotApply changes
>
> FWIW, I don't think we need to support the value changing. The only time it could change is during tests. How were you thinking it would change?

That was a misunderstanding on my part. Two different instances of ClientCertificateSslcontextProvider were being created for the same UpstreamTlsContext because of the missing equals (which was the real culprit, as you pointed out). The autoSniSanValidationDoesNotApply value, set to true earlier on the ProtocolNegotiator path, was lost in the ClientCertificateSslcontextProvider created via the file system update for roots and certs.

In addition to fixing the equals method, the new changes move the decision about autoSniSanValidationDoesNotApply to TlsContextManagerImpl.findOrCreateClientSslContextProvider. This eliminates the need for a deferred setting of that decision via DynamicSslContextProvider.setAutoSniSanValidationDoesNotApply, which was called from SslContextProviderSupplier.updateSslContext.

Corrected the description.
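
The failure mode described in this comment can be reproduced in isolation. The sketch below is a generic illustration, not the gRPC code: `KeyWithoutEquals` and `DemoProvider` are hypothetical, and the key is assumed, as a simplification, to have no equals override at all. A cache keyed by such an object creates a second provider for an equivalent key, so state set on the first provider is not visible on the second.

```java
import java.util.HashMap;
import java.util.Map;

class MissingEqualsDemo {
  public static void main(String[] args) {
    Map<KeyWithoutEquals, DemoProvider> cache = new HashMap<>();

    // First lookup (for example from the protocol negotiator path) sets the flag.
    DemoProvider first =
        cache.computeIfAbsent(new KeyWithoutEquals("common"), k -> new DemoProvider());
    first.flag = true;

    // Second lookup with an equivalent but distinct key misses the cache.
    DemoProvider second =
        cache.computeIfAbsent(new KeyWithoutEquals("common"), k -> new DemoProvider());

    System.out.println(cache.size());  // 2: two entries for what should be one key
    System.out.println(second.flag);   // false: the earlier setting is lost
  }
}

// Hypothetical key that relies on Object identity for equals and hashCode.
final class KeyWithoutEquals {
  final String commonTlsContext;

  KeyWithoutEquals(String commonTlsContext) {
    this.commonTlsContext = commonTlsContext;
  }
}

// Hypothetical provider carrying mutable state that is set after creation.
final class DemoProvider {
  boolean flag;
}
```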

@ejona86
Member

ejona86 commented Apr 28, 2026

@kannanjgithub, did you forget to push the changes?

…rovider` to recreate `UpstreamTlsContext` when `autoSniSanValidationDoesNotApply` is true to its caller in `SslContextProviderSupplier`. This removed the need for modifying the signature of `TlsContextManager. findOrCreateClientSslContextProvider`.
@kannanjgithub kannanjgithub merged commit bb153a8 into grpc:master Apr 29, 2026
16 of 17 checks passed
@kannanjgithub kannanjgithub deleted the trustmanager-wrong-merge-fix branch April 29, 2026 17:08