xds: Trust Manager fix for when SAN validation against SNI sent doesn't apply#12775
Conversation
grpc#12742)" This reverts commit ef35313 because it breaks Google internal build. Error Prone adds a lot of restrictions, which we may or may not want. We'd need to look at each case and decide what to do.
grpc#12422). It added an argument autoSniSanValidationDoesNotApply to SslContextProviderSupplier.updateSslContext that sets it on the DynamicSslContextProvider but it was not calling DynamicSslContextProvider.updateSslcontext, so the trustmanager didn't see the changed value for autoSniSanValidationDoesNotApply. This issue was identified when fixing an incorrect merge caused error in CertProviderClientSslContextProvider.java that recreated the trust manager without consideration to autoSniSanValidationDoesNotApply. It ought to have caused failure in the test XdsSecurityClientServerTest.tlsClientServer_autoSniValidation_noSniApplicable_usesMatcherFromCmnVdnCtx but it wasn't, because even though autoSniSanValidationDoesNotApply was false due to not getting the propagated true value, SAN matcher fallback was still happening because there was no server SNI sent. With the existing plumbing, calling DynamicSslContextProvider.updateSslcontext when autoSniSanValidationDoesNotApply changes will also require not calling clearKeysAndCerts in CertProviderSslContextProvider.updateSslContextWhenReady, which would cause redundant updates. This made us revisit the [reasoning](grpc#12422 (comment)) for plumbing autoSniSanValidationDoesNotApply separately instead of making it a part of the SslContextProvider map TlsContextManagerImpl.mapForClient. This change makesthe SNI related fields in fact part of the cache key so it holds the SslContextProvider that is already built with the SNI related information set into the TrustManager. Summary of Changes: 1. Enhanced UpstreamTlsContext (EnvoyServerProtoData.java): * Modified Caching Behavior: Implemented full equals() and hashCode() overrides for UpstreamTlsContext. Previously, it relied on the base class which only compared the commonTlsContext, causing different SNI or auto-validation settings to incorrectly share the same cache entry. * Normalization: Updated constructors to normalize the sni field to an empty string ("") if null. This prevents equality mismatches between context objects created from different sources (e.g., test helpers vs. Envoy protos). 2. Centralized Validation Logic in TlsContextManagerImpl * API Update: Modified findOrCreateClientSslContextProvider to accept the autoSniSanValidationDoesNotApply flag. * Effective Key Generation: If the flag is true, the manager now creates a specialized UpstreamTlsContext for the cache lookup where autoSniSanValidation is forced to false. This ensures the cache key reflects the effective validation behavior, allowing subchannels with different hostname/IP types to correctly share or separate their SslContextProvider instances. 3. Propagated Flag through SslContextProviderSupplier: * Updated the supplier to pass the validation flag from the ClientSecurityHandler down to the manager and provider. * Simplified the lifecycle by removing the need for a separate setter on the provider. 4. Refined DynamicSslContextProvider and CertProviderClientSslContextProvider: * Removed the redundant setAutoSniSanValidationDoesNotApply method and state. * The provider now retrieves the final validation state directly from its immutable UpstreamTlsContext during context computation. 5. Strengthened Integration Tests (XdsSecurityClientServerTest.java) * Modified the test tlsClientServer_autoSniValidation_noSniApplicable_usesMatcherFromCmnVdnCtx to enable CertificateUtils.useChannelAuthorityIfNoSniApplicable. * This forces the client to use the channel's authority as a non-empty SNI that mismatches the server certificate. This ensures the test rigorously asserts that SNI validation is truly disabled when it "does not apply" (e.g., when no specific SNI was requested by the user). 6. Updated Test Suite & API Alignment * SecurityProtocolNegotiatorsTest: Updated test cases to pass true for the validation flag. This was required because the new strict cache key equality exposed that the test was previously using a state that didn't match the ClientSecurityHandler it was testing. * API Compatibility: Updated SslContextProviderSupplierTest, TlsContextManagerTest, and ClusterImplLoadBalancerTest to support the new TlsContextManager method signature.
FWIW, I don't think we need to support the value changing. The only time it could change is during tests. How were you thinking it would change? |
ejona86
left a comment
There was a problem hiding this comment.
I'm not wild about this, but the existing code is definitely broken and this looks not broken.
| } | ||
|
|
||
| @Override | ||
| public boolean equals(Object o) { |
There was a problem hiding this comment.
Looks like this is good to add no matter what else happens.
| /** Creates a SslContextProvider. Used for retrieving a client-side SslContext. */ | ||
| SslContextProvider findOrCreateClientSslContextProvider( | ||
| UpstreamTlsContext upstreamTlsContext); | ||
| UpstreamTlsContext upstreamTlsContext, boolean autoSniSanValidationDoesNotApply); |
There was a problem hiding this comment.
I'm not wild about exposing this flag in the API. Part of the point of the flag was for it to be easy to maintain and obvious.
There was a problem hiding this comment.
I missed this comment by oversight, I mistakenly thought this was the other "I'm not wild about" comment I had read that didn't require action. Will look into this again.
There was a problem hiding this comment.
It isn't clear to me why we need plumbing at all for the value. Why not check the flag directly in CertProviderClientSslContextProvider?
There was a problem hiding this comment.
autoSniSanValidationDoesNotApply is not just the flag value, it is a computed field in ClientSecurityHandler constructor and floated to here. I have now moved the changes in TlsContextManager. findOrCreateClientSslContextProvider to recreate UpstreamTlsContext when autoSniSanValidationDoesNotApply is true to its caller in SslContextProviderSupplier. This removed the need for modifying the signature of TlsContextManager. findOrCreateClientSslContextProvider.
That was a misunderstanding on my part. Two different instances of ClientCertificateSslcontextProvider were getting created for the same UpstreamTlsContext because of the missing equals (which was the real culprit, as you pointed out). autoSniSanValidationDoesNotApply set earlier to true from the ProtocolNegotiatorPath was lost in the ClientCertificateSslcontextProvider created via the file system update for roots and certs. With the new changes, in addition to fixing the equals method, by moving the decision about Corrected the description. |
|
@kannanjgithub, did you forget to push the changes? |
…rovider` to recreate `UpstreamTlsContext` when `autoSniSanValidationDoesNotApply` is true to its caller in `SslContextProviderSupplier`. This removed the need for modifying the signature of `TlsContextManager. findOrCreateClientSslContextProvider`.
…rovider` to recreate `UpstreamTlsContext` when `autoSniSanValidationDoesNotApply` is true to its caller in `SslContextProviderSupplier`. This removed the need for modifying the signature of `TlsContextManager. findOrCreateClientSslContextProvider`.
…to trustmanager-wrong-merge-fix
Fixes a bug in propagation of
autoSniSanValidationDoesNotApply(from PR #12422). It added an argumentautoSniSanValidationDoesNotApplytoSslContextProviderSupplier.updateSslContextthat sets it on theDynamicSslContextProviderbut becauseUpstreamTlsContextequals wasn't implemented, it was getting replaced by a new instance and the flag getting lost. This issue was identified when fixing an incorrect merge caused error inCertProviderClientSslContextProviderthat recreated the trust manager without consideration toautoSniSanValidationDoesNotApply. It ought to have caused failure in the testXdsSecurityClientServerTest.tlsClientServer_autoSniValidation_noSniApplicable_usesMatcherFromCmnVdnCtxbut it wasn't, because even thoughautoSniSanValidationDoesNotApplywas false due to not getting the propagated true value, SAN matcher fallback was still happening because there was no server SNI sent.With the new changes, in addition to fixing the equals method, by moving the decision about autoSniSanValidationDoesNotApply to TlsContextManagerImpl.findOrCreateClientSslContextProvider I have eliminated the need to have a deferred setting of this decision via DynamicSslContextProvider.setAutoSniSanValidationDoesNotApply called from SslContextProviderSupplier.updateSslContext.
Summary of Changes:
Enhanced
UpstreamTlsContext(EnvoyServerProtoData.java):UpstreamTlsContext. Previously, it relied on the base class which only compared the commonTlsContext, causing different SNI or auto-validation settings to incorrectly share the same cache entry.Centralized Validation Logic in
TlsContextManagerImplfindOrCreateClientSslContextProviderto accept theautoSniSanValidationDoesNotApplyflag.autoSniSanValidationis forced to false.This ensures the cache key reflects the effective validation behavior, allowing subchannels with different hostname/IP types to correctly share or separate their
SslContextProviderinstances.Propagated Flag through
SslContextProviderSupplier:ClientSecurityHandlerdown to the manager and provider.Refined
DynamicSslContextProviderandCertProviderClientSslContextProvider:setAutoSniSanValidationDoesNotApplymethod and state.UpstreamTlsContextduring context computation.Strengthened Integration Tests (XdsSecurityClientServerTest.java)
tlsClientServer_autoSniValidation_noSniApplicable_usesMatcherFromCmnVdnCtxto enableCertificateUtils.useChannelAuthorityIfNoSniApplicable.Updated Test Suite & API Alignment