OAK-12093 Add maven-enforcer dependency convergence check#2959
OAK-12093 Add maven-enforcer dependency convergence check#2959thomasmueller wants to merge 8 commits into
Conversation
Commit-Check ✔️ |
reschke
left a comment
There was a problem hiding this comment.
There are a few good things here that should get their own tickets.
Some other suggestions are based on a misunderstanding why we embed different versions.
Some "fix" things that will auto-resolve anyway once we upgrade dependencies.
Most of the remaining changes just add noise to the parent pom.
| <dependency> | ||
| <groupId>com.google.guava</groupId> | ||
| <artifactId>guava</artifactId> | ||
| <version>33.1.0-jre</version> |
There was a problem hiding this comment.
This depencency is embedded intentionally. It matches the version used by the Azure SDK. Updating it could be possible, but Azure support doesn't have adequate test coverage over here, so I would leave that to someone more knowledgeable of the SDK.
For that matter, this dependency will go away when the SDK 8 use is gone.
There was a problem hiding this comment.
It looks like embedded deps don't cause OSGi conflicts at runtime, yes, but dependencyConvergence runs at Maven dependency resolution time. Maven still needs to pick a single version for compilation and test.
It looks like we still need to pin the version, because the tests don't use OSGi. Oak-run also doesn't use OSGi, and some of the jobs we need (eg. for indexing) do not use OSGi, so I think it's important that we test this as well. If tests with OSGi (using a different version) are needed, then we could do that e.g. in oak-it-osgi.
And for OSGi, not every bundle embeds Guava; some import it via Import-Package.
Pinning is also a way to document things; otherwise Maven would chose silently (and somewhat arbitrarily) some version and we don't know which one.
There was a problem hiding this comment.
It seems there is a misunderstanding here.
Embedded dependencies which are not exported are not visible from the POV of OSGi, and thus will not conflict. That's one of the reasons why this is used here.
We need that Guava version right now. There's zero reason to pin the version. It's set to the version the Azure SDK wants (and the SDK will go away anyway).
| <dependency> | ||
| <groupId>com.googlecode.json-simple</groupId> | ||
| <artifactId>json-simple</artifactId> | ||
| <version>1.1</version> |
There was a problem hiding this comment.
Yes, but should be a separate ticket; I can do that.
There was a problem hiding this comment.
That said: that library has been unmaintained for 12 years now; so it would be better to look for a replacement.
There was a problem hiding this comment.
| <artifactId>caffeine</artifactId> | ||
| <version>3.1.8</version> | ||
| </dependency> | ||
| <!-- guava:33.5.0-jre pulls 2.41.0; caffeine:3.1.8 pulls 2.21.1 --> |
There was a problem hiding this comment.
-
Guava will go anyway.
-
It would be good to understand why that shows up as a dependency; given it's nature as a code checking tool. It might be better to suppress it in the place where we use Caffeine.
In any case, if it ever disappears as a Caffeine dependeny, this will leave noise over here.
There was a problem hiding this comment.
if it ever disappears as a Caffeine dependeny, this will leave noise over here.
We could document this.
There was a problem hiding this comment.
Where?
ChatGPT points out that these are annotations and that they are used in the Caffeine API; so they are only relevant at Oak compile time. It might be possbile to just exclude them, and that's what we should try first.
| <version>3.0.2</version> | ||
| </dependency> | ||
| <!-- azure-keyvault-core:1.2.6 pulls guava:30.1.1-jre, conflicting with the direct guava | ||
| deps in oak-blob-cloud-azure and oak-segment-azure. Additionally, in reactor builds |
There was a problem hiding this comment.
It doesn't conflict, because these are embeds. One reason for being embeds is to avoid conflicts.
There was a problem hiding this comment.
See my first comments about embeds
| deps in oak-blob-cloud-azure and oak-segment-azure. Additionally, in reactor builds | ||
| Maven uses oak-shaded-guava's original pom.xml rather than dependency-reduced-pom.xml, | ||
| so guava:33.5.0-jre also appears as a transitive dep (making guava optional in | ||
| oak-shaded-guava would eliminate that false positive, but not the azure-keyvault conflict). --> |
There was a problem hiding this comment.
we are removing Guava.
Yes, but it is not yet removed. The following might be interesting:
"making guava optional in oak-shaded-guava would eliminate that false positive"
There was a problem hiding this comment.
Was that an AI? In any case it's completely ignorant of what this module is.
| <version>5.13.0</version> | ||
| </dependency> | ||
| <!-- jackson-dataformat-xml:2.19.4 (via oak-segment-azure) pulls 7.1.1; tika-parsers:1.28.5 pulls 6.3.1 --> | ||
| <dependency> |
There was a problem hiding this comment.
But we didn't do it yet.
| <groupId>org.apache.lucene</groupId> | ||
| <artifactId>lucene-core</artifactId> | ||
| </exclusion> | ||
| <!-- jackrabbit-core:2.22.3 (via jackrabbit-parent) declares tika.version=2.4.1, |
There was a problem hiding this comment.
Actually, the problem was that there was embedding, and that was caused by a plugin version change (probably fixing a bug).
There was a problem hiding this comment.
Not sure what you mean.
There was a problem hiding this comment.
The issue was caused by the change of a maven bundle. Before the change it did not include Tika. That's what changed (and that's why the aforementioned issue did not come up ages ago).
| + 38 MB. Initial value. Current 35MB plus a 10% | ||
| --> | ||
| <max.jar.size>92274688</max.jar.size> | ||
| <max.jar.size>93323264</max.jar.size> |
There was a problem hiding this comment.
why did it increase? (just trying to understand)
There was a problem hiding this comment.
Because with the pinned version (newer versions) it was too large.
| <groupId>com.google.guava</groupId> | ||
| <artifactId>guava</artifactId> | ||
| <version>33.1.0-jre</version> | ||
| </dependency> |
There was a problem hiding this comment.
No. Embedded for good reasons.
There was a problem hiding this comment.
See the first comment about embeds.
Remember the idea of this PR is to avoid issues like the problem that occured with Tika, by adding enforce-dependency-convergence. It turns out, dependencies are very complicated, and I'm trying to make the smallest possible PR that can add dependency-convergence. I do not think anything in this PR is not needed to make that happen. |
| <dependency> | ||
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-pool2</artifactId> | ||
| <version>2.12.0</version> | ||
| </dependency> | ||
| <!-- oak-auth-ldap has a direct dep on 2.1.12; api-all:2.0.1 pulls 2.1.3 --> | ||
| <dependency> | ||
| <groupId>org.apache.mina</groupId> | ||
| <artifactId>mina-core</artifactId> | ||
| <version>2.1.12</version> | ||
| </dependency> |
There was a problem hiding this comment.
Both of them would go away once we update the org.apache.directory.api/api-all in oak-auth-ldap.
There was a problem hiding this comment.
FWIW, this is another case of intended embedding.
@rishabhdaim - can you open a ticket for the directory update?
There was a problem hiding this comment.
rishabhdaim
left a comment
There was a problem hiding this comment.
AFAIK, embedded dependencies shouldn't cause conflicts. Shouldn't we avoid them here.
| <!-- aws-java-sdk-s3:1.12.791 pulls 2.12.7; edu.ucar:udunits (via tika-parsers) pulls 2.2 --> | ||
| <dependency> | ||
| <groupId>joda-time</groupId> | ||
| <artifactId>joda-time</artifactId> | ||
| <version>2.12.7</version> | ||
| </dependency> |
There was a problem hiding this comment.
IIUC, since it is embedded in oak-segment-aws, it shouldn't cause any conflict.
There was a problem hiding this comment.
I'll first try to make the build work, and then let me try removing it, and then we will see what happens :-)
There was a problem hiding this comment.
See my very first comment about embeds.
| <dependency> | ||
| <groupId>com.amazonaws</groupId> | ||
| <artifactId>aws-java-sdk-core</artifactId> | ||
| <version>1.12.791</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>com.amazonaws</groupId> | ||
| <artifactId>aws-java-sdk-dynamodb</artifactId> | ||
| <version>1.12.791</version> | ||
| </dependency> |
There was a problem hiding this comment.
dynamodb-lock-client is also embedded in oak-segment-aws.
There was a problem hiding this comment.
See my very first comment about embeds.
| <artifactId>aws-java-sdk-dynamodb</artifactId> | ||
| <version>1.12.791</version> | ||
| </dependency> | ||
| <!-- awssdk:netty-nio-client:2.34.9 pulls 4.1.126.Final; azure-core-http-netty:1.14.1 pulls 4.1.101.Final --> |
There was a problem hiding this comment.
both of them are embedded, can't understand how they can cause conflicts.
There was a problem hiding this comment.
See my very first comment about embeds.
|
Observation: there is a lot back and forth about embedding. It doesn't seem to be constructive in this context. Embedding can be problematic, in particular as it makes components harder to update, and sometimes APIs of embedded code leaks into the exports. Thus they should be reviewed; optimally by those who added them. We there's a reason to think an embed is problematic, we should open a ticket. |
|
|
@thomasmueller one key observation, now we have version fixed in oak-parent/pom.xml, but they are still added with a version field. Would we be opening new tickets to remove versions from individual poms OR we need to do it here? For e.g : Now Netty's are included in oak-parent/pom.xml, but are still referenced in 4 separate poms with the version field. |
There was a problem hiding this comment.
Some of the findings here are useful and we should address them separately.
Overall this would be a very bad change that is hard to maintain and really does not fix any problem.
The ticket says "add dependency check"; what's completely missing is a discussion what the benefits and drackbacks would be. In particular we apparently have different views about (1) what embeds are for and (2) how they actually work. I have tried to explain it here, but obviously failed to. We need to have a common understandig first before it even makes sense to dicuss the details of the ticket.
EDIT: in particular: what's completely missing is an explanation why diverging versions in test dependencies are actually a technical problem, and why adding more information to the POMs is useful here.
Consider: test dependencies A and B with transitive dependency on C. If we "pin" C's version to something that is "good" for the current versions of A and B, who's going to review/update it when we update to newer versions of A or B?
| <dependency> | ||
| <groupId>com.googlecode.json-simple</groupId> | ||
| <artifactId>json-simple</artifactId> | ||
| <version>1.1</version> |
There was a problem hiding this comment.
That said: that library has been unmaintained for 12 years now; so it would be better to look for a replacement.
| <dependency> | ||
| <groupId>com.googlecode.json-simple</groupId> | ||
| <artifactId>json-simple</artifactId> | ||
| <version>1.1</version> |
There was a problem hiding this comment.
| <dependency> | ||
| <groupId>com.google.guava</groupId> | ||
| <artifactId>guava</artifactId> | ||
| <version>33.1.0-jre</version> |
There was a problem hiding this comment.
It seems there is a misunderstanding here.
Embedded dependencies which are not exported are not visible from the POV of OSGi, and thus will not conflict. That's one of the reasons why this is used here.
We need that Guava version right now. There's zero reason to pin the version. It's set to the version the Azure SDK wants (and the SDK will go away anyway).
| <artifactId>caffeine</artifactId> | ||
| <version>3.1.8</version> | ||
| </dependency> | ||
| <!-- guava:33.5.0-jre pulls 2.41.0; caffeine:3.1.8 pulls 2.21.1 --> |
There was a problem hiding this comment.
Where?
ChatGPT points out that these are annotations and that they are used in the Caffeine API; so they are only relevant at Oak compile time. It might be possbile to just exclude them, and that's what we should try first.
| deps in oak-blob-cloud-azure and oak-segment-azure. Additionally, in reactor builds | ||
| Maven uses oak-shaded-guava's original pom.xml rather than dependency-reduced-pom.xml, | ||
| so guava:33.5.0-jre also appears as a transitive dep (making guava optional in | ||
| oak-shaded-guava would eliminate that false positive, but not the azure-keyvault conflict). --> |
There was a problem hiding this comment.
Was that an AI? In any case it's completely ignorant of what this module is.
| <groupId>org.apache.lucene</groupId> | ||
| <artifactId>lucene-core</artifactId> | ||
| </exclusion> | ||
| <!-- jackrabbit-core:2.22.3 (via jackrabbit-parent) declares tika.version=2.4.1, |
There was a problem hiding this comment.
The issue was caused by the change of a maven bundle. Before the change it did not include Tika. That's what changed (and that's why the aforementioned issue did not come up ages ago).
| <!-- oak-auth-ldap has a direct dep on 2.1.12; api-all:2.0.1 pulls 2.1.3 --> | ||
| <dependency> | ||
| <groupId>org.apache.mina</groupId> | ||
| <artifactId>mina-core</artifactId> |
There was a problem hiding this comment.
We need to discuss this somewhere else, apparently.
| <dependency> | ||
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-pool2</artifactId> | ||
| <version>2.12.0</version> | ||
| </dependency> | ||
| <!-- oak-auth-ldap has a direct dep on 2.1.12; api-all:2.0.1 pulls 2.1.3 --> | ||
| <dependency> | ||
| <groupId>org.apache.mina</groupId> | ||
| <artifactId>mina-core</artifactId> | ||
| <version>2.1.12</version> | ||
| </dependency> |
There was a problem hiding this comment.
FWIW, this is another case of intended embedding.
@rishabhdaim - can you open a ticket for the directory update?
|
Embeds: see, for instance: https://gemini.google.com/share/577ed3522d94 |
|
FWIW: It would be useful to exclude test dependencies for now, and see how this affects the outcome. |
mbaedke
left a comment
There was a problem hiding this comment.
I agree that it is unfortunate that we can't have OSGi's classloader separation in oak-run and other standalone tooling.
But the main concern should be to test each module with the dependency version it actually uses in an OSGi deployment.
If that version is embedded, we can't just blindly change it.
That would be the task of the maintainer of the module who actually knows what they are doing.
That specific version may have been embedded for a reason, changing it is not low risk.
If we have a different version in e.g. oak-run and that is considered an issue, we should have tests for that issue in oak-run.
If a dependency is not embedded and used in multiple modules, then it makes sense to centralize it.
If it's just used in a single module, we should not centralize it, because the responsibility for it should be kept with the maintainer of that module.
| <plugin> | ||
| <groupId>org.apache.maven.plugins</groupId> | ||
| <artifactId>maven-enforcer-plugin</artifactId> | ||
| <version>3.5.0</version> |
There was a problem hiding this comment.
| <version>3.5.0</version> |
| </dependency> | ||
| <!-- guava:33.5.0-jre pulls 2.41.0; caffeine:3.1.8 pulls 2.21.1 --> | ||
| <dependency> | ||
| <groupId>com.google.errorprone</groupId> |
There was a problem hiding this comment.
Not needed.
If we really care, the simpler way would be to exclude the dependency oak-core-spi's caffeine dependency.
And yes, I could create a ticket and a PR if desired.



Update: right now the build fails. I will try to resolve the failures first, and then we can discuss about the changes. It might not make sense to discuss changes right now.
Adds the maven-enforcer-plugin to oak-parent with the dependencyConvergence rule, which fails the build if any dependency appears at more than one version in a module's resolved dependency tree. This catches transitive version conflicts at build time rather than at runtime.
Running the enforcer across all modules revealed 20 existing version conflicts. These are fixed by adding explicit version pins to dependencyManagement in oak-parent, with comments on each pin identifying the two conflicting sources so they can be removed when the underlying dependencies are upgraded.
A few modules had direct dependencies with stale explicit versions that were the root cause of their conflict; those versions are updated in place:
A comment is also added to the existing tika-core exclusion in oak-run-commons explaining why the exclusion is necessary alongside the pin.
The Guava pin points to a bigger issue with different modules requiring different version of Guava. It should be resolved. The root cause is that azure-keyvault-core:1.2.6 is quite old and brings in a significantly outdated Guava (30.1.1-jre vs 33.5.0-jre). The pin papers over that mismatch. The real fix would be to upgrade azure-keyvault-core or replace it, since that library has been deprecated in favour of the modern com.azure:azure-keyvault-keys / azure-keyvault-secrets SDK. Once that dependency is gone, the Guava pin would likely become unnecessary.