TL;DR: Testcontainers only needs TAR streams for building images / copying files. Depending on commons-compress pulls in a large multi-archiver library (plus commons-lang3) and widens the CVE blast radius. I propose we vendor a minimal TAR-only implementation (effectively TarInputStream/TarOutputStream + small helpers) into Testcontainers until a suitable micro-library exists.
Motivation
We only use a tiny subset of functionality
In practice Testcontainers needs tar only (e.g., TarInputStream, TarOutputStream), not the whole zoo of formats supported by commons-compress.
CVE blast radius from “monolith” jar
commons-compress bundles many archivers in one jar. A vulnerability in any of them taints the entire artifact, which then taints Testcontainers and all of our users transitively, even if we never touch the vulnerable format.
Unwanted extra dependency (commons-lang3)
commons-compress depends on commons-lang3, which further increases footprint and attack surface for effectively no benefit in our use case.
Given this, Testcontainers can’t rely on upstream modularization to reduce our dependency and security footprint.
Proposal
Inline a minimal TAR implementation into Testcontainers, under an internal package (e.g., org.testcontainers.tar.*) with the following properties:
Scope: just what we need for Docker interactions: creating/reading TARs with support for directories, regular files, symlinks, file modes/exec bit, long names (PAX), and stable timestamps/UID/GID as required by Docker build contexts and copyToContainer.
Source: start from a clean, minimal implementation (or re-implement the few necessary pieces). If we adapt ALv2 code (e.g., a small subset of Commons Tar classes), we will:
preserve license headers,
include NOTICE updates,
rename packages to avoid conflicts,
keep the fork strictly minimal.
Tests: extend our integration tests to assert:
long file names and PAX headers work,
file permissions are preserved (incl. executable bit),
symlinks round-trip correctly,
the above holds in Windows/Linux/macOS scenarios,
build contexts produce images identical to those produced by the current implementation.
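To give a feel for how small the writing side of that scope could be, here is a rough sketch of a ustar writer. Everything in it is an assumption made for illustration: the class name MiniTarWriter, its methods, and the buildExample helper are hypothetical and are not existing Testcontainers or commons-compress API. It covers only directories, regular files, and symlinks with fixed uid/gid/mtime; PAX long names and the reading side are deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical minimal ustar writer (illustration only, not an existing API). */
final class MiniTarWriter implements AutoCloseable {
    private static final int BLOCK = 512;
    private final OutputStream out;

    MiniTarWriter(OutputStream out) {
        this.out = out;
    }

    void addDirectory(String name, int mode) throws IOException {
        writeHeader(name.endsWith("/") ? name : name + "/", 0, mode, '5', "");
    }

    void addFile(String name, byte[] data, int mode) throws IOException {
        writeHeader(name, data.length, mode, '0', "");
        out.write(data);
        int remainder = data.length % BLOCK;
        if (remainder != 0) {
            out.write(new byte[BLOCK - remainder]); // pad data to a full block
        }
    }

    void addSymlink(String name, String target) throws IOException {
        writeHeader(name, 0, 0777, '2', target);
    }

    /** Writes the end-of-archive marker (two zero blocks) and closes the stream. */
    @Override
    public void close() throws IOException {
        out.write(new byte[2 * BLOCK]);
        out.close();
    }

    private void writeHeader(String name, long size, int mode, char type, String link)
            throws IOException {
        byte[] h = new byte[BLOCK];
        putString(h, 0, 100, name);
        putOctal(h, 100, 8, mode & 07777);
        putOctal(h, 108, 8, 0);  // uid fixed to 0 for reproducible output
        putOctal(h, 116, 8, 0);  // gid fixed to 0
        putOctal(h, 124, 12, size);
        putOctal(h, 136, 12, 0); // mtime fixed to the epoch
        Arrays.fill(h, 148, 156, (byte) ' '); // checksum field counts as spaces
        h[156] = (byte) type;
        putString(h, 157, 100, link);
        putString(h, 257, 6, "ustar"); // magic; the zeroed array supplies the NUL
        h[263] = '0';
        h[264] = '0';
        long sum = 0;
        for (byte b : h) {
            sum += b & 0xFF;
        }
        putOctal(h, 148, 7, sum); // six octal digits plus NUL; h[155] stays a space
        out.write(h);
    }

    private static void putString(byte[] h, int off, int len, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (b.length > len) {
            throw new IllegalArgumentException("needs a PAX header, not covered here: " + s);
        }
        System.arraycopy(b, 0, h, off, b.length);
    }

    /** Writes value as zero-padded octal digits followed by a NUL terminator. */
    private static void putOctal(byte[] h, int off, int len, long value) {
        String digits = Long.toOctalString(value);
        if (digits.length() > len - 1) {
            throw new IllegalArgumentException("value does not fit the field: " + value);
        }
        Arrays.fill(h, off, off + len - 1, (byte) '0');
        byte[] b = digits.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(b, 0, h, off + len - 1 - b.length, b.length);
        h[off + len - 1] = 0;
    }

    /** Builds a tiny in-memory archive: one directory, one executable file, one symlink. */
    static byte[] buildExample() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (MiniTarWriter tar = new MiniTarWriter(buffer)) {
            tar.addDirectory("ctx", 0755);
            tar.addFile("ctx/run.sh", "echo hi\n".getBytes(StandardCharsets.UTF_8), 0755);
            tar.addSymlink("ctx/link", "run.sh");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Fixing uid, gid, and mtime to zero is one way to get the stable metadata mentioned in the scope; a real implementation would probably make these configurable and would need the PAX extension for names longer than 100 bytes.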
An alternative would be to use the docker CLI directly instead of working with tar files manually; however, that looks like a longer-term solution.
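For comparison, the CLI-based alternative could look roughly like the sketch below, which delegates a file copy to `docker cp` rather than streaming a TAR through the API. The class and method names are hypothetical, and the sketch assumes a docker binary is available on the PATH.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper: delegates file copies to the docker CLI instead of building a TAR. */
final class DockerCliCopy {
    /** Builds the argument list for: docker cp hostPath container:containerPath */
    static List<String> copyCommand(String hostPath, String container, String containerPath) {
        return List.of("docker", "cp", hostPath, container + ":" + containerPath);
    }

    /** Runs the command and returns the CLI exit code; requires docker on the PATH. */
    static int copy(String hostPath, String container, String containerPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(copyCommand(hostPath, container, containerPath))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

This trades the archive dependency for a dependency on an installed CLI and on process handling, which is part of why it reads as a longer-term option.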
If there’s rough consensus, I can prepare a PoC PR with the minimal internal TAR code and an extended test suite to make this change safe.
Why not fix this upstream?
The Commons team has explicitly declined micro-modularization of commons-lang3: https://lists.apache.org/thread/9g1opd6l44dmck00b8gwg5qf1srngybl
They also can’t practically split commons-compress into per-format modules due to a Maven-specific issue:
https://lists.apache.org/thread/bwhsonqdq1f57hrfz3l6wy1gxrtssbsc
https://lists.apache.org/thread/q0bn38qtxkv4orx6o9lhtonjcxkbtw5f
Even removing the commons-compress => commons-lang3 dependency was not accepted: "Remove Commons Lang dependency", apache/commons-compress#607