Skip to content

validate_usr_symlinks: match type=link as a whole mtree field#2099

Open
MindflareX wants to merge 1 commit into
GoogleContainerTools:mainfrom
MindflareX:fix/validate-usr-symlinks-substring-match
Open

validate_usr_symlinks: match type=link as a whole mtree field#2099
MindflareX wants to merge 1 commit into
GoogleContainerTools:mainfrom
MindflareX:fix/validate-usr-symlinks-substring-match

Conversation

@MindflareX
Copy link
Copy Markdown

Summary

The aspect's awk script in private/util/validate_usr_symlinks.awk used $0 !~ /type=link/ to decide whether an entry at /bin, /sbin, /lib, /lib32, /lib64, or /libx32 was a symlink. Because the pattern was a plain substring match, any mtree field value elsewhere on the line that happened to contain the literal string type=link (e.g. uname=type=link, gname=type=link, a flags value spelled the same way) would satisfy the check, and the aspect would silently accept a non-symlink at that path — breaking the merged-usr invariant that the aspect exists to enforce.

A symmetric issue existed in the link= match used to verify the symlink target, which could in principle be anchored against the trailing characters of a prior field rather than a true field boundary.

This PR anchors both regexes to whitespace boundaries so they operate on whole mtree fields and adds regression tests.

Reproduction (against the unfixed script)

$ printf './bin type=dir uname=type=link gname=root mode=0755 uid=0 gid=0\n' \
    | gawk -v validation_output_file=/dev/null -f private/util/validate_usr_symlinks.awk
$ echo $?
0

The script reports no violation even though ./bin is a directory, not a symlink to usr/bin. With this PR:

$ ./bazel-bin/private/util/validate_usr_symlinks_test
VIOLATION: ./bin is not a symlink (must link to usr/bin) violates usr-merge convention.
./bin
$ echo $?
1

Risk assessment

For images built from the snapshots in private/repos/deb/*.lock.json this is defense-in-depth: Debian-supplied .debs do not carry uname/gname/flags values shaped like type=link, so the bypass cannot be reached from current sources. The fix matters mainly because the aspect's stated job is to catch merged-usr-violating layers regardless of how they end up in the input tars (locally-built tars, custom packages, future non-Debian sources).

Test plan

  • private/util/validate_usr_symlinks_test.sh passes locally (existing cases + 5 new regression cases).
  • Each new regression case fails with the pre-PR awk and passes with the post-PR awk.
  • CI green.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the robustness of the symlink validation logic in the AWK script by replacing naive substring matches for type=link and link= with field-bounded regex patterns. This prevents false negatives when metadata fields like uname or gname contain the string 'type=link'. Corresponding regression tests were added to verify the fix. A review comment suggests using the [[:space:]] character class consistently when extracting the link target to ensure all POSIX-defined whitespace characters are handled correctly.

Comment thread private/util/validate_usr_symlinks.awk Outdated
if ($0 !~ /[[:space:]]type=link([[:space:]]|$)/) {
VIOLATIONS[original_path] = original_path " is not a symlink (must link to " expected[path] ")"
} else if (match($0, / link=([^ \t]+)/, dest) && dest[1] != expected[path]) {
} else if (match($0, /[[:space:]]link=([^ \t]+)/, dest) && dest[1] != expected[path]) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with the use of [[:space:]] in the type=link check on line 27, consider using [^[:space:]]+ to match the link target. This ensures that any POSIX-defined whitespace character is correctly treated as a field boundary when extracting the link destination, making the regex more robust and consistent with the preceding check.

        } else if (match($0, /[[:space:]]link=([^[:space:]]+)/, dest) && dest[1] != expected[path]) {

@MindflareX MindflareX force-pushed the fix/validate-usr-symlinks-substring-match branch from 44fbbf2 to 143878f Compare May 24, 2026 10:19
The aspect's awk script used `$0 !~ /type=link/` to decide whether an
entry at /bin, /sbin, /lib, /lib32, /lib64, or /libx32 was a symlink.
Because the pattern was a plain substring match, any mtree field value
elsewhere on the line that happened to contain the literal string
"type=link" (for example `uname=type=link`, `gname=type=link`, or a
`flags` value spelled the same way) would satisfy the check and the
aspect would silently accept a non-symlink at that path. A symmetric
issue existed in the ` link=` match, which could be anchored on the
prior field's trailing characters rather than a field boundary.

Anchor both regexes to whitespace boundaries so the check operates on
whole mtree fields, and add regression test cases that fail under the
old script and pass under the new one.

Signed-off-by: MindflareX <94793217+MindflareX@users.noreply.github.com>
@MindflareX MindflareX force-pushed the fix/validate-usr-symlinks-substring-match branch from 143878f to 8fac8bd Compare May 24, 2026 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant