overlay: tolerate non-must-copy xattr failures during copy-up#13262
overlay: tolerate non-must-copy xattr failures during copy-up#13262drewmacneil wants to merge 1 commit into
Conversation
ayushr2
left a comment
There was a problem hiding this comment.
Thanks for the bug report. I can write the fix. I will add you as a co-author.
| const char kSelinuxLabel[] = "system_u:object_r:container_ro_file_t:s0"; | ||
| if (setxattr(lower_subdir.c_str(), "security.selinux", kSelinuxLabel, | ||
| sizeof(kSelinuxLabel) - 1, 0) < 0) { | ||
| SKIP_IF(errno == EOPNOTSUPP || errno == ENOTSUP); |
There was a problem hiding this comment.
I think this test will effectively not run in gVisor. I think tmpfs will also return EOPNOTSUPP for this:
gvisor/pkg/sentry/vfs/permissions.go
Line 324 in 85b606a
| // abort the copy-up. Mirrors Linux's fs/overlayfs/util.c:ovl_must_copy_xattr. | ||
| func mustCopyXattr(name string) bool { | ||
| return name == "system.posix_acl_access" || | ||
| name == "system.posix_acl_default" || | ||
| strings.HasPrefix(name, linux.XATTR_USER_PREFIX) | ||
| } |
There was a problem hiding this comment.
Linux's fs/overlayfs/util.c:ovl_must_copy_xattr() function actually does not include XATTR_USER_PREFIX. It includes XATTR_SECURITY_PREFIX.
| // without CAP_SYS_ADMIN; ENODATA covers an xattr that | ||
| // disappears between ListXattrAt and GetXattrAt. | ||
| if !mustCopyXattr(name) && (linuxerr.Equals(linuxerr.EOPNOTSUPP, err) || | ||
| linuxerr.Equals(linuxerr.EPERM, err) || |
There was a problem hiding this comment.
Why do we ignore the EPERM here? Linux doesn't ignore it.
| GTEST_SKIP() << "upper filesystem accepts POSIX ACLs; cannot test " | ||
| "must-copy failure"; |
There was a problem hiding this comment.
This test is also moot in gVisor since we don't support system.posix_acl_default xattr.
ayushr2
left a comment
There was a problem hiding this comment.
#13289 supersedes this
@drewmacneil does it fix your reproducer?
Summary
Fixes
mkdir,open(O_CREAT), and similar operations failing withEOPNOTSUPPinside the sandbox when the lower-layer filesystem exposes a non-must-copysecurity.*xattr (e.g.security.selinuxon a container rootfs labeled by Docker/containerd on an SELinux-enabled host).copyXattrsLockedinpkg/sentry/fsimpl/overlay/copy_up.goiterates lower-layer xattrs and copies each to upper. Until 4e559c2af ("gofer: Add full xattr support"), the gofer'sListXattrAtwas a stub returningEOPNOTSUPP, so the loop's function-level early-return short-circuited it and copy-up never saw real host xattrs. After 4e559c2, the gofer callsunix.Flistxattron the host file, so copy-up now sees names likesecurity.selinux. The VFS permission check (pkg/sentry/vfs/permissions.go:317-324) rejects writes tosecurity.*other thansecurity.capabilitywithEOPNOTSUPP, and the previous unconditionalreturn errin the loop propagated that error all the way out to userspace. The "should not regress" claim in 4e559c2's commit message consideredGetXattrAtforuser.overlay.opaquelookups, not the newListXattrAtexposing arbitrary host xattrs to copy-up.This change matches Linux's
fs/overlayfs/copy_up.c:ovl_copy_xattr, which tolerates per-xattr failures unless the xattr is inovl_must_copy_xattr(POSIX ACLs anduser.*). For SELinux specifically, Linux relies on the LSM to re-label the upper file independently rather than copying the xattr verbatim — gVisor's tmpfs upper has no SELinux LSM, so the xattr is simply absent on upper, which matches Linux behavior with SELinux disabled.Repro
Triggered by any host filesystem that exposes a
security.*xattr (other thansecurity.capability) on the lower layer. SELinux labeling of container rootfs files is the common in-the-wild trigger — every file under a containerd-managed bundle on Amazon Linux 2023 carriessecurity.selinux="system_u:object_r:container_ro_file_t:s0". Observed sentry log and strace excerpt:The user-visible symptom is
mkdir(2)(or any operation that triggers copy-up of the parent) returningEOPNOTSUPP, after which the in-sandbox process typically exits because it can't create its workspace.Test
Two regression tests in
test/syscalls/linux/mount.cc:OverlayfsCopyUpSkipsUnsupportedSecurityXattr— setssecurity.selinuxanduser.test_markeron a lower directory, mounts overlay with a tmpfs upper, triggers copy-up viamkdiron a child path, and asserts thatmkdirsucceeds, the must-copyuser.test_markerwas copied to upper, and the skippedsecurity.selinuxwas not. Without the fix,mkdirfails withEOPNOTSUPP.OverlayfsCopyUpFailsForMustCopyXattr— guards the must-copy boundary. Setssystem.posix_acl_defaulton a lower directory and triggers copy-up against an upper that rejects POSIX ACLs (probed at setup time). Asserts copy-up still aborts as before, so the fix isn't blanket-permissive.Verified with:
bazel test //test/syscalls:mount_test_runsc_systrap --test_filter="*OverlayfsCopyUp*"bazel test //test/syscalls:mount_test_runsc_ptrace --test_filter="*OverlayfsCopyUp*"bazel test //pkg/sentry/fsimpl/overlay:overlay_testAssisted-by: Claude Opus 4.7