runsc: bind-mount host /proc/driver/nvidia for CDI createContainer hooks#13284
Open
a7i wants to merge 1 commit into
After google#13034 enabled createContainer hooks in CDI specs, three of four NVIDIA CDI hooks emitted by k8s-device-plugin succeed inside the gofer (create-symlinks, enable-cuda-compat, update-ldcache). The fourth -- disable-device-node-modification -- still fails because it opens /proc/driver/nvidia/params inside the container's rootfs and bind-mounts a modified copy over it; that path does not exist in containerRootFs at hook time because procfs is not mounted (the sentry serves /proc itself later):

    nvidia-ctk hook disable-device-node-modification stderr: failed to mount modified params file: open o_path procfd: open /run/containerd/.../rootfs/proc/driver/nvidia/params: no such file or directory
    FATAL ERROR: error executing CreateContainer hooks

Under runc, /proc is mounted into the container's mount namespace before createContainer hooks run, so the hook just works. Mirror that here for the gofer: when nvproxy is enabled, bind-mount the host's /proc/driver/nvidia directory onto containerRootFs/proc/driver/nvidia (read-only) before invoking hooks.

The hook's semantic effect (set ModifyDeviceFiles=0 to prevent libnvidia-ml from auto-creating extra /dev/nvidiaN nodes) does not apply under gVisor -- nvproxy mediates all device access and the sentry owns /dev -- but the hook needs to be able to *complete* for sandbox creation to proceed.

Fixes google#13283
Fixes #13283.
What this does
After #13034 enabled CDI createContainer hooks in the gofer, three of the four NVIDIA hooks emitted by k8s-device-plugin (create-symlinks, enable-cuda-compat, update-ldcache) now succeed inside the gofer. The fourth, disable-device-node-modification, still fails because it bind-mounts a modified params file over <containerRootFs>/proc/driver/nvidia/params, and that path doesn't exist in containerRootFs at hook time (procfs is not mounted there; the sentry serves /proc itself later).
This change bind-mounts the host's /proc/driver/nvidia directory (read-only) onto containerRootFs/proc/driver/nvidia when nvproxy is enabled, right after SetupDev and before hooks execute. That mirrors what runc gets for free from mounting procfs into the container before createContainer hooks run.
The hook's semantic effect (set ModifyDeviceFiles=0 so in-container libnvidia-ml won't auto-create extra /dev/nvidiaN nodes) doesn't apply under gVisor because nvproxy mediates all device access and the sentry owns /dev. The hook just needs to be able to complete so sandbox creation proceeds, which this fix achieves.
Sequence
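The ordering described above (SetupDev, then the /proc/driver/nvidia bind mount when nvproxy is enabled, then createContainer hooks) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: setupNvidiaProcDriver, prepareRootfs, and the mountFn callback are hypothetical names, and the bind mount is injected as a callback so the sketch runs without root. A real implementation would presumably perform a bind mount followed by a read-only remount at that point.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// mountFn stands in for the read-only bind mount (hypothetical; a real
// implementation would call mount(2) with MS_BIND and then remount read-only).
type mountFn func(src, dst string) error

// setupNvidiaProcDriver is a hypothetical sketch: it exposes hostDir at
// <rootfs>/proc/driver/nvidia, or does nothing if hostDir does not exist.
func setupNvidiaProcDriver(hostDir, rootfs string, bindRO mountFn) error {
	if _, err := os.Stat(hostDir); err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			return nil // no NVIDIA driver loaded on the host: skip
		}
		return fmt.Errorf("stat %s: %w", hostDir, err)
	}
	dst := filepath.Join(rootfs, "proc", "driver", "nvidia")
	if err := os.MkdirAll(dst, 0o755); err != nil {
		return fmt.Errorf("creating mount point %s: %w", dst, err)
	}
	return bindRO(hostDir, dst)
}

// prepareRootfs records the order of steps: device setup, the optional
// NVIDIA proc mount, then hook execution (hooks themselves are not run here).
func prepareRootfs(nvproxyEnabled bool, hostDir, rootfs string, bindRO mountFn) ([]string, error) {
	steps := []string{"SetupDev"}
	if nvproxyEnabled {
		if err := setupNvidiaProcDriver(hostDir, rootfs, bindRO); err != nil {
			return steps, err
		}
		steps = append(steps, "SetupNvidiaProcDriver")
	}
	steps = append(steps, "createContainer hooks")
	return steps, nil
}

func main() {
	base, err := os.MkdirTemp("", "sketch")
	if err != nil {
		fmt.Println("mkdtemp:", err)
		return
	}
	defer os.RemoveAll(base)

	// Fake "host" driver directory and container rootfs under a temp dir.
	hostDir := filepath.Join(base, "host-proc-driver-nvidia")
	rootfs := filepath.Join(base, "rootfs")
	if err := os.MkdirAll(hostDir, 0o755); err != nil {
		fmt.Println("mkdir:", err)
		return
	}

	// Record the requested mount instead of performing it.
	record := func(src, dst string) error {
		fmt.Printf("would bind-mount (ro) %s -> %s\n", src, dst)
		return nil
	}
	steps, err := prepareRootfs(true, hostDir, rootfs, record)
	fmt.Println(steps, err)
}
```

Injecting the mount as a callback keeps the ordering and the skip logic checkable on any machine; only the callback needs privileges in a real run.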
Tested
Verified on a Tesla T4 host (Ubuntu 22.04, kernel 6.8, containerd 2.2.2, kubelet 1.35) with k8s-device-plugin in DEVICE_LIST_STRATEGY=cdi-annotations mode and runsc release-20260520.0 (pre-patch baseline). Pre-patch, the gofer log shows the disable-device-node-modification hook failing with "error executing CreateContainer hooks".
With this patch applied, all four hooks log "Execute hook success!", the sandbox starts, and a PyTorch CUDA pod reports "cuda available: True" end-to-end.
Tests
TestSetupNvidiaProcDriverNoHostDriver covers the graceful-skip path (host with no NVIDIA driver loaded) using a tmpdir as the rootfs.
Risks
- Exposes the host's /proc/driver/nvidia/ (kernel-provided NVIDIA driver metadata) read-only. Same surface the host driver already exposes to userspace; nothing privileged is added.
- Gated on specutils.NVProxyEnabled(spec, conf).
- On a host without the NVIDIA driver, os.Stat of /proc/driver/nvidia returns ENOENT → function returns nil cleanly.
- The if rootfsConf.ShouldUseLisafs() check applies only to the hook execution itself, but SetupNvidiaProcDriver is called outside that branch (alongside SetupDev). An EROFS rootfs is still covered: os.MkdirAll fails → fatal, with a clear error; that matches the existing SetupDev behavior and the EROFS hook caveat already documented by #13034 (feat: Support running createContainer hooks in CDI spec).
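The two failure-handling cases listed above (ENOENT on the host directory means skip; a failing os.MkdirAll means a fatal error) can be exercised without a GPU or root. The sketch below is an assumption-laden illustration, not the PR's test: mountPoint, checkSkip, and checkMkdirFailure are hypothetical names, and a regular file placed at <rootfs>/proc stands in for a read-only (EROFS) rootfs, since both make os.MkdirAll fail.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// mountPoint (hypothetical) decides what to do before bind-mounting hostDir:
// it returns skip=true when hostDir is absent, otherwise it creates and
// returns <rootfs>/proc/driver/nvidia, or an error if that cannot be created.
func mountPoint(hostDir, rootfs string) (string, bool, error) {
	if _, err := os.Stat(hostDir); err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			return "", true, nil
		}
		return "", false, fmt.Errorf("stat %s: %w", hostDir, err)
	}
	dst := filepath.Join(rootfs, "proc", "driver", "nvidia")
	if err := os.MkdirAll(dst, 0o755); err != nil {
		return "", false, fmt.Errorf("creating mount point %s: %w", dst, err)
	}
	return dst, false, nil
}

// checkSkip uses a temp dir as the rootfs and a host path that does not
// exist; it returns nil if the skip path leaves the rootfs untouched.
func checkSkip() error {
	rootfs, err := os.MkdirTemp("", "rootfs")
	if err != nil {
		return err
	}
	defer os.RemoveAll(rootfs)

	_, skip, err := mountPoint(filepath.Join(rootfs, "no-such-driver"), rootfs)
	if err != nil {
		return err
	}
	if !skip {
		return errors.New("expected skip when host dir is missing")
	}
	if _, err := os.Stat(filepath.Join(rootfs, "proc")); !errors.Is(err, fs.ErrNotExist) {
		return errors.New("rootfs was modified on the skip path")
	}
	return nil
}

// checkMkdirFailure returns nil if mountPoint reports an error when the
// mount point cannot be created (simulated with a file at <rootfs>/proc).
func checkMkdirFailure() error {
	base, err := os.MkdirTemp("", "rootfs")
	if err != nil {
		return err
	}
	defer os.RemoveAll(base)

	hostDir := filepath.Join(base, "host")
	rootfs := filepath.Join(base, "rootfs")
	if err := os.MkdirAll(hostDir, 0o755); err != nil {
		return err
	}
	if err := os.MkdirAll(rootfs, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(rootfs, "proc"), nil, 0o644); err != nil {
		return err
	}
	if _, _, err := mountPoint(hostDir, rootfs); err == nil {
		return errors.New("expected an error when the mount point cannot be created")
	}
	return nil
}

func main() {
	fmt.Println("skip path check:", checkSkip())
	fmt.Println("mkdir failure check:", checkMkdirFailure())
}
```

Each check returns nil when the behavior matches the description, so the same helpers could be called from a regular Go test.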