runsc: bind-mount host /proc/driver/nvidia for CDI createContainer hooks #13284

Open
a7i wants to merge 1 commit into google:master from a7i:a7i/nvidia-procfs-for-cdi-hooks
@a7i a7i commented May 26, 2026

Fixes #13283.

What this does

After #13034 enabled CDI createContainer hooks in the gofer, three of the four NVIDIA hooks emitted by k8s-device-plugin (create-symlinks, enable-cuda-compat, update-ldcache) succeed. The fourth, disable-device-node-modification, still fails: it bind-mounts a modified params file over <containerRootFs>/proc/driver/nvidia/params, and that path does not exist in containerRootFs at hook time because procfs is not mounted there (the sentry serves /proc itself later).

This change bind-mounts the host's /proc/driver/nvidia directory (read-only) onto containerRootFs/proc/driver/nvidia when nvproxy is enabled, right after SetupDev and before hooks execute. That mirrors what runc gets for free from mounting procfs into the container before createContainer hooks run.

The hook's semantic effect (set ModifyDeviceFiles=0 so in-container libnvidia-ml won't auto-create extra /dev/nvidiaN nodes) doesn't apply under gVisor because nvproxy mediates all device access and the sentry owns /dev. The hook just needs to be able to complete so sandbox creation proceeds — which this fix achieves.

Sequence

gofer setup (containerRootFs)
  SetupMounts          → libcuda / library bind-mounts
  SetupDev             → /dev/nvidia* cdevs
  SetupNvidiaProcDriver  ← NEW: bind-mount /proc/driver/nvidia (ro)
  ExecuteHooks         → all four CDI createContainer hooks now succeed
bind-mount containerRootFs → goferRootFs (existing behavior)
pivot_root into goferRootFs
sentry boots; serves its own /proc

Tested

Verified on a Tesla T4 host (Ubuntu 22.04, kernel 6.8, containerd 2.2.2, kubelet 1.35) with k8s-device-plugin in DEVICE_LIST_STRATEGY=cdi-annotations mode, using runsc release-20260520.0 as the pre-patch baseline. Before the patch, the gofer log shows:

hooks.go:63] Executing hook nvidia-ctk hook disable-device-node-modification
util.go:107] FATAL ERROR: error executing CreateContainer hooks
stderr: failed to mount modified params file: open o_path procfd:
  open /run/containerd/.../rootfs/proc/driver/nvidia/params: no such file or directory

With this patch applied, all four hooks log "Execute hook success!", the sandbox starts, and a PyTorch CUDA pod reports "cuda available: True" end-to-end.

Tests

  • TestSetupNvidiaProcDriverNoHostDriver covers the graceful-skip path (host with no NVIDIA driver loaded) using a tmpdir as the rootfs.
  • Real bind-mount behavior is covered by the existing GPU-host integration tests; this change sits on the same code path they have exercised since #13034 (feat: Support running createContainer hooks in CDI spec).

Risks

  • Scope of the bind-mount: read-only, only the contents of /proc/driver/nvidia/ (kernel-provided NVIDIA driver metadata). Same surface the host driver already exposes to userspace; nothing privileged is added.
  • No-op when nvproxy is disabled: gated on specutils.NVProxyEnabled(spec, conf).
  • No-op when the host driver is not loaded: os.Stat of /proc/driver/nvidia returns ENOENT → function returns nil cleanly.
  • Compat with EROFS rootfs: only the hook execution itself is gated on if rootfsConf.ShouldUseLisafs(); SetupNvidiaProcDriver is called outside that branch, alongside SetupDev. On an EROFS rootfs, os.MkdirAll fails and the error is fatal with a clear message, which matches the existing SetupDev behavior and the EROFS hook caveat already documented by #13034 (feat: Support running createContainer hooks in CDI spec).

After google#13034 enabled createContainer hooks in CDI specs, three of four
NVIDIA CDI hooks emitted by k8s-device-plugin succeed inside the gofer
(create-symlinks, enable-cuda-compat, update-ldcache). The fourth --
disable-device-node-modification -- still fails because it opens
/proc/driver/nvidia/params inside the container's rootfs and bind-mounts a
modified copy over it; that path does not exist in containerRootFs at hook
time because procfs is not mounted (the sentry serves /proc itself later):

  nvidia-ctk hook disable-device-node-modification
  stderr: failed to mount modified params file: open o_path procfd:
    open /run/containerd/.../rootfs/proc/driver/nvidia/params:
    no such file or directory
  FATAL ERROR: error executing CreateContainer hooks

Under runc, /proc is mounted into the container's mount namespace before
createContainer hooks run, so the hook just works. Mirror that here for
the gofer: when nvproxy is enabled, bind-mount the host's
/proc/driver/nvidia directory onto containerRootFs/proc/driver/nvidia
(read-only) before invoking hooks.

The hook's semantic effect (set ModifyDeviceFiles=0 to prevent libnvidia-ml
from auto-creating extra /dev/nvidiaN nodes) does not apply under gVisor --
nvproxy mediates all device access and the sentry owns /dev -- but the
hook needs to be able to *complete* for sandbox creation to proceed.

Fixes google#13283

Development

Successfully merging this pull request may close these issues.

createContainer hook disable-device-node-modification fails: /proc/driver/nvidia/params not available in containerRootFs (gvisor#13034 follow-up)