feat: optionally prebuild NVIDIA GPU kernel module into VHD to cut provisioning time #8612

Draft
ganeshkumarashok wants to merge 2 commits into Azure:main from ganeshkumarashok:gpu-prebuild-kernel-module

@ganeshkumarashok
Contributor

Problem

On mainstream Ubuntu GPU SKUs the aks-gpu-cuda image is only pre-pulled into the VHD; the expensive NVIDIA DKMS kernel-module compile + update-initramfs runs on the host at first boot during CSE. That compile is the dominant cost in GPU node provisioning. (GB200 is the only SKU that installs at VHD build time today.)

What this does

Adds an opt-in path to compile the NVIDIA kernel module into the VHD at build time and skip the boot-time build, scoped to the most common GPU SKU (Ubuntu 22.04 amd64, CUDA driver).

Boot-time (parts/linux/cloud-init/artifacts/cse_config.sh)

  • New gpuPrebuiltModuleMatches() guard and a fast path in configGPUDrivers() that selects the install-skip-build action only when a baked module exactly matches the running kernel, driver_version, driver_kind, and driver image_tag.
  • Any mismatch — kernel drift, a newer driver handed down by CRP, a GRID SKU, or an older VHD without the marker — falls back to today's full build. Correctness never depends on the fast path; it is purely an optimization.
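The guard described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the marker path, the `key=value` marker format, the helper names, and the `install` placeholder for the existing full-build action are all assumptions. The real guard also checks that the module is loadable, which this sketch omits.

```shell
# Hypothetical marker location and format (key=value lines); both are
# assumptions for illustration, not taken from the PR.
GPU_PREBUILT_MARKER="${GPU_PREBUILT_MARKER:-/opt/gpu/prebuilt-module.marker}"

# Print the value stored for KEY in a key=value file (empty if absent).
marker_value() {
    local key="$1" file="$2"
    sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Return 0 only if every field recorded at VHD build time equals the value
# observed at boot. Args: running_kernel driver_version driver_kind image_tag
gpu_prebuilt_module_matches_sketch() {
    local kernel="$1" version="$2" kind="$3" tag="$4"
    [ -n "$kernel" ] || return 1
    [ -f "$GPU_PREBUILT_MARKER" ] || return 1
    [ "$(marker_value kernel "$GPU_PREBUILT_MARKER")" = "$kernel" ] || return 1
    [ "$(marker_value driver_version "$GPU_PREBUILT_MARKER")" = "$version" ] || return 1
    [ "$(marker_value driver_kind "$GPU_PREBUILT_MARKER")" = "$kind" ] || return 1
    [ "$(marker_value image_tag "$GPU_PREBUILT_MARKER")" = "$tag" ] || return 1
    return 0
}

# Pick the entrypoint action. Anything short of an exact match keeps the
# full build ("install" is a placeholder name for that existing action).
select_gpu_install_action() {
    if gpu_prebuilt_module_matches_sketch "$@"; then
        echo "install-skip-build"
    else
        echo "install"
    fi
}
```

Because every comparison fails closed, a missing marker, a stale kernel, or a GRID driver kind all produce the same result as a VHD with no prebuilt module.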

Build-time (vhdbuilder/packer/install-dependencies.sh)

  • New prebuildGPUKernelModule() runs the aks-gpu container's build-only action during VHD build (no device access) and records the driver image tag in the marker.
  • Gated behind PREBUILD_GPU_KERNEL_MODULE (default off) and UBUNTU_RELEASE=22.04, so existing builds are unchanged and it is only attempted on VHDs whose aks-gpu image supports build-only.
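One way the build-time gate and marker write might look is shown below. It is a sketch under stated assumptions: the function names, the convention that the flag is enabled by the literal value `true`, and the `key=value` marker layout are hypothetical and not confirmed by the PR.

```shell
# Return 0 when the prebuild should be attempted.
# Args: prebuild_flag ubuntu_release cpu_arch
# Assumption: the flag is enabled by the literal string "true".
should_prebuild_gpu_module() {
    local flag="$1" release="$2" arch="$3"
    [ "$flag" = "true" ] || return 1      # default off: unset or any other value skips
    [ "$release" = "22.04" ] || return 1  # scoped to Ubuntu 22.04
    [ "$arch" = "amd64" ] || return 1     # scoped to amd64
    return 0
}

# Record what the module was built against so boot-time code can compare.
# Args: marker_file kernel driver_version driver_kind image_tag
# Assumption: the marker is a plain key=value file.
write_gpu_prebuilt_marker() {
    local file="$1" kernel="$2" version="$3" kind="$4" tag="$5"
    printf 'kernel=%s\ndriver_version=%s\ndriver_kind=%s\nimage_tag=%s\n' \
        "$kernel" "$version" "$kind" "$tag" > "$file"
}
```

A caller would check `should_prebuild_gpu_module "${PREBUILD_GPU_KERNEL_MODULE:-false}" "$UBUNTU_RELEASE" "$CPU_ARCH"` (where `CPU_ARCH` is an assumed variable name) and write the marker only after the build-only action succeeds, so a failed build never leaves a marker behind.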

Tests (spec/.../cse_config_spec.sh)

  • shellspec coverage for gpuPrebuiltModuleMatches: exact match, absent marker, kernel drift, image-tag mismatch, GRID kind, and unloadable module. (6/6 passing.)

Why it's safe to merge independently

The boot-time fast path is always compiled in but inert until a marker exists (no marker ⇒ identical to today). The build-time prebuild is off by default. Nothing changes for non-GPU nodes — the marker/module is never produced or consulted there.

Companion change (required first)

Depends on Azure/aks-gpu#159 (the build-only / install-skip-build entrypoint actions). That image must be published to MCR and referenced in components.json before PREBUILD_GPU_KERNEL_MODULE is enabled — the currently-cached image has the old entrypoint.

Validation

  • go test ./pkg/agent/... — pass
  • make generate — no snapshot diffs (GPU CSE path isn't exercised by baker snapshots)
  • new shellspec block — 6/6 pass

Still required before production

  • Publish & wire the companion aks-gpu image (Azure/aks-gpu#159) into components.json.
  • Confirm --dkms compiles cleanly on a GPU-less Packer builder.
  • Secure Boot signing of the prebuilt .ko.
  • GPU e2e.

Draft until the companion image lands and the above are validated.

…ovisioning time

On mainstream Ubuntu GPU SKUs the aks-gpu-cuda image is only pre-pulled into
the VHD; the expensive NVIDIA DKMS kernel-module compile + update-initramfs
runs on the host at first boot during CSE. This adds an opt-in path to compile
the module into the VHD at build time and skip the boot-time build.

- cse_config.sh: add gpuPrebuiltModuleMatches() guard and a "fast path" in
  configGPUDrivers() that selects the install-skip-build action only when a
  baked module exactly matches the running kernel, driver version/kind, and
  driver image tag. Any mismatch (kernel drift, newer driver from CRP, GRID
  SKU, older VHD without the marker) falls back to today's full build, so
  correctness never depends on the fast path.
- install-dependencies.sh: add prebuildGPUKernelModule() which runs the aks-gpu
  container's build-only action during VHD build and records the driver image
  tag in the marker. Gated behind PREBUILD_GPU_KERNEL_MODULE and scoped to
  Ubuntu 22.04 amd64 (CUDA driver). Default off so existing builds are
  unchanged and it is only attempted on VHDs whose aks-gpu image supports the
  build-only action.
- cse_config_spec.sh: add shellspec coverage for gpuPrebuiltModuleMatches.

Requires a companion aks-gpu image change (build-only / install-skip-build
entrypoint actions) to be published before PREBUILD_GPU_KERNEL_MODULE is
enabled. Secure Boot module signing and GPU e2e validation are still required.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 31, 2026 18:57
Contributor

Copilot AI left a comment

Pull request overview

This PR adds an opt-in path to prebuild NVIDIA GPU kernel modules during Ubuntu 22.04 amd64 VHD creation, then lets CSE skip the boot-time DKMS build when the baked module exactly matches the running node configuration.

Changes:

  • Adds gpuPrebuiltModuleMatches() and uses install-skip-build for matching prebuilt GPU modules.
  • Adds gated VHD build logic to run the aks-gpu build-only action and record image metadata.
  • Adds ShellSpec coverage for marker matching and mismatch scenarios.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| parts/linux/cloud-init/artifacts/cse_config.sh | Adds the GPU prebuilt-module guard and CSE fast path. |
| vhdbuilder/packer/install-dependencies.sh | Adds optional VHD-time NVIDIA module prebuild logic for Ubuntu 22.04 amd64. |
| spec/parts/linux/cloud-init/artifacts/cse_config_spec.sh | Adds ShellSpec tests for the prebuilt module marker guard. |
Comments suppressed due to low confidence (1)

parts/linux/cloud-init/artifacts/cse_config.sh:1043

  • The fast path currently treats an install-skip-build failure as fatal instead of falling back to the normal install path. A matching marker only proves the cached module metadata lines up; it does not prove the skip-build action can actually complete on every boot configuration (for example module signature/load failures or an image-side regression). Since this path is intended to be an optimization only, retrying with the full build before exiting preserves the existing provisioning behavior when the optimization fails.
        retrycmd_if_failure 5 10 600 bash -c "$CTR_GPU_INSTALL_CMD $NVIDIA_DRIVER_IMAGE:$NVIDIA_DRIVER_IMAGE_TAG gpuinstall /entrypoint.sh ${gpu_install_action}"
        ret=$?
        if [ "$ret" -ne 0 ]; then
            echo "Failed to install GPU driver, exiting..."
            exit $ERR_GPU_DRIVERS_START_FAIL
        fi
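
The fallback suggested in this comment could be structured roughly like the sketch below. `run_gpu_action` is a hypothetical stand-in for the real `retrycmd_if_failure ... /entrypoint.sh <action>` invocation, and `install_gpu_driver_with_fallback` is an illustrative name, not a function from the PR.

```shell
# Hypothetical stand-in for the real container invocation; callers are
# expected to replace it with the actual retry-wrapped command.
run_gpu_action() {
    echo "would run /entrypoint.sh $1"
}

# Try the requested action first. If it fails and it was not already the
# full-build action, retry once with the full build before giving up.
# Args: action full_action
install_gpu_driver_with_fallback() {
    local action="$1" full_action="$2"
    if run_gpu_action "$action"; then
        return 0
    fi
    if [ "$action" != "$full_action" ]; then
        echo "GPU action '$action' failed, retrying with '$full_action'" >&2
        if run_gpu_action "$full_action"; then
            return 0
        fi
    fi
    echo "Failed to install GPU driver" >&2
    return 1
}
```

With this shape, the caller exits with `ERR_GPU_DRIVERS_START_FAIL` only when the function returns non-zero, that is, only after the full build has also failed.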

The GPU kernel module is DKMS-compiled against the builder's running kernel,
but nodes boot the newest kernel baked into the image. If those differ, every
node sees a marker kernel != uname -r, silently falls back to the boot-time
build, and ships a useless prebuilt module. Fail the VHD build loudly when the
running kernel isn't the newest installed, or when matching headers are absent,
so the misorder is fixed instead of silently regressing the optimization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
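
The build-time check described in this commit might be approximated as follows. This is a sketch only: the function names are hypothetical, `sort -V` (GNU coreutils) is used as an approximation of package version ordering, and the headers location assumes Ubuntu's `/usr/src/linux-headers-<version>` layout.

```shell
# Print the newest kernel version directory under MODULES_DIR, using
# GNU sort's version ordering as an approximation.
newest_installed_kernel() {
    local modules_dir="$1"
    ls -1 "$modules_dir" | sort -V | tail -n 1
}

# Return 0 only if RUNNING is the newest installed kernel and matching
# headers are present. Args: running_kernel modules_dir headers_dir
assert_prebuild_kernel_ok() {
    local running="$1" modules_dir="$2" headers_dir="$3" newest
    newest="$(newest_installed_kernel "$modules_dir")"
    if [ "$running" != "$newest" ]; then
        echo "running kernel $running is not the newest installed ($newest)" >&2
        return 1
    fi
    if [ ! -d "$headers_dir/linux-headers-$running" ]; then
        echo "kernel headers for $running not found in $headers_dir" >&2
        return 1
    fi
    return 0
}
```

A VHD build script could call `assert_prebuild_kernel_ok "$(uname -r)" /lib/modules /usr/src || exit 1` before starting the prebuild, so a kernel ordering problem fails the build instead of producing a module no node will use.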