forked from NVIDIA/gpu-driver-container
Enable GPU operator to install GRID driver on Azure NV instances #6
Open
vinser52 wants to merge 10 commits into datadog from sergei.vinogradov/nvidia-grid-driver
+195
−1
Commits (10):
5084376 Install GRID driver on Azure NV and NVv3 instances (vinser52)
5bdc758 Use sysfs instead of Azure IMDS (vinser52)
c05e53f Modify download script accept driver version as argument (vinser52)
86cfb3f Add executable permission to the download script (vinser52)
d29ff40 Install kernel headers before running the GRID installer (vinser52)
a888111 Fix issue with __acpi_video_get_backlight_type symbol not found (vinser52)
3e1efa4 Minor refactoring of function names (vinser52)
dbea80f Fix installer for precompiled driver (vinser52)
92c9692 Address review comments (vinser52)
1bb9b0b Use dkms from the cuda package repo (vinser52)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| #!/usr/bin/env bash | ||
| | ||
| set -eu | ||
| | ||
| # GRID_INSTALLER_DIR is provided by Dockerfile ENV | ||
| GRID_INSTALLER_DIR=${GRID_INSTALLER_DIR:-/opt/nvidia-grid-install} | ||
| | ||
| # Available Azure GRID driver versions | ||
| AVAILABLE_VERSIONS="550.144.06, 535.161.08, 525.105.17" | ||
| | ||
| print_usage() { | ||
| echo "Usage: $0 <driver_version>" | ||
| echo "Available versions: $AVAILABLE_VERSIONS" | ||
| } | ||
| | ||
| get_grid_azure_url() { | ||
| local version="$1" | ||
| | ||
| # Azure GRID driver version mapping | ||
| case "$version" in | ||
| 550.144.06*) | ||
| echo "https://download.microsoft.com/download/c5319e92-672e-4067-8d85-ab66a7a64db3/NVIDIA-Linux-x86_64-550.144.06-grid-azure.run" | ||
| ;; | ||
| 535.161.08*) | ||
| echo "https://download.microsoft.com/download/8/d/a/8da4fb8e-3a9b-4e6a-bc9a-72ff64d7a13c/NVIDIA-Linux-x86_64-535.161.08-grid-azure.run" | ||
| ;; | ||
| 525.105.17*) | ||
| echo "https://download.microsoft.com/download/6/b/d/6bd2850f-5883-4e2a-9a35-edbd3dd6808c/NVIDIA-Linux-x86_64-525.105.17-grid-azure.run" | ||
| ;; | ||
| *) | ||
| echo "" | ||
| return 1 | ||
| ;; | ||
| esac | ||
| return 0 | ||
| } | ||
| | ||
| fetch_grid_azure_installer() { | ||
| # Default to empty so that "set -u" does not abort before the usage message is printed | ||
| local driver_version="${1:-}" | ||
| | ||
| if [ -z "$driver_version" ]; then | ||
| echo "ERROR: Driver version must be provided as an argument" | ||
| print_usage | ||
| exit 1 | ||
| fi | ||
| | ||
| mkdir -p "$GRID_INSTALLER_DIR" | ||
| cd "$GRID_INSTALLER_DIR" | ||
| | ||
| local download_url=$(get_grid_azure_url "$driver_version") | ||
| | ||
| if [ -z "$download_url" ]; then | ||
| echo "ERROR: No Azure GRID driver URL found for version $driver_version" | ||
| print_usage | ||
| exit 1 | ||
| fi | ||
| | ||
| local filename=$(basename "$download_url") | ||
| echo "Downloading GRID driver from: $download_url" | ||
| | ||
| # -f: fail on HTTP errors, -S: show errors, -s: silent, -L: follow redirects | ||
| curl -fSsL -o "$filename" "$download_url" | ||
| chmod +x "$filename" | ||
| | ||
| echo "GRID installer downloaded successfully to $GRID_INSTALLER_DIR/$filename" | ||
| } | ||
| | ||
| fetch_grid_azure_installer "$@" | ||
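
The two scripts are coupled only through the installer file name: the download script names the file with `basename "$download_url"`, and the install script later searches for `NVIDIA-Linux-*.run`. A minimal sketch of that hand-off, assuming the 550.144.06 URL from the mapping above. The variable `found_by_installer` is made up for illustration, and `case` stands in for the glob that `find -name` applies:

```shell
#!/usr/bin/env bash
# URL copied from the 550.144.06 entry of get_grid_azure_url.
url="https://download.microsoft.com/download/c5319e92-672e-4067-8d85-ab66a7a64db3/NVIDIA-Linux-x86_64-550.144.06-grid-azure.run"

# Same derivation the download script uses for the local file name.
filename=$(basename "$url")

# The install script looks for files matching "NVIDIA-Linux-*.run".
# `case` uses the same glob syntax, so it can stand in for that check here.
case "$filename" in
    NVIDIA-Linux-*.run) found_by_installer=yes ;;
    *)                  found_by_installer=no ;;
esac

echo "$filename -> found_by_installer=$found_by_installer"
```

If a future URL ever ends in a differently named file, a check like this would flag that the install step will not find it.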
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,103 @@ | ||
| #! /bin/bash | ||
| | ||
| GRID_INSTALLER_DIR=${GRID_INSTALLER_DIR:-/opt/nvidia-grid-install} | ||
| | ||
| _install_grid_driver() { | ||
| echo "Installing NVIDIA GRID driver from Azure package..." | ||
| | ||
| if [ ! -d "$GRID_INSTALLER_DIR" ]; then | ||
| echo "ERROR: GRID installer directory not found: $GRID_INSTALLER_DIR" | ||
| exit 1 | ||
| fi | ||
| | ||
| # Find the .run installer file | ||
| local installer_file=$(find "$GRID_INSTALLER_DIR" -maxdepth 1 -type f -name "NVIDIA-Linux-*.run" | head -n 1) | ||
| | ||
| if [ -z "$installer_file" ]; then | ||
| echo "ERROR: GRID installer .run file not found in $GRID_INSTALLER_DIR" | ||
| exit 1 | ||
| fi | ||
| | ||
| echo "Using GRID installer: $installer_file" | ||
| | ||
| # Install kernel headers and modules required for DKMS | ||
| # linux-modules provides video.ko which nvidia-modeset depends on for __acpi_video_get_backlight_type symbol | ||
| echo "Installing kernel headers and modules for ${KERNEL_VERSION}..." | ||
| apt-get install --no-install-recommends --no-download -y \ | ||
| "linux-headers-${KERNEL_VERSION}" \ | ||
| "linux-modules-${KERNEL_VERSION}" \ | ||
| dkms | ||
| | ||
| # Create temporary directory for installer | ||
| local tmpdir="$GRID_INSTALLER_DIR/nvidia-grid-tmp" | ||
| mkdir -p "$tmpdir" | ||
| | ||
| # Install GRID driver using the .run installer | ||
| # -s (--silent): non-interactive silent mode | ||
| # --dkms: use DKMS to build and load kernel modules automatically | ||
| # --tmpdir: specify temporary directory for installation | ||
| # Note: GRID drivers do not support --skip-module-load option | ||
| # Paths are quoted so that directories containing spaces do not split into extra arguments | ||
| "$installer_file" -s --dkms --tmpdir "$tmpdir" | ||
| | ||
| local exit_code=$? | ||
| | ||
| # Clean up temporary directory | ||
| rm -rf "$tmpdir" | ||
| | ||
| if [ $exit_code -ne 0 ]; then | ||
| echo "ERROR: GRID driver installation failed with exit code $exit_code" | ||
| exit 1 | ||
| fi | ||
| | ||
| # Update gridd.conf as required for Azure NV/NVv3 VMs. | ||
| # See: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#install-grid-drivers-on-nv-or-nvv3-series-vms | ||
| echo "Creating GRID config" | ||
| cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf | ||
| | ||
| # Replace EnableUI in place (handles both commented and uncommented) | ||
| sed -i 's/^#\?[[:space:]]*EnableUI=.*/EnableUI=FALSE/' /etc/nvidia/gridd.conf | ||
| | ||
| # Add EnableUI if not present anywhere in the file | ||
| grep -q '^EnableUI=' /etc/nvidia/gridd.conf || echo "EnableUI=FALSE" >> /etc/nvidia/gridd.conf | ||
| | ||
| # Replace IgnoreSP in place (handles both commented and uncommented) | ||
| sed -i 's/^#\?[[:space:]]*IgnoreSP=.*/IgnoreSP=FALSE/' /etc/nvidia/gridd.conf | ||
| | ||
| # Add IgnoreSP if not present anywhere in the file | ||
| grep -q '^IgnoreSP=' /etc/nvidia/gridd.conf || echo "IgnoreSP=FALSE" >> /etc/nvidia/gridd.conf | ||
| | ||
| # Comment out FeatureType if uncommented | ||
| sed -i 's/^FeatureType=/#FeatureType=/' /etc/nvidia/gridd.conf | ||
| | ||
| echo "GRID driver installed successfully" | ||
| } | ||
| | ||
| _has_nvidia_a10_gpu() { | ||
| # Check for NVIDIA A10 GPU (vendor: 0x10de, device: 0x2236) | ||
| # NVIDIA A10 requires GRID driver on Azure | ||
| # Keep loop variables local so they do not leak into the calling script | ||
| local dev vendor device | ||
| for dev in /sys/bus/pci/devices/*; do | ||
| if [ -f "$dev/vendor" ] && [ -f "$dev/device" ]; then | ||
| vendor=$(cat "$dev/vendor") | ||
| device=$(cat "$dev/device") | ||
| | ||
| if [ "$vendor" = "0x10de" ] && [ "$device" = "0x2236" ]; then | ||
| echo "Detected NVIDIA A10 GPU at $(basename "$dev"), GRID driver required" | ||
| return 0 # A10 GPU present | ||
| fi | ||
| fi | ||
| done | ||
| | ||
| return 1 # A10 GPU not present | ||
| } | ||
| | ||
| _is_grid_driver_required() { | ||
| # Extract CSP name from kernel version (e.g. "azure" from "5.15.0-1040-azure") | ||
| local csp_name="${KERNEL_VERSION##*-}" | ||
| | ||
| # Check if this is an Azure instance with an NVIDIA A10 GPU | ||
| if [ "$csp_name" = "azure" ] && _has_nvidia_a10_gpu; then | ||
| return 0 # GRID driver required | ||
| fi | ||
| | ||
| return 1 # GRID driver not required | ||
| } | ||
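
The `gridd.conf` edits in `_install_grid_driver` repeat one pattern per key: rewrite the entry if it exists (commented or not), otherwise append it. A minimal sketch of that pattern as a reusable function, under stated assumptions: `set_conf_option` is a hypothetical helper that is not part of the PR, the sample template contents are invented for illustration, and keys are assumed to be plain identifiers with no regex metacharacters. It uses `sed -E` with a temporary file instead of `sed -i`, so it does not depend on GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): force KEY=VALUE in a config file.
set_conf_option() {
    local file="$1" key="$2" value="$3"
    local tmp
    tmp=$(mktemp)
    # Rewrite an existing entry, whether commented out ("#Key=...") or active.
    sed -E "s/^#?[[:space:]]*${key}=.*/${key}=${value}/" "$file" > "$tmp"
    # Append the entry if no active line exists after the rewrite.
    grep -q "^${key}=" "$tmp" || echo "${key}=${value}" >> "$tmp"
    # Write back through redirection so the target keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Illustrative template contents; the real gridd.conf.template may differ.
conf=$(mktemp)
printf '%s\n' '# sample template' '#EnableUI=TRUE' 'FeatureType=0' > "$conf"

set_conf_option "$conf" EnableUI FALSE
set_conf_option "$conf" IgnoreSP FALSE

# Comment out FeatureType if it is active, as the install script does.
sed -E 's/^FeatureType=/#FeatureType=/' "$conf" > "$conf.new"
cat "$conf.new" > "$conf"
rm -f "$conf.new"

cat "$conf"
```

With this sample input, `EnableUI` is rewritten in place, `IgnoreSP` is appended at the end, and `FeatureType` ends up commented out.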
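
The detection logic in `_has_nvidia_a10_gpu` and `_is_grid_driver_required` reads fixed sysfs paths and the global `KERNEL_VERSION`, which makes it hard to exercise off an Azure VM. A sketch of a testable variant, assuming the same IDs (vendor `0x10de`, device `0x2236`) and the same kernel-suffix rule. The function names `has_pci_device` and `grid_driver_required` and the injectable sysfs root are hypothetical and not part of the PR:

```shell
#!/usr/bin/env bash
# Return 0 if any PCI device under the given sysfs root has the given IDs.
has_pci_device() {
    local sysfs_root="$1" want_vendor="$2" want_device="$3"
    local dev vendor device
    for dev in "$sysfs_root"/bus/pci/devices/*; do
        # Skip entries (or an unmatched glob) that lack the ID files.
        [ -f "$dev/vendor" ] && [ -f "$dev/device" ] || continue
        vendor=$(cat "$dev/vendor")
        device=$(cat "$dev/device")
        if [ "$vendor" = "$want_vendor" ] && [ "$device" = "$want_device" ]; then
            return 0
        fi
    done
    return 1
}

# Return 0 if the kernel flavour suffix is "azure" and an A10 is present.
grid_driver_required() {
    local kernel_version="$1" sysfs_root="${2:-/sys}"
    # "5.15.0-1040-azure" -> "azure": strip everything up to the last "-".
    local csp_name="${kernel_version##*-}"
    [ "$csp_name" = "azure" ] && has_pci_device "$sysfs_root" 0x10de 0x2236
}

# Usage against a fake sysfs tree containing one A10-like device.
fake_sys=$(mktemp -d)
mkdir -p "$fake_sys/bus/pci/devices/0001:00:00.0"
echo 0x10de > "$fake_sys/bus/pci/devices/0001:00:00.0/vendor"
echo 0x2236 > "$fake_sys/bus/pci/devices/0001:00:00.0/device"

if grid_driver_required "5.15.0-1040-azure" "$fake_sys"; then
    echo "GRID driver required"
fi
```

Passing the sysfs root as a parameter is only a convenience for testing; with the default of `/sys` the check reads the same files as the PR's version.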
I don't think we need to support all those versions, especially since they are hardcoded anyway. Keeping only one (the latest) per driver branch would shorten the script a little.
Done
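
One possible reading of the suggestion above, sketched under assumptions: callers pass only a driver branch (or any version on that branch) and get back the single version pinned for it. The function name `grid_version_for_branch` is hypothetical and not part of the PR. The versions are the three listed in the download script; whether each is the latest release of its branch is not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a driver branch (major version) to the one
# Azure GRID version pinned for that branch in the download script.
grid_version_for_branch() {
    case "${1:-}" in
        550|550.*) echo "550.144.06" ;;
        535|535.*) echo "535.161.08" ;;
        525|525.*) echo "525.105.17" ;;
        *) return 1 ;;  # unknown branch: print nothing, signal failure
    esac
}

# Usage: resolve a bare branch and a full version on the same branch.
grid_version_for_branch 550
grid_version_for_branch 535.129.03
```

With one entry per branch, adding a new branch or bumping a pinned version stays a one-line change.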