
Commit c2550ff

Adding NPD and updating readmes

Signed-off-by: Ritika Gupta <rtkgupta203@gmail.com>

1 parent 35a54fd

File tree: 4 files changed, +706 −4 lines


GETTING_STARTED_HELM_DEPLOY.md (1 addition, 1 deletion)

````diff
@@ -232,7 +232,7 @@ helm install lens ./helm -n lens \
 
 ## Step 2: OCI GPU Data Plane Plugin installation on GPU Nodes
 
-**NOTE**: Instructions for running the data plane plugin as a Kubernetes-native DaemonSet on [AMD MI300X nodes can be found here](./oci-scanner-plugin-amd-helm/README.md). The Nvidia DaemonSet offering is coming soon. Issue#22
+**NOTE**: Instructions for running the data plane plugin as a Kubernetes-native DaemonSet on [AMD and Nvidia nodes can be found here](./oci-scanner-plugin-helm/README.md). Supported GPUs: MI300x, MI355x, A10, H100, and B200.
 
 1. **Navigate to Dashboards**: Go to the dashboard section of the OCI GPU Scanner Portal
 2. **Go to Tab - OCI GPU Scanner Install Script**:
````
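For orientation, installing the data plane plugin chart that the new note links to looks roughly like the following. This is a minimal sketch pieced together from the README excerpt further down; running it from the repo root and the `--create-namespace` flag are assumptions, not shown in this commit.

```bash
# Hypothetical install of the data plane plugin chart from the repo root;
# release name, values file, and namespace follow the README excerpt below
helm install oci-gpu-scanner-plugin ./oci-scanner-plugin-helm \
  -f ./oci-scanner-plugin-helm/values.yaml \
  -n oci-gpu-scanner-plugin --create-namespace
```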

oci-scanner-plugin-helm/README.md (37 additions, 1 deletion)
````diff
@@ -12,6 +12,7 @@ Multi-vendor GPU monitoring and health check solution for OCI compute instances
 - **Pod Node Mapper**: Pod-to-node relationship tracking
 - **Health Check**: GPU performance testing (optional)
 - **DRHPC**: Distributed diagnostic monitoring for both AMD and NVIDIA
+- **Node Problem Detector**: GPU health monitoring via DRHPC integration (requires labeling)
 
 ## Configuration
 
````
````diff
@@ -27,6 +28,11 @@ helm install oci-gpu-scanner-plugin . -f values.yaml -n oci-gpu-scanner-plugin \
 helm install oci-gpu-scanner-plugin ./oci-scanner-plugin-amd-helm \
   --set healthCheck.enabled=true
 
+# Enable Node Problem Detector (requires node labeling and drhpc to be enabled; see below)
+helm upgrade oci-gpu-scanner-plugin . \
+  --set nodeProblemDetector.enabled=true \
+  --set drhpc.enabled=true
+
 # Uninstall
 helm uninstall oci-gpu-scanner-plugin -n oci-gpu-scanner-plugin
 ```
````
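Once the upgrade has run, helm itself can confirm that both toggles took effect, and a local dry-run render previews what the flags add. This is a sketch using standard helm subcommands; the `grep` summary is just one convenient way to skim the output.

```bash
# Show the user-supplied values on the live release; both flags should appear
helm get values oci-gpu-scanner-plugin -n oci-gpu-scanner-plugin

# Render the chart locally (nothing is applied) to preview the extra
# resources that enabling nodeProblemDetector and drhpc brings in
helm template oci-gpu-scanner-plugin . \
  --set nodeProblemDetector.enabled=true \
  --set drhpc.enabled=true | grep '^kind:' | sort | uniq -c
```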
````diff
@@ -36,4 +42,34 @@ helm uninstall oci-gpu-scanner-plugin -n oci-gpu-scanner-plugin
 - Kubernetes cluster with AMD / Nvidia GPU nodes
 - Prometheus Push Gateway accessible from cluster
 - AMD GPU drivers installed on nodes
-- Nvidia GPU Drivers installed on the nodes
+- Nvidia GPU Drivers installed on the nodes
+
+## Node Problem Detector Setup
+
+**IMPORTANT**: The Node Problem Detector only works on GPU nodes labeled with `oci.oraclecloud.com/oke-node-problem-detector-enabled="true"`. It reads its GPU health metrics from DRHPC, so make sure DRHPC is enabled when deploying.
+
+Before enabling NPD, label your GPU nodes:
+
+```bash
+# Label individual nodes
+kubectl label nodes <node-name> oci.oraclecloud.com/oke-node-problem-detector-enabled=true
+
+# Label all AMD GPU nodes
+kubectl label nodes --selector=amd.com/gpu=true oci.oraclecloud.com/oke-node-problem-detector-enabled=true
+
+# Label all NVIDIA GPU nodes
+kubectl label nodes --selector=nvidia.com/gpu=true oci.oraclecloud.com/oke-node-problem-detector-enabled=true
+
+# Verify labels
+kubectl get nodes --show-labels | grep oke-node-problem-detector-enabled
+```
+
+Then enable NPD:
+
+```bash
+helm upgrade oci-gpu-scanner-plugin . \
+  --set nodeProblemDetector.enabled=true \
+  --set drhpc.enabled=true
+```
+
+**Note**: NPD requires DRHPC to be enabled and running to provide GPU health check data.
````
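After NPD is running, its findings should surface on the nodes themselves. A quick verification pass could look like the sketch below; the exact pod names and the condition types NPD reports are set by the chart's configuration and are not shown in this commit.

```bash
# Check that the plugin pods landed on the labeled GPU nodes
kubectl get pods -n oci-gpu-scanner-plugin -o wide

# NPD publishes problems as node conditions and events; condition names
# depend on the chart's NPD config, so inspect a labeled node directly
kubectl describe node <node-name> | grep -A 12 'Conditions:'
kubectl get events --field-selector involvedObject.kind=Node
```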
