@@ -12,16 +12,18 @@ To begin using Datadog's GPU Monitoring, your environment must meet the followin
1212
1313#### Minimum version requirements
1414
15- - **Datadog Agent**: version 7.70.1
15+ - **Datadog Agent**: version 7.72.2
1616- [**Datadog Operator**][5]: version 1.18, _or_ [**Datadog Helm chart**][6]: version 3.137.3
1717- **Operating system**: Linux
1818 - (Optional) For advanced eBPF metrics, Linux kernel version 5.8
1919- **NVIDIA driver**: version 450.51
2020- **Kubernetes**: 1.22 with PodResources API active
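
As a rough pre-flight check, the sketch below compares the running kernel and NVIDIA driver against the minimums listed above (5.8 for advanced eBPF metrics, 450.51 for the driver). The `version_ge` helper is a hypothetical name and not part of any Datadog tooling; the script assumes GNU `sort -V` is available and skips the driver check when `nvidia-smi` is not on the `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on GNU `sort -V` (version sort) to order the two strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Kernel check (only matters for the optional advanced eBPF metrics).
kernel="$(uname -r)"
kernel="${kernel%%-*}"   # drop distro suffix, e.g. 5.15.0-91-generic -> 5.15.0
if version_ge "$kernel" "5.8"; then
  echo "kernel $kernel: meets the 5.8 minimum for eBPF metrics"
else
  echo "kernel $kernel: below 5.8, advanced eBPF metrics are not supported"
fi

# NVIDIA driver check, skipped when nvidia-smi is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1 || true)"
  if [ -n "$driver" ] && version_ge "$driver" "450.51"; then
    echo "NVIDIA driver $driver: meets the 450.51 minimum"
  else
    echo "NVIDIA driver '${driver:-unknown}': below 450.51 or not detected"
  fi
else
  echo "nvidia-smi not found: cannot check the NVIDIA driver version"
fi
```

The script only reports what it finds; it does not check the Agent, Operator, Helm chart, or Kubernetes versions.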
2121
22- ## Set up GPU Monitoring on a uniform cluster
22+ ## Set up GPU Monitoring on a uniform cluster or non-Kubernetes environment
2323
24- In a uniform cluster, all nodes have GPU devices.
24+ Follow these basic steps to set up GPU Monitoring in either of the following environments:
25+ - A Kubernetes cluster where **all** the nodes have GPU devices
26+ - A non-Kubernetes environment, such as Docker or non-containerized Linux
2527
2628{{< tabs >}}
2729{{% tab "Datadog Operator" %}}
@@ -97,6 +99,156 @@ In a uniform cluster, all nodes have GPU devices.
9799[2]: https://github.com/DataDog/datadog-agent/releases
98100
99101{{% /tab %}}
102+
103+ {{% tab "Docker" %}}
104+
105+ To enable GPU Monitoring in Docker without advanced eBPF metrics, use the following configuration when starting the Agent container:
106+
107+ ```shell
108+ docker run \
109+ --pid host \
110+ --gpus all \
111+ -e DD_GPU_ENABLED=true \
112+ -v /var/run/docker.sock:/var/run/docker.sock:ro \
113+ -v /proc/:/host/proc/:ro \
114+ -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
115+ gcr.io/datadoghq/agent:latest
116+ ```
117+
118+ To enable advanced eBPF metrics, use the following configuration, which grants the permissions required to run eBPF programs:
119+
120+ ```shell
121+ docker run \
122+ --cgroupns host \
123+ --pid host \
124+ --gpus all \
125+ -e DD_API_KEY="<DATADOG_API_KEY>" \
126+ -e DD_GPU_MONITORING_ENABLED=true \
127+ -e DD_GPU_ENABLED=true \
128+ -v /:/host/root:ro \
129+ -v /var/run/docker.sock:/var/run/docker.sock:ro \
130+ -v /proc/:/host/proc/:ro \
131+ -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
132+ -v /sys/kernel/debug:/sys/kernel/debug \
133+ -v /lib/modules:/lib/modules:ro \
134+ -v /usr/src:/usr/src:ro \
135+ -v /var/tmp/datadog-agent/system-probe/build:/var/tmp/datadog-agent/system-probe/build \
136+ -v /var/tmp/datadog-agent/system-probe/kernel-headers:/var/tmp/datadog-agent/system-probe/kernel-headers \
137+ -v /etc/apt:/host/etc/apt:ro \
138+ -v /etc/yum.repos.d:/host/etc/yum.repos.d:ro \
139+ -v /etc/zypp:/host/etc/zypp:ro \
140+ -v /etc/pki:/host/etc/pki:ro \
141+ -v /etc/yum/vars:/host/etc/yum/vars:ro \
142+ -v /etc/dnf/vars:/host/etc/dnf/vars:ro \
143+ -v /etc/rhsm:/host/etc/rhsm:ro \
144+ -e HOST_ROOT=/host/root \
145+ --security-opt apparmor:unconfined \
146+ --cap-add=SYS_ADMIN \
147+ --cap-add=SYS_RESOURCE \
148+ --cap-add=SYS_PTRACE \
149+ --cap-add=IPC_LOCK \
150+ --cap-add=CHOWN \
151+ gcr.io/datadoghq/agent:latest
152+ ```
153+
154+ Replace `<DATADOG_API_KEY>` with your [Datadog API key][1].
155+
156+ [1]: https://app.datadoghq.com/organization-settings/api-keys
157+
158+ {{% /tab %}}
159+ {{% tab "Docker Compose" %}}
160+
161+ If using `docker-compose`, make the following additions to the Datadog Agent service.
162+
163+ ```yaml
164+ version: '3'
165+ services:
166+   datadog:
167+     image: "gcr.io/datadoghq/agent:latest"
168+     environment:
169+       - DD_GPU_ENABLED=true
170+       - DD_API_KEY=<DATADOG_API_KEY>
171+     volumes:
172+       - /var/run/docker.sock:/var/run/docker.sock:ro
173+       - /proc/:/host/proc/:ro
174+       - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
175+     deploy:
176+       resources:
177+         reservations:
178+           devices:
179+             - driver: nvidia
180+               count: all
181+               capabilities: [gpu]
182+ ```
183+
184+ To enable advanced eBPF metrics, use the following configuration, which grants the permissions required to run eBPF programs:
185+
186+ ```yaml
187+ version: '3'
188+ services:
189+   datadog:
190+     image: "gcr.io/datadoghq/agent:latest"
191+     environment:
192+       - DD_GPU_MONITORING_ENABLED=true # only for advanced eBPF metrics
193+       - DD_GPU_ENABLED=true
194+       - DD_API_KEY=<DATADOG_API_KEY>
195+       - HOST_ROOT=/host/root
196+     volumes:
197+       - /var/run/docker.sock:/var/run/docker.sock:ro
198+       - /proc/:/host/proc/:ro
199+       - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
200+       - /sys/kernel/debug:/sys/kernel/debug
201+       - /:/host/root
202+     cap_add:
203+       - SYS_ADMIN
204+       - SYS_RESOURCE
205+       - SYS_PTRACE
206+       - IPC_LOCK
207+       - CHOWN
208+     security_opt:
209+       - apparmor:unconfined
210+     deploy:
211+       resources:
212+         reservations:
213+           devices:
214+             - driver: nvidia
215+               count: all
216+               capabilities: [gpu]
217+ ```
218+
219+ {{% /tab %}}
220+ {{% tab "Linux (non-containerized)" %}}
221+
222+ Modify your `/etc/datadog-agent/datadog.yaml` file to enable GPU Monitoring:
223+
224+ ```yaml
225+ gpu:
226+   enabled: true
227+ ```
228+
229+ To enable advanced eBPF metrics, follow these steps:
230+
231+ 1. If `/etc/datadog-agent/system-probe.yaml` does not exist, create it from `system-probe.yaml.example`:
232+
233+ ```shell
234+ sudo -u dd-agent install -m 0640 /etc/datadog-agent/system-probe.yaml.example /etc/datadog-agent/system-probe.yaml
235+ ```
236+
237+ 2. Edit `/etc/datadog-agent/system-probe.yaml` and enable GPU monitoring in system-probe:
238+
239+ ```yaml
240+ gpu_monitoring:
241+   enabled: true
242+ ```
243+
244+ 3. Restart the Datadog Agent:
245+
246+ ```shell
247+ sudo systemctl restart datadog-agent
248+ ```
249+
250+ {{% /tab %}}
251+
100252{{< /tabs >}}
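
If GPU metrics do not appear after setup, one quick sanity check is to confirm that the host itself can see its GPUs. The sketch below counts the devices listed by `nvidia-smi -L`; `count_gpus` is a hypothetical helper name, and the pattern assumes the usual `GPU <index>: ...` line format of that command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nvidia-smi -L` output on stdin and prints
# the number of lines that start with "GPU <index>".
count_gpus() {
  grep -c '^GPU [0-9]' || true   # grep exits non-zero on zero matches; still prints 0
}

# Run the check only when nvidia-smi is installed on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "GPUs visible on host: $(nvidia-smi -L 2>/dev/null | count_gpus)"
else
  echo "nvidia-smi not found on this host"
fi
```

A count of zero on a GPU node suggests a driver or device problem on the host rather than an Agent configuration issue.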
101253
102254## Set up GPU Monitoring on a mixed cluster