7 changes: 2 additions & 5 deletions micro-benchmarks/nccl-tests/README.md
@@ -38,7 +38,6 @@ The NCCL tests are packaged in a container.
> |`CUDA_VERSION` | `12.8.1` | |
> |`GDRCOPY_VERSION` | `v2.5.1` | [link](https://github.com/NVIDIA/gdrcopy) |
> |`EFA_INSTALLER_VERSION`| `1.43.2` | [link](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-enable) |
- > |`AWS_OFI_NCCL_VERSION` | `v1.16.3` | [link](https://github.com/aws/aws-ofi-nccl) |

Review comment (Contributor):

Can you still add a line to the README that shows folks how they can install a specific OFI NCCL version (and why this was removed: it's now bundled with the EFA installation)?

> |`NCCL_VERSION` | `v2.27.7-1` | [link](https://github.com/NVIDIA/nccl) |
> |`NCCL_TESTS_VERSION` | `v2.16.9` | [link](https://github.com/NVIDIA/nccl-tests) |

@@ -47,10 +46,9 @@ You must pick each version of the library and set them as variables before proce
```bash
GDRCOPY_VERSION=v2.5.1
EFA_INSTALLER_VERSION=1.43.2
- AWS_OFI_NCCL_VERSION=v1.16.3
NCCL_VERSION=v2.27.7-1
NCCL_TESTS_VERSION=v2.16.9
- TAG="efa${EFA_INSTALLER_VERSION}-ofi${AWS_OFI_NCCL_VERSION}-nccl${NCCL_VERSION}-tests${NCCL_TESTS_VERSION}"
+ TAG="efa${EFA_INSTALLER_VERSION}-nccl${NCCL_VERSION}-tests${NCCL_TESTS_VERSION}"
CONTAINER_IMAGE_NAME_TAG="nccl-tests:${TAG}"
```
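
The review comments on this change ask how people on older EFA installer versions can still build with a separately installed aws-ofi-nccl plugin. One possible approach, sketched below, keeps `AWS_OFI_NCCL_VERSION` optional: the `-ofi` tag segment and the matching build argument are added only when the variable is set. The helper names `build_tag` and `build_args` are hypothetical, the trailing `.` build context is an assumption, and the sketch assumes a Dockerfile revision that still declares `ARG AWS_OFI_NCCL_VERSION`. The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: assumes a revision of nccl-tests.Dockerfile that still
# declares ARG AWS_OFI_NCCL_VERSION (the state before this change).

EFA_INSTALLER_VERSION=1.43.2
NCCL_VERSION=v2.27.7-1
NCCL_TESTS_VERSION=v2.16.9
# Leave empty when the EFA installer already bundles the plugin;
# set it (for example to v1.16.3) to pin the plugin explicitly.
AWS_OFI_NCCL_VERSION="${AWS_OFI_NCCL_VERSION:-}"

# Hypothetical helper: print the image tag, adding the -ofi segment
# only when AWS_OFI_NCCL_VERSION is non-empty.
build_tag() {
  _tag="efa${EFA_INSTALLER_VERSION}"
  if [ -n "${AWS_OFI_NCCL_VERSION:-}" ]; then
    _tag="${_tag}-ofi${AWS_OFI_NCCL_VERSION}"
  fi
  printf '%s\n' "${_tag}-nccl${NCCL_VERSION}-tests${NCCL_TESTS_VERSION}"
}

# Hypothetical helper: print one --build-arg per line, including the
# plugin version only when it is set.
build_args() {
  printf '%s\n' "--build-arg=EFA_INSTALLER_VERSION=${EFA_INSTALLER_VERSION}"
  if [ -n "${AWS_OFI_NCCL_VERSION:-}" ]; then
    printf '%s\n' "--build-arg=AWS_OFI_NCCL_VERSION=${AWS_OFI_NCCL_VERSION}"
  fi
  printf '%s\n' "--build-arg=NCCL_VERSION=${NCCL_VERSION}"
  printf '%s\n' "--build-arg=NCCL_TESTS_VERSION=${NCCL_TESTS_VERSION}"
}

TAG="$(build_tag)"
CONTAINER_IMAGE_NAME_TAG="nccl-tests:${TAG}"

# Print the command instead of running it. The version strings contain
# no whitespace, so the unquoted expansion splits into one word per line.
# The trailing "." (build context) is an assumption.
echo docker build -f nccl-tests.Dockerfile $(build_args) -t "${CONTAINER_IMAGE_NAME_TAG}" .
```

Running the script with `AWS_OFI_NCCL_VERSION=v1.16.3` exported would reproduce the old tag layout and pass the extra build argument; leaving it unset yields the new, shorter tag.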

@@ -62,7 +60,6 @@ If you wish to build the containar image by yourself, follow this section. Alter
```bash
docker build -f nccl-tests.Dockerfile \
--build-arg="EFA_INSTALLER_VERSION=${EFA_INSTALLER_VERSION}" \
- --build-arg="AWS_OFI_NCCL_VERSION=${AWS_OFI_NCCL_VERSION}" \

Review comment (Contributor):

Either comment this out, or add a comment below that shows how folks on older EFA installer versions can build with the aws-ofi-nccl installation. This is just for the short term, until we get a couple more EFA installer versions.

--build-arg="NCCL_VERSION=${NCCL_VERSION}" \
--build-arg="NCCL_TESTS_VERSION=${NCCL_TESTS_VERSION}" \
-t ${CONTAINER_IMAGE_NAME_TAG} \
@@ -262,7 +259,7 @@ To change the type of collective to test, modify the line with `srun` in the fil
kubectl logs -f $(kubectl get pods | grep launcher | cut -d ' ' -f 1)
```

- The following is an example exerpt from the logs of a NCCL all_reduce_perf test, executed on a cluster with two p5.48xlarge instances (using EFA_INSTALLER_VERSION=1.28.0, AWS_OFI_NCCL_VERSION=v1.7.3-aws, NCCL_TESTS_VERSION=master, ARG NCCL_VERSION=2.18.5):

Review comment (Contributor):

Same here

+ The following is an example exerpt from the logs of a NCCL all_reduce_perf test, executed on a cluster with two p5.48xlarge instances (using EFA_INSTALLER_VERSION=1.28.0, NCCL_TESTS_VERSION=master, ARG NCCL_VERSION=2.18.5):

```log
[1,0]<stdout>:# out-of-place in-place
1 change: 0 additions & 1 deletion micro-benchmarks/nccl-tests/nccl-tests.Dockerfile
@@ -5,7 +5,6 @@ FROM nvcr.io/nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04

ARG GDRCOPY_VERSION=v2.5.1
ARG EFA_INSTALLER_VERSION=1.43.2
- ARG AWS_OFI_NCCL_VERSION=v1.16.3

Review comment (Contributor):

Same here

ARG NCCL_VERSION=v2.27.7-1
ARG NCCL_TESTS_VERSION=v2.16.9
