
Commit 07358cb

Merge pull request #130 from VectorInstitute/feature/killarney-migration
* Adapted Slurm configuration changes to Killarney cluster while keeping support for Bon Echo
* Removed redundant code in generated script
* Added RDMA support
* Added more default value support in environment config
* Sorted `list` command output
* Updated config README to track cached model weights
2 parents a9554e4 + 69ea120 commit 07358cb

28 files changed: +4053 −3486 lines

.github/workflows/unit_tests.yml

Lines changed: 4 additions & 0 deletions
```diff
@@ -71,6 +71,10 @@ jobs:
         run: |
           uv run pytest tests/test_imports.py
 
+      - name: Import Codecov GPG public key
+        run: |
+          gpg --keyserver keyserver.ubuntu.com --recv-keys 806BB28AED779869
+
       - name: Upload coverage to Codecov
         uses: codecov/codecov-action@v5.5.0
         with:
```
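
The added step imports Codecov's GPG public key ahead of the coverage upload, presumably so the Codecov uploader's signature can be verified. If you want to sanity-check the key locally, a small sketch using standard `gpg` commands (the key ID is copied from the workflow above):

```bash
# Fetch Codecov's public key from the Ubuntu keyserver
# (the same command the workflow runs)
gpg --keyserver keyserver.ubuntu.com --recv-keys 806BB28AED779869

# Inspect the imported key's fingerprint and identity
gpg --list-keys 806BB28AED779869
```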

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -152,3 +152,7 @@ collect_env.py
 
 # build files
 dist/
+
+# type stubs
+stubs/
+mypy.ini
```

Dockerfile

Lines changed: 15 additions & 2 deletions
```diff
@@ -1,4 +1,4 @@
-FROM nvidia/cuda:12.6.3-cudnn-devel-ubuntu22.04
+FROM nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04
 
 # Non-interactive apt-get commands
 ARG DEBIAN_FRONTEND=noninteractive
@@ -35,16 +35,29 @@ RUN wget https://bootstrap.pypa.io/get-pip.py && \
     rm get-pip.py && \
     python3.10 -m pip install --upgrade pip setuptools wheel uv
 
+# Install Infiniband/RDMA support
+RUN apt-get update && apt-get install -y \
+    libibverbs1 libibverbs-dev ibverbs-utils \
+    librdmacm1 librdmacm-dev rdmacm-utils \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set up RDMA environment (these will persist in the final container)
+ENV LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH"
+ENV UCX_NET_DEVICES=all
+ENV NCCL_IB_DISABLE=0
+
 # Set up project
 WORKDIR /vec-inf
 COPY . /vec-inf
 
 # Install project dependencies with build requirements
-RUN PIP_INDEX_URL="https://download.pytorch.org/whl/cu121" uv pip install --system -e .[dev]
+RUN PIP_INDEX_URL="https://download.pytorch.org/whl/cu128" uv pip install --system -e .[dev]
 
 # Final configuration
 RUN mkdir -p /vec-inf/nccl && \
     mv /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 /vec-inf/nccl/libnccl.so.2.18.1
+ENV VLLM_NCCL_SO_PATH=/vec-inf/nccl/libnccl.so.2.18.1
+ENV NCCL_DEBUG=INFO
 
 # Set the default command to start an interactive shell
 CMD ["bash"]
```

README.md

Lines changed: 16 additions & 13 deletions
````diff
@@ -7,18 +7,20 @@
 [![code checks](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
 [![docs](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
 [![codecov](https://codecov.io/github/VectorInstitute/vector-inference/branch/main/graph/badge.svg?token=NI88QSIGAC)](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
-[![vLLM](https://img.shields.io/badge/vllm-0.9.2)](https://docs.vllm.ai/en/v0.9.2/index.html)
+[![vLLM](https://img.shields.io/badge/vLLM-0.10.1.1-blue)](https://docs.vllm.ai/en/v0.10.1.1/)
 ![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)
 
-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
+
+**NOTE**: Supported models on Killarney are tracked [here](vec_inf/config/README.md)
 
 ## Installation
 If you are using the Vector cluster environment, and you don't need any customization to the inference server environment, run the following to install the package:
 
 ```bash
 pip install vec-inf
 ```
-Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.9.2`.
+Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.10.1.1`.
 
 If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files. There are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
@@ -42,23 +44,26 @@ You should see an output like the following:
 
 <img width="600" alt="launch_image" src="https://github.com/user-attachments/assets/a72a99fd-4bf2-408e-8850-359761d96c4f">
 
+**NOTE**: On the Vector Killarney cluster environment, the following fields are required:
+* `--account`, `-A`: The Slurm account; a default can be set via the environment variable `VEC_INF_ACCOUNT`.
+* `--work-dir`, `-D`: A working directory other than your home directory; a default can be set via the environment variable `VEC_INF_WORK_DIR`.
 
 #### Overrides
 
 Models that are already supported by `vec-inf` would be launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters. Use `vec-inf launch --help` to see the full list of parameters that can be
-overridden. For example, if `qos` is to be overridden:
+overridden. For example, if `resource-type` is to be overridden:
 
 ```bash
-vec-inf launch Meta-Llama-3.1-8B-Instruct --qos <new_qos>
+vec-inf launch Meta-Llama-3.1-8B-Instruct --resource-type <new_resource_type>
 ```
 
-To overwrite default vLLM engine arguments, you can specify the engine arguments in a comma separated string:
+To overwrite default `vllm serve` arguments, you can specify the arguments in a comma separated string:
 
 ```bash
 vec-inf launch Meta-Llama-3.1-8B-Instruct --vllm-args '--max-model-len=65536,--compilation-config=3'
 ```
 
-For the full list of vLLM engine arguments, you can find them [here](https://docs.vllm.ai/en/stable/serving/engine_args.html); make sure you select the correct vLLM version.
+For the full list of `vllm serve` arguments, you can find them [here](https://docs.vllm.ai/en/stable/cli/serve.html); make sure you select the correct vLLM version.
 
 #### Custom models
 
@@ -85,14 +90,12 @@ models:
     gpus_per_node: 1
     num_nodes: 1
     vocab_size: 152064
-    qos: m2
     time: 08:00:00
-    partition: a40
+    resource_type: l40s # You can also leave this field empty if your environment has a default type of resource to use
     model_weights_parent_dir: /h/<username>/model-weights
     vllm_args:
       --max-model-len: 1010000
       --max-num-seqs: 256
-      --compilation-config: 3
 ```
 
 You would then set the `VEC_INF_MODEL_CONFIG` path using:
@@ -103,9 +106,8 @@ export VEC_INF_MODEL_CONFIG=/h/<username>/my-model-config.yaml
 
 **NOTE**
 * There are other parameters that can also be added to the config but not shown in this example; check the [`ModelConfig`](vec_inf/client/config.py) for details.
-* Check [vLLM Engine Arguments](https://docs.vllm.ai/en/stable/serving/engine_args.html) for the full list of available vLLM engine arguments; the parallel size for any parallelization defaults to 1, so none of the sizes were set specifically in this example
+* Check [`vllm serve`](https://docs.vllm.ai/en/stable/cli/serve.html) for the full list of available vLLM engine arguments; the parallel size for any parallelization defaults to 1, so none of the sizes were set specifically in this example
 * For GPU partitions with non-Ampere architectures, e.g. `rtx6000`, `t4v2`, BF16 isn't supported. For models that have BF16 as the default type, when using a non-Ampere GPU, use FP16 instead, i.e. `--dtype: float16`
-* Setting `--compilation-config` to `3` currently breaks multi-node model launches, so we don't set it for models that require multiple nodes of GPUs.
 
 #### Other commands
 
@@ -114,7 +116,7 @@ export VEC_INF_MODEL_CONFIG=/h/<username>/my-model-config.yaml
 * `metrics`: Streams performance metrics to the console.
 * `shutdown`: Shutdown a model by providing its Slurm job ID.
 * `list`: List all available model names, or view the default/cached configuration of a specific model.
-* `cleanup`: Remove old log directories. You can filter by `--model-family`, `--model-name`, `--job-id`, and/or `--before-job-id`. Use `--dry-run` to preview what would be deleted.
+* `cleanup`: Remove old log directories; use `--help` to see the supported filters. Use `--dry-run` to preview what would be deleted.
 
 For more details on the usage of these commands, refer to the [User Guide](https://vectorinstitute.github.io/vector-inference/user_guide/)
 
@@ -125,6 +127,7 @@ Example:
 ```python
 >>> from vec_inf.api import VecInfClient
 >>> client = VecInfClient()
+>>> # Assume VEC_INF_ACCOUNT and VEC_INF_WORK_DIR are set
 >>> response = client.launch_model("Meta-Llama-3.1-8B-Instruct")
 >>> job_id = response.slurm_job_id
 >>> status = client.get_status(job_id)
````
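
Putting the README changes together, a minimal sketch of the launch flow on Killarney; the account and work-directory values below are placeholders, not values from this commit:

```bash
# Set the Killarney-required fields once so every launch picks them up
export VEC_INF_ACCOUNT=<your_slurm_account>
export VEC_INF_WORK_DIR=/scratch/<username>/vec-inf

# Launch with the cached/default configuration...
vec-inf launch Meta-Llama-3.1-8B-Instruct

# ...or override the resource type and vllm serve arguments inline
vec-inf launch Meta-Llama-3.1-8B-Instruct \
  --resource-type l40s \
  --vllm-args '--max-model-len=65536,--compilation-config=3'
```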

docs/index.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -1,6 +1,8 @@
 # Vector Inference: Easy inference on Slurm clusters
 
-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository run natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/stable/). **This package runs natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).
+
+**NOTE**: Supported models on Killarney are tracked [here](vec_inf/config/README.md)
 
 ## Installation
 
@@ -10,7 +12,7 @@ If you are using the Vector cluster environment, and you don't need any customiz
 pip install vec-inf
 ```
 
-Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.9.2`.
+Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.10.1.1`.
 
 If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files. There are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
````
