Merge pull request #130 from VectorInstitute/feature/killarney-migration
* Adapted Slurm configuration changes to the Killarney cluster while keeping support for Bon Echo
* Removed redundant code in the generated script
* Added RDMA support
* Added more default-value support in the environment config
* Sorted the `list` command output
* Updated the config README to track cached model weights
This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
**NOTE**: Supported models on Killarney are tracked [here](vec_inf/config/README.md).
## Installation
If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:
```bash
pip install vec-inf
```
Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.10.1.1`.
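For example, a minimal sketch of building and entering a container from that `Dockerfile` (the image tag and GPU flags below are illustrative, not prescribed by the repository):

```bash
# Build the image from the repository root and start an interactive shell.
# The "vec-inf" tag and the --gpus flag are illustrative assumptions.
docker build -t vec-inf .
docker run --gpus all -it vec-inf bash
```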
If you'd like to use `vec-inf` on your own Slurm cluster, you will need to update the configuration files. There are three ways to do so:
* Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .` (sketched below).
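A minimal sketch of that clone-and-install path (the edits you make to the two YAML files depend entirely on your cluster):

```bash
# Clone, adjust the cluster configs, then install from source
git clone https://github.com/VectorInstitute/vector-inference.git
cd vector-inference
# edit vec_inf/config/environment.yaml and vec_inf/config/models.yaml here
pip install .
```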
**NOTE**: On the Vector Killarney cluster environment, the following fields are required:
* `--account`, `-A`: The Slurm account. A default can be set via the environment variable `VEC_INF_ACCOUNT`.
* `--work-dir`, `-D`: A working directory other than your home directory. A default can be set via the environment variable `VEC_INF_WORK_DIR`.
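For example, to set those defaults in your shell (the values below are placeholders):

```bash
# Placeholder values; substitute your own Slurm account and scratch path
export VEC_INF_ACCOUNT=my-slurm-account
export VEC_INF_WORK_DIR=/scratch/$USER/vec-inf
```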
#### Overrides
Models that are already supported by `vec-inf` are launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or the [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters; use `vec-inf launch --help` to see the full list of parameters that can be overridden. For example, if `resource-type` is to be overridden:
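A minimal sketch of such a launch, assuming the `--resource-type` flag mirrors the `resource_type` config field (the model name is illustrative):

```bash
# Illustrative: override the resource type for a single launch
vec-inf launch Meta-Llama-3.1-8B-Instruct --resource-type l40s
```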
The full list of `vllm serve` arguments can be found [here](https://docs.vllm.ai/en/stable/cli/serve.html); make sure you select the correct vLLM version.
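For instance, two commonly used engine arguments (the model name and values are illustrative, and the right values depend on your model and hardware):

```bash
# Illustrative vllm serve engine arguments
vllm serve Meta-Llama-3.1-8B-Instruct \
  --max-model-len 8192 \
  --tensor-parallel-size 2
```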
#### Custom models
```yaml
# Excerpt from a model entry in models.yaml (surrounding fields omitted)
gpus_per_node: 1
num_nodes: 1
vocab_size: 152064
time: 08:00:00
resource_type: l40s # You can also leave this field empty if your environment has a default type of resource to use
```
* There are other parameters that can be added to the config but aren't shown in this example; check [`ModelConfig`](vec_inf/client/config.py) for details.
* Check [`vllm serve`](https://docs.vllm.ai/en/stable/cli/serve.html) for the full list of available vLLM engine arguments; the default size for any parallelization strategy is 1, so none of the sizes were set explicitly in this example.
* For GPU partitions with non-Ampere architectures, e.g. `rtx6000`, `t4v2`, BF16 isn't supported. For models that have BF16 as the default dtype, use FP16 instead when running on a non-Ampere GPU, i.e. `--dtype: float16` (see the sketch after this list).
* Setting `--compilation-config` to `3` currently breaks multi-node model launches, so we don't set it for models that require multiple nodes of GPUs.
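A sketch of the FP16 override in a model's config entry; the `vllm_args` mapping name is an assumption here, so verify the key against your `models.yaml`:

```yaml
# Sketch: force FP16 on a non-Ampere GPU (the vllm_args key is assumed)
vllm_args:
  --dtype: float16
```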
* `metrics`: Streams performance metrics to the console.
* `shutdown`: Shutdown a model by providing its Slurm job ID.
* `list`: List all available model names, or view the default/cached configuration of a specific model.
* `cleanup`: Remove old log directories; use `--help` to see the supported filters. Use `--dry-run` to preview what would be deleted.
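For instance, illustrative invocations of these commands; the job ID is a placeholder and the argument shapes are assumed from the descriptions above:

```bash
# Placeholder job ID; argument shapes assumed from the command descriptions
vec-inf metrics 1234567
vec-inf shutdown 1234567
vec-inf list
vec-inf cleanup --dry-run
```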
For more details on the usage of these commands, refer to the [User Guide](https://vectorinstitute.github.io/vector-inference/user_guide/).
Example:
```python
>>> from vec_inf.api import VecInfClient
>>> client = VecInfClient()
>>> # Assume VEC_INF_ACCOUNT and VEC_INF_WORK_DIR are set
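>>> # Illustrative continuation: the method and attribute names below are
>>> # assumed, not confirmed here, and the model name is an example
>>> response = client.launch_model("Meta-Llama-3.1-8B-Instruct")
>>> print(response.slurm_job_id)
```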
Changes to `docs/index.md` (4 additions, 2 deletions):
# Vector Inference: Easy inference on Slurm clusters
This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/stable/). **This package runs natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).
**NOTE**: Supported models on Killarney are tracked [here](vec_inf/config/README.md).
## Installation
If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:

```bash
pip install vec-inf
```
Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.10.1.1`.
If you'd like to use `vec-inf` on your own Slurm cluster, you will need to update the configuration files. There are three ways to do so:
* Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.