Commit e1af7d1

Merge branch 'main' into pre-commit-ci-update-config
2 parents: 731d3f4 + 1fb83d2
File tree: 7 files changed (+1961 / -1740 lines)

.github/workflows/docker.yml
1 addition & 1 deletion

@@ -45,7 +45,7 @@ jobs:
           images: vectorinstitute/vector-inference

       - name: Build and push Docker image
-        uses: docker/build-push-action@1dc73863535b631f98b2378be8619f83b136f4a0
+        uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83
         with:
           context: .
           file: ./Dockerfile
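The change above swaps one pinned commit of `docker/build-push-action` for another; both references pin the action to a full 40-character commit SHA rather than a mutable tag like `v5`. As a minimal sketch (the helper is hypothetical, not part of this repository) of what distinguishes a SHA-pinned `uses:` reference from a tag-pinned one:

```python
import re

def is_sha_pinned(uses: str) -> bool:
    # A fully pinned action reference ends in a 40-character lowercase hex commit SHA.
    ref = uses.split("@", 1)[1] if "@" in uses else ""
    return re.fullmatch(r"[0-9a-f]{40}", ref) is not None

print(is_sha_pinned("docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83"))  # True
print(is_sha_pinned("docker/build-push-action@v5"))  # False
```

SHA pinning trades convenience for immutability: the workflow keeps building the exact action code that was reviewed, at the cost of manual bumps like this one.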

Dockerfile
3 additions & 7 deletions

@@ -1,13 +1,13 @@
-FROM nvidia/cuda:12.4.1-devel-ubuntu20.04
+FROM nvidia/cuda:12.6.3-cudnn-devel-ubuntu22.04

 # Non-interactive apt-get commands
 ARG DEBIAN_FRONTEND=noninteractive

 # No GPUs visible during build
 ARG CUDA_VISIBLE_DEVICES=none

-# Specify CUDA architectures -> 7.5: RTX 6000 & T4, 8.0: A100, 8.6+PTX
-ARG TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6+PTX"
+# Specify CUDA architectures -> 7.5: Quadro RTX 6000 & T4, 8.0: A100, 8.6: A40, 8.9: L40S, 9.0: H100
+ARG TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;8.9;9.0+PTX"

 # Set the Python version
 ARG PYTHON_VERSION=3.10.12
@@ -41,10 +41,6 @@ COPY . /vec-inf

 # Install project dependencies with build requirements
 RUN PIP_INDEX_URL="https://download.pytorch.org/whl/cu121" uv pip install --system -e .[dev]
-# Install FlashAttention
-RUN python3.10 -m pip install flash-attn --no-build-isolation
-# Install FlashInfer
-RUN python3.10 -m pip install flashinfer-python -i https://flashinfer.ai/whl/cu124/torch2.6/

 # Final configuration
 RUN mkdir -p /vec-inf/nccl && \
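The new `TORCH_CUDA_ARCH_LIST` widens the build from three compute capabilities to five, covering the GPUs named in the comment. As a sketch of how such a list decomposes (assuming the standard semicolon-separated format PyTorch builds accept, where a `+PTX` suffix on an entry additionally requests forward-compatible PTX; the helper name is hypothetical):

```python
def parse_arch_list(value: str):
    # Each entry is a compute capability; "+PTX" asks for PTX output
    # in addition to native SASS for that architecture.
    archs = []
    for entry in value.split(";"):
        archs.append((entry.removesuffix("+PTX"), entry.endswith("+PTX")))
    return archs

print(parse_arch_list("7.5;8.0;8.6;8.9;9.0+PTX"))
# [('7.5', False), ('8.0', False), ('8.6', False), ('8.9', False), ('9.0', True)]
```

Putting `+PTX` only on the newest entry (9.0) keeps the image smaller than compiling PTX for every arch, while still letting GPUs newer than H100 JIT-compile from the 9.0 PTX.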

README.md
2 additions & 2 deletions

@@ -7,7 +7,7 @@
 [![code checks](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
 [![docs](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
 [![codecov](https://codecov.io/github/VectorInstitute/vector-inference/branch/main/graph/badge.svg?token=NI88QSIGAC)](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
-[![vLLM](https://img.shields.io/badge/vllm-0.8.5.post1-blue)](https://docs.vllm.ai/en/v0.8.5.post1/index.html)
+[![vLLM](https://img.shields.io/badge/vllm-0.9.2)](https://docs.vllm.ai/en/v0.9.2/index.html)
 ![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)

 This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **All scripts in this repository runs natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).
@@ -18,7 +18,7 @@ If you are using the Vector cluster environment, and you don't need any customiz
 ```bash
 pip install vec-inf
 ```
-Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.8.5.post1`.
+Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.9.2`.

 If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files, there are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.

docs/api.md
9 additions & 0 deletions

@@ -10,6 +10,15 @@ This section documents the Python API for vector-inference.
       show_root_full_path: true
       members: true

+## Model Config
+
+::: vec_inf.client.config.ModelConfig
+    options:
+      show_root_heading: true
+      show_root_full_path: true
+      members: true
+
+
 ## Data Models

 ::: vec_inf.client.models

docs/index.md
1 addition & 1 deletion

@@ -10,7 +10,7 @@ If you are using the Vector cluster environment, and you don't need any customiz
 pip install vec-inf
 ```

-Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.8.5.post1`.
+Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.9.2`.

 If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files, there are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.

pyproject.toml
3 additions & 3 deletions

@@ -1,6 +1,6 @@
 [project]
 name = "vec-inf"
-version = "0.6.1"
+version = "0.7.0"
 description = "Efficient LLM inference on Slurm clusters using vLLM."
 readme = "README.md"
 authors = [{name = "Marshall Wang", email = "marshall.wang@vectorinstitute.ai"}]
@@ -40,8 +40,8 @@ docs = [
 [project.optional-dependencies]
 dev = [
     "xgrammar>=0.1.11",
-    "torch>=2.5.1",
-    "vllm>=0.7.3",
+    "torch>=2.7.0",
+    "vllm>=0.9.2",
     "vllm-nccl-cu12>=2.18,<2.19",
     "ray>=2.40.0",
     "cupy-cuda12x==12.1.0"
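The lower bounds for `torch` and `vllm` both move forward here, and the new floors strictly supersede the old ones. A simplified sketch of that comparison (numeric release segments only; real installers follow PEP 440, e.g. via the `packaging` library):

```python
def parse_version(v: str):
    # Compare release versions numerically, segment by segment,
    # so that e.g. "0.9.2" correctly sorts above "0.7.3".
    return tuple(int(part) for part in v.split("."))

# The bumped lower bounds from this commit.
assert parse_version("2.7.0") > parse_version("2.5.1")  # torch
assert parse_version("0.9.2") > parse_version("0.7.3")  # vllm
```

Because every environment satisfying the new constraints also satisfied the old ones (but not vice versa), the minor version bump to 0.7.0 signals a potentially breaking dependency change.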

0 commit comments