Commit c0b2dbb

Merge branch 'main' of https://github.com/vllm-project/tpu-inference into jiries/llama-guard-4-text
2 parents b6d2545 + c0c8192 commit c0b2dbb

74 files changed (+1585 / -434 lines)

.buildkite/README.md

Lines changed: 9 additions & 6 deletions
@@ -22,8 +22,9 @@ To support this requirement, each model and feature will go through a series of
 # Adding a new model to CI
 ## Adding a TPU-optimized model
 TPU-optimized models are models we rewrite the model definition as opposed to using the model definition from the vLLM upstream. These models will go through benchmark on top of unit and integration (accuracy) tests. To add a TPU-optimized model to CI, model owners can use the prepared [add_model_to_ci.py](pipeline_generation/add_model_to_ci.py) script. The script will populate a buildkite yaml config file in the `.buildkite/models` directory; config files under this directory will be integrated to our pipeline automatically. The python script takes 2 arguments:
-- **model_name**: this is the **full name** of your model on Hugging Face. Please ensure to use the **full name** (ex: `meta-llama/Llama-3.1-8B` instead of `Llama-3.1-8B`) or else we won't be able to find your model.
-- **queue**: this is the queue you want to run on (ex: `tpu_v6e_queue`)
+- **--model-name**: this is the **full name** of your model on Hugging Face. Please ensure to use the **full name** (ex: `meta-llama/Llama-3.1-8B` instead of `Llama-3.1-8B`) or else we won't be able to find your model.
+- **--queue**: this is the queue you want to run on (ex: `tpu_v6e_queue`)
+- **--category**: this parameter allows you to set the model category, with the following options available: "text-only" or "multimodel". (default: "text-only")
 
 ```bash
 python add_model_to_ci.py --model-name <MODEL_NAME> --queue <QUEUE_NAME>
@@ -36,8 +37,9 @@ In the generated yml file, there are three TODOs that will need your input:
 
 ## Adding a vLLM-native model
 vLLM-native models are models using the model definition from the vLLM upstream. These models will not go through benchmark on our pipeline. To add a vLLM-native model to CI, model owners can use the prepared [add_model_to_ci.py](pipeline_generation/add_model_to_ci.py) script. The script will populate a buildkite yaml config file in the `.buildkite/models` directory; config files under this directory will be integrated to our pipeline automatically. The python script takes 3 arguments:
-- **model_name**: this is the **full name** of your model on Hugging Face. Please ensure to use the **full name** (ex: `meta-llama/Llama-3.1-8B` instead of `Llama-3.1-8B`) or else we won't be able to find your model.
-- **queue**: this is the queue you want to run on (ex: `tpu_v6e_queue`)
+- **--model-name**: this is the **full name** of your model on Hugging Face. Please ensure to use the **full name** (ex: `meta-llama/Llama-3.1-8B` instead of `Llama-3.1-8B`) or else we won't be able to find your model.
+- **--queue**: this is the queue you want to run on (ex: `tpu_v6e_queue`)
+- **--category**: this parameter allows you to set the model category, with the following options available: "text-only" or "multimodel". (default: "text-only")
 
 ```bash
 python add_model_to_ci.py --model-name <MODEL_NAME> --queue <QUEUE_NAME> --type vllm-native
@@ -49,8 +51,9 @@ In the generated yml file, there are two TODOs that will need your input:
 
 # Adding a new feature to CI
 To add a new feature to CI, feature owners can use the prepared [add_feature_to_ci.py](pipeline_generation/add_feature_to_ci.py) script. The script will populate a buildkite yaml config file in the `.buildkite/features` directory; config files under this directory will be integrated to our pipeline automatically. The python script takes 2 arguments:
-- **feature_name**: this is the name of your feature
-- **queue**: this is the queue you want to run on (ex: `tpu_v6e_queue`)
+- **--feature-name**: this is the name of your feature
+- **--queue**: this is the queue you want to run on (ex: `tpu_v6e_queue`)
+- **--category**: this parameter allows you to set the feature category, with the following options available: "feature support matrix" or "kernel support matrix". (default: "feature support matrix")
 
 ```bash
 python add_feature_to_ci.py --feature-name <FEATURE_NAME> --queue <QUEUE_NAME>
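The README changes above switch the documented arguments to their command-line flag form and add `--category`. A minimal sketch of how such a CLI could be declared with Python's `argparse`; this is not the actual `add_model_to_ci.py`. The `--type` default of `tpu-optimized` and the `looks_like_full_hf_name` helper are hypothetical, and the category values are spelled exactly as in the README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for add_model_to_ci.py."""
    parser = argparse.ArgumentParser(description="Generate a Buildkite config for a model.")
    # Full Hugging Face name, e.g. meta-llama/Llama-3.1-8B (not Llama-3.1-8B).
    parser.add_argument("--model-name", required=True)
    # Buildkite queue to run on, e.g. tpu_v6e_queue.
    parser.add_argument("--queue", required=True)
    # Category values spelled as in the README; "text-only" is the documented default.
    parser.add_argument("--category", choices=["text-only", "multimodel"], default="text-only")
    # Assumption: only "vllm-native" is documented; this default is hypothetical.
    parser.add_argument("--type", dest="model_type", default="tpu-optimized")
    return parser


def looks_like_full_hf_name(name: str) -> bool:
    """Rough check for the <org>/<model> form the README asks for."""
    org, sep, model = name.partition("/")
    return bool(sep and org and model and "/" not in model)


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--model-name", "meta-llama/Llama-3.1-8B", "--queue", "tpu_v6e_queue", "--type", "vllm-native"]
    )
    print(args.model_name, args.queue, args.category, args.model_type)
```

Declaring `choices` lets `argparse` reject an unknown category (it exits with a usage error) before any YAML is written, and `looks_like_full_hf_name` catches the short-name mistake the README warns about.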

.buildkite/features/Collective_Communication_Matmul.yml

Lines changed: 24 additions & 0 deletions
@@ -1,4 +1,5 @@
 # Collective Communication Matmul
+# kernel support matrix
 steps:
   - label: "Correctness tests for Collective Communication Matmul"
     key: "Collective_Communication_Matmul_CorrectnessTest"
@@ -13,8 +14,31 @@ steps:
     env:
       CI_TARGET: "Collective Communication Matmul"
       CI_STAGE: "CorrectnessTest"
+      CI_CATEGORY: "kernel support matrix"
     agents:
       queue: cpu
     commands:
       - |
         .buildkite/scripts/record_step_result.sh Collective_Communication_Matmul_CorrectnessTest
+
+  - label: "Performance tests for Collective Communication Matmul"
+    key: "Collective_Communication_Matmul_PerformanceTest"
+    depends_on: "record_Collective_Communication_Matmul_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "Collective_Communication_Matmul_PerformanceTest" "to be added"
+  - label: "Record performance test result for Collective Communication Matmul"
+    key: "record_Collective_Communication_Matmul_PerformanceTest"
+    depends_on: "Collective_Communication_Matmul_PerformanceTest"
+    env:
+      CI_TARGET: "Collective Communication Matmul"
+      CI_STAGE: "PerformanceTest"
+      CI_CATEGORY: "kernel support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh Collective_Communication_Matmul_PerformanceTest
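The new Performance step waits on `record_Collective_Communication_Matmul_CorrectnessTest`, a key defined earlier in the file outside this hunk, so a mismatch between the two names is easy to miss when editing these files by hand. A hypothetical check, assuming the steps have already been loaded into plain dictionaries; the `STEPS` list below is reconstructed from the naming pattern of the other files in this commit, not copied from the hidden lines.

```python
from typing import Dict, List, Optional


def check_dependencies(steps: List[Dict[str, str]]) -> List[str]:
    """Report every depends_on value that does not match a key defined in the same file."""
    defined = {step["key"] for step in steps}
    problems = []
    for step in steps:
        dependency: Optional[str] = step.get("depends_on")
        if dependency is not None and dependency not in defined:
            problems.append(f"{step['key']} depends on unknown key {dependency}")
    return problems


# Assumed key/depends_on pairs for Collective_Communication_Matmul.yml after this commit;
# the correctness record step's key is inferred from the Performance step's depends_on.
STEPS = [
    {"key": "Collective_Communication_Matmul_CorrectnessTest"},
    {"key": "record_Collective_Communication_Matmul_CorrectnessTest",
     "depends_on": "Collective_Communication_Matmul_CorrectnessTest"},
    {"key": "Collective_Communication_Matmul_PerformanceTest",
     "depends_on": "record_Collective_Communication_Matmul_CorrectnessTest"},
    {"key": "record_Collective_Communication_Matmul_PerformanceTest",
     "depends_on": "Collective_Communication_Matmul_PerformanceTest"},
]

if __name__ == "__main__":
    print(check_dependencies(STEPS) or "all depends_on values resolve")
```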

.buildkite/features/JAX-Path_Qxix_Quantization.yml

Lines changed: 0 additions & 42 deletions
This file was deleted.

.buildkite/features/MLA.yml

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# MLA
+# feature support matrix
+steps:
+  - label: "Correctness tests for MLA"
+    key: "MLA_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "MLA_CorrectnessTest" "to be added"
+  - label: "Record correctness test result for MLA"
+    key: "record_MLA_CorrectnessTest"
+    depends_on: "MLA_CorrectnessTest"
+    env:
+      CI_TARGET: "MLA"
+      CI_STAGE: "CorrectnessTest"
+      CI_CATEGORY: "feature support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh MLA_CorrectnessTest
+
+  - label: "Performance tests for MLA"
+    key: "MLA_PerformanceTest"
+    depends_on: "record_MLA_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "MLA_PerformanceTest" "to be added"
+  - label: "Record performance test result for MLA"
+    key: "record_MLA_PerformanceTest"
+    depends_on: "MLA_PerformanceTest"
+    env:
+      CI_TARGET: "MLA"
+      CI_STAGE: "PerformanceTest"
+      CI_CATEGORY: "feature support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh MLA_PerformanceTest
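MLA.yml shows the complete four-step shape used for new features and kernels in this commit: a placeholder test step on `tpu_v6e_queue`, a record step on `cpu`, then the same pair for performance, gated on the recorded correctness result. A minimal sketch that renders this shape with plain string templating, assuming two-space YAML indentation. It is not the actual `add_feature_to_ci.py`, and `render_feature_pipeline` and `step_key` are hypothetical names.

```python
def step_key(feature: str, stage: str) -> str:
    """Key convention seen in these files: spaces become underscores, then _<Stage>."""
    return f"{feature.replace(' ', '_')}_{stage}"


def render_feature_pipeline(feature: str, category: str, queue: str = "tpu_v6e_queue") -> str:
    """Hypothetical renderer for the test/record step pairs seen in this commit."""
    lines = [f"# {feature}", f"# {category}", "steps:"]
    previous_record = None
    for stage, noun in (("CorrectnessTest", "Correctness"), ("PerformanceTest", "Performance")):
        key = step_key(feature, stage)
        if previous_record is not None:
            lines.append("")  # blank line between the two stage pairs, as in the files
        lines += [f'  - label: "{noun} tests for {feature}"', f'    key: "{key}"']
        if previous_record is not None:
            # Performance tests only start after the correctness result is recorded.
            lines.append(f'    depends_on: "{previous_record}"')
        lines += [
            "    soft_fail: true",
            "    agents:",
            f"      queue: {queue}",
            "    commands:",
            "      - |",
            f'        buildkite-agent meta-data set "{key}" "to be added"',
            f'  - label: "Record {noun.lower()} test result for {feature}"',
            f'    key: "record_{key}"',
            f'    depends_on: "{key}"',
            "    env:",
            f'      CI_TARGET: "{feature}"',
            f'      CI_STAGE: "{stage}"',
            f'      CI_CATEGORY: "{category}"',
            "    agents:",
            "      queue: cpu",
            "    commands:",
            "      - |",
            f"        .buildkite/scripts/record_step_result.sh {key}",
        ]
        previous_record = f"record_{key}"
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_feature_pipeline("MLA", "feature support matrix"), end="")
```

Rendering `"MLA"` with `"feature support matrix"` yields the same 45-line layout as MLA.yml; passing `"kernel support matrix"` with `"Quantized Matmul"` gives the layout of the Quantized files.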

.buildkite/features/MoE.yml

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# MoE
+# feature support matrix
+steps:
+  - label: "Correctness tests for MoE"
+    key: "MoE_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "MoE_CorrectnessTest" "to be added"
+  - label: "Record correctness test result for MoE"
+    key: "record_MoE_CorrectnessTest"
+    depends_on: "MoE_CorrectnessTest"
+    env:
+      CI_TARGET: "MoE"
+      CI_STAGE: "CorrectnessTest"
+      CI_CATEGORY: "feature support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh MoE_CorrectnessTest
+
+  - label: "Performance tests for MoE"
+    key: "MoE_PerformanceTest"
+    depends_on: "record_MoE_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "MoE_PerformanceTest" "to be added"
+  - label: "Record performance test result for MoE"
+    key: "record_MoE_PerformanceTest"
+    depends_on: "MoE_PerformanceTest"
+    env:
+      CI_TARGET: "MoE"
+      CI_STAGE: "PerformanceTest"
+      CI_CATEGORY: "feature support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh MoE_PerformanceTest
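Every record step now passes `CI_CATEGORY` alongside `CI_TARGET` and `CI_STAGE`, which lets results be split into a feature support matrix and a kernel support matrix. How `record_step_result.sh` actually stores results is not shown in this commit; the sketch below is hypothetical and assumes each record step yields one `(category, target, stage, result)` tuple, with `"to be added"` being the placeholder value the test steps set today.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (CI_CATEGORY, CI_TARGET, CI_STAGE, result): hypothetical shape of one recorded result.
Record = Tuple[str, str, str, str]


def build_matrices(records: Iterable[Record]) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Group results into one table per category: category -> target -> stage -> result."""
    grouped = defaultdict(lambda: defaultdict(dict))
    for category, target, stage, result in records:
        # A later record for the same (category, target, stage) overwrites an earlier one.
        grouped[category][target][stage] = result
    # Convert the nested defaultdicts into plain dicts for printing and comparison.
    return {cat: {target: dict(stages) for target, stages in targets.items()}
            for cat, targets in grouped.items()}


if __name__ == "__main__":
    matrices = build_matrices([
        ("feature support matrix", "MoE", "CorrectnessTest", "to be added"),
        ("feature support matrix", "MoE", "PerformanceTest", "to be added"),
        ("kernel support matrix", "Quantized Matmul", "CorrectnessTest", "to be added"),
    ])
    for category, targets in matrices.items():
        print(category, targets)
```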

.buildkite/features/Multimodal_Inputs.yml

Lines changed: 3 additions & 0 deletions
@@ -1,4 +1,5 @@
 # Multimodal Inputs
+# feature support matrix
 steps:
   - label: "Correctness tests for Multimodal Inputs"
     key: "Multimodal_Inputs_CorrectnessTest"
@@ -13,6 +14,7 @@ steps:
     env:
       CI_TARGET: Multimodal Inputs
       CI_STAGE: "CorrectnessTest"
+      CI_CATEGORY: "feature support matrix"
     agents:
       queue: cpu
     commands:
@@ -33,6 +35,7 @@ steps:
     env:
       CI_TARGET: Multimodal Inputs
       CI_STAGE: "PerformanceTest"
+      CI_CATEGORY: "feature support matrix"
     agents:
       queue: cpu
     commands:
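This commit records the category twice per file: as a comment on line 2 and as `CI_CATEGORY` in each record step's `env`. A small lint could catch the two drifting apart. The sketch is hypothetical (no such check appears in the commit) and reads the raw text with a regular expression rather than a YAML parser, since YAML parsers discard comments; it only handles the flat `CI_CATEGORY: value` form seen in these files.

```python
import re
from typing import List

CATEGORIES = ("feature support matrix", "kernel support matrix")
CI_CATEGORY_RE = re.compile(r'CI_CATEGORY:[ \t]*"?([^"\n]+?)"?[ \t]*$', flags=re.MULTILINE)


def check_category(yaml_text: str) -> List[str]:
    """Check that the line-2 category comment matches every CI_CATEGORY value."""
    lines = yaml_text.splitlines()
    problems: List[str] = []
    # Line 2 looks like "# feature support matrix"; strip the leading "# ".
    header = lines[1].lstrip("# ").strip() if len(lines) > 1 else ""
    if header not in CATEGORIES:
        problems.append(f"unknown category comment: {header!r}")
    for value in CI_CATEGORY_RE.findall(yaml_text):
        if value != header:
            problems.append(f"CI_CATEGORY {value!r} does not match header {header!r}")
    return problems


if __name__ == "__main__":
    sample = '# MoE\n# feature support matrix\nsteps:\n    env:\n      CI_CATEGORY: "feature support matrix"\n'
    print(check_category(sample) or "category comment and CI_CATEGORY agree")
```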
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# Quantized Attention
+# kernel support matrix
+steps:
+  - label: "Correctness tests for Quantized Attention"
+    key: "Quantized_Attention_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "Quantized_Attention_CorrectnessTest" "to be added"
+  - label: "Record correctness test result for Quantized Attention"
+    key: "record_Quantized_Attention_CorrectnessTest"
+    depends_on: "Quantized_Attention_CorrectnessTest"
+    env:
+      CI_TARGET: "Quantized Attention"
+      CI_STAGE: "CorrectnessTest"
+      CI_CATEGORY: "kernel support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh Quantized_Attention_CorrectnessTest
+
+  - label: "Performance tests for Quantized Attention"
+    key: "Quantized_Attention_PerformanceTest"
+    depends_on: "record_Quantized_Attention_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "Quantized_Attention_PerformanceTest" "to be added"
+  - label: "Record performance test result for Quantized Attention"
+    key: "record_Quantized_Attention_PerformanceTest"
+    depends_on: "Quantized_Attention_PerformanceTest"
+    env:
+      CI_TARGET: "Quantized Attention"
+      CI_STAGE: "PerformanceTest"
+      CI_CATEGORY: "kernel support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh Quantized_Attention_PerformanceTest
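Each TPU test step in these files carries `soft_fail: true`. In Buildkite, a soft-failing step may exit non-zero without failing the build, so once the `"to be added"` placeholders are replaced by real tests, a failing kernel test will not block the rest of CI. A minimal model of that rule; `StepResult` and `build_passes` are hypothetical names, and the sketch covers only the boolean form of `soft_fail` used here (Buildkite also accepts a list of specific exit statuses).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StepResult:
    key: str
    passed: bool
    soft_fail: bool = False  # mirrors `soft_fail: true` in the step definition


def build_passes(results: Iterable[StepResult]) -> bool:
    """A failing step fails the build unless it is marked soft_fail."""
    return all(r.passed or r.soft_fail for r in results)


if __name__ == "__main__":
    results = [
        StepResult("Quantized_Attention_CorrectnessTest", passed=False, soft_fail=True),
        StepResult("record_Quantized_Attention_CorrectnessTest", passed=True),
    ]
    print("build passes:", build_passes(results))
```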
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# Quantized KV Cache
+# kernel support matrix
+steps:
+  - label: "Correctness tests for Quantized KV Cache"
+    key: "Quantized_KV_Cache_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "Quantized_KV_Cache_CorrectnessTest" "to be added"
+  - label: "Record correctness test result for Quantized KV Cache"
+    key: "record_Quantized_KV_Cache_CorrectnessTest"
+    depends_on: "Quantized_KV_Cache_CorrectnessTest"
+    env:
+      CI_TARGET: "Quantized KV Cache"
+      CI_STAGE: "CorrectnessTest"
+      CI_CATEGORY: "kernel support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh Quantized_KV_Cache_CorrectnessTest
+
+  - label: "Performance tests for Quantized KV Cache"
+    key: "Quantized_KV_Cache_PerformanceTest"
+    depends_on: "record_Quantized_KV_Cache_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "Quantized_KV_Cache_PerformanceTest" "to be added"
+  - label: "Record performance test result for Quantized KV Cache"
+    key: "record_Quantized_KV_Cache_PerformanceTest"
+    depends_on: "Quantized_KV_Cache_PerformanceTest"
+    env:
+      CI_TARGET: "Quantized KV Cache"
+      CI_STAGE: "PerformanceTest"
+      CI_CATEGORY: "kernel support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh Quantized_KV_Cache_PerformanceTest
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# Quantized Matmul
+# kernel support matrix
+steps:
+  - label: "Correctness tests for Quantized Matmul"
+    key: "Quantized_Matmul_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "Quantized_Matmul_CorrectnessTest" "to be added"
+  - label: "Record correctness test result for Quantized Matmul"
+    key: "record_Quantized_Matmul_CorrectnessTest"
+    depends_on: "Quantized_Matmul_CorrectnessTest"
+    env:
+      CI_TARGET: "Quantized Matmul"
+      CI_STAGE: "CorrectnessTest"
+      CI_CATEGORY: "kernel support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh Quantized_Matmul_CorrectnessTest
+
+  - label: "Performance tests for Quantized Matmul"
+    key: "Quantized_Matmul_PerformanceTest"
+    depends_on: "record_Quantized_Matmul_CorrectnessTest"
+    soft_fail: true
+    agents:
+      queue: tpu_v6e_queue
+    commands:
+      - |
+        buildkite-agent meta-data set "Quantized_Matmul_PerformanceTest" "to be added"
+  - label: "Record performance test result for Quantized Matmul"
+    key: "record_Quantized_Matmul_PerformanceTest"
+    depends_on: "Quantized_Matmul_PerformanceTest"
+    env:
+      CI_TARGET: "Quantized Matmul"
+      CI_STAGE: "PerformanceTest"
+      CI_CATEGORY: "kernel support matrix"
+    agents:
+      queue: cpu
+    commands:
+      - |
+        .buildkite/scripts/record_step_result.sh Quantized_Matmul_PerformanceTest
