Commit 4097f05

Merge branch 'develop' into feature/llguidance
2 parents d68e3e2 + 6f42c37 commit 4097f05

171 files changed, +10460 -2318 lines changed

.github/pull_request_template.md

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@
 
 <!-- Describe the purpose and goals of this pull request. -->
 
+> :bulb: If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)
+
+> :bulb: (Chinese-language version of the note above) If this PR is a Cherry Pick, the PR title must follow the format: add the [Cherry-Pick] label at the very beginning and the original PR ID at the very end, for example [Cherry-Pick][CI] Add check trigger and logic(#5191)
+
 ## Modifications
 
 <!-- Detail the changes made in this pull request. -->

.github/workflows/_base_test.yml

Lines changed: 7 additions & 0 deletions
@@ -206,6 +206,13 @@ jobs:
 check_service 90
 python -m pytest -sv test_max_waiting_time.py || TEST_EXIT_CODE=1
 
+export TEMPLATE=TOKEN_NORMAL
+curl -X POST http://0.0.0.0:${FLASK_PORT}/switch \
+  -H "Content-Type: application/json" \
+  -d "{\"--model\": \"/MODELDATA/ERNIE-4.5-VL-28B-A3B-Thinking\", \"--reasoning-parser\": \"ernie-45-vl-thinking\", \"--tool-call-parser\": \"ernie-45-vl-thinking\", \"--tensor-parallel-size\": 1, \"--quantization\": \"wint4\", \"--max-model-len\": 131072, \"--max-num-seqs\": 32}"
+check_service 90
+python -m pytest -sv test_prompt_ids.py || TEST_EXIT_CODE=1
+
 popd
 echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}" >> /workspace/FastDeploy/exit_code.env
 '
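
The `/switch` request in this hunk carries its JSON body as a shell-escaped string. As a minimal sketch (not part of the commit), the same request can be assembled with Python's standard library, which avoids the manual escaping. The helper name `build_switch_request` and the port used in the example are hypothetical; the URL path, header, and payload keys are copied from the hunk.

```python
import json
import urllib.request


def build_switch_request(flask_port: int) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the workflow step."""
    # Keys and values mirror the JSON body passed to curl in the hunk.
    payload = {
        "--model": "/MODELDATA/ERNIE-4.5-VL-28B-A3B-Thinking",
        "--reasoning-parser": "ernie-45-vl-thinking",
        "--tool-call-parser": "ernie-45-vl-thinking",
        "--tensor-parallel-size": 1,
        "--quantization": "wint4",
        "--max-model-len": 131072,
        "--max-num-seqs": 32,
    }
    return urllib.request.Request(
        url=f"http://0.0.0.0:{flask_port}/switch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # 8068 is a placeholder; the workflow takes the port from ${FLASK_PORT}.
    request = build_switch_request(8068)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

The request is only constructed here; passing it to `urllib.request.urlopen` would send it.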

.github/workflows/_unit_test_coverage.yml

Lines changed: 11 additions & 1 deletion
@@ -105,6 +105,7 @@ jobs:
 FD_CACHE_QUEUE_PORT=$((8098 + DEVICE_PORT * 100))
 FD_ROUTER_PORT=$((8048 + DEVICE_PORT * 100))
 FD_CONNECTOR_PORT=$((8038 + DEVICE_PORT * 100))
+FD_RDMA_PORT=$((8028 + DEVICE_PORT * 100))
 echo "Test ENV Parameter:"
 echo "========================================================="
 echo "FLASK_PORT=${FLASK_PORT}"
@@ -114,6 +115,7 @@ jobs:
 echo "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}"
 echo "FD_ROUTER_PORT=${FD_ROUTER_PORT}"
 echo "FD_CONNECTOR_PORT=${FD_CONNECTOR_PORT}"
+echo "FD_RDMA_PORT=${FD_RDMA_PORT}"
 echo "DEVICES=${DEVICES}"
 echo "========================================================="
 
@@ -149,9 +151,15 @@ jobs:
 docker rm -f ${runner_name} || true
 fi
 
+export RDMA_DEVICES=$(find /dev/infiniband/uverbs* -maxdepth 1 -not -type d | xargs -I{} echo '--device {}:{}')
+
 docker run --rm --net=host \
   --name ${runner_name} \
-  --cap-add=SYS_PTRACE --shm-size=64G \
+  --cap-add=SYS_PTRACE --cap-add=IPC_LOCK \
+  --shm-size=64G \
+  ${RDMA_DEVICES} \
+  --device=/dev/infiniband/rdma_cm \
+  --ulimit memlock=-1:-1 \
   -v $(pwd):/workspace -w /workspace \
   -v "${CACHE_DIR}/gitconfig:/etc/gitconfig:ro" \
   -v "${CACHE_DIR}/.cache:/root/.cache" \
@@ -165,6 +173,8 @@ jobs:
   -e "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}" \
   -e "FD_ROUTER_PORT=${FD_ROUTER_PORT}" \
   -e "FD_CONNECTOR_PORT=${FD_CONNECTOR_PORT}" \
+  -e "FD_RDMA_PORT=${FD_RDMA_PORT}" \
+  -e "CLEAN_CUDA=1" \
   -e TZ="Asia/Shanghai" \
   -e "fd_wheel_url=${fd_wheel_url}" \
   -e "BASE_REF=${BASE_REF}" \
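
The two mechanisms this workflow change relies on, per-runner port offsets and one `--device` flag per RDMA device node, can be sketched in Python as follows. This is an illustration only, not code from the repository: `derive_ports` and `rdma_device_flags` are hypothetical names, the base ports are the four visible in the hunk, and the device paths in the example are made up.

```python
from typing import Dict, List

# Base ports as they appear in the workflow; each runner shifts them by
# DEVICE_PORT * 100 so that runners sharing a host do not collide.
BASE_PORTS: Dict[str, int] = {
    "FD_CACHE_QUEUE_PORT": 8098,
    "FD_ROUTER_PORT": 8048,
    "FD_CONNECTOR_PORT": 8038,
    "FD_RDMA_PORT": 8028,  # added by this commit
}


def derive_ports(device_port: int) -> Dict[str, int]:
    """Mirror the shell arithmetic $((BASE + DEVICE_PORT * 100))."""
    return {name: base + device_port * 100 for name, base in BASE_PORTS.items()}


def rdma_device_flags(device_paths: List[str]) -> List[str]:
    """Mirror `xargs -I{} echo '--device {}:{}'`: one flag per device node."""
    return [f"--device {path}:{path}" for path in device_paths]


if __name__ == "__main__":
    print(derive_ports(1)["FD_RDMA_PORT"])  # 8128
    # Hypothetical device nodes; the workflow discovers them with `find`.
    print(" ".join(rdma_device_flags(["/dev/infiniband/uverbs0",
                                      "/dev/infiniband/uverbs1"])))
```

Each device is mapped to the same path inside the container, which is what the `{}:{}` pattern in the `xargs` call produces.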

.github/workflows/ci_xpu.yml

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ jobs:
 
       - name: Code Checkout
         env:
-          docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.2.0
+          docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:ci
         run: |
           REPO="https://github.com/${{ github.repository }}.git"
           FULL_REPO="${{ github.repository }}"
@@ -59,7 +59,7 @@ jobs:
 
       - name: Run CI unittest
         env:
-          docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.2.0
+          docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:ci
         run: |
           runner_name="${{ runner.name }}"
           last_char="${runner_name: -1}"

benchmarks/README.md

Lines changed: 30 additions & 3 deletions
@@ -58,7 +58,7 @@ python benchmark_serving.py \
   --port 9812 \
   --dataset-name EBChat \
   --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
-  --hyperparameter-path yaml/request_yaml/eb45t-32k.yaml \
+  --hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
   --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
   --metric-percentiles 80,95,99,99.9,99.95,99.99 \
   --num-prompts 1 \
@@ -78,7 +78,7 @@ python benchmark_serving.py \
   --port 9812 \
   --dataset-name EBChat \
   --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
-  --hyperparameter-path yaml/request_yaml/eb45t-32k.yaml \
+  --hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
   --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
   --metric-percentiles 80,95,99,99.9,99.95,99.99 \
   --num-prompts 2000 \
@@ -100,7 +100,7 @@ python benchmark_serving.py \
   --port 9812 \
   --dataset-name EBChat \
   --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
-  --hyperparameter-path yaml/request_yaml/eb45t-32k.yaml \
+  --hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
   --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
   --metric-percentiles 80,95,99,99.9,99.95,99.99 \
   --num-prompts 2000 \
@@ -135,3 +135,30 @@ python benchmarks/benchmark_mtp.py \
 --dataset-name: dataset class; set to "EBChat" to read datasets dumped in FD format
 --dataset-path: path to the test dataset
 ```
+
+### Testing with random text-only inputs of a specified input/output length
+
+Relevant parameters:
+- --dataset-name: dataset class; set to "random" to generate random text-only inputs
+- --random-input-len: random input length, measured in English words; default 200
+- --random-output-len: random output length; default 1024
+- --random-range-ratio: ratio by which input and output lengths may vary, giving the range [length * (1 - range_ratio), length * (1 + range_ratio)]; default 0.1
+
+#### Usage:
+```bash
+python benchmark_serving.py \
+  --backend openai-chat \
+  --model EB45T \
+  --endpoint /v1/chat/completions \
+  --host 0.0.0.0 \
+  --port 9812 \
+  --dataset-name random \
+  --random-input-len 200 \
+  --random-output-len 1024 \
+  --random-range-ratio 0.1 \
+  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
+  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
+  --num-prompts 2000 \
+  --max-concurrency 100 \
+  --save-result > infer_log.txt 2>&1 &
+```
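
The README addition states only the interval that `--random-range-ratio` defines, not how lengths are drawn from it. The sketch below works through the interval arithmetic for the documented defaults and, as an assumption, samples an integer uniformly inside it. The function names are hypothetical and the sampling strategy is not confirmed by the diff.

```python
import random
from typing import Tuple


def length_bounds(length: int, range_ratio: float) -> Tuple[int, int]:
    """Integer bounds of [length * (1 - r), length * (1 + r)], truncated."""
    low = int(length * (1 - range_ratio))
    high = int(length * (1 + range_ratio))
    return low, high


def sample_length(length: int, range_ratio: float, rng: random.Random) -> int:
    """ASSUMPTION: uniform integer sampling; the real sampler may differ."""
    low, high = length_bounds(length, range_ratio)
    return rng.randint(low, high)  # randint includes both endpoints


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(length_bounds(200, 0.1))    # input-length default
    print(length_bounds(1024, 0.1))   # output-length default
    print(sample_length(200, 0.1, rng))
```

With the defaults, the input length falls between 180 and 220 words and the output length between 921 and 1126 tokens.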

benchmarks/backend_request_func.py

Lines changed: 8 additions & 0 deletions
@@ -52,6 +52,7 @@ class RequestFuncInput:
     language: Optional[str] = None
     debug: bool = False
     response_format: Optional[dict] = None
+    random_flag: bool = False
 
 
 @dataclass
@@ -103,6 +104,13 @@ async def async_request_eb_openai_chat_completions(
     # Hyperparameters are passed in via YAML
     payload.update(request_func_input.hyper_parameters)
 
+    # Random-input switch
+    if request_func_input.random_flag:
+        payload["max_tokens"] = request_func_input.output_len
+        metadata = payload.get("metadata", {})
+        metadata["min_tokens"] = request_func_input.output_len
+        payload["metadata"] = metadata
+
     if request_func_input.ignore_eos:
         payload["ignore_eos"] = request_func_input.ignore_eos
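
The effect of the `random_flag` branch on the request payload can be checked in isolation. The following is a minimal sketch that lifts the added lines into a standalone helper; the name `apply_random_output_len` is hypothetical and the sample payload is invented for illustration. Because the branch runs after `payload.update(...)`, it overrides any `max_tokens` supplied through the YAML hyperparameters and leaves other `metadata` keys untouched. Setting `max_tokens` and `metadata.min_tokens` to the same value presumably pins the response length, provided the server honours `min_tokens`.

```python
from typing import Any, Dict


def apply_random_output_len(payload: Dict[str, Any], output_len: int) -> Dict[str, Any]:
    """Same steps as the random_flag branch, applied to a plain dict."""
    payload["max_tokens"] = output_len
    # Reuse an existing metadata dict if present so its other keys survive.
    metadata = payload.get("metadata", {})
    metadata["min_tokens"] = output_len
    payload["metadata"] = metadata
    return payload


if __name__ == "__main__":
    # Hypothetical payload as it might look after the YAML hyperparameters
    # have been merged in; "trace_id" is an invented metadata key.
    payload = {"max_tokens": 32768, "metadata": {"trace_id": "abc"}}
    print(apply_random_output_len(payload, 1024))
```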
