Commit cf22698

add new feature for EC-RAG (#2336)
Signed-off-by: Yongbozzz <yongbo.zhu@intel.com>
1 parent f2ecfb6 commit cf22698

File tree

138 files changed: +7657, -2188 lines


.github/code_spell_ignore.txt

Lines changed: 2 additions & 1 deletion

@@ -1,4 +1,5 @@
 ModelIn
 modelin
 pressEnter
-PromptIn
+PromptIn
+OT

EdgeCraftRAG/Dockerfile.server

File mode changed: 100755 → 100644
Lines changed: 2 additions & 0 deletions

@@ -35,4 +35,6 @@ WORKDIR /home/user/
 RUN git clone https://github.com/openvinotoolkit/openvino.genai.git genai
 ENV PYTHONPATH="$PYTHONPATH:/home/user/genai/tools/llm_bench"

+RUN python3 -m nltk.downloader -d /home/user/nltk_data punkt_tab averaged_perceptron_tagger_eng
+
 ENTRYPOINT ["python3", "-m", "edgecraftrag.server"]
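For context, punkt_tab and averaged_perceptron_tagger_eng are the resource names recent NLTK releases (3.9+) load for their default tokenizer and POS tagger; pre-downloading them at build time avoids a network fetch at runtime. A minimal sketch (not from this commit) of how server code could consume the baked-in data:

```python
# Minimal sketch, not part of the commit: use the NLTK data baked into the
# image, matching the -d path passed to nltk.downloader in the Dockerfile.
import nltk

nltk.data.path.append("/home/user/nltk_data")

tokens = nltk.word_tokenize("EdgeCraftRAG splits documents into sentences.")  # loads punkt_tab
tags = nltk.pos_tag(tokens)  # loads averaged_perceptron_tagger_eng
print(tags)
```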

EdgeCraftRAG/README.md

File mode changed: 100755 → 100644
Lines changed: 4 additions & 3 deletions

@@ -7,9 +7,10 @@ quality and performance.

 ## What's New

-1. Support Intel Arc B60 for model inference
-2. support KBadmin for knowledge base management
-3. support Experience Injection module in UI
+1. Support Agent component and enable deep_search agent
+2. Optimize pipeline execution performance with asynchronous api
+3. Support session list display in UI
+4. Support vllm-based embedding service

 ## Table of contents

EdgeCraftRAG/chatqna.py

File mode changed: 100755 → 100644
Lines changed: 1 addition & 1 deletion

@@ -44,7 +44,7 @@ async def handle_request(self, request: Request):
         input = await request.json()
         stream_opt = input.get("stream", False)
         input["user"] = request.headers.get("sessionid", None)
-        chat_request = ChatCompletionRequest.parse_obj(input)
+        chat_request = ChatCompletionRequest.construct(**input)
         parameters = LLMParams(
             max_tokens=chat_request.max_tokens if chat_request.max_tokens else 1024,
             top_k=chat_request.top_k if chat_request.top_k else 10,
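The swap from parse_obj to construct trades validation for speed: in pydantic v1 terms (the API surface ChatCompletionRequest appears to expose), parse_obj validates and coerces the payload and raises on bad fields, while construct builds the model without any validation. A small illustrative sketch with a hypothetical stand-in model:

```python
# Illustrative only: the pydantic v1 behavior difference behind this change.
from pydantic import BaseModel, ValidationError

class Req(BaseModel):  # hypothetical stand-in for ChatCompletionRequest
    max_tokens: int = 1024

try:
    Req.parse_obj({"max_tokens": "not-a-number"})  # validates, raises
except ValidationError as err:
    print("parse_obj rejected the payload:", err.errors()[0]["loc"])

req = Req.construct(**{"max_tokens": "not-a-number"})  # no validation performed
print("construct kept the raw value:", req.max_tokens)
```

The upside is lower per-request overhead; the trade-off is that malformed input now flows through unchecked, which the downstream defaults (e.g. the `if chat_request.max_tokens else 1024` guards) partially absorb.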

EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md

Lines changed: 8 additions & 14 deletions

@@ -14,7 +14,7 @@ This section describes how to quickly deploy and test the EdgeCraftRAG service m
 2. [Access the Code](#2-access-the-code)
 3. [Prepare models](#3-prepare-models)
 4. [Prepare env variables and configurations](#4-prepare-env-variables-and-configurations)
-5. [Deploy the Service on Arc A770 Using Docker Compose](#5-deploy-the-service-on-intel-gpu-using-docker-compose)
+5. [Deploy the Service on Arc GPU Using Docker Compose](#5-deploy-the-service-on-intel-gpu-using-docker-compose)
 6. [Access UI](#6-access-ui)
 7. [Cleanup the Deployment](#7-cleanup-the-deployment)

@@ -66,8 +66,6 @@ modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"

 ### 4. Prepare env variables and configurations

-Below steps are for single Intel Arc GPU inference, if you want to setup multi Intel Arc GPUs inference, please refer to [Multi-ARC Setup](../../../../docs/Advanced_Setup.md#multi-arc-setup)
-
 #### Prepare env variables for vLLM deployment

 ```bash
@@ -86,7 +84,9 @@ export NO_PROXY=${NO_PROXY},${HOST_IP},edgecraftrag,edgecraftrag-server
 # export HF_ENDPOINT=https://hf-mirror.com # your HF mirror endpoint"

 # Make sure all 3 folders have 1000:1000 permission, otherwise
-chown 1000:1000 ${MODEL_PATH} ${PWD} # the default value of DOC_PATH and TMPFILE_PATH is PWD ,so here we give permission to ${PWD}
+export DOC_PATH=${PWD}/tests
+export TMPFILE_PATH=${PWD}/tests
+chown 1000:1000 ${MODEL_PATH} ${DOC_PATH} ${TMPFILE_PATH}
 # In addition, also make sure the .cache folder has 1000:1000 permission, otherwise
 chown 1000:1000 -R $HOME/.cache
 ```
@@ -110,15 +110,10 @@ export MILVUS_ENABLED=0
 #### option a. Deploy the Service on Arc A770 Using Docker Compose

 ```bash
-export VLLM_SERVICE_PORT_0=8100 # You can set your own port for vllm service
-# Generate your nginx config file
-# nginx-conf-generator.sh requires 2 parameters: DP_NUM and output filepath
-bash nginx/nginx-conf-generator.sh 1 nginx/nginx.conf
-# set NGINX_CONFIG_PATH
-export NGINX_CONFIG_PATH="${PWD}/nginx/nginx.conf"
+export VLLM_SERVICE_PORT_A770=8086 # You can set your own port for vllm service

 # Launch EC-RAG service with compose
-docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml up -d
+docker compose --profile a770 -f docker_compose/intel/gpu/arc/compose.yaml up -d
 ```

 #### option b. Deploy the Service on Arc B60 Using Docker Compose
@@ -140,7 +135,7 @@ docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml up -d
 # export MAX_MODEL_LEN=49152
 # export BLOCK_SIZE=64
 # export QUANTIZATION=fp8
-docker compose -f docker_compose/intel/gpu/arc/compose_vllm_b60.yaml up -d
+docker compose --profile b60 -f docker_compose/intel/gpu/arc/compose.yaml up -d
 ```

 ### 6. Access UI
@@ -157,8 +152,7 @@ Below is the UI front page, for detailed operations on UI and EC-RAG settings, p
 To stop the containers associated with the deployment, execute the following command:

 ```
-docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml down
-# or docker compose -f docker_compose/intel/gpu/arc/compose_vllm_b60.yaml down
+docker compose -f docker_compose/intel/gpu/arc/compose.yaml down
 ```

 All the EdgeCraftRAG containers will be stopped and then removed on completion of the "down" command.
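Since the chown step above is easy to miss, a small hypothetical pre-flight check (not part of the commit) could confirm the three folders carry the 1000:1000 ownership the containers expect before running `docker compose up`:

```python
# Hypothetical helper, not from this commit: verify MODEL_PATH, DOC_PATH and
# TMPFILE_PATH are owned by uid/gid 1000, as required by the README above.
import os

for var in ("MODEL_PATH", "DOC_PATH", "TMPFILE_PATH"):
    path = os.environ.get(var, os.getcwd())  # the README derives defaults from $PWD
    st = os.stat(path)
    ok = (st.st_uid, st.st_gid) == (1000, 1000)
    print(f"{var}={path} uid={st.st_uid} gid={st.st_gid} -> "
          f"{'OK' if ok else 'needs: chown 1000:1000'}")
```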

EdgeCraftRAG/docker_compose/intel/gpu/arc/compose.yaml

File mode changed: 100755 → 100644
Lines changed: 99 additions & 3 deletions

@@ -1,9 +1,11 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0
+
 services:
   etcd:
     container_name: milvus-etcd
     image: quay.io/coreos/etcd:v3.5.5
+    restart: always
     environment:
       - ETCD_AUTO_COMPACTION_MODE=revision
       - ETCD_AUTO_COMPACTION_RETENTION=1000
@@ -22,6 +24,7 @@ services:
   minio:
     container_name: milvus-minio
     image: minio/minio:RELEASE.2023-03-20T20-16-18Z
+    restart: always
     environment:
       MINIO_ACCESS_KEY: minioadmin
       MINIO_SECRET_KEY: minioadmin
@@ -41,14 +44,15 @@ services:
   milvus-standalone:
     container_name: milvus-standalone
     image: milvusdb/milvus:v2.4.6
+    restart: always
     command: ["milvus", "run", "standalone"]
     security_opt:
       - seccomp:unconfined
     environment:
       ETCD_ENDPOINTS: etcd:2379
       MINIO_ADDRESS: minio:9000
     volumes:
-      - ./milvus.yaml:/milvus/configs/milvus.yaml
+      - ./milvus-config.yaml:/milvus/configs/milvus.yaml
       - ${DOCKER_VOLUME_DIRECTORY:-${PWD}}/volumes/milvus:/var/lib/milvus
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
@@ -71,10 +75,12 @@ services:
       no_proxy: ${no_proxy}
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
-      vLLM_ENDPOINT: ${vLLM_ENDPOINT:-http://${HOST_IP}:${NGINX_PORT:-8086}}
+      vLLM_ENDPOINT: ${vLLM_ENDPOINT:-http://${HOST_IP}:${VLLM_SERVICE_PORT_B60:-8086}}
+      LLM_MODEL: ${LLM_MODEL}
       ENABLE_BENCHMARK: ${ENABLE_BENCHMARK:-false}
-      MAX_MODEL_LEN: ${MAX_MODEL_LEN:-5000}
+      MAX_MODEL_LEN: ${MAX_MODEL_LEN:-49152}
       CHAT_HISTORY_ROUND: ${CHAT_HISTORY_ROUND:-0}
+      METADATA_DATABASE_URL: ${METADATA_DATABASE_URL:-""}
     volumes:
       - ${MODEL_PATH:-${PWD}}:/home/user/models
       - ${DOC_PATH:-${PWD}}:/home/user/docs
@@ -125,6 +131,96 @@ services:
     depends_on:
       - edgecraftrag-server
       - ecrag
+  llm-serving-xpu-b60:
+    container_name: ipex-serving-xpu-container
+    image: intel/llm-scaler-vllm:1.1-preview
+    privileged: true
+    restart: always
+    ports:
+      - ${VLLM_SERVICE_PORT_B60:-8086}:${VLLM_SERVICE_PORT_B60:-8086}
+    volumes:
+      - ${MODEL_PATH}:/workspace/vllm/models
+    devices:
+      - /dev/dri:/dev/dri
+    environment:
+      DTYPE: ${DTYPE:-float16}
+      VLLM_SERVICE_PORT_B60: ${VLLM_SERVICE_PORT_B60:-8086}
+      ZE_AFFINITY_MASK: ${ZE_AFFINITY_MASK:-0}
+      ENFORCE_EAGER: ${ENFORCE_EAGER:-1}
+      TRUST_REMOTE_CODE: ${TRUST_REMOTE_CODE:-1}
+      DISABLE_SLIDING_WINDOW: ${DISABLE_SLIDING_WINDOW:-1}
+      GPU_MEMORY_UTIL: ${GPU_MEMORY_UTIL:-0.8}
+      NO_ENABLE_PREFIX_CACHING: ${NO_ENABLE_PREFIX_CACHING:-1}
+      MAX_NUM_BATCHED_TOKENS: ${MAX_NUM_BATCHED_TOKENS:-8192}
+      DISABLE_LOG_REQUESTS: ${DISABLE_LOG_REQUESTS:-1}
+      MAX_MODEL_LEN: ${MAX_MODEL_LEN:-49152}
+      BLOCK_SIZE: ${BLOCK_SIZE:-64}
+      QUANTIZATION: ${QUANTIZATION:-fp8}
+      LLM_MODEL: ${LLM_MODEL}
+      TP: ${TP:-1}
+      DP: ${DP:-1}
+    entrypoint:
+      /bin/bash -c "
+      cd /workspace/vllm/models && source /opt/intel/oneapi/setvars.sh --force &&
+      VLLM_OFFLOAD_WEIGHTS_BEFORE_QUANT=1 \
+      TORCH_LLM_ALLREDUCE=1 \
+      VLLM_USE_V1=1 \
+      CCL_ZE_IPC_EXCHANGE=pidfd \
+      VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
+      VLLM_WORKER_MULTIPROC_METHOD=spawn \
+      python3 -m vllm.entrypoints.openai.api_server \
+      --model $${LLM_MODEL} \
+      --dtype $${DTYPE} \
+      --enforce-eager \
+      --port $${VLLM_SERVICE_PORT_B60} \
+      --trust-remote-code \
+      --disable-sliding-window \
+      --gpu-memory-util $${GPU_MEMORY_UTIL} \
+      --no-enable-prefix-caching \
+      --max-num-batched-tokens $${MAX_NUM_BATCHED_TOKENS} \
+      --disable-log-requests \
+      --max-model-len $${MAX_MODEL_LEN} \
+      --block-size $${BLOCK_SIZE} \
+      --quantization $${QUANTIZATION} \
+      -tp=$${TP} \
+      -dp=$${DP}"
+    profiles:
+      - b60
+  llm-serving-xpu-770:
+    container_name: ipex-llm-serving-xpu-770
+    image: intelanalytics/ipex-llm-serving-xpu:0.8.3-b20
+    privileged: true
+    restart: always
+    ports:
+      - ${VLLM_SERVICE_PORT_A770:-8086}:${VLLM_SERVICE_PORT_A770:-8086}
+    group_add:
+      - video
+      - ${VIDEOGROUPID:-44}
+      - ${RENDERGROUPID:-109}
+    volumes:
+      - ${LLM_MODEL_PATH:-${MODEL_PATH}/${LLM_MODEL}}:/llm/models
+    devices:
+      - /dev/dri
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      MODEL_PATH: "/llm/models"
+      SERVED_MODEL_NAME: ${LLM_MODEL}
+      TENSOR_PARALLEL_SIZE: ${TENSOR_PARALLEL_SIZE:-1}
+      MAX_NUM_SEQS: ${MAX_NUM_SEQS:-64}
+      MAX_NUM_BATCHED_TOKENS: ${MAX_NUM_BATCHED_TOKENS:-10240}
+      MAX_MODEL_LEN: ${MAX_MODEL_LEN:-10240}
+      LOAD_IN_LOW_BIT: ${LOAD_IN_LOW_BIT:-fp8}
+      CCL_DG2_USM: ${CCL_DG2_USM:-""}
+      PORT: ${VLLM_SERVICE_PORT_A770:-8086}
+      ZE_AFFINITY_MASK: ${SELECTED_XPU_0:-0}
+    shm_size: '32g'
+    entrypoint: /bin/bash -c "\
+      cd /llm && \
+      bash start-vllm-service.sh"
+    profiles:
+      - a770
 networks:
   default:
     driver: bridge
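Both new serving profiles expose vLLM's OpenAI-compatible API, on port 8086 by default. A hypothetical smoke test (not part of the commit) after `docker compose --profile a770 ... up -d` or `--profile b60`, assuming the default port and whatever model name you exported as LLM_MODEL:

```python
# Hypothetical smoke test: query the vLLM container started by either profile
# through its OpenAI-compatible endpoint (compose default port 8086).
import os
from openai import OpenAI  # pip install openai

host = os.environ.get("HOST_IP", "localhost")
client = OpenAI(base_url=f"http://{host}:8086/v1", api_key="unused")  # vLLM ignores the key by default

resp = client.chat.completions.create(
    model=os.environ["LLM_MODEL"],  # must match the served model name
    messages=[{"role": "user", "content": "Hello from EC-RAG"}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```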
