
Commit 3fecdae

Author: t00939662 (committed)

[Feat] correct the docs

1 parent 79af398 · commit 3fecdae

File tree

4 files changed (+10 -13 lines changed)


docs/source/getting-started/quick_start.md

Lines changed: 5 additions & 3 deletions
````diff
@@ -21,14 +21,16 @@ Before you start with UCM, please make sure that you have installed UCM correctl
 
 ## Features Overview
 
-UCM supports two key features: **Prefix Cache** and **Sparse-attention**.
+UCM supports two key features: **Prefix Cache** and **Sparse attention**.
 
 Each feature supports both **Offline Inference** and **Online API** modes.
 
 For quick start, just follow the [usage](./quick_start.md) guide below to launch your own inference experience;
 
-For further research, click on the links blow to see more details of each feature:
+For further research on Prefix Cache, more details are available via the link below:
 - [Prefix Cache](../user-guide/prefix-cache/index.md)
+
+Various Sparse Attention features are now available, try GSA Sparsity via the link below:
 - [GSA Sparsity](../user-guide/sparse-attention/gsa.md)
 
 ## Usage
@@ -47,7 +49,7 @@ python offline_inference.py
 
 </details>
 
-<details>
+<details open>
 <summary><b>OpenAI-Compatible Online API</b></summary>
 
 For online inference , vLLM with our connector can also be deployed as a server that implements the OpenAI API protocol.
````
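Once the server is up, the OpenAI-compatible endpoint can be exercised directly. A minimal sketch, assuming vLLM's default port 8000 and the model path used in the launch commands elsewhere in these docs:

```bash
# Query the OpenAI-compatible completions endpoint
# (assumes vLLM's default port 8000; model path and prompt are illustrative).
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/home/models/Qwen2.5-7B-Instruct",
    "prompt": "Hello, my name is",
    "max_tokens": 32
  }'
```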

docs/source/user-guide/pd-disaggregation/1p1d.md

Lines changed: 2 additions & 8 deletions
````diff
@@ -8,16 +8,13 @@ This example demonstrates how to run unified-cache-management with disaggregated
 - Hardware: At least 2 GPUs or 2 NPUs
 
 ## Start disaggregated service
-For illustration purposes, let us assume that the model used is Qwen2.5-7B-Instruct.
+For illustration purposes, let us take GPU as an example and assume the model used is Qwen2.5-7B-Instruct. Use ASCEND_RT_VISIBLE_DEVICES instead of CUDA_VISIBLE_DEVICES to specify visible devices when starting the service on the Ascend platform.
 
 ### Run prefill server
 Prefiller Launch Command:
 ```bash
 export PYTHONHASHSEED=123456
-# For GPU devices, use the following command:
 export CUDA_VISIBLE_DEVICES=0
-# For NPU devices, use the following command:
-export ASCEND_RT_VISIVLE_DEVEICES=0
 vllm serve /home/models/Qwen2.5-7B-Instruct \
 --max-model-len 20000 \
 --tensor-parallel-size 1 \
@@ -45,11 +42,8 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
 ### Run decode server
 Decoder Launch Command:
 ```bash
-export PYTHONHASHSEED=123456
-# For GPU devices, use the following command:
+export PYTHONHASHSEED=123456
 export CUDA_VISIBLE_DEVICES=0
-# For NPU devices, use the following command:
-export ASCEND_RT_VISIVLE_DEVEICES=0
 vllm serve /home/models/Qwen2.5-7B-Instruct \
 --max-model-len 20000 \
 --tensor-parallel-size 1 \
````
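Per the note added above, Ascend deployments swap only the device-visibility variable. A minimal sketch of the prefill launch on NPU, assuming the remaining flags match the GPU command:

```bash
# Ascend/NPU variant of the prefill launch; a sketch, with the remaining
# flags identical to the GPU command above (truncated here as in the diff).
export PYTHONHASHSEED=123456
export ASCEND_RT_VISIBLE_DEVICES=0   # replaces CUDA_VISIBLE_DEVICES on Ascend
vllm serve /home/models/Qwen2.5-7B-Instruct \
  --max-model-len 20000 \
  --tensor-parallel-size 1
```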

docs/source/user-guide/pd-disaggregation/npgd.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -78,7 +78,7 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
 ### Run proxy server
 Make sure prefill nodes and decode nodes can connect to each other.
 ```bash
-cd vllm-workspace/unified-cache-management/ucm/pd
+cd /vllm-workspace/unified-cache-management/ucm/pd
 python3 toy_proxy_server.py --host localhost --port 7802 --prefiller-host <prefill-node-ip> --prefiller-port 7800 --decoder-host <decode-node-ip> --decoder-port 7801
 ```
 
````
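With both servers registered, the proxy behaves like a single endpoint. A minimal sketch of a request through it, assuming toy_proxy_server.py forwards standard OpenAI-style completion requests on port 7802:

```bash
# Completion request routed through the proxy on port 7802
# (assumes the proxy forwards OpenAI-style requests; prompt is illustrative).
curl http://localhost:7802/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/home/models/Qwen2.5-7B-Instruct",
    "prompt": "The capital of France is",
    "max_tokens": 16
  }'
```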

docs/source/user-guide/pd-disaggregation/xpyd.md

Lines changed: 2 additions & 1 deletion
````diff
@@ -8,7 +8,8 @@ This example demonstrates how to run unified-cache-management with disaggregated
 - Hardware: At least 4 GPUs (At least 2 GPUs for prefiller + 2 for decoder in 2d2p setup or 2 NPUs for prefiller + 2 for decoder in 2d2p setup)
 
 ## Start disaggregated service
-For illustration purposes, let us take a GPU as an example and assume the model used is Qwen2.5-7B-Instruct.
+For illustration purposes, let us take GPU as an example and assume the model used is Qwen2.5-7B-Instruct. Use ASCEND_RT_VISIBLE_DEVICES instead of CUDA_VISIBLE_DEVICES to specify visible devices when starting the service on the Ascend platform.
+
 ### Run prefill servers
 Prefiller1 Launch Command:
 ```bash
````
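In a 2d2p layout, each of the four servers pins its own device. A minimal sketch of the first prefiller's launch, assuming device index 0 and port 7800 (the prefiller port used in the proxy example); the other three servers would follow the same pattern with their own indices and ports:

```bash
# Prefiller1 in a 2d2p setup (sketch; device index and port are assumptions,
# with prefiller2 and the decoders on devices 1, 2, 3 in their own shells).
export PYTHONHASHSEED=123456
export CUDA_VISIBLE_DEVICES=0
vllm serve /home/models/Qwen2.5-7B-Instruct \
  --max-model-len 20000 \
  --tensor-parallel-size 1 \
  --port 7800
```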
