|
| 1 | +# Disaggregated Encoder |
| 2 | + |
| 3 | +These example scripts demonstrate the disaggregated encoder (EPD) features of vLLM. |
| 4 | + |
| 5 | +For a detailed explanation of the EPD features, please refer to the [Disaggregated Encoder Feature Documentation](../../../docs/features/disagg_encoder.md). |
| 6 | + |
| 7 | +## Files |
| 8 | + |
| 9 | +- `disagg_epd_proxy.py` - Proxy script that demonstrates the XeYpZd setup (X encode instances, Y prefill instances, Z decode instances). Currently stable for the 1e1p1d configuration. |
| 10 | + |
| 11 | +- `disagg_1e1p1d_example.sh` - Sets up the 1e1p1d configuration, runs the VisionArena benchmark, and processes a single request with a local image. |
| 12 | + |
| 13 | +- `disagg_1e1pd_example.sh` - Sets up the 1e1pd configuration, runs the VisionArena benchmark, and processes a single request with a local image. |
| 14 | + |
| 15 | +### Custom Configuration |
| 16 | + |
| 17 | +```bash |
| 18 | +# Use specific GPUs |
| 19 | +GPU_E=0 GPU_PD=1 GPU_P=1 GPU_D=2 bash disagg_1e1p1d_example.sh |
| 20 | + |
| 21 | +# Use specific ports |
| 22 | +ENDPOINT_PORT=10001 bash disagg_1e1p1d_example.sh |
| 23 | + |
| 24 | +# Use specific model |
| 25 | +MODEL="Qwen/Qwen2.5-VL-3B-Instruct" bash disagg_1e1p1d_example.sh |
| 26 | + |
| 27 | +# Use specific storage path |
| 28 | +EC_SHARED_STORAGE_PATH="/tmp/my_ec_cache" bash disagg_1e1p1d_example.sh |
| 29 | +``` |
| 30 | + |
| 31 | +## Encoder Instances |
| 32 | + |
| 33 | +Encoder engines should be launched with the following flags: |
| 34 | + |
| 35 | +- `--enforce-eager` **(required)** – The current EPD implementation is only compatible with encoder instances running in this mode. |
| 36 | + |
| 37 | +- `--no-enable-prefix-caching` **(required)** – Encoder instances do not consume KV cache; prefix caching is disabled to avoid conflicts with other features. |
| 38 | + |
| 39 | +- `--max-num-batched-tokens=<large value>` **(default: 2048)** – This flag controls the token scheduling budget per decoding step and is irrelevant to encoder-only instances. **Set it to a very high value (effectively unlimited) to bypass scheduler limitations.** The actual token budget is managed by the encoder cache manager. |
| 40 | + |
| 41 | +## Local media inputs |
| 42 | + |
| 43 | +To support local image inputs (from your `MEDIA_PATH` directory), add the following flag to the encoder instance: |
| 44 | + |
| 45 | +```bash |
| 46 | +--allowed-local-media-path $MEDIA_PATH |
| 47 | +``` |
| 48 | + |
| 49 | +The vLLM instances and the proxy (`disagg_epd_proxy.py`) support local URIs of the form `{"url": "file://'"$MEDIA_PATH_FILENAME"'"}` as multimodal inputs. The proxy passes each URI unchanged to the encoder instance so that the encoder can load the media locally. |
| 50 | + |
| 51 | +## EC connector and KV transfer |
| 52 | + |
| 53 | +The `ECSharedStorageConnector` is used to store the encoder cache on local disk and facilitate transfer. To enable the encoder disaggregation feature, add the following configuration: |
| 54 | + |
| 55 | +```bash |
| 56 | +# Add to encoder instance: |
| 57 | +--ec-transfer-config '{ |
| 58 | + "ec_connector": "ECSharedStorageConnector", |
| 59 | + "ec_role": "ec_producer", |
| 60 | + "ec_connector_extra_config": { |
| 61 | + "shared_storage_path": "'"$EC_SHARED_STORAGE_PATH"'" |
| 62 | + } |
| 63 | +}' |
| 64 | + |
| 65 | +# Add to prefill/prefill+decode instance: |
| 66 | +--ec-transfer-config '{ |
| 67 | + "ec_connector": "ECSharedStorageConnector", |
| 68 | + "ec_role": "ec_consumer", |
| 69 | + "ec_connector_extra_config": { |
| 70 | + "shared_storage_path": "'"$EC_SHARED_STORAGE_PATH"'" |
| 71 | + } |
| 72 | +}' |
| 73 | +``` |
| 74 | + |
| 75 | +`$EC_SHARED_STORAGE_PATH` is the path where the EC connector temporarily stores the cache. |
| 76 | + |
| 77 | +If you enable prefill instances (that is, `--prefill-servers-urls` is not disabled), you also need `--kv-transfer-config` to enable PD disaggregation. Currently, the `NixlConnector` is used for this purpose. Refer to `tests/v1/kv_connector/nixl_integration` for more examples of PD disaggregation with Nixl. |
| 78 | + |
| 79 | +```bash |
| 80 | +# Add to prefill instance: |
| 81 | +--kv-transfer-config '{ |
| 82 | + "kv_connector": "NixlConnector", |
| 83 | + "kv_role": "kv_producer" |
| 84 | +}' |
| 85 | + |
| 86 | +# Add to decode instance: |
| 87 | +--kv-transfer-config '{ |
| 88 | + "kv_connector": "NixlConnector", |
| 89 | + "kv_role": "kv_consumer" |
| 90 | +}' |
| 91 | +``` |
| 92 | + |
| 93 | +## Proxy Instance Flags (`disagg_epd_proxy.py`) |
| 94 | + |
| 95 | +| Flag | Description | |
| 96 | +|------|-------------| |
| 97 | +| `--encode-servers-urls` | Comma-separated list of encoder endpoints. Every multimodal item extracted from the request is fanned out to one of these URLs in a round-robin fashion. | |
| 98 | +| `--prefill-servers-urls` | Comma-separated list of prefill endpoints. Set to `disable`, `none`, or `""` to skip the dedicated prefill phase and run E+PD (encoder + combined prefill/decode). | |
| 99 | +| `--decode-servers-urls` | Comma-separated list of decode endpoints. Non-stream and stream paths both round-robin over this list. | |
| 100 | +| `--host`, `--port` | Bind address for the proxy itself (defaults: `0.0.0.0:8000`). | |
| 101 | + |
| 102 | +Example usage: |
| 103 | +For E + PD setup: |
| 104 | + |
| 105 | +```bash |
| 106 | +$ python disagg_epd_proxy.py \ |
| 107 | + --encode-servers-urls "http://e1:8001,http://e2:8002" \ |
| 108 | + --prefill-servers-urls "disable" \ |
| 109 | + --decode-servers-urls "http://pd1:8003,http://pd2:8004" |
| 110 | +``` |
| 111 | + |
| 112 | +For E + P + D setup: |
| 113 | + |
| 114 | +```bash |
| 115 | +$ python disagg_epd_proxy.py \ |
| 116 | + --encode-servers-urls "http://e1:8001,http://e2:8001" \ |
| 117 | + --prefill-servers-urls "http://p1:8003,http://p2:8004" \ |
| 118 | + --decode-servers-urls "http://d1:8005,http://d2:8006" |
| 119 | +``` |
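
The flag table above says that the server lists are comma-separated, that `disable`, `none`, or `""` turns off the dedicated prefill phase, and that multimodal items are distributed round-robin over the encoder URLs. The sketch below illustrates that behavior under stated assumptions: `parse_urls` and `assign_round_robin` are hypothetical helper names, not functions taken from `disagg_epd_proxy.py`, and the real proxy may keep its round-robin position across requests rather than restarting for each call.

```python
from itertools import cycle


def parse_urls(value):
    """Split a comma-separated URL list (hypothetical helper).

    The values "disable", "none", and "" are treated as "no servers",
    mirroring the description of --prefill-servers-urls.
    """
    if value.strip().lower() in ("disable", "none", ""):
        return []
    return [url.strip() for url in value.split(",") if url.strip()]


def assign_round_robin(items, urls):
    """Pair each item with a URL, cycling through the URL list in order."""
    if not urls:
        raise ValueError("at least one server URL is required")
    pool = cycle(urls)
    return [(item, next(pool)) for item in items]


if __name__ == "__main__":
    encoders = parse_urls("http://e1:8001,http://e2:8002")
    for item, url in assign_round_robin(["image_0", "image_1", "image_2"], encoders):
        print(item, "->", url)
```

With two encoder URLs and three items, the third item wraps around to the first encoder. An empty prefill list returned by `parse_urls("disable")` corresponds to the E + PD setup.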