
Commit 40d3326

Authored by SageMoore, gemini-code-assist[bot], and yewentao256
[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Sage Moore <sagemoore@utexas.edu>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
1 parent 9c84ca8 commit 40d3326

File tree

1 file changed: +10 −5 lines

vllm/model_executor/layers/fused_moe/shared_fused_moe.py

Lines changed: 10 additions & 5 deletions
```diff
@@ -28,13 +28,18 @@ def __init__(
         super().__init__(**kwargs)
         self._shared_experts = shared_experts

-        # Disable shared expert overlap if we are not using
-        # flashinfer + DP since there is nothing to be gained in this case.
-        # Disabling the overlap optimization also prevents the shared experts
-        # from being hidden from torch.compile.
+        # Disable shared expert overlap if we are using eplb, because of
+        # correctness issues, or if using flashinfer with DP, since there
+        # is nothing to be gained in this case. Disabling the overlap
+        # optimization also prevents the shared experts from being hidden
+        # from torch.compile.
         self.use_overlapped = (
             use_overlapped
-            and not (self.use_flashinfer_cutlass_kernels and self.dp_size > 1)
+            and not (
+                # TODO(wentao): find the root cause and remove this condition
+                self.enable_eplb
+                or (self.use_flashinfer_cutlass_kernels and self.dp_size > 1)
+            )
             and self._shared_experts is not None
         )
```
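The gating logic above can be illustrated with a minimal sketch. Note this is a hypothetical stand-in class, not the real vLLM `SharedFusedMoE` layer: only the attribute names (`use_overlapped`, `enable_eplb`, `use_flashinfer_cutlass_kernels`, `dp_size`, `_shared_experts`) come from the diff; the constructor, defaults, and class name are assumptions made for illustration.

```python
class SharedFusedMoEStub:
    """Hypothetical stand-in reproducing only the overlap-gating condition."""

    def __init__(
        self,
        shared_experts,
        use_overlapped=True,
        enable_eplb=False,
        use_flashinfer_cutlass_kernels=False,
        dp_size=1,
    ):
        self._shared_experts = shared_experts
        self.enable_eplb = enable_eplb
        self.use_flashinfer_cutlass_kernels = use_flashinfer_cutlass_kernels
        self.dp_size = dp_size
        # Overlap is disabled when EPLB is enabled (correctness workaround,
        # per the TODO in the diff) or when flashinfer cutlass kernels are
        # combined with data parallelism (no gain in that case).
        self.use_overlapped = (
            use_overlapped
            and not (
                self.enable_eplb
                or (self.use_flashinfer_cutlass_kernels and self.dp_size > 1)
            )
            and self._shared_experts is not None
        )


# EPLB enabled: overlap is forced off even though it was requested.
assert SharedFusedMoEStub(object(), enable_eplb=True).use_overlapped is False
# Flashinfer cutlass kernels + DP > 1: overlap is also forced off.
assert SharedFusedMoEStub(
    object(), use_flashinfer_cutlass_kernels=True, dp_size=2
).use_overlapped is False
# Default path with shared experts present: overlap stays enabled.
assert SharedFusedMoEStub(object()).use_overlapped is True
```

The fix simply widens the existing `not (...)` guard: before the patch only the flashinfer-with-DP case disabled overlap; now `enable_eplb` short-circuits it too.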

0 commit comments