[Speculative Decoding] fix pd-split metrics and support other model runner #6995
base: develop
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -141,6 +141,7 @@ def __init__(` |
| | | `  self.attn_backends: list[AttentionBackend] = []` |
| | | `  self._initialize_attn_backend()` |
| | | `+ self.eb5_runner = bool(int(os.getenv("EB5_ENABLE_FD_RUNNER", "0")))` |
| | | `  # Forward meta store the global meta information of the forward` |
| | | `  self.forward_meta = None` |
| | | `@@ -503,7 +504,7 @@ def insert_tasks_v1(` |
| | | `  self.model_inputs["step_idx"][idx : idx + 1] = (` |
| | | `      len(request.output_token_ids) if prefill_end_index >= len(input_ids) else 0` |
| | | `  )` |
| | | `- if self.enable_mm:` |
| | | `+ if self.enable_mm and not self.eb5_runner:` |
| | | `      inputs = request.multimodal_inputs` |
| | | `      self.model_inputs["attn_mask_offsets_full"][idx][0 : prefill_end_index - prefill_start_index] = (` |
| | | `          paddle.to_tensor(` |
| | | `@@ -885,7 +886,7 @@ def _propose_cuda(self, step_use_cudagraph: bool = False, is_dummy_run: bool = F` |
| | | `      self.model_inputs["seq_lens_decoder"],` |
| | | `  )` |
| | | `- if self.enable_mm:` |
| | | `+ if self.enable_mm and not self.eb5_runner:` |
| | | `      attn_mask_offsets = update_attn_mask_offsets(` |
| | | `          ids_remove_padding,` |
| | | `          getattr(` |
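For reviewers, here is a minimal standalone sketch of how the new flag appears to behave, based only on the expressions visible in the diff. The helper names `read_eb5_flag` and `runs_mm_mask_path` are hypothetical and do not exist in the repository; only the environment variable name and the `bool(int(...))` parsing come from the change itself.

```python
import os

ENV_NAME = "EB5_ENABLE_FD_RUNNER"  # variable name taken from the diff


def read_eb5_flag(environ=os.environ) -> bool:
    """Hypothetical helper mirroring bool(int(os.getenv(ENV_NAME, "0")))."""
    return bool(int(environ.get(ENV_NAME, "0")))


def runs_mm_mask_path(enable_mm: bool, eb5_runner: bool) -> bool:
    """Hypothetical helper mirroring `self.enable_mm and not self.eb5_runner`."""
    return enable_mm and not eb5_runner


print(read_eb5_flag({}))                 # unset falls back to "0" -> False
print(read_eb5_flag({ENV_NAME: "1"}))    # True
print(read_eb5_flag({ENV_NAME: "2"}))    # True, any nonzero integer enables it
print(runs_mm_mask_path(True, True))     # False, the multimodal mask path is skipped

try:
    read_eb5_flag({ENV_NAME: "true"})
except ValueError:
    # int("true") fails, so non-integer values raise instead of enabling the flag
    print("non-integer values raise ValueError")
```

One consequence of this parsing style is that values such as `true` or an empty string raise `ValueError` at construction time, so the variable should be set to an integer string.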
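The PR description still consists mostly of the unfilled template. It does not state the motivation for this change, the specific modifications, the usage or regression commands, or (if output is affected) accuracy verification results. To make review and later maintenance easier, please fill in at least the Motivation / Modifications / Usage (or Command) / Accuracy Tests sections of the template (if there are no tests, please explain why).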