Fix moe acc #7988
Open
BingooYang wants to merge 143 commits into PaddlePaddle:develop from BingooYang:fix_moe_acc
Commits
b24765a Jiang-Jia-Jun: Update setup.py
55dbc83 liyonghua0910: [Cherry-Pick][BugFix] prevent requests from entering running state wi…
7ab48c4 EmmonsCurse: [Cherry-Pick][CI] Use GPU-Build-RL runner for _build_linux_rl.yml (#7…
36909bf huicongyao: [Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172) (#7192)
403ce13 Deleter-D: remove arctic_inference deps (#7236)
6b78981 EmmonsCurse: Split enable_mm (#7183) (#7233)
84d6271 EmmonsCurse: [Feature]distinguish whl version (#7204) (#7224)
0181884 BingooYang: support moe for sm103 (#7240)
9c65655 zoooo0820: [Cherry-Pick][RL] support moe-topk use topk_reduce_func #7218 (#7256)
5fd8020 xiaoxiaohehe001: [Cherry-Pick][BugFix] Fix batch_size derivation and relax shape check…
098dd2c EmmonsCurse: [XPU][CI] lock xvllm version for fix bug (#7264) (#7266)
849eb3d BingooYang: [Cherry-Pick][Optimization] merge matmul and add (#6986) (#7191)
6fcc25f plusNew001: Update ci_metax.yml (#7286)
921a0ae EmmonsCurse: [Docs] Update docs for release/2.5 (#7267) (#7277)
dea9d35 fxyfxy777: [OP]Unify MoE op with moe_permute path for bf16 GLM (#7164) (#7279)
dd0863b EmmonsCurse: [BugFix] Fix Async D2H copy bug & flash mash atten cache V out of bou…
4f36346 zhangbo9674: [Cherry-Pick] change rms norm for glm #7269 (#7276)
c756038 Deleter-D: [Cherry-Pick][FDConfig] Auto-scale CUDA Graph Capture & CLI Quantizat…
2ac9b89 EmmonsCurse: [XPU][CI]Update xtdk version in download_dependencies.sh (#7320) (#7322)
65c6e72 EmmonsCurse: [Cherry-Pick][Docs] Update Release Note(#7302) (#7341)
42b0f59 zoooo0820: [Cherry-Pick][RL] change glm rope_emb calculation #7316 (#7318)
7446665 ckl117: [Cherry-Pick][RL]moe bf16 ep support paddle batch_gemm(#7337) (#7339)
9e8ea7d EmmonsCurse: [Cherry-Pick][CI] Sync dev optimizations to 2.6(#7335) (#7343)
9cb82d7 liuruyan: [Cherry-Pick][TI-consistent] support quant use pow2scale(#7308) (#7310)
b2997f3 Sunny-bot1: fix overlap mtp empty run (#7314)
d9a008f rainyfly: [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 (#7159…
9823d63 zoooo0820: remove fa4 requirements (#7354)
144dc17 ckl117: update attn_mask_q 2 (#7373)
e7c8dc2 lonelygsh: [Speculate Decoding] Fix step_idx semantics in limit_thinking and set…
8a8beca EmmonsCurse: [BugFix][PD Disaggregation][KVCache] Fix low cache hit rate in PD spl…
f6c066f freeliuzc: Revert "[Optimization] Optimize ttft for prefill pd (#6680)" (#7386)
5f7524e Sunny-bot1: fix rl moe gate type (#7394)
2ee1cc3 ckl117: check init_flash_attn_version log (#7401)
61bfe6e BingooYang: modify flashmask version (#7414)
26674bb Deleter-D: [Cherry-Pick][RL] Add clear_graph_opt_backend for glm4_mtp (#7378) (#…
b8e8a62 juncaipeng: PD deployment support without router (#7412) (#7424)
72ce56b EmmonsCurse: [BugFix] fix tool call parser (#7369) (#7419)
185708b freeliuzc: [Cherry-Pick][BugFix] Fix real token exceeding max_batched_tokens lim…
650d1e4 Deleter-D: [Cherry-Pick][Speculative Decoding] Add MTP logprob support for PD di…
56b761d freeliuzc: [Cherry-Pick][Speculative Decoding][BugFix] Fix apply repeat times pe…
fc801f8 jackyYang6: [Bugfix][RL] fix control request timeout in async update weights pipe…
f4f7760 EmmonsCurse: [CI] Temporarily pin paddlepaddle-gpu to 3.5.0.dev20260417 (#7486) (#…
95261f0 xyxinyang: Unify num_experts_per_tok to moe_k in ModelConfig for MoE model compa…
74ddb20 gongshaotian: [RL][Cherry-Pick] Fix the out-of-bounds issue caused by int32 in the…
be2fd17 ckl117: add m_grouped_bf16_gemm_nn_contiguous(#7536)
13034ef EmmonsCurse: [BugFix] Fix skip_x_record_stream incompatibility across deep_ep vers…
d551846 juncaipeng: Mooncake storage register local buffer by chunk (#7416) (#7540)
86df2a9 Jiang-Jia-Jun: Update args_utils.py (#7549)
b0fde16 Jiang-Jia-Jun: Enable output caching by default
2961400 Deleter-D: [Cherry-Pick][BugFix] Fix clear_parameters hang issue in MTP during w…
9c91ecb qwes5s5: [Cherry-Pick][BugFix] Fix bugs in /v1/abort_requests interface from P…
3d6d3a2 EmmonsCurse: [DataProcessor] add completions (#7543) (#7558)
2c04dfd Jiang-Jia-Jun: Update args_utils.py
9ef8467 EmmonsCurse: [Scheduler][BugFix] Fix token_budget calculation to use actual decode…
258b22a EmmonsCurse: support deepgemm without bias input (#7559) (#7565)
b3aa469 zeroRains: [KSM] support keep sampling mask (#7460)
eb92613 Deleter-D: [Cherry-Pick][BugFix] Fix save_output_specualate parameter bugs in su…
10f5a20 juncaipeng: Cache queue support ipc (#7589)
af68b26 EmmonsCurse: [RL] Remove redundant barrier and optimize model weights signal broad…
4cbae62 EmmonsCurse: Use triton qk_norm both in Prefill and Decode (#7213) (#7306)
8d7063e EmmonsCurse: [Cherry-Pick][Optimization]Change default workers and max-concurrency…
0de0be4 EmmonsCurse: [Others] print evictable blocks in console log (#7384) (#7580)
d88982b EmmonsCurse: [Optimization] Support async D2H copy for MTP logprobs & Clean up ove…
5508979 juncaipeng: Fix PD interaction and error response (#7606)
c8a59a3 EmmonsCurse: [Cherry-Pick][CI] Sync dev optimizations to 2.6(#7602) (#7610)
6ad8fce gongshaotian: [RL][Feature] R3 Support GPUPrefixCache, CPUPrefixCache, PD Disaggreg…
e0cad0f huicongyao: [Cherry-Pick][Speculative Decoding][BugFix] overlap compute logprobs …
eee8289 ChowMingSing: [Bugfix]compile support SM100 (#7581) (#7629)
99444f6 EmmonsCurse: fix fp8 infer error (#7627) (#7631)
23e0a84 EmmonsCurse: [Cherry-Pick][CI] Pin Paddle to release/3.3 last_commit build in 2.6(…
5582b5a EmmonsCurse: [BugFix][Speculative Decoding] Fix tokens_per_seq min value calculati…
ecb31fb jackyYang6: [KVCache] Support flush FD GPU/CPU Cache index by AttentionStore (#7644)
37672f9 ApplEOFDiscord: support different AS interface for GPU and XPU (#7380) (#7647)
bfff3d9 jackyYang6: [Cherry-Pick][KVCache] Support environment variable overrides for Att…
188db35 gongshaotian: [RL] Correct the semantics of max_num_batched_tokens with multimodal …
0aa3e25 ckl117: [Cherry-Pick][RL] rl support mix_quant (#7645) (#7650)
32d5f5b juncaipeng: Refine metrics and trace for pd (#7613) (#7661)
3ac8ff2 juncaipeng: Remove recode info for request when finish sending cache (#7664)
f8e38f6 qwes5s5: abort requests fix2 (#7652)
97cda57 ShaneGZhu: [Cherry-Pick][Optimize]Compute slot_mapping and position_ids(#7313 #7…
75f328c Deleter-D: [Cherry-Pick][Optimization] Support logprob overlap in speculative de…
d3a2c71 kevincheng2: [Cherry-Pick][BugFix][KVCache] Fix inference slowdown when enabling C…
df1d64c juncaipeng: Fix key error for updating mtp model weights (#7676)
c1f9714 liyonghua0910: [Cherry-Pick] [BugFix] Fix get_tasks returns empty list and incorrect…
0ec9625 liyonghua0910: [Cherry-Pick] [BugFix] fix preempted token id not returned when a ful…
66dea60 liyonghua0910: [BugFix] Fix get_tasks returns empty list and incorrect nnode computa…
d0a0b3e Sunny-bot1: fix rl overlap (#7745)
d5af459 chang-wenbin: [Cherry-Pick] [BugFix] Fix stop token sequence pointer offset and act…
a5fa727 EmmonsCurse: [BugFix][KVCache][Speculative Decoding] Fix get_max_chunk_tokens for …
d92163f yuanlehome: [BugFix] Fix ZMQ multipart frame interleaving in Splitwise connector …
228987a liyonghua0910: [Cherry-Pick] [BugFix] [RL] Fix cpu cache for rl (#7764) (#7765)
ad431c7 gongshaotian: [RL] R3 Support Overlap Schedule (#7674)
53af5cc EmmonsCurse: [Cherry-Pick][CI] Remove checklist validation from CheckPRTemplate.py…
f8a0cf2 DesmonDay: [BugFix][KSM] Fix sampling_mask reordering in recover_batch_index_for…
7901aeb plusNew001: [XPU][CI] fix XPU CI bug (#7778)
a5191f2 SigureMo: [Cherry-Pick][Cleanup] Replace torch proxy alias with public compat A…
fae4a8b zeroRains: [BugFix] Fix KSM bug in MTP and Overlap (#7788)
ae5dac1 BingooYang: [Cherry-Pick][Optimization] enable trtllm_all_reduce fusion kernel in…
0077822 sunlei1024: [FDConfig] Enable by default FD_ENABLE_E2W_TENSOR_CONVERT and FD_ENGINE_TASK_QUEUE_W…
976cb7b EmmonsCurse: [BugFix] fix: cast image_mask.any() to bool for task queue serializat…
90c010d freeliuzc: [Cherry-Pick][Speculative Decoding] Support mtp super ultra overlap i…
4e7a46e juncaipeng: prepare request in prefill instance by multi threads (#7724)
d38eeb8 liyonghua0910: [Scheduler] [Optimization] Only preempt decode requests and better ma…
5e76c8b EmmonsCurse: fix(PrefixCache): fix garbled text in PD disaggregation by early retu…
33b22b3 BingooYang: [Cherry-pick] [Optimization] Elemenwise fusion (#6880) (#7683)
dc1fea1 liyonghua0910: [Cherry-Pick] [BugFix] Fix abort when enabling overlap schedule (#780…
478c9fa jackyYang6: [RL] pause: use abort pipeline with scheduling loop alive for gracefu…
d02f3ba xuanyuanminzheng: [Feature] Add TritonMoEMethod for BF16 MoE inference (#7815)
df637af qwes5s5: refact abort requests (#7808)
18cab83 zoooo0820: fix paddle optional get assert in sm103 (#7820)
72beb9e yongqiangma: opt moe_align_kernel (#7786)
04e4ae8 jackyYang6: [Cherry-Pick][BugFix] Fix pause drain hang caused by stale abort mark…
d71bdda EmmonsCurse: [Cherry-Pick][CI] Optimize clean_ports logic by removing redundant co…
514ed5c ShaneGZhu: [Cherry-Pick][Op][Optimization]Kernel fusion: cast+sigmoid+bias+noaux…
9894b32 Sunny-bot1: [Cherry-Pick][RL] Support cpu tensor broadcast(#7833) (#7840)
ab3c5f4 EmmonsCurse: [Cherry-Pick][CI] Set --workers=1 to avoid intermittent timeout failu…
41d44d6 qwes5s5: fix refact abort (#7838)
8c4f5a6 liuruyan: [Cherry-Pick] update fleet_ops(#7859) (#7858)
31b12ee Sunny-bot1: [Cherry-Pick][Optimization] Reduce logprob processing overhead by usi…
b5c8290 gongshaotian: [RL] Reset buffer size of `slot_mapping` (#7868)
b562b8d liuruyan: fix ce bug (#7874)
485f6c2 CSWYF3634076: [Cherry-Pick][Feature][Log]console metrics log for pd disaggregation …
e7815be Deleter-D: [Cherry-Pick][Benchmark] Add inner benchmark metrics component (#7881…
5d18984 kevincheng2: fix(kvcache): buffer early layer0 signals (#7896)
3ffeb44 EmmonsCurse: [Cherry-Pick][CI] Restore self-hosted runners for GitHub workflows(#7…
85399db plusNew001: [Cherry-pick][XPU][CI] fix logs update bug (#7915)
e7a02e2 Sunny-bot1: supoort glm yarn rope (#7894)
0a5d4b6 zccjjj: [bugfix] AS block leaks (#7895)
bf0dace liyonghua0910: [Scheduler] Increase sleep interval in fetch loops and cancel schedul…
a095d6f lizhenyun01: [Cherry-Pick][Feature] support decode unified attention for mix(#7688…
c52b063 Sunny-bot1: [Cherry-Pick][Optimization][Speculative Decoding]opt mtp logprob (#78…
261041b freeliuzc: [Cherry-Pick][Bugfix] Fix clear bug in RL causing CUDA error 700 duri…
8a1e71d juncaipeng: [PD] PD send cache via storage & Refine swap_cache_layout op (#7839)
2b0fd53 ShaneGZhu: [Cherry-Pick][Optimization]support fused noauxtc kernel on ep mode(#7…
1e7ee22 ckl117: [Cherry-Pick] [Optimization] TopP=1.0 using _random_sample (#7892) an…
fefbcff BingooYang: [Cherry-Pick] [BugFix] fix all reduce fusion accurate issue (#7923) (…
ac24fcc Deleter-D: [Cherry-Pick][BugFix] fix mtp reset bugs in rl (#7957) (#7958)
7198b58 gongshaotian: [RL] Fix the incorrect routing of EOS tokens, which leads to changes …
eeed8a3 gongshaotian: [RL] Fix Ernie mm bug (#7966)
780c000 jackyYang6: [Cherry-Pick][RL][Feature] Add GDR streaming weight update path (#795…
99c7df1 BingooYang: fix moe accurate issue
f232ed9 BingooYang: fix bug
b5ec4fa BingooYang: add test