
[Feature] support rmsnorm_gated for qwen3.5 #6976

Open
wangna11BD wants to merge 3 commits into PaddlePaddle:develop from wangna11BD:add_rmsnorm_gated

wangna11BD commented Mar 23, 2026

Motivation

To support the gated RMSNorm (RMSNormGated) operation in the Qwen3.5 model, this PR adds a Triton-based fused RMSNormGated kernel and a corresponding RMSNormGated layer implementation.

Modifications

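  1. Added a Triton-based fused kernel, rms_norm_gated_fwd_kernel
  • Supports three gate activation functions: swish, silu, and sigmoid
  • Supports an optional bias and an optional z (the gate tensor)
  2. Added the RMSNormGated layer implementation
  • When CUDA and Triton are available, the fused kernel is used; other platforms (such as GCU) fall back to the native PaddlePaddle implementation
  3. Added related tests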
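
For readers unfamiliar with the operation, the following is a minimal NumPy sketch of what a gated RMSNorm computes. It is not the code in this PR. The function name `rms_norm_gated_ref`, the `eps` default, and the choice to apply the gate after normalization are all assumptions made for illustration; the fused kernel may order or parameterize these steps differently. "swish" and "silu" are treated here as the same function, `z * sigmoid(z)`.

```python
import numpy as np


def _sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def rms_norm_gated_ref(x, weight, bias=None, z=None, eps=1e-6, activation="swish"):
    """Hypothetical reference: RMS-normalize x over the last axis, then gate with act(z).

    Assumption: the gate is applied after normalization. This is an
    illustrative sketch, not the PR's kernel.
    """
    x = np.asarray(x, dtype=np.float64)
    # Reciprocal root-mean-square over the hidden dimension.
    rstd = 1.0 / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    out = x * rstd * weight
    if bias is not None:  # optional bias
        out = out + bias
    if z is not None:  # optional gate tensor
        z = np.asarray(z, dtype=np.float64)
        if activation in ("swish", "silu"):
            gate = z * _sigmoid(z)
        elif activation == "sigmoid":
            gate = _sigmoid(z)
        else:
            raise ValueError(f"unsupported activation: {activation}")
        out = out * gate
    return out
```

With `z=None` the function reduces to a plain RMSNorm, which mirrors the "optional z" behavior listed above.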

Usage or Command

from fastdeploy.model_executor.layers.normalization import RMSNormGated

layer = RMSNormGated(hidden_size=2048, prefix="model.norm", activation="swish")

output = layer.forward(x, z=z)

Accuracy Tests

The numerical error threshold between RMSNormGated and a naive Python implementation is set to 1e-3.
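
As a rough illustration of this kind of tolerance check, the sketch below compares a float32 computation against a float64 reference and asserts that the maximum absolute error stays under 1e-3. The helper names `gated_rms_norm` and `max_abs_error`, the tensor shapes, and the use of a float32 NumPy run as a stand-in for the fused kernel are all assumptions; the actual test in `tests/layers/test_rmsnorm_gated.py` may be structured differently.

```python
import numpy as np


def gated_rms_norm(x, w, z, eps=1e-6):
    # Assumed formulation: RMS-normalize, scale by w, then apply a silu gate.
    rstd = 1.0 / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x * rstd * w * (z / (1.0 + np.exp(-z)))


def max_abs_error(actual, expected):
    # Compare in float64 so the comparison itself adds no rounding error.
    diff = actual.astype(np.float64) - expected.astype(np.float64)
    return float(np.max(np.abs(diff)))


rng = np.random.default_rng(0)
x64 = rng.standard_normal((4, 2048))
z64 = rng.standard_normal((4, 2048))
w64 = np.ones(2048)

# float64 result plays the role of the naive reference.
expected = gated_rms_norm(x64, w64, z64)
# float32 result is a stand-in for the implementation under test.
actual = gated_rms_norm(
    x64.astype(np.float32), w64.astype(np.float32), z64.astype(np.float32)
)

err = max_abs_error(actual, expected)
assert err < 1e-3, f"max abs error {err} exceeds threshold"
```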

python tests/layers/test_rmsnorm_gated.py

image

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[Feature]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.


paddle-bot bot commented Mar 23, 2026

Thanks for your contribution!
