Add Quantizers for Qwen3VLMoeTextDecoderLayer #666

soodoshll · 2025-12-08T19:32:05Z

What does this PR do?

Type of change: new feature

Overview: The Hugging Face transformers library implements the Qwen3VL MoE layer as a monolithic module rather than assembling it from Linear layers, so modelopt's quantizer currently cannot recognize it. This PR introduces a conversion from HF's qwen3vl_moe MoE layers to qwen3_moe MoE layers, which consist of a set of Linear layers.
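
To illustrate the idea of the conversion, here is a minimal, framework-free sketch. It is not the code in this PR. It assumes a hypothetical fused layout in which each expert stores one gate_up matrix of shape (hidden, 2 * inter) and one down matrix of shape (inter, hidden); the real HF parameter names and shapes may differ. The Linear class is a stand-in for torch.nn.Linear without bias, and all function names are made up for illustration. The sketch splits the fused matrices into three per-expert Linear layers and runs both paths on the same input.

```python
import math
import random


def matvec(x, w):
    """Compute x @ w for a vector x (len n_in) and a matrix w (n_in x n_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


class Linear:
    """Stand-in for torch.nn.Linear without bias; weight has shape (out, in)."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


def fused_expert_forward(x, gate_up, down):
    """Monolithic path: gate and up projections live in one fused matrix."""
    inter = len(down)
    gu = matvec(x, gate_up)
    gate, up = gu[:inter], gu[inter:]
    hidden_act = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(hidden_act, down)


def split_expert(gate_up, down):
    """Split the assumed fused layout into three Linear layers for one expert."""
    inter = len(down)
    gate_proj = Linear(transpose([row[:inter] for row in gate_up]))  # (inter, hidden)
    up_proj = Linear(transpose([row[inter:] for row in gate_up]))    # (inter, hidden)
    down_proj = Linear(transpose(down))                              # (hidden, inter)
    return gate_proj, up_proj, down_proj


def split_expert_forward(x, gate_proj, up_proj, down_proj):
    """Linear-based path: every matmul is now a separate Linear call."""
    hidden_act = [silu(g) * u for g, u in zip(gate_proj(x), up_proj(x))]
    return down_proj(hidden_act)


if __name__ == "__main__":
    random.seed(0)
    hidden, inter = 4, 3
    gate_up = [[random.uniform(-1, 1) for _ in range(2 * inter)] for _ in range(hidden)]
    down = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(inter)]
    x = [random.uniform(-1, 1) for _ in range(hidden)]

    reference = fused_expert_forward(x, gate_up, down)
    converted = split_expert_forward(x, *split_expert(gate_up, down))
    print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(reference, converted)))
```

Once each projection is its own Linear module, a quantizer that attaches to Linear layers can see every matmul in the expert, which is the motivation given in the overview.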

Testing

Tested with

python hf_ptq.py --pyt_ckpt_path=Qwen/Qwen3-VL-30B-A3B-Instruct --qformat=nvfp4 --dataset wikipedia

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes/No
Did you write any new necessary tests?: Yes/No
Did you add or update any necessary documentation?: Yes/No
Did you update Changelog?: Yes/No

Additional Information

copy-pr-bot · 2025-12-08T19:32:09Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

codecov · 2025-12-10T18:01:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.73%. Comparing base (1562dd6) to head (37c24f4).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #666      +/-   ##
==========================================
+ Coverage   74.72%   74.73%   +0.01%     
==========================================
  Files         192      192              
  Lines       18833    18870      +37     
==========================================
+ Hits        14073    14103      +30     
- Misses       4760     4767       +7

modelopt/torch/quantization/plugins/huggingface.py

shengliangxu

LGTM

Signed-off-by: Qidong Su <qidongs@nvidia.com> Signed-off-by: Qidong Su <soodoshll@gmail.com>

soodoshll requested a review from a team as a code owner December 8, 2025 19:32

soodoshll requested a review from ajrasane December 8, 2025 19:32

shengliangxu self-requested a review December 8, 2025 19:33

soodoshll force-pushed the qwen3-vl-moe branch from 2c18fd0 to 7a16307 Compare December 8, 2025 19:58

soodoshll requested review from a team as code owners December 8, 2025 19:58

soodoshll requested a review from ynankani December 8, 2025 19:58

shengliangxu removed request for a team, ChenhanYu, Edwardf0t1, cjluo-nv and ynankani December 10, 2025 02:00

shengliangxu reviewed Dec 10, 2025

modelopt/torch/quantization/plugins/huggingface.py (outdated, resolved)

shengliangxu approved these changes Dec 17, 2025

soodoshll force-pushed the qwen3-vl-moe branch 3 times, most recently from f5c78c9 to ff666d5 Compare December 18, 2025 20:22

soodoshll added 7 commits December 18, 2025 20:24

upd

d3013f6

Signed-off-by: Qidong Su <qidongs@nvidia.com> Signed-off-by: Qidong Su <soodoshll@gmail.com>

fix

70ddcc4

Signed-off-by: Qidong Su <qidongs@nvidia.com> Signed-off-by: Qidong Su <soodoshll@gmail.com>

fix

4ac0c87

Signed-off-by: Qidong Su <qidongs@nvidia.com> Signed-off-by: Qidong Su <soodoshll@gmail.com>

fix

904be6a

Signed-off-by: Qidong Su <qidongs@nvidia.com> Signed-off-by: Qidong Su <soodoshll@gmail.com>

upd

ef200ea

Signed-off-by: Qidong Su <qidongs@nvidia.com> Signed-off-by: Qidong Su <soodoshll@gmail.com>

refactor to directly impl qwen3_vl_moe

60b9e75

Signed-off-by: Qidong Su <qidongs@nvidia.com> Signed-off-by: Qidong Su <soodoshll@gmail.com>

format

37c24f4

Signed-off-by: Qidong Su <qidongs@nvidia.com> Signed-off-by: Qidong Su <soodoshll@gmail.com>

soodoshll force-pushed the qwen3-vl-moe branch from ff666d5 to 37c24f4 Compare December 18, 2025 20:24
