
Glm5 support #985

Draft

Aphoh wants to merge 3 commits into NVIDIA:main from Aphoh:glm5-support

Conversation


@Aphoh Aphoh commented Mar 5, 2026

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other source, did you follow IP policy in CONTRIBUTING.md?: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Aphoh and others added 3 commits February 11, 2026 15:22
- Make DeepSeek-specific imports lazy (only loaded for --model_type deepseek)
- Add load_hf_model() using device_map="auto" for single-node multi-GPU
- Add dynamic MoE class discovery and registration for calibration
- Add HF layer name patterns for quant config (q_a_proj, kv_a_proj, etc.)
- Disable GLM-5 indexer/MTP layers from quantization
- Make dist calls conditional for non-distributed HF path
- Add --model_type flag to ptq.py, quantize_to_nvfp4.py, and shell script
- Skip key remapping in quantize_to_nvfp4.py for HF models
- Guard quantization_config removal for bf16 checkpoints
- Add run_glm5_ptq.sh launch script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
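
The "dynamic MoE class discovery" bullet above could look roughly like the following sketch. This is a minimal illustration, not the PR's actual code: the class names (`GlmMoE`, `Model`) and the helper `discover_moe_classes` are hypothetical stand-ins for walking a model tree and collecting classes whose name matches an MoE pattern so they can be registered for calibration.

```python
# Hypothetical sketch of MoE class discovery by class-name pattern.
# In the real PR this would walk an nn.Module tree; plain objects are
# used here so the idea stands alone.

class Expert:
    pass

class GlmMoE:                      # name matches the "MoE" pattern
    def __init__(self):
        self.experts = [Expert(), Expert()]

class Model:
    def __init__(self):
        self.layers = [GlmMoE()]

def discover_moe_classes(root, pattern="MoE"):
    """Walk an object tree and collect classes whose name contains `pattern`."""
    found, stack, seen = {}, [root], set()
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if pattern in type(obj).__name__:
            found[type(obj).__name__] = type(obj)
        for value in getattr(obj, "__dict__", {}).values():
            if isinstance(value, list):
                stack.extend(value)
            else:
                stack.append(value)
    return found

print(sorted(discover_moe_classes(Model())))  # ['GlmMoE']
```

Discovering classes by name pattern rather than importing them directly is what lets the DeepSeek-specific imports stay lazy: nothing model-specific needs to be imported until the matching class is actually found in the loaded checkpoint.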
Fix _remap_key to use component-level matching; add kernel.py stubs, an
MTP head extraction script, and GLM-5 documentation.
- Fix TokenizersBackend compatibility in dataset_utils.py: transformers 5.x
  TokenizersBackend lacks batch_encode_plus, so add a fallback that calls
  _encode_plus per sample with manual padding support (left/right)
- Fix quantize_to_nvfp4.py: skip quantization for layers not listed in
  per_layer_quant_config when using MIXED_PRECISION mode
- Add GLM MLA and DSA Indexer exclusions to hf_ptq build_quant_cfg
- Improve run_glm5_ptq.sh CLI: add --amax-path and --mla-quant flags
- Add glm5 dequant_nvfp4.py utility
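
The manual-padding fallback in the first bullet could be sketched as below. This is an assumption-laden illustration, not the PR's dataset_utils.py code: the helper name `pad_batch` is hypothetical, and it only shows the padding step that would follow per-sample `_encode_plus` calls.

```python
# Hypothetical sketch: pad per-sample token-id lists to a common length,
# supporting both left and right padding sides, as a fallback when a
# tokenizer backend has no batch_encode_plus.

def pad_batch(encoded, pad_id=0, side="right"):
    """Pad variable-length token-id lists and build attention masks."""
    max_len = max(len(ids) for ids in encoded)
    padded, masks = [], []
    for ids in encoded:
        pad = [pad_id] * (max_len - len(ids))
        mask = [1] * len(ids)
        if side == "left":
            padded.append(pad + ids)
            masks.append([0] * len(pad) + mask)
        else:
            padded.append(ids + pad)
            masks.append(mask + [0] * len(pad))
    return {"input_ids": padded, "attention_mask": masks}

batch = pad_batch([[1, 2, 3], [4]], side="left")
print(batch["input_ids"])       # [[1, 2, 3], [0, 0, 4]]
print(batch["attention_mask"])  # [[1, 1, 1], [0, 0, 1]]
```

Left padding matters for decoder-only calibration because generation reads from the end of the sequence; right padding is the usual default for classification-style batching.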

copy-pr-bot bot commented Mar 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Mar 5, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 00656bee-d24b-462b-a5cb-59366bfb35ad

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

