Conversation


jiaqiw09 commented Jan 5, 2026

Summary

Add NPU support for the SwiGLU kernel.

Details

  • Implements a flattened, grid-stride Triton kernel for SwiGLU forward/backward to improve scalability and reduce launch overhead on Ascend NPUs.
  • Uses UB-aware tiling (compute_default_tiling_strategy) and the NPU vector core count to dynamically select block size and grid size for more stable performance; a rough sketch of this pattern is shown below.
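
For reference, here is a minimal sketch of the flattened, grid-stride pattern described above, not the code in this PR. The kernel name, the launcher, and the `num_vector_cores` and `block_size` arguments are hypothetical stand-ins for the values the PR derives from the Ascend vector core count and from compute_default_tiling_strategy.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _swiglu_fwd_kernel(
    a_ptr, b_ptr, out_ptr, n_elements,
    BLOCK_SIZE: tl.constexpr, NUM_PROGRAMS: tl.constexpr,
):
    pid = tl.program_id(axis=0)
    # Grid-stride loop over the flattened tensors: a small, fixed grid
    # (at most one program per vector core) covers arbitrarily large inputs.
    for block_start in range(pid * BLOCK_SIZE, n_elements, NUM_PROGRAMS * BLOCK_SIZE):
        offsets = block_start + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        a = tl.load(a_ptr + offsets, mask=mask, other=0.0).to(tl.float32)
        b = tl.load(b_ptr + offsets, mask=mask, other=0.0).to(tl.float32)
        c = a * tl.sigmoid(a) * b  # SwiGLU forward: SiLU(a) * b
        tl.store(out_ptr + offsets, c.to(out_ptr.dtype.element_ty), mask=mask)


def swiglu_forward(a: torch.Tensor, b: torch.Tensor,
                   num_vector_cores: int = 40, block_size: int = 4096) -> torch.Tensor:
    # `num_vector_cores` and `block_size` are placeholders for the values the PR
    # queries from the NPU and computes via UB-aware tiling, respectively.
    assert a.shape == b.shape and a.is_contiguous() and b.is_contiguous()
    n = a.numel()
    out = torch.empty_like(a)
    num_programs = min(num_vector_cores, triton.cdiv(n, block_size))
    _swiglu_fwd_kernel[(num_programs,)](
        a, b, out, n, BLOCK_SIZE=block_size, NUM_PROGRAMS=num_programs
    )
    return out
```

Capping the grid at the core count keeps launch overhead constant as the input grows, and the UB-aware block size bounds how much data each loop iteration stages on-chip; the backward kernel would follow the same flattened, grid-stride structure.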

Testing Done

I tested SwiGLU with the following methods, and all cases passed:

  • python benchmark/scripts/benchmark_swiglu.py

  • pytest -v test/transformers/test_swiglu.py

  • run make test to ensure correctness

  • run make checkstyle to ensure code style

  • run make test-convergence to ensure convergence


jiaqiw09 commented Jan 5, 2026

@Tcc0403 would you mind taking a look?


jiaqiw09 commented Jan 6, 2026

@Tcc0403 Thanks for your review, I have just updated the code.


Tcc0403 left a comment

LGTM
