[KDA] Refactor csrc kernel selection, add tests and sanity check#42

Merged
icavan merged 7 commits into main from feat/refactor-csrc on Apr 9, 2026

Collaborator

@KevinZeng08 KevinZeng08 commented Apr 8, 2026

📌 Description

  • Refactor the sm100 csrc kernel selection code, removing hardcoded kernel objects
  • Add tests that compare against FLA with a bf16 beta
  • Add sanity checks for a head dim of 128 and the bf16 dtype

🔍 Related Issues

#41

🚀 Pull Request Checklist

Thank you for contributing to cuLA! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing.

⚡ Performance

Reviewer Notes

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the SM100 kernel dispatch logic by introducing BOOL_SWITCH and BETA_TYPE_SWITCH macros, which significantly reduces boilerplate code and removes redundant kernel aliases. It also enforces a head dimension of 128 in the Python wrappers and adds bfloat16 support for beta tensors in tests. Review feedback identifies critical bugs in the benchmark utilities where gating tensors were incorrectly cast to bfloat16, which would cause kernel failures. Additionally, there are suggestions to resolve redundant dimension checks and to consider runtime dispatch for the RoundingTF32 parameter.
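
The macro definitions themselves are not shown in this conversation, so the following is only a minimal sketch of how switch macros of this kind are commonly written, not the PR's actual code. It assumes the widely used lambda-based pattern: a runtime value is turned into a compile-time constant (or type alias) inside an immediately invoked lambda, so a single templated call site replaces a hand-written list of kernel aliases. Everything other than the macro names `BOOL_SWITCH` and `BETA_TYPE_SWITCH` is hypothetical: the `DType` enum, the `bf16_t` stand-in, the `kVarlen` flag, `launch_kernel`, and `dispatch`.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins; a real build would use the framework's dtype enum
// and a real bfloat16 type.
enum class DType { Float32, BFloat16 };
struct bf16_t { unsigned short bits; };

// Assumed shape of a BOOL_SWITCH-style macro: maps a runtime bool to a
// compile-time constant visible inside the lambda passed as the last argument.
#define BOOL_SWITCH(COND, CONST_NAME, ...)        \
  [&] {                                           \
    if (COND) {                                   \
      constexpr static bool CONST_NAME = true;    \
      return __VA_ARGS__();                       \
    } else {                                      \
      constexpr static bool CONST_NAME = false;   \
      return __VA_ARGS__();                       \
    }                                             \
  }()

// Assumed shape of a BETA_TYPE_SWITCH-style macro: maps a runtime dtype to a
// type alias visible inside the lambda passed as the last argument.
#define BETA_TYPE_SWITCH(DTYPE, TYPE_NAME, ...)   \
  [&] {                                           \
    if ((DTYPE) == DType::BFloat16) {             \
      using TYPE_NAME = bf16_t;                   \
      return __VA_ARGS__();                       \
    } else {                                      \
      using TYPE_NAME = float;                    \
      return __VA_ARGS__();                       \
    }                                             \
  }()

// Hypothetical kernel launcher: returns the name of the instantiation that
// would run, so the dispatch result is observable without a GPU.
template <bool kVarlen, typename BetaT>
std::string launch_kernel() {
  std::string name = "kernel";
  name += kVarlen ? "_varlen" : "_fixed";
  name += std::is_same<BetaT, float>::value ? "_beta_f32" : "_beta_bf16";
  return name;
}

// One nested call site covers all four (flag, beta type) combinations.
std::string dispatch(bool is_varlen, DType beta_dtype) {
  return BOOL_SWITCH(is_varlen, kVarlen, [&] {
    return BETA_TYPE_SWITCH(beta_dtype, BetaT, [&] {
      return launch_kernel<kVarlen, BetaT>();
    });
  });
}
```

In this sketch, `dispatch(true, DType::BFloat16)` selects the `launch_kernel<true, bf16_t>` instantiation. Each extra switch doubles the number of instantiated kernels, which is the usual trade-off to weigh when deciding whether a parameter such as RoundingTF32 should be dispatched at runtime.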

KevinZeng08 and others added 4 commits April 8, 2026 16:20
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Collaborator

@icavan icavan left a comment

LGTM

Collaborator

@cherhh cherhh left a comment

LGTM

@icavan icavan merged commit 0a9abac into main Apr 9, 2026
@KevinZeng08 KevinZeng08 deleted the feat/refactor-csrc branch April 10, 2026 04:44