Add xpu-kernels skill - Intel XPU Triton kernel development #547

Merged
sayakpaul merged 12 commits into huggingface:main from danielfleischer:xpu-skill
May 20, 2026

@danielfleischer
Contributor

Adds a new skill under kernel-builder/skills/xpu-kernels/, alongside the existing cuda-kernels and rocm-kernels skills, bringing Intel XPU support to kernel-builder. Target hardware is Intel Battlemage / Arc Pro B70 (Xe2) via the Intel XPU Backend for Triton (https://github.com/intel/intel-xpu-backend-for-triton).

The skill packages the Xe-Forge (https://github.com/IntelLabs/Xe-Forge) workflow — an LLM-driven loop that transforms PyTorch code into optimized Triton kernels for Intel XPU — into the hf-kernels skill format. Xe-Forge has been used to produce measured speedups on KernelBench Level 2 fused kernels (bf16) and Flash Attention (fp16); full results live in that repo.

What's included

  • SKILL.md - skill definition and the analyze → validate → benchmark → profile → finalize trial-loop workflow, XPU-specific patterns (tensor descriptors, GRF 256, tile swizzling, bf16 + fp32 accumulation), and XPU correctness constraints.
  • scripts/ - CLI tools: analyze_kernel.py, validate_triton.py, benchmark.py (uses AI-Bench (https://github.com/libxsmm/AI-bench) as the harness), trial_manager.py (tree-structured trial tracking), xpu_profiler.py (VTune integration),
    plus HF kernels / transformers integration examples.
  • references/ - knowledge base: correctness, XPU optimizations, fusion patterns, memory patterns, dtype choices, persistent kernel patterns, optimization levels/strategies, KernelBench classification.
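
One of the XPU-specific patterns listed above, tile swizzling, can be illustrated without any XPU hardware. The sketch below assumes the term refers to the grouped program ordering used in the Triton matmul tutorial, where consecutive program IDs are mapped to tiles in narrow column-major groups so that neighbouring programs reuse the same input tiles; the skill's own SKILL.md may define the pattern differently. The function name and parameters are illustrative and are not taken from the skill.

```python
def swizzled_tile(pid: int, num_pid_m: int, num_pid_n: int, group_size_m: int) -> tuple[int, int]:
    """Map a linear program id to a (row-tile, col-tile) pair using grouped ordering.

    Hypothetical helper: mirrors the index arithmetic of the Triton matmul
    tutorial in plain Python so the mapping can be inspected on the host.
    """
    # Each group spans `group_size_m` tile rows and every tile column.
    num_pid_in_group = group_size_m * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_size_m
    # The last group may have fewer rows when num_pid_m is not a multiple of group_size_m.
    group_rows = min(num_pid_m - first_pid_m, group_size_m)
    local = pid % num_pid_in_group
    pid_m = first_pid_m + local % group_rows
    pid_n = local // group_rows
    return pid_m, pid_n


if __name__ == "__main__":
    # Print the visiting order for a 4x4 tile grid with groups of 2 rows.
    for pid in range(16):
        print(pid, swizzled_tile(pid, num_pid_m=4, num_pid_n=4, group_size_m=2))
```

With `group_size_m=2`, program IDs 0 through 7 stay within the first two tile rows before the ordering moves on to the next pair, which is the locality effect the pattern is after.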

Next steps / guidance welcome

The skill content has been developed and validated against Xe-Forge directly. Integration into this repo's Nix-based build is the remaining piece, and I'd appreciate pointers from maintainers on:

  • The expected Nix flow for adding a new skill under kernel-builder/skills/ (build target, validation command)
  • Whether anything beyond the skill's own manifest.txt needs updating (indexes, CI config, registries)
  • Conventions for skills shipping Python CLI tools + external deps — currently using requirements.txt; open to a Nix-native packaging if preferred
  • Review/testing expectations for a new hardware backend skill

Happy to iterate on any of the above.

Adds a new skill under kernel-builder/skills/xpu-kernels/, alongside the
existing cuda-kernels and rocm-kernels skills, bringing Intel XPU
support to kernel-builder. Target hardware is Intel Battlemage / Arc Pro
B70 (Xe2) via the Intel XPU Backend for
Triton (https://github.com/intel/intel-xpu-backend-for-triton).

The skill packages the Xe-Forge (https://github.com/IntelLabs/Xe-Forge)
workflow — an LLM-driven loop that transforms PyTorch code into
optimized Triton kernels for Intel XPU — into the hf-kernels skill
format. Xe-Forge has been used to produce measured speedups on
KernelBench Level 2 fused kernels (bf16) and Flash Attention
forward (fp16); full results live in that repo.
@github-actions

Hi @danielfleischer, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/huggingface/kernels/blob/main/CONTRIBUTING.md for more details.

@github-actions github-actions Bot closed this May 13, 2026
@danieldk danieldk reopened this May 13, 2026
@github-actions github-actions Bot closed this May 13, 2026
@sayakpaul sayakpaul reopened this May 14, 2026
@github-actions github-actions Bot closed this May 14, 2026
@sayakpaul sayakpaul reopened this May 15, 2026
sayakpaul and others added 5 commits May 15, 2026 10:46
* fix: remove existing test repo before upload (huggingface#519)

* fix: remove existing test repo before upload

* fix: add missing content type

* fix: prefer removing repos via hub library

* fix: use lib from nix shell on runner

* fix: disallow more than one instance of E2E running at once to avoid race conditions

* fix: prefer using ci token

* fix: update e2e to use trust_remote_code for the dummy user

* fix: prefer using latest kernels-data in test

* fix: update nix warns to throws (huggingface#540)

* feat: bump cute dsl/cutlass (huggingface#545)

* feat: add to vouched (huggingface#551)

* hook up skill in the cli and add docs.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
…#550)

* Update version bumping scripts with the `--major` option

With this change the script supports both major and minor version
bumping. For example:

Codebase at `0.10.1.dev0`:

```
  (none)          -> 0.10.1
  --major         -> 0.11.0
  --dev           -> 0.10.1.dev1
  --dev --major   -> 0.11.0.dev0
```

Codebase at `0.10.1`:

```
  (none)          -> 0.10.2
  --major         -> 0.11.0
  --dev           -> 0.10.2.dev0
```

These are the typical version bumping workflows within the project.
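
The two tables above can be reproduced with a small function. This is a minimal sketch of the rules as stated in the commit message, not the code in `scripts/bump_version.py`; the function name `bump_version` and its keyword arguments are hypothetical. The combination `--dev --major` starting from a release is not listed in the tables, so its result here (`X.(Y+1).0.dev0`) is an assumption by analogy with the dev case.

```python
import re

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?")


def bump_version(version: str, dev: bool = False, major: bool = False) -> str:
    """Return the next version string following the tables in the commit message."""
    m = _VERSION_RE.fullmatch(version)
    if m is None:
        raise ValueError(f"unsupported version string: {version!r}")
    x, y, z = int(m[1]), int(m[2]), int(m[3])
    dev_n = int(m[4]) if m[4] is not None else None

    if major:
        # `--major` bumps the second component and resets the third.
        y, z = y + 1, 0
        suffix = ".dev0" if dev else ""
    elif dev_n is not None:
        # Currently on a dev version: either advance the dev counter or strip the suffix.
        suffix = f".dev{dev_n + 1}" if dev else ""
    else:
        # Currently on a release: move to the next patch version.
        z += 1
        suffix = ".dev0" if dev else ""
    return f"{x}.{y}.{z}{suffix}"


if __name__ == "__main__":
    for flags in ({}, {"major": True}, {"dev": True}, {"dev": True, "major": True}):
        print(flags, "->", bump_version("0.10.1.dev0", **flags))
```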

* Sync .PHONY targets
Comment thread scripts/bump_version.py Outdated
#!/usr/bin/env python3
"""Bump all version strings in the repo.

Without ``--dev``: strip the development suffix ahead of a release.
Member

This seems like an unrelated change?

Contributor Author

This is from #550.

Comment thread kernels/src/kernels/cli/benchmark.py Outdated
from kernels import get_kernel, get_local_kernel

if is_local:
    kernel = get_local_kernel(Path(repo_id), "activation")
Member

Unrelated change?

Contributor Author

This is from #555.

 .split('/')
 .next_back()
-.is_some_and(|n| n.starts_with("benchmark"))
+.is_some_and(|n| n.starts_with("benchmark") && n.ends_with(".py"))
Member

Unrelated change?

Contributor Author

This is from #543.
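
For readers less familiar with the Rust iterator chain in the excerpt above, the effect of the added `ends_with(".py")` condition can be sketched in Python. The function names and sample paths are hypothetical, and the snippet only mirrors the two lines shown; it makes no claim about the surrounding code in the repository.

```python
def matches_before(path: str) -> bool:
    # Last '/'-separated segment starts with "benchmark" (the older condition).
    return path.split("/")[-1].startswith("benchmark")


def matches_after(path: str) -> bool:
    # Same check, but the segment must also end with ".py" (the newer condition).
    name = path.split("/")[-1]
    return name.startswith("benchmark") and name.endswith(".py")


if __name__ == "__main__":
    for p in ("benchmarks/benchmark.py", "benchmarks/benchmark_notes.md", "benchmarks"):
        print(p, matches_before(p), matches_after(p))
```

Under the older condition a trailing segment such as `benchmark_notes.md` or a bare `benchmarks` also matches; the newer condition narrows the match to Python files.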

Comment thread Makefile
@@ -1,4 +1,4 @@
-.PHONY: style kernel-builder-cli-docs quality bump-dev bump-dev-dry-run bump-release bump-release-dry-run pin-actions
+.PHONY: style kernel-builder-cli-docs quality bump-dev bump-dev-dry-run bump-dev-major bump-dev-major-dry-run bump-release bump-release-dry-run bump-major bump-major-dry-run pin-actions
Member

Unrelated change?

Contributor Author

@danielfleischer danielfleischer May 18, 2026

This is from #550.

Fixing some paths due to the skill living in the agent-specific
location, outside of `kernel-builder/skills/xpu-kernels/`.
@danielfleischer
Contributor Author

Should I not have cherry picked main?

@sayakpaul
Member

If we merge the upstream main then those changes should disappear.

sayakpaul previously approved these changes May 19, 2026
Member

@sayakpaul sayakpaul left a comment

Sweet!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul
Member

@danieldk could you review the changes in skill.rs?

@sayakpaul sayakpaul requested a review from danieldk May 19, 2026 14:50
@danieldk
Member

> @danieldk could you review the changes in skill.rs?

Looks good! 👍

@sayakpaul sayakpaul merged commit d9d3a5d into huggingface:main May 20, 2026
58 of 59 checks passed
6 participants