Conversation
e2e6e9f to 69c9679
johannaSommer
left a comment
First PR and already almost flawless, big 👏🏻👏🏻👏🏻 coming your way soon!
src/pruna/algorithms/sage_attn.py
Outdated
runs_on: list[str] = ["cuda", "accelerate"]
dataset_required: bool = False
compatible_before: Iterable[str] = []
compatible_after: Iterable[str] = ["torch_compile"]
compatible_after would also be tags.CACHERS, and compatible_before probably also tags.QUANTIZERS
then add this compatibility also in other algorithms
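For illustration, the suggested declaration might look roughly like this. This is a sketch only: the placeholder `tags` namespace stands in for pruna's real tag constants, whose values and import path are not shown in the diff.

```python
from collections.abc import Iterable
from types import SimpleNamespace

# Placeholder for pruna's tag constants referenced in the review; the real
# values live inside pruna and are only assumed here.
tags = SimpleNamespace(QUANTIZERS=("quantizer",), CACHERS=("cacher",))

runs_on: list[str] = ["cuda", "accelerate"]
dataset_required: bool = False
compatible_before: Iterable[str] = [*tags.QUANTIZERS]               # quantizers may run before sage_attn
compatible_after: Iterable[str] = ["torch_compile", *tags.CACHERS]  # cachers and torch.compile may run after
```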
src/pruna/algorithms/sage_attn.py
Outdated
return False

return any(
    hasattr(component, "set_attention_backend") and component.dtype in [torch.bfloat16, torch.float16]
I recall this dtype check for the components from flash attention (attention needs to be computed in this precision for FA3 to work); did we double-check that this is also the case here?
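For reference, the check under discussion roughly amounts to the following; this is a sketch only, and the helper name is made up for illustration.

```python
import torch


def _supports_sage_attention(component) -> bool:
    # A component qualifies if it exposes diffusers' set_attention_backend hook
    # and already runs in half precision, mirroring the FA3-style constraint the
    # reviewer mentions (fp16/bf16 q, k, v tensors).
    return hasattr(component, "set_attention_backend") and getattr(
        component, "dtype", None
    ) in (torch.float16, torch.bfloat16)
```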
src/pruna/algorithms/sage_attn.py
Outdated
# We simply apply the sage attention backend from diffusers
# Furthermore, we use the sage attention kernel from the hub as the default sageattn function
# is broken (at least at the moment)
for component in model.components.values():
as discussed, let's add target modules also here :)
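A hedged sketch of what restricting the backend switch to the requested target modules could look like; the function name, the matching logic, and the backend string are assumptions for illustration, not pruna's existing helper (see the later comment about reusing existing functionality instead of duplicating it).

```python
def apply_sage_backend(model, target_modules: list[str]) -> None:
    # Only patch the components the user asked for; fall back to all
    # components when no target modules were configured.
    for name, component in model.components.items():
        if target_modules and name not in target_modules:
            continue
        if hasattr(component, "set_attention_backend"):
            # "sage" is assumed to select the SageAttention backend via
            # diffusers' hook; the PR itself picks the kernel-hub variant
            # because the default sageattn function produced noise.
            component.set_attention_backend("sage")
```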
src/pruna/algorithms/sage_attn.py
Outdated
    configuration system.
    """
    return [
        Boolean(
this is actually not needed and we can remove it, as the user can specify this exactly through the target modules anyway (there is a smash config interface for this)
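For context, the smash-config route the reviewer refers to would look roughly like this. Everything here is illustrative: only the `target_modules` key appears in the diff, while the algorithm group key and the module name are assumptions.

```python
from pruna import SmashConfig

smash_config = SmashConfig()
smash_config["kernel"] = "sage_attn"              # assumed group key for this algorithm
smash_config["target_modules"] = ["transformer"]  # assumed value format; key taken from the diff
```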
src/pruna/algorithms/sage_attn.py
Outdated
    The wrapped model.
    """
    target_modules = smash_config["target_modules"]
    exclude_first_and_last_transformer_blocks = smash_config["exclude_first_and_last_transformer_blocks"]
for the target modules, let's please use the functionality we already have, otherwise we have a lot of duplicate code here
This PR has been inactive for 10 days and is now marked as stale.
johannaSommer
left a comment
Just two more comments regarding target modules, then we are gtg! :)
…antizers as compatible after and before, add sage_attn in corresponding cachers and quantizers algorithms as compatible, add dtype check as sage_attn only works for float/bfloat16 (double checked), add target modules (but not fully finished yet)
…ast attention block per attention component. Remove dtype guard as the dtypes of q, k, and v per attn module are implicitly checked by the sage attention kernel.
…s default target module, remove warning print
3ea1b21 to 7b196e6
This PR has been inactive for 10 days and is now marked as stale.
Description
Integration of the Sage Attention algorithm into the Pruna framework. The current version applies the attention backend from Diffusers, choosing the Sage Attention kernel from the Kernel Hub. This is because the original sageattn function appears to be broken (its outputs were pure noise). Additionally, tests for the Sage Attention algorithm were implemented.
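A minimal sketch of what the integration effectively does to a diffusers pipeline, assuming a model whose transformer exposes set_attention_backend; the model id and backend string are illustrative, and the PR itself selects the Kernel-Hub SageAttention variant rather than the default sageattn function.

```python
import torch
from diffusers import DiffusionPipeline

# Load a pipeline in half precision, as SageAttention expects fp16/bf16 inputs.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Route the transformer's attention through the SageAttention backend.
pipe.transformer.set_attention_backend("sage")

image = pipe("a photo of an astronaut riding a horse").images[0]
```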
Related Issue
No issues were fixed.
Type of Change
How Has This Been Tested?
Reused the tests for flashattn3, adapted to Sage Attention.
Checklist
Additional Notes
/