Adaptive Compression #2807
Open
Description
Hello team,
I recently came across the SemiAnalysis article “Vera Rubin: Extreme Co-Design as an Evolution” (https://newsletter.semianalysis.com/p/vera-rubin-extreme-co-design-an-evolution), which discusses Adaptive Compression for transformer workloads. The article cites a significant speedup (50 PFLOPS vs 35 PFLOPS), but I could not find detailed information on how this is implemented in Transformer Engine.
Now that GTC 2026 has concluded, I wanted to ask for clarification on the following:
- Could you provide more details on the implementation of Adaptive Compression in Transformer Engine?
- Specifically, how is sparsity identified and exploited dynamically?
- Are there any public code examples, demos, or documentation illustrating this feature?
Any guidance or pointers would be greatly appreciated, as I am interested in evaluating and experimenting with this feature for transformer model acceleration.
Thank you for your time and support.
Best regards,
Guanchen