Adaptive Compression #2807

@guanchenl

Hello team,

I recently came across the SemiAnalysis article “Vera Rubin: Extreme Co-Design as an Evolution” (https://newsletter.semianalysis.com/p/vera-rubin-extreme-co-design-an-evolution), which discusses Adaptive Compression for transformer workloads. The article mentions significant speedups (50 PFLOPS vs 35 PFLOPS), but I could not find detailed information on how this is implemented in Transformer Engine.

Now that GTC 2026 has concluded, I wanted to ask for clarification on the following:

  1. Could you provide more details on the implementation of Adaptive Compression in Transformer Engine?
  2. Specifically, how is sparsity identified and exploited dynamically?
  3. Are there any public code examples, demos, or documentation illustrating this feature?

Any guidance or pointers would be greatly appreciated, as I am interested in evaluating and experimenting with this feature for transformer model acceleration.

Thank you for your time and support.

Best regards,
Guanchen
