feat: Add TensorRT Edge-LLM AttentionPlugin backend support #4013
base: main
Conversation
@zewenli98 please review
This example uses a custom TensorRT plugin shared library (``libNvInfer_edgellm_plugin.so``)
that replaces standard transformer attention operations and RoPE computations with optimized
CUDA kernels. The plugin source code is available at (internal access only):
@chohk88 can you change this to external links?
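For context, loading a plugin shared library like this so that TensorRT can see its plugin creators usually amounts to a `ctypes` load plus registry initialization. The sketch below is illustrative only; the library path is a placeholder, not the actual artifact shipped with this example.

```python
# Minimal sketch: load a custom TensorRT plugin library so its plugin
# creators register with TensorRT's global plugin registry.
# The path below is a placeholder, not the actual artifact from this PR.
import ctypes

import tensorrt as trt

PLUGIN_LIB = "/path/to/libNvInfer_edgellm_plugin.so"  # placeholder path

# RTLD_GLOBAL makes the library's symbols visible so its static plugin
# registration hooks into TensorRT's registry.
ctypes.CDLL(PLUGIN_LIB, mode=ctypes.RTLD_GLOBAL)

# Initialize the standard plugins (safe to call more than once).
trt.init_libnvinfer_plugins(None, "")

# Sanity check: list the registered plugin creators.
registry = trt.get_plugin_registry()
print([creator.name for creator in registry.plugin_creator_list])
```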
- kv_cache_start_idx: [B] starting index in KV cache (required for release version)
"""

@torch.library.custom_op("xqa::attn", mutates_args=())
Let's call the op `tensorrt_edge_llm::xqa_attn`.
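With the suggested namespace, the registration might look roughly like the sketch below; the argument list and the fake (meta) implementation are illustrative assumptions, not the exact signature in this PR.

```python
# Sketch of the renamed custom op; the signature and output shapes are
# illustrative assumptions, not the exact contract of the op in this PR.
from typing import Tuple

import torch


@torch.library.custom_op("tensorrt_edge_llm::xqa_attn", mutates_args=())
def xqa_attn(
    qkv: torch.Tensor,
    kv_cache: torch.Tensor,
    kv_cache_start_idx: torch.Tensor,
    nq: int,
    nkv: int,
    d: int,
) -> Tuple[torch.Tensor, torch.Tensor]:
    # The real kernel lives in the TensorRT plugin; fail loudly if the op is
    # ever executed eagerly instead of being lowered to the plugin.
    raise NotImplementedError("xqa_attn is only lowered to the TensorRT plugin")


@xqa_attn.register_fake
def _(qkv, kv_cache, kv_cache_start_idx, nq, nkv, d):
    # Shape propagation only: return tensors with plausible output shapes.
    batch_size = qkv.shape[0]
    return qkv.new_empty(batch_size, nq, d), kv_cache.new_empty(kv_cache.shape)
```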
- kv_cache_start_idx: [B] starting index in KV cache (required for release version)
"""

@torch.library.custom_op("xqa::attn", mutates_args=())
Same thing here: `tensorrt_edge_llm::xqa_attn`.
    nkv: int,
    d: int,
) -> Tuple[torch.Tensor, torch.Tensor]:
    batch_size = qkv.shape[0]
Is it possible to provide a valid implementation here easily? Could we lift the kernel from the .so?
This would be a P1/P2 sort of thing, but I think it would be good for the sake of completeness
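One hedged option for such a reference implementation, assuming a packed QKV layout of [B, S, (nq + 2 * nkv) * d], is to fall back to PyTorch's scaled_dot_product_attention; the layout and the causal-masking choice here are assumptions, not the plugin's documented contract.

```python
# Hedged sketch of a pure-PyTorch reference that could back the custom op
# when the plugin .so is unavailable. The packed-QKV layout assumed here
# ([B, S, (nq + 2 * nkv) * d]) is an assumption, not the plugin's contract.
import torch
import torch.nn.functional as F


def xqa_attn_reference(qkv: torch.Tensor, nq: int, nkv: int, d: int) -> torch.Tensor:
    batch_size, seq_len, _ = qkv.shape
    q, k, v = qkv.split([nq * d, nkv * d, nkv * d], dim=-1)
    q = q.view(batch_size, seq_len, nq, d).transpose(1, 2)   # [B, nq, S, d]
    k = k.view(batch_size, seq_len, nkv, d).transpose(1, 2)  # [B, nkv, S, d]
    v = v.view(batch_size, seq_len, nkv, d).transpose(1, 2)  # [B, nkv, S, d]

    # Grouped-query attention: repeat KV heads to match the query head count.
    if nkv != nq:
        k = k.repeat_interleave(nq // nkv, dim=1)
        v = v.repeat_interleave(nq // nkv, dim=1)

    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(batch_size, seq_len, nq * d)
```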
Description
This PR adds TensorRT Edge-LLM AttentionPlugin backend support as an alternative to the default SDPA lowering, providing a 1.7x to 3.3x performance improvement for LLM inference.
Supported Models: Llama 3.x (3.1 and 3.2), Qwen 2.5, Qwen 3, Qwen3.1
This is a temporary solution for the initial implementation. The fork contains Torch-TRT compatibility Python runtime support that is not yet available in the official NVIDIA TensorRT-Edge-LLM repository.
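As a hedged usage sketch only: compiling a supported model with Torch-TensorRT would follow the usual flow, with the attention-plugin opt-in shown below as a hypothetical knob (the actual entry point is the example added in this PR).

```python
# Hedged usage sketch. The commented-out `use_edge_llm_attention` knob is
# hypothetical and only illustrates where an opt-in would plug in; see the
# example added in this PR for the real entry point.
import torch
import torch_tensorrt
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B").eval().cuda()
input_ids = torch.randint(0, 32000, (1, 64), device="cuda")

trt_model = torch_tensorrt.compile(
    model,
    inputs=[input_ids],
    enabled_precisions={torch.float16},
    # use_edge_llm_attention=True,  # hypothetical: route attention through the plugin
)
```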
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: