feat: multiple optimization profiles for disjoint input shape regimes by cehongwang · Pull Request #4325 · pytorch/TensorRT

cehongwang · 2026-06-08T21:35:20Z

Add support for defining N optimization profiles at compile time via the list-based Input.profiles API and selecting the active profile at runtime (manual pin by index, or opt-in shape-based auto-selection).

- AOT (torch.export) compile path builds one TRT optimization profile per declared profile index; submodules inherit the profile count via propagation across graph breaks.
- Python and C++ runtimes expose a matching primitive engine API (set_active_profile / num_optimization_profiles / _active_profile_index / _auto_select_profiles), so the two runtimes remain interchangeable.
- Profile selection is exposed through the optimization_profile context manager; auto-selection uses lazy (first-fitting) profile selection.
- Backward compatible: engines without declared profiles keep the historical behavior, i.e. a single profile for dynamic-shape engines and no profile for static-shape engines.
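A minimal C++ sketch of shape-based auto-selection. It assumes that "lazy (first-fitting)" means: keep the currently active profile while it still accepts the input shapes, and otherwise switch to the first declared profile whose inclusive [min, max] ranges contain every input dimension. The types `DimRange`, `InputRanges`, `Profile`, `Shape` and the functions `fits` / `select_profile` are hypothetical illustrations, not the Torch-TensorRT runtime's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical types: an inclusive [min, max] range for one dimension,
// the ranges for every dimension of one input, and one profile over all inputs.
struct DimRange {
  std::int64_t min;
  std::int64_t max;
};
using InputRanges = std::vector<DimRange>;
using Profile = std::vector<InputRanges>;
using Shape = std::vector<std::int64_t>;

// True if the profile covers the same number of inputs, every rank matches,
// and every dimension of every input lies inside the profile's range.
bool fits(const Profile& profile, const std::vector<Shape>& shapes) {
  if (profile.size() != shapes.size()) return false;
  for (std::size_t i = 0; i < shapes.size(); ++i) {
    if (profile[i].size() != shapes[i].size()) return false;  // rank mismatch
    for (std::size_t d = 0; d < shapes[i].size(); ++d) {
      if (shapes[i][d] < profile[i][d].min || shapes[i][d] > profile[i][d].max) return false;
    }
  }
  return true;
}

// Lazy first-fit (assumed semantics): keep the active profile while it still
// fits; otherwise return the first profile, in declaration order, that fits.
// std::nullopt means no declared profile accepts these shapes.
std::optional<std::size_t> select_profile(const std::vector<Profile>& profiles,
                                          const std::vector<Shape>& shapes,
                                          std::optional<std::size_t> active) {
  if (active && *active < profiles.size() && fits(profiles[*active], shapes)) {
    return active;  // no switch needed
  }
  for (std::size_t p = 0; p < profiles.size(); ++p) {
    if (fits(profiles[p], shapes)) return p;
  }
  return std::nullopt;
}
```

Because profiles are scanned in declaration order, when several profiles fit and none is active, the earliest one wins, so narrower profiles can be declared before a catch-all profile.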

Includes an example and runtime tests covering dynamic submodule inputs.



narendasan · 2026-06-09T00:29:23Z

+  if (profile_index == active_profile_index) {
+    return;
+  }
+  auto stream = c10::cuda::getCurrentCUDAStream(device_info.id);


Does this work with the green context PR?
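The diff above returns early when the requested profile is already active and only afterwards fetches the current CUDA stream, presumably because TensorRT switches profiles through `IExecutionContext::setOptimizationProfileAsync`, which is enqueued on a stream. The sketch below models only the bookkeeping, under stated assumptions: `EngineState` and `ScopedProfile` are hypothetical names, the TensorRT call is replaced by a switch counter, profile 0 is assumed active by default, and `ScopedProfile` is an RAII analogue of the Python `optimization_profile` context manager (pin a profile, restore the previous one on scope exit), not the PR's C++ API.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a runtime engine wrapper. A real runtime would
// enqueue the profile switch on a CUDA stream; here we only count switches.
class EngineState {
 public:
  explicit EngineState(std::int64_t num_profiles) : num_profiles_(num_profiles) {}

  std::int64_t num_optimization_profiles() const { return num_profiles_; }
  std::int64_t active_profile_index() const { return active_; }
  std::int64_t switch_count() const { return switches_; }

  void set_active_profile(std::int64_t profile_index) {
    if (profile_index < 0 || profile_index >= num_profiles_) {
      throw std::out_of_range("optimization profile index out of range");
    }
    // Early return: re-selecting the active profile would be a redundant switch.
    if (profile_index == active_) {
      return;
    }
    ++switches_;  // placeholder for the real, stream-ordered profile switch
    active_ = profile_index;
  }

 private:
  std::int64_t num_profiles_;
  std::int64_t active_ = 0;  // assumption: profile 0 is active initially
  std::int64_t switches_ = 0;
};

// RAII analogue of a context manager: pins a profile for the guard's lifetime
// and restores the previously active profile when the guard goes out of scope.
class ScopedProfile {
 public:
  ScopedProfile(EngineState& engine, std::int64_t profile_index)
      : engine_(engine), previous_(engine.active_profile_index()) {
    engine_.set_active_profile(profile_index);
  }
  // previous_ was a valid index when captured, so this call cannot throw.
  ~ScopedProfile() { engine_.set_active_profile(previous_); }
  ScopedProfile(const ScopedProfile&) = delete;
  ScopedProfile& operator=(const ScopedProfile&) = delete;

 private:
  EngineState& engine_;
  std::int64_t previous_;
};
```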

narendasan · 2026-06-09T00:30:44Z

+      }
+      const auto& dims = ranges_it->second;
+      auto sizes = inputs[i].sizes();
+      for (size_t d = 0; d < sizes.size(); ++d) {


Can we cache only the dynamic dimensions of each profile and their ranges? Then we don't need to search the mostly static dims.
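One possible way to implement this suggestion, sketched in C++ under stated assumptions: when the engine is loaded, compare each dimension's [min, max] range across all profiles and cache a check only for dimensions whose range differs between profiles, because a dimension with an identical range everywhere cannot change which profile is chosen. Selection then scans only the cached checks. `ProfileSelector`, `DimCheck`, and the other names are hypothetical; the sketch assumes every profile declares the same inputs with the same ranks and that the skipped dimensions are validated elsewhere (for example when the shapes are bound to the TensorRT execution context).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct DimRange {
  std::int64_t min;
  std::int64_t max;
};
using InputRanges = std::vector<DimRange>;
using Profile = std::vector<InputRanges>;  // [input][dim] -> range
using Shape = std::vector<std::int64_t>;

// One cached check: dimension `dim` of input `input` must lie in [min, max].
struct DimCheck {
  std::size_t input;
  std::size_t dim;
  std::int64_t min;
  std::int64_t max;
};

class ProfileSelector {
 public:
  // Built once per engine. Assumes all profiles share input count and ranks.
  explicit ProfileSelector(const std::vector<Profile>& profiles) : checks_(profiles.size()) {
    if (profiles.empty()) return;
    const Profile& first = profiles.front();
    for (std::size_t i = 0; i < first.size(); ++i) {
      for (std::size_t d = 0; d < first[i].size(); ++d) {
        bool same_everywhere = true;
        for (const Profile& p : profiles) {
          if (p[i][d].min != first[i][d].min || p[i][d].max != first[i][d].max) {
            same_everywhere = false;
            break;
          }
        }
        if (same_everywhere) continue;  // cannot discriminate between profiles
        for (std::size_t p = 0; p < profiles.size(); ++p) {
          checks_[p].push_back({i, d, profiles[p][i][d].min, profiles[p][i][d].max});
        }
      }
    }
  }

  // Number of dimensions actually inspected for a given profile.
  std::size_t num_checks(std::size_t profile) const { return checks_[profile].size(); }

  // First profile (declaration order) whose cached dimensions accept the shapes.
  std::optional<std::size_t> select(const std::vector<Shape>& shapes) const {
    for (std::size_t p = 0; p < checks_.size(); ++p) {
      bool ok = true;
      for (const DimCheck& c : checks_[p]) {
        const std::int64_t v = shapes[c.input][c.dim];
        if (v < c.min || v > c.max) {
          ok = false;
          break;
        }
      }
      if (ok) return p;
    }
    return std::nullopt;
  }

 private:
  std::vector<std::vector<DimCheck>> checks_;
};
```

Caching "dimensions whose range differs across profiles" is slightly broader than caching "dimensions with min != max": a dimension fixed to different values in different profiles (for example batch 1 in one profile and batch 8 in another) still decides which profile fits, so it is cached as well.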

meta-cla Bot added the cla signed label Jun 8, 2026

cehongwang force-pushed the cehongw/multi-optimization-profile branch from f32fed3 to 427643d June 8, 2026 23:32

github-actions Bot added the documentation Improvements or additions to documentation label Jun 8, 2026

cehongwang force-pushed the cehongw/multi-optimization-profile branch from 427643d to 2cd4797 June 9, 2026 00:24

cehongwang requested review from apbose and narendasan June 9, 2026 00:28

apbose force-pushed the abose/dynamic-shapes-passthrough branch from f907b64 to 9f9055a June 12, 2026 19:43

apbose and others added 5 commits June 16, 2026 20:25

- 89443d8 dynamic shape arg
- b781ae4 shared dynamic dims across inputs via Inputs
- e1cff6e adding testcase
- 896857b replacing named_dims with shared_dims

cehongwang force-pushed the cehongw/multi-optimization-profile branch from 2cd4797 to a0eeae7 June 16, 2026 20:25

narendasan reviewed Jun 16, 2026


cehongwang wants to merge 5 commits into abose/dynamic-shapes-passthrough from cehongw/multi-optimization-profile

3 participants
