Conversation
Force-pushed `878f4b4` to `27703d5`
Signed-off-by: Torch-TensorRT Github Bot <torch-tensorrt.github.bot@nvidia.com>
Force-pushed `27703d5` to `ef0662c`
narendasan left a comment:
Just reviewed the core stuff for now. I think this mostly is not really solving the issue. The core idea is that we want a Python implementation of the TorchBind endpoints (execute_engine / TRTEngine) that lets us run the same programs with either standard torch-trt or Python-only, rather than having two implementations that are kind of mixed together.
It's cool that we have this, but we should look into whether there is a way to drop/mask registrations to change the runtime implementation, rather than relying on distinct graph constructions.
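To illustrate the "mask registrations" idea above, here is a plain-Python sketch: one op name, two interchangeable implementations, selected at registration time instead of by building distinct graphs. All names here are illustrative stand-ins, not the real torch-trt or `torch.library` registration API.

```python
# Hypothetical registry standing in for the real op-registration machinery.
_OP_REGISTRY = {}

def register_op(name, fn):
    # Re-registering the same name "masks" the previous implementation.
    _OP_REGISTRY[name] = fn

def execute_engine_cpp(inputs):
    return [f"cpp:{x}" for x in inputs]   # stand-in for the TorchBind/C++ path

def execute_engine_python(inputs):
    return [f"py:{x}" for x in inputs]    # stand-in for the Python runtime

# Default registration at import time...
register_op("tensorrt::execute_engine", execute_engine_cpp)
# ...masked later to switch runtimes without changing the graph:
register_op("tensorrt::execute_engine", execute_engine_python)

def run_graph(inputs):
    # The "graph" only references the op name, so it is runtime-agnostic.
    return _OP_REGISTRY["tensorrt::execute_engine"](inputs)
```

The point is that the same `run_graph` works unmodified under either runtime; only the registration changes.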
cc7d4b6 to
5b1bde7
Compare
694755f to
1755001
Compare
```python
SerializedTensorRTEngineFmt = List[
    Union[str, bytes]
]  # Aligned with //core/runtime/register_jit_hooks.cpp
SerializedTorchTensorRTModuleFmt = Tuple[
```
This should be with the other serialization info.
```diff
  engine_info: List[str | bytes] = [""] * SERIALIZATION_LEN
- engine_info[ABI_TARGET_IDX] = torch.ops.tensorrt.ABI_VERSION()
+ engine_info[ABI_TARGET_IDX] = (
```
We should centralize this into something like `generate_engine_info`, which chooses the right source internally.
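A minimal sketch of what that suggested helper could look like. `generate_engine_info` is the reviewer's hypothetical name; `SERIALIZATION_LEN`, `ABI_TARGET_IDX`, and the two version-source strings are placeholders standing in for the real torch-trt constants and `torch.ops.tensorrt.ABI_VERSION()`.

```python
from typing import List, Union

# Placeholder constants; the real values live in the torch-trt runtime.
SERIALIZATION_LEN = 7
ABI_TARGET_IDX = 0

def generate_engine_info(use_python_runtime: bool) -> List[Union[str, bytes]]:
    """Build the serialized engine-info list, choosing the ABI-version
    source internally instead of branching at every call site."""
    engine_info: List[Union[str, bytes]] = [""] * SERIALIZATION_LEN
    if use_python_runtime:
        # Stand-in for a Python-side ABI constant.
        engine_info[ABI_TARGET_IDX] = "py_abi_version_placeholder"
    else:
        # Stand-in for torch.ops.tensorrt.ABI_VERSION().
        engine_info[ABI_TARGET_IDX] = "cpp_abi_version_placeholder"
    return engine_info
```

Callers then just ask for the info and never touch the runtime-specific source directly.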
```python
return (
    self.name,
    self.engine.__getstate__(),
    engine_info,
```
Does the Python `self.engine` not have `__getstate__()`?

Like, why can't we use the C++ version of this (https://github.com/pytorch/pytorch/blob/e2584b2554d11fda4998a8d2be6145b0eded5049/torch/_library/opaque_object.py#L73) like you do in Python, and then align the serialization so we don't need the no-op?
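A sketch of the alignment the comment is asking for: give the Python-side engine its own `__getstate__`/`__setstate__` that mirror the C++ `TRTEngine` tuple layout, so the serializer above needs no runtime-specific branch or no-op placeholder. `PythonTRTEngine` and its fields are illustrative names, not the actual class.

```python
class PythonTRTEngine:
    """Hypothetical Python-runtime engine with pickle hooks aligned
    to the C++ TRTEngine serialization layout."""

    def __init__(self, name: str, serialized_engine: bytes):
        self.name = name
        self.serialized_engine = serialized_engine

    def __getstate__(self):
        # Same tuple shape either runtime would produce.
        return (self.name, self.serialized_engine)

    def __setstate__(self, state):
        self.name, self.serialized_engine = state
```

With both runtimes emitting the same state tuple, `self.engine.__getstate__()` works uniformly at the call site.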
Force-pushed `65d604a` to `01f5d13`
```cpp
int64_t get_streamable_device_memory_budget();
int64_t get_automatic_device_memory_budget();
std::vector<at::Tensor> infer_outputs(std::vector<std::vector<int64_t>> input_shapes);
void set_pre_allocated_outputs(bool enable);
```
This was never used. The setter is here:
```python
    )
else:
    metadata = TorchTensorRTModule.decode_metadata(
        fake_trt_engine.get_serialized_metadata()
```
Do we not have this method anymore?
Can we just call this file `_TRTEngine.py`?
```python
    for i, shape in enumerate(output_shapes)
]

@torch.library.custom_op(  # type: ignore[misc]
```
Is this still needed here?
```diff
  version_compatible: bool = _defaults.VERSION_COMPATIBLE,
  optimization_level: Optional[int] = _defaults.OPTIMIZATION_LEVEL,
- use_python_runtime: bool = _defaults.USE_PYTHON_RUNTIME,
+ use_python_runtime: bool = False,  # Does nothing. Kept for backward compatibility.
```
When we detect that this is set, throw a deprecation warning.
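One common way to implement that request: use an unset sentinel as the default so an explicitly passed `use_python_runtime` can be distinguished and warned about. `compile_module` and the sentinel handling are illustrative, not the actual torch-trt entry point.

```python
import warnings

# Sentinel distinguishing "caller never passed the argument" from any
# real value (including False, the documented no-op default).
_USE_PYTHON_RUNTIME_UNSET = object()

def compile_module(use_python_runtime=_USE_PYTHON_RUNTIME_UNSET, **kwargs):
    if use_python_runtime is not _USE_PYTHON_RUNTIME_UNSET:
        warnings.warn(
            "use_python_runtime is deprecated and has no effect; "
            "the runtime implementation is now selected automatically.",
            DeprecationWarning,
            stacklevel=2,
        )
    # ... rest of compilation would go here ...
```

Callers that never touch the argument see no warning, while legacy call sites get a `DeprecationWarning` pointing at their own line via `stacklevel=2`.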
Force-pushed `05954a6` to `392fab7`
Force-pushed `392fab7` to `bb550df`