diff --git a/samples/python/quickly_deployable_plugins/README.md b/samples/python/quickly_deployable_plugins/README.md index 3dd0d03d..875c4ebf 100644 --- a/samples/python/quickly_deployable_plugins/README.md +++ b/samples/python/quickly_deployable_plugins/README.md @@ -66,7 +66,7 @@ def add_plugin_desc(inp0: trtp.TensorDesc, block_size: int) -> trtp.TensorDesc: return inp0.like() ``` -The argument "sample::elemwise_add_plugin" defines the namespace ("sample") and name ("elemwise_add_plugin") of the plugin. Input arguments to the decorated function (`plugin_desc`) annotated with `trt.plugin.TensorDesc` denote the input tensors; all others are interpreted as plugin attributes (see the [TRT API Reference](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/tensorrt.plugin/trt_plugin_register.html) for a full list of allowed attribute types). The output signature is a `trt.plugin.TensorDesc` describing the output. `inp0.like()` returns a tensor descriptor with identical shape and type characteristics to `inp0`. +The argument "sample::elemwise_add_plugin" defines the namespace ("sample") and name ("elemwise_add_plugin") of the plugin. Input arguments to the decorated function (`plugin_desc`) annotated with `trt.plugin.TensorDesc` denote the input tensors; all others are interpreted as plugin attributes (see the [TRT API Reference](https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/python-api/infer/tensorrt.plugin/trt_plugin_register.html) for a full list of allowed attribute types). The output signature is a `trt.plugin.TensorDesc` describing the output. `inp0.like()` returns a tensor descriptor with identical shape and type characteristics to `inp0`. The computation function, decorated with `trt.plugin.impl`, receives `trt.plugin.Tensor`s for each input and output. In contrast to `TensorDesc`s, a `Tensor` references an underlying data buffer, directly accessible through `Tensor.data_ptr`. When working with Torch and OpenAI Triton kernels, it is easier to use `torch.as_tensor()` to zero-copy construct a `torch.Tensor` corresponding to the `trt.plugin.Tensor`. @@ -124,7 +124,7 @@ Non-zero is an operation where the indices of the non-zero elements of the input To handle DDS, the extent of each data-dependent output dimension must be expressed in terms of a *_size tensor_*, which is a scalar that communicates to TRT an upper-bound and an autotune value for that dimension, in terms of the input shapes. The TRT engine build may be optimized for the autotune value, but the extent of that dimension may stretch up to the upper-bound at runtime. -In this sample, we consider a 2D input tensor `inp0`; the output will be an $N x 2$ tensor (a set of $N$ 2D indices), where $N$ is the number of non-zero indices. At maximum, all elements could be non-zero, and so the upper-bound could be expressed as `upper_bound = inp0.shape_expr[0] * inp0.shape_expr[1]`. Note that `trt.plugin.TensorDesc.shape_expr` returns symbolic shape expressions for that tensor. Arithmetic operations on shape expressions are supported through standard Python binary operators (see [TRT Python API reference](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/tensorrt.plugin/Shape/ShapeExpr.html) for full list of supported operations). +In this sample, we consider a 2D input tensor `inp0`; the output will be an $N x 2$ tensor (a set of $N$ 2D indices), where $N$ is the number of non-zero indices. At maximum, all elements could be non-zero, and so the upper-bound could be expressed as `upper_bound = inp0.shape_expr[0] * inp0.shape_expr[1]`. Note that `trt.plugin.TensorDesc.shape_expr` returns symbolic shape expressions for that tensor. Arithmetic operations on shape expressions are supported through standard Python binary operators (see [TRT Python API reference](https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/python-api/infer/tensorrt.plugin/Shape/ShapeExpr.html) for full list of supported operations). On average, we can expect half of the input to be filled with zero, so a size tensor can be constructed with that as the autotune value: ```python @@ -157,7 +157,7 @@ python3 qdp_runner.py non_zero [-v] This sample contains a circular padding plugin, which is useful for ops like circular convolution. It is equivalent to PyTorch's [torch.nn.CircularPad2d](https://pytorch.org/docs/stable/generated/torch.nn.CircularPad2d.html#torch.nn.CircularPad2d). -Refer [this section about circular padding plugin](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/pluginGuide.html#example-circular-padding-plugin) in the python plugin guide for more info. +Refer [this section about circular padding plugin](https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/python-api/pluginGuide.html#example-circular-padding-plugin) in the python plugin guide for more info. ## ONNX model with a plugin @@ -205,7 +205,7 @@ def circ_pad_plugin_autotune(inp0: trtp.TensorDesc, pads: npt.NDArray[np.int32], Note that we're using another way of constructing a `trt.plugin.AutoTuneCombination` here -- namely, through `pos(...)` to populate the type/format information and `tactics(...)` to specify the tactics. In this sample, we use an OpenAI Triton kernel and `torch.nn.functional.pad` as two methods to compute the circular padding. -Refer [this section](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/pluginGuide.html#example-plugins-with-multiple-backends-using-custom-tactics) in the Python plugin guide for more info. +Refer [this section](https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/python-api/pluginGuide.html#example-plugins-with-multiple-backends-using-custom-tactics) in the Python plugin guide for more info. ## Loading and running a TRT engine containing a plugin @@ -234,7 +234,7 @@ Let's extend the [above sample](#using-multiple-tactics-and-onnx-cirular-padding Instead of specifying the OpenAI Triton Kernel callback to TRT through `@trt.plugin.impl`, we can directly compile the kernel ahead of time, and provide that to TRT under `@trt.plugin.aot_impl`. -Refer [this section](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/pluginGuide.html#providing-an-ahead-of-time-aot-implementation) in the Python plugin guide for more info. +Refer [this section](https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/python-api/pluginGuide.html#providing-an-ahead-of-time-aot-implementation) in the Python plugin guide for more info. ## ONNX model with an AOT plugin @@ -284,10 +284,10 @@ options: # Additional Resources **Python Plugin Guide** -- [pluginGuide.md](../../../documentation/python/pluginGuide.md) +- [Python Plugin Guide](https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/python-api/pluginGuide.html) **`tensorrt.plugin` API reference** -- [`tensorrt.plugin` module API reference](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/tensorrt.plugin/index.html) +- [`tensorrt.plugin` module API reference](https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/python-api/infer/tensorrt.plugin/index.html) **Developer Guide** - [Extending TensorRT with Custom Layers](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#extending)