Description
On Windows, TensorRT reports a GPU out-of-memory error once 12 models are loaded, even though the GPU still has 16 GB of free memory. Further investigation shows that TensorRT execution contexts consume a large amount of system virtual memory (commit charge), which is eventually exhausted and triggers the error. Every model I load consumes several gigabytes of virtual memory. Why?
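For context, here is a minimal sketch of how the multi-model scenario can be reproduced. The function name LoadManyModels, the plan-file path, and gLogger are illustrative; GetCurrentProcessCommitBytes is the helper shown under Steps To Reproduce:
#include <NvInfer.h>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

extern nvinfer1::ILogger &gLogger;        // assumed: application logger
uint64_t GetCurrentProcessCommitBytes();  // assumed: helper shown below

// Deserialize the same engine N times and log the process commit charge
// after each createExecutionContext() call.
void LoadManyModels(const std::string &planPath, int count) {
  std::ifstream f(planPath, std::ios::binary);
  std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                         std::istreambuf_iterator<char>());
  std::unique_ptr<nvinfer1::IRuntime> runtime(
      nvinfer1::createInferRuntime(gLogger));
  std::vector<std::unique_ptr<nvinfer1::ICudaEngine>> engines;
  std::vector<std::unique_ptr<nvinfer1::IExecutionContext>> contexts;
  for (int i = 0; i < count; ++i) {
    engines.emplace_back(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    contexts.emplace_back(engines.back()->createExecutionContext());
    GetCurrentProcessCommitBytes();  // commit charge grows with each context
  }
}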
Environment
TensorRT Version: 8.6 and 10.x (both affected)
NVIDIA GPU: all
NVIDIA Driver Version:
CUDA Version: 12.8
CUDNN Version: None
Operating System: Windows
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model: the same ONNX model is used in both runs below
Steps To Reproduce
Commands or scripts:
#include <windows.h>
#include <psapi.h>  // GetProcessMemoryInfo (link with Psapi.lib)
#include <spdlog/spdlog.h>
#include <cstdint>

// Returns this process's private commit charge (PrivateUsage) in bytes.
// The explicit -> uint64_t return type is needed: the two return statements
// would otherwise deduce conflicting types (int vs uint64_t).
auto GetCurrentProcessCommitBytes = []() -> uint64_t {
  PROCESS_MEMORY_COUNTERS_EX pmc{};
  if (!GetProcessMemoryInfo(GetCurrentProcess(),
                            reinterpret_cast<PROCESS_MEMORY_COUNTERS *>(&pmc),
                            sizeof(pmc))) {
    return 0;  // query failed
  }
  // Commit bytes = PrivateUsage
  auto bytes = static_cast<uint64_t>(pmc.PrivateUsage);
  SPDLOG_INFO(" ----------------- commit memory ----------------- {} GB",
              bytes / (1024 * 1024 * 1024));
  return bytes;
};
std::unique_ptr<nvinfer1::ICudaEngine> engine_;
std::unique_ptr<nvinfer1::IExecutionContext> context_;
........................
// Default allocation strategy: TensorRT allocates the context's
// device memory internally.
context_.reset(engine_->createExecutionContext());
GetCurrentProcessCommitBytes();
Result:
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 4 GB
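(Addition for clarity: PrivateUsage above is per-process. To confirm that the machine-wide commit limit is what 12 models exhaust, the system-wide commit charge can also be sampled; GlobalMemoryStatusEx is a standard Win32 call, and LogSystemCommit is an illustrative name.)
#include <windows.h>
#include <spdlog/spdlog.h>

// Logs the system-wide commit charge against the commit limit.
void LogSystemCommit() {
  MEMORYSTATUSEX ms{};
  ms.dwLength = sizeof(ms);
  if (GlobalMemoryStatusEx(&ms)) {
    // ullTotalPageFile = current commit limit (RAM + page files);
    // ullAvailPageFile = commit headroom still available system-wide.
    SPDLOG_INFO("system commit: {} GB used of {} GB limit",
                (ms.ullTotalPageFile - ms.ullAvailPageFile) /
                    (1024ull * 1024 * 1024),
                ms.ullTotalPageFile / (1024ull * 1024 * 1024));
  }
}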
With the kUSER_MANAGED allocation strategy instead:
........................
// User-managed strategy: TensorRT does not allocate the context's device
// memory; the application provides it. (buffer_ is a void* member.)
context_.reset(engine_->createExecutionContext(
    nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
GetCurrentProcessCommitBytes();  // before the device memory is allocated
auto size = context_->updateDeviceMemorySizeForShapes();
cudaMalloc((void **)&buffer_, size);
context_->setDeviceMemoryV2(buffer_, static_cast<int64_t>(size));
GetCurrentProcessCommitBytes();  // after cudaMalloc + setDeviceMemoryV2
Result:
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 0 GB
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 4 GB
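The 0 GB to 4 GB jump appearing only after cudaMalloc suggests that on Windows (WDDM) device allocations are charged against process commit, so the context's activation memory raises commit whether TensorRT or the application allocates it. If the models run serially, one possible mitigation is sketched below (TensorRT 10.x API; BindSharedScratch and the container layout are illustrative): share a single scratch buffer, sized for the largest engine, across all user-managed contexts instead of committing ~4 GB per model.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

// Bind one shared activation buffer, sized for the largest engine, to every
// kUSER_MANAGED context. Valid only if the contexts never run concurrently.
void BindSharedScratch(
    const std::vector<nvinfer1::ICudaEngine *> &engines,
    std::vector<std::unique_ptr<nvinfer1::IExecutionContext>> &contexts) {
  int64_t maxSize = 0;
  for (auto *e : engines)
    maxSize = std::max(maxSize, e->getDeviceMemorySizeV2());
  void *scratch = nullptr;
  cudaMalloc(&scratch, static_cast<size_t>(maxSize));
  for (auto &c : contexts)
    c->setDeviceMemoryV2(scratch, maxSize);
}
With this, commit grows once by the largest context's size instead of once per model, assuming serial execution.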
Have you tried the latest release?:
Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):