
TensorRT uses a large amount of virtual CPU memory after calling createExecutionContext #4684

@haibozhang123

Description


On Windows, TensorRT reports a GPU out-of-memory error once 12 models are loaded, even though the GPU still has 16 GB of free memory. Further investigation shows that each TensorRT execution context consumes a large amount of system virtual memory (commit charge), which is eventually exhausted and triggers the error. Every model I tried shows the same behavior. Why does creating an execution context commit so much host virtual memory?

Environment

TensorRT Version: 8.6 or 10.x

NVIDIA GPU: all

NVIDIA Driver Version:

CUDA Version: 12.8

CUDNN Version: None

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model: any ONNX model reproduces the issue

Steps To Reproduce

Commands or scripts:

// Requires <windows.h> and <psapi.h> (link against psapi.lib).
auto GetCurrentProcessCommitBytes = []() -> uint64_t {
  PROCESS_MEMORY_COUNTERS_EX pmc{};
  if (!GetProcessMemoryInfo(GetCurrentProcess(),
                            reinterpret_cast<PROCESS_MEMORY_COUNTERS *>(&pmc),
                            sizeof(pmc))) {
    return 0;
  }

  // Commit charge == PrivateUsage
  auto bytes = static_cast<uint64_t>(pmc.PrivateUsage);
  SPDLOG_INFO(" ----------------- commit memory ----------------- {}  GB",
              bytes / (1024 * 1024 * 1024));
  return bytes;
};
  
  std::unique_ptr<nvinfer1::ICudaEngine> engine_;
  std::unique_ptr<nvinfer1::IExecutionContext> context_;
  ........................ 
  context_.reset(engine_->createExecutionContext());
  GetCurrentProcessCommitBytes();

result :
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 4 GB

and

  ........................ 
  context_.reset(engine_->createExecutionContext());
  GetCurrentProcessCommitBytes();
  context_.reset(engine_->createExecutionContext(
      nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
  auto size = context_->updateDeviceMemorySizeForShapes();
  cudaMalloc((void **)&buffer_, size);
  context_->setDeviceMemoryV2(buffer_, static_cast<int64_t>(size));
  GetCurrentProcessCommitBytes();
 

result :
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 0 GB
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 4 GB

Have you tried the latest release?:

Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Metadata

Labels: Module:Runtime (other generic runtime issues that do not fall into other modules)
