Description
On Windows, TensorRT reports a GPU out-of-memory error once 12 models are loaded, even though the GPU still has 16 GB of free memory. Further investigation shows that TensorRT execution contexts consume a large amount of system virtual memory (commit charge), which is eventually exhausted and triggers the error. Every model I load consumes several gigabytes of virtual memory. Why?
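For context, here is a minimal sketch of how the multi-model scenario can be reproduced. The function name LoadManyModels, the plan-file path, and gLogger are illustrative; GetCurrentProcessCommitBytes is the helper shown under Steps To Reproduce:
#include <NvInfer.h>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

extern nvinfer1::ILogger &gLogger;        // assumed: application logger
uint64_t GetCurrentProcessCommitBytes();  // assumed: helper shown below

// Deserialize the same engine N times and log the process commit charge
// after each createExecutionContext() call.
void LoadManyModels(const std::string &planPath, int count) {
  std::ifstream f(planPath, std::ios::binary);
  std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                         std::istreambuf_iterator<char>());
  std::unique_ptr<nvinfer1::IRuntime> runtime(
      nvinfer1::createInferRuntime(gLogger));
  std::vector<std::unique_ptr<nvinfer1::ICudaEngine>> engines;
  std::vector<std::unique_ptr<nvinfer1::IExecutionContext>> contexts;
  for (int i = 0; i < count; ++i) {
    engines.emplace_back(
        runtime->deserializeCudaEngine(blob.data(), blob.size()));
    contexts.emplace_back(engines.back()->createExecutionContext());
    GetCurrentProcessCommitBytes();  // commit charge grows with each context
  }
}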
Environment
TensorRT Version: 8.6 and 10.x (both affected)
NVIDIA GPU: all
NVIDIA Driver Version:
CUDA Version: 12.8
CUDNN Version: None
Operating System: Windows
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model: the same ONNX model is used in both runs below
Steps To Reproduce
Commands or scripts:
#include <windows.h>
#include <psapi.h>  // GetProcessMemoryInfo (link with Psapi.lib)
#include <spdlog/spdlog.h>
#include <cstdint>

// Returns this process's private commit charge (PrivateUsage) in bytes.
// The explicit -> uint64_t return type is needed: the two return statements
// would otherwise deduce conflicting types (int vs uint64_t).
auto GetCurrentProcessCommitBytes = []() -> uint64_t {
  PROCESS_MEMORY_COUNTERS_EX pmc{};
  if (!GetProcessMemoryInfo(GetCurrentProcess(),
                            reinterpret_cast<PROCESS_MEMORY_COUNTERS *>(&pmc),
                            sizeof(pmc))) {
    return 0;  // query failed
  }
  // Commit bytes = PrivateUsage
  auto bytes = static_cast<uint64_t>(pmc.PrivateUsage);
  SPDLOG_INFO(" ----------------- commit memory ----------------- {} GB",
              bytes / (1024 * 1024 * 1024));
  return bytes;
};
std::unique_ptr<nvinfer1::ICudaEngine> engine_;
std::unique_ptr<nvinfer1::IExecutionContext> context_;
........................
// Default allocation strategy: TensorRT allocates the context's
// device memory internally.
context_.reset(engine_->createExecutionContext());
GetCurrentProcessCommitBytes();
Result:
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 4 GB
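(Addition for clarity: PrivateUsage above is per-process. To confirm that the machine-wide commit limit is what 12 models exhaust, the system-wide commit charge can also be sampled; GlobalMemoryStatusEx is a standard Win32 call, and LogSystemCommit is an illustrative name.)
#include <windows.h>
#include <spdlog/spdlog.h>

// Logs the system-wide commit charge against the commit limit.
void LogSystemCommit() {
  MEMORYSTATUSEX ms{};
  ms.dwLength = sizeof(ms);
  if (GlobalMemoryStatusEx(&ms)) {
    // ullTotalPageFile = current commit limit (RAM + page files);
    // ullAvailPageFile = commit headroom still available system-wide.
    SPDLOG_INFO("system commit: {} GB used of {} GB limit",
                (ms.ullTotalPageFile - ms.ullAvailPageFile) /
                    (1024ull * 1024 * 1024),
                ms.ullTotalPageFile / (1024ull * 1024 * 1024));
  }
}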
With the kUSER_MANAGED allocation strategy instead:
........................
// User-managed strategy: TensorRT does not allocate the context's device
// memory; the application provides it. (buffer_ is a void* member.)
context_.reset(engine_->createExecutionContext(
    nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
GetCurrentProcessCommitBytes();  // before the device memory is allocated
auto size = context_->updateDeviceMemorySizeForShapes();
cudaMalloc((void **)&buffer_, size);
context_->setDeviceMemoryV2(buffer_, static_cast<int64_t>(size));
GetCurrentProcessCommitBytes();  // after cudaMalloc + setDeviceMemoryV2
Result:
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 0 GB
[2026-01-21 16:30:44.209] [info] [trt_onnx_engine.cpp:50] ----------------- commit memory ----------------- 4 GB
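The 0 GB to 4 GB jump appearing only after cudaMalloc suggests that on Windows (WDDM) device allocations are charged against process commit, so the context's activation memory raises commit whether TensorRT or the application allocates it. If the models run serially, one possible mitigation is sketched below (TensorRT 10.x API; BindSharedScratch and the container layout are illustrative): share a single scratch buffer, sized for the largest engine, across all user-managed contexts instead of committing ~4 GB per model.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

// Bind one shared activation buffer, sized for the largest engine, to every
// kUSER_MANAGED context. Valid only if the contexts never run concurrently.
void BindSharedScratch(
    const std::vector<nvinfer1::ICudaEngine *> &engines,
    std::vector<std::unique_ptr<nvinfer1::IExecutionContext>> &contexts) {
  int64_t maxSize = 0;
  for (auto *e : engines)
    maxSize = std::max(maxSize, e->getDeviceMemorySizeV2());
  void *scratch = nullptr;
  cudaMalloc(&scratch, static_cast<size_t>(maxSize));
  for (auto &c : contexts)
    c->setDeviceMemoryV2(scratch, maxSize);
}
With this, commit grows once by the largest context's size instead of once per model, assuming serial execution.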
Have you tried the latest release?:
Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):