A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
-
Updated
Mar 12, 2026 - C++
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Thin, unified, C++-flavored wrappers for the CUDA APIs
Training neural networks in TensorFlow 2.0 with 5x less memory
A Toolkit for Training, Tracking, Saving Models and Syncing Results
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
OpenCV & Spout C++ library. Shared GPU memory and processing at reach.
Rust embedded things running on the seL4 microkernel for the Raspberry Pi 3
A simple tool to find out GPU VRAM requirements for running LLMs
A tiny, useful command-line tool to show each user gpu usage, pid under each gpu, provide more details than nvidia-smi/gpustat
Accurate VRAM calculator for Local LLMs (Llama 4, DeepSeek V3, Qwen 2.5). Calculates GGUF quantization, GQA context overhead, and offloading limits
Demonstration of generating mini-batches in Tensorlfow from GPU memory.
A fork of Kubernetes with support of schedulable resource of NVIDIA GPU memory
Dynamic GPU Layer Swapping: Train large models on consumer GPUs with intelligent memory management
A CLI tool for estimating GPU VRAM requirements for Hugging Face models, supporting various data types, parallelization strategies, and fine-tuning scenarios like LoRA.
Research harness for evaluating query-time bounded elimination of reconstructable KV-cache witnesses in long-context transformer inference workloads. Related provisional filing: IN 202641062451.
Context saving when switching local LLM backends
Tiered GPU memory architecture for consumer AI inference. VRAM as execution cache, system RAM as passive staging layer.
📊 A command line monitoring tool (graph) for NVIDIA GPUs
Add a description, image, and links to the gpu-memory topic page so that developers can more easily learn about it.
To associate your repository with the gpu-memory topic, visit your repo's landing page and select "manage topics."