GPUStack

gpustack Public

A GPU cluster manager for high-performance AI model serving (vLLM, SGLang) and on-demand SSH-accessible GPU instances.

Python 5.4k 591

runner Public

Collection of Dockerfiles to build images for various inference services across different accelerated backends.

Dockerfile 15 15

runtime Public

Provides a unified interface to detect GPU resources and manage GPU workloads.

Python 15 21

gguf-parser-go Public

Review and check GGUF files, and estimate memory usage and maximum tokens per second.

vox-box Public

A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.

Python 214 36
