Popular repositories
- llama.cpp_turboquant (Shell)
  LLM inference with 7x KV cache compression. Combines llama.cpp (production inference engine) with TurboQuant (KV quantization). Run 131K token context on 16GB VRAM. OpenAI-compatible API server. Supports 100+ model architectures.
- quant.cpp (C), forked from quantumaikr/quant.cpp
  LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.
- vllm (Python), forked from vllm-project/vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs.
- unzed (Rust), forked from zed-industries/zed
  Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
Repositories

All four repositories live under the zetta-app organization: zetta-app/unzed (fork of zed-industries/zed), zetta-app/quant.cpp (fork of quantumaikr/quant.cpp), zetta-app/vllm (fork of vllm-project/vllm), and zetta-app/llama.cpp_turboquant.
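Since llama.cpp_turboquant advertises an OpenAI-compatible API server, a standard chat-completions request should work against it. Below is a minimal stdlib-only sketch; the endpoint path follows the OpenAI convention, while the server address and model name are assumptions, not taken from the repositories:

```python
import json
import urllib.request


def build_chat_request(prompt, model="local-model", max_tokens=128):
    """Build a standard OpenAI-style chat-completions payload.

    The model name is a placeholder: an OpenAI-compatible server
    usually serves whichever model it was started with.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_chat_request(base_url, payload, timeout=60):
    """POST the payload to the conventional /v1/chat/completions path."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Example usage (assumes a server listening locally; the address is a guess):
#   payload = build_chat_request("Summarize KV cache quantization in one sentence.")
#   reply = send_chat_request("http://localhost:8080", payload)
#   print(reply["choices"][0]["message"]["content"])
```

Because the request and response shapes match the OpenAI API, existing OpenAI client libraries should also work by pointing their base URL at the local server.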
People

This organization has no public members.