
quantisation

Here are 19 public repositories matching this topic...

LoRA fine-tune and serve NVFP4 models on a single DGX Spark (GB10, 128 GB UMA): text backbones via generic-family onboarding, and VLM vision towers (Pixtral end-to-end; Llama-4 training path validated). Fused Triton dequantisation; serving with runtime LoRA adapters or merged weights.

  • Updated Jul 5, 2026
  • Python

Distributed GPT-2 fine-tuning with PyTorch FSDP and BF16 mixed precision, INT8 post-training quantisation, a custom Triton quantisation kernel achieving 1.4x throughput over unfused PyTorch, and a full FP32 vs BF16 vs INT8 benchmark suite. 13 tests passing.

  • Updated May 23, 2026
  • Python
