FP8 Quantized model now available! (only requires half the original model's VRAM)
#33
by mysticbeing
Runs on a single 80GB H100 or A100: https://huggingface.co/mysticbeing/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-DYNAMIC
Weight-and-activation quantization to FP8 is virtually lossless: text generated by FP8 models is nearly indistinguishable from that of their unquantized counterparts, and differences show up only on very close examination.
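For anyone wondering where "half the VRAM" and the single 80GB GPU figure come from, here is a rough back-of-the-envelope sketch. It assumes a nominal 70 billion parameters, 2 bytes per parameter for the original BF16 weights, and 1 byte per parameter for FP8. The function name and constants are illustrative only, and the estimate covers weights alone: the KV cache, activations, and any layers left unquantized need additional memory on top.

```python
# Rough weight-memory estimate. Assumptions (not from the model card):
# a nominal 70e9 parameters, BF16 = 2 bytes/param, FP8 = 1 byte/param.
# Weights only: KV cache and activations are not included.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 70e9        # nominal parameter count (assumption)
GPU_MEMORY_GB = 80       # one 80GB H100 or A100

bf16_gb = weight_memory_gb(NUM_PARAMS, 2)  # original 16-bit weights
fp8_gb = weight_memory_gb(NUM_PARAMS, 1)   # FP8-quantized weights

print(f"BF16 weights: {bf16_gb:.0f} GB, fits on one GPU: {bf16_gb < GPU_MEMORY_GB}")
print(f"FP8 weights:  {fp8_gb:.0f} GB, fits on one GPU: {fp8_gb < GPU_MEMORY_GB}")
```

Under these assumptions the FP8 weights leave only about 10 GB of headroom on an 80GB card, so long contexts or large batches may still be tight.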