Joseph717171/Llama-3.1-SuperNova-Lite-8.0B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF

Custom GGUF quants of arcee-ai’s Llama-3.1-SuperNova-Lite-8B, where the Output Tensors are quantized to Q8_0 while the Embeddings are kept at F32. Enjoy! 🧠🔥🚀

Update: For some reason, the model was initially smaller than LLama-3.1-8B-Instruct after quantizing. This has since been rectified: if you want the most intelligent and most capable quantized GGUF version of Llama-3.1-SuperNova-Lite-8.0B, use the OF32.EF32.IQuants. The original OQ8_0.EF32.IQuants will remain in the repo for those who want to use them. Cheers! 😁

Addendum: The OQ8_0.EF32.IQuants are right for the model's size; I was just being naive: I was comparing my OQ8_0.EF32 IQuants of Llama-3.1-SuperNova-Lite-8B to that of my OQ8_0.EF32 IQuants of Hermes-3-Llama-3.1-8B - thinking they were both the same size as my OQ8_0.EF32.IQuants of LLama-3.1-8B-Instruct; they're not: Hereme-3-Llama-3.1-8B is bigger. So, now we have both OQ8_0.EF32.IQuants and OF32.EF32.IQuants, and they're both great quant schemes. The only difference is being, of course, that OF32.EF32.IQuants have even more accuracy at the expense of more vRAM. So, there you have it. I'm a dumbass, but its okay because I learned something, and now we have even more quantizations to play with now. Cheers! 😂😏

Joseph717171
/

Llama-3.1-SuperNova-Lite-8.0B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF

Collection including Joseph717171/Llama-3.1-SuperNova-Lite-8.0B-OQ8_0-F32.EF32.IQ4_K-Q8_0-GGUF

GGUF LLaMa-3.1-8B-Instruct-Based OQ8_0.EF32 IQuants