Please provide a 4-bit version. Thank you.
I would also like to have Q6_K or Q5_K_M as a good balance between size and performance (Q4 can be a bit rough around the edges at this model size).
Actually, you can do it yourself with llama.cpp (although requantizing an already quantized model is not recommended). Just build llama.cpp from source and then run:
(llama.cpp)$ ./quantize --allow-requantize Magicoder-S-DS-6.7B_q8_0.gguf <output_requantized_model>.gguf q4_k_m
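If you want several quant types from the same q8_0 file, a small helper can build the command line for each one. This is only a rough sketch: the function name `requantize_command` and the output naming scheme are made up for illustration, and it assumes the `quantize` binary sits in the current llama.cpp build directory. It prints the commands rather than running them.

```python
import shlex
from pathlib import Path

# Quant types mentioned in this thread; adjust to whatever you need.
TARGET_TYPES = ["q4_k_m", "q5_k_m", "q6_k"]


def requantize_command(source, qtype, quantize_bin="./quantize"):
    """Build the argument list for requantizing `source` to `qtype`.

    Hypothetical helper, not part of llama.cpp.
    """
    src = Path(source)
    # Assumed naming scheme: swap a trailing "_q8_0" for the new type.
    stem = src.stem.removesuffix("_q8_0")
    output = src.with_name(f"{stem}_{qtype}.gguf")
    return [quantize_bin, "--allow-requantize", str(src), str(output), qtype]


if __name__ == "__main__":
    # Print one command per target type; copy and run them from the llama.cpp directory.
    for qtype in TARGET_TYPES:
        cmd = requantize_command("Magicoder-S-DS-6.7B_q8_0.gguf", qtype)
        print(shlex.join(cmd))
```

Same caveat as above applies: going from q8_0 down to a smaller type stacks quantization error, so quantizing from the original f16 weights is preferable if you have them.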
Any good 4-bit quants yet?
I just uploaded a q3, q4 and q5 here https://huggingface.co./matthoffner/Magicoder-S-DS-6.7B-GGUF/tree/main
Hey, thanks. I ended up merging the 417884e regex_gpt2_preprocess PR and it works. I wonder whether this will produce any better results or not.
It seems to work OK so far. I set up a Space with it here: https://huggingface.co./spaces/matthoffner/ggml-coding-llm