I'm now working on finetuning coding models. If you are GPU-hungry like me, you will find quantized models very helpful. But quantized models for finetuning and for inference use different, incompatible formats, so I made two collections here.
Among quantized models, the inference format is far more popular on HF than the finetuning format. I use https://huggingface.co./QuantFactory to generate inference models (GGUF), and there are a few other options as well.
But there hasn't been a comparable service for finetuning models. DIY isn't too hard, though. I made a few myself, and you can find the script in the model cards. If the original model is small enough, you can even quantize it on a free T4 (available via Google Colab).
If you know a (small) coding model worth quantizing, please let me know; I'd love to add it to the collections.