why not release the 7b model?
I would like to have a smaller one too. I am using phi-3.5-mini-instruct with success, though I'd like some upgrade that can run on my 16 GB RAM and 4 GB GPU.
try the Unsloth 4-bit dynamic quant; it gets nearly the same performance as 16-bit and fits in under 15 GB
Thank you, but how would I use Unsloth? Which command do I run? I know how to use the llama-quantize command, but please help me with the specific one here.
you can get the already-quantized model here: https://huggingface.co./unsloth/phi-4-unsloth-bnb-4bit
it also links a Colab notebook you can use for inference and finetuning; I assume all you have to do is change the model it's loading and lower the batch size.
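if a concrete starting point helps, here's a minimal sketch of loading that checkpoint with Unsloth in Python and generating a reply. It assumes you have `unsloth` installed and a CUDA GPU; the `max_seq_length`, prompt, and token count are just placeholder values, not taken from the linked notebook.

```python
# Minimal sketch: load the pre-quantized 4-bit phi-4 checkpoint with Unsloth
# and run a single generation. Assumes `unsloth` is installed and a CUDA GPU
# is available; max_seq_length, the prompt, and max_new_tokens are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/phi-4-unsloth-bnb-4bit",  # already-quantized checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's faster inference mode

inputs = tokenizer("Explain dynamic 4-bit quantization in one sentence.",
                   return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

for finetuning you'd follow the linked notebook instead, since it sets up the LoRA adapters and training arguments for you; the sketch above only covers plain inference.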