other quants available?
ty for uploading the experimental quants so quickly! Curious if you've got the 4, 5, 6, and 8-bit ones as well
Yes, I'm just uploading them now
got them, ty. How did you load them in? I combined the split files but wasn't able to load them in LM Studio
Will there be IQ quants? My potato server really needs them 😭. Anyway, thanks for all your work!
> got them ty, how did you load them in? i combined but not able to load on lm studio
I built the llama.cpp fork mentioned in the README and used it for inference. The combined weights won't work in LM Studio because its bundled llama.cpp version doesn't support them.
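If it helps with debugging the combine step, here's a minimal, hypothetical sanity check (not something from the repo) that verifies a combined file at least starts with the 4-byte GGUF magic. Passing it doesn't guarantee the file will load, but failing it means the merge definitely went wrong:

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the 4-byte GGUF magic.

    This checks only the header magic, not the metadata or tensor
    data, so a True result does not guarantee the file is loadable.
    """
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

Note that newer sharded GGUF files should be merged with the `gguf-split` tool that ships with llama.cpp (using its `--merge` mode) rather than raw byte concatenation; each shard carries its own GGUF header and split metadata, so simply concatenating them does not produce a loadable file.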
> Will here be IQ quants? My potato server really need them😭. Whatever, thanks for all your work!
If you're looking for IQ quants you can check this repo: https://huggingface.co./dranger003/c4ai-command-r-plus-iMat.GGUF
Shouldn't the 2_K version be around 25GB? Why is it 40GB?
So 32 GB (16 GB × 2) of VRAM isn't enough without offloading some layers to RAM.
Unfortunately, yes. If you want to keep all layers on the GPUs, check the imatrix quants (such as IQ2_XXS) from the dranger003/c4ai-command-r-plus-iMat.GGUF repo; they're smaller than 32 GB.
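For a rough sense of why the Q2_K file is ~40 GB rather than ~25 GB: Command R+ has roughly 104B parameters, and k-quants keep some tensors (e.g. embeddings and output) at higher precision, so the effective bits per weight ends up well above the nominal 2. A back-of-the-envelope sketch (the 104B figure and the bpw values are approximations, not measured from these files):

```python
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    # file size ≈ parameter count × average bits per weight / 8 bits per byte
    return n_params * bits_per_weight / 8 / 1e9

# At a nominal 2.0 bpw, ~104B parameters would be ~26 GB...
nominal = approx_size_gb(104e9, 2.0)
# ...but Q2_K's effective average is closer to ~3 bpw once the
# higher-precision tensors are counted, which lands near 40 GB.
effective = approx_size_gb(104e9, 3.0)
```

The same arithmetic explains why the sub-32 GB IQ quants (IQ2_XXS and friends) fit: they push the effective average down toward ~2 bits per weight.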
Thanks for the direction.