other quants available?

#1
by veryVANYA - opened

ty for uploading the experimental quants so quickly, curious if you've got the Q4, Q5, Q6, and Q8 as well

Owner

Yes, I'm just uploading them now

got them, ty. How did you load them in? I combined the splits but wasn't able to load the result in LM Studio

Will there be IQ quants? My potato server really needs them 😭. Anyway, thanks for all your work!

Owner

> got them, ty. How did you load them in? I combined the splits but wasn't able to load the result in LM Studio

I built the llama.cpp fork mentioned in the README and used it for inference. Combined weights won't work in LM Studio because its bundled llama.cpp version doesn't support them.
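For reference, the rough workflow is something like this. Treat it as a sketch: the fork URL is a placeholder (use the one linked in the README), the file names are illustrative, and recent llama.cpp builds may load split GGUFs directly when you pass the first shard, so merging isn't always needed:

```bash
# Build the fork (placeholder URL; use the fork linked in the README)
git clone https://github.com/<fork-owner>/llama.cpp
cd llama.cpp
make

# Option 1: pass the first shard directly; recent builds pick up the rest
./main -m c4ai-command-r-plus-Q4_K_M-00001-of-00002.gguf -p "Hello" -n 128

# Option 2: merge the shards into a single file first (gguf-split is built by make)
./gguf-split --merge c4ai-command-r-plus-Q4_K_M-00001-of-00002.gguf c4ai-command-r-plus-Q4_K_M.gguf
```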

> Will there be IQ quants? My potato server really needs them 😭. Anyway, thanks for all your work!

If you're looking for IQ quants, you can check this repo: https://huggingface.co./dranger003/c4ai-command-r-plus-iMat.GGUF

Shouldn't the Q2_K version be around 25 GB? Why is it 40 GB?

Owner

This is an effect of how Q2_K quantization works in llama.cpp: not all tensors are stored at Q2_K precision. In the image below, you can see what this looks like for the first block of the Q2_K model.
[image: per-tensor quantization types for the first block of the Q2_K model]
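If you want to verify this yourself, the gguf Python package ships a gguf-dump tool that lists every tensor with its quantization type (the file name below is illustrative; expect a mix of Q2_K with higher-precision types such as Q4_K or Q6_K on some tensors):

```bash
pip install gguf
# Dumps the metadata plus one line per tensor with its shape and quant type
gguf-dump c4ai-command-r-plus-Q2_K-00001-of-00002.gguf
```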

So 32 GB (2×16 GB) of VRAM is not enough without offloading some layers to RAM.

Unfortunately, yes. If you want to fit all layers on the GPUs, check the imatrix quants (such as IQ2_XXS) from the dranger003/c4ai-command-r-plus-iMat.GGUF repo. They're smaller than 32 GB.
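If you stay with Q2_K, you can still split the model between VRAM and RAM with llama.cpp's --n-gpu-layers flag. A minimal sketch, with a layer count that's just a starting point to tune for your GPUs:

```bash
# Offload as many layers as fit across the two GPUs; the rest run from
# system RAM (slower, but it works). Raise or lower -ngl to taste.
./main -m c4ai-command-r-plus-Q2_K.gguf -ngl 40 -p "Hello" -n 128
```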

Thanks for the direction.

pmysl changed discussion status to closed
