Is it double quantized?
Sorry, but I'm a bit confused. The original model is around 104B parameters, and Cohere released a quantized version made with bitsandbytes.
Did you run yet another quantization on top of the already quantized weights? In other words, wouldn't doing that completely annihilate the model's performance? Or is it just a reformatting?
> Did you run yet another quantization on top of the already quantized weights?
No, all quants are made from the original fp16 model, including this one and Cohere's bitsandbytes quant.
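For anyone landing here later, a minimal sketch of what that means, assuming the transformers + bitsandbytes path (the model ID below is an assumption for illustration, not a statement about this repo's actual pipeline). The quantizer consumes the full-precision checkpoint directly, so there is no second quantization pass on top of an existing quant:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization settings applied at load time by bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear layers to 4-bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at runtime
)

# The quantizer reads the full-precision weights; it is never applied
# on top of an already-quantized set of weights.
model = AutoModelForCausalLM.from_pretrained(
    "CohereForAI/c4ai-command-r-plus",  # assumed fp16/bf16 source repo
    quantization_config=bnb_config,
    device_map="auto",
)
```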
> Or is it just a reformatting?
This is a new quant made from the original weights.
That's a relief, thanks a lot.
The automated parameter count that HF shows on the repo page seemed a bit off, which is what prompted me to ask. I'll close this discussion.