Is it double quantized?
Sorry, but I'm a bit confused. The original model is around 104B parameters, and Cohere released a quantized version made with bitsandbytes.
Did you run yet another quantization on top of the already quantized weights? In other words, wouldn't doing that completely annihilate the model's performance? Or is it just a reformatting?
> Did you run yet another quantization on top of the already quantized weights?
No, all quants are made from the original fp16 model, including this one and Cohere's bitsandbytes quant.
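For anyone landing here later, a minimal sketch of what that means, assuming the transformers + bitsandbytes path (the model ID below is an assumption for illustration, not a statement about this repo's actual pipeline). The quantizer consumes the full-precision checkpoint directly, so there is no second quantization pass on top of an existing quant:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization settings applied at load time by bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear layers to 4-bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at runtime
)

# The quantizer reads the full-precision weights; it is never applied
# on top of an already-quantized set of weights.
model = AutoModelForCausalLM.from_pretrained(
    "CohereForAI/c4ai-command-r-plus",  # assumed fp16/bf16 source repo
    quantization_config=bnb_config,
    device_map="auto",
)
```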
> Or is it just a reformatting?
This is a new quant made from the original weights.
That's a relief, thanks a lot.
The automated parameter count that HF shows on the repo page seemed a bit off, which is what prompted me to ask. I'll close this discussion.