quantize parameters
Hi, can you share your quantization parameters? I have a finetuned model that I'm trying to quantize to AWQ and EXL2, and I'd like the best-performing quantization config for my 32B model.
Thanks in advance
Also, could you share the minimum hardware requirements for quantizing this 32B model into AWQ, please?
TBH, this model was made a long time ago. As far as I can recall, my parameters were:
- precision: 4bit
- version: GEMM
- group size: 128
- zero point: true
- calibration dataset: Orion-zhen/meissa-lima
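
For reference, here is a minimal sketch of how those parameters map onto AutoAWQ's standard quantization flow. The model and output paths are placeholders, and passing the calibration dataset by name assumes AutoAWQ can load it directly; depending on the dataset's format you may need to preprocess it into a list of text samples instead:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/your-finetuned-32b"   # placeholder: your finetuned model
quant_path = "your-finetuned-32b-awq"       # placeholder: output directory

# Mirrors the parameters listed above
quant_config = {
    "w_bit": 4,            # precision: 4bit
    "version": "GEMM",
    "q_group_size": 128,   # group size
    "zero_point": True,
}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize with the calibration dataset (assumption: loadable by name;
# otherwise pass a list of calibration text strings via calib_data)
model.quantize(tokenizer, quant_config=quant_config,
               calib_data="Orion-zhen/meissa-lima")

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```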
Hardware requirements:
AutoAWQ first loads the whole model into system RAM and then quantizes it layer by layer in VRAM, which means all of the fp16/bf16 weights have to fit into RAM. You will probably need 64 GB+ of RAM, given that the original model occupies roughly 64 GB on its own and your system consumes some memory on top of that. As for VRAM, I didn't pay much attention to it; at least 16 GB, I'd guess.
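
The ~64 GB figure is just back-of-the-envelope arithmetic (2 bytes per parameter at fp16/bf16); a quick sanity check:

```python
# Rough RAM estimate for holding 32B fp16/bf16 weights in memory.
# Illustrative only: real usage adds activations, buffers, and OS overhead.
params = 32e9          # 32B parameters
bytes_per_param = 2    # fp16 / bf16
ram_gb = params * bytes_per_param / 1024**3
print(f"~{ram_gb:.0f} GB for the weights alone")  # ~60 GB (≈64 GB decimal)
```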