quantize parameters
Hi, can you share your quantization parameters? I have a finetuned model that I'm trying to quantize to AWQ and EXL2, and I'd like the best-performing quantization config for my 32B model.
Thanks in advance
Also, could you share the minimum hardware requirements for quantizing this 32B model into AWQ, please?
TBH, this model was made a long time ago. As far as I can recall, my parameters were:
- precision: 4bit
- version: GEMM
- group size: 128
- zero point: true
- calibration dataset: Orion-zhen/meissa-lima
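
For reference, here is a minimal sketch of how those parameters map onto AutoAWQ's standard quantization flow. The model and output paths are placeholders, and passing the calibration dataset by name assumes AutoAWQ can load it directly; depending on the dataset's format you may need to preprocess it into a list of text samples instead:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/your-finetuned-32b"   # placeholder: your finetuned model
quant_path = "your-finetuned-32b-awq"       # placeholder: output directory

# Mirrors the parameters listed above
quant_config = {
    "w_bit": 4,            # precision: 4bit
    "version": "GEMM",
    "q_group_size": 128,   # group size
    "zero_point": True,
}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize with the calibration dataset (assumption: loadable by name;
# otherwise pass a list of calibration text strings via calib_data)
model.quantize(tokenizer, quant_config=quant_config,
               calib_data="Orion-zhen/meissa-lima")

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```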
Hardware requirements:
AutoAWQ first loads the whole model into system RAM and then quantizes it layer by layer in VRAM, which means all of the fp16/bf16 weights have to fit into RAM. You will probably need 64 GB+ of RAM, given that the original model occupies roughly 64 GB on its own and your system consumes some memory on top of that. As for VRAM, I didn't pay much attention to it; at least 16 GB, I'd guess.
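
The ~64 GB figure is just back-of-the-envelope arithmetic (2 bytes per parameter at fp16/bf16); a quick sanity check:

```python
# Rough RAM estimate for holding 32B fp16/bf16 weights in memory.
# Illustrative only: real usage adds activations, buffers, and OS overhead.
params = 32e9          # 32B parameters
bytes_per_param = 2    # fp16 / bf16
ram_gb = params * bytes_per_param / 1024**3
print(f"~{ram_gb:.0f} GB for the weights alone")  # ~60 GB (≈64 GB decimal)
```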