Check out AutoRound, a SOTA LLM quantization algorithm for 2-4 bits that adds no inference overhead to any model.
paper: https://arxiv.org/abs/2309.05516
github: https://github.com/intel/auto-round
low-bit leaderboard: https://huggingface.co./spaces/Intel/low-bit-leaderboard