arxiv:2309.05516

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Published on Sep 11, 2023 · Submitted by akhaliq on Sep 12, 2023
Authors: Xin He et al.

Abstract

Large Language Models (LLMs) have demonstrated exceptional capabilities on language-related tasks, but their deployment is challenging due to their considerable memory and storage requirements. In response, weight-only quantization, particularly 3- and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, making the choice between rounding up and rounding down increasingly important. Previous studies have shown that tuning this rounding choice by adding perturbations can improve accuracy in some scenarios; our study is motivated by the precise and limited range of these perturbations, where only the threshold for flipping the rounding value matters. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, performs lightweight block-wise tuning using signed gradient descent, achieving strong results within 400 steps. SignRound outperforms the established rounding-to-nearest (RTN) baseline and competes impressively with recent methods, without introducing additional inference overhead. The source code will be publicly available at https://github.com/intel/neural-compressor soon.
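As a rough illustration of the idea described in the abstract, the sketch below tunes a bounded per-weight rounding perturbation for a single linear layer using signed gradient descent, matching the quantized output to the full-precision output. This is a minimal sketch, not the authors' implementation: the quantization scheme (4-bit asymmetric, per output channel), the function names (`qdq_weight`, `signround_layer`), the learning rate, and the clamping range are assumptions made for illustration; the official code lives at https://github.com/intel/neural-compressor.

```python
import torch
import torch.nn.functional as F

def ste_round(x):
    # Round in the forward pass; pass gradients straight through in the backward pass.
    return (x.round() - x).detach() + x

def qdq_weight(w, v, num_bits=4):
    # Quantize-dequantize w (out_features x in_features) per output channel,
    # shifting each element by a rounding perturbation v in [-0.5, 0.5].
    qmax = 2 ** num_bits - 1
    wmin = w.min(dim=1, keepdim=True).values
    wmax = w.max(dim=1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax
    zp = torch.round(-wmin / scale)
    q = torch.clamp(ste_round(w / scale + v) + zp, 0, qmax)
    return (q - zp) * scale

def signround_layer(layer, x, num_bits=4, steps=400, lr=2.5e-3):
    # Tune the rounding perturbation of one nn.Linear with signed gradient descent,
    # matching the quantized output to the full-precision output on calibration data x.
    w = layer.weight.detach()
    bias = layer.bias.detach() if layer.bias is not None else None
    y_ref = F.linear(x, w, bias)
    v = torch.zeros_like(w, requires_grad=True)      # per-weight rounding perturbation
    for _ in range(steps):
        wq = qdq_weight(w, v, num_bits)
        loss = F.mse_loss(F.linear(x, wq, bias), y_ref)
        loss.backward()
        with torch.no_grad():
            v -= lr * v.grad.sign()                  # signed gradient descent step
            v.clamp_(-0.5, 0.5)                      # only the rounding-flip threshold matters
            v.grad = None
    return qdq_weight(w, v.detach(), num_bits)

# Toy usage: quantize one layer's weights using a random calibration batch.
layer = torch.nn.Linear(64, 64)
x = torch.randn(8, 64)
layer.weight.data = signround_layer(layer, x)
```

In the paper's setting, the tuning is applied block-wise over real calibration data rather than to a single layer on random inputs; apart from the 400-step budget mentioned in the abstract, the hyperparameters above are placeholders.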


Models citing this paper: 42


Datasets citing this paper: 0


Spaces citing this paper: 0


Collections including this paper: 6