Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Cheng, Wenhua; Zhang, Weiwei; Shen, Haihao; Cai, Yiyang; He, Xin; Lv, Kaokao

arXiv:2309.05516v2 (cs)

[Submitted on 11 Sep 2023 (v1), revised 28 Sep 2023 (this version, v2), latest version 8 Oct 2024 (v5)]

Abstract: Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response to this issue, weight-only quantization, particularly 3- and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, which makes the choice between rounding up and rounding down more important. While previous studies have demonstrated that fine-tuning up and down rounding with the addition of perturbations can enhance accuracy in some scenarios, our study is driven by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, involves lightweight block-wise tuning using signed gradient descent, enabling us to achieve outstanding results within 400 steps. SignRound competes impressively against recent methods without introducing additional inference overhead. The source code will be publicly available at this https URL soon.
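
The idea in the abstract can be illustrated with a small NumPy sketch: a rounding perturbation bounded to [-0.5, 0.5] is added before rounding and tuned with signed gradient descent to reduce the output error of a single linear layer. This is not the authors' implementation. The function names, the per-column symmetric scale, the straight-through gradient approximation, the learning-rate schedule, and the single-layer setup (rather than block-wise tuning) are all assumptions made for illustration.

```python
import numpy as np


def quantize(w, scale, v, bits=4):
    """Symmetric fake-quantization with a rounding perturbation v (assumed form)."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    q = np.clip(np.round(w / scale + v), qmin, qmax)
    return q * scale


def sign_round_sketch(w, x, bits=4, steps=400, lr=0.0025):
    """Tune v in [-0.5, 0.5] with signed gradient descent (hypothetical helper).

    w: (in_features, out_features) weights, x: (n_samples, in_features) inputs.
    Returns the best perturbation found, the round-to-nearest loss, and the tuned loss.
    """
    qmax = 2 ** (bits - 1) - 1
    # Assumption: one symmetric scale per output column.
    scale = np.abs(w).max(axis=0, keepdims=True) / qmax
    ref = x @ w  # full-precision layer output

    def loss_fn(v):
        return float(np.sum((x @ quantize(w, scale, v, bits) - ref) ** 2))

    v = np.zeros_like(w)  # v = 0 is plain round-to-nearest
    rtn_loss = loss_fn(v)
    best_v, best_loss = v.copy(), rtn_loss

    for step in range(steps):
        err = x @ quantize(w, scale, v, bits) - ref
        # Straight-through estimator: treat round() and clip() as identity
        # in the backward pass, so dL/dv = scale * dL/dw_hat.
        grad_v = 2.0 * (x.T @ err) * scale
        step_lr = lr * (1.0 - step / steps)  # assumed linear decay
        # Signed update: only the sign of the gradient is used.
        v = np.clip(v - step_lr * np.sign(grad_v), -0.5, 0.5)
        loss = loss_fn(v)
        if loss < best_loss:  # keep the best perturbation seen so far
            best_v, best_loss = v.copy(), loss

    return best_v, rtn_loss, best_loss


rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8))
x = rng.normal(size=(64, 16))
v, rtn_loss, tuned_loss = sign_round_sketch(w, x)
print(f"round-to-nearest loss: {rtn_loss:.4f}, tuned loss: {tuned_loss:.4f}")
```

Because the sketch keeps the best perturbation seen and starts from zero (plain round-to-nearest), the tuned loss can never exceed the round-to-nearest loss. How much it improves depends on the data and on the assumed hyperparameters.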

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2309.05516 [cs.CL]
	(or arXiv:2309.05516v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.05516

Submission history

From: Wenhua Cheng
[v1] Mon, 11 Sep 2023 14:58:23 UTC (6,682 KB)
[v2] Thu, 28 Sep 2023 09:05:57 UTC (8,941 KB)
[v3] Fri, 17 May 2024 09:12:19 UTC (4,903 KB)
[v4] Thu, 23 May 2024 10:43:09 UTC (4,903 KB)
[v5] Tue, 8 Oct 2024 02:02:35 UTC (10,154 KB)
