Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients

Muhamed, Aashiq; Li, Oscar; Woodruff, David; Diab, Mona; Smith, Virginia

arXiv:2406.17660 (cs)

[Submitted on 25 Jun 2024]

Abstract: Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. Existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, but they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Structured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces the memory used by optimizer states but also minimizes the gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves performance competitive with full-rank training and existing projection-based methods. Notably, Grass enables half-precision pretraining of a 13B parameter LLaMA model on a single 40GB A100 GPU, a feat infeasible for previous methods, and yields up to a $2\times$ throughput improvement on an 8-GPU system. Code can be found at this https URL.
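
To make the idea of a sparse projection concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the projection simply selects a subset of gradient rows (the abstract does not specify the selection rule), that rows are sampled uniformly, and that plain momentum stands in for the stateful optimizer. All function names (`sample_rows`, `project`, `project_back`, `momentum_step`) are hypothetical.

```python
import random


def sample_rows(m, r, seed=0):
    """Pick r distinct row indices out of m (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(m), r))


def project(grad, rows):
    """Compress an m x n gradient to r x n by keeping only the selected rows."""
    return [list(grad[i]) for i in rows]


def momentum_step(state, compressed_grad, beta=0.9):
    """Update momentum held in the compressed r x n space (illustrative optimizer)."""
    return [
        [beta * s + (1.0 - beta) * g for s, g in zip(s_row, g_row)]
        for s_row, g_row in zip(state, compressed_grad)
    ]


def project_back(update, rows, m):
    """Scatter an r x n update into an m x n matrix that is zero outside the selected rows."""
    n = len(update[0])
    full = [[0.0] * n for _ in range(m)]
    for k, i in enumerate(rows):
        full[i] = list(update[k])
    return full


# Toy usage: a 6 x 4 gradient compressed to 2 rows.
m, n, r = 6, 4, 2
grad = [[float(i * n + j) for j in range(n)] for i in range(m)]
rows = sample_rows(m, r)
state = [[0.0] * n for _ in range(r)]              # optimizer state is r x n, not m x n
state = momentum_step(state, project(grad, rows))
update = project_back(state, rows, m)              # only the selected rows can be nonzero
```

In this sketch the optimizer state has r rows instead of m, and the resulting update touches only the selected rows, which is the sense in which the update is structured and sparse. A dense projection would instead need an explicit projection matrix and a matrix multiplication in both directions.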

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2406.17660 [cs.LG]
	(or arXiv:2406.17660v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.17660

Submission history

From: Aashiq Muhamed
[v1] Tue, 25 Jun 2024 15:50:32 UTC (3,274 KB)
