Small Language Models: Survey, Measurements, and Insights

Lu, Zhenyan; Li, Xiang; Cai, Dongqi; Yi, Rongjie; Liu, Fangming; Zhang, Xiwen; Lane, Nicholas D.; Xu, Mengwei

arXiv:2409.15790 (cs)

[Submitted on 24 Sep 2024 (v1), last revised 26 Feb 2025 (this version, v3)]

Abstract: Small language models (SLMs), despite their widespread adoption in modern smart devices, have received significantly less academic attention compared to their large language model (LLM) counterparts, which are predominantly deployed in data centers and cloud environments. While researchers continue to improve the capabilities of LLMs in the pursuit of artificial general intelligence, SLM research aims to make machine intelligence more accessible, affordable, and efficient for everyday tasks. Focusing on transformer-based, decoder-only language models with 100M-5B parameters, we survey 70 state-of-the-art open-source SLMs, analyzing their technical innovations across three axes: architectures, training datasets, and training algorithms. In addition, we evaluate their capabilities in various domains, including commonsense reasoning, mathematics, in-context learning, and long context. To gain further insight into their on-device runtime costs, we benchmark their inference latency and memory footprints. Through in-depth analysis of our benchmarking data, we offer valuable insights to advance research in this field.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2409.15790 [cs.CL]
	(or arXiv:2409.15790v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2409.15790

Submission history

From: Zhenyan Lu
[v1] Tue, 24 Sep 2024 06:36:56 UTC (11,763 KB)
[v2] Tue, 25 Feb 2025 13:48:03 UTC (16,141 KB)
[v3] Wed, 26 Feb 2025 06:34:55 UTC (16,141 KB)
