Small Language Models Need Strong Verifiers to Self-Correct Reasoning

Zhang, Yunxiang; Khalifa, Muhammad; Logeswaran, Lajanugen; Kim, Jaekyeom; Lee, Moontae; Lee, Honglak; Wang, Lu

arXiv:2404.17140 (cs)

[Submitted on 26 Apr 2024 (v1), last revised 6 Jun 2024 (this version, v2)]

Abstract: Self-correction has emerged as a promising way to boost the reasoning performance of large language models (LLMs): the LLM refines its solutions using self-generated critiques that pinpoint errors. This work explores whether small (<= 13B) language models (LMs) can self-correct on reasoning tasks with minimal input from stronger LMs. We propose a novel pipeline that prompts smaller LMs to collect self-correction data that supports the training of self-refinement abilities. First, we leverage correct solutions to guide the model in critiquing its incorrect responses. Second, the generated critiques, after filtering, are used for supervised fine-tuning of the self-correcting reasoner through solution refinement. Our experimental results show improved self-correction abilities of two models on five datasets spanning math and commonsense reasoning, with notable performance gains when paired with a strong GPT-4-based verifier, though limitations are identified when using a weak self-verifier for determining when to correct.
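
The two stages described in the abstract, plus the verifier-gated correction step, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name here (`Example`, `TrainingRecord`, `critique_fn`, `keep_fn`, `solve_fn`, `verify_fn`, `refine_fn`) is hypothetical, the filtering criterion is left as a caller-supplied placeholder because the abstract does not specify it, and using the known correct solution as the refinement target is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    """One question with a known-wrong and a known-correct solution (hypothetical schema)."""
    question: str
    incorrect_solution: str
    correct_solution: str


@dataclass
class TrainingRecord:
    """One supervised fine-tuning record for the self-correcting reasoner (hypothetical schema)."""
    question: str
    incorrect_solution: str
    critique: str
    refined_solution: str


def collect_self_correction_data(
    examples: Iterable[Example],
    critique_fn: Callable[[str, str, str], str],
    keep_fn: Callable[[Example, str], bool],
) -> List[TrainingRecord]:
    """Stage 1: critique incorrect solutions with the correct one as a hint, then filter."""
    records = []
    for ex in examples:
        # The correct solution guides the small LM in critiquing its own mistake.
        critique = critique_fn(ex.question, ex.incorrect_solution, ex.correct_solution)
        # Assumption: the filter is an arbitrary predicate supplied by the caller.
        if keep_fn(ex, critique):
            # Assumption: the correct solution serves as the refinement target.
            records.append(
                TrainingRecord(ex.question, ex.incorrect_solution, critique, ex.correct_solution)
            )
    return records


def answer_with_self_correction(
    question: str,
    solve_fn: Callable[[str], str],
    verify_fn: Callable[[str, str], bool],
    refine_fn: Callable[[str, str], str],
) -> str:
    """Inference: the verifier decides when to correct; the refiner does the correcting."""
    solution = solve_fn(question)
    if verify_fn(question, solution):
        return solution  # verifier accepts the first attempt
    return refine_fn(question, solution)  # verifier rejects, so refine
```

In this sketch `verify_fn` stands in for either a strong external verifier or the model's own self-verifier; the abstract reports that the choice between the two is what determines how much self-correction helps.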

Comments:	ACL Findings 2024 - Camera Ready
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2404.17140 [cs.CL]
	(or arXiv:2404.17140v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.17140

Submission history

From: Yunxiang Zhang
[v1] Fri, 26 Apr 2024 03:41:28 UTC (440 KB)
[v2] Thu, 6 Jun 2024 03:59:24 UTC (1,015 KB)
