Large Language Models Are Not Robust Multiple Choice Selectors

Zheng, Chujie; Zhou, Hao; Meng, Fandong; Zhou, Jie; Huang, Minlie

arXiv:2309.03882 (cs)

[Submitted on 7 Sep 2023 (v1), last revised 22 Feb 2024 (this version, v4)]

Abstract: Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs). This work shows that modern LLMs are vulnerable to option position changes in MCQs due to their inherent "selection bias": they prefer to select specific option IDs as answers (like "Option A"). Through extensive empirical analyses with 20 LLMs on three benchmarks, we pinpoint that this behavioral bias primarily stems from LLMs' token bias, where the model a priori assigns more probability mass to specific option ID tokens (e.g., A/B/C/D) when predicting answers from the option IDs. To mitigate selection bias, we propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution. PriDe first estimates the prior by permuting option contents on a small number of test samples, and then applies the estimated prior to debias the remaining samples. We demonstrate that it achieves interpretable and transferable debiasing with high computational efficiency. We hope this work can draw broader research attention to the bias and robustness of modern LLMs.
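
The two-step procedure described in the abstract (estimate a prior over option IDs from permuted samples, then use it to debias the remaining samples) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the observed distribution factors into an ID prior times a content preference, that rotations of the options are used as the permutations, that the prior is estimated by averaging log-probabilities across those rotations, and that debiasing divides by the prior and renormalizes. All function names, the toy model, and the numbers are hypothetical.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def cyclic_permutations(items):
    """All rotations of a list: n orderings rather than n! full permutations."""
    return [items[k:] + items[:k] for k in range(len(items))]

def estimate_prior(estimation_samples):
    """Estimate a prior over option IDs.

    estimation_samples: one entry per question; each entry is a list of
    observed distributions over option IDs, one per permutation of contents.
    """
    n_ids = len(estimation_samples[0][0])
    prior = [0.0] * n_ids
    for observed_per_perm in estimation_samples:
        # Assumption: averaging log-probabilities per ID across permutations
        # cancels the content preference and leaves the ID preference.
        mean_log = [
            sum(math.log(obs[i]) for obs in observed_per_perm) / len(observed_per_perm)
            for i in range(n_ids)
        ]
        sample_prior = normalize([math.exp(v) for v in mean_log])
        for i in range(n_ids):
            prior[i] += sample_prior[i] / len(estimation_samples)
    return prior

def debias(observed, prior):
    """Assumption: divide out the ID prior, then renormalize."""
    return normalize([p / q for p, q in zip(observed, prior)])

def toy_model(id_bias, content_scores):
    """Hypothetical stand-in for an LLM: ID bias times content preference."""
    return normalize([b * c for b, c in zip(id_bias, content_scores)])

# Toy data: a model that favors ID "A" regardless of content.
id_bias = [0.4, 0.3, 0.2, 0.1]
est_question = [0.1, 0.2, 0.6, 0.1]
observed_per_perm = [toy_model(id_bias, c) for c in cyclic_permutations(est_question)]
prior = estimate_prior([observed_per_perm])

# Held-out question whose content preference points at the last option.
held_out = [0.2, 0.2, 0.25, 0.35]
observed = toy_model(id_bias, held_out)
debiased = debias(observed, prior)
print("observed argmax:", "ABCD"[observed.index(max(observed))])
print("debiased argmax:", "ABCD"[debiased.index(max(debiased))])
```

In this toy setup the raw prediction follows the ID bias while the debiased one follows the content preference. A real model would only approximately satisfy the factorization assumed here, so the estimated prior would be approximate as well.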

Comments:	ICLR 2024 Spotlight
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2309.03882 [cs.CL]
	(or arXiv:2309.03882v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.03882

Submission history

From: Chujie Zheng
[v1] Thu, 7 Sep 2023 17:44:56 UTC (2,433 KB)
[v2] Fri, 8 Sep 2023 15:54:56 UTC (2,433 KB)
[v3] Fri, 6 Oct 2023 08:27:26 UTC (3,736 KB)
[v4] Thu, 22 Feb 2024 01:40:35 UTC (3,738 KB)
