Black-box Uncertainty Quantification Method for LLM-as-a-Judge

Wagner, Nico; Desmond, Michael; Nair, Rahul; Ashktorab, Zahra; Daly, Elizabeth M.; Pan, Qian; Cooper, Martín Santillán; Johnson, James M.; Geyer, Werner

arXiv:2410.11594 (cs)

[Submitted on 15 Oct 2024]

Abstract: LLM-as-a-Judge is a widely used method for evaluating the performance of Large Language Models (LLMs) across various tasks. We address the challenge of quantifying the uncertainty of LLM-as-a-Judge evaluations. While uncertainty quantification has been well-studied in other domains, applying it effectively to LLMs poses unique challenges due to their complex decision-making capabilities and computational demands. In this paper, we introduce a novel method for quantifying uncertainty designed to enhance the trustworthiness of LLM-as-a-Judge evaluations. The method quantifies uncertainty by analyzing the relationships between generated assessments and possible ratings. By cross-evaluating these relationships and constructing a confusion matrix based on token probabilities, the method derives labels of high or low uncertainty. We evaluate our method across multiple benchmarks, demonstrating a strong correlation between the accuracy of LLM evaluations and the derived uncertainty scores. Our findings suggest that this method can significantly improve the reliability and consistency of LLM-as-a-Judge evaluations.
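
The abstract does not spell out how the confusion matrix is turned into a label, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a square matrix in which entry `[i][j]` is the token probability the judge assigns to rating `j` after reading an assessment generated in favor of rating `i`. The function name `uncertainty_label`, the `threshold` parameter, and the decision rule (one rating must win every row with a sufficiently high mean probability) are all hypothetical.

```python
from typing import List, Optional, Tuple


def uncertainty_label(
    confusion: List[List[float]], threshold: float = 0.5
) -> Tuple[str, Optional[int]]:
    """Label a judgement as "low" or "high" uncertainty.

    confusion[i][j] is ASSUMED to hold the token probability of rating j
    when the judge is shown an assessment written in favor of rating i.
    The decision rule below is a hypothetical one chosen for illustration.
    """
    n = len(confusion)
    if n == 0 or any(len(row) != n for row in confusion):
        raise ValueError("confusion must be a non-empty square matrix")

    # Rating with the highest probability in each row.
    winners = {row.index(max(row)) for row in confusion}

    # If different assessments lead to different ratings, the judge is
    # easily swayed, which this sketch treats as high uncertainty.
    if len(winners) != 1:
        return "high", None

    winner = next(iter(winners))
    mean_prob = sum(row[winner] for row in confusion) / n

    # The same rating wins everywhere; require it to win convincingly.
    if mean_prob >= threshold:
        return "low", winner
    return "high", None
```

With this rule, a matrix such as `[[0.9, 0.1], [0.8, 0.2]]` would be labeled low uncertainty for rating 0, because that rating wins regardless of which assessment the judge is shown, while `[[0.9, 0.1], [0.3, 0.7]]` would be labeled high uncertainty, because the preferred rating flips with the assessment.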

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2410.11594 [cs.LG] (or arXiv:2410.11594v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2410.11594

Submission history

From: Nico Wagner
[v1] Tue, 15 Oct 2024 13:29:22 UTC (1,258 KB)
