An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications

Alhanahnah, Mohannad; Hasan, Md Rashedul; Xu, Lisong; Bagheri, Hamid

arXiv:2404.11050 (cs)

[Submitted on 17 Apr 2024 (v1), last revised 12 Jun 2025 (this version, v2)]

Abstract: Automatic Program Repair (APR) has garnered significant attention as a practical research domain focused on automatically fixing bugs in programs. While existing APR techniques primarily target imperative programming languages like C and Java, there is a growing need for effective solutions applicable to declarative software specification languages. This paper systematically investigates the capacity of Large Language Models (LLMs) to repair declarative specifications in Alloy, a declarative formal language used for software specification. We designed 12 different repair settings, encompassing single-agent and dual-agent paradigms and utilizing various LLMs. These configurations also incorporate different levels of feedback, including an auto-prompting mechanism in which LLMs generate the prompts autonomously. Our study reveals that the dual-agent setup with auto-prompting outperforms the other settings, albeit with a marginal increase in the number of iterations and token usage. This dual-agent setup demonstrated superior effectiveness compared to state-of-the-art Alloy APR techniques when evaluated on a comprehensive set of benchmarks. This work is the first to empirically evaluate the ability of LLMs to repair declarative specifications while taking into account recent LLM concepts such as LLM-based agents, feedback, auto-prompting, and tools, thus paving the way for future agent-based techniques in software engineering.

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2404.11050 [cs.SE]
	(or arXiv:2404.11050v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2404.11050

Submission history

From: Mohannad Alhanahnah
[v1] Wed, 17 Apr 2024 03:46:38 UTC (439 KB)
[v2] Thu, 12 Jun 2025 14:28:03 UTC (694 KB)
