SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL

Shen, Ke; Kejriwal, Mayank

arXiv:2409.10007 (cs)

[Submitted on 16 Sep 2024]

Abstract: In recent years, Text-to-SQL, the problem of automatically converting questions posed in natural language to formal SQL queries, has emerged as an important problem at the intersection of natural language processing and data management research. Large language models (LLMs) have delivered impressive performance when used off the shelf, but still fall significantly short of expected expert-level performance. Errors are especially probable when a nuanced understanding of database schemas, questions, and SQL clauses is needed to do proper Text-to-SQL conversion. We introduce SelECT-SQL, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought (CoT) prompting, self-correction, and ensemble methods to yield a new state-of-the-art result on challenging Text-to-SQL benchmarks. Specifically, when configured using GPT-3.5-Turbo as the base LLM, SelECT-SQL achieves 84.2% execution accuracy on the Spider leaderboard's development set, exceeding both the best results of other baseline GPT-3.5-Turbo-based solutions (81.1%) and the peak performance (83.5%) of the GPT-4 result reported on the leaderboard.
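
For readers unfamiliar with the metric quoted in the abstract, execution accuracy is generally understood as the share of predicted queries whose execution result matches that of the reference query. The following is a minimal sketch of that idea, not the paper's or the Spider leaderboard's evaluation code: the helper names (`run_query`, `execution_accuracy`), the toy `singer` table, and the order-insensitive row comparison are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the query's rows as a multiset, or None if execution fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result rows match.

    Assumption: rows are compared as multisets, ignoring order. A real
    benchmark evaluator may treat ordering and other details differently.
    """
    hits = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        if pred_rows is not None and pred_rows == gold_rows:
            hits += 1
    return hits / len(pairs)


# Hypothetical toy database, used only to exercise the functions above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 27);
    """
)

pairs = [
    # Different SQL text, same result rows: counts as correct.
    ("SELECT name FROM singer WHERE age > 28",
     "SELECT name FROM singer WHERE age >= 30"),
    # Executes, but returns different rows: counts as wrong.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age < 30"),
    # Fails to execute (misspelled column): counts as wrong.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.3f}")
```

Comparing results rather than query text is what lets two differently written queries both count as correct, which is why the first pair above is scored as a match.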

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2409.10007 [cs.CL]
	(or arXiv:2409.10007v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2409.10007

Submission history

From: Ke Shen
[v1] Mon, 16 Sep 2024 05:40:18 UTC (4,712 KB)
