Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning

Ma, Xiao; Mishra, Swaroop; Beirami, Ahmad; Beutel, Alex; Chen, Jilin

arXiv:2306.14308 (cs)

[Submitted on 25 Jun 2023]

Abstract: Language models still struggle with moral reasoning, despite their impressive performance on many other tasks. In particular, the Moral Scenarios task in MMLU (Massive Multitask Language Understanding) is among the worst-performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experimental results show that our framework elicits counterfactual questions and answers from the model, which in turn helps improve accuracy on the Moral Scenarios task by 9-16% compared to other zero-shot baselines. Interestingly, unlike on math reasoning tasks, zero-shot Chain-of-Thought (CoT) reasoning does not work out of the box, and even reduces accuracy by around 4% compared to direct zero-shot prompting. We further observe that with minimal human supervision in the form of 5 few-shot examples, accuracy on the task can be improved to as much as 80%.
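The abstract says the framework elicits counterfactual questions and answers from the model before it gives a judgment. A minimal sketch of what such a multi-stage prompt chain could look like follows. It is an illustration under stated assumptions, not the authors' implementation: the three-stage split, the prompt wording, the function name `thought_experiment`, and the `generate` callable (a stand-in for any text-in, text-out language model wrapper) are all hypothetical and not taken from the paper.

```python
from typing import Callable, List


def thought_experiment(scenario: str, generate: Callable[[str], str]) -> List[str]:
    """Run a three-stage counterfactual prompt chain.

    Returns the model output of each stage: [questions, answers, verdict].
    The stage layout and prompt wording are assumptions for illustration.
    """
    # Stage 1: ask the model to pose counterfactual questions about the scenario.
    questions = generate(
        f"Scenario: {scenario}\n"
        "Let's do a thought experiment. "
        "Pose counterfactual questions about this scenario."
    )
    # Stage 2: feed the questions back and ask the model to answer them.
    answers = generate(
        f"Scenario: {scenario}\n"
        f"Questions: {questions}\n"
        "Answer each question."
    )
    # Stage 3: ask for a final judgment conditioned on the questions and answers.
    verdict = generate(
        f"Scenario: {scenario}\n"
        f"Questions: {questions}\n"
        f"Answers: {answers}\n"
        "Given the above, is the action morally wrong? Answer Yes or No."
    )
    return [questions, answers, verdict]


# Usage with a stub model (hypothetical) that records every prompt it receives.
calls: List[str] = []


def stub_model(prompt: str) -> str:
    calls.append(prompt)
    return f"output-{len(calls)}"


outputs = thought_experiment("I took the last seat on the bus.", stub_model)
```

Each stage's output is threaded into the next prompt, so the final judgment is conditioned on the model's own counterfactual reasoning. Swapping `stub_model` for a real model client is the only change needed to try the chain in practice.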

Comments:	8 pages, ICML Neural Conversational AI workshop, thought experiments, moral reasoning
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.14308 [cs.CL]
	(or arXiv:2306.14308v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.14308

Submission history

From: Xiao Ma
[v1] Sun, 25 Jun 2023 18:40:43 UTC (40 KB)
