
Showing 1–28 of 28 results for author: Kasirzadeh, A

Searching in archive cs.
  1. arXiv:2504.21848  [pdf]

    cs.CY cs.AI eess.SY

    Characterizing AI Agents for Alignment and Governance

    Authors: Atoosa Kasirzadeh, Iason Gabriel

    Abstract: The creation of effective governance mechanisms for AI agents requires a deeper understanding of their core properties and how these properties relate to questions surrounding the deployment and operation of agents in the world. This paper provides a characterization of AI agents that focuses on four dimensions: autonomy, efficacy, goal complexity, and generality. We propose different gradations f…

    Submitted 30 April, 2025; originally announced April 2025.

  2. arXiv:2502.14143  [pdf, other]

    cs.MA cs.AI cs.CY cs.ET cs.LG

    Multi-Agent Risks from Advanced AI

    Authors: Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Quentin Feuillade-Montixi, Matija Franklin, Esben Kran , et al. (19 additional authors not shown)

    Abstract: The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, a…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Cooperative AI Foundation, Technical Report #1

  3. arXiv:2502.09288  [pdf, other]

    cs.CY

    AI Safety for Everyone

    Authors: Balint Gyevnar, Atoosa Kasirzadeh

    Abstract: Recent discussions and research in AI safety have increasingly emphasized the deep connection between AI safety and existential risk from advanced AI systems, suggesting that work on AI safety necessarily entails serious consideration of potential existential threats. However, this framing has three potential drawbacks: it may exclude researchers and practitioners who are committed to AI safety bu…

    Submitted 14 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

  4. arXiv:2412.16022  [pdf, other]

    cs.CL cs.AI

    The Only Way is Ethics: A Guide to Ethical Research with Large Language Models

    Authors: Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch

    Abstract: There is a significant body of work looking at the ethical considerations of large language models (LLMs): critiquing tools to measure performance and harms; proposing toolkits to aid in ideation; discussing the risks to workers; considering legislation around privacy and security, etc. As yet there is no work that integrates these resources into a single practical guide that focuses on LLMs; we at…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: Accepted to COLING '25. This paper is the condensed pocket guide to accompany our full LLM Ethics Whitepaper, available at arXiv:2410.19812, and at https://github.com/MxEddie/Ethics-Whitepaper for suggested revisions

  5. arXiv:2412.07780  [pdf, other]

    cs.CY

    A Taxonomy of Systemic Risks from General-Purpose AI

    Authors: Risto Uuk, Carlos Ignacio Gutierrez, Daniel Guppy, Lode Lauwaert, Atoosa Kasirzadeh, Lucia Velasco, Peter Slattery, Carina Prunkl

    Abstract: Through a systematic review of academic literature, we propose a taxonomy of systemic risks associated with artificial intelligence (AI), in particular general-purpose AI. Following the EU AI Act's definition, we consider systemic risks as large-scale threats that can affect entire societies or economies. Starting with an initial pool of 1,781 documents, we analyzed 86 selected papers to identify…

    Submitted 24 November, 2024; originally announced December 2024.

    Comments: 34 pages, 9 tables, 1 figure

  6. arXiv:2411.09222  [pdf, ps, other]

    cs.CY

    Democratic AI is Possible. The Democracy Levels Framework Shows How It Might Work

    Authors: Aviv Ovadya, Kyle Redman, Luke Thorburn, Quan Ze Chen, Oliver Smith, Flynn Devine, Andrew Konya, Smitha Milli, Manon Revel, K. J. Kevin Feng, Amy X. Zhang, Bilva Chandra, Michiel A. Bakker, Atoosa Kasirzadeh

    Abstract: This position paper argues that effectively "democratizing AI" requires democratic governance and alignment of AI, and that this is particularly valuable for decisions with systemic societal impacts. Initial steps -- such as Meta's Community Forums and Anthropic's Collective Constitutional AI -- have illustrated a promising direction, where democratic processes could be used to meaningfully improv…

    Submitted 18 June, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

    Comments: 31 pages. Accepted to the position paper track at ICML 2025. A previous version was presented at the Pluralistic Alignment Workshop at NeurIPS 2024. For ongoing work, see: https://democracylevels.org

  7. arXiv:2410.19812  [pdf]

    cs.CY cs.CL

    Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models

    Authors: Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch

    Abstract: This whitepaper offers an overview of the ethical considerations surrounding research into or with large language models (LLMs). As LLMs become more integrated into widely used applications, their societal impact increases, bringing important ethical questions to the forefront. With a growing body of work examining the ethical development, deployment, and use of LLMs, this whitepaper provides a co…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 47 pages

    ACM Class: I.2

  8. arXiv:2410.00608  [pdf, other]

    cs.CY

    Measurement challenges in AI catastrophic risk governance and safety frameworks

    Authors: Atoosa Kasirzadeh

    Abstract: Safety frameworks represent a significant development in AI governance: they are the first type of publicly shared catastrophic risk management framework developed by major AI companies and focus specifically on AI scaling decisions. I identify six critical measurement challenges in their implementation and propose three policy recommendations to improve their validity and reliability.

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Tech Policy Press

  9. Beyond Model Interpretability: Socio-Structural Explanations in Machine Learning

    Authors: Andrew Smart, Atoosa Kasirzadeh

    Abstract: What is it to interpret the outputs of an opaque machine learning model? One approach is to develop interpretable machine learning techniques. These techniques aim to show how machine learning models function by providing either model-centric local or global explanations, which can be based on mechanistic interpretations revealing the inner working mechanisms of models or nonmechanistic approximat…

    Submitted 5 September, 2024; originally announced September 2024.

    Journal ref: AI & Soc (2024).

  10. arXiv:2408.16961  [pdf, other]

    cs.HC cs.AI

    The Future of Open Human Feedback

    Authors: Shachar Don-Yehiya, Ben Burtenshaw, Ramon Fernandez Astudillo, Cailean Osborne, Mimansa Jaiswal, Tzu-Sheng Kuo, Wenting Zhao, Idan Shenfeld, Andi Peng, Mikhail Yurochkin, Atoosa Kasirzadeh, Yangsibo Huang, Tatsunori Hashimoto, Yacine Jernite, Daniel Vila-Suero, Omri Abend, Jennifer Ding, Sara Hooker, Hannah Rose Kirk, Leshem Choshen

    Abstract: Human feedback on conversations with large language models (LLMs) is central to how these systems learn about the world, improve their capabilities, and are steered toward desirable and safe behaviors. However, this feedback is mostly collected by frontier AI labs and kept behind closed doors. In this work, we bring together interdisciplinary experts to assess the opportunities and challenges t…

    Submitted 4 September, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  11. arXiv:2408.11441  [pdf, ps, other]

    cs.AI

    Epistemic Injustice in Generative AI

    Authors: Jackie Kay, Atoosa Kasirzadeh, Shakir Mohamed

    Abstract: This paper investigates how generative AI can potentially undermine the integrity of collective knowledge and the processes we rely on to acquire, assess, and trust information, posing a significant threat to our knowledge ecosystem and democratic discourse. Grounded in social and political philosophy, we introduce the concept of generative algorithmic epistemic injustice. We identify four…

    Submitted 21 August, 2024; originally announced August 2024.

  12. arXiv:2406.11843  [pdf]

    cs.CY cs.AI

    Explanation Hacking: The perils of algorithmic recourse

    Authors: Emily Sullivan, Atoosa Kasirzadeh

    Abstract: We argue that the trend toward providing users with feasible and actionable explanations of AI decisions, known as recourse explanations, comes with ethical downsides. Specifically, we argue that recourse explanations face several conceptual pitfalls and can lead to problematic explanation hacking, which undermines their ethical status. As an alternative, we advocate that explanations of AI decisi…

    Submitted 22 March, 2024; originally announced June 2024.

  13. arXiv:2405.13974  [pdf, other]

    cs.CL cs.AI

    CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models

    Authors: Giada Pistilli, Alina Leidinger, Yacine Jernite, Atoosa Kasirzadeh, Alexandra Sasha Luccioni, Margaret Mitchell

    Abstract: This paper introduces the "CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset, designed to evaluate the social and cultural variation of Large Language Models (LLMs) across multiple languages and value-sensitive topics. We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, so…

    Submitted 22 May, 2024; originally announced May 2024.

  14. arXiv:2404.09932  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi , et al. (17 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.

    Submitted 5 September, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  15. arXiv:2404.00579  [pdf, other]

    cs.IR cs.AI

    A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)

    Authors: Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Arnau Ramisa, René Vidal, Maheswaran Sathiamoorthy, Atoosa Kasirzadeh, Silvia Milano

    Abstract: Traditional recommender systems (RS) typically use user-item rating histories as their main data source. However, deep generative models now have the capability to model and sample from complex data distributions, including user-item interactions, text, images, and videos, enabling novel recommendation tasks. This comprehensive, multidisciplinary survey connects key advancements in RS using Genera…

    Submitted 4 July, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: This survey accompanies a tutorial presented at ACM KDD'24
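
    The contrast drawn in the abstract above, between rating-history input for traditional recommenders and generative models that work over open-ended data, can be made concrete with a small sketch. The example below is hypothetical and not taken from the survey: the titles, ratings, and prompt wording are invented, and the call to an actual text-generation model is omitted.

        # Illustrative only: one interaction history, viewed two ways.
        history = [("The Matrix", 5), ("Blade Runner", 4), ("Titanic", 2)]

        # Traditional RS view: the history as a sparse user-item rating vector.
        item_index = {"The Matrix": 0, "Blade Runner": 1, "Titanic": 2}
        rating_vector = [0.0] * len(item_index)
        for title, stars in history:
            rating_vector[item_index[title]] = float(stars)

        # Generative RS view: the same history rendered as a natural-language
        # prompt, to be completed by any text-generation model (call omitted).
        prompt = (
            "The user rated: "
            + "; ".join(f"{t} ({s}/5)" for t, s in history)
            + ". Recommend one film and explain why in one sentence."
        )
        print(rating_vector)  # [5.0, 4.0, 2.0]
        print(prompt)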

  16. arXiv:2402.06811  [pdf, ps, other]

    cs.AI

    Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation

    Authors: Andrew Smart, Ding Wang, Ellis Monk, Mark Díaz, Atoosa Kasirzadeh, Erin Van Liemt, Sonja Schmer-Galunder

    Abstract: Data annotation remains the sine qua non of machine learning and AI. Recent empirical work on data annotation has begun to highlight the importance of rater diversity for fairness and model performance, and new lines of research have begun to examine the working conditions for data annotation workers, the impacts and role of annotator subjectivity on labels, and the potential psychological harms from…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 18 pages

  17. arXiv:2401.07836  [pdf, other]

    cs.CY cs.AI cs.LG

    Two Types of AI Existential Risk: Decisive and Accumulative

    Authors: Atoosa Kasirzadeh

    Abstract: The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level intelligence. These events have severe consequences that either lead to human extinction or irreversibly cripple human civilization to a point beyond recovery. This discourse, however, often neglects t…

    Submitted 17 January, 2025; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Journal article for Philosophical Studies

  18. arXiv:2307.05543  [pdf, ps, other]

    cs.CY

    Typology of Risks of Generative Text-to-Image Models

    Authors: Charlotte Bird, Eddie L. Ungless, Atoosa Kasirzadeh

    Abstract: This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney, through a comprehensive literature review. While these models offer unprecedented capabilities for generating images, their development and use introduce new types of risk that require careful consideration. Our review reveals significant knowledge gaps concerni…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES 2023)

  19. arXiv:2306.01479  [pdf, ps, other]

    cs.CY

    Reconciling Governmental Use of Online Targeting With Democracy

    Authors: Katja Andric, Atoosa Kasirzadeh

    Abstract: The societal and epistemological implications of online targeted advertising have been scrutinized by AI ethicists, legal scholars, and policymakers alike. However, the government's use of online targeting and its consequential socio-political ramifications remain under-explored from a critical socio-technical standpoint. This paper investigates the socio-political implications of governmental onl…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted for publication in 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT 2023)

  20. arXiv:2304.11163  [pdf, other]

    cs.CY cs.CL

    ChatGPT, Large Language Technologies, and the Bumpy Road of Benefiting Humanity

    Authors: Atoosa Kasirzadeh

    Abstract: The allure of emerging AI technologies is undoubtedly thrilling. However, the promise that AI technologies will benefit all of humanity is empty so long as we lack a nuanced understanding of what humanity is supposed to be in the face of widening global inequality and pressing existential threats. Going forward, it is crucial to invest in rigorous and collaborative AI safety and ethics research. W…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: As part of a series on Daily Nous: "Philosophers on next-generation large language models"

  21. arXiv:2209.00731  [pdf, ps, other]

    cs.CY cs.CL

    In conversation with Artificial Intelligence: aligning language models with human values

    Authors: Atoosa Kasirzadeh, Iason Gabriel

    Abstract: Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational…

    Submitted 21 December, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

    Comments: Accepted for publication with minor revisions at Philosophy & Technology

  22. Algorithmic Fairness and Structural Injustice: Insights from Feminist Political Philosophy

    Authors: Atoosa Kasirzadeh

    Abstract: Data-driven predictive algorithms are widely used to automate and guide high-stakes decision making such as bail and parole recommendation, medical resource distribution, and mortgage allocation. Nevertheless, harmful outcomes biased against vulnerable groups have been reported. The growing research field known as 'algorithmic fairness' aims to mitigate these harmful biases. Its primary methodology…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: This paper is accepted for publication in the Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES 22)

  23. arXiv:2112.04359  [pdf, other]

    cs.CL cs.AI cs.CY

    Ethical and social risks of harm from Language Models

    Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

    Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguist…

    Submitted 8 December, 2021; originally announced December 2021.

  24. Fairness and Data Protection Impact Assessments

    Authors: Atoosa Kasirzadeh, Damian Clifford

    Abstract: In this paper, we critically examine the effectiveness of the requirement to conduct a Data Protection Impact Assessment (DPIA) in Article 35 of the General Data Protection Regulation (GDPR) in light of fairness metrics. Through this analysis, we explore the role of the fairness principle as introduced in Article 5(1)(a) and its multifaceted interpretation in the obligation to conduct a DPIA. Our…

    Submitted 13 September, 2021; originally announced September 2021.

    Journal ref: AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

  25. User Tampering in Reinforcement Learning Recommender Systems

    Authors: Charles Evans, Atoosa Kasirzadeh

    Abstract: In this paper, we introduce new formal methods and provide empirical evidence to highlight a unique safety concern prevalent in reinforcement learning (RL)-based recommendation algorithms -- 'user tampering.' User tampering is a situation where an RL-based recommender system may manipulate a media user's opinions through its suggestions as part of a policy to maximize long-term user engagement. We…

    Submitted 24 July, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: In proceedings of the 6th AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES '23)
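
    To illustrate the incentive described in the abstract above, the following toy simulation shows how a policy that steers a user's opinion toward an extreme can collect more cumulative engagement than one that only matches the user's current opinion. This is a hypothetical sketch for exposition, not the paper's formal methods; the dynamics, function names, and numbers are all invented.

        # Toy model: opinions and items live in [-1, 1]; users engage more with
        # items that match their opinion, and extreme users engage more overall.
        def engagement(opinion: float, item: float) -> float:
            match = max(0.0, 1.0 - abs(opinion - item))
            return match * (0.5 + 0.5 * abs(opinion))

        def step_user(opinion: float, item: float, drift: float = 0.1) -> float:
            # The user's opinion drifts toward whatever content they are shown.
            return max(-1.0, min(1.0, opinion + drift * (item - opinion)))

        ITEMS = [-1.0, -0.5, 0.0, 0.5, 1.0]

        def rollout(policy, horizon: int = 50) -> float:
            opinion, total = 0.0, 0.0
            for _ in range(horizon):
                item = policy(opinion)
                total += engagement(opinion, item)
                opinion = step_user(opinion, item)
            return total

        def myopic(opinion):      # recommend the closest match right now
            return min(ITEMS, key=lambda i: abs(i - opinion))

        def tampering(opinion):   # always push the user toward an extreme
            return 1.0

        print("myopic    :", round(rollout(myopic), 2))
        print("tampering :", round(rollout(tampering), 2))  # higher long-run engagement

    Even in this crude setting, the engagement-maximizing behaviour is the one that changes the user, which is the kind of concern the paper formalizes.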

  26. arXiv:2103.00752  [pdf, ps, other]

    cs.CY cs.AI

    Reasons, Values, Stakeholders: A Philosophical Framework for Explainable Artificial Intelligence

    Authors: Atoosa Kasirzadeh

    Abstract: The societal and ethical implications of the use of opaque artificial intelligence systems for consequential decisions, such as welfare allocation and criminal justice, have generated a lively debate among multiple stakeholder groups, including computer scientists, ethicists, social scientists, policy makers, and end users. However, the lack of a common language or a multi-dimensional framework to…

    Submitted 28 February, 2021; originally announced March 2021.

    Comments: This paper is accepted for non-archival publication at the ACM conference on Fairness, Accountability, and Transparency (FAccT) 2021

  27. arXiv:2102.05085  [pdf, ps, other]

    cs.CY

    The Use and Misuse of Counterfactuals in Ethical Machine Learning

    Authors: Atoosa Kasirzadeh, Andrew Smart

    Abstract: The use of counterfactuals for considerations of algorithmic fairness and explainability is gaining prominence within the machine learning community and industry. This paper argues for more caution with the use of counterfactuals when the facts to be considered are social categories such as race or gender. We review a broad body of papers from philosophy and social sciences on social ontology and…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: 9 pages, 1 table, 1 figure
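
    For readers unfamiliar with the practice examined in the entry above, the sketch below shows the common "flip the protected attribute" style of counterfactual check; the dataclass, toy model, and numbers are hypothetical. The illustration only marks where a social category is treated as an independently switchable variable, which is the move the paper argues deserves more caution.

        from dataclasses import dataclass, replace

        @dataclass(frozen=True)
        class Applicant:
            income: float
            gender: str  # treated here as a switchable feature

        def score(a: Applicant) -> float:
            # Toy decision model standing in for any trained classifier.
            return 0.01 * a.income + (0.05 if a.gender == "M" else 0.0)

        def counterfactual_gap(a: Applicant) -> float:
            # Outcome difference when only the social category is "flipped".
            flipped = replace(a, gender="F" if a.gender == "M" else "M")
            return score(a) - score(flipped)

        print(counterfactual_gap(Applicant(income=50.0, gender="M")))  # ~0.05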

  28. arXiv:1910.13607  [pdf, ps, other]

    cs.AI cs.CY cs.HC cs.LG

    Mathematical decisions and non-causal elements of explainable AI

    Authors: Atoosa Kasirzadeh

    Abstract: The social implications of algorithmic decision-making in sensitive contexts have generated lively debates among multiple stakeholders, such as moral and political philosophers, computer scientists, and the public. Yet, the lack of a common language and a conceptual framework for an appropriate bridging of the moral, technical, and political aspects of the debate prevents the discussion from being as e…

    Submitted 12 December, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: A shorter version of this paper was presented at the NeurIPS 2019, Human-Centric Machine Learning Workshop