
Showing 1–28 of 28 results for author: Kasirzadeh, A

Searching in archive cs.
  1. arXiv:2504.21848  [pdf]

    cs.CY cs.AI eess.SY

    Characterizing AI Agents for Alignment and Governance

    Authors: Atoosa Kasirzadeh, Iason Gabriel

    Abstract: The creation of effective governance mechanisms for AI agents requires a deeper understanding of their core properties and how these properties relate to questions surrounding the deployment and operation of agents in the world. This paper provides a characterization of AI agents that focuses on four dimensions: autonomy, efficacy, goal complexity, and generality. We propose different gradations f…

    Submitted 30 April, 2025; originally announced April 2025.

  2. arXiv:2502.14143  [pdf, other]

    cs.MA cs.AI cs.CY cs.ET cs.LG

    Multi-Agent Risks from Advanced AI

    Authors: Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Quentin Feuillade-Montixi, Matija Franklin, Esben Kran , et al. (19 additional authors not shown)

    Abstract: The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, a…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Cooperative AI Foundation, Technical Report #1

  3. arXiv:2502.09288  [pdf, other]

    cs.CY

    AI Safety for Everyone

    Authors: Balint Gyevnar, Atoosa Kasirzadeh

    Abstract: Recent discussions and research in AI safety have increasingly emphasized the deep connection between AI safety and existential risk from advanced AI systems, suggesting that work on AI safety necessarily entails serious consideration of potential existential threats. However, this framing has three potential drawbacks: it may exclude researchers and practitioners who are committed to AI safety bu…

    Submitted 14 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

  4. arXiv:2412.16022  [pdf, other]

    cs.CL cs.AI

    The Only Way is Ethics: A Guide to Ethical Research with Large Language Models

    Authors: Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch

    Abstract: There is a significant body of work looking at the ethical considerations of large language models (LLMs): critiquing tools to measure performance and harms; proposing toolkits to aid in ideation; discussing the risks to workers; considering legislation around privacy and security, etc. As yet there is no work that integrates these resources into a single practical guide that focuses on LLMs; we at…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: Accepted to COLING '25. This paper is the condensed pocket guide to accompany our full LLM Ethics Whitepaper, available at arXiv:2410.19812, and at https://github.com/MxEddie/Ethics-Whitepaper for suggested revisions

  5. arXiv:2412.07780  [pdf, other]

    cs.CY

    A Taxonomy of Systemic Risks from General-Purpose AI

    Authors: Risto Uuk, Carlos Ignacio Gutierrez, Daniel Guppy, Lode Lauwaert, Atoosa Kasirzadeh, Lucia Velasco, Peter Slattery, Carina Prunkl

    Abstract: Through a systematic review of academic literature, we propose a taxonomy of systemic risks associated with artificial intelligence (AI), in particular general-purpose AI. Following the EU AI Act's definition, we consider systemic risks as large-scale threats that can affect entire societies or economies. Starting with an initial pool of 1,781 documents, we analyzed 86 selected papers to identify…

    Submitted 24 November, 2024; originally announced December 2024.

    Comments: 34 pages, 9 tables, 1 figure

  6. arXiv:2411.09222  [pdf, ps, other]

    cs.CY

    Democratic AI is Possible. The Democracy Levels Framework Shows How It Might Work

    Authors: Aviv Ovadya, Kyle Redman, Luke Thorburn, Quan Ze Chen, Oliver Smith, Flynn Devine, Andrew Konya, Smitha Milli, Manon Revel, K. J. Kevin Feng, Amy X. Zhang, Bilva Chandra, Michiel A. Bakker, Atoosa Kasirzadeh

    Abstract: This position paper argues that effectively "democratizing AI" requires democratic governance and alignment of AI, and that this is particularly valuable for decisions with systemic societal impacts. Initial steps -- such as Meta's Community Forums and Anthropic's Collective Constitutional AI -- have illustrated a promising direction, where democratic processes could be used to meaningfully improv…

    Submitted 18 June, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

    Comments: 31 pages. Accepted to the position paper track at ICML 2025. A previous version was presented at the Pluralistic Alignment Workshop at NeurIPS 2024. For ongoing work, see: https://democracylevels.org

  7. arXiv:2410.19812  [pdf]

    cs.CY cs.CL

    Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models

    Authors: Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Björn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch

    Abstract: This whitepaper offers an overview of the ethical considerations surrounding research into or with large language models (LLMs). As LLMs become more integrated into widely used applications, their societal impact increases, bringing important ethical questions to the forefront. With a growing body of work examining the ethical development, deployment, and use of LLMs, this whitepaper provides a co…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 47 pages

    ACM Class: I.2

  8. arXiv:2410.00608  [pdf, other]

    cs.CY

    Measurement challenges in AI catastrophic risk governance and safety frameworks

    Authors: Atoosa Kasirzadeh

    Abstract: Safety frameworks represent a significant development in AI governance: they are the first type of publicly shared catastrophic risk management framework developed by major AI companies and focus specifically on AI scaling decisions. I identify six critical measurement challenges in their implementation and propose three policy recommendations to improve their validity and reliability.

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Tech Policy Press

  9. Beyond Model Interpretability: Socio-Structural Explanations in Machine Learning

    Authors: Andrew Smart, Atoosa Kasirzadeh

    Abstract: What is it to interpret the outputs of an opaque machine learning model? One approach is to develop interpretable machine learning techniques. These techniques aim to show how machine learning models function by providing either model-centric local or global explanations, which can be based on mechanistic interpretations revealing the inner working mechanisms of models or nonmechanistic approximat…

    Submitted 5 September, 2024; originally announced September 2024.

    Journal ref: AI & Soc (2024).

  10. arXiv:2408.16961  [pdf, other]

    cs.HC cs.AI

    The Future of Open Human Feedback

    Authors: Shachar Don-Yehiya, Ben Burtenshaw, Ramon Fernandez Astudillo, Cailean Osborne, Mimansa Jaiswal, Tzu-Sheng Kuo, Wenting Zhao, Idan Shenfeld, Andi Peng, Mikhail Yurochkin, Atoosa Kasirzadeh, Yangsibo Huang, Tatsunori Hashimoto, Yacine Jernite, Daniel Vila-Suero, Omri Abend, Jennifer Ding, Sara Hooker, Hannah Rose Kirk, Leshem Choshen

    Abstract: Human feedback on conversations with large language models (LLMs) is central to how these systems learn about the world, improve their capabilities, and are steered toward desirable and safe behaviors. However, this feedback is mostly collected by frontier AI labs and kept behind closed doors. In this work, we bring together interdisciplinary experts to assess the opportunities and challenges t…

    Submitted 4 September, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  11. arXiv:2408.11441  [pdf, ps, other]

    cs.AI

    Epistemic Injustice in Generative AI

    Authors: Jackie Kay, Atoosa Kasirzadeh, Shakir Mohamed

    Abstract: This paper investigates how generative AI can potentially undermine the integrity of collective knowledge and the processes we rely on to acquire, assess, and trust information, posing a significant threat to our knowledge ecosystem and democratic discourse. Grounded in social and political philosophy, we introduce the concept of generative algorithmic epistemic injustice. We identify four…

    Submitted 21 August, 2024; originally announced August 2024.

  12. arXiv:2406.11843  [pdf]

    cs.CY cs.AI

    Explanation Hacking: The perils of algorithmic recourse

    Authors: Emily Sullivan, Atoosa Kasirzadeh

    Abstract: We argue that the trend toward providing users with feasible and actionable explanations of AI decisions, known as recourse explanations, comes with ethical downsides. Specifically, we argue that recourse explanations face several conceptual pitfalls and can lead to problematic explanation hacking, which undermines their ethical status. As an alternative, we advocate that explanations of AI decisi…

    Submitted 22 March, 2024; originally announced June 2024.

  13. arXiv:2405.13974  [pdf, other]

    cs.CL cs.AI

    CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models

    Authors: Giada Pistilli, Alina Leidinger, Yacine Jernite, Atoosa Kasirzadeh, Alexandra Sasha Luccioni, Margaret Mitchell

    Abstract: This paper introduces the "CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset, designed to evaluate the social and cultural variation of Large Language Models (LLMs) across multiple languages and value-sensitive topics. We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, so…

    Submitted 22 May, 2024; originally announced May 2024.

  14. arXiv:2404.09932  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi , et al. (17 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.

    Submitted 5 September, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  15. arXiv:2404.00579  [pdf, other]

    cs.IR cs.AI

    A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)

    Authors: Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Arnau Ramisa, René Vidal, Maheswaran Sathiamoorthy, Atoosa Kasirzadeh, Silvia Milano

    Abstract: Traditional recommender systems (RS) typically use user-item rating histories as their main data source. However, deep generative models now have the capability to model and sample from complex data distributions, including user-item interactions, text, images, and videos, enabling novel recommendation tasks. This comprehensive, multidisciplinary survey connects key advancements in RS using Genera…

    Submitted 4 July, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: This survey accompanies a tutorial presented at ACM KDD'24
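
    The contrast drawn in the abstract above, between rating-history input for traditional recommenders and generative models that work over open-ended data, can be made concrete with a small sketch. The example below is hypothetical and not taken from the survey: the titles, ratings, and prompt wording are invented, and the call to an actual text-generation model is omitted.

        # Illustrative only: one interaction history, viewed two ways.
        history = [("The Matrix", 5), ("Blade Runner", 4), ("Titanic", 2)]

        # Traditional RS view: the history as a sparse user-item rating vector.
        item_index = {"The Matrix": 0, "Blade Runner": 1, "Titanic": 2}
        rating_vector = [0.0] * len(item_index)
        for title, stars in history:
            rating_vector[item_index[title]] = float(stars)

        # Generative RS view: the same history rendered as a natural-language
        # prompt, to be completed by any text-generation model (call omitted).
        prompt = (
            "The user rated: "
            + "; ".join(f"{t} ({s}/5)" for t, s in history)
            + ". Recommend one film and explain why in one sentence."
        )
        print(rating_vector)  # [5.0, 4.0, 2.0]
        print(prompt)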

  16. arXiv:2402.06811  [pdf, ps, other]

    cs.AI

    Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation

    Authors: Andrew Smart, Ding Wang, Ellis Monk, Mark Díaz, Atoosa Kasirzadeh, Erin Van Liemt, Sonja Schmer-Galunder

    Abstract: Data annotation remains the sine qua non of machine learning and AI. Recent empirical work on data annotation has begun to highlight the importance of rater diversity for fairness and model performance, and new lines of research have begun to examine the working conditions for data annotation workers, the impacts and role of annotator subjectivity on labels, and the potential psychological harms from…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 18 pages

  17. arXiv:2401.07836  [pdf, other]

    cs.CY cs.AI cs.LG

    Two Types of AI Existential Risk: Decisive and Accumulative

    Authors: Atoosa Kasirzadeh

    Abstract: The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level intelligence. These events have severe consequences that either lead to human extinction or irreversibly cripple human civilization to a point beyond recovery. This discourse, however, often neglects t…

    Submitted 17 January, 2025; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Journal article for Philosophical Studies

  18. arXiv:2307.05543  [pdf, ps, other]

    cs.CY

    Typology of Risks of Generative Text-to-Image Models

    Authors: Charlotte Bird, Eddie L. Ungless, Atoosa Kasirzadeh

    Abstract: This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney, through a comprehensive literature review. While these models offer unprecedented capabilities for generating images, their development and use introduce new types of risk that require careful consideration. Our review reveals significant knowledge gaps concerni…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES 2023)

  19. arXiv:2306.01479  [pdf, ps, other]

    cs.CY

    Reconciling Governmental Use of Online Targeting With Democracy

    Authors: Katja Andric, Atoosa Kasirzadeh

    Abstract: The societal and epistemological implications of online targeted advertising have been scrutinized by AI ethicists, legal scholars, and policymakers alike. However, the government's use of online targeting and its consequential socio-political ramifications remain under-explored from a critical socio-technical standpoint. This paper investigates the socio-political implications of governmental onl…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted for publication in 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT 2023)

  20. arXiv:2304.11163  [pdf, other]

    cs.CY cs.CL

    ChatGPT, Large Language Technologies, and the Bumpy Road of Benefiting Humanity

    Authors: Atoosa Kasirzadeh

    Abstract: The allure of emerging AI technologies is undoubtedly thrilling. However, the promise that AI technologies will benefit all of humanity is empty so long as we lack a nuanced understanding of what humanity is supposed to be in the face of widening global inequality and pressing existential threats. Going forward, it is crucial to invest in rigorous and collaborative AI safety and ethics research. W…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: As part of a series on Daily Nous: "Philosophers on next-generation large language models"

  21. arXiv:2209.00731  [pdf, ps, other]

    cs.CY cs.CL

    In conversation with Artificial Intelligence: aligning language models with human values

    Authors: Atoosa Kasirzadeh, Iason Gabriel

    Abstract: Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational…

    Submitted 21 December, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

    Comments: Accepted for publication with minor revisions at Philosophy & Technology

  22. Algorithmic Fairness and Structural Injustice: Insights from Feminist Political Philosophy

    Authors: Atoosa Kasirzadeh

    Abstract: Data-driven predictive algorithms are widely used to automate and guide high-stakes decision making such as bail and parole recommendation, medical resource distribution, and mortgage allocation. Nevertheless, harmful outcomes biased against vulnerable groups have been reported. The growing research field known as 'algorithmic fairness' aims to mitigate these harmful biases. Its primary methodology…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: This paper is accepted for publication in the Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES 22)

  23. arXiv:2112.04359  [pdf, other]

    cs.CL cs.AI cs.CY

    Ethical and social risks of harm from Language Models

    Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

    Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguist…

    Submitted 8 December, 2021; originally announced December 2021.

  24. Fairness and Data Protection Impact Assessments

    Authors: Atoosa Kasirzadeh, Damian Clifford

    Abstract: In this paper, we critically examine the effectiveness of the requirement to conduct a Data Protection Impact Assessment (DPIA) in Article 35 of the General Data Protection Regulation (GDPR) in light of fairness metrics. Through this analysis, we explore the role of the fairness principle as introduced in Article 5(1)(a) and its multifaceted interpretation in the obligation to conduct a DPIA. Our…

    Submitted 13 September, 2021; originally announced September 2021.

    Journal ref: AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

  25. User Tampering in Reinforcement Learning Recommender Systems

    Authors: Charles Evans, Atoosa Kasirzadeh

    Abstract: In this paper, we introduce new formal methods and provide empirical evidence to highlight a unique safety concern prevalent in reinforcement learning (RL)-based recommendation algorithms -- 'user tampering.' User tampering is a situation where an RL-based recommender system may manipulate a media user's opinions through its suggestions as part of a policy to maximize long-term user engagement. We…

    Submitted 24 July, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: In proceedings of the 6th AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES '23)
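
    To illustrate the incentive described in the abstract above, the following toy simulation shows how a policy that steers a user's opinion toward an extreme can collect more cumulative engagement than one that only matches the user's current opinion. This is a hypothetical sketch for exposition, not the paper's formal methods; the dynamics, function names, and numbers are all invented.

        # Toy model: opinions and items live in [-1, 1]; users engage more with
        # items that match their opinion, and extreme users engage more overall.
        def engagement(opinion: float, item: float) -> float:
            match = max(0.0, 1.0 - abs(opinion - item))
            return match * (0.5 + 0.5 * abs(opinion))

        def step_user(opinion: float, item: float, drift: float = 0.1) -> float:
            # The user's opinion drifts toward whatever content they are shown.
            return max(-1.0, min(1.0, opinion + drift * (item - opinion)))

        ITEMS = [-1.0, -0.5, 0.0, 0.5, 1.0]

        def rollout(policy, horizon: int = 50) -> float:
            opinion, total = 0.0, 0.0
            for _ in range(horizon):
                item = policy(opinion)
                total += engagement(opinion, item)
                opinion = step_user(opinion, item)
            return total

        def myopic(opinion):      # recommend the closest match right now
            return min(ITEMS, key=lambda i: abs(i - opinion))

        def tampering(opinion):   # always push the user toward an extreme
            return 1.0

        print("myopic    :", round(rollout(myopic), 2))
        print("tampering :", round(rollout(tampering), 2))  # higher long-run engagement

    Even in this crude setting, the engagement-maximizing behaviour is the one that changes the user, which is the kind of concern the paper formalizes.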

  26. arXiv:2103.00752  [pdf, ps, other]

    cs.CY cs.AI

    Reasons, Values, Stakeholders: A Philosophical Framework for Explainable Artificial Intelligence

    Authors: Atoosa Kasirzadeh

    Abstract: The societal and ethical implications of the use of opaque artificial intelligence systems for consequential decisions, such as welfare allocation and criminal justice, have generated a lively debate among multiple stakeholder groups, including computer scientists, ethicists, social scientists, policy makers, and end users. However, the lack of a common language or a multi-dimensional framework to…

    Submitted 28 February, 2021; originally announced March 2021.

    Comments: This paper is accepted for non-archival publication at the ACM conference on Fairness, Accountability, and Transparency (FAccT) 2021

  27. arXiv:2102.05085  [pdf, ps, other]

    cs.CY

    The Use and Misuse of Counterfactuals in Ethical Machine Learning

    Authors: Atoosa Kasirzadeh, Andrew Smart

    Abstract: The use of counterfactuals for considerations of algorithmic fairness and explainability is gaining prominence within the machine learning community and industry. This paper argues for more caution with the use of counterfactuals when the facts to be considered are social categories such as race or gender. We review a broad body of papers from philosophy and social sciences on social ontology and…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: 9 pages, 1 table, 1 figure
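
    For readers unfamiliar with the practice examined in the entry above, the sketch below shows the common "flip the protected attribute" style of counterfactual check; the dataclass, toy model, and numbers are hypothetical. The illustration only marks where a social category is treated as an independently switchable variable, which is the move the paper argues deserves more caution.

        from dataclasses import dataclass, replace

        @dataclass(frozen=True)
        class Applicant:
            income: float
            gender: str  # treated here as a switchable feature

        def score(a: Applicant) -> float:
            # Toy decision model standing in for any trained classifier.
            return 0.01 * a.income + (0.05 if a.gender == "M" else 0.0)

        def counterfactual_gap(a: Applicant) -> float:
            # Outcome difference when only the social category is "flipped".
            flipped = replace(a, gender="F" if a.gender == "M" else "M")
            return score(a) - score(flipped)

        print(counterfactual_gap(Applicant(income=50.0, gender="M")))  # ~0.05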

  28. arXiv:1910.13607  [pdf, ps, other]

    cs.AI cs.CY cs.HC cs.LG

    Mathematical decisions and non-causal elements of explainable AI

    Authors: Atoosa Kasirzadeh

    Abstract: The social implications of algorithmic decision-making in sensitive contexts have generated lively debates among multiple stakeholders, such as moral and political philosophers, computer scientists, and the public. Yet, the lack of a common language and a conceptual framework for an appropriate bridging of the moral, technical, and political aspects of the debate prevents the discussion from being as e…

    Submitted 12 December, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: A shorter version of this paper was presented at the NeurIPS 2019, Human-Centric Machine Learning Workshop