• Corpus ID: 10242377

Concrete Problems in AI Safety

  • Dario Amodei, Christopher Olah, Jacob Steinhardt, Paul Francis Christiano, John Schulman, Dandelion Mané
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. [...] Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.
Dynamic Models Applied to Value Learning in Artificial Intelligence
It is of utmost importance that artificial intelligent agents have their values aligned with human values, since an AI cannot be expected to develop human moral values simply because of its intelligence, as discussed in the Orthogonality Thesis.
Risks of using naïve approaches to Artificial Intelligence: A case study
It is demonstrated that naive approaches can have unanticipated consequences and can generate predictions based on discriminatory factors such as gender or race when machine learning is applied to secondary school student grades.
Dynamic Cognition Applied to Value Learning in Artificial Intelligence
  • N. D. Oliveira, N. Corrêa
  • Computer Science
    Aoristo - International Journal of Phenomenology, Hermeneutics and Metaphysics
  • 2021
It is of utmost importance that artificial intelligent agents have their values aligned with human values, since an AI cannot be expected to develop human moral preferences simply because of its intelligence.
Predicting future AI failures from historic examples
It is suggested that both the frequency and the seriousness of future AI failures will steadily increase, and that a first attempt to assemble a public data set of AI failures is extremely valuable to AI safety researchers.
Pitfalls of machine learning for tail events in high risk environments
This paper reviews the current situation and challenges of applying ML in high-risk environments and outlines how phenomenological knowledge, together with an uncertainty-based risk perspective, can be incorporated to alleviate the missing causality considerations in current practice.
Evolutionary Computation and AI Safety: Research Problems Impeding Routine and Safe Real-world Application of Evolution
This paper explores the intersection of AI safety with evolutionary computation, to show how safety issues arise in evolutionary computation and how understanding from evolutionary computation and biological evolution can inform the broader study of AI safety.
Towards Safe Artificial General Intelligence
The central conclusion is that while reinforcement learning systems as designed today are inherently unsafe to scale to human levels of intelligence, there are ways to potentially address many of these issues without straying too far from the currently so successful reinforcement learning paradigm.
Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems
This article provides a comprehensive overview of different forms of negative side effects (NSEs) and the recent research efforts to address them, identifying key characteristics of NSEs, highlighting the challenges in avoiding NSEs, and discussing recently developed approaches, contrasting their benefits and limitations.
AI Paradigms and AI Safety: Mapping Artefacts and Techniques to Safety Issues
The paper identifies a need for AI safety to be more explicit about the artefacts and techniques to which a particular issue applies, in order to identify gaps and cover a broader range of issues.
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
This work formalizes human intervention for RL, shows how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions, and outlines extensions of the scheme that are necessary to train model-free agents without a single catastrophe.


Safe Exploration Techniques for Reinforcement Learning - An Overview
This work overviews different approaches to safety in (semi)autonomous robotics and addresses the issue of how to define safety in real-world applications (apparently absolute safety is unachievable in the continuous and random real world).
Artificial Intelligence as a Positive and Negative Factor in Global Risk
By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI. Jacques Monod wrote: "A
Learning What to Value
I. J. Good's intelligence explosion theory predicts that ultraintelligent agents will undergo a process of repeated self-improvement; in the wake of such an event, how well our values are fulfilled
Safely Interruptible Agents
This paper explores a way to make sure a learning agent will not learn to prevent being interrupted by the environment or a human operator. It provides a formal definition of safe interruptibility and exploits the off-policy learning property to prove that some agents are already safely interruptible, like Q-learning, or can be made so, like Sarsa.
The First Law of Robotics (A Call to Arms)
Inspired by Asimov, fundamental questions are posed about how to formalize the rich, but informal, notion of "harm" and how to avoid performing harmful actions in a computationally tractable manner.
Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
A novel algorithm is developed and proved to completely explore the safely reachable part of the MDP without violating the safety constraint, and is demonstrated on digital terrain models for the task of exploring an unknown map with a rover.
Utility function security in artificially intelligent agents
It is concluded that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead.
Exploration and apprenticeship learning in reinforcement learning
This paper considers the apprenticeship learning setting in which a teacher demonstration of the task is available, and shows that, given the initial demonstration, no explicit exploration is necessary, and the student can attain near-optimal performance simply by repeatedly executing "exploitation policies" that try to maximize rewards.
Research Priorities for Robust and Beneficial Artificial Intelligence
This article gives numerous examples of worthwhile research aimed at ensuring that AI remains robust and beneficial.
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.