• Corpus ID: 236428654

Achilles Heels for AGI/ASI via Decision Theoretic Adversaries

Stephen L. Casper

As progress in AI continues to advance, it is crucial to know how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, and understanding how to safely build ones which may have capabilities at or above the human level is of particular concern. One might suspect that artificially generally intelligent (AGI) and artificially superintelligent (ASI) systems should be modeled as something which humans, by definition, can’t reliably…

White-Box Adversarial Policies in Deep Reinforcement Learning

This work introduces white-box adversarial policies in which an attacker can observe a victim’s internal state at each timestep, and demonstrates that white-box access to a victim makes for better attacks in two-agent environments, resulting in both faster initial learning and higher asymptotic performance against the victim.

Efficient and Insidious Adversaries in Deep Reinforcement Learning

Findings show potential for effective attacks, reveal directions for continued work, and suggest a need for caution and effective defenses in the continued development of deep reinforcement learning systems.



Putting a value on beauty

  • Oxford studies in epistemology,
  • 2010

Anthropic Bias: Observation Selection Effects in Science and Philosophy

Preface. Content. Acknowledgements. Chapter 1: Introduction. Observation selection effects. A brief history of anthropic reasoning. Synopsis of this book. Chapter 2: Fine-Tuning Arguments in Cosmology. Does

Functional Decision Theory: A New Theory of Instrumental Rationality

This paper defines FDT, explores its prescriptions in a number of different decision problems, compares it to CDT and EDT, and gives philosophical justifications for FDT as a normative theory of decision-making.


We consider logical agents in a predictable universe running a variant of updateless decision theory. We give an algorithm to predict the behavior of such agents in the special case where the order

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Sequential choice and the agent’s perspective

  • 2018

Human Compatible: Artificial Intelligence and the Problem of Control

"The most important book I have read in quite some time" (Daniel Kahneman); "A must-read" (Max Tegmark); "The book we've all been waiting for" (Sam Harris) LONGLISTED FOR THE 2019 FINANCIAL TIMES AND

The precipice: Existential risk and the future of humanity, Toby Ord, Hachette Books, New York, NY, 2020. 480 pp. $30.00 (cloth)

Ord's view of humanity's potential not only shapes much of his moral argument for prioritizing existential risks,[4] but also the comparative analysis of existential risks. While humanity may face

Reinforcement Learning in Newcomblike Environments

It is shown that a value-based reinforcement learning agent cannot converge to a policy that is not ratifiable, which gives a powerful tool for reasoning about the limit behaviour of agents and proves several results about the possible limit behaviours of agents in cases where they do not converge to any policy.

Extracting Money from Causal Decision Theorists

This paper provides a new argument against what is probably the most popular variant of expected utility maximization: causal decision theory (CDT), and provides two scenarios in which CDT voluntarily loses money.