Racing to the precipice: a model of artificial intelligence development
The Nash equilibrium of this process, where each team takes the correct amount of safety precautions in the arms race, points the way to methods of increasing the chance of the safe development of AI.
Safely Interruptible Agents
This paper explores a way to make sure a learning agent will not learn to prevent being interrupted by the environment or a human operator. It provides a formal definition of safe interruptibility and exploits the off-policy learning property to prove that some agents are already safely interruptible, like Q-learning agents, while others can be made so, like Sarsa agents.
Eternity in six hours: Intergalactic spreading of intelligent life and sharpening the Fermi paradox
The Fermi paradox is the discrepancy between the strong likelihood of alien intelligent life emerging (under a wide variety of assumptions) and the absence of any visible evidence for such life.
Motivated Value Selection for Artificial Agents
  • S. Armstrong
  • Computer Science
  • AAAI Workshop: AI and Ethics
  • 1 April 2015
The conditions under which motivated value selection is an issue for some types of agents are established, and an example of an 'indifferent' agent that avoids it entirely is presented, posing and solving an issue that has not previously been formally addressed in the literature.
Low Impact Artificial Intelligences
The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction.
The errors, insights and lessons of famous AI predictions – and what they mean for the future
The general reliability of expert judgement in AI timeline predictions is shown to be poor, a result that fits in with previous studies of expert competence.
How We're Predicting AI - or Failing to
This paper examines the various predictions that have been made about AI, proposes decomposition schemas for analysing them, and shows that there are strong theoretical grounds to expect predictions in this area to be quite poor.
Thinking Inside the Box: Controlling and Using an Oracle AI
This paper analyzes and critiques various methods of controlling the AI, and suggests that an Oracle AI might be safer than unrestricted AI, though it remains potentially dangerous.
Good and safe uses of AI Oracles
Two designs for Oracles are presented which, even under pessimistic assumptions, will not manipulate their users into releasing them and yet will still be incentivised to provide their users with helpful answers.