Utility function security in artificially intelligent agents

@article{Yampolskiy2014UtilityFS,
  title={Utility function security in artificially intelligent agents},
  author={Roman V Yampolskiy},
  journal={Journal of Experimental \& Theoretical Artificial Intelligence},
  year={2014},
  volume={26},
  pages={373 - 389}
}
  • Roman V Yampolskiy
  • Published 8 April 2014
  • Computer Science
  • Journal of Experimental & Theoretical Artificial Intelligence
The notion of ‘wireheading’, or direct reward centre stimulation of the brain, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved… 
Personal Universes: A Solution to the Multi-Agent Value Alignment Problem
TLDR
This paper assumes that the value extraction problem will be solved and proposes a possible way to implement an AI solution which optimally aligns with individual preferences of each user, and analyzed benefits and limitations of the proposed approach.
Goal Changes in Intelligent Agents
TLDR
A taxonomy of four separate ways that changes in effective goals may occur in an AGI system, and how measures to mitigate the risk of some types of goal change may exacerbate therisk of others.
An AGI Modifying Its Utility Function in Violation of the Strong Orthogonality Thesis
An artificial general intelligence (AGI) might have an instrumental drive to modify its utility function to improve its ability to cooperate, bargain, promise, threaten, and resist and engage in
Augmented Utilitarianism for AGI Safety
TLDR
A novel socio-technological ethical framework denoted Augmented Utilitarianism is proposed which directly alleviates the perverse instantiation problem and is elaborate on how augmented by AI and more generally science and technology, it might allow a society to craft and update ethical utility functions while jointly undergoing a dynamical ethical enhancement.
Detecting Qualia in Natural and Artificial Agents
TLDR
It is shown that computers are at least rudimentarily conscious with potential to eventually reach superconsciousness, and a test for confirming certain subjective experiences in a tested agent is introduced.
On Controllability of AI
TLDR
Consequences of uncontrollability of AI are discussed with respect to future of humanity and research on AI, and AI safety and security.
Concrete Problems in AI Safety
TLDR
A list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process, are presented.
Raising Ethical Machines
TLDR
This chapter explores how all top-down approaches to implementing machine ethics are fundamentally limited and how bottom-up approaches, in particular, reinforcement learning methods, are not beset by the same problems as top- down approaches.
Taxonomy of Pathways to Dangerous AI
TLDR
This work survey, classify and analyze a number of circumstances, which might lead to arrival of malicious AI, the first attempt to systematically classify types of pathways leading to malevolent AI.
...
1
2
3
4
...

References

SHOWING 1-10 OF 110 REFERENCES
Learning What to Value
  • Dan Dewey
  • Computer Science, Psychology
    AGI
  • 2011
I. J. Good's intelligence explosion theory predicts that ultraintelligent agents will undergo a process of repeated self-improvement; in the wake of such an event, how well our values are fulfilled
Complex Value Systems in Friendly AI
TLDR
Some of the reasoning which suggests that Friendly AI is solvable, but not simply or trivially so, is presented, and it is suggested that a wise strategy would be to invoke detailed learning of and inheritance from human values as a basis for further normalization and reflection.
Delusion, Survival, and Intelligent Agents
TLDR
The main results are that: 1) The reinforcement-learning agent under reasonable circumstances behaves exactly like an agent whose sole task is to survive (to preserve the integrity of its code); and 2) Only the knowledge-seeking agent behaves completely as expected.
Self-Modification and Mortality in Artificial Agents
TLDR
The "Simpleton Gambit" is introduced which allows us to discuss and compare some very different kinds of agents, specifically: reinforcement-learning, goal-seeking, predictive, and knowledge-seeking agents and whether these agents would choose to modify themselves toward their own detriment.
The Singularity and Machine Ethics
Many researchers have argued that a self-improving artificial intelligence (AI) could become so vastly more powerful than humans that we would not be able to stop it from achieving its goals. If so,
AI-Complete CAPTCHAs as Zero Knowledge Proofs of Access to an Artificially Intelligent System
TLDR
This paper proposes a method based on the combination of zero knowledge proofs and provably AI-complete CAPTCHA problems to show that a superintelligent system has been constructed without having to reveal the system itself.
Model-based Utility Functions
TLDR
This paper argues, via two examples, that the behavior problems can be avoided by formulating the utility function in two steps: inferring a model of the environment from interactions, and computing utility as a function of the Environment model.
The Basic AI Drives
TLDR
This paper identifies a number of “drives” that will appear in sufficiently advanced AI systems of any design and discusses how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity.
Artificial General Intelligence and the Human Mental Model
TLDR
This chapter applies a goal-oriented understanding of intelligence to show that humanity occupies only a tiny portion of the design space of possible minds; and the mental architectures and goals of future superintelligences need not have most of the properties of human minds.
Why Computers Can’t Feel Pain
TLDR
It is shown that conceding the ‘strong AI’ thesis for Q (crediting it with mental states and consciousness) opens the door to a vicious form of panpsychism whereby all open systems must instantiate conscious experience and hence that disembodied minds lurk everywhere.
...
1
2
3
4
5
...