Utility function security in artificially intelligent agents


The notion of ‘wireheading’, or direct reward centre stimulation of the brain, is a well-known concept in neuroscience. In this paper, we examine the corresponding issue of reward (utility) function integrity in artificially intelligent machines. We survey the relevant literature and propose a number of potential solutions to ensure the integrity of our artificial assistants. Overall, we conclude that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead. The related issue of literalness in goal setting also remains largely unsolved, and we suggest that the development of a non-ambiguous knowledge transfer language might be a step in the right direction.

DOI: 10.1080/0952813X.2014.895114

2 Figures and Tables

Cite this paper

@article{Yampolskiy2014UtilityFS,
  title={Utility function security in artificially intelligent agents},
  author={Roman V. Yampolskiy},
  journal={J. Exp. Theor. Artif. Intell.},
  year={2014},
  volume={26},
  pages={373-389}
}