The AGI Containment Problem

@inproceedings{Babcock2016TheAC,
  title={The AGI Containment Problem},
  author={James Babcock and J{\'a}nos Kram{\'a}r and Roman Yampolskiy},
  booktitle={AGI},
  year={2016}
}
There is considerable uncertainty about what properties, capabilities and motivations future AGIs will have. In some plausible scenarios, AGIs may pose security risks arising from accidents and defects. In order to mitigate these risks, prudent early AGI research teams will perform significant testing on their creations before use. Unfortunately, if an AGI has human-level or greater intelligence, testing itself may not be safe; some natural AGI goal systems create emergent incentives for AGIs… 
Goal Changes in Intelligent Agents
TLDR
A taxonomy of four distinct ways that changes in effective goals may occur in an AGI system, and how measures to mitigate the risk of some types of goal change may exacerbate the risk of others.
Good and safe uses of AI Oracles
TLDR
Two designs for Oracles are presented which, even under pessimistic assumptions, will not manipulate their users into releasing them and yet will still be incentivised to provide their users with helpful answers.
Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures
TLDR
It is suggested that both the frequency and the seriousness of future AI failures will steadily increase and AI Safety can be improved based on ideas developed by cybersecurity experts.
Predicting future AI failures from historic examples
TLDR
It is suggested that both the frequency and the seriousness of future AI failures will steadily increase and the first attempt to assemble a public data set of AI failures is extremely valuable to AI Safety researchers.
Guidelines for Artificial Intelligence Containment
TLDR
A number of guidelines are proposed which should help AI safety researchers develop reliable sandboxing software for intelligent programs of all levels, making it possible to study and analyze intelligent artificial agents while maintaining a certain level of safety.
AI Paradigms and AI Safety: Mapping Artefacts and Techniques to Safety Issues
TLDR
A need is identified for AI safety research to be more explicit about the artefacts and techniques to which a particular issue applies, in order to identify gaps and cover a broader range of issues.
Understanding and Avoiding AI Failures: A Practical Guide
TLDR
This work uses AI safety principles to quantify the unique risks of increased intelligence and humanlike qualities in AI, and identifies where attention should be paid to safety for current generation AI systems.
Unownability of AI: Why Legal Ownership of Artificial Intelligence is Hard
TLDR
It is concluded that it is difficult if not impossible to establish ownership claims over AI models beyond a reasonable doubt.
Concrete Problems in AI Safety
TLDR
A list of five practical research problems related to accident risk is presented, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process.
Human ≠ AGI
TLDR
This paper proves that humans are not general intelligences, and that the widespread implicit assumption of equivalence between the capabilities of AGI and HLAI appears to be unjustified.
...

References

Showing 1-10 of 24 references
The Basic AI Drives
TLDR
This paper identifies a number of “drives” that will appear in sufficiently advanced AI systems of any design and discusses how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity.
Responses to catastrophic AGI risk: a survey
Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next twenty to one hundred years. It has been suggested that AGI may inflict serious damage to human well-being on a global scale ('catastrophic risk').
Corrigendum: Responses to catastrophic AGI risk: a survey (2015 Phys. Scr. 90 018001)
TLDR
It is suggested that AGI may inflict serious damage to human well-being on a global scale (‘catastrophic risk’) and the field's proposed responses to AGI risk are reviewed.
Intelligence Explosion Microeconomics
TLDR
It is proposed that the next step in analyzing positions on the intelligence explosion would be to formalize return on investment curves, so that each stance can formally state which possible microfoundations they hold to be falsified by historical observations.
Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript
TLDR
This work shows that caches can be forced into fast cache eviction to trigger the Rowhammer bug with only regular memory accesses, and demonstrates a fully automated attack that requires nothing but a website with JavaScript to trigger faults on remote hardware.
Leakproofing the Singularity: Artificial Intelligence Confinement Problem
TLDR
This paper attempts to formalize and address the 'leakproofing' of the Singularity problem presented by David Chalmers, and proposes a protocol aimed at making a more secure confinement environment which might delay potential negative effects from the technological singularity while allowing humanity to benefit from the superintelligence.
The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)
TLDR
The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.
Confidentiality Issues on a GPU in a Virtualized Environment
TLDR
The objective is to highlight possible information leakage due to GPUs in virtualized and cloud computing environments, and to provide insight into the different GPU virtualization techniques, along with their security implications.
Artificial Intelligence Approaches for Intrusion Detection
TLDR
After multiple techniques and methodologies are investigated, it is shown that properly trained neural networks are capable of fast recognition and classification of different attacks at a level superior to previous approaches.
Software Random Number Generation Based on Race Conditions
The paper presents a new software strategy for generating true random numbers by creating several threads and letting them compete, unsynchronized, for a shared variable whose resulting value is nondeterministic and can be sampled as a source of entropy.
...