Thinking Inside the Box: Controlling and Using an Oracle AI

@article{Armstrong2012ThinkingIT,
  title={Thinking Inside the Box: Controlling and Using an Oracle AI},
  author={Stuart Armstrong and Anders Sandberg and Nick Bostrom},
  journal={Minds and Machines},
  year={2012},
  volume={22},
  pages={299--324}
}
There is no strong reason to believe that human-level intelligence represents an upper limit of the capacity of artificial intelligence, should it be realized. This poses serious safety issues, since a superintelligent system would have great power to direct the future according to its possibly flawed motivation system. Solving this issue in general has proven to be considerably harder than expected. This paper looks at one particular approach, Oracle AI. An Oracle AI is an AI that does not act… 
Risks and Mitigation Strategies for Oracle AI
TLDR
OAIs are still strictly safer than general AIs, and many extra layers of precaution can be added on top; this paper looks at some of these and analyses their weaknesses.
Good and safe uses of AI Oracles
TLDR
Two designs for Oracles are presented which, even under pessimistic assumptions, will not manipulate their users into releasing them, yet will still be incentivised to provide helpful answers.
Low Impact Artificial Intelligences
TLDR
The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction.
Chess as a Testing Grounds for the Oracle Approach to AI Safety
TLDR
This paper proposes a possibly practical means of using machine learning to create two classes of narrow AI oracles that would provide chess advice: those aligned with the player's interest, and those that want the player to lose and give deceptively bad advice.
Decision Support for Safe AI Design
TLDR
This paper shows that the most probable finite stochastic program to explain a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions.
Two arguments against human-friendly AI
TLDR
It is argued that, if humanity is capable of developing AGI, it ought to be developed with impartial, species-neutral values rather than values that prioritize friendliness to humans above all else.
Asymptotically Unambitious Artificial General Intelligence
TLDR
This work identifies an exception to the Instrumental Convergence Thesis, which holds, roughly, that by default an AGI would seek power, including power over us; the agent's "unambitiousness" includes not seeking arbitrary power.
On Controllability of AI
TLDR
Consequences of the uncontrollability of AI are discussed with respect to the future of humanity, AI research, and AI safety and security.
Diminishing Returns and Recursive Self Improving Artificial Intelligence
In this chapter we will examine in more detail the concept of an artificial intelligence that can improve upon itself, and show how that might not be as problematic as some researchers think…
Editorial: Risks of Artificial Intelligence
If the intelligence of artificial systems were to surpass that of humans significantly, this would constitute a significant risk for humanity. The time has come to consider these issues, and this…

References

Artificial Intelligence as a Positive and Negative Factor in Global Risk
By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI. Jacques Monod wrote: "A curious aspect of the theory of evolution is that everybody thinks he understands it."
The Basic AI Drives
TLDR
This paper identifies a number of “drives” that will appear in sufficiently advanced AI systems of any design and discusses how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity.
The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents
  • N. Bostrom
  • Psychology, Computer Science
    Minds and Machines
  • 2012
TLDR
The relation between intelligence and motivation in artificial agents is discussed, developing and briefly arguing for two theses that help understand the possible range of behavior of superintelligent agents and point to some potential dangers in building such an agent.
Artificial Intelligence: A Modern Approach
The long-anticipated revision of this #1 selling book offers the most comprehensive, state-of-the-art introduction to the theory and practice of artificial intelligence for modern applications.
The Singularity: a Philosophical Analysis
What happens when machines become more intelligent than humans? One view is that this event will be followed by an explosion to ever-greater levels of intelligence, as each generation of machines…
Thinking About Foreign Policy: Finding an Appropriate Role for Artificially Intelligent Computers
The growing complexity of the foreign-policy conundrum has spawned a tremendous increase in the information available to support decisions without a commensurate increase in the ability to…
Speculations Concerning the First Ultraintelligent Machine
  • I. Good
  • Computer Science
    Adv. Comput.
  • 1965
TLDR
The subassembly theory sheds light on the physical embodiment of memory and meaning, and there can be little doubt that both need embodiment in an ultraintelligent machine.
Ontological Crises in Artificial Agents' Value Systems
TLDR
This paper discusses which sorts of agents will undergo ontological crises and why we may want to create such agents, and argues that a well-defined procedure for resolving ontological crises is needed.
Economic Implications of Software Minds
Economic growth has so far come from human minds. The future could bring software minds: AIs designed from scratch, or human brains transferred to computer hardware. Such minds could substitute for…
Probing the improbable: methodological challenges for risks with low probabilities and high stakes
TLDR
It is argued that important new methodological problems arise when assessing global catastrophic risks; the focus is a problem regarding probability estimation, which differs from the related distinction between model and parameter uncertainty.