Corpus ID: 57573688

Personal Universes: A Solution to the Multi-Agent Value Alignment Problem

@article{Yampolskiy2019PersonalUA,
  title={Personal Universes: A Solution to the Multi-Agent Value Alignment Problem},
  author={Roman V Yampolskiy},
  journal={ArXiv},
  year={2019},
  volume={abs/1901.01851}
}
AI Safety researchers attempting to align the values of highly capable intelligent systems with those of humanity face a number of challenges, including personal value extraction, multi-agent value merger, and finally in-silico encoding. State-of-the-art research in value alignment shows difficulties at every stage of this process, but the merger of incompatible preferences is a particularly difficult challenge to overcome. In this paper we assume that the value extraction problem will be solved and…
Augmented Utilitarianism for AGI Safety
TLDR
A novel socio-technological ethical framework denoted Augmented Utilitarianism is proposed which directly alleviates the perverse instantiation problem, and it is elaborated how, augmented by AI and more generally by science and technology, it might allow a society to craft and update ethical utility functions while jointly undergoing dynamic ethical enhancement.
Impossibility Results in AI: A Survey
TLDR
This paper categorizes impossibility theorems applicable to the domain of AI into five categories: deduction, indistinguishability, induction, tradeoffs, and intractability, and concludes that deductive impossibilities deny 100% guarantees for security.
AI Ethics and Value Alignment for Nonhuman Animals
TLDR
This article focuses on two subproblems, value extraction and value aggregation, discusses challenges for integrating the values of nonhuman animals, and explores approaches by which AI systems could address them.
An Overview of Artificial General Intelligence: Recent Developments and Future Challenges
TLDR
This study provides an analysis of what AGI safety scholars have written about the essence of human beliefs and proposes several well-supported hypotheses indicating that the character of human belief is difficult to describe and that a few meta-level theories are needed.
Do No Harm Policy for Minds in Other Substrates
Various authors have argued that in the future not only will it be technically feasible for human minds to be transferred to other substrates, but this will become, for most humans, the preferred…
Unexplainability and Incomprehensibility of Artificial Intelligence
TLDR
This paper describes two complementary impossibility results (Unexplainability and Incomprehensibility), showing that advanced AIs would not be able to accurately explain some of their decisions and that, for the decisions they could explain, people would not understand some of those explanations.
XR for Augmented Utilitarianism
TLDR
This short paper presents a compact review of how XR technologies could leverage the underlying transdisciplinary AI governance approach utilizing the AU framework and outlines pertinent needs for XR in two related contexts.
On Controllability of AI
TLDR
Consequences of the uncontrollability of AI are discussed with respect to the future of humanity and research on AI, AI safety, and security.
Axes for Sociotechnical Inquiry in AI Research
TLDR
A lexicon for sociotechnical inquiry is provided and illustrated through the example of consumer drone technology, and four directions for inquiry into new and evolving areas of technological development are proposed.
Simulation Typology and Termination Risks
TLDR
All types of the most probable simulations except resurrectional simulations are prone to termination risks in a relatively short time frame of hundreds of years or less from now.

References

Showing 1-10 of 112 references
Utility function security in artificially intelligent agents
TLDR
It is concluded that wireheading in rational self-improving optimisers above a certain capacity remains an unsolved problem, despite the opinion of many that such machines will choose not to wirehead.
Using Stories to Teach Human Values to Artificial Agents
TLDR
It is hypothesized that an artificial intelligence that can read and understand stories can learn the values tacitly held by the culture from which the stories originate.
Aligning Superintelligence with Human Interests: A Technical Research Agenda
TLDR
It is essential to use caution when developing AI systems that can exceed human levels of general intelligence, or that can facilitate the creation of such systems.
AI-Complete CAPTCHAs as Zero Knowledge Proofs of Access to an Artificially Intelligent System
TLDR
This paper proposes a method based on the combination of zero knowledge proofs and provably AI-complete CAPTCHA problems to show that a superintelligent system has been constructed without having to reveal the system itself.
Artificial Superintelligence: A Futuristic Approach
TLDR
Artificial Superintelligence: A Futuristic Approach is designed to become a foundational text for the new science of AI safety engineering and should be an invaluable resource for AI researchers and students, computer security researchers, futurists, and philosophers.
AGI Safety Literature Review
TLDR
The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety, and to review the current public policy on AGI.
Mimetic vs Anchored Value Alignment in Artificial Intelligence
TLDR
This paper isolates two distinct forms of VA, "mimetic" and "anchored", and discusses which VA approach better avoids the naturalistic fallacy, revealing stumbling blocks for VA approaches that neglect the implications of that fallacy.
Thinking Inside the Box: Controlling and Using an Oracle AI
TLDR
This paper analyzes and critiques various methods of controlling the AI, and suggests that an Oracle AI might be safer than an unrestricted AI but still remains potentially dangerous.
Responsible Artificial Intelligence: Designing Ai for Human Values
TLDR
The impact of AI is explored through the case of its expected effects on the European labor market, and the accountability, responsibility, and transparency (ART) design principles are proposed for the development of AI systems that are sensitive to human values.
Artificial General Intelligence
TLDR
The AGI containment problem is surveyed – the question of how to build a container in which tests can be conducted safely and reliably, even on AGIs with unknown motivations and capabilities that could be dangerous.