Is Power-Seeking AI an Existential Risk?

@article{Carlsmith2022IsPA,
  title={Is Power-Seeking AI an Existential Risk?},
  author={Joseph Carlsmith},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.13353}
}

This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire – especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans… 

X-Risk Analysis for AI Research

TLDR
This paper reviews a collection of time-tested concepts from hazard analysis and systems safety, which were designed to steer large processes in safer directions, and discusses how AI researchers can realistically have long-term impacts on the safety of AI systems.

Current and Near-Term AI as a Potential Existential Risk Factor

TLDR
This paper argues that current and near-term artificial intelligence technologies have the potential to contribute to existential risk by acting as intermediate risk factors, and that this potential is not limited to the unaligned AGI scenario.

The alignment problem from a deep learning perspective

Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. This report makes a case for why, without substantial action to

Understanding AI alignment research: A Systematic Analysis

TLDR
This project collected and analyzed existing AI alignment research and found that the dataset is growing quickly, with several sub-fields emerging in parallel, and a classifier trained on AI alignment research articles can detect relevant articles that the authors did not originally include in the dataset.

References

AI Research Considerations for Human Existential Safety (ARCHES)

TLDR
This report examines how technical AI research might be steered in a manner that is more attentive to humanity's long-term prospects for survival as a species, and what existential risks humanity might face from AI development in the next century.

The Basic AI Drives

TLDR
This paper identifies a number of “drives” that will appear in sufficiently advanced AI systems of any design, and discusses how to incorporate these insights into the design of intelligent technology that will lead to a positive future for humanity.

AI safety via debate

TLDR
This work proposes training agents via self-play on a zero-sum debate game, focuses on potential weaknesses as the model scales up, and proposes future human and computer experiments to test these properties.

Artificial Intelligence as a Positive and Negative Factor in Global Risk

By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI. Jacques Monod wrote: "A

Human Compatible: Artificial Intelligence and the Problem of Control

"The most important book I have read in quite some time" (Daniel Kahneman); "A must-read" (Max Tegmark); "The book we've all been waiting for" (Sam Harris) LONGLISTED FOR THE 2019 FINANCIAL TIMES AND

When Will AI Exceed Human Performance? Evidence from AI Experts

TLDR
The results from a large survey of machine learning researchers on their beliefs about progress in AI suggest there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years.

Life 3.0: Being Human in the Age of Artificial Intelligence

New York Times Best Seller How will Artificial Intelligence affect crime, war, justice, jobs, society and our very sense of being human? The rise of AI has the potential to transform our future more

Concrete Problems in AI Safety

TLDR
A list of five practical research problems related to accident risk is presented, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process.

Optimal Policies Tend To Seek Power

TLDR
This work formalizes a notion of power within the context of Markov decision processes, and provides sufficient conditions for when optimal policies tend to seek power over the environment.
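
As a rough sketch of the kind of formalization this points to (simplified, and not necessarily the paper's exact definition), the power of a state s can be taken to be the agent's average optimal value over a distribution D of reward functions, so that states keeping more options open score higher:

    % Sketch: power of state s under reward distribution D and discount gamma
    \mathrm{POWER}_{\mathcal{D}}(s, \gamma) \;:=\;
      \frac{1-\gamma}{\gamma}\,
      \mathbb{E}_{R \sim \mathcal{D}}\!\left[\, V^{*}_{R}(s, \gamma) - R(s) \,\right]

Here V^{*}_{R} denotes the optimal state-value function for reward function R at discount factor gamma. On this reading, "power-seeking" means that, for a wide range of reward distributions D, optimal policies are disproportionately likely to steer toward high-POWER states, such as states from which many outcomes remain reachable.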

Categorizing Variants of Goodhart's Law

TLDR
This paper expands on an earlier discussion by Garrabrant, which notes there are "(at least) four different mechanisms" that relate to Goodhart's Law, and specifies more clearly how they occur.
...