Alignment for Advanced Machine Learning Systems
@inproceedings{Taylor2020AlignmentFA, title={Alignment for Advanced Machine Learning Systems}, author={Jessica Taylor and Eliezer Yudkowsky and Patrick LaVictoire and Andrew Critch}, year={2020} }
We survey eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? We focus on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions, and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective…