Alignment for Advanced Machine Learning Systems

@inproceedings{Taylor2020AlignmentFA,
  title={Alignment for Advanced Machine Learning Systems},
  author={Jessica Taylor and Eliezer Yudkowsky and Patrick LaVictoire and Andrew Critch},
  year={2020}
}
Abstract: We survey eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? We focus on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions, and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective…
