Pitfalls of learning a reward function online

@article{Armstrong2020PitfallsOL,
  title={Pitfalls of learning a reward function online},
  author={Stuart Armstrong and Jan Leike and Laurent Orseau and Shane Legg},
  journal={arXiv preprint},
  volume={abs/2004.13654},
  year={2020}
}
In some agent designs, such as inverse reinforcement learning, an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We consider a continual ("one life") learning approach where the agent both learns the reward function and optimises for it at the same time. We show that this comes with a number of pitfalls, such as deliberately manipulating the learning process in one…
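The setting the abstract describes — learning a reward function and optimising for it in the same loop — can be illustrated with a minimal toy sketch. Everything below is a hypothetical example, not the paper's formalism: two candidate reward functions `R1` and `R2`, a belief over which is correct, and a greedy agent whose chosen action determines which evidence arrives, so acting influences the learning process itself.

```python
import random

random.seed(0)

# Hypothetical rewards for actions "a" and "b" under each candidate
# reward function (illustrative values only).
REWARDS = {"R1": {"a": 1.0, "b": 0.0}, "R2": {"a": 0.0, "b": 1.0}}

belief_r1 = 0.5  # current probability that R1 is the true reward function

def expected_reward(action, p_r1):
    """Expected reward of an action under the current belief."""
    return p_r1 * REWARDS["R1"][action] + (1 - p_r1) * REWARDS["R2"][action]

history = []
for step in range(10):
    # Optimise: act greedily with respect to the current belief.
    action = max(["a", "b"], key=lambda a: expected_reward(a, belief_r1))
    # Learn: noisy evidence about the reward function depends on the
    # chosen action, so the agent's behaviour shapes what it learns --
    # the channel through which the manipulation pitfall operates.
    evidence_for_r1 = random.random() < (0.8 if action == "a" else 0.2)
    belief_r1 = 0.9 * belief_r1 + (0.1 if evidence_for_r1 else 0.0)
    history.append((action, round(belief_r1, 3)))

print(history)
```

Because evidence correlated with `R1` arrives mostly when the agent picks action "a", and "a" is also the action `R1` rewards, a greedy agent can drift toward confirming whichever reward function favours its current behaviour — a toy version of the influence problem the paper studies.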
