Corpus ID: 211126609

RL agents Implicitly Learning Human Preferences

@article{Wichers2020RLAI,
  title={RL agents Implicitly Learning Human Preferences},
  author={Nevan Wichers},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.06137}
}
In the real world, RL agents should be rewarded for fulfilling human preferences. We show that RL agents implicitly learn the preferences of humans in their environment. A classifier trained to predict whether a simulated human's preferences are fulfilled achieves 0.93 AUC when given the activations of an RL agent's neural network, but only 0.8 AUC when trained on the raw environment state. Training the classifier on the RL agent's activations also does much better than training on…
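The comparison described in the abstract can be sketched as a probing experiment: fit one classifier on an agent's hidden activations and another on the raw environment state, then compare held-out AUC. The sketch below uses synthetic stand-in data and scikit-learn; the names `activations` and `raw_state`, and the signal strengths, are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical probe-classifier sketch: compare how well preference
# fulfillment can be decoded from agent activations vs. raw state.
# All data is synthetic; only the experimental structure follows the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# 1 = simulated human's preference was fulfilled
labels = rng.integers(0, 2, size=n)

# Synthetic stand-ins: the activations carry a stronger label signal
# than the raw state, mimicking the gap the abstract reports.
activations = labels[:, None] * 0.3 + rng.normal(size=(n, 64))
raw_state = labels[:, None] * 0.1 + rng.normal(size=(n, 32))

def probe_auc(features, labels):
    """Train a linear probe and return its held-out AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

auc_act = probe_auc(activations, labels)
auc_raw = probe_auc(raw_state, labels)
print(f"activations AUC: {auc_act:.2f}, raw-state AUC: {auc_raw:.2f}")
```

A higher AUC for the activation probe is the signature result: it suggests the agent's network has internalized features predictive of the human's preferences even though it was never trained to report them.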

References

Deep Reinforcement Learning from Human Preferences
Modeling Others using Oneself in Multi-Agent Reinforcement Learning
Visualizing and Understanding Atari Agents
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
Machine Theory of Mind
Solving Rubik's Cube with a Robot Hand (OpenAI: I. Akkaya et al., ArXiv, 2019)
Graying the black box: Understanding DQNs
Mirror neurons and the simulation theory of mind-reading
Deep Transfer Learning: A new deep learning glitch classification method for advanced LIGO