Corpus ID: 17141244

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks

  • Rein Houthooft, Xi Chen, Yan Duan, John Schulman, F. Turck, P. Abbeel
  • Published 2016
  • Computer Science, Mathematics
  • ArXiv
  • Scalable and effective exploration remains a key challenge in reinforcement learning (RL). [...] We propose a practical implementation, using variational inference in Bayesian neural networks, which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous…
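The abstract states that VIME modifies the MDP reward function, but the excerpt does not show how. As a rough illustration only, the sketch below assumes the intrinsic bonus is a KL divergence between the Bayesian neural network's weight posterior after and before a transition, assumes that posterior is a diagonal Gaussian, and assumes a scaling factor `eta`. The function names `kl_diag_gaussians` and `augmented_reward` and the default `eta=0.1` are hypothetical and are not taken from the listing above.

```python
import math


def kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old):
    """KL[N(mu_new, sigma_new^2) || N(mu_old, sigma_old^2)] for diagonal Gaussians.

    Assumption: the weight posterior is fully factorized, so the KL
    divergence is a sum of per-weight univariate Gaussian KL terms.
    """
    total = 0.0
    for m1, s1, m0, s0 in zip(mu_new, sigma_new, mu_old, sigma_old):
        total += math.log(s0 / s1) + (s1 ** 2 + (m1 - m0) ** 2) / (2.0 * s0 ** 2) - 0.5
    return total


def augmented_reward(extrinsic_reward, kl_bonus, eta=0.1):
    """Add a scaled curiosity bonus to the environment reward.

    Assumption: eta is a hand-tuned trade-off between exploration
    and exploitation; 0.1 is an illustrative value only.
    """
    return extrinsic_reward + eta * kl_bonus


# Hypothetical posterior parameters before and after observing one transition.
bonus = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])
reward = augmented_reward(1.0, bonus, eta=0.1)
```

Because only the reward signal changes in this sketch, the augmented reward could in principle be passed to any underlying RL algorithm, which is consistent with the abstract's claim that the method is agnostic to that choice.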
    GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms
    • 60
    BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems
    • 70
    Graph networks as learnable physics engines for inference and control
    • 159
    Curiosity Driven Exploration of Learned Disentangled Goal Spaces
    • 53
    Dynamics-Aware Unsupervised Discovery of Skills
    • 34
    Deep Active Inference as Variational Policy Gradients
    • 14
    Learning Actionable Representations with Goal-Conditioned Policies
    • 25

