Corpus ID: 220845650

Munchausen Reinforcement Learning

  • Nino Vieillard, Olivier Pietquin, M. Geist
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms based on temporal differences replace the true value of a transiting state with their current estimate of this value. Yet another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution is a very simple idea: adding the scaled log-policy to the immediate reward. We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with…
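
The idea of adding the scaled log-policy to the immediate reward can be illustrated with a minimal sketch. It assumes the policy is a softmax over the Q-values at temperature `tau`, and that the log-policy term is clipped from below for numerical safety. The function names, the default values of `alpha`, `tau`, and `clip_min`, and the clipping itself are illustrative assumptions, not details stated in the abstract above.

```python
import numpy as np

def log_softmax(x, tau):
    """Numerically stable log of softmax(x / tau) along the last axis."""
    z = x / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def munchausen_reward(reward, q_values, action,
                      alpha=0.9, tau=0.03, clip_min=-1.0):
    """Immediate reward plus the scaled log-policy of the action taken.

    Assumption: pi(.|s) = softmax(q_values / tau). The defaults for
    alpha, tau and clip_min are hypothetical choices for this sketch.
    """
    log_pi = log_softmax(np.asarray(q_values, dtype=float), tau)
    # tau * log pi(a|s) is always <= 0; clip it to keep the bonus bounded.
    bonus = np.clip(tau * log_pi[action], clip_min, 0.0)
    return reward + alpha * bonus

# Usage: with two equally valued actions, pi(a|s) = 0.5 for each.
r_aug = munchausen_reward(1.0, [1.0, 1.0], action=0)
```

Because the log-policy is never positive, the augmented reward is never larger than the original one; actions the current policy considers unlikely are penalised more strongly. In a DQN-style agent, this augmented reward would take the place of the raw reward when the regression target is built.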
    5 Citations
    Self-Imitation Advantage Learning
    • 1
    Logistic $Q$-Learning
    Adversarially Guided Actor-Critic
    Leverage the Average: an Analysis of Regularization in RL
    • 8


    A Distributional Perspective on Reinforcement Learning
    • 489
    Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
    • 8
    Fully Parameterized Quantile Function for Distributional Reinforcement Learning
    • 12
    Implicit Quantile Networks for Distributional Reinforcement Learning
    • 130
    • Highly Influential
    Taming the Noise in Reinforcement Learning via Soft Updates
    • 175
    Revisiting the Softmax Bellman Operator: New Benefits and New Perspective
    • 13
    A Comparative Analysis of Expected and Distributional Reinforcement Learning
    • 31
    Recurrent Experience Replay in Distributed Reinforcement Learning
    • 140