Corpus ID: 211259373

Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors

@inproceedings{Duan2020DistributionalSA,
  title={Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors},
  author={Jingliang Duan and Yang Guan and Shengbo Eben Li and Yangang Ren and Bo Cheng},
  year={2020}
}
  • Computer Science, Engineering
  • In current reinforcement learning (RL) methods, function approximation errors are known to lead to overestimated or underestimated Q-values, resulting in suboptimal policies. We show that learning a state-action return distribution function can be used to improve the accuracy of Q-value estimation. We employ the return distribution function within the maximum entropy RL framework to develop what we call the Distributional Soft Actor-Critic (DSAC) algorithm, which…
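
The following is a minimal sketch (not the authors' code) of the core idea described in the abstract: the critic models the state-action return as a distribution rather than a point estimate, and is trained against soft (entropy-regularized) TD targets. The Gaussian parameterization, network sizes, and variable names below are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a distributional soft critic: predicts mean and log-std of the
# return Z(s, a) and is trained by maximizing the likelihood of soft TD targets.
import torch
import torch.nn as nn

class DistributionalCritic(nn.Module):
    """Outputs the mean and log-std of the state-action return distribution."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # [mean, log_std] of the return distribution
        )

    def forward(self, obs, act):
        mean, log_std = self.net(torch.cat([obs, act], dim=-1)).chunk(2, dim=-1)
        return mean, log_std.clamp(-5.0, 2.0)

def critic_loss(critic, batch, target_return):
    """Negative log-likelihood of the soft TD target under the predicted
    Gaussian return distribution; its mean plays the role of the Q-value."""
    mean, log_std = critic(batch["obs"], batch["act"])
    dist = torch.distributions.Normal(mean, log_std.exp())
    return -dist.log_prob(target_return).mean()

# Toy usage with random data (all dimensions and values are arbitrary assumptions).
if __name__ == "__main__":
    obs_dim, act_dim, batch_size, gamma, alpha = 8, 2, 32, 0.99, 0.2
    critic = DistributionalCritic(obs_dim, act_dim)
    batch = {
        "obs": torch.randn(batch_size, obs_dim),
        "act": torch.randn(batch_size, act_dim),
        "rew": torch.randn(batch_size, 1),
        "next_q": torch.randn(batch_size, 1),     # from a target critic
        "next_logp": torch.randn(batch_size, 1),  # log-prob of the next action
    }
    # Soft TD target: reward + discounted (next Q - temperature * log pi).
    target = batch["rew"] + gamma * (batch["next_q"] - alpha * batch["next_logp"])
    loss = critic_loss(critic, batch, target.detach())
    loss.backward()
    print(float(loss))
```

In this sketch the spread of the learned return distribution, rather than a single scalar estimate, carries the information the abstract argues helps mitigate over- and underestimation of Q-values.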
