Corpus ID: 211132423

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

@article{Lan2020MaxminQC,
  title={Maxmin Q-learning: Controlling the Estimation Bias of Q-learning},
  author={Qingfeng Lan and Yangchen Pan and Alona Fyshe and Martha White},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.06487}
}
  • Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
  • Published 2020
  • Computer Science
  • ArXiv
  • Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias interacts with performance, and of the extent to which existing algorithms mitigate bias. In this paper, we 1) highlight that the effect of overestimation bias on learning efficiency is environment-dependent; 2) propose a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias; 3) show theoretically that there exists a parameter choice for Maxmin Q-learning that leads to unbiased estimation with a lower approximation variance than Q-learning; and 4) prove the convergence of our algorithm in the tabular case, as well as the convergence of several previous Q-learning variants, using a novel Generalized Q-learning framework. We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.
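
The abstract names the algorithm but the page does not describe it, so here is a minimal tabular sketch of the Maxmin update as the paper describes it: maintain N independent action-value tables, select actions and bootstrap targets from their elementwise minimum, and update one randomly chosen table per step. The gym-style environment interface (discrete spaces, a four-tuple from env.step) and all hyperparameter values are illustrative assumptions, not taken from the paper.

    import numpy as np

    def maxmin_q_learning(env, n_estimators=2, episodes=500,
                          alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        """Tabular Maxmin Q-learning sketch (assumes a classic gym-style env)."""
        rng = np.random.default_rng(seed)
        n_actions = env.action_space.n
        q = np.zeros((n_estimators, env.observation_space.n, n_actions))
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                q_min = q.min(axis=0)               # Q^min(s, a) = min_i Q_i(s, a)
                if rng.random() < epsilon:          # epsilon-greedy on Q^min
                    action = int(rng.integers(n_actions))
                else:
                    action = int(q_min[state].argmax())
                next_state, reward, done, _ = env.step(action)
                # Bootstrap from the max over actions of the min-combined table.
                target = reward + (0.0 if done else gamma * q_min[next_state].max())
                k = rng.integers(n_estimators)      # update one random table
                q[k, state, action] += alpha * (target - q[k, state, action])
                state = next_state
        return q.min(axis=0)

With n_estimators=1 this reduces to standard Q-learning; larger values push the estimate from over- toward underestimation, which is the bias-control parameter the abstract refers to.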

    Citations

    Publications citing this paper.

    Deep Reinforcement Learning with Weighted Q-Learning (cites background, methods, and results)

    Decorrelated Double Q-learning (cites background and methods)

    References

    Publications referenced by this paper.

    Bias-corrected Q-learning to control max-operator bias in Q-learning

    Double Q-learning

    Historical Best Q-Networks for Deep Reinforcement Learning (highly influential)