Addressing the policy-bias of Q-learning by repeating updates

Sherief Abdallah and Michael Kaisers
Q-learning is a popular reinforcement learning algorithm that has been proven to converge to optimal policies in Markov decision processes. However, Q-learning shows artifacts in non-stationary environments. For example, the probability of playing the optimal action may decrease if Q-values deviate significantly from the true values, a situation that may arise in the initial phase of learning as well as after changes in the environment. These artifacts were resolved in the literature by the variant Frequency Adjusted Q…
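The artifact described above can be made concrete with a minimal Python sketch of stateless Q-learning on a hypothetical two-armed bandit. In plain Q-learning an action's Q-value changes only when that action is played, so a value's effective learning speed depends on how often the current policy selects it. The Boltzmann (softmax) exploration policy, the reward values, the initial Q-values, and all helper names (`boltzmann`, `q_update`, `repeated_q_update`, `run`) are assumptions for illustration. `repeated_q_update` is one hypothetical reading of the title's "repeating updates": it applies the same update about 1/π(a) times in closed form, where π(a) is the probability of the chosen action. It is not necessarily the authors' exact rule.

```python
import math
import random


def boltzmann(q_values, temperature):
    """Softmax (Boltzmann) action probabilities; this exploration scheme is an assumption."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def q_update(q, action, reward, alpha, gamma=0.0, next_max=0.0):
    """Standard Q-learning update: Q(a) += alpha * (r + gamma * max Q' - Q(a))."""
    target = reward + gamma * next_max
    q[action] += alpha * (target - q[action])


def repeated_q_update(q, action, reward, prob, alpha, gamma=0.0, next_max=0.0):
    """Hypothetical sketch: the closed form of applying the same update n = 1/prob times.

    After n identical updates toward a fixed target,
    Q(a) = (1 - alpha)**n * Q(a) + (1 - (1 - alpha)**n) * target.
    """
    n = 1.0 / prob
    target = reward + gamma * next_max
    decay = (1.0 - alpha) ** n
    q[action] = decay * q[action] + (1.0 - decay) * target


def run(repeat, steps=200, alpha=0.1, temperature=0.1, seed=0):
    """Simulate a hypothetical bandit and return the final action probabilities."""
    rng = random.Random(seed)
    rewards = [1.0, 0.5]  # hypothetical deterministic rewards: action 0 is optimal
    q = [0.0, 1.0]        # initial Q-values deviate from the true values
    for _ in range(steps):
        probs = boltzmann(q, temperature)
        a = rng.choices(range(len(q)), weights=probs)[0]
        if repeat:
            repeated_q_update(q, a, rewards[a], probs[a], alpha)
        else:
            q_update(q, a, rewards[a], alpha)
    return boltzmann(q, temperature)


if __name__ == "__main__":
    print("plain Q-learning: ", run(repeat=False))
    print("repeated updates: ", run(repeat=True))
```

With `prob=1.0` the repeated update reduces to a single ordinary update, and with `prob=0.5` it matches two consecutive ordinary updates toward the same target. Varying `seed`, `alpha`, or `temperature` is one way to explore how strongly plain Q-learning's update frequency depends on the current policy; no particular outcome is claimed here.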
