We present a reinforcement learning algorithm with total regret $O(DS\sqrt{AT})$ after $T$ steps for any unknown MDP with $S$ states, $A$ actions per state, and diameter $D$.

We present a learning algorithm for undiscounted reinforcement learning that achieves logarithmic online regret in the number of steps taken with respect to an optimal policy.

We propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improved rates for one-dimensional continuum-armed bandit problems.

We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time.

We consider reinforcement learning in changing Markov decision processes where both the state-transition probabilities and the reward functions may vary over time.

We consider the restless bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions.

We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.