• Publications
Natural actor-critic algorithms
Four new reinforcement learning algorithms based on actor-critic, natural-gradient, and function-approximation ideas are presented, along with convergence proofs; these are the first convergence proofs and the first fully incremental algorithms of this kind.
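The core idea behind these algorithms can be sketched on a toy bandit: with compatible features (the gradient of the log-policy), the natural gradient reduces to the critic's weight vector, so the actor step is simply `theta += alpha * w`. The arm payoffs, step sizes, and baseline below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Toy two-armed bandit sketch of an incremental natural actor-critic update.
rng = np.random.default_rng(0)
true_means = np.array([0.2, 1.0])     # hypothetical arm payoffs

theta = np.zeros(2)                   # softmax policy parameters
w = np.zeros(2)                       # critic weights on compatible features
avg_r = 0.0                           # average-reward baseline

alpha, beta_c, kappa = 0.05, 0.1, 0.01
for _ in range(5000):
    p = np.exp(theta - theta.max()); p /= p.sum()
    a = rng.choice(2, p=p)
    r = true_means[a] + 0.1 * rng.standard_normal()
    phi = -p.copy(); phi[a] += 1.0    # compatible features: grad log pi(a)
    delta = r - avg_r - w @ phi       # advantage-style error
    w += beta_c * delta * phi         # critic update
    avg_r += kappa * (r - avg_r)      # baseline update
    theta += alpha * w                # natural-gradient actor step: theta += alpha * w

p = np.exp(theta - theta.max()); p /= p.sum()
```

Every update is O(number of parameters) per step, which is what "fully incremental" buys over batch methods.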
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
A performance bound is proved for the two versions of the UGapE algorithm, showing that the fixed-budget and fixed-confidence problems are characterized by the same notion of complexity.
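The unifying mechanism is a gap index built from confidence bounds: each arm is scored by its best-case shortfall against the other arms, and sampling focuses on the current candidate and its strongest challenger. The following is a simplified fixed-budget sketch of that idea, not the paper's exact algorithm; the arm means, noise level, and exploration constant are illustrative assumptions.

```python
import numpy as np

def ugape_sketch(means, budget, c=2.0, noise=0.1, seed=0):
    """Simplified UGapE-style fixed-budget best-arm identification.

    `means` are hypothetical true payoffs used only to simulate pulls;
    the algorithm itself only sees sampled rewards.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    means = np.asarray(means, float)
    counts = np.ones(K)
    sums = means + noise * rng.standard_normal(K)      # one initial pull per arm
    B = np.zeros(K)
    for _ in range(K, budget):
        mu = sums / counts
        rad = np.sqrt(c * np.log(budget) / counts)     # confidence radius
        U, L = mu + rad, mu - rad
        # Gap index B[k]: optimistic value of the best other arm minus
        # the pessimistic value of arm k.
        B = np.array([np.max(np.delete(U, k)) - L[k] for k in range(K)])
        best = int(np.argmin(B))
        others = np.delete(np.arange(K), best)
        challenger = others[np.argmax(U[others])]
        arm = best if rad[best] >= rad[challenger] else challenger
        sums[arm] += means[arm] + noise * rng.standard_normal()
        counts[arm] += 1
    return int(np.argmin(B))
```

The fixed-confidence variant reuses the same index and stops once the candidate's gap index is small enough, which is why one complexity measure governs both settings.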
More Robust Doubly Robust Off-policy Evaluation
This paper proposes alternative DR estimators, called more robust doubly robust (MRDR), that learn the model parameters by minimizing the variance of the DR estimator, and proves that the MRDR estimators are strongly consistent and asymptotically optimal.
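For context, the standard DR estimator being made "more robust" combines a reward model with an importance-weighted correction. A minimal sketch for logged contextual-bandit data (function name and array layout are my own conventions, not the paper's):

```python
import numpy as np

def dr_value(rewards, actions, b_probs, t_probs, q_hat):
    """Doubly robust (DR) off-policy value estimate for logged bandit data.

    q_hat[i, a] is a model estimate of the reward of action a in context i;
    MRDR's twist is to fit q_hat by minimizing this estimator's variance
    rather than by regressing on observed rewards.
    """
    idx = np.arange(len(rewards))
    rho = t_probs[idx, actions] / b_probs[idx, actions]   # importance ratios
    direct = (t_probs * q_hat).sum(axis=1)                # model-based term
    correction = rho * (rewards - q_hat[idx, actions])    # keeps it unbiased
    return float((direct + correction).mean())
```

If the model is exact and rewards are noiseless, the correction term vanishes and the estimate equals the model-based value.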
Benchmarking Batch Deep Reinforcement Learning Algorithms
This paper benchmarks the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially trained behavioral policy, and finds that many of these algorithms underperform DQN trained online with the same amount of data.
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Theoretical analysis of gradient TD (GTD) reinforcement learning methods implies that the GTD family of algorithms is comparable to, and may be preferred over, existing least-squares TD methods for off-policy learning, due to their linear complexity.
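The linear-complexity claim is concrete in the update rules themselves: a GTD2-style step touches each weight once, versus the O(d^2) matrix maintenance of least-squares TD. A sketch of one such update (step sizes and the toy test below are my own illustrative choices):

```python
import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One GTD2-style update for off-policy linear TD (a sketch).

    theta: value-function weights; w: auxiliary weights that estimate
    the expected TD error per feature; rho: importance-sampling ratio.
    Both updates cost O(d), hence the linear-complexity advantage.
    """
    delta = reward + gamma * phi_next @ theta - phi @ theta   # TD error
    theta = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ w)
    w = w + beta * rho * (delta - phi @ w) * phi
    return theta, w
```

On a deterministic two-state cycle with reward 1 and gamma = 0.9, the TD fixed point is V = 10 for both states, and the sketch converges there on-policy (rho = 1).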
Bayesian Reinforcement Learning: A Survey
An in-depth review of the role of Bayesian methods in the reinforcement learning (RL) paradigm, and a comprehensive survey of Bayesian RL algorithms and their theoretical and empirical properties.
Incremental Natural Actor-Critic Algorithms
The results extend prior two-timescale convergence results for actor-critic methods by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by providing the first convergence proofs and the first fully incremental algorithms.
High-Confidence Off-Policy Evaluation
This paper proposes an off-policy method for computing a lower confidence bound on the expected return of a policy, and provides confidence guarantees regarding the accuracy of its estimates.
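The basic shape of such a bound: importance-weight each trajectory's return by the likelihood ratio between target and behavior policies, then lower-bound the mean. The paper uses concentration inequalities with distribution-free guarantees; the normal-approximation version below is only an illustrative stand-in, and the function name is my own.

```python
from statistics import NormalDist, mean, stdev

def is_lower_bound(returns, rhos, delta=0.05):
    """Approximate 1 - delta lower bound on a target policy's expected return.

    returns[i]: observed return of trajectory i (behavior policy);
    rhos[i]: its importance ratio (target likelihood / behavior likelihood).
    A z-quantile stands in for the stronger bounds used in the paper.
    """
    x = [r * w for r, w in zip(returns, rhos)]        # IS-weighted returns
    z = NormalDist().inv_cdf(1.0 - delta)
    return mean(x) - z * stdev(x) / len(x) ** 0.5
```

High-variance importance weights widen the bound, which is exactly the failure mode the paper's tighter concentration inequalities are designed to handle.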
High Confidence Policy Improvement
We present a batch reinforcement learning (RL) algorithm that provides probabilistic guarantees about the quality of each policy that it proposes, and which has no hyper-parameters that require expert tuning.
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
This paper derives a formula for computing the gradient of the Lagrangian function for percentile risk-constrained Markov decision processes, and devises policy gradient and actor-critic algorithms that estimate this gradient, update the policy in the descent direction, and update the Lagrange multiplier in the ascent direction.
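The descent/ascent structure can be sketched abstractly: for max J(theta) subject to risk(theta) <= bound, form L(theta, lam) = -J(theta) + lam * (risk(theta) - bound), step theta against L and lam along it, projecting lam onto the nonnegative reals. The helper below is a generic primal-dual sketch with hypothetical names, not the paper's estimators for the percentile-risk gradient.

```python
def primal_dual_step(theta, lam, grad_return, grad_risk, risk, bound, a_th, a_lam):
    """One primal-dual step for a risk-constrained objective (illustrative).

    theta follows the descent direction of the Lagrangian
    L(theta, lam) = -J(theta) + lam * (risk(theta) - bound),
    while the multiplier lam follows its ascent direction, projected
    onto lam >= 0 so the constraint price can never go negative.
    """
    theta = [t + a_th * (g - lam * h)
             for t, g, h in zip(theta, grad_return, grad_risk)]
    lam = max(0.0, lam + a_lam * (risk - bound))
    return theta, lam
```

When the constraint is violated the multiplier rises, penalizing risky policies more heavily; when it is slack, the multiplier decays back toward zero.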