Natural actor-critic algorithms
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
A performance bound is proved for the two versions of the UGapE algorithm showing that the two problems are characterized by the same notion of complexity.
Benchmarking Batch Deep Reinforcement Learning Algorithms
This paper benchmarks the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy, and finds that many of these algorithms underperform DQN trained online with the same amount of data.
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
- Yinlam Chow, M. Ghavamzadeh, Lucas Janson, M. Pavone
- Computer Science · Journal of Machine Learning Research
- 5 December 2015
This paper derives a formula for computing the gradient of the Lagrangian function for percentile risk-constrained Markov decision processes and devises policy gradient and actor-critic algorithms that estimate this gradient, update the policy in the descent direction, and update the Lagrange multiplier in the ascent direction.
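The descent/ascent pattern described in this summary can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a hypothetical deterministic objective f(theta) = (theta - 3)^2 with the constraint g(theta) = theta - 1 <= 0, in place of a percentile risk-constrained MDP, and uses exact gradients rather than sampled estimates. The function name `primal_dual` and the step sizes are illustrative choices.

```python
def primal_dual(steps=2000, lr_theta=0.05, lr_lam=0.05):
    """Gradient descent on theta and projected gradient ascent on lam
    for the Lagrangian L(theta, lam) = (theta - 3)^2 + lam * (theta - 1)."""
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        grad_theta = 2.0 * (theta - 3.0) + lam  # dL/dtheta
        grad_lam = theta - 1.0                  # dL/dlam, i.e. the constraint value g(theta)
        # Simultaneous update: descend in theta, ascend in lam,
        # projecting lam back onto the nonnegative reals.
        theta, lam = (
            theta - lr_theta * grad_theta,
            max(0.0, lam + lr_lam * grad_lam),
        )
    return theta, lam


theta, lam = primal_dual()
```

On this toy problem the iterates approach the constrained optimum theta = 1 with multiplier lam = 4, the point where the gradient of the Lagrangian with respect to theta vanishes on the constraint boundary.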
More Robust Doubly Robust Off-policy Evaluation
- Mehrdad Farajtabar, Yinlam Chow, M. Ghavamzadeh
- Computer Science · International Conference on Machine Learning
- 10 February 2018
This paper proposes alternative DR estimators, called more robust doubly robust (MRDR), that learn the model parameter by minimizing the variance of the DR estimator, and proves that the MRDR estimators are strongly consistent and asymptotically optimal.
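For context, the standard doubly robust (DR) estimator that MRDR builds on can be sketched for the contextual-bandit case as follows. This is a minimal illustration under stated assumptions, not the paper's method: the function name `doubly_robust` and the argument layout are hypothetical, and the reward model `q_hat` is taken as given, whereas the paper's contribution concerns how that model is fit (by minimizing the variance of the DR estimator).

```python
def doubly_robust(rewards, actions, behavior_probs, target_probs, q_hat):
    """Standard DR value estimate for a target policy from logged bandit data.

    rewards[i]        -- observed reward for sample i
    actions[i]        -- index of the logged action for sample i
    behavior_probs[i] -- probability the behavior policy gave the logged action
    target_probs[i]   -- list of target-policy probabilities over all actions
    q_hat[i]          -- list of model-predicted rewards over all actions
    """
    total = 0.0
    for r, a, mu, pi, q in zip(rewards, actions, behavior_probs, target_probs, q_hat):
        # Model-based (direct method) term: expected predicted reward under the target policy.
        direct = sum(p * v for p, v in zip(pi, q))
        # Importance-weighted correction using the model's residual on the logged action.
        weight = pi[a] / mu
        total += direct + weight * (r - q[a])
    return total / len(rewards)


# Two logged samples, two actions; the target policy always picks action 0.
value = doubly_robust(
    rewards=[1.0, 0.0],
    actions=[0, 1],
    behavior_probs=[0.5, 0.5],
    target_probs=[[1.0, 0.0], [1.0, 0.0]],
    q_hat=[[0.0, 0.0], [0.0, 0.0]],
)
```

With an all-zero reward model the estimator reduces to plain importance sampling, and with a model that matches the logged rewards exactly the correction term vanishes, leaving only the direct term.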
Bayesian Reinforcement Learning: A Survey
- M. Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar
- Computer Science · Found. Trends Mach. Learn.
- 18 November 2015
An in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm, and a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.
A Lyapunov-based Approach to Safe Reinforcement Learning
- Yinlam Chow, Ofir Nachum, Edgar A. Duéñez-Guzmán, M. Ghavamzadeh
- Computer Science · Neural Information Processing Systems
- 1 May 2018
This work defines and presents a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints.
High-Confidence Off-Policy Evaluation
- P. Thomas, Georgios Theocharous, M. Ghavamzadeh
- Computer Science, Political Science · AAAI Conference on Artificial Intelligence
- 25 January 2015
This paper proposes an off-policy method for computing a lower confidence bound on the expected return of a policy and provides confidence levels for the accuracy of its estimates.
Incremental Natural Actor-Critic Algorithms
The results extend prior two-timescale convergence results for actor-critic methods by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by providing the first convergence proofs and the first fully incremental algorithms.
Finite-Sample Analysis of Proximal Gradient TD Algorithms
- Bo Liu, Ji Liu, M. Ghavamzadeh, S. Mahadevan, Marek Petrik
- Computer Science · Conference on Uncertainty in Artificial…
- 12 July 2015
Theoretical analysis of gradient TD (GTD) reinforcement learning methods implies that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least squares TD methods for off-policy learning, due to its linear complexity.