Corpus ID: 13177893

Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces

@article{Mahadevan2014ProximalRL,
  title={Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces},
  author={Sridhar Mahadevan and Bo Liu and Philip S. Thomas and Will Dabney and Stephen Giguere and Nicholas Jacek and Ian M. Gemp and Ji Liu},
  journal={ArXiv},
  year={2014},
  volume={abs/1405.6757}
}
In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms; (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees, and remains in a stable region of the parameter space; and (iii) how to design "off-policy…
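To make the primal-dual framing concrete, here is a worked sketch of how the mean-squared projected Bellman error (MSPBE) minimized by gradient-TD methods turns into a saddle-point problem; the notation (features φ, matrices A, b, C) is the standard gradient-TD convention, assumed here rather than quoted from the paper.

\[
\mathrm{MSPBE}(\theta) = \tfrac{1}{2}\,\lVert b - A\theta \rVert^{2}_{C^{-1}}, \qquad A = \mathbb{E}\big[\phi\,(\phi - \gamma\phi')^{\top}\big], \quad b = \mathbb{E}[r\,\phi], \quad C = \mathbb{E}\big[\phi\,\phi^{\top}\big].
\]

By the Legendre–Fenchel transform of the squared norm,

\[
\tfrac{1}{2}\,\lVert b - A\theta \rVert^{2}_{C^{-1}} = \max_{w}\,\Big( \langle b - A\theta,\; w \rangle - \tfrac{1}{2}\, w^{\top} C\, w \Big),
\]

so minimizing the MSPBE over the primal weights θ becomes a stochastic saddle-point problem, min over θ and max over w, exactly the form that first-order primal-dual (proximal) methods solve with convergence guarantees.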
Citations

Stochastic Primal-Dual Q-Learning
In this work, we present a new model-free and off-policy reinforcement learning (RL) algorithm that is capable of finding a near-optimal policy with state-action observations from arbitrary behavior…
Stochastic Primal-Dual Q-Learning Algorithm For Discounted MDPs
A first-of-its-kind result is proved: SPD Q-learning guarantees a certain convergence rate even when the state-action distribution under a given behavior policy is time-varying but converges sub-linearly to a stationary distribution.
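As background for the primal-dual view of Q-learning, recall that the Bellman optimality condition can be posed as a linear program whose Lagrangian yields a saddle point; the following is the textbook construction for tabular discounted MDPs, not notation taken from the cited paper.

\[
\min_{V}\ \sum_{s} \mu(s)\, V(s) \quad \text{s.t.} \quad V(s) \ \ge\ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \quad \forall\, s, a,
\]

where μ is any distribution with full support. Attaching multipliers λ(s,a) ≥ 0 to the constraints gives a min-max Lagrangian that stochastic primal-dual iterations can optimize directly from sampled transitions, which is the regime in which the time-varying behavior distribution above matters.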
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
The results of the theoretical analysis imply that the GTD family of algorithms are comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to their linear complexity.
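For readers unfamiliar with the GTD family discussed above, the following is a minimal, illustrative sketch of a GTD2-style update, whose saddle-point form is what the proximal-gradient analysis builds on; the function name, step sizes, and NumPy conventions are my own assumptions, not code from the paper.

import numpy as np

def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    # TD error under the current primal parameters theta
    delta = reward + gamma * (phi_next @ theta) - (phi @ theta)
    # Dual weights w track an LMS estimate of C^{-1}(b - A theta)
    w = w + beta * (delta - phi @ w) * phi
    # Primal step along the resulting estimate of the MSPBE gradient
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    return theta, w

d = 4
theta, w = np.zeros(d), np.zeros(d)
phi, phi_next = np.random.rand(d), np.random.rand(d)
theta, w = gtd2_update(theta, w, phi, phi_next, reward=1.0,
                       gamma=0.99, alpha=0.05, beta=0.1)

Both step sizes are typically decayed over time in the convergence analyses; the constants above are purely illustrative.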
A Functional Mirror Descent Perspective on Reinforcement Learning (2020)
Functional mirror descent offers a unifying perspective on optimization of statistical models and provides numerous advantages for the design and analysis of learning algorithms. It brings the…
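For context, the mirror descent update that this perspective lifts to function space is, in its standard finite-dimensional form (a textbook statement, not quoted from the paper):

\[
x_{t+1} = \arg\min_{x}\ \Big\{ \eta_t\, \langle \nabla f(x_t),\, x \rangle + D_{\psi}(x, x_t) \Big\}, \qquad D_{\psi}(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle,
\]

where ψ is a strongly convex mirror map: ψ(x) = ½‖x‖₂² recovers projected gradient descent, while the negative entropy gives multiplicative (exponentiated-gradient) updates well suited to probability simplices such as policies.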
Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD
A primal-dual distributed GTD algorithm is proposed, and it is proved that it almost surely converges to a set of stationary points of the optimization problem.
Primal-Dual Distributed Temporal Difference Learning
The problem of estimating the global value function is converted into a constrained convex optimization problem, and a stochastic primal-dual distributed algorithm is proposed to solve it; the algorithm is proved to converge to a set of solutions of the optimization problem.
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
This paper revisits the Bellman equation and reformulates it into a novel primal-dual optimization problem using Nesterov's smoothing technique and the Legendre–Fenchel transformation, then develops a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem, where any differentiable function class may be used.
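The smoothing step the summary alludes to replaces the hard max in the Bellman operator with an entropy-regularized max; sketched here in my own notation (temperature λ > 0), not the paper's exact symbols:

\[
V^{*}(s) = \max_{\pi(\cdot \mid s)} \sum_{a} \pi(a \mid s)\Big( R(s,a) + \gamma\, \mathbb{E}_{s'}\big[ V^{*}(s') \big] \Big) + \lambda\, H\big(\pi(\cdot \mid s)\big) = \lambda \log \sum_{a} \exp\!\Big( \frac{R(s,a) + \gamma\, \mathbb{E}_{s'}[V^{*}(s')]}{\lambda} \Big).
\]

The right-hand side is smooth for λ > 0 and recovers the usual Bellman optimality equation as λ → 0; the Legendre–Fenchel step then rewrites the squared smoothed Bellman error as a min-max problem over value and policy that admits unbiased stochastic gradients.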
Investigating Practical Linear Temporal Difference Learning
This paper derives two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms, performs an empirical comparison to elicit which of these new linear TD methods should be preferred in different situations, and makes concrete suggestions about practical use.
Proximal Gradient Temporal Difference Learning Algorithms
The results of the theoretical analysis imply that the GTD family of algorithms are comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to their linear complexity.
Non-convex Policy Search Using Variational Inequalities
Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional…

References

Showing 1–10 of 147 references.
The Fixed Points of Off-Policy TD
A novel TD algorithm is proposed that has approximation guarantees even in the case of off-policy sampling and empirically outperforms existing TD methods.
Multiagent learning using a variable learning rate
This article introduces the WoLF principle, “Win or Learn Fast”, for varying the learning rate, and examines this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games.
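As a toy illustration of the principle, the sketch below switches between two step sizes; the helper name and the use of "current policy versus average policy" to define winning follow the common description of WoLF and are my assumptions, not the article's code.

def wolf_step_size(v_current, v_average, delta_win=0.01, delta_lose=0.04):
    # "Win or Learn Fast": the agent is winning when its current policy
    # earns more than its historical average policy, so it adapts slowly;
    # when losing, it uses the larger step size to learn fast.
    return delta_win if v_current > v_average else delta_lose

The invariant delta_win < delta_lose is what the convergence argument in self-play relies on.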
Practical Kernel-Based Reinforcement Learning
An algorithm is presented that turns KBRL into a practical reinforcement learning tool and significantly outperforms other state-of-the-art reinforcement learning algorithms on the tasks studied; upper bounds are derived for the distance between the value functions computed by KBRL and KBSF using the same data.
Algorithms for Reinforcement Learning
This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
Natural actor-critic algorithms
Four new reinforcement learning algorithms based on actor-critic, natural-gradient, and function-approximation ideas are presented along with their convergence proofs; these are the first convergence proofs and the first fully incremental algorithms of this kind.
Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Linear Complementarity for Regularized Policy Evaluation and Improvement
It is demonstrated that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration and permit a form of modified policy iteration that can be used to approximate a "greedy" homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
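As a concrete instance of adaptively modifying the proximal function, here is a minimal sketch of the diagonal AdaGrad update this reference is known for; variable names and the default constants are illustrative assumptions.

import numpy as np

def adagrad_update(theta, grad, accum, lr=0.1, eps=1e-8):
    # Accumulate per-coordinate squared gradients seen so far
    accum = accum + grad ** 2
    # Coordinates with large historical gradients get smaller steps
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum

In effect the proximal term grows along frequently updated coordinates, which is what makes a single global learning rate lr serviceable.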
Risk-Sensitive Reinforcement Learning Applied to Control under Constraints
A model-free, heuristic reinforcement learning algorithm is presented that aims at finding good deterministic policies by weighting the original value function against the risk; it was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column.
Policy Gradient Methods for Reinforcement Learning with Function Approximation
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
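The convergence result rests on the policy gradient theorem established in this reference; in the now-standard notation (discounted state distribution d^π, action values Q^π), it reads:

\[
\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim d^{\pi_{\theta}},\ a \sim \pi_{\theta}(\cdot \mid s)} \Big[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\; Q^{\pi_{\theta}}(s, a) \Big],
\]

and the paper's key step is that the gradient stays exact when Q^{π_θ} is replaced by a compatible function approximator, which is what makes the convergent actor-critic schemes cited above possible.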