• Corpus ID: 234790445

# A Stochastic Composite Augmented Lagrangian Method For Reinforcement Learning

@article{Li2021ASC,
  title={A Stochastic Composite Augmented Lagrangian Method For Reinforcement Learning},
  author={Yongfeng Li and Mingming Zhao and Weijie Chen and Zaiwen Wen},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.09716}
}
• Published 20 May 2021
• Computer Science
• ArXiv
In this paper, we consider the linear programming (LP) formulation for deep reinforcement learning. The number of constraints depends on the sizes of the state and action spaces, which makes the problem intractable in large or continuous environments. The general augmented Lagrangian method suffers from the double-sampling obstacle in solving the LP. Namely, the conditional expectations originating from the constraint functions and the quadratic penalties in the augmented Lagrangian function impose…
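The abstract's double-sampling obstacle can be made concrete with the standard LP formulation of a discounted MDP. The sketch below uses our own notation (initial distribution μ, discount γ, penalty parameter ρ, multipliers λ) and is not taken from the paper itself:

```latex
% Standard LP over the value function V (notation ours):
\min_{V} \;(1-\gamma)\sum_{s}\mu(s)\,V(s)
\quad\text{s.t.}\quad
V(s)\;\ge\; r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}\!\big[V(s')\big]
\qquad \forall (s,a).

% A generic augmented Lagrangian adds a quadratic penalty on each constraint:
\mathcal{L}_{\rho}(V,\lambda)
= (1-\gamma)\sum_{s}\mu(s)\,V(s)
+ \frac{\rho}{2}\sum_{s,a}
\Big(\Big[\,r(s,a) + \gamma\,\mathbb{E}_{s'}\!\big[V(s')\big] - V(s)
+ \tfrac{\lambda(s,a)}{\rho}\Big]_{+}\Big)^{2}
- \sum_{s,a}\frac{\lambda(s,a)^{2}}{2\rho}.
```

Because the conditional expectation $\mathbb{E}_{s'}[V(s')]$ sits *inside* a square, replacing it with a single sampled next state yields a biased gradient estimate (the bias involves $\gamma^{2}\,\mathrm{Var}_{s'}[V(s')]$); an unbiased estimate would require two independent draws $s'_1, s'_2$ from the same $(s,a)$, which is the double-sampling obstacle the paper addresses.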
## 1 Citation

• Computer Science
• ArXiv
• 2022
A near-optimal primal-dual learning algorithm called DPDL is proposed that provably guarantees zero constraint violation, and its sample complexity matches the known lower bound up to an Õ((1 − γ)⁻¹) factor.
