Corpus ID: 3178672

# Variance Adjusted Actor Critic Algorithms

@article{Tamar2013VarianceAA,
author={Aviv Tamar and Shie Mannor},
journal={ArXiv},
year={2013},
volume={abs/1310.3697}
}
• Published 2013
• Mathematics, Computer Science
• ArXiv
We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We present an episodic actor-critic algorithm and show that it converges almost surely to a locally optimal point of the objective function.
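As a minimal illustration of the objective, the sketch below assumes the common mean-variance form J(θ) - μ·V(θ), where J is the expected return, V its variance, and μ ≥ 0 a risk-aversion weight. The function name and the plug-in estimate from sampled episode returns are assumptions for illustration, not the paper's actor-critic algorithm.

```python
from statistics import mean, pvariance


def variance_adjusted_return(returns, mu):
    """Plug-in estimate of J - mu * V from sampled episode returns.

    Assumption: the objective has the mean-variance form J - mu * V, with
    J and V estimated by the sample mean and population variance.
    """
    if not returns:
        raise ValueError("need at least one sampled return")
    return mean(returns) - mu * pvariance(returns)
```

Two hypothetical return samples with the same mean show the effect of the penalty at mu = 0.5: `[2, 2, 2]` scores 2.0, while `[0, 4]` scores 2 - 0.5 * 4 = 0.0, so the adjusted objective prefers the lower-variance policy.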

#### Topics from this paper

Variance Penalized On-Policy and Off-Policy Actor-Critic
• Computer Science
• AAAI
• 2021
This work addresses the former source of variability in an RL setup via mean-variance optimization and modifies the standard policy gradient objective to include a direct variance estimator for learning policies that maximize the variance-penalized return.
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
• Computer Science, Mathematics
• Machine Learning
• 2016
This paper considers both discounted and average reward Markov decision processes and devises actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers on the slowest timescale.
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
• Computer Science, Mathematics
• ArXiv
• 2020
This work makes the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criterion, and proposes an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework
• Computer Science
• 2019
This work establishes connections between the entropy-regularized mean-variance (MV) formulation and the classical MV formulation, including the solvability equivalence and the convergence as the exploration weighting parameter decays to zero, and proves a policy improvement theorem, based on which an implementable RL algorithm is devised.
Continuous‐Time Mean–Variance Portfolio Selection: A Reinforcement Learning Framework
This work establishes connections between the entropy-regularized mean-variance (MV) formulation and the classical MV formulation, including the solvability equivalence and the convergence as the exploration weighting parameter decays to zero, and proves a policy improvement theorem, based on which an implementable RL algorithm is devised.
Continuous-Time Mean-Variance Portfolio Optimization via Reinforcement Learning
• Mathematics, Economics
• ArXiv
• 2019
The policy improvement theorem (PIT) leads to an implementable RL algorithm that, by a large margin in nearly all simulations, outperforms both an adaptive control based method that estimates the underlying parameters in real time and a state-of-the-art RL method that uses deep neural networks for continuous control problems.
Reward Constrained Policy Optimization
• Computer Science, Mathematics
• ICLR
• 2019
This work presents a novel multi-timescale approach for constrained policy optimization, called "Reward Constrained Policy Optimization" (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint-satisfying one.
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
• Computer Science, Mathematics
• J. Mach. Learn. Res.
• 2017
This paper derives a formula for computing the gradient of the Lagrangian function for percentile risk-constrained Markov decision processes and devises policy gradient and actor-critic algorithms that estimate this gradient, update the policy in the descent direction, and update the Lagrange multiplier in the ascent direction.
Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
This paper investigates estimating the variance of a temporal-difference learning agent's update target using policy evaluation methods from reinforcement learning, contributing a method significantly simpler than prior methods that independently estimate the second moment of the λ-return.
Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
This paper contributes a method for estimating the variance of the λ-return directly using policy evaluation methods from reinforcement learning; the method is significantly simpler than prior methods that independently estimate the second moment of the λ-return.

#### References

Showing 10 of 26 references
Temporal Difference Methods for the Variance of the Reward To Go
• Mathematics, Computer Science
• ICML
• 2013
This paper proposes variants of both TD(0) and LSTD(λ) with linear function approximation, proves their convergence, and demonstrates their utility in a 4-dimensional continuous state space problem.
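The TD idea for the variance can be sketched in the tabular case. Assuming an undiscounted episodic chain with a deterministic per-state reward r(x), the reward-to-go B satisfies J(x) = r(x) + E[J(x')] and, for the second moment M(x) = E[B^2 | x], M(x) = r(x)^2 + 2 r(x) E[J(x')] + E[M(x')], so V(x) = M(x) - J(x)^2. The sketch below runs TD(0) on both equations with a 1/n step size; the toy chain, the function names, and the tabular (rather than linear function approximation) setting are assumptions for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy chain: state -> (reward, [(next_state, prob), ...]).
# None marks the absorbing terminal state, where J = M = 0.
CHAIN = {
    0: (1.0, [(1, 0.5), (None, 0.5)]),
    1: (2.0, [(None, 1.0)]),
}


def sample_episode(chain, start, rng):
    """Return the (state, reward, next_state) transitions of one episode."""
    transitions, x = [], start
    while x is not None:
        reward, successors = chain[x]
        states, probs = zip(*successors)
        x_next = rng.choices(states, weights=probs)[0]
        transitions.append((x, reward, x_next))
        x = x_next
    return transitions


def td0_variance(chain, start, num_episodes, seed=0):
    """Tabular TD(0) for J(x) = E[B | x] and M(x) = E[B^2 | x]; Var = M - J^2."""
    rng = random.Random(seed)
    J, M, visits = defaultdict(float), defaultdict(float), defaultdict(int)
    for _ in range(num_episodes):
        for x, r, x_next in sample_episode(chain, start, rng):
            visits[x] += 1
            alpha = 1.0 / visits[x]  # decaying step size per state
            j_next = J[x_next] if x_next is not None else 0.0
            m_next = M[x_next] if x_next is not None else 0.0
            delta_j = r + j_next - J[x]  # TD error for the mean
            delta_m = r * r + 2.0 * r * j_next + m_next - M[x]  # for the 2nd moment
            J[x] += alpha * delta_j
            M[x] += alpha * delta_m
    return {x: (J[x], M[x] - J[x] ** 2) for x in J}
```

On this hypothetical chain, B from state 0 is 1 or 3 with equal probability, so `td0_variance(CHAIN, 0, 20000)[0]` should come out close to `(2.0, 1.0)`.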
Actor-Critic Algorithms for Risk-Sensitive MDPs
• Computer Science, Mathematics
• NIPS
• 2013
This paper considers both discounted and average reward Markov decision processes, devises actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction, and establishes the convergence of the algorithms to locally risk-sensitive optimal policies.
Natural Actor-Critic
• Sociology, Computer Science
• ECML
• 2005
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient…
A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
• Computer Science
• IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)
• 2012
The workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, are described, and a review of several standard and natural actor-critic algorithms is given.
Policy Gradients with Variance Related Risk Criteria
• Computer Science, Mathematics
• ICML
• 2012
A framework for local policy-gradient-style reinforcement learning algorithms with variance-related criteria, that is, criteria that involve both the expected cost and the variance of the cost.
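For criteria that mix the mean and the variance, the likelihood-ratio trick gives gradient estimates from sampled trajectories: with score s = ∇ log P(trajectory), ∇J = E[B s], ∇M = E[B^2 s], and ∇V = ∇M - 2J∇J because V = M - J^2. The sketch below applies this to a one-step episode (a hypothetical two-armed bandit with a softmax policy), so the trajectory score reduces to the policy's score. The arm rewards, the function names, and the plug-in use of the sample mean for J (which makes the estimate slightly biased for finite samples) are assumptions for illustration.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def sample_reward(action, rng):
    """Hypothetical arms: arm 0 pays 1; arm 1 pays 0 or 4 with equal probability."""
    if action == 0:
        return 1.0
    return 4.0 if rng.random() < 0.5 else 0.0


def mean_variance_gradient(theta, mu, num_samples, seed=0):
    """Likelihood-ratio estimate of grad(J - mu * V) for one-step episodes."""
    rng = random.Random(seed)
    pi = softmax(theta)
    k = len(theta)
    j_hat, grad_j, grad_m = 0.0, [0.0] * k, [0.0] * k
    for _ in range(num_samples):
        a = rng.choices(range(k), weights=pi)[0]
        b = sample_reward(a, rng)
        # Softmax score: d/d theta_i log pi(a) = 1[i == a] - pi_i.
        score = [(1.0 if i == a else 0.0) - pi[i] for i in range(k)]
        j_hat += b / num_samples
        for i in range(k):
            grad_j[i] += b * score[i] / num_samples
            grad_m[i] += b * b * score[i] / num_samples
    # grad V = grad M - 2 J grad J, using the sample mean as a plug-in for J.
    grad_v = [gm - 2.0 * j_hat * gj for gm, gj in zip(grad_m, grad_j)]
    return [gj - mu * gv for gj, gv in zip(grad_j, grad_v)]
```

At θ = [0, 0], the risk-neutral gradient (mu = 0) is roughly [-0.25, 0.25] and pushes toward the higher-mean risky arm, while mu = 0.5 flips it to roughly [0.25, -0.25], favoring the safe arm.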
Policy Gradient Methods for Reinforcement Learning with Function Approximation
• Mathematics, Computer Science
• NIPS
• 1999
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
TD algorithm for the variance of return and mean-variance reinforcement learning
• Computer Science
• 2001
A TD algorithm for estimating the variance of the return in MDP (Markov decision process) environments and a gradient-based reinforcement learning algorithm for the variance-penalized criterion, a typical criterion in risk-avoiding control, are presented.
Algorithmic aspects of mean-variance optimization in Markov decision processes
• Mathematics, Computer Science
• Eur. J. Oper. Res.
• 2013
It is proved that computing a policy that maximizes the mean reward under a variance constraint is NP-hard in some cases and strongly NP-hard in others.
Simulation-based optimization of Markov reward processes
• Mathematics, Computer Science
• IEEE Trans. Autom. Control.
• 2001
This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters; the algorithm relies on the regenerative structure of finite-state Markov processes.
Neuro-Dynamic Programming
• Computer Science, Economics
• Encyclopedia of Machine Learning
• 1996
From the Publisher: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of…