# Variance Adjusted Actor Critic Algorithms

@article{Tamar2013VarianceAA, title={Variance Adjusted Actor Critic Algorithms}, author={Aviv Tamar and Shie Mannor}, journal={ArXiv}, year={2013}, volume={abs/1310.3697} }

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We present an episodic actor-critic algorithm and show that it converges almost surely to a locally optimal point of the objective function.
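As a rough illustration of the objective named in the abstract, the sketch below estimates a variance-adjusted return from sampled episode returns, together with a plain likelihood-ratio gradient estimate. It is not the paper's actor-critic algorithm: there is no critic, no linear function approximation, and no compatible features. The penalized form J - mu * Var, the weight `mu`, the function names, and the `scores` input (the summed gradient of the log-policy over each episode) are all assumptions made for this example.

```python
from statistics import fmean


def variance_adjusted_return(returns, mu):
    """Sample estimate of J - mu * Var from a list of episode returns.

    Assumes the penalized form J - mu * Var, with Var = M - J^2,
    where M is the second moment of the return.
    """
    j_hat = fmean(returns)
    m_hat = fmean([r * r for r in returns])
    return j_hat - mu * (m_hat - j_hat ** 2)


def variance_adjusted_gradient(returns, scores, mu):
    """Likelihood-ratio estimate of the gradient of J - mu * Var.

    scores[i] is a hypothetical input: the sum over episode i of the
    gradient of log pi(a_t | s_t) with respect to the policy parameters.
    """
    n = len(returns)
    dim = len(scores[0])
    j_hat = fmean(returns)
    # grad J ~ mean(R * z) and grad M ~ mean(R^2 * z), per coordinate k.
    grad_j = [sum(r * z[k] for r, z in zip(returns, scores)) / n
              for k in range(dim)]
    grad_m = [sum(r * r * z[k] for r, z in zip(returns, scores)) / n
              for k in range(dim)]
    # Var = M - J^2, hence grad Var = grad M - 2 * J * grad J.
    return [gj - mu * (gm - 2.0 * j_hat * gj)
            for gj, gm in zip(grad_j, grad_m)]
```

Setting `mu = 0` recovers the ordinary expected-return estimate and its gradient; larger values of `mu` penalize policies whose returns vary more across episodes.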

#### 27 Citations

Variance Penalized On-Policy and Off-Policy Actor-Critic

- Computer Science
- AAAI
- 2021

This work addresses the former source of variability in an RL setup via mean-variance optimization and modifies the standard policy gradient objective to include a direct variance estimator for learning policies that maximize the variance-penalized return.

Variance-constrained actor-critic algorithms for discounted and average reward MDPs

- Computer Science, Mathematics
- Machine Learning
- 2016

This paper considers both discounted and average reward Markov decision processes and devises actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers on the slowest timescale.

Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy

- Computer Science, Mathematics
- ArXiv
- 2020

This work makes the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria, and proposes an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.

Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

- Computer Science
- 2019

This work establishes connections between the entropy-regularized MV and the classical MV, including the solvability equivalence and the convergence as exploration weighting parameter decays to zero, and proves a policy improvement theorem, based on which an implementable RL algorithm is devised.

Continuous‐Time Mean–Variance Portfolio Selection: A Reinforcement Learning Framework

- Computer Science
- 2020

This work establishes connections between the entropy-regularized MV and the classical MV, including the solvability equivalence and the convergence as exploration weighting parameter decays to zero, and proves a policy improvement theorem, based on which an implementable RL algorithm is devised.

Continuous-Time Mean-Variance Portfolio Optimization via Reinforcement Learning

- Mathematics, Economics
- ArXiv
- 2019

The policy improvement theorem (PIT) leads to an implementable RL algorithm that outperforms, by a large margin in nearly all simulations, both an adaptive control based method that estimates the underlying parameters in real time and a state-of-the-art RL method that uses deep neural networks for continuous control problems.

Reward Constrained Policy Optimization

- Computer Science, Mathematics
- ICLR
- 2019

This work presents a novel multi-timescale approach for constrained policy optimization, called "Reward Constrained Policy Optimization" (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint-satisfying one.

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2017

This paper derives a formula for computing the gradient of the Lagrangian function for percentile risk-constrained Markov decision processes and devises policy gradient and actor-critic algorithms that estimate such gradient, update the policy in the descent direction, and update the Lagrange multiplier in the ascent direction.

Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods

- Computer Science
- 2018

This paper investigates estimating the variance of a temporal-difference learning agent's update target using policy evaluation methods from reinforcement learning, contributing a method significantly simpler than prior methods that independently estimate the second moment of the λ-return.

Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods

- Computer Science
- ArXiv
- 2018

A method for estimating the variance of the λ-return directly using policy evaluation methods from reinforcement learning is contributed, significantly simpler than prior methods that independently estimate the second moment of the λ-return.

#### References

Temporal Difference Methods for the Variance of the Reward To Go

- Mathematics, Computer Science
- ICML
- 2013

This paper proposes variants of both TD(0) and LSTD(λ) with linear function approximation, proves their convergence, and demonstrates their utility in a 4-dimensional continuous state space problem.

Actor-Critic Algorithms for Risk-Sensitive MDPs

- Computer Science, Mathematics
- NIPS
- 2013

This paper considers both discounted and average reward Markov decision processes and devises actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction, and establishes the convergence of the algorithms to locally risk-sensitive optimal policies.

Natural Actor-Critic

- Sociology, Computer Science
- ECML
- 2005

This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient…

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

- Computer Science
- IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)
- 2012

The workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, are described, and a review of several standard and natural actor-critic algorithms is given.

Policy Gradients with Variance Related Risk Criteria

- Computer Science, Mathematics
- ICML
- 2012

A framework for local policy-gradient-style reinforcement learning algorithms for variance-related criteria, that is, criteria that involve both the expected cost and the variance of the cost.

Policy Gradient Methods for Reinforcement Learning with Function Approximation

- Mathematics, Computer Science
- NIPS
- 1999

This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

TD algorithm for the variance of return and mean-variance reinforcement learning

- Computer Science
- 2001

A TD algorithm for estimating the variance of return in MDP (Markov decision process) environments and a gradient-based reinforcement learning algorithm on the variance-penalized criterion, which is a typical criterion in risk-avoiding control, are presented.

Algorithmic aspects of mean-variance optimization in Markov decision processes

- Mathematics, Computer Science
- Eur. J. Oper. Res.
- 2013

It is proved that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others.

Simulation-based optimization of Markov reward processes

- Mathematics, Computer Science
- IEEE Trans. Autom. Control.
- 2001

This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters and relies on the regenerative structure of finite-state Markov processes.

Neuro-Dynamic Programming

- Computer Science, Economics
- Encyclopedia of Machine Learning
- 1996

From the Publisher:
This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of…