# Automatic Risk Adaptation in Distributional Reinforcement Learning

@article{Schubert2021AutomaticRA, title={Automatic Risk Adaptation in Distributional Reinforcement Learning}, author={Frederik Schubert and Theresa Eimer and Bodo Rosenhahn and Marius Thomas Lindauer}, journal={ArXiv}, year={2021}, volume={abs/2106.06317} }

The use of Reinforcement Learning (RL) agents in practical applications requires the consideration of suboptimal outcomes, depending on the familiarity of the agent with its environment. This is especially important in safety-critical environments, where errors can lead to high costs or damage. In distributional RL, the risk sensitivity can be controlled via different distortion measures of the estimated return distribution. However, these distortion functions require an estimate of the risk…
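To make the idea of a distortion measure concrete, the following is a minimal sketch, not the paper's method. It assumes the return distribution of each action is represented by a list of equally weighted quantile estimates (as in quantile-regression-style distributional RL) and uses Conditional Value at Risk (CVaR) as the distortion. The function names, the `alpha` parameter, and the toy numbers are illustrative assumptions.

```python
import math


def cvar_from_quantiles(quantiles, alpha):
    """Average the worst alpha-fraction of equally weighted quantile estimates.

    alpha = 1.0 recovers the plain mean (risk-neutral);
    smaller alpha focuses on the lower tail (more risk-averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantiles)
    # Number of lowest quantiles to keep; always keep at least one.
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def risk_sensitive_action(quantiles_per_action, alpha):
    """Pick the action whose CVaR at level alpha is largest."""
    scores = [cvar_from_quantiles(q, alpha) for q in quantiles_per_action]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical return quantiles for two actions (assumed values).
safe = [1.0, 1.0, 1.0, 1.0]      # low variance, mean 1.0
risky = [-10.0, 4.0, 5.0, 6.0]   # higher mean (1.25) but a bad lower tail

neutral_choice = risk_sensitive_action([safe, risky], alpha=1.0)
averse_choice = risk_sensitive_action([safe, risky], alpha=0.25)
```

With these toy numbers the risk-neutral setting prefers the risky action, while the risk-averse setting prefers the safe one. The level `alpha` is fixed by hand here; choosing it is the kind of risk estimate that the abstract says such distortion functions require.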

## 4 Citations

### CARL: Conditional-value-at-risk Adversarial Reinforcement Learning

- Computer Science, ArXiv
- 2021

It is proved that, at the maximin equilibrium point, the learned policy is CVaR optimal with a risk tolerance explicitly related to the adversary’s budget, and it is shown that solving the CARL game does lead to risk-averse behaviour in a toy grid environment.

### A study of first-passage time minimization via Q-learning in heated gridworlds

- Computer Science, IEEE Access
- 2021

This work extensively studies how a learning agent fares in 1- and 2-dimensional heated gridworlds with an uneven temperature distribution and shows certain bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA and Double Q-learning.

### ACReL: Adversarial Conditional value-at-risk Reinforcement Learning

- Computer Science
- 2021

…as a Stackelberg game, enabling the use of deep RL architectures and training algorithms. Empirical experiments show that ACReL matches a CVaR RL state-of-the-art baseline for retrieving CVaR optimal…

### A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning

- Computer Science, SafeAI@AAAI
- 2022

The proposed approach is a two-player zero-sum game between a policy player and an adversary that perturbs the policy player's state transitions given a finite budget, and it is shown that the closer the players are to the game's equilibrium point, the closer the learned policy is to the CVaR-optimal one with a risk tolerance explicitly related to the adversary's budget.

## References


### Improving Robustness via Risk Averse Distributional Reinforcement Learning

- Computer Science, L4DC
- 2020

This work proposes a risk-aware algorithm for learning robust policies in order to bridge the gap between simulation training and real-world implementation. It builds on the recently introduced distributional RL framework and incorporates the CVaR risk measure into sample-based distributional policy gradients (SDPG) to learn risk-averse policies.

### Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation

- Computer Science, 2021 IEEE International Conference on Robotics and Automation (ICRA)
- 2021

A novel distributional RL algorithm is presented that not only learns an uncertainty-aware policy, but can also change its risk measure without expensive fine-tuning or retraining, and shows superior performance and safety over baselines in partially-observed navigation tasks.

### DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning

- Computer Science
- 2020

This work presents Distributional Soft Actor Critic (DSAC), a new reinforcement learning algorithm that exploits the distributional information of accumulated rewards to achieve better performance, and proposes a unified framework for risk-sensitive learning.

### Distributional Reinforcement Learning with Quantile Regression

- Computer Science, AAAI
- 2018

This paper examines methods of learning the value distribution instead of the value function in reinforcement learning, and presents a novel distributional reinforcement learning algorithm consistent with the theoretical formulation.

### Likelihood Quantile Networks for Coordinating Multi-Agent Reinforcement Learning

- Computer Science, AAMAS
- 2020

This work proposes a decentralized quantile estimator that aims to improve performance by distinguishing non-stationary samples based on the likelihood of returns, and introduces a formal method of calculating differences between return distribution representations, along with methods for using them to guide updates.

### A Distributional Perspective on Reinforcement Learning

- Computer Science, ICML
- 2017

This paper argues for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent, and designs a new algorithm which applies Bellman's equation to the learning of approximate value distributions.

### Safe Reinforcement Learning via Curriculum Induction

- Computer Science, NeurIPS
- 2020

This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor that saves the agent from violating constraints during learning.

### Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

- Computer Science, AAAI
- 2020

This paper presents the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle by relying on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution.

### Risk-Aware Model-Based Control

- Computer Science, Frontiers in Robotics and AI
- 2021

This work proposes Risk-Aware Model-Based Control (RAMCO), a novel MBRL method that combines uncertainty-aware deep dynamics models with the risk assessment technique Conditional Value at Risk (CVaR) and produces superior results on a walking robot model.

### Robust Adversarial Reinforcement Learning

- Computer Science, ICML
- 2017

RARL is proposed, where an agent is trained to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy.