Corpus ID: 218487654

Improving Robustness via Risk Averse Distributional Reinforcement Learning

@article{Singh2020ImprovingRV,
  title={Improving Robustness via Risk Averse Distributional Reinforcement Learning},
  author={Rahul Singh and Qinsheng Zhang and Yongxin Chen},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.00585}
}
One major obstacle to the success of reinforcement learning in real-world applications is the lack of robustness of the trained policies, whether to model uncertainties or to external disturbances. Robustness is especially critical when policies are trained in simulation rather than in the real-world environment. In this work, we propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation. Our algorithm is based on…
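
One common way to make a distributional RL agent risk-averse is to optimize a tail-risk measure such as Conditional Value-at-Risk (CVaR) of the learned return distribution rather than its mean. Below is a minimal sketch of estimating CVaR from return samples; the function name and the sampled-return interface are illustrative assumptions, not code from the paper.

import numpy as np

def cvar_from_samples(return_samples, alpha=0.1):
    """Estimate CVaR at level alpha: the mean of the worst alpha-fraction of returns."""
    sorted_returns = np.sort(return_samples)                # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))   # size of the lower tail
    return sorted_returns[:k].mean()

# A risk-averse agent maximizes CVaR_alpha of the critic's return distribution
# instead of the mean return.
samples = np.random.normal(loc=1.0, scale=2.0, size=1000)   # stand-in for critic samples
print("mean:", samples.mean(), "CVaR_0.1:", cvar_from_samples(samples, 0.1))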

Citations

Automatic Risk Adaptation in Distributional Reinforcement Learning
TLDR: This work demonstrates the suboptimality of static risk-level estimation and proposes a method to dynamically select risk levels at each environment step, reducing failure rates and improving generalization performance compared to both risk-aware and risk-agnostic agents in several locomotion environments.
Robust Reinforcement Learning with Distributional Risk-averse formulation
TLDR: This paper approximates robust reinforcement learning constrained with a Φ-divergence using an approximate risk-averse formulation and shows that the classical reinforcement learning objective can be robustified using standard deviation penalization of the objective.
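
The standard deviation penalization mentioned above corresponds, roughly, to the following mean-variance style objective (written in my own notation, not necessarily the paper's):

\[
\max_{\pi} \;\; \mathbb{E}_{\pi}\!\left[ Z^{\pi} \right] \;-\; \lambda \sqrt{\operatorname{Var}_{\pi}\!\left[ Z^{\pi} \right]},
\]

where \(Z^{\pi}\) is the random return under policy \(\pi\) and \(\lambda \ge 0\) sets the degree of risk aversion; \(\lambda = 0\) recovers the classical risk-neutral objective.
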
Risk-Averse Offline Reinforcement Learning
TLDR: The Offline Risk-Averse Actor-Critic (O-RAAC), a model-free RL algorithm able to learn risk-averse policies in a fully offline setting, is presented, and it is demonstrated empirically that, in the presence of natural distribution shifts, O-RAAC learns policies with good average performance.
Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation
TLDR: A novel distributional RL algorithm is presented that not only learns an uncertainty-aware policy but can also change its risk measure without expensive fine-tuning or retraining, and it shows superior performance and safety over baselines in partially observed navigation tasks.
Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control
TLDR: This work proposes an uncertainty-aware reinforcement learning algorithm for continuous control tasks that extends the Deep Deterministic Policy Gradient (DDPG) algorithm and exploits epistemic uncertainty to accelerate exploration and aleatoric uncertainty to learn a risk-sensitive policy.
Exploring the Robustness of Distributional Reinforcement Learning against Noisy State Observations
TLDR: This paper proposes the State-Noisy Markov Decision Process (SN-MDP) in the tabular case to incorporate both random and adversarial state-observation noise, and theoretically characterizes the bounded gradient norm of the histogram-based distributional loss, which accounts for the better training robustness of distributional RL.
Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations
TLDR: In noisy settings beyond the SN-MDP, distributional RL is less vulnerable to noisy state observations than its expectation-based counterpart, and the stable gradients that arise during optimization in distributional RL account for its better training robustness against state-observation noise.
EVaR Optimization for Risk-Sensitive Reinforcement Learning
In existing work on risk-sensitive reinforcement learning (RL) problems, in order to take uncertainty into consideration, risk measures such as conditional value-at-risk (CVaR) have been widely…
Deep Reinforcement Learning for Equal Risk Pricing
Recently, equal risk pricing, a framework for fair derivative pricing, was extended to consider dynamic risk measures. However, all current implementations either employ a static risk measure that…
Temporal Difference and Return Optimism in Cooperative Multi-Agent Reinforcement Learning
TLDR: A framework for decentralised cooperative multi-agent reinforcement learning is described that encompasses existing approaches based on asymmetric updates as well as methods based on distributional reinforcement learning, and several new families of algorithms are introduced that can be seen as interpolating between TD-optimism and return-optimism.

References

Showing 1-10 of 34 references
Worst Cases Policy Gradients
TLDR: This work proposes an actor-critic framework that models the uncertainty of the future and simultaneously learns a policy based on that uncertainty model, optimizing policies for varying levels of Conditional Value-at-Risk.
Robust Adversarial Reinforcement Learning
TLDR: RARL is proposed, where an agent is trained to operate in the presence of a destabilizing adversary that applies disturbance forces to the system; the jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy.
A comprehensive survey on safe reinforcement learning
TLDR: This work categorizes and analyzes two approaches to safe reinforcement learning: modifying the optimality criterion (the classic discounted finite/infinite horizon) with a safety factor, and incorporating external knowledge or the guidance of a risk metric.
Sample-based Distributional Policy Gradient
TLDR: This work proposes the sample-based distributional policy gradient (SDPG) algorithm, which models the return distribution using samples via a reparameterization technique widely used in generative modeling and inference, and compares it with D4PG, the state-of-the-art policy gradient method in distributional RL.
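
A rough sketch of that sample-based representation: a small generator network maps noise to return samples, so the return distribution is represented implicitly through reparameterization. The class name, dimensions, and architecture below are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class ReturnSampleGenerator(nn.Module):
    """Maps (state, action, noise) to samples of the return distribution."""
    def __init__(self, state_dim, action_dim, noise_dim=8, hidden=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, n_samples=32):
        batch = state.shape[0]
        noise = torch.randn(batch, n_samples, self.noise_dim)                  # one noise draw per sample
        sa = torch.cat([state, action], dim=-1).unsqueeze(1).expand(-1, n_samples, -1)
        return self.net(torch.cat([sa, noise], dim=-1)).squeeze(-1)            # (batch, n_samples)

gen = ReturnSampleGenerator(state_dim=4, action_dim=2)
return_samples = gen(torch.randn(16, 4), torch.randn(16, 2))                   # 32 return samples per (s, a)
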
Distributed Distributional Deterministic Policy Gradients
TLDR: The results show that, across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks, the D4PG algorithm achieves state-of-the-art performance.
A Distributional Perspective on Reinforcement Learning
TLDR: This paper argues for the fundamental importance of the value distribution (the distribution of the random return received by a reinforcement learning agent) and designs a new algorithm that applies Bellman's equation to the learning of approximate value distributions.
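
The distributional Bellman equation underlying that paper replaces the expected return with the full return distribution; in standard notation,

\[
Z(x, a) \overset{D}{=} R(x, a) + \gamma\, Z(X', A'), \qquad X' \sim P(\cdot \mid x, a), \quad A' \sim \pi(\cdot \mid X'),
\]

where the equality holds in distribution; taking expectations of both sides recovers the ordinary Bellman equation for \(Q^{\pi}(x, a)\).
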
Implicit Quantile Networks for Distributional Reinforcement Learning
In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by…
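
The excerpt stops short, but the mechanism is quantile regression at randomly sampled quantile levels. A minimal sketch of the quantile ("pinball") regression loss follows, omitting the Huber smoothing and cosine quantile embedding used in the actual IQN architecture.

import torch

def quantile_regression_loss(pred_quantiles, target_samples, taus):
    """Asymmetric L1 loss that makes pred_quantiles[i] estimate the taus[i]-quantile
    of the target return distribution."""
    # pred_quantiles: (N,), taus: (N,), target_samples: (M,)
    td_errors = target_samples.unsqueeze(0) - pred_quantiles.unsqueeze(1)     # (N, M)
    weight = torch.abs(taus.unsqueeze(1) - (td_errors.detach() < 0).float())  # tau or 1 - tau
    return (weight * torch.abs(td_errors)).mean()

taus = torch.rand(8)                        # sampled quantile levels, as in IQN
pred = torch.randn(8, requires_grad=True)   # network outputs for those levels
target = torch.randn(32)                    # samples from the bootstrapped target
quantile_regression_loss(pred, target, taus).backward()
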
Consideration of Risk in Reinforcement Learning
Scaling Up Robust MDPs using Function Approximation
TLDR: This work develops a robust approximate dynamic programming method based on a projected fixed-point equation to approximately solve large-scale robust MDPs, shows that the proposed method provably succeeds under certain technical conditions, and demonstrates its effectiveness through simulation of an option pricing problem.
Nonparametric Return Distribution Approximation for Reinforcement Learning
TLDR: A method of approximating the distribution of returns is proposed, which makes it possible to derive various kinds of information about the returns, and the proposed algorithm is shown to lead to a risk-sensitive RL paradigm.