# Robust Reinforcement Learning with Wasserstein Constraint

@article{Hou2020RobustRL, title={Robust Reinforcement Learning with Wasserstein Constraint}, author={Linfang Hou and Liang Pang and Xin Hong and Yanyan Lan and Zhiming Ma and Dawei Yin}, journal={ArXiv}, year={2020}, volume={abs/2006.00945} }

Robust reinforcement learning aims to find an optimal policy with some degree of robustness to environmental dynamics. Existing learning algorithms usually achieve robustness by disturbing the current state or simulating environmental parameters heuristically, and so lack quantified robustness to the system dynamics (i.e., the transition probability). To overcome this issue, we leverage the Wasserstein distance to measure the disturbance to the reference transition kernel. With Wasserstein…
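As a rough illustration of the quantity the abstract mentions, the sketch below computes the 1-Wasserstein distance between a reference next-state distribution and a perturbed one. It is not the paper's algorithm. It assumes a toy setting with a finite, sorted, one-dimensional state space, where the distance reduces to the area between the two cumulative distribution functions. The function name `wasserstein_1d` and the example distributions are hypothetical.

```python
from itertools import accumulate


def wasserstein_1d(p, q, positions):
    """1-Wasserstein distance between two discrete distributions.

    Assumes `positions` is sorted in increasing order and that `p` and `q`
    are probability vectors over those same support points. In one
    dimension the distance equals the integral of |CDF_p - CDF_q|.
    """
    if not (len(p) == len(q) == len(positions)):
        raise ValueError("p, q and positions must have the same length")

    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))

    total = 0.0
    for i in range(len(positions) - 1):
        gap = positions[i + 1] - positions[i]
        total += abs(cdf_p[i] - cdf_q[i]) * gap
    return total


# Hypothetical example: one row of a reference transition kernel
# and a perturbed version of it over three ordered states.
states = [0.0, 1.0, 2.0]
reference = [0.7, 0.2, 0.1]
perturbed = [0.5, 0.3, 0.2]

distance = wasserstein_1d(reference, perturbed, states)
print(f"W1(reference, perturbed) = {distance:.3f}")
```

Bounding this distance by a radius would define a set of admissible perturbed kernels around the reference one. The form of the constraint used in the paper is not shown in the truncated abstract above.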

## 13 Citations

### Online Robust Reinforcement Learning with Model Uncertainty

- Computer Science
- NeurIPS
- 2021

This paper develops a sample-based approach to estimate the unknown uncertainty set, designs a robust Q-learning algorithm and a robust TDC algorithm that can be implemented in an online and incremental fashion, and proves the robustness of the algorithms.

### Data-Driven Robust Multi-Agent Reinforcement Learning

- Computer Science
- 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP)
- 2022

This paper develops a robust multi-agent Q-learning algorithm, which is model-free and fully decentralized, and offers provable robustness under model uncertainty without incurring additional computational and memory cost.

### Policy Gradient Method For Robust Reinforcement Learning

- Computer Science, Economics
- ICML
- 2022

This paper develops the first policy gradient method with a global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch, and designs a robust actor-critic method with a differentiable parametric policy class and value function.

### Robust Constrained Reinforcement Learning

- Computer Science
- ArXiv
- 2022

This work designs a robust primal-dual approach and establishes theoretical guarantees on its convergence, complexity, and robust feasibility. It then investigates a concrete example, the δ-contamination uncertainty set, designs an online and model-free algorithm, and theoretically characterizes its sample complexity.

### Learning Adversarially Robust Policies in Multi-Agent Games

- Economics
- 2022

Robust decision-making in multiplayer games requires anticipating what reactions a player policy may elicit from other players. This is difficult in games with three or more players: when one player…

### ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations

- Economics
- ArXiv
- 2021

This work introduces Epsilon-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework for learning AI policies that are robust to multi-agent sim-to-real gaps, and addresses a novel robustness objective concerning perturbations in the reward functions of agents.

### Robust Average-Reward Markov Decision Processes

- Computer Science
- ArXiv
- 2023

The robust Bellman equation for robust average-reward MDPs is derived, it is proved that the optimal policy can be derived from its solution, and a robust relative value iteration algorithm is designed that provably addresses the uncertainty in the transition kernel.

### Review of Metrics to Measure the Stability, Robustness and Resilience of Reinforcement Learning

- Psychology
- ArXiv
- 2022

Reinforcement learning (RL) has received significant interest in recent years, due primarily to the successes of deep reinforcement learning at solving many challenging tasks such as playing Chess,…

### Learning to Play General-Sum Games Against Multiple Boundedly Rational Agents

- Economics
- 2021

We study the problem of training a principal in a multi-agent general-sum game using reinforcement learning (RL). Learning a robust principal policy requires anticipating the worst possible strategic…

### Distributionally Robust Bayesian Optimization with $\phi$-divergences

- Computer Science
- 2022

This work shows that the DRO-BO problem in this setting is equivalent to a finite-dimensional optimization problem which, even in the continuous context setting, can be easily implemented with provable sublinear regret bounds, and shows experimentally that the method surpasses existing methods.

## References

### Robust Reinforcement Learning

- Computer Science
- Neural Computation
- 2005

A new reinforcement learning paradigm that explicitly takes into account input disturbance as well as modeling errors is proposed, which is called robust reinforcement learning (RRL) and tested on the control task of an inverted pendulum.

### Robust Adversarial Reinforcement Learning

- Computer Science
- ICML
- 2017

RARL is proposed, where an agent is trained to operate in the presence of a destabilizing adversary that applies disturbance forces to the system, and the jointly trained adversary is reinforced; that is, it learns an optimal destabilization policy.

### Wasserstein Distributionally Robust Stochastic Control: A Data-Driven Approach

- Mathematics
- IEEE Transactions on Automatic Control
- 2021

This article characterizes an explicit form of the optimal control policy and the worst-case distribution policy for linear-quadratic problems with Wasserstein penalty, and shows that the contraction property of the associated Bellman operators extends a single-stage out-of-sample performance guarantee to the corresponding multistage guarantee without any degradation in the confidence level.

### A Convex Optimization Approach to Distributionally Robust Markov Decision Processes With Wasserstein Distance

- Computer Science
- IEEE Control Systems Letters
- 2017

The existence and optimality of Markov policies are proved, convex optimization-based tools to compute and analyze the policies are developed, and a sensitivity analysis tool is developed to quantify the effect of ambiguity set parameters on the performance of distributionally robust policies.

### Robustness in Markov Decision Problems with Uncertain Transition Matrices

- Computer Science, Mathematics
- NIPS
- 2003

This work proposes an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities, via Kullback-Leibler divergence bounds.

### Robust Dynamic Programming

- Mathematics, Economics
- Math. Oper. Res.
- 2005

It is proved that when this set of measures has a certain "rectangularity" property, all of the main results for finite and infinite horizon DP extend to natural robust counterparts.

### Robust Control of Markov Decision Processes with Uncertain Transition Matrices

- Computer Science, Mathematics
- Oper. Res.
- 2005

This work considers a robust control problem for a finite-state, finite-action Markov decision process, where uncertainty on the transition matrices is described in terms of possibly nonconvex sets, and shows that perfect duality holds for this problem, and that it can be solved with a variant of the classical dynamic programming algorithm, the "robust dynamic programming" algorithm.

### Adversarially Robust Policy Learning: Active construction of physically-plausible perturbations

- Computer Science
- 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2017

This work introduces Adversarially Robust Policy Learning (ARPL), an algorithm that leverages active computation of physically-plausible adversarial examples during training to enable robust policy learning in the source domain and robust performance under both random and adversarial input perturbations.

### Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations

- Computer Science
- Math. Program.
- 2018

It is demonstrated that the distributionally robust optimization problems over Wasserstein balls can in fact be reformulated as finite convex programs—in many interesting cases even as tractable linear programs.

### EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

- Computer Science
- ICLR
- 2017

The EPOpt algorithm is introduced, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects.