• Corpus ID: 219177205

# Robust Reinforcement Learning with Wasserstein Constraint

```bibtex
@article{Hou2020RobustRL,
  title   = {Robust Reinforcement Learning with Wasserstein Constraint},
  author  = {Linfang Hou and Liang Pang and Xin Hong and Yanyan Lan and Zhiming Ma and Dawei Yin},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2006.00945}
}
```
• Published 1 June 2020
• Computer Science
• ArXiv
Robust Reinforcement Learning aims to find the optimal policy with some extent of robustness to environmental dynamics. Existing learning algorithms usually enable robustness by disturbing the current state or by simulating environmental parameters in a heuristic way, approaches that lack quantified robustness to the system dynamics (i.e., the transition probability). To overcome this issue, we leverage the Wasserstein distance to measure the disturbance to the reference transition kernel. With Wasserstein…
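The abstract's key quantity is the 1-Wasserstein distance between a disturbed transition kernel and the reference kernel. As a minimal illustration (not the paper's algorithm), it can be computed for discrete next-state distributions on an ordered support from the CDF-difference formula:

```python
# Illustration only: 1-Wasserstein distance between two discrete
# next-state distributions on the sorted support {0, 1, 2, 3}, using
#   W1(p, q) = sum_i |CDF_p(x_i) - CDF_q(x_i)| * (x_{i+1} - x_i).

def wasserstein_1d(support, p, q):
    """W1 between discrete distributions p and q on a sorted 1-D support."""
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for i in range(len(support) - 1):
        cdf_p += p[i]
        cdf_q += q[i]
        dist += abs(cdf_p - cdf_q) * (support[i + 1] - support[i])
    return dist

p_ref  = [0.7, 0.2, 0.1, 0.0]   # hypothetical reference kernel P(. | s, a)
p_pert = [0.5, 0.3, 0.1, 0.1]   # disturbed kernel inside the Wasserstein ball
print(wasserstein_1d([0, 1, 2, 3], p_ref, p_pert))  # ≈ 0.4
```

The constraint in the title then amounts to requiring this transport cost to stay below a chosen radius for every state-action pair.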

## Citations

• Computer Science
NeurIPS
• 2021
This paper develops a sample-based approach to estimate the unknown uncertainty set and designs a robust Q-learning algorithm and a robust TDC algorithm, both of which can be implemented in an online and incremental fashion; the robustness of both algorithms is proved.
• Computer Science
2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP)
• 2022
This paper develops a robust multi-agent Q-learning algorithm, which is model-free and fully decentralized, and offers provable robustness under model uncertainty without incurring additional computational and memory cost.
• Computer Science, Economics
ICML
• 2022
This paper develops the first policy gradient method with a global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch, and designs a robust actor-critic method with a differentiable parametric policy class and value function.
• Computer Science
ArXiv
• 2022
This work designs a robust primal-dual approach and theoretically develops guarantees on its convergence, complexity, and robust feasibility; it further investigates a concrete example, the δ-contamination uncertainty set, designing an online, model-free algorithm and theoretically characterizing its sample complexity.
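The δ-contamination set named above has a closed-form worst case: the adversary moves the contaminated δ-mass onto the lowest-value next state. A minimal robust value iteration sketch under this set (hypothetical two-state MDP; this illustrates the standard δ-contamination robust Bellman operator, not necessarily the cited paper's exact algorithm):

```python
# Robust value iteration under the delta-contamination uncertainty set
#   {(1 - delta) * P + delta * Q : Q any distribution}.
# The adversary's worst choice of Q puts all mass on the lowest-value state.
import numpy as np

gamma, delta = 0.9, 0.2

# Hypothetical reference kernel P[s, a] -> distribution over next states.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])              # rewards r(s, a)

Q = np.zeros((2, 2))
for _ in range(500):                    # robust value iteration
    V = Q.max(axis=1)                   # greedy value per state
    worst = V.min()                     # adversary dumps delta-mass here
    Q = R + gamma * ((1 - delta) * (P @ V) + delta * worst)

print(Q.max(axis=1))                    # robust state values
```

Because the operator remains a γ-contraction, the iteration converges to the unique robust fixed point.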
• Economics
• 2022
Robust decision-making in multiplayer games requires anticipating what reactions a player policy may elicit from other players. This is difficult in games with three or more players: when one player
• Economics
ArXiv
• 2021
This work introduces ε-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework for learning AI policies that are robust to such multi-agent sim-to-real gaps, and addresses a novel robustness objective concerning perturbations in the reward functions of agents.
• Computer Science
ArXiv
• 2023
The robust Bellman equation for robust average-reward MDPs is derived, it is proved that the optimal policy can be derived from its solution, and a robust relative value iteration algorithm is designed that provably addresses the uncertainty in the transition kernel.
Reinforcement learning (RL) has received significant interest in recent years, due primarily to the successes of deep reinforcement learning at solving many challenging tasks such as playing Chess,
• Economics
• 2021
We study the problem of training a principal in a multi-agent general-sum game using reinforcement learning (RL). Learning a robust principal policy requires anticipating the worst possible strategic
• Computer Science
• 2022
This work shows that the DRO-BO problem in this setting is equivalent to a finite-dimensional optimization problem which, even in the continuous context setting, can be easily implemented with provable sublinear regret bounds, and shows experimentally that the method surpasses existing methods.

## References

Showing 1-10 of 36 references.

• Computer Science
Neural Computation
• 2005
A new reinforcement learning paradigm that explicitly takes into account input disturbance as well as modeling errors is proposed, which is called robust reinforcement learning (RRL) and tested on the control task of an inverted pendulum.
• Computer Science
ICML
• 2017
RARL is proposed, in which an agent is trained to operate in the presence of a destabilizing adversary that applies disturbance forces to the system; the jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy.
• Insoon Yang
• Mathematics
IEEE Transactions on Automatic Control
• 2021
This article characterizes an explicit form of the optimal control policy and the worst-case distribution policy for linear-quadratic problems with a Wasserstein penalty. It also shows that the contraction property of the associated Bellman operators extends a single-stage out-of-sample performance guarantee to the corresponding multistage guarantee without any degradation in the confidence level.
• Insoon Yang
• Computer Science
IEEE Control Systems Letters
• 2017
The existence and optimality of Markov policies are proved, convex-optimization-based tools to compute and analyze the policies are developed, and a sensitivity analysis tool quantifies the effect of ambiguity-set parameters on the performance of distributionally robust policies.
• Computer Science, Mathematics
NIPS
• 2003
This work proposes an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities, via Kullback-Leibler divergence bounds.
• G. Iyengar
• Mathematics, Economics
Math. Oper. Res.
• 2005
It is proved that when this set of measures has a certain "rectangularity" property, all of the main results for finite and infinite horizon DP extend to natural robust counterparts.
• Computer Science, Mathematics
Oper. Res.
• 2005
This work considers a robust control problem for a finite-state, finite-action Markov decision process, where uncertainty on the transition matrices is described in terms of possibly nonconvex sets. It shows that perfect duality holds for this problem and that it can be solved with a variant of the classical dynamic programming algorithm, the "robust dynamic programming" algorithm.
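The "robust dynamic programming" recursion described above replaces the expectation in the classical Bellman backup with a worst case over the uncertainty set. A minimal sketch under an (s, a)-rectangular set given as a finite list of hypothetical candidate transition kernels:

```python
# Robust value iteration with an (s, a)-rectangular uncertainty set
# represented as a finite list of candidate kernels (illustration only).
import numpy as np

gamma = 0.9
R = np.array([[1.0, 0.0], [0.0, 2.0]])          # rewards r(s, a)

# Each candidate kernel has shape (n_states, n_actions, n_states).
kernels = [
    np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]]]),
    np.array([[[0.6, 0.4], [0.5, 0.5]], [[0.4, 0.6], [0.3, 0.7]]]),
]

V = np.zeros(2)
for _ in range(500):
    # Worst-case backup: nature picks, per (s, a), the kernel minimizing Q.
    Q = R + gamma * np.min([P @ V for P in kernels], axis=0)
    V = Q.max(axis=1)                           # agent maximizes over actions

print(V)                                        # robust state values
```

The per-(s, a) minimum is what rectangularity licenses: nature may choose a different kernel row for every state-action pair without coupling constraints.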
• Computer Science
2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
• 2017
This work introduces Adversarially Robust Policy Learning (ARPL), an algorithm that leverages active computation of physically-plausible adversarial examples during training to enable robust policy learning in the source domain and robust performance under both random and adversarial input perturbations.
• Computer Science
Math. Program.
• 2018
It is demonstrated that the distributionally robust optimization problems over Wasserstein balls can in fact be reformulated as finite convex programs—in many interesting cases even as tractable linear programs.
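A well-known corollary of such Wasserstein reformulations: for a 1-Lipschitz loss on the real line, the worst-case expectation over a 1-Wasserstein ball of radius ε is the empirical mean plus ε. A small numeric sketch (hypothetical sample; loss l(x) = x, which is 1-Lipschitz):

```python
# Closed form for Wasserstein DRO with a 1-Lipschitz loss l(x) = x:
#   sup_{W1(Q, P_hat) <= eps} E_Q[l] = E_{P_hat}[l] + eps.
# Shifting every sample by +eps is a feasible transport plan of cost
# exactly eps, and it attains the bound, so the sup is tight.
data = [0.5, 1.2, 2.0, 0.8]          # hypothetical empirical sample
eps = 0.3                             # Wasserstein ball radius

empirical = sum(data) / len(data)
dro_bound = empirical + eps           # closed-form worst case
attained = sum(x + eps for x in data) / len(data)   # shifted distribution

print(dro_bound, attained)            # the two values coincide
```

This mean-plus-radius form is why Wasserstein DRO is often described as a regularized empirical objective.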
• Computer Science
ICLR
• 2017
The EPOpt algorithm is introduced, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects.
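The ensemble-plus-adversarial idea above can be sketched as a worst-percentile (CVaR-style) objective over returns sampled from an ensemble of source domains (synthetic returns here; this is not EPOpt's full training loop):

```python
# Worst-percentile objective over an ensemble of source domains
# (synthetic per-domain returns stand in for rollout results).
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(loc=10.0, scale=3.0, size=100)   # hypothetical returns
epsilon = 0.1                                          # optimize the worst 10%

cutoff = np.quantile(returns, epsilon)                 # percentile threshold
worst = returns[returns <= cutoff]                     # worst-case domains
print(worst.mean())   # objective: average return over the worst domains
```

Training the policy only on the worst ε-fraction of sampled domains is what pushes it toward robustness against unmodeled target-domain effects.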