# Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning

@inproceedings{Sonar2021InvariantPO, title={Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning}, author={Anoopkumar Sonar and Vincent Pacelli and Anirudha Majumdar}, booktitle={L4DC}, year={2021} }

A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training. In this paper, we approach this challenge through the following invariance principle: an agent must find a representation such that there exists an action-predictor built on top of this representation that is simultaneously optimal across all training domains. Intuitively, the resulting invariant policy enhances generalization by finding causes of…
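The invariance principle above — a shared representation on which a single predictor is simultaneously optimal across all training domains — is closely related to the IRMv1 gradient penalty of Arjovsky et al. (2019). A minimal sketch of that penalty, assuming a precomputed scalar representation `phi_x` per sample and a squared-error risk (the function names and the "dummy" scalar predictor `w` are illustrative, not the paper's exact objective):

```python
import numpy as np

def irm_penalty(phi_x, y, w=1.0):
    """Squared gradient of the per-domain risk w.r.t. a scalar
    'dummy' predictor w, evaluated at w = 1 (IRMv1-style).
    Zero iff w = 1 is already optimal for this domain."""
    residual = w * phi_x - y
    grad_w = np.mean(2.0 * residual * phi_x)  # d/dw of mean squared error
    return grad_w ** 2

def invariant_objective(domains, lam=1.0):
    """Sum of per-domain risks plus the invariance penalty.
    `domains` is a list of (phi_x, y) array pairs, one per training domain."""
    risk = sum(np.mean((px - y) ** 2) for px, y in domains)
    penalty = sum(irm_penalty(px, y) for px, y in domains)
    return risk + lam * penalty
```

If the representation already captures the invariant causes (here, `phi_x == y` in every domain), both the risk and the penalty vanish; a representation that fits one domain only through a spurious feature incurs a nonzero penalty from the other domains.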

## 23 Citations

### Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

- Computer Science, ICLR
- 2021

A theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states is introduced, and it is demonstrated that the resulting policy similarity embeddings (PSEs) improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and the Distracting DM Control Suite.

### Invariant Causal Representation Learning for Generalization in Imitation and Reinforcement Learning

- Computer Science
- 2022

A fundamental challenge in imitation and reinforcement learning is to learn policies, representations, or dynamics that do not build on spurious correlations and generalize beyond the specific environments that they were trained on by leveraging a diverse set of training environments.

### An Overview of Violence Detection Techniques: Current Challenges and Future Directions

- Computer Science, Artificial Intelligence Review
- 2022

This paper focuses on an overview of deep sequence learning approaches, along with localization strategies for the detected violence, and reviews the earlier image processing and machine learning-based VD literature and its possible advantages, such as efficiency, over current complex models.

### Language-Based Causal Representation Learning

- Computer Science, ArXiv
- 2022

It is shown that the dynamics of a dynamical system, in which an agent moves in a rectangular grid picking up and dropping packages, can be recovered from the structure of the state graph alone, without access to information about the objects, the structure of the states, or any background knowledge.

### Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions

- Computer Science, KDD
- 2022

Experiments demonstrate that CRESP significantly improves generalization performance on unseen environments, outperforming several state-of-the-art methods on DeepMind Control tasks with different visual distractions.

### Improving adaptability to new environments and removing catastrophic forgetting in Reinforcement Learning by using an eco-system of agents

- Computer Science, ArXiv
- 2022

An evaluation of the approach on two distinct distributions of environments shows that it outperforms state-of-the-art techniques in terms of adaptability/generalization while avoiding catastrophic forgetting.

### Improving generalization to new environments and removing catastrophic forgetting in Reinforcement Learning by using an eco-system of agents

- Computer Science
- 2022

The (limited) adaptive power of individual agents is harvested to build a highly adaptive eco-system to address both concerns of catastrophic forgetting and retraining on new environments.

### Invariant Causal Imitation Learning for Generalizable Policies

- Computer Science, NeurIPS
- 2021

Invariant Causal Imitation Learning (ICIL) is proposed: a novel technique that learns a feature representation invariant across domains, on the basis of which an imitation policy matching expert behavior is learned.

### A Survey of Generalisation in Deep Reinforcement Learning

- Computer Science, ArXiv
- 2021

It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems as some areas for future work on methods for generalisation are suggested.

### Learning Provably Robust Motion Planners Using Funnel Libraries

- Computer Science, ArXiv
- 2021

The ability of the approach to provide strong guarantees is demonstrated on two simulated examples: navigation of an autonomous vehicle under external disturbances on a five-lane highway with multiple vehicles, and navigation of a drone across an obstacle field in the presence of wind disturbances.

## References

Showing 1–10 of 45 references

### Invariant Risk Minimization Games

- Computer Science, ICML
- 2020

A simple training algorithm is developed that uses best response dynamics and yields similar or better empirical accuracy with much lower variance than the challenging bi-level optimization problem of Arjovsky et al. (2019).

### Observational Overfitting in Reinforcement Learning

- Computer Science, ICLR
- 2020

This work provides a general framework for analyzing the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process, and designs multiple synthetic benchmarks from only modifying the observation space of an MDP.

### Invariant Risk Minimization

- Computer Science, ArXiv
- 2019

This work introduces Invariant Risk Minimization, a learning paradigm to estimate invariant correlations across multiple training distributions and shows how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.

### Proximal Policy Optimization Algorithms

- Computer Science, ArXiv
- 2017

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
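The "surrogate" objective referenced above is PPO's clipped objective. A minimal sketch, assuming precomputed probability ratios r = π_new(a|s)/π_old(a|s) and advantage estimates A (the function name and `eps` default are illustrative):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A).
    Maximizing this removes the incentive to push the probability
    ratio outside [1-eps, 1+eps], keeping policy updates conservative."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))
```

With eps = 0.2, a sample whose ratio has already grown to 2.0 on a positive advantage contributes only 1.2·A rather than 2·A, so further increasing that action's probability yields no additional objective gain.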

### Local Characterizations of Causal Bayesian Networks

- Computer Science, GKR
- 2011

Two new definitions of causal Bayesian networks are introduced: the first interprets the missing edges in the graph, and the second interprets "zero direct effect" (i.e., ceteris paribus).

### Minimalistic gridworld environment for openai gym

- https://github.com/maximecb/gym-minigrid
- 2018

### Understanding the Failure Modes of Out-of-Distribution Generalization

- Computer Science, ICLR
- 2021

This work identifies the fundamental factors that give rise to why models fail this way in easy-to-learn tasks where one would expect these models to succeed, and uncovers two complementary failure modes.

### The Risks of Invariant Risk Minimization

- Computer Science, ICLR
- 2021

In this setting, the first analysis of classification under the IRM objective is presented, and it is found that IRM and its alternatives fundamentally do not improve over standard Empirical Risk Minimization.

### In Search of Lost Domain Generalization

- Computer Science, ICLR
- 2021

This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.

### Invariant Causal Prediction for Block MDPs

- Computer Science, ICML
- 2020

This paper uses tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting, and proves that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return.