• Corpus ID: 219176823

Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning

Anoopkumar Sonar, Vincent Pacelli, Anirudha Majumdar
A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domains experienced during training. In this paper, we approach this challenge through the following invariance principle: an agent must find a representation such that there exists an action-predictor built on top of this representation that is simultaneously optimal across all training domains. Intuitively, the resulting invariant policy enhances generalization by finding causes of… 

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

A theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states is introduced, and policy similarity embeddings (PSEs) are shown to improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and the Distracting DM Control Suite.


A fundamental challenge in imitation and reinforcement learning is to learn policies, representations, or dynamics that do not build on spurious correlations and generalize beyond the specific environments that they were trained on by leveraging a diverse set of training environments.

An Overview of Violence Detection Techniques: Current Challenges and Future Directions

This paper focuses on an overview of deep sequence learning approaches along with localization strategies for the detected violence, and dives into the initial image processing and machine learning-based violence detection (VD) literature and its possible advantages, such as efficiency, against the current complex models.

Language-Based Causal Representation Learning

It is shown that the dynamics of a dynamical system in which an agent moves in a rectangular grid picking up and dropping packages can be recovered from the structure of the state graph alone without having access to information about the objects, the structure of the states, or any background knowledge.

Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions

Experiments demonstrate that CRESP significantly improves generalization performance on unseen environments, outperforming several state-of-the-art methods on DeepMind Control tasks with different visual distractions.

Improving adaptability to new environments and removing catastrophic forgetting in Reinforcement Learning by using an eco-system of agents

An evaluation of the approach on two distinct distributions of environments shows that it outperforms state-of-the-art techniques in terms of adaptability/generalization and avoids catastrophic forgetting.

Improving generalization to new environments and removing catastrophic forgetting in Reinforcement Learning by using an eco-system of agents

The (limited) adaptive power of individual agents is harvested to build a highly adaptive eco-system to address both concerns of catastrophic forgetting and retraining on new environments.

Invariant Causal Imitation Learning for Generalizable Policies

Invariant Causal Imitation Learning (ICIL) is proposed: a technique that learns a feature representation invariant across domains and, on the basis of that representation, an imitation policy that matches expert behavior.

A Survey of Generalisation in Deep Reinforcement Learning

It is argued that taking a purely procedural content generation approach to benchmark design is not conducive to progress in generalisation, and fast online adaptation and tackling RL-specific problems are suggested as areas for future work on methods for generalisation.

Learning Provably Robust Motion Planners Using Funnel Libraries

The approach is shown to provide strong guarantees on two simulated examples: navigation of an autonomous vehicle under external disturbances on a five-lane highway with multiple vehicles, and navigation of a drone across an obstacle field in the presence of wind disturbances.



Invariant Risk Minimization Games

A simple training algorithm is developed that uses best response dynamics and yields similar or better empirical accuracy with much lower variance than the challenging bi-level optimization problem of Arjovsky et al. (2019).

Observational Overfitting in Reinforcement Learning

This work provides a general framework for analyzing the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process, and designs multiple synthetic benchmarks from only modifying the observation space of an MDP.

Invariant Risk Minimization

This work introduces Invariant Risk Minimization, a learning paradigm to estimate invariant correlations across multiple training distributions and shows how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective

Local Characterizations of Causal Bayesian Networks

Two new definitions of causal Bayesian networks are introduced, the first interprets the missing edges in the graph, and the second interprets "zero direct effect" (i.e., ceteris paribus).

Minimalistic gridworld environment for openai gym

  • https://github.com/maximecb/gym-minigrid
  • 2018

Understanding the Failure Modes of Out-of-Distribution Generalization

This work identifies the fundamental factors that give rise to why models fail this way in easy-to-learn tasks where one would expect these models to succeed, and uncovers two complementary failure modes.

The Risks of Invariant Risk Minimization

In this setting, the first analysis of classification under the IRM objective is presented, and it is found that IRM and its alternatives fundamentally do not improve over standard Empirical Risk Minimization.

In Search of Lost Domain Generalization

This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.

Invariant Causal Prediction for Block MDPs

This paper uses tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting, and proves that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return.