Corpus ID: 239998174

Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

@inproceedings{park2021object,
  title={Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning},
  author={Jongjin Park and Younggyo Seo and Chang Liu and Li Zhao and Tao Qin and Jinwoo Shin and Tie-Yan Liu},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2021}
}
Behavioral cloning has proven effective for learning sequential decision-making policies from expert demonstrations. However, behavioral cloning often suffers from the causal confusion problem, where a policy attends to the noticeable effects of expert actions, which correlate strongly with those actions, rather than to the true causes we desire. This paper presents Object-aware REgularizatiOn (OREO), a simple technique that regularizes an imitation policy in an object-aware manner. Our main idea is to encourage a policy…
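As context for the abstract above, behavioral cloning reduces imitation to supervised learning over expert state–action pairs. A minimal sketch with synthetic data (the dataset and the logistic policy are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Toy expert dataset: observations -> discrete actions (hypothetical data);
# the "expert" acts on the first observation feature only.
rng = np.random.default_rng(0)
obs = rng.normal(size=(200, 4))
actions = (obs[:, 0] > 0).astype(int)

# Behavioral cloning = supervised learning: fit a logistic policy
# pi(a=1|s) = sigmoid(s @ w + b) by gradient descent on the log-loss.
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(obs @ w + b)))
    w -= 0.5 * obs.T @ (p - actions) / len(obs)
    b -= 0.5 * np.mean(p - actions)

pred = ((obs @ w + b) > 0).astype(int)       # greedy action of the policy
accuracy = np.mean(pred == actions)
```

Causal confusion arises when, unlike here, the observation contains nuisance features (e.g. the previous action's visible effect) that correlate with the expert action and get picked up by exactly this kind of supervised fit.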


Causal Confusion in Imitation Learning
It is shown that causal misidentification occurs in several benchmark control domains as well as in realistic driving settings, and a proposed solution that combats it through targeted interventions to determine the correct causal model is validated.
Disagreement-Regularized Imitation Learning
The algorithm trains an ensemble of policies on the expert demonstration data and uses the variance of their predictions as a cost, which is minimized with RL together with a supervised behavioral cloning cost; this yields a fixed reward function that is easy to optimize.
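The ensemble-disagreement cost described above can be sketched as follows — a toy version with bootstrapped logistic policies standing in for the paper's neural networks (all data and model choices here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(size=(200, 4))
actions = (obs[:, 0] > 0).astype(float)     # hypothetical expert labels

def fit_logistic(X, y, steps=300, lr=0.5):
    """Fit pi(a=1|s) = sigmoid(s @ w + b) by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(X)
        b -= lr * np.mean(p - y)
    return w, b

# Train each ensemble member on a bootstrap resample of the expert data.
ensemble = []
for _ in range(5):
    idx = rng.integers(0, len(obs), size=len(obs))
    ensemble.append(fit_logistic(obs[idx], actions[idx]))

def disagreement(states):
    """Cost = variance of the members' predicted action probabilities."""
    probs = np.stack([1.0 / (1.0 + np.exp(-(states @ w + b)))
                      for w, b in ensemble])
    return probs.var(axis=0)

cost = disagreement(rng.normal(size=(10, 4)))
```

In the paper this cost is minimized with RL, pushing the learner back toward states where the ensemble agrees, i.e. toward the expert's state distribution.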
Fighting Copycat Agents in Behavioral Cloning from Observation Histories
This work proposes an adversarial approach to learning a feature representation that removes excess information about the previous expert action (a nuisance correlate) while retaining the information necessary to predict the next action.
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
This work proposes a simple alternative that still uses RL but does not require learning a reward function; it can be implemented with a handful of minor modifications to any standard Q-learning or off-policy actor-critic algorithm, and is called Soft Q Imitation Learning (SQIL).
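SQIL's core modification is to relabel rewards as a constant +1 for demonstration transitions and 0 for the agent's own transitions, then run any standard off-policy algorithm on the combined buffer. A tabular sketch (the transitions are invented for illustration):

```python
import numpy as np

# (state, action, next_state) tuples; 3 states, 2 actions (hypothetical).
demo = [(0, 1, 1), (1, 1, 2)]      # expert demonstrations
agent = [(0, 0, 0), (2, 0, 2)]     # the agent's own rollouts

# SQIL reward relabeling: +1 for expert transitions, 0 otherwise.
buffer = [(s, a, 1.0, s2) for s, a, s2 in demo] + \
         [(s, a, 0.0, s2) for s, a, s2 in agent]

# Standard tabular Q-learning on the relabeled buffer.
Q = np.zeros((3, 2))
gamma, lr = 0.9, 0.1
for _ in range(200):
    for s, a, r, s2 in buffer:
        Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])
```

After training, the Q-values prefer the expert's action in demonstrated states, which is the imitation signal SQIL relies on.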
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
This work presents two gradient procedures that can learn neural network policies for several problems, including a sequential prediction task and high-dimensional robotics control, and provides a comprehensive theoretical study of IL.
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
This paper proposes a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
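The iterative algorithm summarized above (DAgger) rolls out the current policy, queries the expert for labels on the visited states, aggregates them into the dataset, and retrains. A toy sketch — the environment, the 1-NN "policy", and the expert are all hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def expert(state):                       # hypothetical expert: threshold rule
    return int(state[0] > 0)

def rollout(n=50):                       # toy "environment": random 2-D states
    return rng.normal(size=(n, 2))

def fit(X, y):                           # 1-nearest-neighbor "policy"
    return (np.asarray(X), np.asarray(y))

def predict(policy, x):
    X, y = policy
    return y[np.argmin(((X - x) ** 2).sum(axis=1))]

# DAgger loop: aggregate expert labels on states the learner visits,
# then retrain a stationary policy on the growing dataset.
X = rng.normal(size=(20, 2))
y = [expert(s) for s in X]
policy = fit(X, y)
for _ in range(3):
    visited = rollout()                       # states under current policy
    y = y + [expert(s) for s in visited]      # expert relabels them
    X = np.vstack([X, visited])               # dataset aggregation
    policy = fit(X, y)

agree = np.mean([predict(policy, s) == expert(s)
                 for s in rng.normal(size=(100, 2))])
```

The key point is that labels are collected on the learner's own state distribution, which is what gives the no-regret guarantee over naive behavioral cloning.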
Generative Adversarial Imitation Learning
A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed; a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
This work proposes to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents, and other sources of non-stationarity, and shows that causal structures can be parameterized via continuous variables and learned end-to-end.
Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration
It is demonstrated that Drop-Bottleneck outperforms Variational Information Bottleneck (VIB) (Alemi et al., 2017) in multiple aspects including adversarial robustness and dimensionality reduction.
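Drop-Bottleneck's discrete compression drops each feature dimension independently with a per-dimension drop probability (learned jointly in the paper). A minimal sketch of the drop operation — the probabilities are fixed here for illustration, and the inverted-dropout rescaling is an assumption borrowed from standard dropout, not necessarily the paper's exact convention:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-dimension drop probabilities p_i (learned in the paper);
# dimensions with high p_i carry little task-relevant information.
p_drop = np.array([0.9, 0.1, 0.5, 0.0])
features = rng.normal(size=(1000, 4))

keep = rng.random(features.shape) >= p_drop     # independent Bernoulli mask
z = features * keep / (1.0 - p_drop)            # rescale kept dimensions
```

Because the drop decisions are per-dimension rather than per-element-of-a-learned-Gaussian (as in VIB), the compressed representation stays discrete and is robust to noisy, task-irrelevant dimensions.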
Observe and Look Further: Achieving Consistent Performance on Atari
This paper proposes an algorithm that addresses three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and exploring efficiently.