Sequential Causal Imitation Learning with Unobserved Confounders

@inproceedings{kumor2021sequential,
  title={Sequential Causal Imitation Learning with Unobserved Confounders},
  author={Daniel Kumor and Junzhe Zhang and Elias Bareinboim},
  booktitle={Neural Information Processing Systems},
  year={2021}
}
“Monkey see, monkey do” is an age-old adage referring to naïve imitation without a deep understanding of a system’s underlying mechanics. Indeed, if a demonstrator has access to information unavailable to the imitator (monkey), such as a different set of sensors, then no matter how perfectly the imitator models its perceived environment (see), attempting to reproduce the demonstrator’s behavior (do) can lead to poor outcomes. Imitation learning in the presence of a mismatch between… 

Causal Imitation Learning With Unobserved Confounders

This paper provides a non-parametric, graphical criterion that is complete (both necessary and sufficient) for determining the feasibility of imitation from the combinations of demonstration data and qualitative assumptions about the underlying environment, represented in the form of a causal model.

Deconfounded Imitation Learning

This work introduces an algorithm for deconfounded imitation learning, which trains an inference model jointly with a latent-conditional policy, and shows in theory and practice that this algorithm converges to the correct interventional policy, solves the confounding issue, and can under certain assumptions achieve asymptotically optimal imitation performance.

What Would the Expert do(·)?: Causal Imitation Learning

Modern variants of the classical instrumental variable regression (IVR) technique are applied, enabling us to recover the causally correct underlying policy without requiring access to an interactive expert.

Sequence Model Imitation Learning with Unobserved Contexts

It is proved that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped to handle these sorts of asymptotically realizable problems than off-policy methods.

Learning Human Driving Behaviors with Sequential Causal Imitation Learning

A sequential causal template is developed that generalizes the standard MDP setting to one with unobserved confounders (MDPUC-HD), together with a sufficient graphical criterion for determining when ignoring causality leads to poor performance in MDPUC-HD.

A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning

It is empirically shown that the Ẑ estimated by this method can significantly reduce dynamics-prediction errors and improve the performance of model-based RL methods on zero-shot generalization to new environments with unseen dynamics.


A fundamental challenge in imitation and reinforcement learning is to learn policies, representations, or dynamics that do not build on spurious correlations and that generalize beyond the specific environments they were trained on, by leveraging a diverse set of training environments.

Instrumental Variables in Causal Inference and Machine Learning: A Survey

This paper provides the formal formulation of IVs, discusses the identification problem of IV regression methods under different assumptions, introduces a variety of applications of IV methods in real-world scenarios, and provides a summary of the available datasets and algorithms.
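The core IV idea, two-stage least squares, can be illustrated on synthetic data: an unobserved confounder biases ordinary regression, while an instrument that affects the treatment but not the outcome directly recovers the causal coefficient. This is a generic sketch, not code from the survey; the variable names and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Unobserved confounder U affects both treatment X and outcome Y.
u = rng.normal(size=n)
z = rng.normal(size=n)                        # instrument: affects X, not Y directly
x = 1.0 * z + 1.0 * u + rng.normal(size=n)
y = 2.0 * x + 1.5 * u + rng.normal(size=n)    # true causal effect of X on Y is 2.0

# Naive OLS slope is biased upward by the confounder U.
ols = np.cov(x, y)[0, 1] / np.var(x)

# Two-stage least squares: regress X on Z, then regress Y on the fitted X.
x_hat = z * (np.cov(z, x)[0, 1] / np.var(z))
tsls = np.cov(x_hat, y)[0, 1] / np.var(x_hat)
```

With this data-generating process, `ols` lands near 2.5 (confounded) while `tsls` recovers a value near the true effect of 2.0.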

Adaptively Exploiting d-Separators with Causal Bandits

This work formalizes and studies the notion of adaptivity, and provides a novel algorithm that simultaneously achieves (a) optimal regret when a d-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator.
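Whether a variable set is a d-separator can be decided mechanically via the ancestral-moralization criterion: restrict the DAG to ancestors of the queried nodes, moralize (marry co-parents), drop edge directions, delete the conditioning set, and test reachability. A minimal stdlib-only sketch, with a `d_separated` helper and example DAGs that are illustrative, not from the paper:

```python
from itertools import combinations

def d_separated(parents, xs, ys, zs):
    """Check whether node sets xs and ys are d-separated given zs in the DAG
    described by `parents` (node -> set of parent nodes)."""
    relevant = set(xs) | set(ys) | set(zs)
    # Ancestors of the relevant nodes (including themselves).
    anc, stack = set(), list(relevant)
    while stack:
        v = stack.pop()
        if v in anc:
            continue
        anc.add(v)
        stack.extend(parents.get(v, ()))
    # Moral graph on the ancestral subgraph: undirected edges plus married co-parents.
    adj = {v: set() for v in anc}
    for v in anc:
        ps = [p for p in parents.get(v, ()) if p in anc]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):
            adj[a].add(b); adj[b].add(a)
    # Delete the conditioning set and test undirected reachability from xs.
    blocked = set(zs)
    seen, stack = set(), [x for x in xs if x not in blocked]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(w for w in adj[v] if w not in blocked and w not in seen)
    return not (seen & set(ys))

# Chain X -> M -> Y: conditioning on M blocks the path.
chain = {"X": set(), "M": {"X"}, "Y": {"M"}}
print(d_separated(chain, {"X"}, {"Y"}, {"M"}))   # True
print(d_separated(chain, {"X"}, {"Y"}, set()))   # False
```

Colliders come out correctly as well: in X → C ← Y, the sets {X} and {Y} are d-separated given the empty set but not given {C}, since conditioning on a common effect opens the path.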

Causal Imitation Learning under Temporally Correlated Noise

Modern variants of the instrumental variable regression (IVR) technique of econometrics are applied, enabling us to recover the underlying policy without requiring access to an interactive expert to break up spurious correlations.

Causal Transfer for Imitation Learning and Decision Making under Sensor-shift

This paper rigorously analyzes to what extent the relevant underlying mechanisms can be identified and transferred from the available observations together with prior knowledge of sensor characteristics, and introduces several proxy methods which are easier to calculate, estimate from finite data and interpret than the exact solutions.

Causal Confusion in Imitation Learning

It is shown that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and the proposed solution to combat it through targeted interventions to determine the correct causal model is validated.

Apprenticeship learning via inverse reinforcement learning

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.

A Game-Theoretic Approach to Apprenticeship Learning

A new algorithm is given that is computationally faster, is easier to implement, and can be applied even in the absence of an expert, and it is shown that this algorithm may produce a policy that is substantially better than the expert's.

Maximum Entropy Inverse Reinforcement Learning

A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.
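The maximum-entropy distribution places probability P(τ) ∝ exp(θ·f(τ)) on each decision sequence τ with feature counts f(τ), and θ is fit by matching the model's feature expectations to the demonstrated ones. A toy sketch over an enumerable set of candidate paths; the `maxent_weights` helper, the feature vectors, and the demonstration counts are all invented for illustration:

```python
import numpy as np

def maxent_weights(features, demo_counts, lr=0.1, iters=2000):
    """Fit reward weights theta so that P(tau) ∝ exp(theta · f(tau))
    matches the demonstrated feature expectations.

    features: (n_paths, n_features) feature counts per candidate path
    demo_counts: how often each path appears in the demonstrations
    """
    features = np.asarray(features, dtype=float)
    demo = np.asarray(demo_counts, dtype=float)
    empirical = demo @ features / demo.sum()     # observed feature expectation
    theta = np.zeros(features.shape[1])
    for _ in range(iters):
        logits = features @ theta
        p = np.exp(logits - logits.max())
        p /= p.sum()
        expected = p @ features                  # model feature expectation
        theta += lr * (empirical - expected)     # gradient of the log-likelihood
    return theta

# Three candidate paths with two binary features each; demonstrators prefer path 0.
paths = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = maxent_weights(paths, demo_counts=[8, 1, 1])
```

Because the log-likelihood is concave in θ, plain gradient ascent suffices here; the fitted distribution assigns about 0.8 probability to the preferred path, matching its empirical frequency.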

Efficiently Finding Conditional Instruments for Causal Inference

It is shown that whenever a conditional IV exists, so does an ancestral IV, and ancestral IVs can be found in polynomial time, which implies a complete and constructive solution to causal effect identification using IVs in linear causal models.

The mirror-neuron system.

A neurophysiological mechanism, the mirror-neuron system, appears to play a fundamental role in both action understanding and imitation; the properties specific to the human mirror-neuron system that might explain the human capacity to learn by imitation are stressed.

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Fairness in Decision-Making - The Causal Explanation Formula

The causal explanation formula is derived, which allows the AI designer to quantitatively evaluate fairness and explain the total observed disparity of decisions through different discriminatory mechanisms, and provides a quantitative approach to policy implementation and the design of fair AI systems.