Sequential Causal Imitation Learning with Unobserved Confounders

Daniel Kumor and Junzhe Zhang
“Monkey see monkey do” is an age-old adage, referring to naïve imitation without a deep understanding of a system’s underlying mechanics. Indeed, if a demonstrator has access to information unavailable to the imitator (monkey), such as a different set of sensors, then no matter how perfectly the imitator models its perceived environment (SEE), attempting to reproduce the demonstrator’s behavior (DO) can lead to poor outcomes. Imitation learning in the presence of a mismatch between…
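
The mismatch described in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper: the binary confounder `U`, the expert rule "act as `U` says", and the reward "1 when the action matches `U`" are all hypothetical choices made only for illustration. It computes exact expectations by enumeration, comparing an expert who observes `U` with an imitator who can only copy the expert's marginal action distribution.

```python
# Toy model (hypothetical, for illustration only): a binary confounder U
# is visible to the expert but hidden from the imitator.
P_U = {0: 0.5, 1: 0.5}


def reward(x, u):
    """Reward is 1 when the action x matches the hidden state u."""
    return 1.0 if x == u else 0.0


def expert_policy(u):
    """Expert sees u and always picks the matching action."""
    return {u: 1.0, 1 - u: 0.0}


def expected_reward_expert():
    """Exact expected reward of the expert, enumerating u and x."""
    return sum(
        P_U[u] * p * reward(x, u)
        for u in P_U
        for x, p in expert_policy(u).items()
    )


def imitator_policy():
    """Imitator cannot see u, so it copies the marginal P(x) of the demonstrations."""
    marginal = {0: 0.0, 1: 0.0}
    for u in P_U:
        for x, p in expert_policy(u).items():
            marginal[x] += P_U[u] * p
    return marginal


def expected_reward_imitator():
    """Exact expected reward when actions are drawn independently of u."""
    pi = imitator_policy()
    return sum(P_U[u] * pi[x] * reward(x, u) for u in P_U for x in pi)


if __name__ == "__main__":
    print("expert:  ", expected_reward_expert())
    print("imitator:", expected_reward_imitator())
```

In this toy setting the imitator reproduces the expert's action frequencies exactly, yet its actions are independent of `U`, so its expected reward falls to chance level while the expert's is perfect.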

Causal Imitation Learning With Unobserved Confounders

This paper provides a non-parametric, graphical criterion that is complete (both necessary and sufficient) for determining the feasibility of imitation from the combinations of demonstration data and qualitative assumptions about the underlying environment, represented in the form of a causal model.

Deconfounded Imitation Learning

This work introduces an algorithm for deconfounded imitation learning, which trains an inference model jointly with a latent-conditional policy and shows in theory and practice that this algorithm converges to the correct interventional policy, solves the confounding issue, and can under certain assumptions achieve an asymptotically optimal imitation performance.

Sequence Model Imitation Learning with Unobserved Contexts

It is proved that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped to handle these sorts of asymptotically realizable problems than off-policy methods.

Learning Human Driving Behaviors with Sequential Causal Imitation Learning

A sequential causal template is developed that generalizes the default MDP settings to one with Unobserved Confounders (MDPUC-HD) and a sufficient graphical criterion is developed to determine when ignoring causality leads to poor performance in MDPUC-HD.

Causal Imitation Learning under Temporally Correlated Noise

Modern variants of the instrumental variable regression (IVR) technique of econometrics are applied, enabling us to recover the underlying policy without requiring access to an interactive expert to break up spurious correlations.

Can Humans Be out of the Loop?

It is shown that agents are bound to learn sub-optimal policies if they do not take into account human advice, perhaps surprisingly, even when humans’ decisions are less accurate than their own.

A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning

It is empirically shown that Ẑ estimated by this method can significantly reduce dynamics prediction errors and improve the performance of model-based RL methods on zero-shot new environments with unseen dynamics.


A fundamental challenge in imitation and reinforcement learning is to learn policies, representations, or dynamics that do not build on spurious correlations and generalize beyond the specific environments that they were trained on by leveraging a diverse set of training environments.

Adaptively Exploiting d-Separators with Causal Bandits

This work formalizes and studies the notion of adaptivity, and provides a novel algorithm that simultaneously achieves (a) optimal regret when a d-separator is observed, improving on classical minimax algorithms, and (b) significantly smaller regret than recent causal bandit algorithms when the observed variables are not a d-separator.

What Would the Expert do(·)?: Causal Imitation Learning

Modern variants of the classical instrumental variable regression (IVR) technique are applied, enabling us to recover the causally correct underlying policy without requiring access to an interactive expert.



Causal Transfer for Imitation Learning and Decision Making under Sensor-shift

This paper rigorously analyzes to what extent the relevant underlying mechanisms can be identified and transferred from the available observations together with prior knowledge of sensor characteristics, and introduces several proxy methods which are easier to calculate, estimate from finite data and interpret than the exact solutions.

Causal Confusion in Imitation Learning

It is shown that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and a proposed solution, which uses targeted interventions to determine the correct causal model, is validated.

Apprenticeship learning via inverse reinforcement learning

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.

An Algorithmic Perspective on Imitation Learning

This work provides an introduction to imitation learning, dividing imitation learning into directly replicating desired behavior and learning the hidden objectives of the desired behavior from demonstrations (called inverse optimal control or inverse reinforcement learning [Russell, 1998]).

A Game-Theoretic Approach to Apprenticeship Learning

A new algorithm is given that is computationally faster, is easier to implement, and can be applied even in the absence of an expert, and it is shown that this algorithm may produce a policy that is substantially better than the expert's.

Maximum Entropy Inverse Reinforcement Learning

A probabilistic approach based on the principle of maximum entropy is developed that provides a well-defined, globally normalized distribution over decision sequences while offering the same performance guarantees as existing methods.

Studies in causal reasoning and learning

It is shown how useful information on the probabilities of causation can be extracted from empirical data, and how data from both experimental and nonexperimental studies can be combined to yield information that neither study alone can provide.

Efficiently Finding Conditional Instruments for Causal Inference

It is shown that whenever a conditional IV exists, so does an ancestral IV, and ancestral IVs can be found in polynomial time, which implies a complete and constructive solution to causal effect identification using IVs in linear causal models.

The mirror-neuron system.

A neurophysiological mechanism appears to play a fundamental role in both action understanding and imitation, and those properties specific to the human mirror-neuron system that might explain the human capacity to learn by imitation are stressed.