# Causal Imitation Learning With Unobserved Confounders

```bibtex
@article{Zhang2020CausalIL,
  title   = {Causal Imitation Learning With Unobserved Confounders},
  author  = {Junzhe Zhang and Daniel Kumor and Elias Bareinboim},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2208.06267}
}
```

One of the common ways children learn is by mimicking adults. Imitation learning focuses on learning policies with suitable performance from demonstrations generated by an expert, with an unspecified performance measure and an unobserved reward signal. Popular methods for imitation learning start either by directly mimicking the behavior policy of the expert (behavior cloning) or by learning a reward function that prioritizes observed expert trajectories (inverse reinforcement learning)…
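The two families of methods named above can be contrasted with a minimal sketch of the first. Behavior cloning reduces imitation to supervised learning on expert state-action pairs; the discrete states, actions, and demonstrations below are toy values invented for illustration, not data from the paper.

```python
import numpy as np

def behavior_cloning(demos, n_states, n_actions):
    """Estimate the expert policy pi(a|s) by empirical action frequencies.

    demos: list of (state, action) pairs from expert trajectories.
    Returns an (n_states, n_actions) array of conditional probabilities,
    with a uniform fallback for states never visited by the expert.
    """
    counts = np.zeros((n_states, n_actions))
    for s, a in demos:
        counts[s, a] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_actions)

# Toy demonstrations: expert always picks action 1 in state 0,
# mostly action 0 in state 1, and never visits state 2.
demos = [(0, 1), (0, 1), (1, 0), (1, 0), (1, 1)]
pi = behavior_cloning(demos, n_states=3, n_actions=2)
```

The cloned policy simply reproduces the observed conditional distribution; the paper's point is precisely that this can fail when unobserved confounders make the expert's observed behavior an unreliable guide for the imitator.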

## 25 Citations

### Sequential Causal Imitation Learning with Unobserved Confounders

- Computer Science, NeurIPS
- 2021

A graphical criterion that is necessary and sufficient for determining the feasibility of causal imitation is developed, providing conditions when an imitator can match a demonstrator’s performance despite differing capabilities.

### Invariant Causal Imitation Learning for Generalizable Policies

- Computer Science, NeurIPS
- 2021

Invariant Causal Imitation Learning (ICIL), a novel technique in which a feature representation that is invariant across domains is learned, is proposed on the basis of which an imitation policy is learned that matches expert behavior.

### Sequence Model Imitation Learning with Unobserved Contexts

- Computer Science, ArXiv
- 2022

It is proved that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped to handle these sorts of asymptotically realizable problems than off-policy methods.

### What Would the Expert do(·)?: Causal Imitation Learning

- Computer Science
- 2021

Modern variants of the classical instrumental variable regression (IVR) technique are applied, enabling us to recover the causally correct underlying policy without requiring access to an interactive expert.

### Learning Human Driving Behaviors with Sequential Causal Imitation Learning

- Computer Science, AAAI
- 2022

A sequential causal template is developed that generalizes the default MDP setting to one with unobserved confounders (MDPUC-HD), and a sufficient graphical criterion is developed to determine when ignoring causality leads to poor performance in MDPUC-HD.

### Feedback in Imitation Learning: The Three Regimes of Covariate Shift

- Computer Science, ArXiv
- 2021

This work demonstrates a broad class of problems where this shift can be mitigated, both theoretically and practically, by taking advantage of a simulator but without any further querying of expert demonstration.

### Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

- Computer Science, NeurIPS
- 2021

The approach, Confidence-Aware Imitation Learning (CAIL), learns a well-performing policy from confidence-reweighted demonstrations, using an outer loss to track the performance of the model and to learn the confidence.

### Causal Imitation Learning under Temporally Correlated Noise

- Computer Science, ICML
- 2022

Modern variants of the instrumental variable regression (IVR) technique of econometrics are applied to break up spurious correlations, enabling recovery of the underlying policy without requiring access to an interactive expert.
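The instrumental-variable idea referenced above can be illustrated with a minimal two-stage least squares (2SLS) sketch: when a covariate x and an outcome y share an unobserved confounder u, ordinary regression is biased, but an instrument z that affects x and not y directly recovers the causal coefficient. All data below is synthetic and the linear model is an illustrative assumption, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)                 # instrument: affects x, not y directly
u = rng.normal(size=n)                 # unobserved confounder
x = z + u + 0.1 * rng.normal(size=n)
y = 2.0 * x + 3.0 * u + 0.1 * rng.normal(size=n)   # true causal effect of x: 2.0

# Naive least squares is biased because x and y both depend on u
ols = (x @ y) / (x @ x)

# Stage 1: project x onto the instrument z
x_hat = ((z @ x) / (z @ z)) * z
# Stage 2: regress y on the projected (confounder-free) part of x
iv = (x_hat @ y) / (x_hat @ x_hat)
```

The OLS estimate absorbs the confounder's contribution (here landing well above 2), while the 2SLS estimate concentrates near the true coefficient of 2, which is the sense in which instrumenting "breaks up" the spurious correlation.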

### Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning

- Computer Science, L4DC
- 2021

This paper considers a transfer reinforcement learning problem in continuous state and action spaces, under unobserved contextual information, and formulates the learning problem as a causal bound-constrained Multi-Armed Bandit (MAB) problem.

## References

Showing 1–10 of 48 references

### Sequential Causal Imitation Learning with Unobserved Confounders

- Computer Science, NeurIPS
- 2021

A graphical criterion that is necessary and sufficient for determining the feasibility of causal imitation is developed, providing conditions when an imitator can match a demonstrator’s performance despite differing capabilities.

### Causal Transfer for Imitation Learning and Decision Making under Sensor-shift

- Computer Science, AAAI
- 2020

This paper rigorously analyzes to what extent the relevant underlying mechanisms can be identified and transferred from the available observations together with prior knowledge of sensor characteristics, and introduces several proxy methods that are easier to calculate, estimate from finite data, and interpret than the exact solutions.

### Causal Confusion in Imitation Learning

- Computer Science, NeurIPS
- 2019

It is shown that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and the proposed solution to combat it through targeted interventions to determine the correct causal model is validated.

### An Algorithmic Perspective on Imitation Learning

- Computer Science, Found. Trends Robotics
- 2018

This work provides an introduction to imitation learning, dividing imitation learning into directly replicating desired behavior and learning the hidden objectives of the desired behavior from demonstrations (called inverse optimal control or inverse reinforcement learning [Russell, 1998]).

### Apprenticeship learning via inverse reinforcement learning

- Computer Science, ICML
- 2004

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
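The key quantity in the apprenticeship-learning formulation above can be sketched directly: if the reward is R(s) = w·φ(s) for some feature map φ, then matching the expert's discounted feature expectations μ = E[Σ_t γ^t φ(s_t)] guarantees matching its reward for any such w. The feature map and trajectories below are hypothetical toy values.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted feature expectations over demo trajectories.

    Each trajectory is a list of states s_0, s_1, ...; phi maps a state
    to a feature vector. Returns the empirical estimate of
    mu = E[sum_t gamma^t * phi(s_t)].
    """
    mus = [
        sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        for traj in trajectories
    ]
    return np.mean(mus, axis=0)

phi = lambda s: np.array([1.0, float(s)])   # toy 2-d feature map
trajs = [[0, 1, 1], [1, 1, 0]]              # two short expert trajectories
mu_E = feature_expectations(trajs, phi)
```

The algorithm in the cited paper then searches for a policy whose feature expectations are close to μ_E, which bounds the reward gap under the linear-reward assumption.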

### Maximum Entropy Inverse Reinforcement Learning

- Computer Science, AAAI
- 2008

A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.

### From Statistical Transportability to Estimating the Effect of Stochastic Interventions

- Computer Science, IJCAI
- 2019

This paper develops the first sound and complete procedure for statistical transportability, and formally closes the problem of completeness of stochastic identification by constructing a reduction of any instance of that problem to an instance of statistical transportability.

### Q-learning

- Computer Science, Machine Learning
- 2004

This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
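The update rule covered by that convergence theorem is Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s',a') − Q(s,a)), and the sampling condition corresponds to an exploratory behavior policy. A minimal tabular sketch on a hypothetical two-state chain MDP (invented for illustration):

```python
import random

def q_learning(step, n_states, n_actions, episodes=2000, alpha=0.1, gamma=0.9):
    """Tabular Q-learning with a uniformly random behavior policy,
    so all actions are repeatedly sampled in all visited states."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    rng = random.Random(0)
    for _ in range(episodes):
        s = 0
        for _ in range(10):
            a = rng.randrange(n_actions)   # exploratory action choice
            s2, r = step(s, a)
            # Temporal-difference update toward r + gamma * max_a' Q(s', a')
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Hypothetical chain: in state 0, action 1 yields reward 1 and moves to the
# absorbing state 1; every other transition yields reward 0.
def step(s, a):
    if a == 1:
        return min(s + 1, 1), (1.0 if s == 0 else 0.0)
    return s, 0.0

Q = q_learning(step, n_states=2, n_actions=2)
```

Under the theorem's conditions the table converges to the optimal action-values: here Q(0,1) → 1, Q(0,0) → γ·Q(0,1) = 0.9, and both entries for the absorbing state stay at 0.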

### A Game-Theoretic Approach to Apprenticeship Learning

- Computer Science, NIPS
- 2007

A new algorithm is given that is computationally faster, is easier to implement, and can be applied even in the absence of an expert, and it is shown that this algorithm may produce a policy that is substantially better than the expert's.

### Reinforcement Learning: An Introduction

- Computer Science, IEEE Transactions on Neural Networks
- 2005

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.