Causal Imitation Learning With Unobserved Confounders

@article{Zhang2020CausalIL,
  title={Causal Imitation Learning With Unobserved Confounders},
  author={Junzhe Zhang and Daniel Kumor and Elias Bareinboim},
  journal={ArXiv},
  year={2020},
  volume={abs/2208.06267}
}
One of the common ways children learn is by mimicking adults. Imitation learning focuses on learning policies with suitable performance from demonstrations generated by an expert, with an unspecified performance measure and unobserved reward signal. Popular methods for imitation learning start by either directly mimicking the behavior policy of an expert (behavior cloning) or by learning a reward function that prioritizes observed expert trajectories (inverse reinforcement learning)…
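
For concreteness, behavior cloning reduces imitation to supervised learning on the expert's state-action pairs. A minimal sketch of that baseline (the synthetic demonstrations and the scikit-learn classifier are illustrative assumptions, not part of the paper):

```python
# Behavior cloning: fit pi(a | x) to expert demonstrations by supervised learning.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical expert demonstrations: observed states X and discrete actions a.
X = rng.normal(size=(1000, 4))                               # state features
a = (X @ np.array([1.0, -2.0, 0.5, 0.0]) > 0).astype(int)    # expert's decision rule (unknown to the imitator)

# Maximum-likelihood fit of the conditional pi(a | x).
policy = LogisticRegression().fit(X, a)

# The imitator acts by querying the learned conditional on new states.
print("imitated action:", policy.predict(rng.normal(size=(1, 4)))[0])
```

The causal caveat motivating the paper is that when the expert's decisions also depend on inputs the imitator cannot observe, fitting pi(a | x) on the observed covariates alone need not recover the expert's performance.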

Citations

Sequential Causal Imitation Learning with Unobserved Confounders

A graphical criterion that is necessary and sufficient for determining the feasibility of causal imitation is developed, providing conditions under which an imitator can match a demonstrator’s performance despite differing capabilities.

Invariant Causal Imitation Learning for Generalizable Policies

Invariant Causal Imitation Learning (ICIL), a novel technique that learns a feature representation invariant across domains, is proposed; on the basis of this representation, an imitation policy is learned that matches expert behavior.

Sequence Model Imitation Learning with Unobserved Contexts

It is proved that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped to handle these sorts of asymptotically realizable problems than off-policy methods.

What Would the Expert do(·)?: Causal Imitation Learning

Modern variants of the classical instrumental variable regression (IVR) technique are applied, enabling us to recover the causally correct underlying policy without requiring access to an interactive expert.

Learning Human Driving Behaviors with Sequential Causal Imitation Learning

A sequential causal template is developed that generalizes the default MDP setting to one with unobserved confounders (MDPUC-HD), and a sufficient graphical criterion is developed to determine when ignoring causality leads to poor performance in MDPUC-HD.

Feedback in Imitation Learning: The Three Regimes of Covariate Shift

This work demonstrates a broad class of problems where this shift can be mitigated, both theoretically and practically, by taking advantage of a simulator but without any further querying of the expert.

Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

The proposed approach, Confidence-Aware Imitation Learning (CAIL), learns a well-performing policy from confidence-reweighted demonstrations, using an outer loss to track the performance of the model and to learn the confidence.

Causal Imitation Learning under Temporally Correlated Noise

Modern variants of the instrumental variable regression (IVR) technique of econometrics are applied, enabling us to recover the underlying policy without requiring access to an interactive expert to break up spurious correlations.
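
Both IVR-based entries above rest on the same econometric idea: an instrument that influences the expert's input but is independent of the unobserved noise lets one de-bias the regression. A minimal two-stage least squares sketch (the linear data-generating process and coefficients are assumptions for illustration, not the papers' estimators):

```python
# Generic two-stage least squares: use an instrument z to remove confounding bias.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

z = rng.normal(size=n)                              # instrument: affects x, independent of u
u = rng.normal(size=n)                              # unobserved confounder
x = 0.8 * z + u + 0.1 * rng.normal(size=n)          # input, confounded by u
y = 2.0 * x + 3.0 * u + 0.1 * rng.normal(size=n)    # outcome; true effect of x is 2.0

# Naive regression of y on x is biased because u drives both x and y.
naive_slope = np.polyfit(x, y, 1)[0]

# Stage 1: regress x on z. Stage 2: regress y on the stage-1 fitted values.
slope1, intercept1 = np.polyfit(z, x, 1)
x_hat = slope1 * z + intercept1
iv_slope = np.polyfit(x_hat, y, 1)[0]

print(f"naive estimate: {naive_slope:.2f} (biased away from 2.0)")
print(f"2SLS estimate:  {iv_slope:.2f} (close to 2.0)")
```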

Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning

This paper considers a transfer reinforcement learning problem in continuous state and action spaces under unobserved contextual information, and formulates it as a causal bound-constrained multi-armed bandit (MAB) problem.

References

Showing 1-10 of 48 references

Sequential Causal Imitation Learning with Unobserved Confounders

A graphical criterion that is necessary and sufficient for determining the feasibility of causal imitation is developed, providing conditions under which an imitator can match a demonstrator’s performance despite differing capabilities.

Causal Transfer for Imitation Learning and Decision Making under Sensor-shift

This paper rigorously analyzes to what extent the relevant underlying mechanisms can be identified and transferred from the available observations together with prior knowledge of sensor characteristics, and introduces several proxy methods which are easier to calculate, estimate from finite data and interpret than the exact solutions.

Causal Confusion in Imitation Learning

It is shown that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and the proposed solution, which combats it through targeted interventions to determine the correct causal model, is validated.

An Algorithmic Perspective on Imitation Learning

This work provides an introduction to imitation learning, dividing it into directly replicating the desired behavior and learning the hidden objectives of the desired behavior from demonstrations (called inverse optimal control or inverse reinforcement learning [Russell, 1998]).

Apprenticeship learning via inverse reinforcement learning

This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
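
The central quantity in this formulation is the discounted feature-expectation vector: if the reward is linear in known features, matching the expert's feature expectations matches the expert's value for any such reward. A small sketch of estimating it from demonstrations (the toy trajectories and feature map are assumptions for illustration):

```python
# Monte-Carlo estimate of mu = E[ sum_t gamma^t * phi(s_t) ] from expert trajectories.
import numpy as np

gamma = 0.9

def feature_expectations(trajectories, phi, gamma):
    """Average discounted feature counts over a set of state trajectories."""
    mu = np.zeros_like(phi(trajectories[0][0]), dtype=float)
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

phi = lambda s: np.array([s, s ** 2])               # toy feature map
expert_trajs = [[1.0, 0.9, 0.8], [1.0, 1.1, 0.7]]   # toy expert state sequences

print("expert feature expectations:", feature_expectations(expert_trajs, phi, gamma))
```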

Maximum Entropy Inverse Reinforcement Learning

A probabilistic approach based on the principle of maximum entropy is developed that provides a well-defined, globally normalized distribution over decision sequences while providing the same performance guarantees as existing methods.
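
The "globally normalized distribution over decision sequences" mentioned above takes the form P(trajectory) proportional to exp(total reward), normalized over whole trajectories rather than per step. A toy sketch (the enumerated trajectories and reward values are illustrative assumptions):

```python
# MaxEnt-style trajectory distribution: softmax over whole trajectories.
import numpy as np

# Toy setting: three candidate trajectories with total rewards under some reward function.
traj_rewards = np.array([4.0, 2.5, 1.0])

# Global normalization: exponentiate total rewards and divide by the partition sum.
unnormalized = np.exp(traj_rewards)
p_traj = unnormalized / unnormalized.sum()

print("trajectory probabilities:", np.round(p_traj, 3))
```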

From Statistical Transportability to Estimating the Effect of Stochastic Interventions

This paper develops the first sound and complete procedure for statistical transportability, formally closing the problem of completeness of stochastic identification by reducing any instance of that problem to an instance of statistical transportability.

Q-learning

This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
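
A minimal tabular sketch of the update and of the sampling condition described above (the toy MDP and the epsilon-greedy exploration scheme are illustrative assumptions):

```python
# Tabular Q-learning with epsilon-greedy exploration, which keeps every
# state-action pair repeatedly sampled, as the convergence theorem requires.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.2

def step(s, a):
    """Toy MDP: random next state; reward 1 only for action 0 in state 0."""
    return rng.integers(n_states), float(s == 0 and a == 0)

s = 0
for _ in range(10_000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    # Standard Q-learning update toward the bootstrapped target.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.round(Q, 2))
```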

A Game-Theoretic Approach to Apprenticeship Learning

A new algorithm is given that is computationally faster, is easier to implement, and can be applied even in the absence of an expert, and it is shown that this algorithm may produce a policy that is substantially better than the expert's.

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.