# Causal Reinforcement Learning using Observational and Interventional Data

    @article{Gasse2021CausalRL,
      title   = {Causal Reinforcement Learning using Observational and Interventional Data},
      author  = {Maxime Gasse and Damien Grasset and Guillaume Gaudron and Pierre-Yves Oudeyer},
      journal = {ArXiv},
      year    = {2021},
      volume  = {abs/2106.14421}
    }

Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs. We consider a scenario where the learning agent can collect online experiences through direct interactions with the environment (interventional data), but also has access to a large collection of offline experiences obtained by observing another agent interact with the environment (observational data). A key ingredient that makes this situation non…
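To make concrete why combining these two data sources is non-trivial, here is a toy sketch (not from the paper; the environment, the behaviour policy, and all names are invented for illustration). A hidden confounder influences both the behaviour agent's actions and the rewards, so naive estimates from observational data are biased, while interventional data gives the true effect of an action:

```python
import random

random.seed(0)

def step(u, a):
    # Reward depends on both the hidden confounder u and the action a.
    return 1.0 if a == u else 0.0

# Observational data: the behaviour agent sees the hidden u and plays a = u,
# so actions and rewards are confounded by u.
obs = []
for _ in range(10_000):
    u = random.randint(0, 1)
    a = u  # behaviour policy depends on the unobserved confounder
    obs.append((a, step(u, a)))

# Interventional data: the learner sets the action itself, do(a),
# independently of u.
intv = []
for _ in range(10_000):
    u = random.randint(0, 1)
    a = random.randint(0, 1)  # action chosen by intervention
    intv.append((a, step(u, a)))

def mean_reward(data, action):
    rewards = [r for a, r in data if a == action]
    return sum(rewards) / len(rewards)

# Confounded estimate E[r | a=1] is exactly 1.0 in the observational data,
# while the true interventional value E[r | do(a=1)] is about 0.5.
print(mean_reward(obs, 1))
print(mean_reward(intv, 1))
```

The gap between the two printed values is precisely the confounding bias that an agent pooling both data sources must account for.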


#### 3 Citations

Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

- Computer Science, Mathematics
- ArXiv
- 2021

This work considers off-policy evaluation in a partially observed MDP (POMDP): estimating the value of a given target policy from trajectories with only partial state observations, generated by a different and unknown behaviour policy that may depend on the unobserved state.

A Survey of Deep Reinforcement Learning in Recommender Systems: A Systematic Review and Future Directions

- Computer Science
- ArXiv
- 2021

This survey provides a taxonomy of current DRL-based recommender systems and a summary of existing methods, discusses emerging topics and open issues, and offers a perspective on advancing the domain.

Causal Multi-Agent Reinforcement Learning: Review and Open Problems

- Computer Science, Mathematics
- ArXiv
- 2021

It is argued that causality can offer improved safety, interpretability, and robustness, while also providing strong theoretical guarantees for emergent behaviour.

#### References

Showing 1–10 of 28 references

Deconfounding Reinforcement Learning in Observational Settings

- Computer Science, Mathematics
- ArXiv
- 2018

This work considers the problem of learning good policies solely from historical data in which unobserved factors affect both the observed actions and rewards; it is the first time confounders are taken into consideration when addressing full RL problems with observational data.

Reinforcement Learning and Causal Models

- Psychology
- 2017

This chapter reviews the diverse roles that causal knowledge plays in reinforcement learning. The first half of the chapter contrasts a “model-free” system that learns to repeat actions that lead to…

Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders

- Computer Science, Mathematics
- AISTATS
- 2021

It is shown how, given only a latent variable model for states and actions, the policy value can be identified from off-policy data, and how optimal balancing can be combined with such learned ratios to obtain the policy value while avoiding direct modeling of reward functions.

Causal Confusion in Imitation Learning

- Computer Science, Mathematics
- NeurIPS
- 2019

It is shown that causal misidentification occurs in several benchmark control domains as well as in realistic driving settings, and the proposed solution, which combats it through targeted interventions that determine the correct causal model, is validated.

Transfer Learning in Multi-Armed Bandit: A Causal Approach

- Computer Science
- AAMAS
- 2017

This work tackles the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by Pearl's do-calculus or by standard off-policy learning techniques, and proposes a new identification strategy, B-kl-UCB.

Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach

- Computer Science
- ICML
- 2020

It is shown that, if the causal diagram of the underlying environment is provided, one can achieve regret that is exponentially smaller than DX∪S, and two online algorithms are developed that satisfy such regret bounds by exploiting the causal structure underlying the DTR.

Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes

- Computer Science
- NeurIPS
- 2019

This paper develops the first adaptive algorithm that achieves near-optimal regret in DTRs in online settings, without any access to historical data, along with a novel RL algorithm that efficiently learns the optimal DTR while leveraging the abundant yet imperfect confounded observations.

Batch Reinforcement Learning

- Computer Science
- Reinforcement Learning
- 2012

This chapter introduces the basic principles and theory behind batch reinforcement learning and its most important algorithms, discusses ongoing research within this field by example, and briefly surveys real-world applications of batch reinforcement learning.

Off-Policy Evaluation in Partially Observable Environments

- Computer Science, Engineering
- AAAI
- 2020

A model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP, is formulated; it is shown how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs.

Bandits with Unobserved Confounders: A Causal Approach

- Computer Science, Mathematics
- NIPS
- 2015

It is shown that to achieve low regret in certain realistic classes of bandit problems (namely, in the face of unobserved confounders), both experimental and observational quantities are required by the rational agent.