Corpus ID: 235658155

Causal Reinforcement Learning using Observational and Interventional Data

@article{Gasse2021CausalRL,
  title={Causal Reinforcement Learning using Observational and Interventional Data},
  author={Maxime Gasse and Damien Grasset and Guillaume Gaudron and Pierre-Yves Oudeyer},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.14421}
}
Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs. We consider here a scenario where the learning agent can collect online experiences through direct interactions with the environment (interventional data), but also has access to a large collection of offline experiences, obtained by observing another agent interacting with the environment (observational data). A key ingredient, that makes this situation non…
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes
  • Andrew Bennett, Nathan Kallus
  • Computer Science, Mathematics
  • ArXiv
  • 2021
TLDR
This work considers off-policy evaluation in a partially observed MDP (POMDP): estimating the value of a given target policy from trajectories with only partial state observations, generated by a different and unknown policy that may depend on the unobserved state.
A Survey of Deep Reinforcement Learning in Recommender Systems: A Systematic Review and Future Directions
TLDR
This survey provides a taxonomy of current DRL-based recommender systems and a summary of existing methods, discusses emerging topics and open issues, and offers a perspective on advancing the domain.
Causal Multi-Agent Reinforcement Learning: Review and Open Problems
TLDR
It is argued that causality can offer improved safety, interpretability, and robustness, while also providing strong theoretical guarantees for emergent behaviour.

References

Showing 1-10 of 28 references
Deconfounding Reinforcement Learning in Observational Settings
TLDR
This work considers the problem of learning good policies solely from historical data in which unobserved factors affect both observed actions and rewards; for the first time, confounders are taken into consideration when addressing full RL problems with observational data.
Reinforcement Learning and Causal Models
This chapter reviews the diverse roles that causal knowledge plays in reinforcement learning. The first half of the chapter contrasts a "model-free" system that learns to repeat actions that lead to…
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
TLDR
It is shown how, given only a latent variable model for states and actions, policy value can be identified from off-policy data, and optimal balancing can be combined with such learned ratios to obtain policy value while avoiding direct modeling of reward functions.
Causal Confusion in Imitation Learning
TLDR
It is shown that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and the proposed solution to combat it through targeted interventions to determine the correct causal model is validated.
Transfer Learning in Multi-Armed Bandit: A Causal Approach
TLDR
This work tackles the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by Pearl's do-calculus nor standard off-policy learning techniques, and proposes a new identification strategy, B-kl-UCB.
Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach
TLDR
It is shown that, if the causal diagram of the underlying environment is provided, one could achieve regret that is exponentially smaller than D_{X∪S}, and two online algorithms are developed that satisfy such regret bounds by exploiting the causal structure underlying the DTR.
Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
TLDR
This paper develops the first adaptive algorithm that achieves near-optimal regret in DTRs in online settings, without any access to historical data, and develops a novel RL algorithm that efficiently learns the optimal DTR while leveraging the abundant, yet imperfect confounded observations.
Batch Reinforcement Learning
TLDR
This chapter introduces the basic principles and the theory behind batch reinforcement learning and the most important algorithms, exemplarily discusses ongoing research within this field, and briefly surveys real-world applications of batch reinforcement learning.
Off-Policy Evaluation in Partially Observable Environments
TLDR
A model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP, is formulated, and it is shown how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs.
Bandits with Unobserved Confounders: A Causal Approach
TLDR
It is shown that to achieve low regret in certain realistic classes of bandit problems (namely, in the face of unobserved confounders), both experimental and observational quantities are required by the rational agent.