Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning
@inproceedings{Liu2021LearningWK,
  title     = {Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning},
  author    = {Chenyu Liu and Yan Zhang and Yi Shen and Michael M. Zavlanos},
  booktitle = {L4DC},
  year      = {2021}
}
In this paper, we consider a transfer Reinforcement Learning (RL) problem in continuous state and action spaces, under unobserved contextual information. For example, the context can represent the mental view of the world that an expert agent has formed through past interactions with this world. We assume that this context is not accessible to a learner agent who can only observe the expert data. Then, our goal is to use the context-aware expert data to learn an optimal context-unaware policy…
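To make the setup concrete, here is a minimal sketch of the asymmetry between the two agents, assuming a toy continuous environment: the expert's policy conditions on a latent context z, while the logged demonstrations contain only (state, action) pairs. The dynamics, the names `step`, `expert_policy`, and `collect_demos`, and all constants are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuous contextual MDP: dynamics and reward depend on a
# latent context z that the expert observes but the learner does not.
def step(state, action, z):
    next_state = state + 0.1 * action + 0.01 * rng.normal(size=state.shape)
    reward = -np.sum((next_state - z) ** 2)  # the context shifts the goal
    return next_state, reward

# Context-aware expert: acts on (state, z).
def expert_policy(state, z):
    return np.clip(z - state, -1.0, 1.0)

# Collect expert demonstrations. Crucially, z is dropped from the logged
# data: the learner only ever sees (state, action) pairs.
def collect_demos(num_episodes=10, horizon=50):
    demos = []
    for _ in range(num_episodes):
        z = rng.uniform(-1.0, 1.0, size=2)  # unobserved context
        state = rng.normal(size=2)
        for _ in range(horizon):
            action = expert_policy(state, z)
            demos.append((state.copy(), action.copy()))  # z is NOT stored
            state, _ = step(state, action, z)
    return demos

demos = collect_demos()
# A context-unaware learner must be trained from `demos` alone.
```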
References
Showing 1-10 of 29 references
Transfer Reinforcement Learning under Unobserved Contextual Information
- ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), 2020
A method is developed to obtain causal bounds on the transition and reward functions using the demonstrator’s data; these bounds are then used to derive value function bounds, and new Q-learning and UCB-Q learning algorithms are proposed that converge to the true value function without bias.
Causal Transfer for Imitation Learning and Decision Making under Sensor-shift
- AAAI, 2020
This paper rigorously analyzes to what extent the relevant underlying mechanisms can be identified and transferred from the available observations together with prior knowledge of sensor characteristics, and introduces several proxy methods that are easier to compute, estimate from finite data, and interpret than the exact solutions.
Causal Imitation Learning With Unobserved Confounders
- NeurIPS, 2020
This paper provides a non-parametric, graphical criterion that is complete (both necessary and sufficient) for determining the feasibility of imitation from the combination of demonstration data and qualitative assumptions about the underlying environment, and develops an efficient procedure for learning the imitating policy from experts’ trajectories.
Successor Features for Transfer in Reinforcement Learning
- NIPS, 2017
This work proposes a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same, derives two theorems that put the approach on firm theoretical ground, and presents experiments showing that it successfully promotes transfer in practice.
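For reference, the successor-feature decomposition underlying this transfer framework can be written as follows (standard notation from the successor features literature, not quoted from this page):

```latex
% Assumption: rewards are linear in features \phi, with task-specific weights w.
r(s, a, s') = \phi(s, a, s')^{\top} \mathbf{w}
% Successor features: the expected discounted sum of \phi under policy \pi.
\psi^{\pi}(s, a) = \mathbb{E}^{\pi}\!\Big[\textstyle\sum_{t=0}^{\infty}
    \gamma^{t}\,\phi(s_t, a_t, s_{t+1}) \,\Big|\, s_0 = s,\ a_0 = a\Big]
% The action-value then factorizes; only w changes across tasks.
Q^{\pi}(s, a) = \psi^{\pi}(s, a)^{\top} \mathbf{w}
```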
Meta-Inverse Reinforcement Learning with Probabilistic Context Variables
- NeurIPS, 2019
This work proposes a deep latent variable model that is capable of learning rewards from unstructured, multi-task demonstration data, and critically, uses this experience to infer robust rewards for new, structurally-similar tasks from a single demonstration.
Behavioral Cloning from Observation
- IJCAI, 2018
This work proposes a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO): the agent first acquires experience in a self-supervised fashion to develop a model, which is then utilized to learn a particular task by observing an expert perform that task, without knowledge of the specific actions taken.
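A minimal sketch of BCO's two phases, assuming an inverse dynamics model fit with scikit-learn regressors; the toy dynamics and all variable names are illustrative, not the paper's implementation:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Phase 1: self-supervised experience. The agent acts (here, randomly) and
# records (s, s') -> a pairs to fit an inverse dynamics model.
states = rng.normal(size=(1000, 2))
actions = rng.uniform(-1.0, 1.0, size=(1000, 2))
next_states = states + 0.1 * actions  # toy stand-in for the real environment

inverse_model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000)
inverse_model.fit(np.hstack([states, next_states]), actions)

# Phase 2: observe state-only expert trajectories, infer the missing actions
# with the inverse model, then run ordinary behavioral cloning on (s, a-hat).
expert_states = rng.normal(size=(200, 2))
expert_next = 0.95 * expert_states  # toy expert drives the state toward 0
inferred = inverse_model.predict(np.hstack([expert_states, expert_next]))

policy = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000)
policy.fit(expert_states, inferred)
```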
SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
- NeurIPS, 2019
This work proposes SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations and is the first efficient Meta-IRL method that scales to the function-approximator setting.
Contextual Markov Decision Processes
- arXiv, 2015
This paper suggests a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs.
Efficient Exploration via State Marginal Matching
- arXiv, 2019
This work recasts exploration as a problem of State Marginal Matching (SMM) and demonstrates that agents that directly optimize the SMM objective explore faster and adapt more quickly to new tasks than prior exploration methods.
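In its standard form (not quoted from this page), the SMM objective is a KL divergence between the policy's state marginal and a target distribution; expanding the KL shows it trades off visiting high-density target states against state entropy:

```latex
% State Marginal Matching: align the state marginal \rho_\pi with target p^*.
\min_{\pi}\; D_{\mathrm{KL}}\big(\rho_{\pi}(s) \,\|\, p^{*}(s)\big)
= \min_{\pi}\; \mathbb{E}_{s \sim \rho_{\pi}}\big[\log \rho_{\pi}(s) - \log p^{*}(s)\big]
% Equivalently, maximize the target log-density plus the state entropy:
\;\Longleftrightarrow\;
\max_{\pi}\; \mathbb{E}_{s \sim \rho_{\pi}}\big[\log p^{*}(s)\big]
    + \mathcal{H}\big[\rho_{\pi}(s)\big]
```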
Transfer of samples in batch reinforcement learning
- ICML, 2008
A novel algorithm is introduced that transfers samples from the source tasks that are most similar to the target task, and it is shown empirically that the proposed transfer of samples is effective in reducing the learning complexity.
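As a rough illustration of the idea only: the paper's actual task-similarity criteria are not reproduced here, and the distance-based selection rule below is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batches of transitions (s, a, r, s') for a target and one source task.
def make_batch(shift, n=500):
    s = rng.normal(size=(n, 2))
    a = rng.uniform(-1.0, 1.0, size=(n, 1))
    s2 = s + 0.1 * a + shift
    r = -np.linalg.norm(s2, axis=1, keepdims=True)
    return np.hstack([s, a, r, s2])

target = make_batch(shift=0.0)
source = make_batch(shift=0.05)

# Generic selection rule: keep a source sample if its transition is close to
# some target transition (a stand-in for the paper's similarity measure).
def select_similar(source, target, threshold=0.3):
    dists = np.min(
        np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1), axis=1
    )
    return source[dists < threshold]

augmented = np.vstack([target, select_similar(source, target)])
# `augmented` would then be fed to any batch RL algorithm, e.g. fitted Q-iteration.
```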