Corpus ID: 235363979

Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning

@inproceedings{Liu2021LearningWK,
  title={Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning},
  author={Chenyu Liu and Yan Zhang and Yi Shen and Michael M. Zavlanos},
  booktitle={L4DC},
  year={2021}
}
In this paper, we consider a transfer Reinforcement Learning (RL) problem in continuous state and action spaces, under unobserved contextual information. For example, the context can represent the mental view of the world that an expert agent has formed through past interactions with this world. We assume that this context is not accessible to a learner agent who can only observe the expert data. Then, our goal is to use the context-aware expert data to learn an optimal context-unaware policy… 
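One standard way to make this setup concrete (a sketch with assumed notation, not taken from the paper itself) is as a family of MDPs indexed by a latent context c drawn from an unknown distribution p(c):

\[
\mathcal{M}_c = (\mathcal{S}, \mathcal{A}, P_c(s' \mid s, a), r_c(s, a), \gamma), \qquad c \sim p(c).
\]

The expert acts with a context-aware policy \(\pi_E(a \mid s, c)\), while the learner only observes state-action trajectories \(\{(s_t, a_t)\}\) with c hidden, and must find a single context-unaware policy \(\pi(a \mid s)\) maximizing

\[
J(\pi) = \mathbb{E}_{c \sim p(c)}\, \mathbb{E}_{\pi, P_c}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r_c(s_t, a_t)\Big].
\]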


References

Showing 1-10 of 29 references
Transfer Reinforcement Learning under Unobserved Contextual Information
  • Yan Zhang, M. Zavlanos
  • 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS), 2020
TLDR: A method is developed to obtain causal bounds on the transition and reward functions from the demonstrator’s data; these bounds are then used to derive value-function bounds, and new Q-learning and UCB-Q-learning algorithms that converge to the true value function without bias are proposed.
Causal Transfer for Imitation Learning and Decision Making under Sensor-shift
TLDR: This paper rigorously analyzes to what extent the relevant underlying mechanisms can be identified and transferred from the available observations together with prior knowledge of sensor characteristics, and introduces several proxy methods which are easier to calculate, estimate from finite data, and interpret than the exact solutions.
Causal Imitation Learning With Unobserved Confounders
TLDR: This paper provides a non-parametric, graphical criterion that is complete (both necessary and sufficient) for determining the feasibility of imitation from combinations of demonstration data and qualitative assumptions about the underlying environment, and develops an efficient procedure for learning the imitating policy from experts’ trajectories.
Successor Features for Transfer in Reinforcement Learning
TLDR: This work proposes a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same; it derives two theorems that set the approach on firm theoretical ground and presents experiments showing that it successfully promotes transfer in practice.
Meta-Inverse Reinforcement Learning with Probabilistic Context Variables
TLDR: This work proposes a deep latent variable model that is capable of learning rewards from unstructured, multi-task demonstration data, and critically, uses this experience to infer robust rewards for new, structurally similar tasks from a single demonstration.
Behavioral Cloning from Observation
TLDR: This work proposes a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO) that allows the agent to acquire experience in a self-supervised fashion, develop a model from that experience, and then use the model to learn a task by observing an expert perform it, without knowledge of the specific actions taken (a minimal toy sketch of this two-phase idea appears after the reference list below).
SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
TLDR: This work proposes SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum-entropy IRL, which can learn high-quality policies from few demonstrations and is the first efficient Meta-IRL method that scales to the function-approximator setting.
Contextual Markov Decision Processes
TLDR: This work suggests a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs.
Efficient Exploration via State Marginal Matching
TLDR: This work recasts exploration as a problem of State Marginal Matching (SMM) and demonstrates that agents that directly optimize the SMM objective explore faster and adapt more quickly to new tasks than agents trained with prior exploration methods.
Transfer of samples in batch reinforcement learning
TLDR: A novel algorithm is introduced that transfers samples from the source tasks most similar to the target task, and it is empirically shown that, following the proposed approach, the transfer of samples is effective in reducing the learning complexity.
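Because the BCO entry above is the one whose mechanism is described step by step, here is a minimal toy sketch of that two-phase idea, in Python using only NumPy. Everything in it (the linear dynamics, the least-squares inverse-dynamics model and policy, the hidden expert gain) is an illustrative assumption, not an implementation from any of the papers listed above.

import numpy as np

rng = np.random.default_rng(0)

# Toy linear dynamics s' = A s + B a (illustrative assumption).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

def step(s, a):
    return A @ s + B @ a

# Phase 1: self-supervised experience with random actions.
S, U, S_next = [], [], []
s = rng.normal(size=2)
for _ in range(2000):
    a = rng.uniform(-1.0, 1.0, size=1)
    s_next = step(s, a)
    S.append(s); U.append(a); S_next.append(s_next)
    s = s_next
S, U, S_next = np.array(S), np.array(U), np.array(S_next)

# Fit an inverse-dynamics model a ≈ [s, s'] @ W by least squares.
X = np.hstack([S, S_next])
W, *_ = np.linalg.lstsq(X, U, rcond=None)

# Phase 2: expert demonstrations contain states only; actions are never observed.
K_true = np.array([[-0.5, -1.0]])          # hidden expert feedback gain
demo = [np.array([1.0, 0.0])]
for _ in range(200):
    demo.append(step(demo[-1], K_true @ demo[-1]))
demo = np.array(demo)

# Infer the missing expert actions from consecutive state pairs, then behavioral-clone.
pairs = np.hstack([demo[:-1], demo[1:]])
a_hat = pairs @ W                          # inferred expert actions
K_bc, *_ = np.linalg.lstsq(demo[:-1], a_hat, rcond=None)

print("recovered gain:", K_bc.T)
print("true gain:     ", K_true)

On this toy system the cloned linear policy recovers the expert's gain almost exactly; the point is only to show the order of operations (self-supervised inverse-dynamics learning, action inference, then behavioral cloning), not a result from any of the papers above.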