Corpus ID: 226289950

Continual Learning of Control Primitives: Skill Discovery via Reset-Games

@article{Xu2020ContinualLO,
  title={Continual Learning of Control Primitives: Skill Discovery via Reset-Games},
  author={Kelvin Xu and Siddharth Verma and Chelsea Finn and Sergey Levine},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.05286}
}
Reinforcement learning has the potential to automate the acquisition of behavior in complex settings, but in order for it to be successfully deployed, a number of practical challenges must be addressed. First, in real-world settings, when an agent attempts a task and fails, the environment must somehow "reset" so that the agent can attempt the task again. While easy in simulation, this could require considerable human effort in the real world, especially if the number of trials is very large…
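
To make the reset problem concrete, here is a minimal sketch of a reset-free training loop in which a learned reset policy takes the place of manual environment resets. This illustrates only the general forward/reset structure the abstract alludes to, not the paper's actual algorithm; all names (env, forward_policy, reset_policy, task_reward, reset_reward) are hypothetical.

```python
# Illustrative sketch of reset-free RL: a forward policy attempts the
# task, then a learned reset policy returns the agent toward initial
# states, replacing the manual resets a human would otherwise perform.
# The environment and policy APIs here are hypothetical placeholders.

class ResetFreeTrainer:
    def __init__(self, env, forward_policy, reset_policy, max_steps=200):
        self.env = env
        self.forward = forward_policy
        self.reset = reset_policy
        self.max_steps = max_steps

    def run_phase(self, policy, reward_fn):
        """Roll out one policy, store its transitions, and update it."""
        obs = self.env.observe()
        for _ in range(self.max_steps):
            action = policy.act(obs)
            next_obs = self.env.step(action)
            policy.store(obs, action, reward_fn(next_obs), next_obs)
            obs = next_obs
        policy.update()

    def train(self, num_trials):
        for _ in range(num_trials):
            # Forward phase: attempt the task from wherever the agent is.
            self.run_phase(self.forward, reward_fn=self.env.task_reward)
            # Reset phase: instead of a human intervention, the reset
            # policy is rewarded for reaching initial-state regions.
            self.run_phase(self.reset, reward_fn=self.env.reset_reward)
```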

Citations

Persistent Reinforcement Learning via Subgoal Curricula
TLDR: Value-accelerated Persistent Reinforcement Learning is proposed, which generates a curriculum of initial states such that the agent can bootstrap on the success of easier tasks to efficiently learn harder ones, reducing reliance on human intervention during learning.
Explore and Control with Adversarial Surprise
TLDR: It is shown that Adversarial Surprise learns more complex behaviors and explores more effectively than competitive baselines, outperforming intrinsic motivation methods based on active inference, novelty-seeking, and multi-agent unsupervised RL in MiniGrid, Atari, and VizDoom environments.
Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching
TLDR: A novel algorithm is proposed that maximizes coverage while enforcing a constraint on the directedness of each skill, using a decoupled policy structure: a first part trained to be directed, and a second, diffusing part that ensures local coverage.
Automatic Curricula via Expert Demonstrations
We propose Automatic Curricula via Expert Demonstrations (ACED), a reinforcement learning (RL) approach that combines the ideas of imitation learning and curriculum learning in order to solve…
Long-Term Exploration in Persistent MDPs
TLDR: This paper proposes an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process, in which the agent can roll back to previously visited states during training.

References

Showing 1-10 of 55 references
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
TLDR: This work proposes an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt.
Learning compound multi-step controllers under unknown dynamics
TLDR: It is demonstrated that a recently developed method that optimizes linear-Gaussian controllers under learned local linear models can tackle this sort of non-stationary problem, and that training controllers concurrently with a corresponding reset controller only minimally increases training time.
Diversity is All You Need: Learning Skills without a Reward Function
TLDR: The proposed DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function, learns skills by maximizing an information-theoretic objective using a maximum entropy policy.
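
For reference, DIAYN's information-theoretic objective can be written in its standard form, where s, a, and z denote states, actions, and latent skills (a sketch of the published objective, not an excerpt from this page):

```latex
% DIAYN objective: make skills distinguishable by the states they
% visit while keeping the skill-conditioned policy maximally random.
\begin{aligned}
\mathcal{F}(\theta) &= I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S) \\
                    &= \mathcal{H}[Z] - \mathcal{H}[Z \mid S] + \mathcal{H}[A \mid S, Z]
\end{aligned}
```

In practice the intractable posterior over skills is approximated with a learned discriminator q_phi(z|s), giving the per-step pseudo-reward log q_phi(z|s) - log p(z).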
On the sample complexity of reinforcement learning.
TLDR: Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.
The Ingredients of Real-World Robotic Reinforcement Learning
TLDR: This work discusses the required elements of a robotic system that can continually and autonomously improve with data collected in the real world, proposes a particular instantiation of such a system, and demonstrates its efficacy on dexterous robotic manipulation tasks in simulation and the real world.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
TLDR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
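
As a pointer to the formalism (the standard definition from the options framework), each option is a triple consisting of an initiation set, an intra-option policy, and a termination condition:

```latex
% An option o may be invoked in states of its initiation set I,
% acts according to pi, and terminates at each state with
% probability beta(s).
o = \langle \mathcal{I}, \pi, \beta \rangle, \qquad
\mathcal{I} \subseteq \mathcal{S}, \quad
\pi : \mathcal{S} \times \mathcal{A} \to [0,1], \quad
\beta : \mathcal{S} \to [0,1]
```

An MDP whose action set is augmented with options becomes a semi-MDP, which is why planning and Q-learning carry over with options treated like primitive actions.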
Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition
TLDR: Variational inverse control with events (VICE) is proposed, which generalizes inverse reinforcement learning methods to cases where full demonstrations are not needed, such as when only samples of desired goal states are available.
Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
TLDR: This paper proposes a formal exploration objective for goal-reaching policies that maximizes state coverage and presents an algorithm called Skew-Fit, which enables a real-world robot to learn to open a door, entirely from scratch, from pixels, and without any manually-designed reward function.
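
As a sketch of the mechanism (my paraphrase of the Skew-Fit idea, not an excerpt from this page): a generative model q_theta is fit to visited states, and goals are sampled from a skewed version of it that up-weights rarely visited states, pushing goal coverage toward uniform over the reachable state space:

```latex
% Goals are drawn from a skewed distribution over visited states;
% the negative exponent alpha up-weights rare states.
p_{\text{skewed}}(s) \propto q_\theta(s)^{\alpha}, \qquad \alpha \in [-1, 0)
```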
Temporal abstraction in reinforcement learning
TLDR: A general framework for prediction, control, and learning at multiple temporal scales is developed, along with the way multi-time models can be used to produce plans of behavior very quickly using classical dynamic programming or reinforcement learning techniques.
Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play
TLDR: This work describes a simple scheme that allows an agent to learn about its environment in an unsupervised manner, and focuses on two kinds of environments: (nearly) reversible environments and environments that can be reset.