Corpus ID: 3521071

Diversity is All You Need: Learning Skills without a Reward Function

@article{Eysenbach2019DiversityIA,
  title={Diversity is All You Need: Learning Skills without a Reward Function},
  author={Benjamin Eysenbach and Abhishek Gupta and Julian Ibarz and Sergey Levine},
  journal={ArXiv},
  year={2019},
  volume={abs/1802.06070}
}
Intelligent creatures can explore their environments and learn useful skills without supervision. [...] Key Method: Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark…
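For context, the information-theoretic objective mentioned above is, in the standard DIAYN formulation, a mutual-information-plus-entropy quantity optimized through a discriminator-based pseudo-reward. The sketch below paraphrases that formulation (skills z drawn from a prior p(z), a learned discriminator q_φ(z|s), and a maximum entropy policy π_θ(a|s,z)); it should be checked against the paper itself.

```latex
% DIAYN objective (paraphrase of the standard formulation):
\mathcal{F}(\theta) \triangleq I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S)
                    = \mathcal{H}[Z] - \mathcal{H}[Z \mid S] + \mathcal{H}[A \mid S, Z]
% In practice this is maximized by training a maximum entropy policy on the
% discriminator-based pseudo-reward
r_z(s, a) = \log q_\phi(z \mid s) - \log p(z)
```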
Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning
TLDR
This paper shows that a recently proposed unsupervised skill discovery algorithm can be extended into an efficient off-policy method, making it suitable for performing unsupervised reinforcement learning in the real world, and provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
Discovering Generalizable Skills via Automated Generation of Diverse Tasks
TLDR
The proposed Skill Learning In Diversified Environments (SLIDE) method discovers generalizable skills via automated generation of a diverse set of tasks; the results suggest that the learned skills can effectively improve the robot’s performance in various unseen target tasks compared to existing reinforcement learning and skill learning methods.
Learning Novel Policies For Tasks
TLDR
This work presents a two-objective update technique for policy gradient algorithms in which each update of the policy is a compromise between improving the task reward and improving the novelty reward.
Self-Supervised Exploration via Disagreement
TLDR
This paper proposes a formulation for exploration inspired by work in the active learning literature: it trains an ensemble of dynamics models and incentivizes the agent to explore such that the disagreement of those ensembles is maximized, which results in sample-efficient exploration.
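To make the disagreement idea concrete, here is a minimal illustrative sketch of an ensemble-disagreement intrinsic reward. The names (`ensemble`, `model.predict`, `disagreement_reward`) are hypothetical and not taken from the cited paper; the sketch only shows the general pattern of rewarding variance across dynamics-model predictions.

```python
import numpy as np

def disagreement_reward(ensemble, state, action):
    """Illustrative intrinsic reward: variance of next-state predictions
    across an ensemble of learned dynamics models (names are hypothetical)."""
    # Each model predicts the next state for the same (state, action) pair.
    predictions = np.stack([model.predict(state, action) for model in ensemble])
    # Disagreement = mean per-dimension variance across the ensemble.
    return predictions.var(axis=0).mean()
```

The agent would then be trained to maximize this bonus, either alone or added to the task reward, so that it seeks out transitions where the ensemble's predictions diverge.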
Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching
TLDR
A novel algorithm is proposed to maximize coverage while enforcing a constraint on the directedness of each skill, using a decoupled policy structure whose first part is trained to be directed and whose second, diffusing part ensures local coverage.
Intrinsic Mutual Information Rewards
  • 2019
Learning to discover useful skills without a manually-designed reward function would have many applications, yet is still a challenge for reinforcement learning. In this paper, we propose Mutual
The Information Geometry of Unsupervised Reinforcement Learning
TLDR
This work shows that unsupervised skill discovery algorithms based on mutual information maximization do not learn skills that are optimal for every possible reward function, but that the distribution over skills provides an optimal initialization minimizing regret against adversarially chosen reward functions, assuming a certain type of adaptation procedure.
Learning Embodied Agents with Scalably-Supervised Reinforcement Learning
  • Lisa Lee
  • 2021
Reinforcement learning (RL) agents learn to perform a task through trial-and-error interactions with an initially unknown environment. Despite the recent progress in deep RL, it remains a challenge
Novelty-Guided Reinforcement Learning via Encoded Behaviors
TLDR
A function approximation paradigm that learns sparse representations of agent behaviors using auto-encoders, which are later used to assign novelty scores to policies; the results suggest that this form of novelty-guided exploration is a viable alternative to classic novelty search methods.
Acquiring Diverse Robot Skills via Maximum Entropy Deep Reinforcement Learning
TLDR
This thesis studies how the maximum entropy framework can provide efficient deep reinforcement learning algorithms that solve tasks consistently and sample-efficiently, and devises new algorithms based on this framework, ranging from soft Q-learning, which learns expressive energy-based policies, to soft actor-critic, which provides the simplicity and convenience of actor-critic methods.
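As a reminder of what the maximum entropy framework optimizes, the standard objective (not quoted from the thesis) augments expected return with a policy-entropy term weighted by a temperature α:

```latex
% Maximum entropy RL objective (standard formulation):
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big]
% where \alpha trades off reward against policy entropy; soft Q-learning and
% soft actor-critic can be viewed as algorithms for optimizing this objective.
```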

References

Showing 1-10 of 52 references
Stochastic Neural Networks for Hierarchical Reinforcement Learning
TLDR
This work proposes a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks, and uses Stochastic Neural Networks combined with an information-theoretic regularizer to efficiently pre-train a large span of skills.
Deep Reinforcement Learning from Human Preferences
TLDR
This work explores goals defined in terms of (non-expert) human preferences between pairs of trajectory segments in order to effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion.
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
TLDR
It is demonstrated that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.
Meta Learning Shared Hierarchies
TLDR
A meta-learning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives (policies that are executed for large numbers of timesteps), and providing a concrete metric for measuring the strength of such hierarchies.
Learning to Navigate in Complex Environments
TLDR
This work considers jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks and shows that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs.
Active learning of inverse models with intrinsically motivated goal exploration in robots
We introduce the Self-Adaptive Goal Generation Robust Intelligent Adaptive Curiosity (SAGG-RIAC) architecture as an intrinsically motivated goal exploration mechanism which allows active learning of
Benchmarking Deep Reinforcement Learning for Continuous Control
TLDR
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
TLDR
This work proposes a novelty detection algorithm for exploration that is based entirely on discriminatively trained exemplar models, where classifiers are trained to discriminate each visited state against all others, and shows that this kind of discriminative modeling corresponds to implicit density estimation.
VIME: Variational Information Maximizing Exploration
TLDR
VIME is introduced, an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics which efficiently handles continuous state and action spaces and can be applied with several different underlying RL algorithms.
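For reference, the information-gain bonus described in this entry typically takes the following shape (a paraphrase of the usual VIME formulation, with η the bonus weight and q(θ; φ) a variational posterior over dynamics-model parameters; details should be checked against the paper):

```latex
% VIME-style exploration bonus (paraphrase of the usual formulation):
r'(s_t, a_t, s_{t+1}) = r(s_t, a_t)
  + \eta \, D_{\mathrm{KL}}\!\big[ q(\theta; \phi_{t+1}) \,\|\, q(\theta; \phi_t) \big]
% where the KL term between the updated and previous variational posteriors
% approximates the information gained about the dynamics from the new transition.
```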