Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay

Tianhong Dai, Hengyang Liu, Kai Arulkumaran, Guangyu Ren and Anil Anthony Bharath
Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training, so naive uniform sampling may lead to inefficient learning. In this paper, we propose…
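As a rough illustration of the goal-relabelling mechanism the abstract describes (not the authors' implementation), the following sketch shows HER's "future" relabelling strategy: achieved states from later in the same trajectory are substituted as goals, turning failed episodes into successful ones. The transition fields and the sparse reward function here are illustrative assumptions.

```python
import random

def her_relabel(trajectory, reward_fn, k=4):
    """Hindsight relabelling: treat states the agent actually achieved as goals.

    trajectory: list of dicts with keys 'obs', 'achieved_goal', 'goal'.
    reward_fn(achieved, goal): sparse reward, e.g. 0 on success, -1 otherwise.
    Returns the original transitions plus k relabelled copies per step,
    with substitute goals drawn from later steps ('future' strategy).
    """
    relabelled = []
    for t, step in enumerate(trajectory):
        relabelled.append({**step, "reward": reward_fn(step["achieved_goal"], step["goal"])})
        future = trajectory[t:]  # states achieved from this step onwards
        for _ in range(k):
            new_goal = random.choice(future)["achieved_goal"]
            relabelled.append({**step,
                               "goal": new_goal,
                               "reward": reward_fn(step["achieved_goal"], new_goal)})
    return relabelled
```

Because the substitute goals were actually achieved, the relabelled transitions carry non-trivial reward signal even when the original goal was never reached.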

Energy-Based Hindsight Experience Prioritization
Proposes an energy-based framework for prioritizing hindsight experience in robotic manipulation tasks. Inspired by the work-energy principle in physics, it hypothesizes that replaying episodes with high trajectory energy is more effective for reinforcement learning in robotics.
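The prioritization idea summarized above can be sketched roughly as follows; the specific energy terms (a height-based potential term plus a finite-difference kinetic term) and the proportional episode sampling are simplified assumptions, not that paper's exact formulation.

```python
import numpy as np

def trajectory_energy(positions, dt=0.04, m=1.0, g=9.81):
    """Approximate an episode's 'trajectory energy' from object positions.

    positions: (T, 3) array of object positions over the episode.
    Sums the positive increments of total (potential + kinetic) energy,
    so episodes where the object barely moves score near zero.
    """
    positions = np.asarray(positions, dtype=float)
    potential = m * g * positions[:, 2]                 # height-based term
    velocity = np.diff(positions, axis=0) / dt          # finite differences
    kinetic = 0.5 * m * (velocity ** 2).sum(axis=1)
    total = potential[1:] + kinetic
    increments = np.diff(total, prepend=total[0])
    return float(np.clip(increments, 0, None).sum())

def sample_episode(energies, rng=np.random.default_rng()):
    """Sample an episode index with probability proportional to its energy."""
    p = np.asarray(energies, dtype=float)
    p = p / p.sum()
    return int(rng.choice(len(p), p=p))
```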
Curriculum-guided Hindsight Experience Replay
Proposes to adaptively select failed experiences for replay according to their proximity to the true goals and the curiosity of exploration over diverse pseudo-goals, adopting a human-like learning strategy that enforces more curiosity in earlier stages and shifts towards greater goal proximity later.
Hindsight Experience Replay
Presents a novel technique that allows sample-efficient learning from rewards that are sparse and binary, avoiding the need for complicated reward engineering; it may be seen as a form of implicit curriculum.
Competitive Experience Replay
Proposes a novel method, competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents, creating a competitive game designed to drive exploration.
End-to-End Training of Deep Visuomotor Policies
Develops a method for learning policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method with supervision provided by a simple trajectory-centric reinforcement learning method.
Deep Reinforcement Learning: A Brief Survey
Covers central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor-critic, and highlights the unique advantages of deep neural networks, focusing on visual understanding via RL.
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
Demonstrates that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks, and can learn deep neural network policies efficiently enough to train on real physical robots.
Hindsight policy gradients
Shows how hindsight can be introduced to likelihood-ratio policy gradient methods, generalizing this capacity to an entire class of highly successful algorithms.
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
Introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware, following a multi-goal reinforcement learning (RL) framework.
Prioritized Experience Replay
Presents a framework for prioritizing experience so as to replay important transitions more frequently and therefore learn more efficiently, applied to Deep Q-Networks, a reinforcement learning algorithm that achieved human-level performance across many Atari games.
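The core of proportional prioritized replay, as summarized above, is to sample transitions with probability proportional to a power of their TD error and to correct the resulting bias with importance-sampling weights. The sketch below uses a flat array as a simplified stand-in for that paper's sum-tree, and the class and parameter names are illustrative, not a published API.

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay (simplified: flat array, not a sum-tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        # Priority is a power of the absolute TD error; eps keeps it non-zero
        # so every transition retains some chance of being replayed.
        p = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size, beta=0.4, rng=np.random.default_rng()):
        p = np.asarray(self.priorities)
        p = p / p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        # Importance-sampling weights correct for the non-uniform sampling;
        # normalizing by the max keeps weights in (0, 1].
        w = (len(self.data) * p[idx]) ** (-beta)
        w = w / w.max()
        return idx, [self.data[i] for i in idx], w
```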