• Corpus ID: 239050294

Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information

Jin Li, Xianyuan Zhan, Zixu Xiao, Guyue Zhou
End-to-end learning of robotic manipulation with high data efficiency is one of the key challenges in robotics. Recent methods that utilize human demonstration data and unsupervised representation learning have proven to be a promising direction for improving RL learning efficiency. The use of demonstration data also allows "warming up" the RL policies using offline data, either with imitation learning or with the recently emerged offline reinforcement learning algorithms. However, existing works often treat…
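The "warming up" step mentioned in the abstract — pretraining a policy on demonstration data with imitation learning before online RL — can be sketched as plain behavior cloning. A minimal sketch with a linear policy and synthetic demonstrations; all shapes, the expert model, and the learning rate here are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstration set: states S and expert actions A from a noisy
# linear expert (purely illustrative stand-in for human demo data).
S = rng.normal(size=(256, 10))            # 256 transitions, 10-dim states
w_true = rng.normal(size=(10, 4))         # hidden "expert" mapping
A = S @ w_true + 0.01 * rng.normal(size=(256, 4))  # 4-dim actions

# Behavior-cloning warm-up: fit a linear policy W by gradient descent
# on the mean squared error between policy actions and demo actions.
W = np.zeros((10, 4))
lr = 0.1
for _ in range(500):
    grad = S.T @ (S @ W - A) / len(S)     # gradient of the MSE loss
    W -= lr * grad

mse = float(np.mean((S @ W - A) ** 2))    # small after warm-up
```

The warmed-up policy would then be refined with online (or offline-to-online) RL rather than used as-is.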

References


A Framework for Efficient Robotic Manipulation
It is shown that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer in just 15-50 minutes of real-world training time.
Overcoming Exploration in Reinforcement Learning with Demonstrations
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
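The approach summarized above keeps demonstration transitions in a separate buffer alongside agent experience and mixes both into each training batch. A minimal sketch of that mixed sampling; the buffer contents, batch size, and demo fraction are illustrative, not the paper's actual hyperparameters:

```python
import random

# Demonstration transitions are stored separately and never overwritten.
demo_buffer = [("demo", i) for i in range(100)]
agent_buffer = []  # filled with the agent's own experience during training

def sample_batch(batch_size=8, demo_fraction=0.25):
    """Draw a fixed fraction of each batch from the demonstration buffer."""
    n_demo = int(batch_size * demo_fraction)
    batch = random.sample(demo_buffer, n_demo)
    if agent_buffer:
        # choices (with replacement) so it works even with few agent samples
        batch += random.choices(agent_buffer, k=batch_size - n_demo)
    return batch

agent_buffer.extend(("agent", i) for i in range(50))
batch = sample_batch()  # 2 demo transitions + 6 agent transitions
```

Keeping the demo buffer fixed ensures expert data keeps guiding updates even after the agent's own experience dominates.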
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
A general and model-free approach for reinforcement learning on real robots with sparse rewards, built upon the Deep Deterministic Policy Gradient (DDPG) algorithm to make use of demonstrations; it outperforms plain DDPG and does not require engineered rewards.
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
QT-Opt is introduced, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects.
Asymmetric Actor Critic for Image-Based Robot Learning
This work exploits the full state observability in the simulator to train better policies which take as input only partial observations (RGBD images) and combines this method with domain randomization and shows real robot experiments for several tasks like picking, pushing, and moving a block.
Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning
It is shown that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions, and developed a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), which admits the use of data generated by mixed behavior policies.
End-to-End Training of Deep Visuomotor Policies
This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
Learning Dense Rewards for Contact-Rich Manipulation Tasks
This work provides an approach capable of extracting dense reward functions algorithmically from robots' high-dimensional observations, such as images and tactile feedback; it does not rely on adversarial training and is thus less prone to the associated training instabilities.
Behavior Regularized Offline Reinforcement Learning
A general framework, behavior regularized actor critic (BRAC), is introduced to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks.
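The central idea behind behavior regularization — penalizing the learned policy's divergence from the behavior policy that generated the dataset — can be illustrated on a toy one-state problem. For a KL penalty, maximizing E_pi[Q] - alpha * KL(pi || behavior) has a known closed form: pi(a) proportional to behavior(a) * exp(Q(a) / alpha). The Q-values and behavior distribution below are made-up numbers for illustration:

```python
import numpy as np

# Toy one-state problem with 3 actions.
Q = np.array([1.0, 2.0, 0.5])          # Q-values estimated from offline data
behavior = np.array([0.7, 0.1, 0.2])   # behavior policy observed in the data

def regularized_policy(alpha):
    # Closed-form maximizer of E_pi[Q] - alpha * KL(pi || behavior):
    # a behavior-weighted softmax tilted by Q / alpha.
    logits = np.log(behavior) + Q / alpha
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    return p / p.sum()

greedy = regularized_policy(alpha=0.01)        # weak penalty: near-greedy on Q
conservative = regularized_policy(alpha=10.0)  # strong penalty: near behavior
```

Small alpha recovers the (possibly overestimated) greedy policy; large alpha stays close to the data-generating policy, which is the conservative regime offline RL methods rely on.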
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations
This work shows that model-free DRL with natural policy gradients can effectively scale up to complex manipulation tasks with a high-dimensional 24-DoF hand, and solve them from scratch in simulated experiments.