Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

  @inproceedings{luck2019improved,
    title={Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient},
    author={Kevin Sebastian Luck and Mel Vecer{\'i}k and Simon Stepputtis and Heni Ben Amor and Jonathan Scholz},
    booktitle={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    year={2019}
  }
Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is deterministic. This work evaluates the use of model-based trajectory optimization methods for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived using a value function as critic, making use of a learned deep dynamics model to…
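The abstract above describes trajectory optimization over a learned latent dynamics model as an exploration mechanism for DDPG. The paper's exact architecture is not reproduced here; the following is a minimal sketch of the general idea using the cross-entropy method (CEM) over action sequences, where `latent_dynamics` and `reward_model` are toy stand-ins for the learned networks (both are assumptions for illustration, not the paper's models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for learned models (assumptions, not the paper's networks):
# a latent transition z' = f(z, a) and a learned reward estimate r(z, a).
def latent_dynamics(z, a):
    return 0.9 * z + 0.1 * a                     # toy linear latent transition

def reward_model(z, a):
    return -np.sum(z**2) - 0.01 * np.sum(a**2)   # toy quadratic reward

def cem_plan(z0, horizon=10, act_dim=2, pop=64, elites=8, iters=5):
    """Cross-entropy method over action sequences in latent space."""
    mu = np.zeros((horizon, act_dim))
    sigma = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current distribution.
        seqs = mu + sigma * rng.standard_normal((pop, horizon, act_dim))
        returns = np.empty(pop)
        for i, seq in enumerate(seqs):
            z, ret = z0, 0.0
            for a in seq:
                ret += reward_model(z, a)
                z = latent_dynamics(z, a)
            returns[i] = ret
        # Refit the sampling distribution to the elite sequences.
        elite = seqs[np.argsort(returns)[-elites:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]  # first action of the optimized sequence

action = cem_plan(z0=np.ones(2))
print(action.shape)  # (2,)
```

In an exploration setting, the planned action (or the whole optimized sequence) would be executed in place of, or mixed with, the deterministic actor's output, and the resulting transitions stored in the replay buffer.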
InsertionNet - A Scalable Solution for Insertion
By combining visual and force inputs, this method can scale to 16 different insertion tasks in less than 10 minutes and is robust to changes in the socket position, orientation or peg color, as well as to small differences in peg shape.
Online reinforcement learning for a continuous space system with experimental validation
  • Oguzhan Dogru, Nathan Wieczorek, Kirubakaran Velswamy, F. Ibrahim, Biao Huang
  • Computer Science
  • Journal of Process Control
  • 2021
Reinforcement learning (RL) for continuous state/action space systems has remained a challenge for nonlinear multivariate dynamical systems even at a simulation level. Implementing such…
Residual Learning from Demonstration: Adapting Dynamic Movement Primitives for Contact-rich Insertion Tasks
This work proposes a framework called residual learning from demonstration (rLfD) that combines dynamic movement primitives (DMPs), which rely on behavioural cloning, with a reinforcement learning (RL) based residual correction policy, and shows that rLfD outperforms alternatives and improves the generalisation abilities of DMPs.
Parareal with a learned coarse model for robotic manipulation
This work investigates the use of a deep neural network physics model as a coarse model for Parareal in the context of robotic manipulation and shows that the learned coarse model leads to faster Parareal convergence than a coarse physics-based model.


Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
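PETS, as summarized above, plans through an ensemble of probabilistic dynamics models by sampling trajectories. A minimal sketch of random-shooting MPC with ensemble trajectory sampling follows; the hand-made `ensemble` members and `reward` function are toy stand-ins for trained probabilistic networks, and the per-step member resampling loosely mirrors the paper's trajectory-sampling propagation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "probabilistic ensemble": each member is a stand-in for a trained
# network, here just a linear map with a member-specific bias.
ensemble = [
    (lambda z, a, k=k: 0.95 * z + 0.05 * a + 0.01 * k)
    for k in range(5)
]

def reward(z):
    return -np.sum(z**2)

def pets_action(z0, horizon=8, act_dim=2, candidates=100):
    """Random-shooting MPC with trajectory sampling across the ensemble."""
    best_ret, best_a0 = -np.inf, None
    for _ in range(candidates):
        actions = rng.uniform(-1, 1, size=(horizon, act_dim))
        # Resample the ensemble member at each step and add process noise,
        # so model uncertainty is propagated through the rollout.
        z, ret = z0, 0.0
        for a in actions:
            member = ensemble[rng.integers(len(ensemble))]
            z = member(z, a) + 0.01 * rng.standard_normal(z.shape)
            ret += reward(z)
        if ret > best_ret:
            best_ret, best_a0 = ret, actions[0]
    return best_a0

a = pets_action(np.ones(2))
print(a.shape)  # (2,)
```

As in MPC generally, only the first action of the best candidate sequence is executed before replanning from the next observed state.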
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
A general, model-free approach to reinforcement learning on real robots with sparse rewards, built upon the Deep Deterministic Policy Gradient algorithm to make use of demonstrations; it outperforms DDPG and does not require engineered rewards.
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
This paper considers the challenging Atari games domain, and proposes a new exploration method based on assigning exploration bonuses from a concurrently learned model of the system dynamics that provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
Universal Planning Networks
This work finds that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images.
Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped
This work explores whether policies learned in simulation can be transferred to hardware with the use of high-fidelity simulators and structured controllers, and proposes a way of using neural networks to improve expert-designed controllers while maintaining ease of understanding.
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and it is found that simple hash functions can achieve surprisingly good results on many challenging tasks.
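The hash-based counting described above can be sketched concretely. The snippet below implements one common variant, SimHash via random projections with sign binarization, giving a bonus of beta / sqrt(n); the class name and parameter values are illustrative, not taken from the paper:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)

class SimHashCounter:
    """Count-based exploration bonus via a SimHash of the state.

    States are projected with a fixed random matrix and binarized by sign;
    the visit count of the resulting hash key drives the bonus
    beta / sqrt(n(hash(s))), which decays with repeated visits.
    """
    def __init__(self, state_dim, n_bits=16, beta=0.1):
        self.A = rng.standard_normal((n_bits, state_dim))
        self.counts = defaultdict(int)
        self.beta = beta

    def bonus(self, state):
        key = tuple((self.A @ state > 0).astype(int))
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])

counter = SimHashCounter(state_dim=4)
s = np.ones(4)
print(round(counter.bonus(s), 3))  # first visit: beta / sqrt(1) = 0.1
print(counter.bonus(s) < 0.1)      # repeat visits shrink the bonus: True
```

The bonus is simply added to the environment reward during training, so novel (rarely hashed) states are transiently more rewarding.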
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
It is demonstrated that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks.