Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

@article{Luck2019ImprovedET,
  title={Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient},
  author={Kevin Sebastian Luck and Mel Vecer{\'i}k and Simon Stepputtis and Heni Ben Amor and Jonathan Scholz},
  journal={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2019},
  pages={3704-3711}
}
Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is deterministic. This work evaluates the use of model-based trajectory optimization methods for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived that uses a value function as critic, making use of a learned deep dynamics model to…
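
The core mechanism can be illustrated with a minimal sketch. This is not the authors' released code: a cross-entropy-method (CEM) optimizer stands in for the paper's trajectory optimizer, and `encode`, `latent_dynamics`, and `value_fn` are hypothetical placeholders for the learned image encoder, latent dynamics model, and critic. Candidate action sequences are rolled through the latent dynamics model, scored by the critic at the final latent state, and the first action of the best plan is executed as an exploratory step.

```python
# Minimal sketch (not the authors' code): CEM-style trajectory optimization in a
# learned latent space, used to pick exploratory actions for a DDPG agent.
# `encode`, `latent_dynamics`, and `value_fn` are hypothetical placeholders.
import numpy as np

def cem_explore(obs, encode, latent_dynamics, value_fn,
                horizon=10, act_dim=4, pop=64, elites=8, iters=4):
    """Return the first action of the best latent-space plan found by CEM."""
    z0 = encode(obs)                          # latent image embedding
    mu = np.zeros((horizon, act_dim))
    sigma = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current distribution.
        plans = mu + sigma * np.random.randn(pop, horizon, act_dim)
        returns = np.zeros(pop)
        for i, plan in enumerate(plans):
            z = z0
            for a in plan:                    # roll out the learned dynamics model
                z = latent_dynamics(z, a)
            returns[i] = value_fn(z)          # critic as terminal value estimate
        elite = plans[np.argsort(returns)[-elites:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]                              # execute only the first planned action
```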

Citations

Parareal with a learned coarse model for robotic manipulation

TLDR
This work investigates the use of a deep neural network physics model as a coarse model for Parareal in the context of robotic manipulation and shows that the learned coarse model leads to faster Parareal convergence than a coarse physics-based model.
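
For orientation, a hedged sketch of one Parareal predictor-corrector sweep follows; `G` (here the learned coarse model) and `F` (an accurate fine solver) are hypothetical callables that advance a state across one time slice.

```python
# Hedged sketch of one Parareal correction sweep, with a learned network as the
# coarse propagator G and an accurate solver as the fine propagator F.
def parareal_sweep(u, G, F):
    """u: list of states at slice boundaries from the previous iteration."""
    fine = [F(ui) for ui in u[:-1]]          # fine solves; parallel in practice
    new = [u[0]]
    for n in range(len(u) - 1):
        # Predictor-corrector: coarse prediction plus fine/coarse correction.
        new.append(G(new[n]) + fine[n] - G(u[n]))
    return new
```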

Residual Learning from Demonstration: Adapting Dynamic Movement Primitives for Contact-rich Insertion Tasks

TLDR
This work proposes a framework called residual learning from demonstration (rLfD) that combines dynamic movement primitives (DMPs), which rely on behavioural cloning, with a reinforcement learning (RL) based residual correction policy, and shows that rLfD outperforms alternatives and improves the generalisation abilities of DMPs.
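
The residual-correction idea reduces to a one-line composition; in the sketch below, `dmp_action` and `residual_policy` are hypothetical callables for the behaviour-cloned DMP rollout and the learned RL correction.

```python
# Minimal sketch of the residual-correction idea: the executed action is the
# DMP's behaviour-cloned action plus a learned RL correction term.
def rlfd_action(state, dmp_action, residual_policy):
    return dmp_action(state) + residual_policy(state)
```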

InsertionNet - A Scalable Solution for Insertion

TLDR
By combining visual and force inputs, this method scales to 16 different insertion tasks in less than 10 minutes and is robust to changes in the socket position, orientation, or peg color, as well as to small differences in peg shape.

Simulation Design of a Live Working Manipulator for Patrol Inspection in Power Grid

The distribution network is the electric power infrastructure that directly serves end users; it has wide coverage and a complex topology, and its operational safety is directly…

References

Showing 1-10 of 27 references

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

TLDR
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation; PETS matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks while requiring significantly fewer samples.
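
A hedged sketch of the trajectory-sampling (TS) idea: particles are propagated through randomly re-drawn ensemble members so that model uncertainty spreads the predicted returns. `ensemble` is a hypothetical list of callables mapping a state-action pair to a predictive mean and standard deviation.

```python
# Hedged sketch of trajectory sampling through a probabilistic ensemble.
import numpy as np

def ts_rollout(ensemble, s0, actions, reward_fn, particles=20, rng=None):
    rng = rng or np.random.default_rng()
    states = np.repeat(s0[None, :], particles, axis=0)
    total = np.zeros(particles)
    for a in actions:
        for p in range(particles):
            model = ensemble[rng.integers(len(ensemble))]  # resample a member
            mean, std = model(states[p], a)
            states[p] = mean + std * rng.standard_normal(mean.shape)
            total[p] += reward_fn(states[p], a)
    return total.mean()  # expected return under model uncertainty
```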

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

TLDR
A general, model-free approach to reinforcement learning on real robots with sparse rewards, built upon the Deep Deterministic Policy Gradient algorithm to exploit demonstrations; it outperforms DDPG and does not require engineered rewards.
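
The core idea, keeping demonstration transitions permanently in the replay buffer alongside agent experience, can be sketched as follows. This is a simplified uniform-sampling illustration, not the paper's exact prioritized scheme.

```python
# Hedged sketch of demonstration-seeded replay: demo transitions are never
# evicted and are mixed with agent experience when sampling.
import random

class DemoReplay:
    def __init__(self, demos, capacity=100_000):
        self.demos = list(demos)            # kept permanently
        self.agent, self.capacity = [], capacity

    def add(self, transition):
        self.agent.append(transition)
        if len(self.agent) > self.capacity:
            self.agent.pop(0)               # evict oldest agent transition

    def sample(self, batch=64):
        # Assumes batch <= total stored transitions.
        return random.sample(self.demos + self.agent, batch)
```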

Continuous control with deep reinforcement learning

TLDR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
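
DDPG's deterministic actor is typically paired with additive noise for exploration; the sketch below implements the Ornstein-Uhlenbeck noise process used in the original paper (parameter values are illustrative).

```python
# Hedged sketch of additive Ornstein-Uhlenbeck exploration noise for a
# deterministic actor: temporally correlated noise added to each action.
import numpy as np

class OUNoise:
    def __init__(self, act_dim, theta=0.15, sigma=0.2, dt=1e-2):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = np.zeros(act_dim)

    def sample(self):
        # dx = theta * (0 - x) * dt + sigma * sqrt(dt) * N(0, 1)
        self.x += self.theta * (-self.x) * self.dt \
                  + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        return self.x
```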

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

TLDR
This paper considers the challenging Atari games domain and proposes a new exploration method that assigns exploration bonuses derived from a concurrently learned model of the system dynamics; the method provides the most consistent improvement across a range of games that pose a major challenge for prior methods.
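
The bonus construction can be summarized in a few lines: states where a concurrently learned dynamics model predicts poorly earn extra reward. This is a simplified sketch; `dyn_model` is a hypothetical learned predictor, and the paper's actual bonus is computed in a learned feature space rather than raw state space.

```python
# Hedged sketch of a prediction-error exploration bonus.
import numpy as np

def bonus_reward(r, s, a, s_next, dyn_model, beta=0.1):
    err = np.sum((dyn_model(s, a) - s_next) ** 2)  # model surprise
    return r + beta * err                          # augmented reward
```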

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
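
The clipped surrogate at the center of PPO is compact enough to state directly; a minimal NumPy sketch, assuming per-step log-probabilities and advantage estimates are already computed:

```python
# Hedged sketch of PPO's clipped surrogate loss.
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    ratio = np.exp(logp_new - logp_old)            # pi_new / pi_old
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    # Maximize the surrogate, i.e. minimize its negation.
    return -np.mean(np.minimum(ratio * adv, clipped * adv))
```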

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

TLDR
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, which achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
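
A sketch of the soft Bellman target that gives the algorithm its "soft" character, assuming hypothetical twin critics `q1`/`q2` and a `sample_action` function returning an action and its log-probability under the current policy:

```python
# Hedged sketch of the entropy-regularized critic target used in SAC-style
# updates: the twin-critic minimum minus alpha * log pi.
import numpy as np

def soft_target(r, s_next, done, q1, q2, sample_action, gamma=0.99, alpha=0.2):
    a_next, logp = sample_action(s_next)           # a ~ pi(.|s'), with log-prob
    q_min = np.minimum(q1(s_next, a_next), q2(s_next, a_next))
    return r + gamma * (1.0 - done) * (q_min - alpha * logp)
```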

Policy search for motor primitives in robotics

TLDR
A novel EM-inspired algorithm for policy learning, particularly well-suited for dynamical system motor primitives, is introduced and applied in the context of motor learning; it can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.

Universal Planning Networks

TLDR
This work finds that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images.

Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped

TLDR
This work explores whether policies learned in simulation can be transferred to hardware with the use of high-fidelity simulators and structured controllers, and proposes a way of using neural networks to improve expert-designed controllers while maintaining ease of understanding.

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

TLDR
A simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks, and it is found that simple hash functions can achieve surprisingly good results on many challenging tasks.
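
The counting mechanism is easy to sketch: a SimHash-style random projection maps continuous observations to discrete codes whose visit counts drive a decaying bonus. Dimensions and the bonus coefficient below are illustrative.

```python
# Hedged sketch of a SimHash-based count bonus: hash each state to a k-bit
# code, count visits, and pay a bonus proportional to 1/sqrt(count).
from collections import defaultdict
import numpy as np

class HashBonus:
    def __init__(self, obs_dim, k=32, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, obs_dim))   # random projection
        self.counts = defaultdict(int)
        self.beta = beta

    def __call__(self, obs):
        code = tuple((self.A @ obs > 0).astype(int))  # k-bit binary hash
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])
```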