Publications
Natural Evolution Strategies
NES is presented, a novel algorithm for performing real-valued 'black box' function optimization: optimizing an unknown objective function where algorithm-selected function measurements constitute the only information accessible to the method.
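The black-box setting described above can be sketched with a minimal search-gradient loop in the spirit of NES. The toy objective and all constants below are illustrative assumptions, not the paper's algorithm, which additionally uses the natural gradient and adapts the full search distribution:

```python
import numpy as np

def f(x):
    # hypothetical black-box objective: the optimizer only ever sees f(x)
    return -np.sum((x - 3.0) ** 2)

rng = np.random.default_rng(0)
mu = np.zeros(2)   # mean of the Gaussian search distribution
sigma = 1.0        # fixed exploration noise (NES adapts this too)
lr = 0.1
for _ in range(300):
    eps = rng.standard_normal((20, 2))                  # sampled perturbations
    fitness = np.array([f(mu + sigma * e) for e in eps])
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)  # baseline/scale
    grad = eps.T @ fitness / len(eps)                   # search-gradient estimate
    mu += lr * sigma * grad                             # ascend the estimate
```

Only function evaluations at algorithm-chosen points drive the update, matching the setting in the abstract; `mu` drifts toward the maximizer near `[3, 3]`.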
Natural Actor-Critic
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient.
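The role of Amari's natural gradient can be illustrated with a deliberately analytic toy, not the Natural Actor-Critic itself (the one-dimensional Gaussian policy, reward, and step sizes are assumptions for illustration): for a policy N(mu, sigma^2) with reward r(a) = -(a - 3)^2, the vanilla gradient is grad_mu J = -2(mu - 3), the Fisher information of mu is 1/sigma^2, and the natural gradient F^{-1} grad rescales the step by the policy's own information geometry.

```python
def ascend(natural, sigma=2.0, lr=0.05, steps=100):
    # gradient ascent on J(mu) = E[-(a - 3)^2] with a ~ N(mu, sigma^2)
    mu = 0.0
    for _ in range(steps):
        g = -2.0 * (mu - 3.0)        # vanilla gradient of J w.r.t. mu
        g_nat = sigma**2 * g         # natural gradient: F^{-1} g with F = 1/sigma^2
        mu += lr * (g_nat if natural else g)
    return mu
```

Both variants reach mu near 3; the natural step scales automatically with the policy's uncertainty rather than with the raw parameterization.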
Reinforcement learning of motor skills with policy gradients
This paper examines learning of complex motor skills with human-like limbs, and combines the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning with the theory of stochastic policy gradient learning.
A Survey on Policy Search for Robotics
This work classifies model-free methods based on their policy evaluation strategy, policy update strategy, and exploration strategy and presents a unified view on existing algorithms.
Relative Entropy Policy Search
The Relative Entropy Policy Search (REPS) method is proposed; it differs significantly from previous policy gradient approaches, yields an exact update step, and works well on typical reinforcement learning benchmark problems.
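The exact update step in REPS is a closed-form exponential reweighting of samples. A much-simplified sketch follows, with a fixed temperature `eta` instead of solving the REPS dual, a mean-only refit, and toy returns (all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = np.zeros(2), 1.0      # Gaussian search distribution (mean refit only)
eta = 1.0                         # temperature; REPS sets this via a dual problem
for _ in range(30):
    x = mu + sigma * rng.standard_normal((100, 2))
    R = -np.sum((x - 2.0) ** 2, axis=1)     # hypothetical episodic returns
    w = np.exp((R - R.max()) / eta)         # closed-form exponential weights
    w /= w.sum()
    mu = w @ x                              # weighted maximum-likelihood refit
```

The exponential weights implement a soft, KL-bounded step toward high-return samples; `mu` moves toward the optimum near `[2, 2]`.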
Reinforcement learning in robotics: A survey
This article attempts to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots, highlighting both key challenges in robot reinforcement learning and notable successes.
Episodic Future Thinking Reduces Reward Delay Discounting through an Enhancement of Prefrontal-Mediotemporal Interactions
It is shown using functional magnetic resonance imaging (fMRI) and neural coupling analyses that episodic future thinking reduces the rate of delay discounting through a modulation of neural decision-making and episodic future thinking networks.
The neural mechanisms of inter-temporal decision-making: understanding variability
The neural mechanisms underlying delay discounting are discussed, and it is described how interindividual variability (trait effects) in the neural instantiation of subprocesses of delay discounting contributes to differences in behaviour.
Parameter-exploring policy gradients
This method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower variance gradient estimates than obtained by regular policy gradient methods, and shows that the improvement is largest when the parameter samples are drawn symmetrically.
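The symmetric sampling highlighted above can be sketched directly: perturbations are drawn in parameter space and each one is evaluated in an antithetic pair, so any baseline term cancels in the difference of returns. The toy return function and all constants are assumptions for illustration:

```python
import numpy as np

def episodic_return(theta):
    # hypothetical return of one rollout under parameters theta
    return -np.sum((theta - 1.0) ** 2)

rng = np.random.default_rng(1)
theta = np.zeros(3)
sigma, lr = 0.5, 0.2
for _ in range(200):
    eps = sigma * rng.standard_normal((10, 3))
    r_plus = np.array([episodic_return(theta + e) for e in eps])
    r_minus = np.array([episodic_return(theta - e) for e in eps])
    # symmetric samples: (r_plus - r_minus) cancels any baseline term,
    # lowering the variance of the gradient estimate
    grad = eps.T @ (r_plus - r_minus) / (2 * sigma**2 * len(eps))
    theta += lr * grad
```

Because the exploration happens in parameter space, each rollout uses a fixed parameter sample rather than per-step action noise, which is the source of the variance reduction the abstract describes.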
Probabilistic Movement Primitives
This work presents a probabilistic formulation of the movement primitive (MP) concept that maintains a distribution over trajectories, and analytically derives a stochastic feedback controller that reproduces the given trajectory distribution for robot movement control.
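The distribution over trajectories can be sketched by fitting a Gaussian over basis-function weights from demonstrations. The RBF features, synthetic sine demonstrations, and all constants below are illustrative assumptions; the paper additionally derives the feedback controller and conditioning operations:

```python
import numpy as np

T, K = 50, 8
t = np.linspace(0.0, 1.0, T)
centers = np.linspace(0.0, 1.0, K)
Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * 0.1**2))
Phi /= Phi.sum(axis=1, keepdims=True)            # normalized RBF features

rng = np.random.default_rng(2)
# synthetic demonstrations: noisy sine trajectories
demos = np.stack([np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(T)
                  for _ in range(20)])
W = np.linalg.lstsq(Phi, demos.T, rcond=None)[0].T   # per-demo weight vectors
mu_w, Sigma_w = W.mean(axis=0), np.cov(W.T)          # Gaussian over weights
# the weight distribution induces a distribution over whole trajectories
mean_traj = Phi @ mu_w
var_traj = np.einsum('tk,kl,tl->t', Phi, Sigma_w, Phi)
std_traj = np.sqrt(np.clip(var_traj, 0.0, None))
```

`mean_traj` and `std_traj` summarize the trajectory distribution at every time step, which is what makes operations such as conditioning on via-points possible in the full formulation.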