Humans and animals prefer immediate over delayed rewards (delay discounting). This preference for smaller-but-sooner over larger-but-later rewards shows substantial interindividual variability in healthy subjects. Moreover, a strong bias towards immediate reinforcement characterizes many psychiatric conditions such as addiction and attention-deficit…
Humans discount the value of future rewards over time. Here we show, using functional magnetic resonance imaging (fMRI) and neural coupling analyses, that episodic future thinking reduces the rate of delay discounting through a modulation of neural decision-making and episodic future thinking networks. In addition to a standard control condition, real…
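The delay discounting described in the two abstracts above is commonly modeled with a hyperbolic function, V = A / (1 + kD), where k is an individual's discount rate. A minimal sketch of that standard model (the function name and parameter values here are illustrative, not the specific fit reported in these studies):

```python
def discounted_value(amount, delay, k):
    """Subjective value of `amount` received after `delay`, under the
    standard hyperbolic discounting model V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

# For k = 0.05 per day, $50 in 30 days is subjectively worth $20 --
# the indifference point with $20 received immediately.
sooner = discounted_value(20.0, 0.0, k=0.05)   # 20.0
later = discounted_value(50.0, 30.0, k=0.05)   # 20.0
```

A larger k (steeper discounting) would tip this choice toward the smaller-but-sooner reward, which is how interindividual variability in discount rate is quantified.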
Decision neuroscience suggests that there exists a core network for the subjective valuation of rewards from a range of different domains, encompassing the ventral striatum and regions of the orbitofrontal cortex (OFC), in particular the ventromedial aspect of the OFC. Here we first review ways to measure subjective value experimentally in a cognitive…
OBJECTIVE Adolescents are particularly vulnerable to addiction, and in the case of smoking, this often leads to long-lasting nicotine dependence. The authors investigated a possible neural mechanism underlying this vulnerability. METHOD Functional MRI was performed during reward anticipation in 43 adolescent smokers and 43 subjects matched on age, gender,…
During decision making, valuation of different types of rewards may involve partially distinct neural systems, but efficient choice behavior requires a common neural coding of stimulus value. We addressed this issue by measuring neural activity with functional magnetic resonance imaging while volunteers processed delayed and probabilistic decision options.…
In this paper, we evaluate different versions of the three main kinds of model-free policy gradient methods: finite difference gradients, 'vanilla' policy gradients, and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole…
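The first of the three method families named above, finite difference gradients, estimates the policy gradient by perturbing each policy parameter and measuring the change in expected return. A minimal sketch of that estimator (the function names and the toy return surface are illustrative, not the paper's benchmark setup):

```python
import numpy as np

def finite_difference_gradient(J, theta, eps=1e-2):
    """Estimate the gradient of expected return J at parameters `theta`
    by central finite differences, one coordinate at a time.
    J: callable mapping a parameter vector to a scalar return estimate."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = eps
        grad[i] = (J(theta + d) - J(theta - d)) / (2.0 * eps)
    return grad

# Sanity check on a quadratic toy "return" J(theta) = -||theta||^2,
# whose true gradient is -2 * theta.
theta = np.array([1.0, -2.0])
g = finite_difference_gradient(lambda t: -np.sum(t**2), theta)
# g is approximately [-2.0, 4.0]
```

In a real policy search setting, J would be estimated from rollouts and is therefore noisy, which is exactly why the refined 'vanilla' and natural policy gradient estimators the abstract mentions tend to perform better.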
Acknowledgments First of all, I have to thank the wonderful people at the University of Southern California. This thesis proposal would never have been possible without the initiation, continuing encouragement, coordination, supervision, and understanding help of Stefan Schaal. He is a great 'sensei' and has endured my emotional roller-coaster ride from my…
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterized in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than…
In both human and humanoid movement science, the topic of movement primitives has become central in understanding the generation of complex motion with high-degree-of-freedom bodies. A theory of control, planning, learning, and imitation with movement primitives seems to be crucial in order to reduce the search space during motor learning and achieve a…
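A common concrete form of the movement primitives discussed above is the dynamic movement primitive, whose core is a critically damped point attractor that pulls the state toward a goal. A sketch of that attractor dynamics alone, without the learned forcing term; parameter names and values are illustrative, following the standard formulation rather than any specific implementation:

```python
import numpy as np

def point_attractor_rollout(y0, goal, dt=0.01, steps=500, alpha=25.0):
    """Integrate the transformation system at the core of a dynamic
    movement primitive: ydd = alpha * (beta * (goal - y) - yd),
    with beta = alpha / 4 for critical damping (no overshoot)."""
    beta = alpha / 4.0
    y, yd = float(y0), 0.0
    traj = []
    for _ in range(steps):
        ydd = alpha * (beta * (goal - y) - yd)
        yd += ydd * dt          # semi-implicit Euler step
        y += yd * dt
        traj.append(y)
    return np.array(traj)

traj = point_attractor_rollout(y0=0.0, goal=1.0)  # converges to the goal
```

Learning or imitation then amounts to fitting a nonlinear forcing term added to this stable dynamics, which is what keeps the motor learning search space small.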