Thomas Rückstieß

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower-variance gradient estimates than those obtained by regular policy gradient methods. We show that for several complex control tasks, including robust …
PyBrain is a versatile machine learning library for Python. Its goal is to provide flexible, easy-to-use yet still powerful algorithms for machine learning tasks, including a variety of predefined environments and benchmarks to test and compare algorithms. Implemented algorithms include Long Short-Term Memory (LSTM), policy gradient methods, …
This paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action …
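The contrast described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the linear policy, noise scales, and state vector are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(theta, state):
    """Deterministic linear policy: action = theta @ state."""
    return theta @ state

theta = rng.normal(size=(1, 3))          # policy parameters
state = np.array([0.5, -1.0, 2.0])       # an arbitrary example state

# Action-based exploration: perturb the resulting action at every time step.
action_noisy = policy(theta, state) + rng.normal(scale=0.1)

# Parameter-based exploration: perturb the parameters once (e.g. per episode),
# then act deterministically with the perturbed parameters thereafter.
theta_perturbed = theta + rng.normal(scale=0.1, size=theta.shape)
action_param = policy(theta_perturbed, state)
```

Because the perturbation is applied once per episode rather than at every step, the resulting state-action histories are self-consistent, which is the basis of the variance advantage the abstract mentions.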
Policy Gradient methods are model-free reinforcement learning algorithms which in recent years have been successfully applied to many real-world problems. Typically, Likelihood Ratio (LR) methods are used to estimate the gradient, but they suffer from high variance due to random exploration at every time step of each training episode. Our solution to this …
We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower-variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. For several complex control tasks, including robust …
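A minimal sketch of the idea of estimating a likelihood gradient by sampling in parameter space, under simplifying assumptions of my own: a Gaussian search distribution with fixed standard deviation, a toy quadratic "episode return" standing in for a real rollout, and illustrative hyperparameters. It is not the paper's algorithm, only the basic likelihood-ratio estimator applied to parameters instead of actions.

```python
import numpy as np

rng = np.random.default_rng(1)

def episode_return(theta):
    # Stand-in for a full rollout; assumed quadratic return peaking at theta = 2.
    return -np.sum((theta - 2.0) ** 2)

mu, sigma, alpha = np.zeros(3), 1.0, 0.05
for _ in range(200):
    # Sample whole parameter vectors, not per-step actions.
    thetas = mu + sigma * rng.normal(size=(10, 3))
    returns = np.array([episode_return(t) for t in thetas])
    baseline = returns.mean()          # simple variance-reducing baseline
    # Likelihood-ratio gradient of the expected return w.r.t. mu:
    # E[(R - b) * (theta - mu)] / sigma^2
    grad_mu = ((returns - baseline)[:, None] * (thetas - mu)).mean(axis=0) / sigma**2
    mu = mu + alpha * grad_mu
# mu drifts toward the return's peak at 2 (approximately, since estimates are noisy).
```

Because each sampled parameter vector yields a deterministic rollout, the only randomness per sample is the single parameter draw, rather than independent noise at every time step.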
This paper describes an algorithm to detect obstacles and landmarks, using the omnidirectional vision system of a RoboCup robot, to build an internal representation of the robot’s environment. The restriction to pixels corresponding to an equally spaced grid on the floor around the robot and a biologically inspired fault-tolerant colour segmentation of this …
In most real-world information processing problems, data is not a free resource; its acquisition is often time-consuming and/or expensive. We investigate how these two factors can be included in supervised classification tasks by deriving classification as a sequential decision process and making it accessible to Reinforcement Learning. Our method performs …
In most real-world information processing problems, data is not a free resource. Its acquisition is often expensive and time-consuming. We investigate how such cost factors can be included in supervised classification tasks by deriving classification as a sequential decision process and making it accessible to Reinforcement Learning. Depending on previously …
Slow convergence is a major problem for policy gradient methods. It is a consequence of the fact that the state-action histories used to estimate the gradient are obtained by repeatedly sampling from a probabilistic policy. Given that histories vary greatly even for a fixed policy, gradient estimates obtained by perturbing the policy are bound to be noisy. …
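The claim that histories vary greatly even under a fixed policy is easy to demonstrate numerically. The toy rollout below is my own assumption (per-step Gaussian action noise summed into a return); it only illustrates that return variance grows with episode length, which is what makes the resulting gradient estimates noisy.

```python
import numpy as np

rng = np.random.default_rng(2)

T, n_episodes = 50, 100   # episode length and number of rollouts

def rollout():
    # Toy episode: return is the sum of noisy per-step actions
    # drawn from a *fixed* stochastic policy (mean 1.0, std 0.5).
    actions = 1.0 + rng.normal(scale=0.5, size=T)
    return actions.sum()

returns = np.array([rollout() for _ in range(n_episodes)])
spread = returns.std()
# The per-episode returns spread roughly like sqrt(T) * 0.5, i.e. ~3.5 here,
# even though the policy never changed between rollouts.
```

Any gradient estimate built from differences between such returns inherits this spread, which is the variance problem the abstract refers to.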
Policy gradient algorithms are among the few learning methods successfully applied to demanding real-world problems, including those found in the field of robotics. While Likelihood Ratio (LR) methods are typically used to estimate the gradient, they suffer from high variance due to random exploration at each timestep during the rollout. We therefore …