Christoph Dann

Value functions are an essential tool for solving sequential decision making problems such as Markov decision processes (MDPs). Computing the value function for a given policy (policy evaluation) is not only important for determining the quality of the policy but also a key step in prominent policy-iteration-type algorithms. In common settings where a model …
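
As a point of reference for the policy-evaluation problem described above, here is a minimal sketch (illustrative only, not the paper's method) that solves the Bellman equation V = R + γPV exactly for a small tabular MDP with assumed known dynamics:

    import numpy as np

    # Hypothetical 3-state MDP under a fixed policy pi:
    # P[s, s'] is the transition probability, R[s] the expected reward.
    P = np.array([[0.8, 0.2, 0.0],
                  [0.1, 0.6, 0.3],
                  [0.0, 0.0, 1.0]])
    R = np.array([1.0, 0.0, 5.0])
    gamma = 0.9

    # Bellman equation V = R + gamma * P V  =>  (I - gamma * P) V = R
    V = np.linalg.solve(np.eye(3) - gamma * P, R)
    print(V)

A direct solve like this is only feasible for small state spaces with a known model; larger or unknown problems require iterative or sampling-based methods.
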
Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or …
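
In the fixed-horizon setting mentioned here, optimal values can be computed by backward induction over the horizon; a sketch with assumed known dynamics (illustrative, not the paper's learning algorithm):

    import numpy as np

    # Hypothetical finite-horizon MDP: S states, A actions, horizon H.
    S, A, H = 3, 2, 5
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
    R = rng.random((S, A))                      # immediate rewards

    V = np.zeros((H + 1, S))                    # V[H] = 0: terminal values
    for t in range(H - 1, -1, -1):
        Q = R + P @ V[t + 1]                    # Q[s, a] = R[s, a] + E[V_{t+1}(s')]
        V[t] = Q.max(axis=1)                    # optimal value-to-go at step t
    print(V[0])

Unlike the discounted infinite-horizon case, the value function here is time-dependent, which is what distinguishes the finite-horizon sample complexity analysis.
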
PASCAL VOC 2010, per-class accuracy (%); table truncated in source:

    Class        UO*    RP*    SP*    Potts*  1T*    Pottics
    background   80.3   80.5   77.5   78.8    78.7   80.8
    plane        27.6   27.4   12.8   22.4    10.1   41.0
    bicycle       0.6    0.6    0.0    0.6     0.1    3.9
    bird         11.9   11.9    2.3   10.7     2.8   22.1
    boat         16.0   16.1    4.8   13.7     4.8   25.3
    bottle       15.2   14.9    2.3   12.7     3.8   24.2
    bus          33.0   33.1   29.0   32.2    22.4   41.3
    car          43.3   44.2   37.3   43.2    31.4   52.8
    cat          28.8   30.4   25.4   26.5    25.0   25.3
    …
We introduce a framework and early results for massively scalable Gaussian processes (MSGP), significantly extending the KISS-GP approach of Wilson and Nickisch (2015). The MSGP framework enables the use of Gaussian processes (GPs) on billions of datapoints, without requiring distributed inference or severe assumptions. In particular, MSGP reduces the …
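
For orientation, the structured kernel interpolation idea at the heart of KISS-GP approximates the kernel matrix as K_XX ≈ W K_UU Wᵀ, where U is a regular grid of inducing points and W holds sparse interpolation weights. A rough 1-D sketch with an assumed RBF kernel (illustrative, not the MSGP implementation):

    import numpy as np

    def rbf(a, b, ell=0.5):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0, 1, 200))   # data locations
    u = np.linspace(0, 1, 30)             # regular inducing grid

    # Sparse linear interpolation: each x gets weight on its two
    # neighboring grid points.
    W = np.zeros((x.size, u.size))
    idx = np.clip(np.searchsorted(u, x) - 1, 0, u.size - 2)
    frac = (x - u[idx]) / (u[1] - u[0])
    W[np.arange(x.size), idx] = 1 - frac
    W[np.arange(x.size), idx + 1] = frac

    K_approx = W @ rbf(u, u) @ W.T        # K_XX ≈ W K_UU W^T
    print(np.abs(K_approx - rbf(x, x)).max())

Because K_UU lives on a regular grid, it has Toeplitz/Kronecker structure, which enables the fast matrix-vector products these methods exploit.
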
We propose a computational model for shape, illumination and albedo inference in a pulsed time-of-flight (TOF) camera. In contrast to TOF cameras based on phase modulation, our camera enables general exposure profiles. This results in added flexibility and requires novel computational approaches. To address this challenge we propose a generative …
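
For background on pulsed TOF (the basic physical principle, not the paper's generative model): depth follows from the round-trip time τ of the emitted light pulse,

\[
d = \frac{c\,\tau}{2},
\]

with c the speed of light; shape, illumination, and albedo must then be disentangled from the measured intensities, which is the inference problem the abstract describes.
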
RLPy is an object-oriented reinforcement learning software package with a focus on value-function-based methods using linear function approximation and discrete actions. The framework was designed for both educational and research purposes. It provides a rich library of fine-grained, easily exchangeable components for learning agents (e.g., policies or …
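
To illustrate the exchangeable-component style of design described here (hypothetical interfaces for illustration only; these are not RLPy's actual class names or API):

    from abc import ABC, abstractmethod

    class Representation(ABC):
        @abstractmethod
        def features(self, state):
            """Map a raw state to a feature vector (e.g., tile coding)."""

    class Policy(ABC):
        @abstractmethod
        def select_action(self, features):
            """Choose a discrete action from the current features."""

    class Agent(ABC):
        def __init__(self, policy: Policy, representation: Representation):
            self.policy = policy
            self.representation = representation

        @abstractmethod
        def learn(self, s, a, r, s_next, terminal):
            """Update value estimates from one observed transition."""

Swapping in a different policy or representation then requires no change to the agent, which is the kind of modularity such a framework targets.
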
We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions …
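
For reference, the generic natural gradient step that this class of algorithms instantiates, written with an arbitrary metric tensor G(θ) (a hedged sketch; the common choice is the Fisher information matrix, which the paper replaces with a new tensor):

    import numpy as np

    def natural_gradient_step(theta, grad, metric, lr=0.1, damping=1e-6):
        """One step of theta <- theta - lr * G(theta)^{-1} grad."""
        G = metric(theta) + damping * np.eye(theta.size)  # regularized metric
        return theta - lr * np.linalg.solve(G, grad)

    # Toy usage: minimize f(theta) = ||theta||^2 under the identity metric,
    # in which case the update reduces to ordinary gradient descent.
    theta = np.array([1.0, -2.0])
    for _ in range(50):
        theta = natural_gradient_step(theta, 2 * theta, lambda t: np.eye(t.size))
    print(theta)  # close to zero

The damping term is a standard numerical safeguard for when the metric tensor is near-singular.
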
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast …
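
Paraphrasing the Uniform-PAC criterion (notation assumed from the paper): an algorithm is Uniform-PAC if, with probability at least 1 − δ, the number of ε-suboptimal episodes is bounded simultaneously for all ε > 0,

\[
\Pr\!\left( \forall \varepsilon > 0 :\; \sum_{k=1}^{\infty} \mathbf{1}\{\Delta_k > \varepsilon\} \le F(\varepsilon, \delta) \right) \ge 1 - \delta,
\]

where Δ_k is the optimality gap in episode k and F is polynomial in 1/ε and log(1/δ). Classical PAC fixes a single ε in advance; Uniform-PAC must hold for all accuracy levels at once.
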
Temporal difference learning is one of the oldest and most widely used techniques in reinforcement learning for estimating value functions. Many modifications and extensions of the classical TD methods have been proposed. Recent examples are TDC and GTD(2) (Sutton et al., 2009b), the first approaches that are as fast as classical TD and have proven convergence for …
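
For concreteness, here is the TDC update of Sutton et al. (2009) with linear value function approximation V(s) ≈ θᵀφ(s); the step sizes and feature vectors are assumptions for illustration:

    import numpy as np

    def tdc_update(theta, w, phi, phi_next, r,
                   gamma=0.99, alpha=0.01, beta=0.05):
        """One TDC step: the main weight update is corrected using an
        auxiliary estimate w of the expected TD-error direction."""
        delta = r + gamma * theta @ phi_next - theta @ phi      # TD error
        theta = theta + alpha * (delta * phi - gamma * phi_next * (phi @ w))
        w = w + beta * (delta - phi @ w) * phi                  # auxiliary weights
        return theta, w

The correction term −γφ'(φᵀw) is what gives TDC provable convergence under off-policy sampling with function approximation, where classical TD can diverge.
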