Chris J. Maddison

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural …
The reparameterization trick enables optimizing large-scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the …
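The refactoring described above can be sketched for a single stochastic node. This is a minimal illustration, not the paper's method: it assumes a Gaussian node z ~ N(mu, sigma^2) and a toy loss f(z) = z^2, and the function name `reparam_grad` is hypothetical. The node is rewritten as z = mu + sigma * eps with eps ~ N(0, 1), so the chain rule passes through z to mu and sigma.

```python
import random


def reparam_grad(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo pathwise gradient of E[f(z)] for z ~ N(mu, sigma^2).

    Assumes the toy loss f(z) = z**2 (an illustrative choice).
    Returns estimates of (dE[f]/dmu, dE[f]/dsigma).
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise with a fixed distribution
        z = mu + sigma * eps        # differentiable function of (mu, sigma)
        df_dz = 2.0 * z             # derivative of the toy loss f(z) = z**2
        grad_mu += df_dz * 1.0      # chain rule: dz/dmu = 1
        grad_sigma += df_dz * eps   # chain rule: dz/dsigma = eps
    return grad_mu / n_samples, grad_sigma / n_samples


if __name__ == "__main__":
    g_mu, g_sigma = reparam_grad(mu=1.5, sigma=0.5)
    print(g_mu, g_sigma)
```

For this toy loss E[z^2] = mu^2 + sigma^2, so the two estimates should land close to 2 * mu and 2 * sigma, which gives a quick sanity check on the estimator.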
The game of Go is more challenging than other board games, due to the difficulty of constructing a position or move evaluation function. In this paper we investigate whether deep convolutional networks can be used to directly represent and learn this knowledge. We train a large 12-layer convolutional neural network by supervised learning from a database of …
Across vertebrate species, 17β-estradiol (E2) acts on the brain via both genomic and nongenomic mechanisms to influence neuronal physiology and behavior. Nongenomic E2 signaling is typically initiated by membrane-associated estrogen receptors that modulate intracellular signaling cascades, including rapid phosphorylation of ERK. Phosphorylated ERK …
It is well known that songbirds produce high-amplitude songs ("broadcast songs"). Songbirds also produce low-amplitude songs ("soft songs") during courtship or territorial aggression in the breeding season. Soft songs are important social signals but have been studied far less than broadcast songs. To date, no studies have examined seasonal changes in soft …
Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and an intractable target distribution. The near-universal practice is to use geometric averages of the initial and …
The problem of drawing samples from a discrete distribution can be converted into a discrete optimization problem [1, 2, 3, 4]. In this work, we show how sampling from a continuous distribution can be converted into an optimization problem over continuous space. Central to the method is a stochastic process recently described in mathematical statistics that …
Simulating samples from arbitrary probability distributions is a major research program of statistical computing. Recent work has shown promise in an old idea, that sampling from a discrete distribution can be accomplished by perturbing and maximizing its mass function. Yet, it has not been clearly explained how this research project relates to more …
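One well-known instance of sampling by perturbing and maximizing is the Gumbel-max trick: add independent Gumbel(0, 1) noise to each log-probability and take the argmax. The sketch below is illustrative only; the abstract above does not spell out its construction, and the function name `gumbel_max_sample` is hypothetical.

```python
import math
import random


def gumbel_max_sample(probs, rng):
    """Draw one index from `probs` by perturbing log-masses and maximizing.

    Each positive-mass outcome i gets the score log(probs[i]) + G_i with
    G_i ~ Gumbel(0, 1); the index with the highest score is returned.
    """
    best_index = -1
    best_score = -math.inf
    for i, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-mass outcomes can never be selected
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # inverse-CDF sample of Gumbel(0, 1)
        score = math.log(p) + gumbel
        if score > best_score:
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.3]
    counts = [0] * len(probs)
    for _ in range(50_000):
        counts[gumbel_max_sample(probs, rng)] += 1
    print([c / 50_000 for c in counts])
```

The empirical frequencies printed by the usage example should be close to the input probabilities, which is the property that makes the argmax a valid sample.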
The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on …
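A common form of the risk-sensitive value under an exponential utility is (1/beta) * log E[exp(beta * R)], where R is the return. The sketch below assumes that standard form, which may differ in detail from the paper's definition; the function name `entropic_value` and the sample returns are hypothetical.

```python
import math


def entropic_value(returns, beta):
    """Empirical risk-sensitive value (1/beta) * log(mean(exp(beta * R))).

    beta > 0 emphasizes high returns, beta < 0 emphasizes low returns,
    and beta == 0 is treated as the ordinary mean (the limiting case).
    """
    if beta == 0.0:
        return sum(returns) / len(returns)
    scaled = [beta * r for r in returns]
    shift = max(scaled)  # log-sum-exp shift for numerical stability
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return (shift + math.log(mean_exp)) / beta


if __name__ == "__main__":
    # Hypothetical returns: a rare large reward among zeros.
    returns = [0.0, 0.0, 0.0, 10.0]
    for beta in (-1.0, 0.0, 1.0):
        print(beta, entropic_value(returns, beta))
```

With a rare large reward in the sample, a positive beta pulls the value toward that reward while a negative beta pulls it toward the common low return, which illustrates how the objective can emphasize returns regardless of their probability.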