The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural ...
The reparameterization trick enables the optimization of large scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through ...
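As a minimal illustration of the refactoring described in this abstract (not code from the paper), the sketch below rewrites a Gaussian node z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1), then pushes the gradient of a toy loss through z by the chain rule. The loss z^2, the function name `reparam_grads`, and the sample count are all assumptions chosen for illustration. For this toy loss the exact gradients of the expected loss are 2*mu and 2*sigma, which gives something to compare the estimates against.

```python
import random


def reparam_grads(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo pathwise-gradient estimate of d/d(mu, sigma) E[z^2], z ~ N(mu, sigma^2).

    Hypothetical helper for illustration only; the toy loss z^2 is an assumption.
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)    # random variable with a fixed distribution
        z = mu + sigma * eps         # differentiable function of the parameters
        dloss_dz = 2.0 * z           # derivative of the toy loss z^2 w.r.t. z
        grad_mu += dloss_dz * 1.0    # chain rule: dz/dmu = 1
        grad_sigma += dloss_dz * eps # chain rule: dz/dsigma = eps
    return grad_mu / n_samples, grad_sigma / n_samples


if __name__ == "__main__":
    g_mu, g_sigma = reparam_grads(mu=1.5, sigma=0.5)
    # Exact values for this toy loss are 2*mu = 3.0 and 2*sigma = 1.0.
    print(g_mu, g_sigma)
```

Because the noise eps does not depend on the parameters, the same sampled values can be reused while mu and sigma are updated by gradient descent.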
Across vertebrate species, 17β-estradiol (E2) acts on the brain via both genomic and nongenomic mechanisms to influence neuronal physiology and behavior. Nongenomic E2 signaling is typically initiated by membrane-associated estrogen receptors that modulate intracellular signaling cascades, including rapid phosphorylation of ERK. Phosphorylated ERK ...
The game of Go is more challenging than other board games, due to the difficulty of constructing a position or move evaluation function. In this paper we investigate whether deep convolutional networks can be used to directly represent and learn this knowledge. We train a large 12-layer convolutional neural network by supervised learning from a database of ...
The problem of drawing samples from a discrete distribution can be converted into a discrete optimization problem [1, 2, 3, 4]. In this work, we show how sampling from a continuous distribution can be converted into an optimization problem over continuous space. Central to the method is a stochastic process recently described in mathematical statistics that ...
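One well-known way to turn discrete sampling into discrete optimization is the Gumbel-max trick: add independent Gumbel(0, 1) noise to each log-probability and take the argmax. The excerpt does not say which construction its citations use, so the sketch below is an assumed example of the idea rather than the paper's method. The function name `gumbel_max_sample` and the example probabilities are hypothetical.

```python
import math
import random


def gumbel_max_sample(probs, rng):
    """Draw an index i with probability probs[i] by maximizing log p_i + Gumbel noise.

    Illustrative sketch of the Gumbel-max trick; not taken from the listed paper.
    """
    best_index = -1
    best_value = -math.inf
    for i, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-probability outcomes can never win the argmax
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        value = math.log(p) + gumbel
        if value > best_value:
            best_index, best_value = i, value
    return best_index


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.3]
    counts = [0, 0, 0]
    for _ in range(50_000):
        counts[gumbel_max_sample(probs, rng)] += 1
    print([c / 50_000 for c in counts])  # empirical frequencies, close to probs
```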
Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and an intractable target distribution. The near-universal practice is to use geometric averages of the initial ...
It is well known that songbirds produce high amplitude songs ("broadcast songs"). Songbirds also produce low amplitude songs ("soft songs") during courtship or territorial aggression in the breeding season. Soft songs are important social signals but have been studied far less than broadcast songs. To date, no studies have examined seasonal changes in soft ...
Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al., 2016; Maddison et al., 2016) has taken a different approach, introducing a continuous relaxation of discrete variables ...
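The continuous relaxation attributed here to Jang et al. and Maddison et al. is commonly written as a temperature-controlled softmax over Gumbel-perturbed logits. The sketch below assumes that form. The function name `relaxed_categorical_sample`, the logits, and the temperatures are hypothetical choices for illustration, and the code is not drawn from the abstract itself.

```python
import math
import random


def relaxed_categorical_sample(logits, temperature, rng):
    """Return a point on the probability simplex: softmax((logits + Gumbel noise) / temperature).

    Illustrative sketch of a Gumbel-softmax style relaxation, under the assumptions stated above.
    """
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    shift = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [0.0, 1.0, -1.0]
    # Same seed, so both calls see the same noise and differ only in temperature.
    print(relaxed_categorical_sample(logits, 1.0, random.Random(0)))
    print(relaxed_categorical_sample(logits, 0.1, random.Random(0)))
```

Lower temperatures push the output toward a one-hot vector, while higher temperatures give smoother outputs; the position of the largest entry does not depend on the temperature for a fixed noise draw.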