A Fast Learning Algorithm for Deep Belief Nets
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Hierarchical Dirichlet Processes
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that …
Bayesian Learning via Stochastic Gradient Langevin Dynamics
In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic …
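The SGLD update is ordinary minibatch stochastic gradient ascent on the log posterior plus Gaussian noise whose variance matches the step size. A minimal sketch on an assumed toy problem (inferring the mean of a 1-D Gaussian; the step size is held fixed here, whereas the paper anneals it toward zero):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N observations from N(2, 1); parameter theta is the
# unknown mean, with a broad N(0, 10^2) prior.
N = 1000
data = rng.normal(2.0, 1.0, size=N)
theta = 0.0
minibatch = 32
eps = 1e-3  # step size (fixed for simplicity; annealed in the paper)

samples = []
for t in range(2000):
    idx = rng.integers(0, N, size=minibatch)
    # Gradient of the log prior plus the rescaled minibatch log-likelihood.
    grad_prior = -theta / 100.0
    grad_lik = (N / minibatch) * np.sum(data[idx] - theta)
    # SGLD: half the step times the gradient, plus injected Gaussian noise
    # whose variance equals the step size.
    theta = theta + 0.5 * eps * (grad_prior + grad_lik) \
            + rng.normal(0.0, np.sqrt(eps))
    samples.append(theta)

# After burn-in, the iterates behave like posterior samples.
posterior_mean = np.mean(samples[500:])
```

With essentially all the data informing a single mean, the posterior concentrates near the sample average, so the chain's average should land close to 2.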
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Concrete random variables, continuous relaxations of discrete random variables, are introduced as a new family of distributions with closed-form densities and a simple reparameterization; the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks is demonstrated.
A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes
  • Y. Teh
  • Computer Science
  • ACL
  • 17 July 2006
It is shown that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models.
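Interpolated Kneser-Ney, the smoother this paper connects to the hierarchical Pitman-Yor process, can be sketched for bigrams: discount every observed count by a fixed amount and back off to a continuation distribution based on how many distinct contexts each word follows (toy corpus and discount value are assumptions for illustration):

```python
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
tokens = "the cat sat on the mat the cat ate".split()
bigrams = Counter(zip(tokens, tokens[1:]))
context_counts = Counter(tokens[:-1])
d = 0.75  # absolute discount

# Continuation counts: in how many distinct contexts does w appear?
continuation = Counter(w for (_, w) in bigrams)
total_bigram_types = len(bigrams)

def p_kn(w, u):
    """Interpolated Kneser-Ney P(w | u) for a bigram model."""
    c_uw = bigrams[(u, w)]
    c_u = context_counts[u]
    n_types_after_u = sum(1 for (ctx, _) in bigrams if ctx == u)
    lam = d * n_types_after_u / c_u      # mass freed up by discounting
    p_cont = continuation[w] / total_bigram_types
    return max(c_uw - d, 0) / c_u + lam * p_cont

# The discounted mass plus the backed-off mass form a proper distribution
# over the words observed after a given context's vocabulary.
vocab = set(tokens)
total = sum(p_kn(w, "the") for w in vocab)
```

The Pitman-Yor connection is that the discount `d` plays the role of the process's discount parameter, which is why the hierarchical model recovers exactly this interpolated form.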
A fast and simple algorithm for training neural probabilistic language models
This work proposes a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions, and demonstrates the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary.
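The key idea of noise-contrastive estimation is to fit an unnormalized model by training a logistic classifier to distinguish data from samples of a known noise distribution, treating the log-normalizer as a free parameter. A minimal sketch on an assumed toy problem (a 1-D unnormalized Gaussian, not the paper's language model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized model: log p(x) = -(x - mu)^2 / 2 + c, where the
# log-normalizer c is learned rather than computed.
# Noise distribution: a known N(0, 2^2).
data = rng.normal(1.5, 1.0, size=5000)
noise = rng.normal(0.0, 2.0, size=5000)   # one noise sample per data point

def log_noise(x):
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

mu, c = 0.0, 0.0
lr = 0.05
for step in range(500):
    # Classifier logit G(x) = log p_model(x) - log p_noise(x).
    g_data = -0.5 * (data - mu) ** 2 + c - log_noise(data)
    g_noise = -0.5 * (noise - mu) ** 2 + c - log_noise(noise)
    s_data = 1.0 / (1.0 + np.exp(-g_data))    # P(label = data | x)
    s_noise = 1.0 / (1.0 + np.exp(-g_noise))
    # Gradient ascent on the logistic log-likelihood:
    # weight (1 - sigma) on data points, -sigma on noise points.
    w_data = 1.0 - s_data
    w_noise = -s_noise
    mu += lr * (np.mean(w_data * (data - mu)) + np.mean(w_noise * (noise - mu)))
    c += lr * (np.mean(w_data) + np.mean(w_noise))
```

At the optimum `mu` recovers the data mean and `c` recovers the true log-normalizer, log(1/sqrt(2*pi)) ≈ -0.919, without ever summing or integrating over the model's support — the property that makes NCE fast for large-vocabulary NPLMs.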
On Smoothing and Inference for Topic Models
Using the insights gained from this comparative study, it is shown how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
This work presents an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set, and reduces the computation time of self-attention from quadratic to linear in the number of elements in the set.
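The quadratic-to-linear reduction comes from attending through a small set of inducing points rather than all pairs of elements. A stripped-down, single-head sketch (no projections, residuals, or layer norm, which the full model includes; sizes are illustrative):

```python
import numpy as np

def attention(Q, K, V):
    """Plain scaled dot-product attention."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

def isab(X, I):
    """Induced set attention: m inducing points attend to all n elements,
    then the n elements attend back to the m summaries -- O(n*m) work
    instead of the O(n^2) cost of full self-attention over the set."""
    H = attention(I, X, X)      # (m, d): inducing points summarize the set
    return attention(X, H, H)   # (n, d): elements read from the summaries

rng = np.random.default_rng(0)
n, m, d = 100, 4, 8
X = rng.normal(size=(n, d))
I = rng.normal(size=(m, d))     # learned parameters in the real model
Y = isab(X, I)

# Permutation equivariance: permuting input rows permutes output rows.
perm = rng.permutation(n)
assert np.allclose(isab(X[perm], I), Y[perm])
```

The equivariance check at the end is the property that makes the module suitable for set-valued inputs: the output depends on the set, not on the order its elements arrive in.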
Conditional Neural Processes
Conditional Neural Processes are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent, yet scale to complex functions and large datasets.
Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes
The hierarchical Dirichlet process (HDP), a nonparametric Bayesian model for clustering problems involving multiple groups of data, is proposed and experimental results are reported showing the effective and superior performance of the HDP over previous models.