Publications
Probabilistic Matrix Factorization
TLDR: The Probabilistic Matrix Factorization (PMF) model is presented; it scales linearly with the number of observations, performs well on the large, sparse, and highly imbalanced Netflix dataset, and is extended with an adaptive prior on the model parameters.
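A minimal sketch of MAP-trained PMF under toy sizes and hyperparameters (the shapes, learning rate, and regularization strength below are illustrative, not the paper's Netflix settings): each epoch visits every observed rating once, so training cost grows linearly with the number of observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings as (user, item, rating) triples.
n_users, n_items, rank = 50, 40, 5
obs = [(rng.integers(n_users), rng.integers(n_items), rng.integers(1, 6))
       for _ in range(500)]

U = 0.1 * rng.standard_normal((n_users, rank))   # user factors
V = 0.1 * rng.standard_normal((n_items, rank))   # item factors
lam, lr = 0.05, 0.01                             # prior strength (L2) and step size

# MAP estimation: stochastic gradient descent on the regularized squared error.
for epoch in range(20):
    for i, j, r in obs:
        err = r - U[i] @ V[j]
        u_old = U[i].copy()
        U[i] += lr * (err * V[j] - lam * U[i])
        V[j] += lr * (err * u_old - lam * V[j])
```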
Bayesian probabilistic matrix factorization using Markov chain Monte Carlo
TLDR: This paper presents a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model, in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters, and shows on the Netflix dataset that Bayesian PMF models can be trained efficiently with Markov chain Monte Carlo methods.
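A simplified sketch of one Gibbs step for the user factors, assuming fixed hyperparameters (the full model also places Gaussian-Wishart hyperpriors and samples those hyperparameters too); all sizes below are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup, not the paper's Netflix configuration.
n_users, n_items, rank = 50, 40, 5
obs = [(rng.integers(n_users), rng.integers(n_items), rng.integers(1, 6))
       for _ in range(500)]
U = 0.1 * rng.standard_normal((n_users, rank))
V = 0.1 * rng.standard_normal((n_items, rank))
alpha, prec_u = 2.0, 2.0          # rating precision, fixed prior precision on user factors

rated_by_user = {}
for i, j, r in obs:
    rated_by_user.setdefault(i, []).append((j, r))

def sample_user_factor(i):
    """Draw U[i] from its Gaussian conditional given V and user i's observed ratings."""
    precision = prec_u * np.eye(rank)
    lin = np.zeros(rank)
    for j, r in rated_by_user.get(i, []):
        precision += alpha * np.outer(V[j], V[j])
        lin += alpha * r * V[j]
    cov = np.linalg.inv(precision)
    return rng.multivariate_normal(cov @ lin, cov)

# One Gibbs sweep over user factors; item factors are resampled symmetrically,
# and predictions are averaged over many such sweeps.
for i in range(n_users):
    U[i] = sample_user_factor(i)
```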
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
TLDR: Concrete random variables, continuous relaxations of discrete random variables, are introduced as a new family of distributions with closed-form densities and a simple reparameterization, and the effectiveness of Concrete relaxations is demonstrated on density estimation and structured prediction tasks with neural networks.
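A minimal sketch of drawing a Concrete (Gumbel-softmax) sample from a categorical distribution; the logits and temperature below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_concrete(logits, temperature):
    """Sample a Concrete relaxation of a categorical variable.

    Add Gumbel noise to the logits, divide by the temperature, and apply a
    softmax; as the temperature approaches 0 the samples approach one-hot vectors.
    """
    u = rng.uniform(size=logits.shape)
    gumbel = -np.log(-np.log(u + 1e-20))
    y = (logits + gumbel) / temperature
    y = y - y.max()                 # numerical stability
    expy = np.exp(y)
    return expy / expy.sum()

# Relaxed sample from a 4-way categorical with unnormalized log-probabilities.
print(sample_concrete(np.log(np.array([0.1, 0.2, 0.3, 0.4])), temperature=0.5))
```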
Disentangling by Factorising
TLDR: FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across dimensions, is proposed; it improves on $\beta$-VAE by providing a better trade-off between disentanglement and reconstruction quality.
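A sketch of the permute-dims trick FactorVAE uses to approximate samples from the product of the latent marginals; a discriminator trained to tell these apart from genuine q(z) samples yields a density-ratio estimate of the total correlation, which is added (scaled by gamma) to the usual VAE objective. Batch size and dimensionality below are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def permute_dims(z):
    """Shuffle each latent dimension independently across the batch.

    The result approximates a sample from the product of the marginals
    q(z_1)...q(z_d), which the total-correlation discriminator contrasts
    with real q(z) samples.
    """
    batch, dim = z.shape
    out = np.empty_like(z)
    for d in range(dim):
        out[:, d] = z[rng.permutation(batch), d]
    return out

z = rng.standard_normal((8, 3))     # a toy batch of encoder samples
z_perm = permute_dims(z)
```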
Restricted Boltzmann machines for collaborative filtering
TLDR: This paper shows how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBMs), can be used to model tabular data such as users' ratings of movies, and demonstrates that RBMs can be applied successfully to the Netflix dataset.
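A sketch of the basic binary RBM trained with one step of contrastive divergence (CD-1). The paper's collaborative-filtering variant uses per-movie softmax visible units and handles missing ratings, but the positive/negative-phase structure of the update is the same; sizes and learning rate here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden, lr = 20, 8, 0.05
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

v0 = (rng.random((16, n_visible)) < 0.3).astype(float)   # toy batch of binary data

for step in range(100):
    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back down and up again.
    pv1 = sigmoid(h0 @ W.T + b_v)
    ph1 = sigmoid(pv1 @ W + b_h)
    # CD-1 update: data-driven minus model-driven statistics.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b_v += lr * (v0 - pv1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)
```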
Neural Variational Inference and Learning in Belief Networks
TLDR: This work proposes a fast, non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior; it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
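A sketch of the core idea on a toy sigmoid belief net: one feedforward pass through the inference network gives an exact sample from the variational posterior, and the inference network is updated with a score-function (REINFORCE) estimator using a simple running-average baseline. The paper uses additional variance-reduction techniques; the architecture and shapes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_logp(bits, probs):
    # Sum of independent Bernoulli log-probabilities.
    return np.sum(bits * np.log(probs + 1e-9) + (1 - bits) * np.log(1 - probs + 1e-9), axis=-1)

n_x, n_h = 6, 4
W_q = 0.1 * rng.standard_normal((n_x, n_h))   # inference (recognition) network
W_p = 0.1 * rng.standard_normal((n_h, n_x))   # generative network for p(x | h)
baseline = 0.0                                # running-average baseline

x = rng.integers(0, 2, size=(1, n_x)).astype(float)

# Exact sampling from the variational posterior in one feedforward pass.
q = sigmoid(x @ W_q)
h = (rng.random(q.shape) < q).astype(float)

# Learning signal: log p(x, h) - log q(h | x), with a uniform prior p(h) = 0.5.
log_p = bernoulli_logp(h, np.full_like(q, 0.5)) + bernoulli_logp(x, sigmoid(h @ W_p))
log_q = bernoulli_logp(h, q)
signal = log_p - log_q

# Score-function gradient for the inference net, variance-reduced by the
# baseline: d log q(h|x) / d W_q = x^T (h - q).
grad_W_q = x.T @ ((signal - baseline) * (h - q))
W_q += 0.01 * grad_W_q
baseline = 0.9 * baseline + 0.1 * float(signal.mean())
```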
A fast and simple algorithm for training neural probabilistic language models
TLDR: This work proposes a fast and simple algorithm for training neural probabilistic language models (NPLMs) based on noise-contrastive estimation, a recently introduced procedure for estimating unnormalized continuous distributions, and demonstrates its scalability by training several neural language models on a 47M-word corpus with an 80K-word vocabulary.
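A sketch of the per-example noise-contrastive estimation objective, assuming the language model supplies an unnormalized score s(word, context) and noise words are drawn from, e.g., the unigram distribution; the function and variable names below are illustrative, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nce_loss(score_target, scores_noise, logp_noise_target, logp_noise_samples, k):
    """Noise-contrastive estimation loss for one (context, word) pair.

    score_* are the model's unnormalized log-scores s(word, context);
    logp_noise_* are log-probabilities under the noise distribution and k is
    the number of noise samples. The model is trained as a binary classifier
    separating the one data word from the k noise words, so no normalization
    over the vocabulary is needed.
    """
    delta_data = score_target - (np.log(k) + logp_noise_target)
    delta_noise = scores_noise - (np.log(k) + logp_noise_samples)
    return -(np.log(sigmoid(delta_data)) + np.sum(np.log(1.0 - sigmoid(delta_noise))))

# Toy numbers: one true next word and k = 5 unigram noise samples.
print(nce_loss(2.1, rng.standard_normal(5), np.log(1e-4), np.log(np.full(5, 1e-4)), k=5))
```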
Three new graphical models for statistical language modelling
TLDR: It is shown how real-valued distributed representations for words can be learned jointly with a large set of stochastic binary hidden features that are used to predict the distributed representation of the next word from the representations of the previous words.
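A sketch of log-bilinear next-word scoring, in which the context words' feature vectors are combined linearly to predict the next word's feature vector and every candidate word is then scored by a dot product; the vocabulary size, dimensionality, and per-position matrices below are toy values, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab, dim, context_size = 1000, 32, 3
R = 0.1 * rng.standard_normal((vocab, dim))                  # word feature vectors
C = [0.1 * rng.standard_normal((dim, dim)) for _ in range(context_size)]  # per-position matrices
b = np.zeros(vocab)                                          # per-word biases

def next_word_distribution(context_ids):
    """Predict the next word's feature vector as a linear function of the
    context words' vectors, then score every word by a dot product."""
    pred = sum(R[w] @ C[i] for i, w in enumerate(context_ids))
    scores = R @ pred + b
    scores -= scores.max()          # numerical stability
    p = np.exp(scores)
    return p / p.sum()

p = next_word_distribution([12, 7, 431])
```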
A Scalable Hierarchical Distributed Language Model
TLDR: A fast hierarchical language model is introduced, along with a simple feature-based algorithm for automatically constructing word trees from the data; the resulting models can outperform non-hierarchical neural models as well as the best n-gram models.
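A sketch of a tree-structured (hierarchical) output layer, where a word's probability is a product of binary decisions along its path from the root, so scoring costs grow with tree depth rather than vocabulary size. In the paper the word tree is built automatically from data; the node indices and path bits below are toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

dim, depth = 16, 10                                           # toy sizes
node_vectors = 0.1 * rng.standard_normal((2 ** depth, dim))   # one vector per internal node

def word_logprob(pred, path_nodes, path_bits):
    """log P(word | context) given a predicted representation `pred`, the
    word's internal-node indices, and the left/right bit taken at each node."""
    logp = 0.0
    for node, bit in zip(path_nodes, path_bits):
        p_right = sigmoid(pred @ node_vectors[node])
        logp += np.log(p_right if bit else 1.0 - p_right)
    return logp

pred = rng.standard_normal(dim)    # e.g. a log-bilinear context prediction
print(word_logprob(pred, path_nodes=[0, 1, 4], path_bits=[1, 0, 1]))
```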
Learning word embeddings efficiently with noise-contrastive estimation
TLDR: This work proposes a simple and scalable approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation, achieving results comparable to the best reported while using four times less data and more than an order of magnitude less computing time.
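A toy scoring function in the spirit of the paper's log-bilinear embedding models: the context embeddings are combined (here simply averaged, a simplification) and a candidate word is scored by a dot product; such a score would be plugged into the NCE loss sketched earlier instead of a full softmax. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab, dim = 1000, 32
R_context = 0.1 * rng.standard_normal((vocab, dim))   # context ("input") embeddings
R_target = 0.1 * rng.standard_normal((vocab, dim))    # target ("output") embeddings

def score(context_ids, word_id):
    # Combine the context embeddings and score the candidate word.
    pred = R_context[context_ids].mean(axis=0)
    return float(pred @ R_target[word_id])

print(score([12, 7, 431], 99))
```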
...