Publications (sorted by influence)
Dropout: a simple way to prevent neural networks from overfitting
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making …
  • Citations: 19,564 • Influence: 1,720 • PDF
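A minimal NumPy sketch of the idea the abstract describes: randomly zeroing units during training. The inverted-dropout rescaling used here is a common implementation convention, not necessarily the paper's exact formulation.

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(h, p_drop=0.5, train=True):
        # During training, zero each unit with probability p_drop and
        # rescale the survivors so expected activations are unchanged
        # ("inverted" dropout); at test time, use the full network.
        if not train:
            return h
        mask = rng.random(h.shape) >= p_drop
        return h * mask / (1.0 - p_drop)

    h = rng.standard_normal((4, 8))    # a batch of hidden activations
    print(dropout(h))                  # roughly half the units are zeroed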
Reducing the Dimensionality of Data with Neural Networks
High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can …
  • Citations: 11,365 • Influence: 944 • PDF
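A rough PyTorch sketch of the setup described above: a multilayer network with a small central code layer trained to reconstruct its input. Layer sizes are illustrative, and the paper's RBM-based pretraining stage (used to initialize the weights before gradient-descent fine-tuning) is omitted.

    import torch
    import torch.nn as nn

    # Encoder narrows to a 30-unit code; the decoder mirrors it back.
    autoencoder = nn.Sequential(
        nn.Linear(784, 256), nn.Sigmoid(),
        nn.Linear(256, 30),               # the small central code layer
        nn.Linear(30, 256), nn.Sigmoid(),
        nn.Linear(256, 784),
    )
    opt = torch.optim.SGD(autoencoder.parameters(), lr=0.1)
    x = torch.rand(32, 784)               # stand-in for image vectors
    for _ in range(100):
        loss = nn.functional.mse_loss(autoencoder(x), x)
        opt.zero_grad(); loss.backward(); opt.step()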
Probabilistic Matrix Factorization
Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix …
  • Citations: 3,082 • Influence: 628 • PDF
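A small NumPy sketch of the core model: ratings modelled as inner products of user and item factor vectors with Gaussian priors, fitted here by stochastic gradient steps on the MAP objective. Sizes and hyperparameters are placeholders, and the ratings are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, k = 100, 80, 5
    obs = [(rng.integers(n_users), rng.integers(n_items),
            rng.uniform(1, 5)) for _ in range(500)]   # (user, item, rating)

    U = 0.1 * rng.standard_normal((n_users, k))       # user factors
    V = 0.1 * rng.standard_normal((n_items, k))       # item factors
    lam, lr = 0.05, 0.01            # prior strength, learning rate

    for epoch in range(20):
        for i, j, r in obs:
            err = r - U[i] @ V[j]   # residual of the predicted rating
            U[i] += lr * (err * V[j] - lam * U[i])
            V[j] += lr * (err * U[i] - lam * V[j])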
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train …
  • Citations: 5,209 • Influence: 527 • PDF
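A NumPy sketch of the soft-attention step at the heart of the model: the decoder scores each image region against its current state and takes a probability-weighted average of the region features as the context vector. The scoring function here is simplified (the paper uses a small network with a tanh nonlinearity), and all shapes and weights are placeholders.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    regions = rng.standard_normal((196, 512))   # CNN annotation vectors
    h_prev = rng.standard_normal(256)           # decoder hidden state
    W_a = 0.1 * rng.standard_normal((512, 1))
    W_h = 0.1 * rng.standard_normal((256, 1))

    scores = (regions @ W_a + h_prev @ W_h).ravel()
    alpha = softmax(scores)          # attention weights over regions
    context = alpha @ regions        # expected ("soft") context vector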
Improving neural networks by preventing co-adaptation of feature detectors
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the …
  • Citations: 4,889 • Influence: 424 • PDF
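A NumPy sketch of the train/test asymmetry the abstract refers to: each training case sees a random half of the hidden feature detectors, while at test time all detectors are kept and their activity is halved to approximate averaging over the thinned networks.

    import numpy as np

    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((8, 8))
    x = rng.standard_normal(8)
    h = np.maximum(0, W @ x)              # hidden feature detectors

    # Training: omit each detector with probability 0.5.
    h_train = h * (rng.random(8) < 0.5)

    # Test: keep every detector but halve its contribution,
    # approximating the mean of the exponentially many thinned nets.
    h_test = 0.5 * h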
XLNet: Generalized Autoregressive Pretraining for Language Understanding
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language …
  • Citations: 1,442 • Influence: 281 • PDF
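A tiny NumPy illustration of the permutation-based factorization behind XLNet's autoregressive objective: draw a random order over positions and build the mask saying which tokens each prediction may condition on. The two-stream attention machinery and Transformer-XL backbone are omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 6
    z = rng.permutation(T)         # a random factorization order
    rank = np.empty(T, dtype=int)
    rank[z] = np.arange(T)         # rank[i]: where token i falls in the order

    # attend[i, j] is True when token j precedes token i in the order,
    # so the prediction of token i may condition on token j; training
    # maximizes the expected log-likelihood over such random orders.
    attend = rank[None, :] < rank[:, None]
    print(z, attend, sep="\n")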
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the …
  • Citations: 1,522 • Influence: 235 • PDF
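A condensed PyTorch sketch of the encoder-decoder objective: encode the middle sentence of a passage into a single vector and decode its two neighbouring sentences from it. Vocabulary and layer sizes are placeholders, and details are simplified (for instance, here the sentence vector only initializes the decoders rather than conditioning every step).

    import torch
    import torch.nn as nn

    V, E, H = 10000, 128, 256      # vocab / embedding / hidden sizes

    class SkipThought(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(V, E)
            self.encoder = nn.GRU(E, H, batch_first=True)
            self.dec_prev = nn.GRU(E, H, batch_first=True)
            self.dec_next = nn.GRU(E, H, batch_first=True)
            self.out = nn.Linear(H, V)

        def forward(self, mid, prev, nxt):
            # Encode the middle sentence into a single thought vector,
            # then train both decoders to reconstruct its neighbours.
            _, h = self.encoder(self.embed(mid))
            loss = 0.0
            for dec, tgt in ((self.dec_prev, prev), (self.dec_next, nxt)):
                logits, _ = dec(self.embed(tgt[:, :-1]), h)
                loss = loss + nn.functional.cross_entropy(
                    self.out(logits).flatten(0, 1), tgt[:, 1:].flatten())
            return loss

    model = SkipThought()
    mid, prev, nxt = (torch.randint(V, (4, 12)) for _ in range(3))
    print(model(mid, prev, nxt))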
Bayesian probabilistic matrix factorization using Markov chain Monte Carlo
Low-rank matrix approximation methods provide one of the simplest and most effective approaches to collaborative filtering. Such models are usually fitted to data by finding a MAP estimate of the …
  • Citations: 1,227 • Influence: 229 • PDF
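A reduced NumPy sketch of the sampling approach: instead of a single MAP estimate, draw user and item factors from their Gaussian conditional posteriors with Gibbs sweeps. The paper also places (and samples) Gaussian-Wishart hyperpriors over the factor means and covariances; here the hyperparameters are fixed for brevity and the ratings are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, k = 50, 40, 5
    alpha, lam = 2.0, 1.0          # rating precision, prior precision (fixed)
    obs = [(rng.integers(n_users), rng.integers(n_items),
            rng.uniform(1, 5)) for _ in range(400)]

    U = rng.standard_normal((n_users, k))
    V = rng.standard_normal((n_items, k))
    by_user = {u: [(j, r) for i, j, r in obs if i == u] for u in range(n_users)}
    by_item = {v: [(i, r) for i, j, r in obs if j == v] for v in range(n_items)}

    def gibbs_sweep(rows, other, index):
        # Each latent row has a Gaussian conditional posterior given
        # the other side's factors and the observed ratings.
        for i in range(rows.shape[0]):
            prec = lam * np.eye(k)
            rhs = np.zeros(k)
            for j, r in index[i]:
                prec += alpha * np.outer(other[j], other[j])
                rhs += alpha * r * other[j]
            cov = np.linalg.inv(prec)
            rows[i] = rng.multivariate_normal(cov @ rhs, cov)

    for sweep in range(20):        # predictions average over later sweeps
        gibbs_sweep(U, V, by_user)
        gibbs_sweep(V, U, by_item)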
Deep Boltzmann Machines
We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent expectations are estimated using a variational approximation that tends to …
  • Citations: 1,718 • Influence: 223 • PDF
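A NumPy sketch of the variational step the abstract mentions: mean-field fixed-point updates that estimate the data-dependent expectations for a two-hidden-layer Boltzmann machine (biases omitted; the stochastic estimate of the data-independent statistics is not shown).

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    nv, n1, n2 = 20, 15, 10
    W1 = 0.1 * rng.standard_normal((nv, n1))   # visible -> hidden 1
    W2 = 0.1 * rng.standard_normal((n1, n2))   # hidden 1 -> hidden 2
    v = rng.integers(0, 2, nv).astype(float)   # one synthetic data vector

    # Mean-field updates: each hidden layer's posterior depends on the
    # layers on both sides of it, so iterate to a fixed point.
    mu1, mu2 = np.full(n1, 0.5), np.full(n2, 0.5)
    for _ in range(30):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)
        mu2 = sigmoid(mu1 @ W2)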
Neighbourhood Components Analysis
In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the …
  • Citations: 1,375 • Influence: 191 • PDF
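A NumPy sketch of the objective being maximized: under the stochastic neighbour rule, each point picks a neighbour with probability decaying in projected distance, and the objective is the expected number of points whose picked neighbour shares their label. Optimizing the transform A (which induces the Mahalanobis metric AᵀA) by gradient ascent is not shown.

    import numpy as np

    def nca_objective(A, X, y):
        Z = X @ A.T                       # project: distance is ||Ax_i - Ax_j||^2
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)      # a point never selects itself
        P = np.exp(-d2)
        P /= P.sum(axis=1, keepdims=True) # p_ij: prob. that i picks j
        same = y[:, None] == y[None, :]
        return (P * same).sum()           # expected correctly classified points

    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 4))
    y = rng.integers(0, 3, 30)
    A = rng.standard_normal((2, 4))       # low-rank A also gives a 2-D embedding
    print(nca_objective(A, X, y))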