
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
- Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov
- Journal of Machine Learning Research
- 2014

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this…
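The thinning trick this abstract describes can be sketched in a few NumPy lines. This is the common "inverted" variant that rescales surviving units at training time so test-time code is unchanged; the function name and defaults are mine, not the paper's:

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True, rng=None):
    """Inverted dropout: randomly zero units during training and scale
    the survivors by 1/(1 - p_drop), so no change is needed at test time."""
    if not train or p_drop == 0.0:
        return activations
    rng = np.random.default_rng(rng)
    mask = rng.random(activations.shape) >= p_drop   # keep each unit w.p. 1 - p_drop
    return activations * mask / (1.0 - p_drop)
```

Because each unit sees a different random subnetwork on every training case, dropout approximates averaging the predictions of exponentially many thinned networks.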

- Reducing the Dimensionality of Data with Neural Networks
- Geoffrey E. Hinton, Ruslan Salakhutdinov
- Science
- 2006

High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an…
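The architecture in this abstract — a multilayer net squeezed through a small central code layer — can be sketched as a forward pass. Layer sizes and initialization below are illustrative, and the pretraining and fine-tuning procedure the paper is about is omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Autoencoder:
    """Multilayer autoencoder with a small central code layer.
    Sizes and init scale are illustrative, not the paper's settings."""
    def __init__(self, sizes=(784, 256, 30), seed=0):
        rng = np.random.default_rng(seed)
        enc_shapes = list(zip(sizes, sizes[1:]))
        dec_shapes = [(b, a) for a, b in reversed(enc_shapes)]  # mirror image
        self.enc = [(rng.normal(0, 0.1, s), np.zeros(s[1])) for s in enc_shapes]
        self.dec = [(rng.normal(0, 0.1, s), np.zeros(s[1])) for s in dec_shapes]

    def encode(self, x):
        for W, b in self.enc:          # squeeze down to the central code
            x = sigmoid(x @ W + b)
        return x

    def decode(self, code):
        for W, b in self.dec:          # expand back to input dimensionality
            code = sigmoid(code @ W + b)
        return code

    def reconstruct(self, x):
        return self.decode(self.encode(x))
```

Training would minimize reconstruction error between `x` and `reconstruct(x)`; the abstract's point is that plain gradient descent on this objective only works well from a good initialization.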

- Probabilistic Matrix Factorization
- Ruslan Salakhutdinov, Andriy Mnih
- NIPS
- 2007

Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced…
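A minimal sketch of fitting such a model: MAP estimation of user and item factors by stochastic gradient steps over observed (user, item, rating) triples, which is why the cost scales linearly with the number of observations. Hyperparameters and names are illustrative, not taken from the paper:

```python
import numpy as np

def pmf(ratings, n_users, n_items, k=5, lam=0.1, lr=0.01, epochs=100, seed=0):
    """MAP estimate of PMF factors via SGD over observed ratings.
    Sketch only: the plain SGD loop and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))   # user factor matrix
    V = 0.1 * rng.standard_normal((n_items, k))   # item factor matrix
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]                 # prediction residual
            gu = err * V[i] - lam * U[u]          # gradient wrt user factors
            gv = err * U[u] - lam * V[i]          # gradient wrt item factors
            U[u] += lr * gu
            V[i] += lr * gv
    return U, V
```

The `lam` terms are the Gaussian priors on the factors; they are what keeps predictions sensible for users with very few ratings.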

- Improving neural networks by preventing co-adaptation of feature detectors
- Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov
- arXiv
- 2012

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other…

- Deep Boltzmann Machines
- Ruslan Salakhutdinov, Geoffrey E. Hinton
- AISTATS
- 2009

We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent expectations are estimated using a variational approximation that tends to focus on a single mode, and data-independent expectations are approximated using persistent Markov chains. The use of two quite different techniques for estimating…

- Neighbourhood Components Analysis
- Jacob Goldberger, Sam T. Roweis, Geoffrey E. Hinton, Ruslan Salakhutdinov
- NIPS
- 2004

In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast…
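The stochastic leave-one-out score mentioned above can be sketched directly: each point picks a neighbour with probability given by a softmax over negative squared distances in the linearly transformed space, and the objective (to be maximized over the transform `A`) is the expected probability of picking a same-class neighbour. Names below are mine, not the paper's notation:

```python
import numpy as np

def nca_objective(A, X, y):
    """Expected leave-one-out KNN accuracy under the stochastic
    neighbour rule, for a linear transform A. Sketch of the objective
    only; the gradient-based maximization over A is omitted."""
    Z = X @ A.T                                          # project the points
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    np.fill_diagonal(d2, np.inf)                         # a point can't pick itself
    P = np.exp(-d2)
    P /= P.sum(axis=1, keepdims=True)                    # neighbour-choice probs
    same = y[:, None] == y[None, :]                      # same-class indicator
    return (P * same).sum(axis=1).mean()                 # expected LOO accuracy
```

Taking `A` to be rectangular (fewer rows than columns) gives the low-dimensional embedding use-case the abstract mentions.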

- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Kelvin Xu, Jimmy Ba, +5 authors Yoshua Bengio
- ICML
- 2015

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through…
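The deterministic ("soft") attention step can be sketched as scoring each spatial image feature against the decoder's hidden state, softmaxing the scores, and taking the expected feature vector as context. Weight shapes and names here are illustrative, not the paper's notation:

```python
import numpy as np

def soft_attention(features, h, W_f, W_h, v):
    """Soft attention sketch: features is (L, D) spatial feature vectors,
    h is the decoder state. Returns the attention-weighted context and
    the weights themselves. Parameter names are hypothetical."""
    scores = np.tanh(features @ W_f + h @ W_h) @ v   # one score per location, (L,)
    alpha = np.exp(scores - scores.max())            # stable softmax
    alpha /= alpha.sum()                             # weights sum to 1
    context = alpha @ features                       # expected feature, (D,)
    return context, alpha
```

Because every step is differentiable, this variant trains with standard backpropagation; the stochastic ("hard") variant the abstract also mentions samples a location instead and needs the variational lower bound.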

- Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo
- Ruslan Salakhutdinov, Andriy Mnih
- ICML
- 2008

Low-rank matrix approximation methods provide one of the simplest and most effective approaches to collaborative filtering. Such models are usually fitted to data by finding a MAP estimate of the model parameters, a procedure that can be performed efficiently even on very large datasets. However, unless the regularization parameters are tuned carefully,…

- Restricted Boltzmann Machines for Collaborative Filtering
- Ruslan Salakhutdinov, Andriy Mnih, Geoffrey E. Hinton
- ICML
- 2007

Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBMs), can be used to model tabular data, such as users' ratings of movies. We present efficient learning and inference procedures for this class of…
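For flavour, here is one contrastive-divergence (CD-1) update for a plain binary RBM — a generic sketch of the two-layer undirected model, not the conditional multinomial rating variant the paper actually builds:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, a, b, v0, rng, lr=0.1):
    """One CD-1 update for a binary RBM: visible units v, hidden units h,
    weights W (n_visible x n_hidden), biases a (visible) and b (hidden).
    Updates the parameters in place. Names and defaults are mine."""
    ph0 = sigmoid(v0 @ W + b)                     # p(h=1 | data)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)                   # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + b)
    n = len(v0)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n      # positive minus negative phase
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

def reconstruction_error(W, a, b, v):
    """Mean-field reconstruction error, a common training diagnostic."""
    return float(((v - sigmoid(sigmoid(v @ W + b) @ W.T + a)) ** 2).mean())
```

In the collaborative-filtering setting each user gets their own RBM sharing weights, with softmax visible units over rating values; the learning rule above is the shared core.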

- Semantic Hashing
- Ruslan Salakhutdinov, Geoffrey E. Hinton
- Int. J. Approx. Reasoning
- 2009

We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the…
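The retrieval side of semantic hashing — ranking documents by Hamming distance between their short binary codes — can be sketched directly. The deep model that produces the codes is assumed here, not implemented, and the function name is mine:

```python
import numpy as np

def hamming_rank(query_code, doc_codes):
    """Rank documents by Hamming distance between binary codes.
    query_code is a 0/1 vector; doc_codes is an (N, n_bits) 0/1 array.
    In semantic hashing, nearby codes mean semantically similar documents."""
    dists = (doc_codes != query_code).sum(axis=1)   # Hamming distance per doc
    order = np.argsort(dists, kind="stable")        # nearest codes first
    return order, dists[order]
```

With codes this short, one can also enumerate all codes within a small Hamming ball of the query and look them up in a hash table, making retrieval time independent of corpus size — the property the abstract is building toward.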