
What makes people smarter than machines? They certainly are not quicker or more precise. Yet people are far better at perceiving objects in natural scenes and noting their relations, at understanding language and retrieving contextually appropriate information from memory, at making plans and carrying out contextually appropriate actions, and at a wide…

- Geoffrey E. Hinton, Simon Osindero, Yee Whye Teh
- Neural Computation
- 2006

We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected…
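The greedy layer-at-a-time procedure described above trains each layer as a restricted Boltzmann machine. A minimal sketch of the contrastive-divergence (CD-1) update such training relies on — layer sizes and the learning rate are illustrative choices, and biases are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One layer of the greedy stack: an RBM with 6 visible and 4 hidden units.
n_visible, n_hidden, lr = 6, 4, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))

def cd1_update(v0):
    """One CD-1 weight update for a batch of binary visible vectors."""
    p_h0 = sigmoid(v0 @ W)                      # hidden probabilities
    h0 = (rng.random(p_h0.shape) < p_h0) * 1.0  # sampled hidden states
    p_v1 = sigmoid(h0 @ W.T)                    # one-step reconstruction
    p_h1 = sigmoid(p_v1 @ W)
    # Positive-phase minus negative-phase correlations.
    return lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)

v = (rng.random((20, n_visible)) < 0.5) * 1.0   # a toy batch of binary data
W += cd1_update(v)
```

After each layer converges, its hidden activities become the training data for the next layer up, which is what makes the stack-wise procedure greedy.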

- Geoffrey E. Hinton, Ruslan Salakhutdinov
- Science
- 2006

High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an…
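The reconstruction objective described above can be sketched with a deliberately simplified model: a linear autoencoder with a 2-unit central layer, trained by gradient descent on toy data (the sizes, data, and learning rate are all illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 100 points in 5-D that actually lie on a 2-D subspace.
Z = rng.normal(size=(100, 2))
X = Z @ rng.normal(size=(2, 5))

# Encoder maps 5-D -> 2-D codes; decoder maps codes back to 5-D.
W_enc = rng.normal(scale=0.1, size=(5, 2))
W_dec = rng.normal(scale=0.1, size=(2, 5))

lr = 0.01
for _ in range(1000):
    H = X @ W_enc        # low-dimensional codes
    err = H @ W_dec - X  # reconstruction error
    # Gradients of the mean squared reconstruction error.
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

loss = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

Because the toy data really is two-dimensional, the 2-unit bottleneck can drive the reconstruction error close to zero; the paper's point is that for deep nonlinear autoencoders this gradient descent only works from a good initialization.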

- Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, Geoffrey E. Hinton
- Neural Computation
- 1991

We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link…
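The architecture the abstract describes — separate expert networks whose outputs are combined by a gating network — can be sketched at inference time as follows. All weights here are untrained random placeholders, and the linear experts and gate are simplifications:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Three linear experts and a linear gating network over a 4-D input.
W_experts = rng.normal(size=(3, 4))  # row i: expert i's weights
W_gate = rng.normal(size=(3, 4))     # the gating network's weights

def mixture_of_experts(x):
    expert_outputs = W_experts @ x   # one scalar prediction per expert
    gate = softmax(W_gate @ x)       # soft assignment of x to experts
    return gate @ expert_outputs     # gate-weighted combination

x0 = rng.normal(size=4)
y = mixture_of_experts(x0)
```

The combined prediction is a convex combination of the experts' outputs; in the paper, the training objective is what pushes each expert to specialize on its own subset of cases.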

- Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov
- Journal of Machine Learning Research
- 2014

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this…
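The core mechanism — randomly omitting units during training — can be sketched in a few lines. This uses the common "inverted" scaling variant (scale at training time so test time needs no change), rather than the paper's test-time rescaling; the function and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    """Randomly zero a fraction p_drop of units during training.

    "Inverted" scaling keeps the expected activation unchanged,
    so no rescaling is needed at test time.
    """
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.ones((4, 10))          # a batch of hidden activations
out = dropout(h, p_drop=0.5)  # surviving units are scaled to 2.0
```

Each training case thus sees a different thinned network, which is what approximates averaging the predictions of exponentially many architectures.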

We present a new technique called "t-SNE" that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd…

- Geoffrey E. Hinton
- Neural Computation
- 2002

It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule…
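The multiply-and-renormalize rule is easy to see on a tiny discrete example (the two expert distributions below are made-up numbers for illustration):

```python
import numpy as np

# Two "expert" distributions over the same five discrete states.
expert_a = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
expert_b = np.array([0.1, 0.1, 0.1, 0.3, 0.4])

# Product of experts: multiply pointwise, then renormalize.
product = expert_a * expert_b
product /= product.sum()
# A state must satisfy every expert to keep high probability:
# state 2, which both experts rate at only 0.1, ends up least likely.
```

Unlike a mixture, where one permissive expert can rescue a state, the product lets any single expert veto a state by assigning it low probability — which is why products give much sharper distributions than mixtures.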

- David H. Ackley, Geoffrey E. Hinton, Terrence J. Sejnowski
- Cognitive Science
- 1985

The computational power of massively parallel networks of simple processing elements resides in the communication bandwidth provided by the hardware connections between elements. These connections can allow a significant fraction of the knowledge of the system to be applied to an instance of a problem in a very short time. One kind of computation for…

The EM algorithm performs maximum likelihood estimation for data in which some variables are unobserved. We present a function that resembles negative free energy and show that the M step maximizes this function with respect to the model parameters and the E step maximizes it with respect to the distribution over the unobserved variables. From this…
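The E/M alternation the abstract describes can be sketched with the standard textbook example — a mixture of two 1-D Gaussians, where the component labels are the unobserved variables (data, initialization, and iteration count below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Data drawn from two Gaussians; which component generated each
# point is the unobserved variable.
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

mu = np.array([-1.0, 1.0])    # initial means (deliberately wrong)
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])     # mixing proportions

for _ in range(50):
    # E step: posterior responsibility of each component for each point
    # (the shared 1/sqrt(2*pi) constant cancels in the normalization).
    dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
    r = pi * dens
    r /= r.sum(axis=1, keepdims=True)
    # M step: re-estimate the parameters from the responsibilities.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
```

In the paper's view, the E step maximizes the negative-free-energy bound over the distribution r and the M step maximizes it over the parameters, so each half-step can only increase the same objective.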

When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other…