• Corpus ID: 18268744

Learning Multiple Layers of Features from Tiny Images

  title={Learning Multiple Layers of Features from Tiny Images},
  author={Alex Krizhevsky},
Groups at MIT and NYU have collected a dataset of millions of tiny colour images from the web. It is, in principle, an excellent dataset for unsupervised training of deep generative models, but previous researchers who have tried this have found it dicult to learn a good set of lters from the images. We show how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex. Using a novel parallelization algorithm to… 

Building high-level features using large scale unsupervised learning

Contrary to what appears to be a widely-held intuition, the experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.

An Analysis of Single-Layer Networks in Unsupervised Feature Learning

The results show that large numbers of hidden nodes and dense feature extraction are critical to achieving high performance—so critical, in fact, that when these parameters are pushed to their limits, they achieve state-of-the-art performance on both CIFAR-10 and NORB using only a single layer of features.

Extrapolating from a Single Image to a Thousand Classes using Distillation

This work shows that one image can be used to extrapolate to thousands of object classes and motivates a renewed research agenda on the fundamental interplay of augmentations and images.

Diving deeper into mentee networks

It is shown that features learnt thus, are more general than those learnt independently, which provides better generalization accuracies than networks trained with common regularization techniques such as l2, l1 and dropouts.

Surprising Effectiveness of Few-Image Unsupervised Feature Learning

This paper provides an analysis for three different self-supervised feature learning methods vs number of training images and shows that it can top the accuracy for the first two convolutional layers of common networks using just a single unlabelled training image and obtain competitive results for other layers.

Active Learning for Convolutional Neural Networks: A Core-Set Approach

This work defines the problem of active learning as core-set selection as choosing set of points such that a model learned over the selected subset is competitive for the remaining data points, and presents a theoretical result characterizing the performance of any selected subset using the geometry of the datapoints.

Learning from Noisy Labels with Deep Neural Networks

A novel way of modifying deep learning models so they can be effectively trained on data with high level of label noise is proposed, and it is shown that random images without labels can improve the classification performance.

Do Deep Generative Models Know What They Don't Know?

The density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses from those of house numbers, and such behavior persists even when the flows are restricted to constant-volume transformations.


This work presents an alternative framework for comparing image classifiers, which it is noted is readily extensible and costeffective to add future classifiers into the competition, and adaptively sample a small test set from an arbitrarily large corpus of unlabeled images so as to maximize the discrepancies between the classifiers.

How deep is deep enough? - Optimizing deep neural network architecture

This work introduces a new measure, called the generalized discrimination value (GDV), which quantifies how well different object classes separate in each layer of a Deep Belief Network that was trained unsupervised on the MNIST data set.



80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition

For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.

Rate-coded Restricted Boltzmann Machines for Face Recognition

We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by

Robust Object Recognition with Cortex-Like Mechanisms

A hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation is described.

Greedy Layer-Wise Training of Deep Networks

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

Unsupervised Learning of Distributions of Binary Vectors Using 2-Layer Networks

It is shown that arbitrary distributions of binary vectors can be approximated by the combination model and shown how the weight vectors in the model can be interpreted as high order correlation patterns among the input bits, and how the combination machine can be used as a mechanism for detecting these patterns.

On the quantitative analysis of deep belief networks

It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBM's with different architectures is presented.

Training Products of Experts by Minimizing Contrastive Divergence

A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary because it is hard even to approximate the derivatives of the renormalization term in the combination rule.

Reducing the Dimensionality of Data with Neural Networks

This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

Information processing in dynamical systems: foundations of harmony theory

The work reported in this chapter rests on the conviction that a methodology that has a crucial role to play in the development of cognitive science is mathematical analysis.