An empirical evaluation of deep architectures on problems with many factors of variation

@inproceedings{larochelle2007empirical,
  title={An empirical evaluation of deep architectures on problems with many factors of variation},
  author={Larochelle, Hugo and Erhan, Dumitru and Courville, Aaron C. and Bergstra, James and Bengio, Yoshua},
  booktitle={International Conference on Machine Learning},
  year={2007}
}
Recently, several learning algorithms relying on models with deep architectures have been proposed. Though they have demonstrated impressive performance, to date, they have only been evaluated on relatively simple problems such as digit recognition in a controlled environment, for which many machine learning algorithms already report reasonable results. Here, we present a series of experiments which indicate that these models show promise in solving harder learning problems that exhibit many… 


An Algorithm for Training Polynomial Networks

The main goal of this paper is the derivation of an efficient layer-by-layer algorithm for training deep neural networks, which is a universal learner in the sense that the training error is guaranteed to decrease at every iteration, and can eventually reach zero under mild conditions.

To go deep or wide in learning?

This paper proposes an approach called wide learning, based on arc-cosine kernels, that learns a single layer of infinite width, and shows that wide learning with a single layer outperforms both single-layer and finite-width deep architectures on some benchmark datasets.

On the Expressive Power of Deep Architectures

Some of the theoretical motivations for deep architectures, as well as some of their practical successes, are reviewed, and directions of investigations to address some of the remaining challenges are proposed.

Deep Learners Benefit More from Out-of-Distribution Examples

Results show that a deep learner did beat previously published results and reached human-level performance, and the hypothesis is that intermediate levels of representation, because they can be shared across tasks and examples from different but related distributions, can yield even more benefits.

A Framework for Selecting Deep Learning Hyper-parameters

This work provides a framework for building deep learning architectures via a stepwise approach, together with an evaluation methodology to quickly identify poorly performing architectural configurations, using a dataset with high dimensionality.

Visualizing Higher-Layer Features of a Deep Network

This paper contrasts and compares several techniques applied to Stacked Denoising Autoencoders and Deep Belief Networks, trained on several vision datasets, and shows that good qualitative interpretations of the high-level features represented by such models are possible at the unit level.

Understanding Representations Learned in Deep Architectures

It is shown that consistent filter-like interpretation is possible and simple to accomplish at the unit level, and it is hoped that such techniques will allow researchers in deep architectures to understand more of how and why deep architectures work.

The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training

The experiments confirm and clarify the advantage of unsupervised pre-training, and empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples.

Learning Deep Architectures for AI

The motivations and principles of learning algorithms for deep architectures are discussed, in particular those that use unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks for deeper models such as Deep Belief Networks.

Deep, super-narrow neural network is a universal classifier

It is shown that, given enough layers, a super-narrow neural network, with two neurons per layer, is capable of shattering any separable binary dataset.

Greedy Layer-Wise Training of Deep Networks

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
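The greedy layer-wise strategy described above can be sketched in code. The following is a minimal illustration, not the paper's method: it substitutes simple linear autoencoders for the RBM/autoencoder modules used in the actual work, and all function names and hyperparameters are hypothetical.

```python
import numpy as np

def train_linear_autoencoder(X, hidden_dim, lr=0.01, epochs=200, seed=0):
    """Fit an encoder W and decoder V by gradient descent on the mean
    squared reconstruction error; return the encoder weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.1, (d, hidden_dim))   # encoder weights
    V = rng.normal(0, 0.1, (hidden_dim, d))   # decoder weights
    for _ in range(epochs):
        H = X @ W          # codes
        E = H @ V - X      # reconstruction error
        gV = H.T @ E / n   # gradient w.r.t. decoder
        gW = X.T @ (E @ V.T) / n  # gradient w.r.t. encoder
        V -= lr * gV
        W -= lr * gW
    return W

def greedy_pretrain(X, layer_dims):
    """Train one layer at a time: each new autoencoder is fit on the
    codes produced by the already-trained stack below it."""
    weights, rep = [], X
    for h in layer_dims:
        W = train_linear_autoencoder(rep, h)
        weights.append(W)
        rep = rep @ W      # this layer's codes feed the next layer
    return weights, rep
```

The key point the sketch captures is that each layer is optimized only against the representation produced by the layers beneath it, which is what makes the procedure greedy and usable as an initialization for subsequent fine-tuning.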

Scaling learning algorithms towards AI

It is argued that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.

Reducing the Dimensionality of Data with Neural Networks

This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

A Fast Learning Algorithm for Deep Belief Nets

A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

Training Invariant Support Vector Machines

This work reports the lowest test error published to date on the well-known MNIST digit recognition benchmark, with SVM training times that are also significantly faster than those of previous SVM methods.

Large-scale kernel machines

This volume offers researchers and engineers practical solutions for learning from large scale datasets, with detailed descriptions of algorithms and experiments carried out on realistically large datasets, and offers information that can address the relative lack of theoretical grounding for many useful algorithms.

Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure

We show how to pretrain and fine-tune a multilayer neural network to learn a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour classification performs well.

To recognize shapes, first learn to generate images.

Backpropagation Applied to Handwritten Zip Code Recognition

This paper demonstrates how constraints from the task domain can be integrated into a backpropagation network through the architecture of the network, successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service.

Training Products of Experts by Minimizing Contrastive Divergence

A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary; because it is hard even to approximate the derivatives of the renormalization term in the combination rule, the experts are instead trained by minimizing contrastive divergence.
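The contrastive-divergence idea in the entry above — updating parameters from the difference between data-driven statistics and one-step-reconstruction statistics — can be sketched as a CD-1 update for a small binary RBM. This is an illustrative NumPy sketch, not the paper's implementation; the function name and learning rate are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b, c, rng, lr=0.1):
    """One CD-1 update for a binary RBM.
    v0: batch of visible vectors (n, d); W: weights (d, h);
    b: visible biases (d,); c: hidden biases (h,)."""
    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: a single Gibbs step (the "reconstruction").
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    # Approximate gradient: <v h>_data minus <v h>_reconstruction,
    # avoiding the intractable normalization term entirely.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

Truncating the Gibbs chain after one step is what makes the update cheap: the renormalization term whose derivatives are intractable never has to be computed.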