Exploring Strategies for Training Deep Neural Networks

@article{Larochelle2009ExploringSF,
  title={Exploring Strategies for Training Deep Neural Networks},
  author={H. Larochelle and Yoshua Bengio and J{\'e}r{\^o}me Louradour and Pascal Lamblin},
  journal={J. Mach. Learn. Res.},
  year={2009},
  volume={10},
  pages={1-40}
}
Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure relying on the training algorithm of restricted Boltzmann machines (RBM… 
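The greedy layer-wise idea is simple to state: each layer is trained as an RBM on the representation produced by the layer below, and the learned weights then initialize a deep network that is fine-tuned with supervised gradient descent. The following NumPy sketch illustrates the core loop with one-step contrastive divergence (CD-1); the layer sizes, learning rate, and toy data are illustrative assumptions, not the paper's experimental protocol.

# Minimal sketch of greedy layer-wise unsupervised pre-training with
# stacked RBMs trained by CD-1 (hyper-parameters are illustrative).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0):
        # Positive phase
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one step of Gibbs sampling
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # CD-1 gradient approximation
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=10, batch_size=20):
    """Train one RBM per layer; each layer's hidden activations
    become the 'data' for the next layer (greedy, layer-wise)."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            perm = rng.permutation(len(x))
            for start in range(0, len(x), batch_size):
                rbm.cd1_update(x[perm[start:start + batch_size]])
        x = rbm.hidden_probs(x)   # propagate the representation upward
        rbms.append(rbm)
    return rbms   # these weights can then initialize a deep net for fine-tuning

# Toy usage on synthetic binary data (784 -> 500 -> 250 stack).
X = (rng.random((200, 784)) < 0.1).astype(float)
stack = pretrain_stack(X, layer_sizes=[500, 250], epochs=2)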

Understanding the difficulty of training deep feedforward neural networks

The objective here is to better understand why standard gradient descent from random initialization performs so poorly on deep neural networks, to shed light on these recent relative successes, and to help design better algorithms in the future.

Optimization of deep network models through fine tuning

It is concluded that the two-step strategy and the proposed fine-tuning technique yield promising results for the optimization of deep network models.

Random Projection in Deep Neural Networks

This work investigates the ways in which deep learning methods can benefit from random projection (RP), a classic linear dimensionality reduction method. We focus on two areas where, as we have…
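For context, random projection replaces a learned dimensionality-reduction step with a fixed random linear map whose scaling approximately preserves pairwise distances (the Johnson-Lindenstrauss flavor of the idea). A minimal NumPy sketch, with illustrative sizes rather than the settings used in that work:

# Reduce d-dimensional inputs to k dimensions with a fixed (untrained)
# Gaussian matrix, scaled so pairwise distances are roughly preserved.
import numpy as np

rng = np.random.default_rng(0)
d, k = 10000, 512                       # illustrative sizes
R = rng.standard_normal((d, k)) / np.sqrt(k)

X = rng.standard_normal((64, d))        # a batch of high-dimensional inputs
X_rp = X @ R                            # projected batch, shape (64, k)
# X_rp would then be fed to the first trainable layer of the network.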

Batch-normalized Mlpconv-wise supervised pre-training network in network

A new deep architecture with enhanced model discrimination ability that is referred to as mlpconv-wise supervised pre-training network in network (MPNIN) is proposed, which may contribute to overcoming the difficulties of training deep networks by better initializing the weights in all the layers.

Unsupervised Layer-Wise Model Selection in Deep Neural Networks

Considering an unsupervised criterion, the proposed approach empirically examines whether model selection is a modular optimization problem that can be tackled in a layer-wise manner; preliminary results suggest the answer is positive.

Learning Deep Architectures for AI

The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those exploiting unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks to construct deeper models such as Deep Belief Networks.

Deep Target Algorithms for Deep Learning

This approach is very general, in that it works with both differentiable and non-differentiable functions, and can be shown to be convergent under reasonable assumptions.

Why Does Unsupervised Pre-training Help Deep Learning?

The results suggest that unsupervised pre-training guides the learning towards basins of attraction of minima that support better generalization from the training data set; the evidence from these results supports a regularization explanation for the effect of pre-training.

Sparseness Analysis in the Pretraining of Deep Neural Networks

The experimental results demonstrate that when the sufficient conditions are satisfied, the pretrained models lead to sparseness; they also show that the performance of RePLU is better than that of ReLU and comparable with models using pretraining techniques such as RBMs and DAEs.

Restricted Boltzmann machines for pre-training deep Gaussian networks

A Restricted Boltzmann Machine (RBM) is proposed with an energy function that is shown to yield hidden-node activation probabilities matching the activation rule of neurons in a Gaussian synapse
...

References


Greedy Layer-Wise Training of Deep Networks

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

A Fast Learning Algorithm for Deep Belief Nets

A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

Learning Deep Architectures for AI

The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those exploiting unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks to construct deeper models such as Deep Belief Networks.

Sparse Feature Learning for Deep Belief Networks

This work proposes a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation, and describes a novel and efficient algorithm to learn sparse representations.

Training MLPs layer by layer using an objective function for internal representations

On the quantitative analysis of deep belief networks

It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBMs with different architectures is presented.
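As a rough illustration of the idea, AIS estimates an RBM's intractable partition function by annealing from a tractable base distribution to the target RBM and averaging importance weights accumulated along the path. The sketch below uses the simple path that scales all RBM parameters by beta; the run count and temperature schedule are illustrative assumptions, not the scheme proposed in that paper.

# Minimal AIS sketch for estimating log Z of a binary RBM
# (weights W, visible bias b, hidden bias c).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_p_star(v, W, b, c, beta):
    # log of the unnormalized marginal p*(v) with all parameters scaled by beta
    return beta * (v @ b) + np.sum(np.logaddexp(0.0, beta * (c + v @ W)), axis=-1)

def ais_log_Z(W, b, c, n_runs=100, n_betas=1000):
    n_vis, n_hid = W.shape
    betas = np.linspace(0.0, 1.0, n_betas)
    # beta = 0: uniform distribution over (v, h), so log Z_0 = (n_vis + n_hid) log 2
    log_Z0 = (n_vis + n_hid) * np.log(2.0)
    v = (rng.random((n_runs, n_vis)) < 0.5).astype(float)   # samples from p_0
    log_w = np.zeros(n_runs)
    for k in range(1, n_betas):
        # Accumulate the importance-weight ratio p*_k(v) / p*_{k-1}(v)
        log_w += log_p_star(v, W, b, c, betas[k]) - log_p_star(v, W, b, c, betas[k - 1])
        # One Gibbs step at temperature betas[k] to move the chain towards p_k
        h = (rng.random((n_runs, n_hid)) < sigmoid(betas[k] * (c + v @ W))).astype(float)
        v = (rng.random((n_runs, n_vis)) < sigmoid(betas[k] * (b + h @ W.T))).astype(float)
    # log of the average importance weight, computed stably
    log_mean_w = np.logaddexp.reduce(log_w) - np.log(n_runs)
    return log_Z0 + log_mean_w

# Toy usage on a small randomly initialized RBM.
W = 0.1 * rng.standard_normal((5, 4)); b = np.zeros(5); c = np.zeros(4)
print("AIS estimate of log Z:", ais_log_Z(W, b, c))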

Connectionist Learning Procedures

Scaling learning algorithms towards AI

It is argued that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.

Connectionist Learning of Belief Networks

Neural networks and principal component analysis: Learning from examples without local minima

...