Exploring Strategies for Training Deep Neural Networks
@article{Larochelle2009ExploringSF,
  title   = {Exploring Strategies for Training Deep Neural Networks},
  author  = {H. Larochelle and Yoshua Bengio and J{\'e}r{\^o}me Louradour and Pascal Lamblin},
  journal = {J. Mach. Learn. Res.},
  year    = {2009},
  volume  = {10},
  pages   = {1--40}
}
Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure relying on the training algorithm of restricted Boltzmann machines (RBM…
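As a rough illustration of the greedy layer-wise procedure the abstract refers to, the sketch below stacks binary RBMs trained with single-step contrastive divergence (CD-1) and uses each layer's hidden activations as the training data for the next layer. The layer sizes, learning rate, epoch counts, and the CD-1 choice are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal sketch of greedy layer-wise unsupervised pre-training with stacked RBMs.
# Hyperparameters and the CD-1 update are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, lr=0.05, epochs=10, batch_size=100):
    """Train one binary-binary RBM with single-step contrastive divergence."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible biases
    c = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            v0 = data[start:start + batch_size]
            # Positive phase: hidden probabilities given the data.
            h0 = sigmoid(v0 @ W + c)
            # Negative phase: one Gibbs step (sample h, reconstruct v, recompute h).
            h_sample = (rng.random(h0.shape) < h0).astype(float)
            v1 = sigmoid(h_sample @ W.T + b)
            h1 = sigmoid(v1 @ W + c)
            # CD-1 parameter updates.
            W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
            b += lr * (v0 - v1).mean(axis=0)
            c += lr * (h0 - h1).mean(axis=0)
    return W, b, c

def greedy_pretrain(data, layer_sizes):
    """Stack RBMs: each layer is trained on the previous layer's hidden activations."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm_cd1(x, n_hidden)
        layers.append((W, c))      # keep weights + hidden biases to initialize a deep net
        x = sigmoid(x @ W + c)     # propagate data upward to train the next layer
    return layers

# Example: pre-train a 784-500-500 stack on random binary "data".
layers = greedy_pretrain((rng.random((1000, 784)) > 0.5).astype(float), [500, 500])
```

The unsupervised phase only initializes the weights; supervised fine-tuning of the whole stack (e.g., by backpropagation on labels) would follow.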
1,094 Citations
Understanding the difficulty of training deep feedforward neural networks
- Computer Science, AISTATS
- 2010
The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.
Optimization of deep network models through fine tuning
- Computer Science, Int. J. Intell. Comput. Cybern.
- 2018
It is concluded that the two-step strategy and the proposed fine-tuning technique yield promising results in the optimization of deep network models.
Random Projection in Deep Neural Networks
- Computer Science, ArXiv
- 2018
This work investigates the ways in which deep learning methods can benefit from random projection (RP), a classic linear dimensionality reduction method. We focus on two areas where, as we have…
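Random projection itself is a simple linear technique; a minimal sketch of the classic Gaussian variant is shown below, purely for illustration (the specific projection matrices and architectures studied in the cited work may differ).

```python
# Minimal sketch of random projection (RP) as input dimensionality reduction.
# The Gaussian projection matrix is an illustrative choice, not necessarily
# the variant studied in the cited work.
import numpy as np

rng = np.random.default_rng(0)

def random_projection(X, k):
    """Project d-dimensional rows of X down to k dimensions with a random Gaussian matrix.

    By the Johnson-Lindenstrauss lemma, pairwise distances are approximately
    preserved with high probability when k is large enough.
    """
    d = X.shape[1]
    R = rng.standard_normal((d, k)) / np.sqrt(k)  # scaling keeps expected squared norms unchanged
    return X @ R

# Example: compress 10,000-dimensional inputs to 500 dimensions
# before feeding them to a (deep) network.
X = rng.random((256, 10_000))
X_low = random_projection(X, 500)
print(X_low.shape)  # (256, 500)
```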
Batch-normalized Mlpconv-wise supervised pre-training network in network
- Computer Science, Applied Intelligence
- 2017
A new deep architecture with enhanced model discrimination ability that is referred to as mlpconv-wise supervised pre-training network in network (MPNIN) is proposed, which may contribute to overcoming the difficulties of training deep networks by better initializing the weights in all the layers.
Unsupervised Layer-Wise Model Selection in Deep Neural Networks
- Computer Science, ECAI
- 2010
The proposed approach uses an unsupervised criterion to empirically examine whether model selection is a modular optimization problem that can be tackled in a layer-wise manner; preliminary results suggest the answer is positive.
Learning Deep Architectures for AI
- Computer Science, Found. Trends Mach. Learn.
- 2007
The motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks, are discussed.
Deep Target Algorithms for Deep Learning
- Computer Science
- 2012
This approach is very general, in that it works with both differentiable and non-differentiable functions, and can be shown to be convergent under reasonable assumptions.
Why Does Unsupervised Pre-training Help Deep Learning?
- Computer Science, AISTATS
- 2010
The results suggest that unsupervised pre-training guides the learning towards basins of attraction of minima that support better generalization from the training data set; the evidence from these results supports a regularization explanation for the effect of pre-training.
Sparseness Analysis in the Pretraining of Deep Neural Networks
- Computer Science, IEEE Transactions on Neural Networks and Learning Systems
- 2017
The experimental results demonstrate that when the sufficient conditions are satisfied, the pretrained models lead to sparseness; they also show that RePLU performs better than ReLU and comparably with some pretraining techniques, such as RBMs and DAEs.
Restricted Boltzmann machines for pre-training deep Gaussian networks
- Computer Science, The 2013 International Joint Conference on Neural Networks (IJCNN)
- 2013
A Restricted Boltzmann Machine (RBM) is proposed with an energy function which we show results in hidden node activation probabilities which match the activation rule of neurons in a Gaussian synapse…
References
Showing 1-10 of 64 references
Greedy Layer-Wise Training of Deep Networks
- Computer Science, NIPS
- 2006
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
A Fast Learning Algorithm for Deep Belief Nets
- Computer Science, Neural Computation
- 2006
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Learning Deep Architectures for AI
- Computer Science, Found. Trends Mach. Learn.
- 2007
The motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks, are discussed.
Sparse Feature Learning for Deep Belief Networks
- Computer Science, NIPS
- 2007
This work proposes a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation, and describes a novel and efficient algorithm to learn sparse representations.
Training MLPs layer by layer using an objective function for internal representations
- Computer Science, Neural Networks
- 1996
On the quantitative analysis of deep belief networks
- Computer Science, ICML '08
- 2008
It is shown that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and a novel AIS scheme for comparing RBM's with different architectures is presented.
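A minimal sketch of AIS applied to an RBM partition function is given below. It anneals from the uniform (all-zero-parameter) RBM to the target RBM; the cited work uses a data-dependent base-rate RBM and carefully tuned schedules, so the base distribution, chain count, and beta schedule here are illustrative assumptions.

```python
# Minimal sketch of Annealed Importance Sampling (AIS) for estimating an RBM's
# log partition function. Base distribution, chain count, and beta schedule are
# illustrative assumptions rather than the cited paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_unnorm_pv(v, W, b, c, beta):
    """log p*_beta(v) for an RBM with all parameters scaled by beta (hidden units summed out)."""
    return beta * (v @ b) + np.sum(np.logaddexp(0.0, beta * (v @ W + c)), axis=1)

def ais_log_z(W, b, c, n_chains=100, betas=np.linspace(0.0, 1.0, 1000)):
    V, H = W.shape
    # beta = 0: uniform over all joint states, so log Z_0 = (V + H) * log 2,
    # and the initial visible configurations are sampled uniformly.
    v = (rng.random((n_chains, V)) < 0.5).astype(float)
    log_w = np.zeros(n_chains)
    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # Importance-weight update between consecutive intermediate distributions.
        log_w += log_unnorm_pv(v, W, b, c, beta) - log_unnorm_pv(v, W, b, c, beta_prev)
        # One Gibbs sweep in the RBM with parameters scaled by the new beta.
        h = (rng.random((n_chains, H)) < sigmoid(beta * (v @ W + c))).astype(float)
        v = (rng.random((n_chains, V)) < sigmoid(beta * (h @ W.T + b))).astype(float)
    log_z0 = (V + H) * np.log(2.0)
    # log Z ≈ log Z_0 + log of the mean importance weight, computed stably in log space.
    return log_z0 + (np.logaddexp.reduce(log_w) - np.log(n_chains))

# Example on a tiny random RBM (5 visible, 4 hidden units).
W = 0.1 * rng.standard_normal((5, 4))
print(ais_log_z(W, np.zeros(5), np.zeros(4)))
```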
Scaling learning algorithms towards AI
- Computer Science
- 2007
It is argued that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.
Neural networks and principal component analysis: Learning from examples without local minima
- Computer Science, Neural Networks
- 1989