# Greedy Layer-Wise Training of Deep Networks

@inproceedings{Bengio2006GreedyLT, title={Greedy Layer-Wise Training of Deep Networks}, author={Yoshua Bengio and Pascal Lamblin and Dan Popovici and H. Larochelle}, booktitle={NIPS}, year={2006} }

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of the computational elements required to represent some functions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and…
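The greedy layer-wise idea the abstract describes can be sketched in a few lines: train each layer unsupervised on the frozen outputs of the layer below, then stack. The sketch below is an illustration under stated assumptions, not the paper's implementation: it substitutes tied-weight sigmoid autoencoders for the RBMs of a DBN, and all sizes, learning rates, and epoch counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder_layer(X, n_hidden, lr=0.5, epochs=200):
    """Fit one tied-weight sigmoid autoencoder by batch gradient descent.

    Stands in for the RBM training of the original algorithm (assumption).
    """
    n_visible = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)   # hidden (encoder) bias
    b_v = np.zeros(n_visible)  # visible (decoder) bias
    n = len(X)
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)         # encode
        R = sigmoid(H @ W.T + b_v)       # decode with tied weights W.T
        dZv = (R - X) * R * (1.0 - R)    # grad at decoder pre-activation
        dZh = (dZv @ W) * H * (1.0 - H)  # backprop through the encoder
        W -= lr * (X.T @ dZh + dZv.T @ H) / n  # both uses of W contribute
        b_h -= lr * dZh.mean(axis=0)
        b_v -= lr * dZv.mean(axis=0)
    return W, b_h

def greedy_pretrain(X, hidden_sizes):
    """Train layers one at a time; each sees the previous layer's codes."""
    params, reps = [], X
    for n_h in hidden_sizes:
        W, b_h = train_autoencoder_layer(reps, n_h)
        params.append((W, b_h))
        reps = sigmoid(reps @ W + b_h)  # frozen codes fed to the next layer
    return params

X = rng.random((64, 20))          # toy data, purely illustrative
params = greedy_pretrain(X, [12, 6])
```

The resulting weights would then initialize a deep network that is fine-tuned with supervised gradient descent, which is the scheme the experiments in the paper study.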

## 3,949 Citations

### Exploring Strategies for Training Deep Neural Networks

- Computer Science
- J. Mach. Learn. Res.
- 2009

These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input.

### Learning Deep Architectures for AI

- Computer Science
- Found. Trends Mach. Learn.
- 2007

The motivations and principles behind learning algorithms for deep architectures are discussed, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.

### How do We Train Deep Architectures?

- Computer Science
- 2009

The motivations and principles behind learning algorithms for deep architectures are discussed, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.

### How deep is deep enough? - Optimizing deep neural network architecture

- Computer Science
- ArXiv
- 2018

This work introduces a new measure, called the generalized discrimination value (GDV), which quantifies how well different object classes separate in each layer of a Deep Belief Network that was trained unsupervised on the MNIST data set.

### Understanding the difficulty of training deep feedforward neural networks

- Computer Science
- AISTATS
- 2010

The objective here is to understand better why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain these recent relative successes and help design better algorithms in the future.

### Training Larger Networks for Deep Reinforcement Learning

- Computer Science
- ArXiv
- 2021

This paper proposes a novel method that consists of 1) wider networks with DenseNet connections, 2) decoupling representation learning from RL training, and 3) a distributed training method to mitigate overfitting, and shows that it can train very large networks that yield significant performance gains.

### A Greedy Algorithm for Building Compact Binary Activated Neural Networks

- Computer Science
- 2022

This work studies binary activated neural networks in the context of regression tasks, provides guarantees on the expressiveness of these particular networks, and proposes a greedy algorithm for building such networks for predictors with small resource needs, yielding compact and sparse predictors.

### The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training

- Computer Science
- AISTATS
- 2009

The experiments confirm and clarify the advantage of unsupervised pre-training, and empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples.

### Deep Complex Networks

- Computer Science
- ICLR
- 2018

This work relies on complex convolutions, presents algorithms for complex batch normalization and complex weight initialization strategies for complex-valued neural nets, uses them in experiments with end-to-end training schemes, and demonstrates that such complex-valued models are competitive with their real-valued counterparts.

### Conditional Computation in Deep and Recurrent Neural Networks

- Computer Science
- 2016

Two cases of conditional computation are explored: in the feed-forward case, a technique is developed that trades off accuracy for potential computational benefits, and in the recurrent case, techniques that yield practical speed benefits on a language modeling task are demonstrated.

## References

Showing 1–10 of 19 references

### A Fast Learning Algorithm for Deep Belief Nets

- Computer Science
- Neural Computation
- 2006

A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

### Scaling learning algorithms towards AI

- Computer Science
- 2007

It is argued that deep architectures have the potential to generalize in non-local ways, i.e., beyond immediate neighbors, and that this is crucial in order to make progress on the kind of complex tasks required for artificial intelligence.

### Training MLPs layer by layer using an objective function for internal representations

- Computer Science
- Neural Networks
- 1996

### Convex Neural Networks

- Computer Science
- NIPS
- 2005

Training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem, which involves an infinite number of variables but can be solved by incrementally inserting a hidden unit at a time.

### The Curse of Highly Variable Functions for Local Kernel Machines

- Computer Science
- NIPS
- 2005

We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms that rely solely on the smoothness prior - with similarity between examples…

### A Monte Carlo EM Approach for Partially Observable Diffusion Processes: Theory and Applications to Neural Networks

- Computer Science
- Neural Computation
- 2002

It is proposed that the main reason recurrent neural networks have not worked well in engineering applications is that they implicitly rely on a very simplistic likelihood model; the diffusion network approach proposed here is much richer and may open new avenues for applications of recurrent neural networks.

### The Cascade-Correlation Learning Architecture

- Computer Science
- NIPS
- 1989

The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.

### Reducing the Dimensionality of Data with Neural Networks

- Computer Science
- Science
- 2006

This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

### Practical issues in temporal difference learning

- Computer Science
- Machine Learning
- 2004

It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which surpasses comparable networks trained on a massive human expert data set.

### Training Products of Experts by Minimizing Contrastive Divergence

- Computer Science
- Neural Computation
- 2002

A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary because it is hard even to approximate the derivatives of the renormalization term in the combination rule.
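Because the renormalization (partition-function) term is intractable, contrastive divergence approximates its gradient with a single Gibbs step. The sketch below shows a CD-1 update for a small binary RBM in NumPy; it is a minimal illustration with toy sizes and an arbitrary learning rate, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(X, W, b_v, b_h, lr=0.1):
    """One contrastive-divergence (CD-1) update on a batch X of binary rows."""
    # Positive phase: hidden probabilities given the data.
    ph_pos = sigmoid(X @ W + b_h)
    h = (rng.random(ph_pos.shape) < ph_pos).astype(float)  # sample hiddens
    # Negative phase: one Gibbs step back to a "reconstruction".
    pv = sigmoid(h @ W.T + b_v)
    v = (rng.random(pv.shape) < pv).astype(float)
    ph_neg = sigmoid(v @ W + b_h)
    # CD-1 approximation to the log-likelihood gradient:
    # data-driven statistics minus reconstruction-driven statistics.
    n = len(X)
    W += lr * (X.T @ ph_pos - v.T @ ph_neg) / n
    b_v += lr * (X - v).mean(axis=0)
    b_h += lr * (ph_pos - ph_neg).mean(axis=0)
    return W, b_v, b_h

# Toy run: 4 visible units, 3 hidden units, data concentrated on two patterns.
X = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 8, dtype=float)
W = rng.normal(0.0, 0.01, size=(4, 3))
b_v, b_h = np.zeros(4), np.zeros(3)
for _ in range(200):
    W, b_v, b_h = cd1_step(X, W, b_v, b_h)
```

CD-1 avoids computing the partition function entirely, which is what makes RBM training cheap enough to serve as the per-layer learner in the greedy layer-wise scheme.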