• Corpus ID: 14541470

Why Regularized Auto-Encoders learn Sparse Representation?

@article{Arpit2016WhyRA,
  title={Why Regularized Auto-Encoders learn Sparse Representation?},
  author={Devansh Arpit and Yingbo Zhou and Hung Quoc Ngo and Venu Govindaraju},
  journal={ArXiv},
  year={2016},
  volume={abs/1505.05561}
}
While the authors of Batch Normalization (BN) identify and address an important problem in training deep networks, namely Internal Covariate Shift, the current solution has certain drawbacks. For instance, BN depends on batch statistics for layerwise input normalization during training, which makes the estimates of the mean and standard deviation of the input (distribution) to hidden layers inaccurate as parameter values shift (especially during the initial training epochs). Another…
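To make the batch-statistics point above concrete, here is a minimal NumPy sketch of the standard BN transform at training time; it is not code from the paper, and the names gamma, beta, eps and the toy batch are purely illustrative.

    import numpy as np

    def batch_norm_train(x, gamma, beta, eps=1e-5):
        """Normalize a mini-batch x (shape: batch x features) with its own
        mean/variance estimates, then apply the learned scale and shift.
        Because mu and var come from the current mini-batch, they drift as
        the parameters of earlier layers change during training."""
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    # toy usage: a batch of 4 examples with 3 features
    x = np.random.randn(4, 3)
    y = batch_norm_train(x, gamma=np.ones(3), beta=np.zeros(3))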

Citations

Walking the Tightrope: An Investigation of the Convolutional Autoencoder Bottleneck
TLDR
An in-depth investigation of the convolutional autoencoder (CAE) bottleneck shows empirically that, contrary to popular belief, CAEs do not learn to copy their input, even when the bottleneck has the same number of neurons as there are pixels in the input.
Autoencoders Learn Generative Linear Models
TLDR
The analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as unsupervised feature training mechanisms for a wide range of datasets, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.
Pseudo-Rehearsal for Continual Learning with Normalizing Flows
TLDR
This paper proposes a novel method that combines the strengths of regularization and generative-based rehearsal approaches, and shows that the method performs favorably with respect to state-of-the-art approaches in the literature, with bounded computational power and memory overheads.
On Random Deep Autoencoders: Exact Asymp… (2018)
TLDR
It is demonstrated experimentally that it is possible to train a deep autoencoder, even with the tanh activation and a depth as large as 200 layers, without resorting to techniques such as layer-wise pre-training or batch normalization.
Sparseness Analysis in the Pretraining of Deep Neural Networks
TLDR
The experimental results demonstrate that when the sufficient conditions are satisfied, the pretraining models lead to sparseness, and further show that the performance of RePLU is better than that of ReLU and comparable to models using pretraining techniques such as RBMs and DAEs.
LiSSA: Localized Stochastic Sensitive Autoencoders
TLDR
A localized stochastic sensitive AE (LiSSA) is proposed to enhance the robustness of AEs with respect to input perturbations; it significantly outperforms several classical and recent AE training methods on classification tasks.
Autoencoders Learn Generative Linear Models
TLDR
The analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as feature learning mechanisms for a variety of data models, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.
On Optimality Conditions for Auto-Encoder Signal Recovery
TLDR
This paper shows that the true hidden representation can be approximately recovered if the weight matrices are highly incoherent with unit $\ell^{2}$ row length and the bias vector is set to the negative of the data mean.
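A minimal sketch of what these conditions look like in code, assuming a ReLU encoder whose hidden-space bias equals the negative of the projected data mean (the dimension-matching is my assumption); the symbols W, b, n, h and the toy data are illustrative, not the paper's notation.

    import numpy as np

    rng = np.random.default_rng(0)
    n, h = 64, 256                                   # input dim, hidden dim
    W = rng.standard_normal((h, n))
    W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit ell-2 row length

    # mutual coherence: largest |cosine| between distinct rows (want this small)
    G = W @ W.T
    coherence = np.abs(G - np.eye(h)).max()

    X = rng.standard_normal((1000, n)) + 3.0         # toy data with nonzero mean
    b = -W @ X.mean(axis=0)                          # bias = negative of the (projected) data mean
    H = np.maximum(0.0, X @ W.T + b)                 # ReLU encoding of mean-centred pre-activations
    print(coherence, H.mean())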
Critical Points Of An Autoencoder Can Provably Recover Sparsely Used Overcomplete Dictionaries
TLDR
A rigorous analysis of the possibility that dictionary learning could be performed by gradient descent on autoencoders, which are $\mathbb{R}^n \to \mathbb{R}^n$ neural networks with a single ReLU activation layer of size $h$; the analysis creates a proxy for the expected gradient of this loss function, motivated by high-probability arguments under natural distributional assumptions on the sparse code $x^*$.
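A schematic sketch of the setting just described, assuming a tied-weight autoencoder with a single ReLU layer of size h trained by plain gradient descent on squared reconstruction error over sparsely generated data; the parameterization, initialization, and step size are assumptions for illustration, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(1)
    n, h, k = 32, 64, 4                        # data dim, hidden size, sparsity of the code

    A = rng.standard_normal((n, h))
    A /= np.linalg.norm(A, axis=0)             # ground-truth dictionary with unit columns

    def sample_batch(m):
        # sparse codes x* with k nonzeros each; data y = A x*
        X = np.zeros((m, h))
        for i in range(m):
            idx = rng.choice(h, size=k, replace=False)
            X[i, idx] = rng.standard_normal(k)
        return X @ A.T

    W = rng.standard_normal((h, n)) * 0.1      # encoder weights (decoder is tied: W transposed)
    b = -0.1 * np.ones(h)                      # small negative bias
    lr = 0.05
    for _ in range(200):
        Y = sample_batch(256)
        Z = Y @ W.T + b
        Hid = np.maximum(0.0, Z)               # single ReLU layer of size h
        Y_hat = Hid @ W                        # tied-weight reconstruction
        R = Y_hat - Y                          # residual
        mask = (Z > 0).astype(float)
        # gradients of 0.5 * mean squared reconstruction error (decoder + encoder paths)
        gW = (Hid.T @ R + (mask * (R @ W.T)).T @ Y) / len(Y)
        gb = (mask * (R @ W.T)).mean(axis=0)
        W -= lr * gW
        b -= lr * gb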
On the Dynamics of Gradient Descent for Autoencoders
TLDR
The analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as feature learning mechanisms for a variety of data models, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.
…

References

Showing 1-10 of 44 references
Marginalized Denoising Auto-encoders for Nonlinear Representations
TLDR
The marginalized Denoising Auto-encoder (mDAE) is presented, which (approximately) marginalizes out the corruption during training and matches or outperforms the DAE with far fewer training epochs.
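As an illustration of what "marginalizing out the corruption" means, here is a sketch of the simplest case, a linear reconstruction with additive Gaussian corruption, where the expectation has an exact closed form; the mDAE itself handles nonlinear mappings via an approximation, which is not shown here, and the variable names below are illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma = 16, 0.5
    W = rng.standard_normal((n, n)) * 0.1
    x = rng.standard_normal(n)

    # Monte Carlo estimate of the expected denoising loss under Gaussian corruption
    samples = [np.sum((x - W @ (x + sigma * rng.standard_normal(n)))**2)
               for _ in range(20000)]
    mc = np.mean(samples)

    # Exact marginalization for this linear case:
    # E || x - W (x + eps) ||^2 = || x - W x ||^2 + sigma^2 * ||W||_F^2
    closed_form = np.sum((x - W @ x)**2) + sigma**2 * np.linalg.norm(W, 'fro')**2
    print(mc, closed_form)   # the two values agree up to sampling noise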
Higher Order Contractive Auto-Encoder
TLDR
A novel regularizer for training autoencoders for unsupervised feature extraction yields representations that are significantly better suited for initializing deep architectures than previously proposed approaches, beating state-of-the-art performance on a number of datasets.
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
TLDR
It is found empirically that this penalty helps to carve a representation that better captures the local directions of variation dictated by the data, corresponding to a lower-dimensional non-linear manifold, while being more invariant to the vast majority of directions orthogonal to the manifold.
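A minimal sketch of the contractive penalty for a sigmoid encoder, where the squared Frobenius norm of the encoder Jacobian factorizes into a simple expression; variable names and the toy usage are illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def contractive_penalty(x, W, b):
        """Squared Frobenius norm of the encoder Jacobian dh/dx for a
        sigmoid encoder h = sigmoid(W x + b).  The Jacobian is
        diag(h * (1 - h)) @ W, so the norm factorizes as below."""
        h = sigmoid(W @ x + b)
        return np.sum((h * (1.0 - h))**2 * np.sum(W**2, axis=1))

    # toy usage: this term is added to the reconstruction error, scaled by a hyperparameter
    rng = np.random.default_rng(3)
    W, b, x = rng.standard_normal((8, 4)), np.zeros(8), rng.standard_normal(4)
    penalty = contractive_penalty(x, W, b)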
Zero-bias autoencoders and the benefits of co-adapting features
TLDR
This work shows that negative biases are a natural result of using a hidden layer whose responsibility is both to represent the input data and to act as a selection mechanism ensuring sparsity of the representation, and proposes a new activation function that decouples these two roles.
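A tiny numerical illustration of the selection role of negative biases described above: the more negative the bias, the fewer inputs a ReLU unit responds to, i.e. the sparser its responses. The random data and unit below are purely illustrative, not the paper's setup.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.standard_normal((10000, 50))
    w = rng.standard_normal(50) / np.sqrt(50)

    for bias in (0.0, -0.5, -1.0):
        frac_active = np.mean(np.maximum(0.0, X @ w + bias) > 0)
        print(bias, frac_active)   # more negative bias -> smaller fraction of active responses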
Extracting and composing robust features with denoising autoencoders
TLDR
This work introduces and motivates a new training principle for unsupervised learning of a representation, based on the idea of making the learned representations robust to partial corruption of the input pattern.
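A minimal sketch of that training principle, assuming masking corruption, a tied-weight sigmoid autoencoder, and squared error (the paper also uses cross-entropy for binary inputs); the key point is that the loss compares the reconstruction with the clean input, not the corrupted one.

    import numpy as np

    rng = np.random.default_rng(5)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def dae_loss(x, W, b, c, corruption=0.3):
        """One stochastic evaluation of the denoising objective: corrupt the
        input by randomly zeroing a fraction of its entries, encode/decode
        the corrupted copy, and measure reconstruction error against the
        CLEAN x."""
        x_tilde = x * (rng.random(x.shape) > corruption)   # masking corruption
        h = sigmoid(W @ x_tilde + b)                        # encoder
        x_hat = sigmoid(W.T @ h + c)                        # tied-weight decoder
        return np.sum((x_hat - x)**2)

    # toy usage with random parameters
    x = (rng.random(20) > 0.5).astype(float)
    W, b, c = rng.standard_normal((10, 20)) * 0.1, np.zeros(10), np.zeros(20)
    print(dae_loss(x, W, b, c))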
Understanding the difficulty of training deep feedforward neural networks
TLDR
The objective is to understand better why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization
TLDR
This work investigates the reasons for the success of sparse coding over VQ by decoupling the training and encoding phases, separating out their contributions in a controlled way, and shows not only that fast VQ algorithms can be used for training, but that randomly chosen exemplars from the training set work just as well.
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Maxout Networks
TLDR
A simple new model called maxout is defined, designed both to facilitate optimization by dropout and to improve the accuracy of dropout's fast approximate model averaging technique.
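A minimal sketch of a maxout hidden layer, in which each unit takes the maximum over k affine "pieces"; the shapes and names below are illustrative.

    import numpy as np

    def maxout(x, W, b):
        """Maxout hidden layer: each of the h units is the maximum of k
        affine functions of the input.  W has shape (k, h, d), b has shape (k, h)."""
        z = np.einsum('khd,d->kh', W, x) + b   # (k, h) pre-activations
        return z.max(axis=0)                   # elementwise max over the k pieces

    # toy usage: 5 maxout units with 3 pieces each on a 4-dimensional input
    rng = np.random.default_rng(6)
    W, b, x = rng.standard_normal((3, 5, 4)), rng.standard_normal((3, 5)), rng.standard_normal(4)
    h = maxout(x, W, b)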
Learning Deep Architectures for AI
TLDR
The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those exploiting as building blocks unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
…