• Corpus ID: 14541470

Why Regularized Auto-Encoders learn Sparse Representation?

@article{Arpit2016WhyRA,
title={Why Regularized Auto-Encoders learn Sparse Representation?},
author={Devansh Arpit and Yingbo Zhou and Hung Quoc Ngo and Venu Govindaraju},
journal={ArXiv},
year={2016},
volume={abs/1505.05561}
}
• Published 21 May 2015
• Computer Science
• ArXiv
While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks, Internal Covariate Shift, the current solution has certain drawbacks. For instance, BN depends on batch statistics for layerwise input normalization during training, which makes the estimates of the mean and standard deviation of the input (distribution) to hidden layers inaccurate due to shifting parameter values (especially during the initial training epochs). Another…
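For readers unfamiliar with the mechanism the abstract refers to, below is a minimal NumPy sketch of layerwise normalization using current mini-batch statistics. It is a generic illustration (the function and variable names are mine), not code from the paper:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Per-feature batch normalization using statistics of the current mini-batch.

    x: (batch_size, num_features) pre-activations of a hidden layer.
    gamma, beta: learned scale and shift parameters, shape (num_features,).
    """
    mu = x.mean(axis=0)    # batch mean, re-estimated at every training step
    var = x.var(axis=0)    # batch variance, re-estimated at every training step
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy illustration: early in training, parameter updates keep shifting the
# layer's input distribution, so mini-batch estimates of mu/var are only
# noisy proxies for the true (population) statistics.
rng = np.random.default_rng(0)
batch = rng.normal(loc=2.0, scale=3.0, size=(32, 8))
out = batch_norm_forward(batch, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```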
52 Citations


Walking the Tightrope: An Investigation of the Convolutional Autoencoder Bottleneck
• Computer Science
ArXiv
• 2019
An in-depth investigation of the convolutional autoencoder (CAE) bottleneck shows empirically that, contrary to popular belief, CAEs do not learn to copy their input, even when the bottleneck has the same number of neurons as there are pixels in the input.
Autoencoders Learn Generative Linear Models
• Computer Science
• 2018
The analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as unsupervised feature training mechanisms for a wide range of datasets, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.
Pseudo-Rehearsal for Continual Learning with Normalizing Flows
• Computer Science
ArXiv
• 2020
This paper proposes a novel method that combines the strengths of regularization and generative-based rehearsal approaches, and shows that the method performs favorably with respect to state-of-the-art approaches in the literature, with bounded computational power and memory overheads.
On Random Deep Autoencoders: Exact Asymptotic Analysis, Phase Transitions, and Implications to Training
• Computer Science
• 2018
It is demonstrated experimentally that it is possible to train a deep autoencoder, even with the tanh activation and a depth as large as 200 layers, without resorting to techniques such as layer-wise pre-training or batch normalization.
Sparseness Analysis in the Pretraining of Deep Neural Networks
• Computer Science
IEEE Transactions on Neural Networks and Learning Systems
• 2017
The experimental results demonstrate that when the sufficient conditions are satisfied, the pretraining models lead to sparseness, that the performance of RePLU is better than that of ReLU, and that it is comparable with the performance obtained with some pretraining techniques, such as RBMs and DAEs.
LiSSA: Localized Stochastic Sensitive Autoencoders
• Computer Science
IEEE Transactions on Cybernetics
• 2021
A localized stochastic sensitive AE (LiSSA) is proposed to enhance the robustness of AEs with respect to input perturbations; it significantly outperforms several classical and recent AE training methods on classification tasks.
Autoencoders Learn Generative Linear Models
• Computer Science
ArXiv
• 2018
The analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as feature learning mechanisms for a variety of data models, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.
On Optimality Conditions for Auto-Encoder Signal Recovery
• Computer Science
• 2018
This paper shows that the true hidden representation can be approximately recovered if the weight matrices are highly incoherent with unit $\ell^{2}$ row length and the bias vectors take values equal to the negative of the data mean.
Critical Points Of An Autoencoder Can Provably Recover Sparsely Used Overcomplete Dictionaries
• Computer Science
ArXiv
• 2017
A rigorous analysis of the possibility that dictionary learning could be performed by gradient descent on autoencoders, which are R^n → R^n neural networks with a single ReLU activation layer of size h; the analysis creates a proxy for the expected gradient of this loss function, motivated with high-probability arguments under natural distributional assumptions on the sparse code x∗.
On the Dynamics of Gradient Descent for Autoencoders
• Computer Science
AISTATS
• 2019
The analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as feature learning mechanisms for a variety of data models, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.

References

Showing 1-10 of 44 references
Marginalized Denoising Auto-encoders for Nonlinear Representations
• Computer Science
ICML
• 2014
The marginalized Denoising Auto-encoder (mDAE) is presented, which (approximately) marginalizes out the corruption during training and is able to match or outperform the DAE with far fewer training epochs.
Higher Order Contractive Auto-Encoder
• Computer Science
ECML/PKDD
• 2011
A novel regularizer for training an autoencoder for unsupervised feature extraction yields representations that are significantly better suited for initializing deep architectures than previously proposed approaches, beating state-of-the-art performance on a number of datasets.
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction
• Computer Science
ICML
• 2011
It is found empirically that this penalty helps to carve a representation that better captures the local directions of variation dictated by the data, corresponding to a lower-dimensional non-linear manifold, while being more invariant to the vast majority of directions orthogonal to the manifold.
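As a rough illustration of the kind of penalty this reference adds to the reconstruction loss, here is a minimal NumPy sketch for a single-layer sigmoid encoder; the function name and shapes are illustrative assumptions, not the paper's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the Jacobian dh/dx for a sigmoid encoder h = sigmoid(Wx + b).

    For sigmoid units the Jacobian row of unit j is h_j (1 - h_j) * W_j,
    so the squared Frobenius norm factorizes as computed below.
    """
    h = sigmoid(W @ x + b)  # hidden representation, shape (num_hidden,)
    return np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))

rng = np.random.default_rng(0)
x = rng.normal(size=50)
W = rng.normal(scale=0.1, size=(20, 50))
b = np.zeros(20)
# In a contractive AE this term would be added to the reconstruction loss
# with some weight lambda.
print(contractive_penalty(x, W, b))
```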
Zero-bias autoencoders and the benefits of co-adapting features
• Computer Science
ICLR
• 2015
This work shows that negative biases are a natural result of using a hidden layer whose responsibility is to both represent the input data and act as a selection mechanism that ensures sparsity of the representation, and it proposes a new activation function that decouples these two roles of the hidden layer.
Extracting and composing robust features with denoising autoencoders
• Computer Science
ICML '08
• 2008
This work introduces and motivates a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern.
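A minimal sketch of that training principle, assuming masking corruption and tied weights for brevity; this is a generic illustration, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_loss(x, W, b, c, corruption=0.3, rng=None):
    """One-sample denoising objective: corrupt x, encode, decode (tied weights),
    and measure reconstruction error against the *clean* input."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) > corruption  # randomly zero out a fraction of inputs
    x_tilde = x * mask
    h = sigmoid(W @ x_tilde + b)             # encode the corrupted input
    x_hat = sigmoid(W.T @ h + c)             # decode with tied weights
    return np.mean((x_hat - x) ** 2)         # target is the uncorrupted input

rng = np.random.default_rng(0)
x = rng.random(64)
W, b, c = rng.normal(scale=0.1, size=(16, 64)), np.zeros(16), np.zeros(64)
print(dae_loss(x, W, b, c, rng=rng))
```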
Understanding the difficulty of training deep feedforward neural networks
• Computer Science
AISTATS
• 2010
The objective is to better understand why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain these recent relative successes and help design better algorithms in the future.
The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization
• Computer Science
ICML
• 2011
This work investigates the reasons for the success of sparse coding over VQ by decoupling the training and encoding phases, separating their contributions in a controlled way; it shows not only that fast VQ algorithms can be used for training, but that randomly chosen exemplars from the training set work just as well.
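To make the training/encoding decoupling concrete, here is a hedged sketch of one simple feed-forward encoder used in this line of work, with a dictionary built from randomly chosen, normalized training exemplars; the names and threshold value are illustrative assumptions:

```python
import numpy as np

def soft_threshold_encode(X, D, alpha=0.25):
    """Feed-forward 'soft threshold' encoder: features are max(0, x^T D - alpha),
    computed without any per-example optimization.

    D: dictionary of shape (input_dim, num_features); its columns can simply be
    randomly chosen (and normalized) exemplars from the training set.
    """
    return np.maximum(0.0, X @ D - alpha)

rng = np.random.default_rng(0)
train = rng.random((1000, 64))
idx = rng.choice(len(train), size=128, replace=False)
D = train[idx].T
D /= np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm dictionary atoms
codes = soft_threshold_encode(train[:5], D)
print(codes.shape)  # (5, 128)
```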
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Maxout Networks
• Computer Science
ICML
• 2013
A simple new model called maxout is defined, designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique.
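A minimal sketch of a maxout unit, whose activation is the maximum over k affine pieces; the shapes and names below are illustrative assumptions rather than the paper's notation:

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: each unit outputs the maximum over its k affine pieces.

    W: (num_units, k, input_dim), b: (num_units, k).
    Returns a vector of length num_units.
    """
    z = np.einsum('ukd,d->uk', W, x) + b  # k pre-activations per unit
    return z.max(axis=1)                  # piecewise-linear, convex activation

rng = np.random.default_rng(0)
x = rng.normal(size=10)
W = rng.normal(size=(4, 3, 10))  # 4 maxout units, 3 pieces each
b = np.zeros((4, 3))
print(maxout(x, W, b))
```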
Learning Deep Architectures for AI
The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those exploiting as building blocks unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.