Corpus ID: 53115121

On the relationship between Dropout and Equiangular Tight Frames

Dor Bank and Raja Giryes

Dropout is a popular regularization technique in neural networks. Yet, the reason for its success is still not fully understood. This paper provides a new interpretation of Dropout from a frame theory perspective. By drawing a connection to recent developments in analog channel coding, we suggest that for a certain family of autoencoders with a linear encoder, the minimizer of an optimization with dropout regularization on the encoder is an equiangular tight frame (ETF). Since this optimization… 
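To make the ETF notion concrete (an illustration, not code from the paper), the sketch below checks that the classic "Mercedes-Benz" frame of N=3 unit vectors in M=2 dimensions is an equiangular tight frame: the frame operator equals (N/M)·I, and all pairwise inner products have the same magnitude, meeting the Welch bound sqrt((N−M)/(M(N−1))) = 1/2.

```python
import numpy as np

# "Mercedes-Benz" frame: 3 unit vectors in R^2 at 120-degree spacing.
angles = np.pi / 2 + np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
F = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 3); columns are frame vectors

# Tightness: F F^T = (N/M) I with N=3 vectors in M=2 dimensions.
tight = np.allclose(F @ F.T, (3 / 2) * np.eye(2))

# Equiangularity: |<f_i, f_j>| identical for all i != j and equal to the
# Welch bound sqrt((N-M)/(M(N-1))) = 1/2.
G = np.abs(F.T @ F)
off_diag = G[~np.eye(3, dtype=bool)]
equiangular = np.allclose(off_diag, 0.5)

print(tight, equiangular)  # True True
```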


Dropout: Explicit Forms and Capacity Control

This work shows that the data-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks.

On Dropout and Nuclear Norm Regularization

A formal and complete characterization of the explicit regularizer induced by dropout in deep linear networks with squared loss is given, and the global optima of the dropout objective are characterized.


Autoencoders

This chapter surveys the main types of autoencoders in use today and describes various applications and use cases of autoencoders.

Asymptotic Frame Theory for Analog Coding

An information-theoretic, random-like behavior of frame subsets is observed in setups involving erasures, random user activity, or sparsity (signal processing), in addition to channel or quantization noise.



Understanding Dropout

A general formalism for studying dropout on either units or connections, with arbitrary probability values, is introduced and used to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks.

Reducing Overfitting in Deep Networks by Decorrelating Representations

A new regularizer called DeCov is proposed which leads to significantly reduced overfitting and improved generalization in deep neural networks.

Information Dropout: Learning Optimal Representations Through Noisy Computation

It is proved that Information Dropout achieves a comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.

Altitude Training: Strong Bounds for Single-Layer Dropout

It is shown that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization and should therefore induce minimal bias in high dimensions.

Regularization of Neural Networks using DropConnect

This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
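The distinction between the two schemes can be sketched in a few lines (assumed shapes and keep probability, not the paper's code): Dropout zeroes entire input activations, so one mask entry affects a whole column of the weight matrix, while DropConnect zeroes individual weights independently.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8)           # layer input
W = rng.normal(size=(4, 8))      # fully-connected weights
q = 0.5                          # keep probability

# Dropout: one Bernoulli mask over activations, shared across output units.
dropout_out = W @ (x * (rng.random(8) < q))

# DropConnect: an independent Bernoulli mask per weight entry.
dropconnect_out = ((rng.random((4, 8)) < q) * W) @ x

# Both produce a length-4 output; only the granularity of the noise differs.
```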

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

This work applies a new variational-inference-based dropout technique in LSTM and GRU models, which outperforms existing techniques and, to the best of the authors' knowledge, improves on the single-model state of the art in language modelling on the Penn Treebank.

Fast dropout training

This work shows how to do fast dropout training by sampling from, or integrating, a Gaussian approximation, instead of optimizing the dropout objective by Monte Carlo sampling, which gives an order-of-magnitude speedup and more stability.
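The core idea can be sketched as follows (an assumed setup, not the authors' code): a dropped-out pre-activation w·(m⊙x), with mask entries m_i ~ Bernoulli(q), is a sum of many independent terms, so by a central-limit argument it is well approximated by a Gaussian whose mean and variance are available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 200, 0.5                  # input dimension, keep probability
x = rng.normal(size=d)
w = rng.normal(size=d)

# Gaussian approximation: closed-form mean and variance of w . (m * x).
mu = q * (w @ x)
var = q * (1 - q) * np.sum((w * x) ** 2)

# Monte Carlo reference: sample Bernoulli dropout masks directly.
masks = rng.random((100_000, d)) < q
z = (masks * x) @ w

# mu and var closely match the empirical mean and variance of z,
# without any mask sampling at training time.
```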

Dropout Training as Adaptive Regularization

By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.

Robust Large Margin Deep Neural Networks

The analysis leads to the conclusion that a bounded spectral norm of the network's Jacobian matrix in the neighbourhood of the training samples is crucial for a deep neural network of arbitrary depth and width to generalize well.