• Corpus ID: 8151505

Adaptive dropout for training deep neural networks

Jimmy Ba, Brendan J. Frey
Recently, it was shown that deep neural networks can perform very well if the activities of hidden units are regularized during learning, e.g., by randomly dropping out 50% of their activities. [...] This 'adaptive dropout network' can be trained jointly with the neural network by approximately computing local expectations of binary dropout variables, computing derivatives using back-propagation, and using stochastic gradient descent. Interestingly, experiments show that the learnt dropout network…
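
The idea in the abstract above, a dropout probability that depends on the input rather than being fixed at 50%, can be illustrated with a minimal sketch of one forward pass. The details here are assumptions, not taken from the truncated abstract: the keep probability of each hidden unit is modeled as a logistic function of that unit's own pre-activation, the scale `alpha` and offset `beta` are hypothetical hyperparameters, the ReLU activation is an arbitrary choice, and the function name `adaptive_dropout_layer` is made up for illustration. The sketch covers only the forward pass; the joint training by back-propagation and stochastic gradient descent is not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def adaptive_dropout_layer(x, weights, biases, alpha=1.0, beta=0.0,
                           train=True, rng=random):
    """Forward pass of one hidden layer with input-dependent dropout.

    Assumption: the keep probability of unit j is
    sigmoid(alpha * pre_j + beta), where pre_j is the unit's
    pre-activation. alpha and beta are hypothetical hyperparameters.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        # Pre-activation of this hidden unit.
        pre = sum(w * xi for w, xi in zip(w_row, x)) + b
        # ReLU activity (an illustrative choice of nonlinearity).
        activity = max(0.0, pre)
        # Input-dependent probability of keeping the unit.
        keep_prob = sigmoid(alpha * pre + beta)
        if train:
            # Sample a binary mask for this unit.
            mask = 1.0 if rng.random() < keep_prob else 0.0
            outputs.append(mask * activity)
        else:
            # At test time, use the expected value of the mask.
            outputs.append(keep_prob * activity)
    return outputs
```

Calling the function with `train=False` is deterministic and scales each activity by its keep probability; with `train=True`, passing a seeded `random.Random` instance as `rng` makes the sampled masks reproducible.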


Improved Dropout for Shallow and Deep Learning
An efficient adaptive dropout is proposed that computes the sampling probabilities on the fly from a mini-batch of examples and achieves not only much faster convergence but also a smaller testing error than standard dropout.
Automatic Dropout for Deep Neural Networks
This paper introduces a method of sampling a dropout rate from an automatically determined distribution and builds on this automatic selection of the dropout rate by clustering the activations and adaptively applying different rates to each cluster.
Dropout with Tabu Strategy for Regularizing Deep Neural Networks
This work adds a diversification strategy to dropout, which aims to generate more diverse neural network architectures within a suitable number of iterations, and improves on the performance of standard dropout.
Continuous Dropout
The proposed continuous dropout is considerably closer to the activation characteristics of neurons in the human brain than traditional binary dropout and has the property of avoiding the co-adaptation of feature detectors, which suggests that it can extract more independent feature detectors for model averaging in the test stage.
Ising-dropout: A Regularization Method for Training and Compression of Deep Neural Networks
H. Salehinejad, S. Valaee. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
The preliminary results show that the proposed approach can keep the classification performance competitive to the original network while eliminating optimization of unnecessary network parameters in each training cycle.
Curriculum Dropout
It is shown that using a fixed dropout probability during training is a suboptimal choice, and a time schedule is proposed for the probability of retaining neurons in the network, which induces an adaptive regularization scheme that smoothly increases the difficulty of the optimization problem.
Selective Dropout for Deep Neural Networks
Three new alternative methods for performing dropout on a deep neural network are presented, which improve the effectiveness of dropout over the same training period; the most effective of these is the Output Variance method.
The aim of this dissertation is to study dropout and other regularization methods built on dropout, in order to create data having a correlation with real-world data.
Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
This work investigates the empirical Rademacher complexity related to intermediate layers of deep neural networks and proposes a feature distortion method for addressing the problem of over-fitting.


On the importance of initialization and momentum in deep learning
It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.
Greedy Layer-Wise Training of Deep Networks
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
A Fast Learning Algorithm for Deep Belief Nets
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Efficient Learning of Deep Boltzmann Machines
We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. The algorithm learns a separate “recognition” model that
An Analysis of Single-Layer Networks in Unsupervised Feature Learning
The results show that large numbers of hidden nodes and dense feature extraction are critical to achieving high performance—so critical, in fact, that when these parameters are pushed to their limits, they achieve state-of-the-art performance on both CIFAR-10 and NORB using only a single layer of features.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Improving neural networks by preventing co-adaptation of feature detectors
When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the
Learning Recurrent Neural Networks with Hessian-Free Optimization
This work solves the long-outstanding problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems which may contain long-term data dependencies and offers a new interpretation of the generalized Gauss-Newton matrix of Schraudolph which is used within the HF approach of Martens.
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.