Corpus ID: 6844431

Dropout: a simple way to prevent neural networks from overfitting

@article{Srivastava2014DropoutAS,
  title={Dropout: a simple way to prevent neural networks from overfitting},
  author={Nitish Srivastava and Geoffrey E. Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov},
  journal={J. Mach. Learn. Res.},
  year={2014},
  volume={15},
  pages={1929-1958}
}
Deep neural nets with a large number of parameters are very powerful machine learning systems. [...] During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves…
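The training/test recipe described in the abstract is simple to sketch. The following NumPy snippet is an illustrative sketch only (the names layer_forward and p_retain are chosen here for illustration and are not from the paper's code): units are kept with probability p_retain during training, and at test time no units are dropped and the weights are scaled by p_retain instead.

import numpy as np

rng = np.random.default_rng(0)

def layer_forward(x, W, b, p_retain=0.5, train=True):
    # Training: keep each input unit with probability p_retain, i.e. sample
    # one "thinned" network for this forward pass.
    if train:
        mask = rng.binomial(1, p_retain, size=x.shape)
        return np.maximum(0.0, (x * mask) @ W + b)
    # Test: no units are dropped; scaling the weights by p_retain gives the
    # single unthinned network with smaller weights that approximates
    # averaging the predictions of all thinned networks.
    return np.maximum(0.0, x @ (p_retain * W) + b)

# Example usage on a small random layer.
x = rng.standard_normal((4, 10))
W = 0.1 * rng.standard_normal((10, 8))
b = np.zeros(8)
h_train = layer_forward(x, W, b, train=True)
h_test = layer_forward(x, W, b, train=False)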
A Survey on Prevention of Overfitting in Convolution Neural Networks Using Machine Learning Techniques
TLDR
It is shown that dropout enhances the overall performance of neural networks on supervised learning tasks in vision, speech recognition, document classification, and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Automatic Dropout for Deep Neural Networks
TLDR
This paper introduces a method of sampling a dropout rate from an automatically determined distribution, and builds on this automatic selection of dropout rate by clustering the activations and adaptively applying different rates to each cluster.
Improving Generalization for Convolutional Neural Networks
Stochastic Gradient Descent (SGD) minimizes the training risk L_T(w) of a neural network h over the set of all possible network parameters w ∈ R^d. Since the risk is a very non-convex function of w, …
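As context for the quantity L_T(w) mentioned above, the snippet below is a generic sketch of mini-batch SGD on a training risk; the names sgd and grad_fn and the least-squares example are illustrative and not taken from the cited paper.

import numpy as np

rng = np.random.default_rng(1)

def sgd(grad_fn, w0, X, y, lr=0.01, epochs=20, batch_size=32):
    # Repeatedly step against a stochastic estimate of the gradient of the
    # training risk L_T(w), computed on random mini-batches.
    w = w0.copy()
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= lr * grad_fn(w, X[idx], y[idx])
    return w

# Example: linear least squares, L_T(w) = mean((Xw - y)^2).
def grad_fn(w, Xb, yb):
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(200)
w_hat = sgd(grad_fn, np.zeros(5), X, y)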
The Effect Of Hyperparameters In The Activation Layers Of Deep Neural Networks
TLDR
This paper aims to describe and verify the effectiveness of current techniques in the literature that utilize hyperparameters in the activation layer, and to introduce some new activation layers that introduce hyperparameters into the model, including activation pools (APs) and parametric activation pools (PAPs).
Regularizing Neural Networks with Gradient Monitoring
TLDR
This paper presents a regularization methodology for reducing the size of these complex models while still maintaining the generalizability of shallow and deep neural networks; it is evaluated on several benchmark classification tasks, achieving a drastically smaller model size and better performance than models trained with the similar regularization technique DropConnect.
Curriculum Dropout
TLDR
It is shown that using a fixed dropout probability during training is a suboptimal choice, and a time scheduling for the probability of retaining neurons in the network is proposed, which induces an adaptive regularization scheme that smoothly increases the difficulty of the optimization problem.
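One schedule consistent with this description, a retain probability that starts at 1 (no dropout) and decays smoothly toward a target value so that regularization tightens over training, is an exponential decay. The function names and the parameters theta_target and gamma below are illustrative, not the paper's reference values.

import numpy as np

def retain_probability(t, theta_target=0.5, gamma=1e-3):
    # Starts at 1.0 (every neuron kept, easy optimization) and decays smoothly
    # toward theta_target, so the difficulty of the optimization grows with t.
    return (1.0 - theta_target) * np.exp(-gamma * t) + theta_target

def curriculum_dropout(x, t, rng, theta_target=0.5, gamma=1e-3):
    # Dropout with the time-scheduled retain probability; the mask is rescaled
    # (inverted dropout) so the expected activation is unchanged.
    p = retain_probability(t, theta_target, gamma)
    mask = rng.binomial(1, p, size=x.shape)
    return x * mask / p

# Example: the retain probability at a few training steps.
rng = np.random.default_rng(2)
print([round(retain_probability(t), 3) for t in (0, 1000, 10000)])
h = curriculum_dropout(np.ones((2, 4)), t=1000, rng=rng)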
Mixed-pooling-dropout for convolutional neural network regularization
TLDR
This work proposes a novel method called Mixed-Pooling-Dropout that adapts the dropout function with a mixed-pooling strategy, represented by a binary mask with each element drawn independently from a Bernoulli distribution.
A systematic review on overfitting control in shallow and deep neural networks
TLDR
A systematic review of overfitting control methods that categorizes them into passive, active, and semi-active subsets, covering the theoretical and experimental backgrounds of these methods, their strengths and weaknesses, and emerging techniques for overfitting detection.
Method of Pre-processing a Deep Neural Network for Addressing Overfitting and Designing a Very Deep Neural Network
  • 2017
Sparseness of hidden unit activation is the common effect of the three primary methods (unsupervised pre-training, rectifier neural networks, and dropout) that significantly reduce overfitting in …
Learning Compact Architectures for Deep Neural Networks
Deep Neural Networks (NNs) have recently emerged as the model of choice for a wide range of Machine Learning applications, ranging from computer vision to speech recognition to natural language processing …

References

Showing 1-10 of 42 references
Improving Neural Networks with Dropout
TLDR
In this work, models that improve the performance of neural networks using dropout are described, often obtaining state-of-the-art results on benchmark datasets.
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Fast dropout training
TLDR
This work shows how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective, which gives an order of magnitude speedup and more stability.
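A minimal sketch of the kind of Gaussian approximation the summary refers to, assuming Bernoulli(p) masks on the inputs of a linear layer: the dropout pre-activation z = sum_i m_i w_i x_i is approximated as Gaussian with mean p * sum_i w_i x_i and variance p(1-p) * sum_i (w_i x_i)^2, and a single draw from that Gaussian replaces Monte Carlo sampling of the masks. Names and shapes are illustrative.

import numpy as np

rng = np.random.default_rng(3)

def fast_dropout_preactivation(x, W, p_retain=0.5):
    # Per-term contributions w_ij * x_i, shape (batch, in_dim, out_dim).
    wx = x[:, :, None] * W[None, :, :]
    mu = p_retain * wx.sum(axis=1)                               # mean of z
    var = p_retain * (1.0 - p_retain) * (wx ** 2).sum(axis=1)    # variance of z
    # Draw from the Gaussian approximation instead of sampling Bernoulli masks.
    return mu + np.sqrt(var) * rng.standard_normal(mu.shape)

# Example usage.
x = rng.standard_normal((4, 10))
W = 0.1 * rng.standard_normal((10, 8))
z = fast_dropout_preactivation(x, W)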
Bayesian learning for neural networks
TLDR
Bayesian Learning for Neural Networks shows that Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional neural network learning methods.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
A Fast Learning Algorithm for Deep Belief Nets
TLDR
A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Learning with Marginalized Corrupted Features
TLDR
This work proposes to corrupt training examples with noise from known distributions within the exponential family and presents a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution.
Simplifying Neural Networks by Soft Weight-Sharing
TLDR
A more complicated penalty term is proposed in which the distribution of weight values is modeled as a mixture of multiple Gaussians, which allows the parameters of the mixture model to adapt at the same time as the network learns.
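The penalty described above can be sketched directly as the negative log-likelihood of the weights under a K-component Gaussian mixture; the function name and the example mixture parameters below are illustrative.

import numpy as np

def soft_weight_sharing_penalty(weights, pi, mu, sigma):
    # Negative log-likelihood of all weights under a mixture of Gaussians;
    # adding it to the training loss pulls weights toward the mixture means,
    # while pi, mu, and sigma can be adapted alongside the network weights.
    w = weights.reshape(-1, 1)                                   # (n_weights, 1)
    dens = np.exp(-0.5 * ((w - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return -np.sum(np.log(dens @ pi + 1e-12))

# Example: two components, a tight cluster at zero and a broader cluster.
rng = np.random.default_rng(4)
weights = 0.3 * rng.standard_normal(100)
penalty = soft_weight_sharing_penalty(
    weights,
    pi=np.array([0.7, 0.3]),
    mu=np.array([0.0, 0.5]),
    sigma=np.array([0.05, 0.2]),
)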
Dropout Training as Adaptive Regularization
TLDR
By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
Deep Boltzmann Machines
TLDR
A new learning algorithm for Boltzmann machines that contain many layers of hidden variables that is made more efficient by using a layer-by-layer "pre-training" phase that allows variational inference to be initialized with a single bottom-up pass.