Corpus ID: 2002198

Efficient batchwise dropout training using submatrices

Benjamin Graham, Jeremy Reizenstein, Leigh Robinson
Dropout is a popular technique for regularizing artificial neural networks. Dropout networks are generally trained by minibatch gradient descent with a dropout mask turning off some of the units; a different pattern of dropout is applied to every sample in the minibatch. We explore a very simple alternative to the dropout mask. Instead of masking dropped out units by setting them to zero, we perform matrix multiplication using a submatrix of the weight matrix; unneeded hidden units are never…
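To make the contrast between masking and submatrix multiplication concrete, here is a minimal pure-Python sketch. It assumes, as the word "batchwise" in the title suggests, that one dropout pattern is shared by every sample in the minibatch. Under that assumption, zeroing the dropped units and multiplying by the full weight matrix yields the same surviving outputs as multiplying only the kept columns of the input by the matching submatrix, so the dropped rows and columns of the weights are never touched. The helper names (`matmul`, `sample_keep`, `masked_forward`, `submatrix_forward`) are hypothetical, and the usual 1/(1-p) rescaling is left out for clarity.

```python
import random

def matmul(a, b, m):
    """Dense product of an (n x k) matrix `a` and a (k x m) matrix `b`, both lists of rows.
    The output width `m` is passed explicitly so that an empty inner dimension still works."""
    return [[sum(a_row[t] * b[t][j] for t in range(len(b))) for j in range(m)]
            for a_row in a]

def sample_keep(n, p_drop, rng):
    """One dropout pattern for the whole minibatch (the assumed 'batchwise' setting)."""
    return {i for i in range(n) if rng.random() >= p_drop}

def masked_forward(x, w, keep_in, keep_out):
    """Mask-style dropout with a shared mask: zero the dropped inputs,
    multiply by the full weight matrix, then zero the dropped outputs."""
    x_masked = [[v if i in keep_in else 0.0 for i, v in enumerate(row)] for row in x]
    y = matmul(x_masked, w, len(w[0]))
    return [[v if j in keep_out else 0.0 for j, v in enumerate(row)] for row in y]

def submatrix_forward(x, w, keep_in, keep_out):
    """Submatrix-style dropout: gather only the kept input columns of x and the
    kept rows/columns of w, so dropped units are never multiplied at all.
    Returns the kept output indices and the compact (batch x len(keep_out)) result."""
    rows, cols = sorted(keep_in), sorted(keep_out)
    x_sub = [[row[i] for i in rows] for row in x]
    w_sub = [[w[i][j] for j in cols] for i in rows]
    return cols, matmul(x_sub, w_sub, len(cols))

rng = random.Random(0)
n_in, n_out, batch = 6, 4, 3
x = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(batch)]
w = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
keep_in = sample_keep(n_in, 0.5, rng)
keep_out = sample_keep(n_out, 0.5, rng)

full = masked_forward(x, w, keep_in, keep_out)
cols, compact = submatrix_forward(x, w, keep_in, keep_out)
# The kept output columns of the masked result equal the compact submatrix result.
for b in range(batch):
    for c, j in enumerate(cols):
        assert abs(full[b][j] - compact[b][c]) < 1e-12
```

The compact result only has columns for the kept output units, which is where the savings come from: the work scales with the number of kept units rather than with the full layer width. With a per-sample mask, as in standard dropout, each row of the minibatch would need a different submatrix, so this shortcut relies on the shared-pattern assumption.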

Citations

Dropout distillation

This work introduces a novel approach, coined "dropout distillation", that trains a predictor to better approximate the intractable but preferable averaging process while keeping its computational efficiency under control.

Improved Dropout for Shallow and Deep Learning

An efficient adaptive dropout is proposed that computes the sampling probabilities on the fly from a mini-batch of examples and achieves not only much faster convergence but also a smaller test error than standard dropout.

Dropout as data augmentation

This work presents an approach to projecting the dropout noise within a network back into the input space, thereby generating augmented versions of the training data, and shows that training a deterministic network on the augmented samples yields similar results.

Active learning strategy for CNN combining batchwise Dropout and Query-By-Committee

This paper presents an active learning strategy based on query by committee and the dropout technique to train a Convolutional Neural Network (CNN), and evaluates it on the MNIST and USPS benchmarks, showing that selecting less than 22% of the annotated database is enough to reach an error rate similar to that obtained with the full training set.

QBDC: Query by dropout committee for training deep supervised architecture

An active learning strategy based on query by committee and the dropout technique is presented for training a Convolutional Neural Network (CNN); the committee consists of partial CNNs derived from batchwise dropout runs on the initial CNN.
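The committee idea can be illustrated with a small sketch. This is our own illustration under stated assumptions, not the QBDC authors' code: a single linear layer stands in for the CNN, each committee member is that layer restricted to one batchwise dropout pattern over its input units, and disagreement is scored with vote entropy, a common query-by-committee criterion that the summary above does not name. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def predict(x, w, keep):
    """Class predicted by the partial network that only uses the input units in `keep`."""
    n_classes = len(w[0])
    scores = [sum(x[i] * w[i][c] for i in keep) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: scores[c])

def vote_entropy(votes):
    """Entropy of the committee's votes: 0 when all members agree."""
    total = len(votes)
    return -sum((k / total) * math.log(k / total) for k in Counter(votes).values())

def query_by_dropout_committee(pool, w, n_members, p_drop, n_query, rng):
    """Indices of the n_query unlabeled samples the committee disagrees on most."""
    n_in = len(w)
    committee = []
    for _ in range(n_members):
        # One batchwise dropout pattern per member, shared by every pool sample.
        keep = [i for i in range(n_in) if rng.random() >= p_drop]
        committee.append(keep or [rng.randrange(n_in)])  # keep at least one unit
    scores = [vote_entropy([predict(x, w, keep) for keep in committee]) for x in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_query]

rng = random.Random(1)
w = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(8)]      # 8 inputs, 3 classes
pool = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]  # unlabeled pool
print(query_by_dropout_committee(pool, w, n_members=10, p_drop=0.5, n_query=5, rng=rng))
```

The selected indices are the samples that would be sent for annotation; in an active learning loop, the network would then be retrained on the enlarged labeled set and a new committee drawn.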

Robust Learning of Parsimonious Deep Neural Networks

The simulations show that the proposed simultaneous learning and pruning algorithm achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, remaining robust with respect to network initialization and initial size.

Increasing the robustness of CNN acoustic models using ARMA spectrogram features and channel dropout

This work proposes an improved version of input dropout that exploits the special structure of the input time-frequency representation, and replaces the standard mel-spectrogram input representation with the autoregressive moving average (ARMA) spectrogram, which was recently shown to outperform the former under mismatched train-test conditions.

Faster Neural Network Training with Approximate Tensor Operations

This work proposes a novel technique for faster Neural Network (NN) training that systematically approximates all the constituent matrix multiplications and convolutions; it is complementary to other approximation techniques, requires no changes to the dimensions of the network layers, and is compatible with existing training frameworks.

Active learning and input space analysis for deep networks

An active learning strategy is repurposed to compare the relevance of the sentences it selects with state-of-the-art phraseology techniques, in order to understand the hierarchy of the linguistic knowledge acquired during the training of CNNs on NLP tasks.

References

Regularization of Neural Networks using DropConnect

This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
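The difference from ordinary dropout can be shown on a single fully connected layer. In this minimal sketch (our illustration, not the DropConnect authors' code), dropout draws one keep/drop decision per input unit and applies it to that unit's entire row of weights, whereas DropConnect draws a separate decision for every weight. Dropout is therefore the special case of a weight mask that is constant along each row. Bias, nonlinearity, and any rescaling are deliberately left out, and the helper names are hypothetical.

```python
import random

def masked_layer(x, w, mask):
    """y_j = sum_i x_i * w_ij * mask_ij for one sample (no bias, activation, or rescaling)."""
    n_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x)) if mask[i][j]) for j in range(n_out)]

def dropout_mask(n_in, n_out, p_drop, rng):
    """Dropout on the input units: one draw per unit, copied across that unit's row of weights."""
    unit_keep = [rng.random() >= p_drop for _ in range(n_in)]
    return [[keep] * n_out for keep in unit_keep]

def dropconnect_mask(n_in, n_out, p_drop, rng):
    """DropConnect: an independent draw for every individual weight."""
    return [[rng.random() >= p_drop for _ in range(n_out)] for _ in range(n_in)]

rng = random.Random(0)
x = [1.0, -2.0, 0.5]
w = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]
print(masked_layer(x, w, dropout_mask(3, 2, 0.5, rng)))      # whole rows of w switched off
print(masked_layer(x, w, dropconnect_mask(3, 2, 0.5, rng)))  # individual weights switched off
```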

Dropout: a simple way to prevent neural networks from overfitting

It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

On the importance of initialization and momentum in deep learning

It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.

Recurrent Neural Network Regularization

This paper shows how to correctly apply dropout to LSTMs and demonstrates that it substantially reduces overfitting on a variety of tasks.

Fractional Max-Pooling

The proposed form of fractional max-pooling is found to reduce overfitting on a variety of datasets; for instance, it improves on the state of the art for CIFAR-100 without even using dropout.

Learning Ordered Representations with Nested Dropout

Nested dropout, a procedure for stochastically removing coherent nested sets of hidden units in a neural network, is introduced and it is rigorously shown that the application of nested dropout enforces identifiability of the units, which leads to an exact equivalence with PCA.
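What makes the removed sets "nested" is that a single index is sampled and every unit past it is dropped, so the kept units always form a prefix 0..b-1 of the ordered representation. The minimal sketch below illustrates only that sampling step. The geometric distribution over b (with the tail mass lumped into the last unit) and the names `sample_truncation` and `nested_dropout` are assumptions made for illustration; training the network itself is not shown.

```python
import random

def sample_truncation(n_units, rho, rng):
    """Draw b in 1..n_units with P(b) = rho**(b - 1) * (1 - rho) for b < n_units
    and the remaining tail mass rho**(n_units - 1) on b = n_units (assumed prior)."""
    b = 1
    while b < n_units and rng.random() < rho:
        b += 1
    return b

def nested_dropout(h, rho, rng):
    """Keep units 0..b-1 of the hidden vector h and zero every unit after them."""
    b = sample_truncation(len(h), rho, rng)
    return [v if i < b else 0.0 for i, v in enumerate(h)]

rng = random.Random(0)
for _ in range(3):
    print(nested_dropout([0.9, -0.3, 0.5, 0.2, -0.7], rho=0.6, rng=rng))
```

Because unit i survives only when every unit before it also survives, earlier units are trained far more often than later ones, which is what pushes the network toward an ordered representation.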

Reducing the Dimensionality of Data with Neural Networks

This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

Multi-column deep neural networks for image classification

On the very competitive MNIST handwriting benchmark, this method is the first to achieve near-human performance, and it improves the state of the art on a plethora of common image classification benchmarks.

Learning Multiple Layers of Features from Tiny Images

It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.

Gradient-based learning applied to document recognition

This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.