Corpus ID: 29393619

RandomOut: Using a convolutional gradient norm to rescue convolutional filters

@article{cohen_randomout,
  title={RandomOut: Using a convolutional gradient norm to rescue convolutional filters},
  author={Joseph Paul Cohen and Henry Z. Lo and Wei Ding},
  journal={arXiv: Computer Vision and Pattern Recognition}
}
Filters in convolutional neural networks are sensitive to their initialization. The random numbers used to initialize filters act as a bias and determine whether you will "win" and converge to a satisfactory local minimum, so we call this The Filter Lottery. We observe that the 28x28 Inception-V3 model without Batch Normalization fails to train 26% of the time when varying the random seed alone. This is a problem that affects the trial-and-error process of designing a network. Because random seeds have…
On Implicit Filter Level Sparsity in Convolutional Neural Networks
It is shown that the implicit sparsity can be harnessed for neural network speedup at par or better than explicit sparsification / pruning approaches, with no modifications to the typical training pipeline required.
Analysis of Gene Interaction Graphs as Prior Knowledge for Machine Learning Models.
The authors' analysis with random graphs finds that dependencies can be captured almost as well at random, which suggests that, in terms of gene expression levels, the relevant information about the state of the cell is spread across many genes.
Methods for Pruning Deep Neural Networks
This paper brings together the reported results from some key papers in one place, providing a resource that can be used to quickly compare reported results and trace studies where specific methods, data sets, and architectures have been used.


RandomOut: Using a convolutional gradient norm to win The Filter Lottery
The gradient norm is used to evaluate the impact of a filter on the error, and filters are re-initialized when the gradient norms of their weights fall below a specific threshold; this consistently improves accuracy across two datasets by up to 1.8%.
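The criterion above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact procedure: the function name `randomout_step`, the threshold value, and the Gaussian re-initialization scale are all assumptions made for the example.

```python
import numpy as np

def randomout_step(filters, filter_grads, threshold=1e-4, rng=None):
    """Re-initialize filters whose gradient norm falls below `threshold`.

    A hypothetical sketch of the RandomOut criterion.
    filters, filter_grads: arrays of shape (n_filters, h, w, c).
    Returns the (possibly re-initialized) filters and a boolean mask
    of which filters were flagged as abandoned by the gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Per-filter L2 norm of the gradient of that filter's weights.
    norms = np.sqrt((filter_grads ** 2).sum(axis=(1, 2, 3)))
    dead = norms < threshold
    # Draw fresh random weights for the flagged filters; the scale 0.05
    # is an assumed stand-in for the network's original init scheme.
    filters = filters.copy()
    filters[dead] = rng.normal(0.0, 0.05, size=filters[dead].shape)
    return filters, dead
```

In practice such a check would run periodically during training, giving filters that the gradient has effectively abandoned a fresh draw in the lottery.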
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
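The "adaptive estimates of lower-order moments" mentioned above are the bias-corrected exponential moving averages of the gradient and its square. A minimal single-parameter sketch of the published update rule (the function name and default hyperparameters here follow the paper's recommendations):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moving averages of the first moment (m) and
    second raw moment (v) of the gradient, each bias-corrected by
    1 - beta**t, then a scaled step on the parameter theta."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Because the step is normalized by the second-moment estimate, its magnitude stays near `lr` regardless of the raw gradient scale.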
Rethinking the Inception Architecture for Computer Vision
This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
Understanding the difficulty of training deep feedforward neural networks
The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Prediction gradients for feature extraction and analysis from convolutional neural networks
This work proposes prediction gradients to understand how neural networks encode concepts into individual units, and demonstrates the utility of the prediction gradient in understanding the importance and relationships between units inside a convolutional neural network.
Neural Networks: Tricks of the Trade
It is shown how nonlinear semi-supervised embedding algorithms popular for use with "shallow" learning techniques such as kernel methods can be easily applied to deep multi-layer architectures.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods, and applies this algorithm to deep or recurrent neural network training, providing numerical evidence for its superior optimization performance.