Corpus ID: 221293261

Channel-Directed Gradients for Optimization of Convolutional Neural Networks

@article{Lao2020ChannelDirectedGF,
  title={Channel-Directed Gradients for Optimization of Convolutional Neural Networks},
  author={Dong Lao and Peihao Zhu and Peter Wonka and Ganesh Sundaramoorthi},
  journal={ArXiv},
  year={2020},
  volume={abs/2008.10766}
}
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. The method requires only simple processing of existing stochastic gradients, can be used in conjunction with any optimizer, and has only a linear overhead (in the number of parameters) compared to computation of the stochastic gradient. The method works by computing the gradient of the loss function with respect to output-channel…
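The truncated abstract above states only that the method applies a simple, per-output-channel processing step to the stochastic gradient with linear overhead. As an illustration only, here is a minimal NumPy sketch of what such per-channel gradient processing can look like; the function name and the placeholder per-channel normalization are assumptions, not the paper's actual channel-directed construction.

```python
import numpy as np

def channel_directed_process(grad, eps=1e-8):
    """Process a conv-weight gradient independently per output channel.

    grad has shape (C_out, C_in, kH, kW). Each output-channel slice is
    flattened and transformed on its own, so the cost is linear in the
    number of parameters. The per-channel normalization below is only a
    placeholder for the paper's channel-directed transform.
    """
    c_out = grad.shape[0]
    flat = grad.reshape(c_out, -1)                       # one row per output channel
    norms = np.linalg.norm(flat, axis=1, keepdims=True)  # per-channel gradient norms
    return (flat / (norms + eps)).reshape(grad.shape)

# Example: process the stochastic gradient of a 64x32x3x3 convolution kernel,
# then hand the result to any optimizer (SGD, Adam, ...) in place of the raw gradient.
g = np.random.randn(64, 32, 3, 3)
g_cd = channel_directed_process(g)
```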
1 Citation
CONetV2: Efficient Auto-Channel Size Optimization for CNNs
TLDR
A Channel Size Visualizer applied to a standard ResNet34 architecture gives the user insight into the topology and intricacies of the neural network, helping to understand the importance of channel dependencies for the design of channel-size optimization algorithms.

References

SHOWING 1-10 OF 35 REFERENCES
Laplacian Smoothing Gradient Descent
TLDR
A class of very simple modifications of gradient descent and stochastic gradient descent can dramatically reduce the variance, allow larger step sizes, and improve generalization accuracy when applied to a large variety of machine learning problems.
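For context, the modification the TLDR describes can be read as pre-multiplying the stochastic gradient by the inverse of $I + \sigma L$, with $L$ a 1-D periodic discrete Laplacian, solved cheaply with an FFT. The snippet below is a minimal NumPy sketch under that reading, with $\sigma$ as the smoothing parameter.

```python
import numpy as np

def laplacian_smooth(grad, sigma=1.0):
    """Return g_s solving (I + sigma*L) g_s = grad, with L the 1-D periodic
    discrete Laplacian (stencil [-1, 2, -1]). The system is circulant, so it
    diagonalizes in the Fourier basis and the solve costs O(n log n)."""
    g = grad.ravel()
    n = g.size
    k = np.arange(n)
    eig = 1.0 + sigma * (2.0 - 2.0 * np.cos(2.0 * np.pi * k / n))  # eigenvalues of I + sigma*L
    g_s = np.real(np.fft.ifft(np.fft.fft(g) / eig))
    return g_s.reshape(grad.shape)

# The smoothed gradient typically has lower variance than the raw one,
# which is what permits the larger step sizes mentioned above.
g = np.random.randn(1000)
print(np.var(laplacian_smooth(g, sigma=2.0)), np.var(g))
```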
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
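The update rule is short enough to restate; the NumPy sketch below follows the standard description of Adam (exponential moving averages of the gradient and its square, with bias correction) and the usual default hyperparameters.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running estimates of the first and second
    moments of the gradient; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)            # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimize ||w - 1||^2.
w = np.zeros(10); m = np.zeros(10); v = np.zeros(10)
for t in range(1, 1001):
    g = 2.0 * (w - 1.0)
    w, m, v = adam_step(w, g, m, v, t)
```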
U-Net: Convolutional Networks for Biomedical Image Segmentation
TLDR
It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Practical Riemannian Neural Networks
We provide the first experimental results on non-synthetic datasets for the quasi-diagonal Riemannian gradient descents for neural networks introduced in [Ollivier, 2015]. These include the MNIST, …
Optimization Methods for Large-Scale Machine Learning
TLDR
A major theme of this study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter, leading to a discussion about the next generation of optimization methods for large-scale machine learning.
Implicit Bias of Gradient Descent on Linear Convolutional Networks
We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in…
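A hedged restatement of that claim, in our notation rather than the paper's: for linearly separable data and exponential-type losses, the predictor that gradient descent converges to in direction is, as we understand the result, a stationary point of

```latex
\min_{\beta}\ \bigl\|\widehat{\beta}\bigr\|_{2/L}
\quad\text{s.t.}\quad y_i\,\langle x_i,\beta\rangle \ge 1 \ \ \forall i,
```

where $\widehat{\beta}$ is the discrete Fourier transform of $\beta$; this is the $\ell_{2/L}$ bridge penalty in the frequency domain referred to above (for $L=1$ it reduces to the familiar maximum-margin $\ell_2$ predictor, while larger $L$ promotes sparsity in the frequency domain).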
Generalized Gradients: Priors on Minimization Flows
TLDR
This paper investigates the relevance of using other inner products, yielding other gradient descents, as well as minimizing flows not deriving from any inner product, and presents an extension of the definition of the gradient toward more general priors.
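The underlying observation is a standard one and worth a one-line sketch (our notation, not the paper's): the gradient of an energy $E$ is only defined relative to an inner product, via the first variation

```latex
dE(u)\,v \;=\; \langle \nabla_{B}E(u),\, v\rangle_{B}\quad\forall v,
\qquad
\langle u, v\rangle_{B} \;=\; \langle Bu,\, v\rangle_{L^{2}}
\;\;\Longrightarrow\;\;
\nabla_{B}E(u) \;=\; B^{-1}\,\nabla_{L^{2}}E(u),
```

so choosing a different positive-definite operator $B$, i.e. a different prior, changes the descent direction and the minimizing flow without changing the energy or its first variation.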
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Decoupled Weight Decay Regularization
TLDR
This work proposes a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function, and provides empirical evidence that this modification substantially improves Adam's generalization performance.
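The modification amounts to applying the decay term directly to the weights rather than adding $\lambda w$ to the gradient that feeds the adaptive moment estimates. A minimal NumPy sketch of the decoupled update, reusing the Adam notation above and the usual defaults:

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """Adam with decoupled weight decay: the decay is applied to the weights
    directly and never passes through the moment estimates m and v, unlike
    plain L2 regularization (which would add weight_decay * w to g)."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```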
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
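The core idea is small enough to sketch: a block learns a residual $F(x)$ and adds it to an identity shortcut, so the layer outputs $x + F(x)$. Below is a toy NumPy illustration; the shapes and the two-layer residual function are our choices, not the paper's exact blocks.

```python
import numpy as np

def residual_block(x, f):
    """Identity shortcut plus learned residual: output is x + F(x),
    so the block only needs to learn the deviation from the identity."""
    return x + f(x)

# Toy residual function: two linear maps with a ReLU in between.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 64)) * 0.01
W2 = rng.standard_normal((64, 64)) * 0.01
f = lambda x: W2 @ np.maximum(W1 @ x, 0.0)

x = rng.standard_normal(64)
y = residual_block(x, f)
```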