Corpus ID: 204915984

A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs

Koyel Mukherjee, Alind Khare, Ashish Verma
Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime. In particular, for adversarial training or for training a newly synthesized model, the best learning rate regime is not known beforehand. We propose an automated algorithm for determining the learning rate trajectory that works across datasets and models, for both natural and adversarial training, without requiring any dataset- or model-specific tuning. It…
Towards Efficient and Data Agnostic Image Classification Training Pipeline for Embedded Systems
This work focuses on reviewing the latest augmentation and regularization methods for image classification and on exploring ways to automatically choose some of the most important hyperparameters: the total number of epochs, the initial learning rate, and its schedule.
Few Shot Activity Recognition Using Variational Inference
This work proposes a novel variational-inference-based architectural framework (HF-AR) for few-shot activity recognition that leverages volume-preserving Householder flows to learn a flexible posterior distribution over the novel classes, yielding better performance than state-of-the-art few-shot approaches for human activity recognition.
LRWR: Large-Scale Benchmark for Lip Reading in Russian language
A naturally distributed large-scale benchmark for lipreading in the Russian language, named LRWR, is introduced; it contains 235 classes and 135 speakers. A detailed description of the dataset collection pipeline and dataset statistics is provided.


Cyclical Learning Rates for Training Neural Networks
  • Leslie N. Smith
  • Computer Science
  • 2017 IEEE Winter Conference on Applications of Computer Vision (WACV)
  • 2017
A new method for setting the learning rate, named cyclical learning rates, is described, which practically eliminates the need to experimentally find the best values and schedule for the global learning rate.
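The triangular schedule at the heart of cyclical learning rates is simple to sketch. The function below is a minimal illustration; the parameter values (`base_lr`, `max_lr`, `step_size`) are hypothetical defaults for demonstration, not values prescribed by the paper.

```python
import math

def triangular_clr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclical learning rate: rises linearly from base_lr
    to max_lr over step_size iterations, then falls back, repeating."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# Start of the first cycle, its peak, and its end:
lr_start = triangular_clr(0)       # base_lr
lr_peak = triangular_clr(2000)     # max_lr
lr_end = triangular_clr(4000)      # back to base_lr
```

In practice such a schedule is queried once per iteration and the resulting value passed to the optimizer before each update step.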
Improved Regularization of Convolutional Neural Networks with Cutout
This paper shows that the simple regularization technique of randomly masking out square regions of the input during training, called cutout, can be used to improve the robustness and overall performance of convolutional neural networks.
Online Learning Rate Adaptation with Hypergradient Descent
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a…
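The core idea of hypergradient descent is to apply gradient descent to the learning rate itself, using the dot product of successive gradients as the hypergradient. A minimal sketch below, assuming plain SGD as the underlying optimizer; `beta` and the toy quadratic objective are illustrative choices, not values from the paper.

```python
import numpy as np

def sgd_hd(grad_fn, x0, lr0=0.01, beta=1e-4, steps=100):
    """SGD with hypergradient learning-rate adaptation: the LR is
    nudged up when successive gradients align (dot product > 0) and
    down when they oppose each other."""
    x, lr = np.asarray(x0, dtype=float), lr0
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        lr += beta * float(g @ g_prev)  # hypergradient step on the LR
        x -= lr * g                     # ordinary SGD step
        g_prev = g
    return x, lr

# Hypothetical toy example: minimize f(x) = ||x||^2 (gradient 2x).
x_opt, lr_final = sgd_hd(lambda x: 2 * x, [5.0, -3.0])
```

While the iterate is converging, successive gradients point the same way, so the learning rate grows; once it overshoots, the dot product turns negative and the rate is pulled back down, which is the self-correcting behavior the method relies on.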
The Marginal Value of Adaptive Gradient Methods in Machine Learning
It is observed that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance, suggesting that practitioners should reconsider the use of adaptive methods to train neural networks.
Towards Deep Learning Models Resistant to Adversarial Attacks
This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed, the trained networks exhibit good generalization, enabling the training of visual recognition models on internet-scale data with high efficiency.
No more pesky learning rates
The proposed method to automatically adjust multiple learning rates so as to minimize the expected error at any one time relies on local gradient variations across samples, making it suitable for non-stationary problems.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
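The Adam update itself is compact: bias-corrected estimates of the first and second gradient moments scale each coordinate's step individually. A sketch below using the paper's default moment hyperparameters (`b1=0.9`, `b2=0.999`, `eps=1e-8`); the learning rate and the toy quadratic objective are illustrative.

```python
import numpy as np

def adam_step(params, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient
    (m) and its square (v), bias-corrected, set a per-coordinate step."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad        # first moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)           # bias corrections for the
    v_hat = v / (1 - b2 ** t)           # zero-initialized averages
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, (m, v, t)

# Hypothetical toy usage: minimize f(x) = ||x||^2 (gradient 2x).
x = np.array([1.0, -2.0])
state = (np.zeros_like(x), np.zeros_like(x), 0)
for _ in range(1000):
    x, state = adam_step(x, 2 * x, state, lr=0.05)
```

Because each coordinate is divided by its own second-moment estimate, the effective step is roughly scale-invariant in the gradient magnitude, which is what makes a single learning rate work across very different parameter groups.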
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
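The widely used diagonal special case of this scheme (AdaGrad) divides each coordinate's step by the root of its accumulated squared gradients, so frequently updated coordinates receive smaller effective rates. A minimal sketch; the step size and toy usage are illustrative assumptions.

```python
import numpy as np

def adagrad_step(params, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad update: accum holds the running sum of
    squared gradients per coordinate, shrinking its effective rate."""
    accum = accum + grad ** 2
    params = params - lr * grad / (np.sqrt(accum) + eps)
    return params, accum

# Hypothetical toy usage: minimize f(x) = ||x||^2 (gradient 2x).
x = np.array([3.0, -1.0])
accum = np.zeros_like(x)
for _ in range(500):
    x, accum = adagrad_step(x, 2 * x, accum)
```

The monotonically growing accumulator is also AdaGrad's main limitation in deep learning: the effective rate decays like 1/sqrt(t), which motivated later exponentially-averaged variants.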
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
This work investigates the cause of this generalization drop in the large-batch regime and presents numerical evidence supporting the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions; as is well known, sharp minima lead to poorer generalization.