Learning with Random Learning Rates

@article{Blier2019LearningWR,
  title={Learning with Random Learning Rates},
  author={L{\'e}onard Blier and Pierre Wolinski and Yann Ollivier},
  journal={ArXiv},
  year={2019},
  volume={abs/1810.01322}
}
Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the 'All Learning Rates At Once' (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent… 
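The per-unit scheme in the abstract is simple enough to sketch. Below is a minimal NumPy illustration of the core idea for a single dense layer: each output unit keeps a fixed learning rate sampled log-uniformly over several orders of magnitude, and SGD scales each unit's gradient accordingly. The bounds, layer shapes, and function names are illustrative assumptions, not the authors' released implementation; the full method described in the paper also treats the output layer specially, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_log_uniform(n, lo=1e-5, hi=1e1):
    """Sample n learning rates log-uniformly between lo and hi
    (bounds are illustrative, not necessarily the paper's values)."""
    return np.exp(rng.uniform(np.log(lo), np.log(hi), size=n))

# One dense layer; each output unit keeps its own fixed learning rate.
n_in, n_out = 32, 16
W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_out, n_in))
b = np.zeros(n_out)
unit_lr = sample_log_uniform(n_out)            # one rate per output unit

def per_unit_sgd_step(W, b, grad_W, grad_b):
    # Row i of W and entry b[i] are updated with unit_lr[i].
    W -= unit_lr[:, None] * grad_W
    b -= unit_lr * grad_b
    return W, b

# Toy usage: one gradient step on a random least-squares problem.
x = rng.normal(size=(8, n_in))
y = rng.normal(size=(8, n_out))
err = x @ W.T + b - y                          # residuals, shape (8, n_out)
grad_W = err.T @ x / len(x)
grad_b = err.mean(axis=0)
W, b = per_unit_sgd_step(W, b, grad_W, grad_b)
```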
Disentangling Adaptive Gradient Methods from Learning Rates
TLDR
A "grafting" experiment is introduced which decouples an update's magnitude from its direction, finding that many existing beliefs in the literature may have arisen from insufficient isolation of the implicit schedule of step sizes.
Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility
TLDR
The infinite-width limit of deep feedforward neural networks whose weights are dependent and modelled via a mixture of Gaussian distributions is studied; it is shown that, in this regime, the weights are compressible and feature learning is possible.
Learning Rate Optimisation of an Image Processing Deep Convolutional Neural Network
TLDR
A mathematical model is developed to identify an optimal learning rate (OLR) for an image processing deep convolutional neural network (DCNN), and a model validation graph is extrapolated to illustrate the mathematical model's accuracy and the region of interest (ROI).
The Effect of Adaptive Learning Rate on the Accuracy of Neural Networks
TLDR
This work assesses the effect of different learning rates on CNN plant leaf disease classification and identifies the most appropriate one, achieving the highest accuracy with a learning rate of 0.001.
RNN-based Online Learning: An Efficient First-Order Optimization Algorithm with a Convergence Guarantee
TLDR
An efficient first-order training algorithm is introduced that is theoretically guaranteed to converge to the optimal network parameters and is truly online, in that it makes no assumptions about the learning environment in order to guarantee convergence.
Annealed Label Transfer for Face Expression Recognition
TLDR
A method for recognizing facial expressions using information from a pair of domains, one with labelled data and one with unlabelled data, departing from the traditional semi-supervised framework towards a transfer learning approach.
Implementation of a deep learning model for automated classification of Aedes aegypti (Linnaeus) and Aedes albopictus (Skuse) in real time
TLDR
This work proposed a highly accessible method to develop a deep learning model and implement the model for mosquito image classification by using hardware that could regulate the development process, and illustrated how to set up supervised deep convolutional neural networks (DCNNs) with hyperparameter adjustment.
Discrimination of malignant from benign thyroid lesions through neural networks using FTIR signals obtained from tissues
TLDR
NN-based tools were able to predict thyroid cancer based on infrared spectroscopy of tissues with a high level of diagnostic performance compared to the gold standard.
Music Visualization Based on Spherical Projection With Adjustable Metrics
TLDR
A method for normalizing data in MIDI files using 12-dimensional vector descriptors extracted from tonality is discussed, along with a novel technique for dimensionality reduction and visualization of the extracted music data via 3D projections.
A Novel Segmentation Method for Furnace Flame Using Adaptive Color Model and Hybrid-Coded HLO
TLDR
A novel segmentation method for furnace flames using an adaptive color model and hybrid-coded human learning optimization (AHcHLO) is proposed, and the proposed NACMM outperforms state-of-the-art flame segmentation approaches, providing high detection accuracy and a low false detection rate.
...

References

Showing 1-10 of 76 references
Training Deep Networks without Learning Rates Through Coin Betting
TLDR
This paper proposes a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting: it reduces the optimization process to a game of betting on a coin, yielding a learning-rate-free optimal algorithm.
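As a rough illustration of the betting reduction only (not the paper's coordinate-wise algorithm for network weights), here is a one-dimensional Krichevsky-Trofimov-style coin-betting learner with no learning rate; the step count, initial wealth, and bounded-gradient assumption are illustrative.

```python
def kt_coin_betting(subgrad, steps=500, initial_wealth=1.0):
    """1-D parameter-free optimization via coin betting: the iterate is a
    bet of a fraction of the current wealth, and the 'coin outcome' is the
    negative (sub)gradient. Assumes |subgradient| <= 1."""
    wealth, coin_sum, avg = initial_wealth, 0.0, 0.0
    for t in range(1, steps + 1):
        bet_fraction = coin_sum / t          # KT betting fraction
        x = bet_fraction * wealth            # current iterate = current bet
        c = -subgrad(x)                      # coin outcome
        wealth += c * x
        coin_sum += c
        avg += (x - avg) / t                 # averaged iterate carries the guarantee
    return avg

# Toy usage: minimize f(x) = |x - 0.3|; the averaged iterate moves toward 0.3.
print(kt_coin_betting(lambda x: 1.0 if x > 0.3 else -1.0))
```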
Online Learning Rate Adaptation with Hypergradient Descent
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a…
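A minimal sketch of the hypergradient idea as it is usually stated: the learning rate itself is adjusted by a gradient step, using the dot product of consecutive gradients as the hypergradient. The hyper-learning-rate beta and the quadratic toy problem are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def sgd_hd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=200):
    """SGD with hypergradient descent on the learning rate: alpha is nudged
    up when consecutive gradients point the same way, down otherwise."""
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        alpha += beta * np.dot(g, prev_grad)   # hypergradient update of alpha
        theta = theta - alpha * g              # ordinary SGD step
        prev_grad = g
    return theta, alpha

# Toy usage: f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta, alpha = sgd_hd(lambda t: t, theta=np.ones(5))
print(theta, alpha)
```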
No more pesky learning rates
TLDR
The proposed method to automatically adjust multiple learning rates so as to minimize the expected error at any one time relies on local gradient variations across samples, making it suitable for non-stationary problems.
The Marginal Value of Adaptive Gradient Methods in Machine Learning
TLDR
It is observed that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance, suggesting that practitioners should reconsider the use of adaptive methods to train neural networks.
Gradient-based Hyperparameter Optimization through Reversible Learning
TLDR
This work computes exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure, allowing the optimization of thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures.
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
TLDR
A novel algorithm is introduced, Hyperband, for hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
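The core resource-allocation loop is easy to sketch. Below is a successive-halving routine of the kind Hyperband calls repeatedly (the full algorithm also sweeps over several brackets trading off the number of configurations against the starting budget); the configuration sampler, evaluation function, and constants here are placeholders.

```python
import math
import random

def successive_halving(sample_config, evaluate, n=27, min_budget=1, eta=3):
    """Start n random configurations on a small budget, keep the best
    1/eta fraction, multiply their budget by eta, and repeat."""
    configs = [sample_config() for _ in range(n)]
    budget = min_budget
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: evaluate(c, budget))  # low loss first
        configs = scored[: max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]

# Toy usage: configurations are learning rates, the budget plays the role of
# epochs, and the synthetic loss simply prefers rates near 1e-3.
best_lr = successive_halving(
    sample_config=lambda: 10 ** random.uniform(-5.0, 0.0),
    evaluate=lambda lr, epochs: abs(math.log10(lr) + 3.0) / epochs,
)
print(best_lr)
```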
Practical Recommendations for Gradient-Based Training of Deep Architectures
Yoshua Bengio. In Neural Networks: Tricks of the Trade, 2012.
TLDR
Overall, this chapter describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks and closes with open questions about the training difficulties observed with deeper architectures.
Dropout: a simple way to prevent neural networks from overfitting
TLDR
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
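For completeness, a minimal forward-pass sketch of the mechanism. This uses the "inverted" scaling common in modern implementations; the original paper instead rescales weights at test time, which is equivalent in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: at training time, zero each activation with
    probability p and rescale the survivors by 1/(1-p) so the expected
    activation matches evaluation time."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# Toy usage: apply dropout to a batch of hidden activations.
h = rng.normal(size=(4, 8))
print(dropout(h, p=0.5))
```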
Neural Architecture Search with Reinforcement Learning
TLDR
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Understanding deep learning requires rethinking generalization
TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
...