A Comparative Analysis of Gradient Descent-Based Optimization Algorithms on Convolutional Neural Networks

  • E. M. Dogo, O. J. Afolabi, N. I. Nwulu, B. Twala, C. O. Aigbavboa
  • Published 1 December 2018
  • Computer Science
  • 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS)
In this paper, we perform a comparative evaluation of the seven most commonly used first-order stochastic gradient-based optimization techniques in a simple Convolutional Neural Network (ConvNet) architectural setup. The investigated techniques are Stochastic Gradient Descent (SGD) in its vanilla form (vSGD), with momentum (SGDm), and with momentum and Nesterov (SGDm+n); Root Mean Square Propagation (RMSProp); Adaptive Moment Estimation (Adam); Adaptive Gradient (AdaGrad); and Adaptive Delta (AdaDelta)…
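The SGD variants named in the abstract can be sketched as simple update rules. The following is an illustrative implementation (function names, the quadratic demo objective, and the hyperparameters are my own choices, not taken from the paper):

```python
def sgd_step(w, grad, lr=0.1):
    # Vanilla SGD: step against the gradient.
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.1, mu=0.9):
    # SGD with momentum: accumulate an exponentially decaying velocity.
    v = mu * v - lr * grad
    return w + v, v

def nesterov_step(w, v, grad_fn, lr=0.1, mu=0.9):
    # Nesterov momentum: evaluate the gradient at the look-ahead point w + mu*v.
    v = mu * v - lr * grad_fn(w + mu * v)
    return w + v, v

# Demo: minimise f(w) = w^2 (gradient 2w) with each variant.
grad_fn = lambda w: 2.0 * w
w_sgd = 5.0
w_mom, v_mom = 5.0, 0.0
w_nag, v_nag = 5.0, 0.0
for _ in range(200):
    w_sgd = sgd_step(w_sgd, grad_fn(w_sgd))
    w_mom, v_mom = momentum_step(w_mom, v_mom, grad_fn(w_mom))
    w_nag, v_nag = nesterov_step(w_nag, v_nag, grad_fn)
```

All three variants drive the parameter toward the optimum at w = 0; the momentum variants differ only in where the gradient is evaluated.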

Papers citing this work

Comparative study of optimization techniques in deep learning: Application in the ophthalmology field

A comparative study of the stochastic, momentum, Nesterov, AdaGrad, RMSProp, AdaDelta, Adam, AdaMax and Nadam gradient descent algorithms is presented, based on the speed of convergence of the different algorithms as well as the mean absolute error of each algorithm in generating an optimization solution.
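A convergence-speed comparison of this kind can be reproduced on a toy objective. The sketch below contrasts plain gradient descent with momentum on a 1-D quadratic (the objective and all parameter choices are my own illustration, not the study's protocol):

```python
def run(update, steps=200):
    # Minimise f(w) = w^2 from w0 = 5 and report the final distance to the optimum.
    w, state = 5.0, 0.0
    for _ in range(steps):
        w, state = update(w, state)
    return abs(w)

def plain(w, _state, lr=0.01):
    # Plain gradient descent on f(w) = w^2 (gradient 2w).
    return w - lr * (2.0 * w), 0.0

def momentum(w, v, lr=0.01, mu=0.9):
    # Same learning rate, but with an accumulated velocity term.
    v = mu * v - lr * (2.0 * w)
    return w + v, v

err_plain = run(plain)
err_momentum = run(momentum)
```

At the same learning rate and step budget, the momentum variant ends much closer to the optimum, which is the kind of convergence-speed gap such studies measure.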

AG-SGD: Angle-Based Stochastic Gradient Descent

An algorithm is proposed that quantifies this deviation based on the angle between the past and the current gradients; the angle is then applied to calibrate these two gradients, generating a more accurate new gradient.
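The core quantity here, the angle between the past and current gradients, can be computed from their cosine. A minimal sketch (the calibration step itself is not reproduced, and `gradient_angle` is my own naming):

```python
import numpy as np

def gradient_angle(g_prev, g_curr):
    # Angle (in radians) between the previous and current gradient vectors,
    # via the cosine of the angle; clipping guards against round-off.
    cos = np.dot(g_prev, g_curr) / (np.linalg.norm(g_prev) * np.linalg.norm(g_curr))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Orthogonal gradients give an angle of pi/2.
theta = gradient_angle(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```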

Empirical Evaluation of the Effect of Optimization and Regularization Techniques on the Generalization Performance of Deep Convolutional Neural Network

This work explores the effect of the optimization algorithm and regularization techniques used on the final generalization performance of a model with a convolutional neural network (CNN) architecture, widely used in the field of computer vision.

Enhancing Performance of a Deep Neural Network: A Comparative Analysis of Optimization Algorithms

This paper experiments with seven of the most popular optimization algorithms, namely SGD, RMSProp, AdaGrad, AdaDelta, Adam, AdaMax and Nadam, on four unrelated datasets discretely, to conclude which one dispenses the best accuracy, efficiency and performance to the deep neural network.

Optimizing Stochastic Gradient Descent Using the Angle Between Gradients

This work proposes a method that reduces the aforementioned deviation by applying a preprocessing technique to the previous gradient prior to its use, improving the precision of the calibration terms and reducing the effects of the deviations.

Deep convolutional neural network-based system for fish classification

This paper presents an effective fish classification method based on convolutional neural networks (CNNs) that attained a best classification accuracy of 98.46%, among other results.

Meta-Optimization of Deep CNN for Image Denoising Using LSTM

This work investigates the application of the meta-optimization training approach to the DnCNN denoising algorithm to enhance its denoising capability, and reveals the prospects of this approach for improving DnCNN's denoising performance.

Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification

A spatial feature extraction technique using a deep convolutional neural network (CNN) for HSI classification is presented, and the superiority of the presented deep CNN model with the Adam optimizer is demonstrated.

Optimization Back Propagation Algorithm with Feature Engineering to Predict Building Hourly Consumption Energy

The results show that feature engineering improves prediction accuracy by 1–22 percent compared with omitting it, and that the best optimizer for backpropagation with feature engineering is the SGD optimizer.



References

Comparison of the stochastic gradient descent based optimization techniques

  • Ersan Yazan, M. F. Talu
  • Computer Science
    2017 International Artificial Intelligence and Data Processing Symposium (IDAP)
  • 2017
Five different approaches based on SGD for updating the θ parameters were investigated, and the advantages and disadvantages of each approach are compared in terms of the number of oscillations, the parameter update rate and the minimum cost reached.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
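The Adam update described above can be written out in a few lines. The sketch below follows the published algorithm's moment estimates and bias correction; the learning rate and the quadratic demo objective are my own choices:

```python
import math

def adam_step(w, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponentially decaying estimates of the first moment (mean) and
    # second moment (uncentered variance) of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction counteracts the zero initialisation of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Demo: minimise f(w) = w^2 (gradient 2w).
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
```

Because the update is scaled by the second-moment estimate, the effective step size is roughly bounded by the learning rate, so the iterate settles into a small neighbourhood of the optimum.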

A Study of Gradient-Based Algorithms

In the case of linear regression, it is found that SGD outperform GD as the data sets grow larger, and the precision of SGD does not necessarily improve when increasing the mini-batch size, while GD produces the most accurate model with a 73.39% success rate.

On the importance of initialization and momentum in deep learning

It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.
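The "slowly increasing schedule" for the momentum parameter can be sketched as a ramp toward a ceiling. The linear form below is a simplified stand-in for the paper's schedule; the warm-up length and endpoints are illustrative:

```python
def momentum_schedule(t, mu_start=0.5, mu_max=0.99, warmup=500):
    # Ramp the momentum coefficient linearly from mu_start to mu_max over
    # `warmup` steps, then hold it at mu_max for the rest of training.
    frac = min(t / warmup, 1.0)
    return mu_start + frac * (mu_max - mu_start)
```

Starting with a small momentum keeps the early, noisy updates conservative, while the later large momentum accelerates progress along low-curvature directions.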

Effects of Degradations on Deep Neural Network Architectures

This is the first study of the performance of CapsuleNet (CapsNet) and other state-of-the-art CNN architectures under different types of image degradation, and a network setup is proposed that can enhance the robustness of any CNN architecture against certain degradations.

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
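The AdaGrad method introduced in this work adapts the per-parameter learning rate from the accumulated squared gradients. A simplified scalar sketch (the learning rate and demo objective are illustrative):

```python
import math

def adagrad_step(w, grad, g2_sum, lr=1.0, eps=1e-8):
    # AdaGrad: divide the learning rate by the root of the running sum of
    # squared gradients, so frequently updated parameters take smaller steps.
    g2_sum = g2_sum + grad * grad
    w = w - lr * grad / (math.sqrt(g2_sum) + eps)
    return w, g2_sum

# Demo on f(w) = w^2: the effective step size shrinks as gradients accumulate.
w, g2 = 5.0, 0.0
for _ in range(5000):
    w, g2 = adagrad_step(w, 2.0 * w, g2)
```

The accumulator only grows, so AdaGrad's effective learning rate decays monotonically, which is the behaviour RMSProp and AdaDelta later modified with exponential averaging.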

Failures of Gradient-Based Deep Learning

This work describes four types of simple problems, for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties.

Comparison of Stochastic Optimization Algorithms in Hydrological Model Calibration

Ten stochastic optimization methods used to calibrate parameter sets for three hydrological models on 10 different basins revealed that the dimensionality and general fitness landscape characteristics of the model calibration problem are impo...

Unit Tests for Stochastic Optimization

A collection of unit tests for stochastic optimization is presented that rapidly evaluates an optimization algorithm on a small-scale, isolated, and well-understood difficulty, rather than in real-world scenarios where many such issues are entangled.
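In the same spirit, a minimal example of such a unit test checks an optimizer on an isolated, well-understood problem, here a noisy 1-D quadratic (this toy test is my own illustration, not one from the paper's suite):

```python
import random

def optimizer_unit_test(step_fn, w0=5.0, steps=500, noise=0.1, seed=0):
    # Run the optimizer on the stochastic objective f(w) = w^2, where each
    # gradient query is corrupted by Gaussian noise, and check that the
    # final iterate is much closer to the optimum than the starting point.
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        grad = 2.0 * w + rng.gauss(0.0, noise)  # noisy stochastic gradient
        w = step_fn(w, grad)
    return abs(w) < 0.1 * abs(w0)

# Plain SGD with a fixed learning rate should pass this test.
sgd_passes = optimizer_unit_test(lambda w, g: w - 0.05 * g)
```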

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits.