Corpus ID: 4698616

Comparison of non-linear activation functions for deep neural networks on MNIST classification task

@article{Pedamonti2018ComparisonON,
  title={Comparison of non-linear activation functions for deep neural networks on MNIST classification task},
  author={Dabal Pedamonti},
  journal={ArXiv},
  year={2018},
  volume={abs/1804.02763}
}
Activation functions play a key role in neural networks, so it is fundamental to understand their advantages and disadvantages in order to achieve better performance. This paper first introduces common types of non-linear activation functions that are alternatives to the well-known sigmoid function and then evaluates their characteristics. Moreover, deeper neural networks are analysed, because they positively influence the final performance compared to shallower networks. They also… 
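
As a rough illustration of the comparison the abstract describes, the sketch below trains MNIST multilayer perceptrons of varying depth with a swappable non-linearity (sigmoid, ReLU, Leaky ReLU, ELU). The layer widths, optimizer and learning rate are illustrative assumptions, not the paper's reported setup.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # Candidate non-linearities; the exact set compared in the paper may differ.
    ACTIVATIONS = {
        "sigmoid": nn.Sigmoid,
        "relu": nn.ReLU,
        "leaky_relu": nn.LeakyReLU,
        "elu": nn.ELU,
    }

    def make_mlp(activation, depth=3, width=128):
        """Fully connected MNIST classifier with `depth` hidden layers."""
        layers, in_features = [], 28 * 28
        for _ in range(depth):
            layers += [nn.Linear(in_features, width), ACTIVATIONS[activation]()]
            in_features = width
        layers.append(nn.Linear(in_features, 10))  # 10 digit classes
        return nn.Sequential(nn.Flatten(), *layers)

    def train_and_evaluate(activation, depth, epochs=2):
        train_set = datasets.MNIST("data", train=True, download=True,
                                   transform=transforms.ToTensor())
        test_set = datasets.MNIST("data", train=False, download=True,
                                  transform=transforms.ToTensor())
        train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
        test_loader = DataLoader(test_set, batch_size=256)

        model = make_mlp(activation, depth)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumed hyperparameters
        loss_fn = nn.CrossEntropyLoss()

        for _ in range(epochs):
            for images, labels in train_loader:
                optimizer.zero_grad()
                loss_fn(model(images), labels).backward()
                optimizer.step()

        with torch.no_grad():
            correct = sum((model(x).argmax(1) == y).sum().item()
                          for x, y in test_loader)
        return correct / len(test_set)

    for name in ACTIVATIONS:
        for depth in (1, 3, 5):  # shallower vs. deeper networks
            print(name, depth, train_and_evaluate(name, depth))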

An overview of the activation functions used in deep learning algorithms

An overview of common and current activation functions used in deep learning algorithms is presented: sigmoid, hyperbolic tangent, ReLU, softplus and swish as fixed activation functions, and LReLU, ELU, SELU and RSigELU as trainable activation functions.
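
For reference, the fixed activation functions listed above have short closed forms; a minimal sketch (swish is written with its β parameter, and β = 1, giving the SiLU form, is an assumption made here for concreteness):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        return np.tanh(x)

    def relu(x):
        return np.maximum(0.0, x)

    def softplus(x):
        # Smooth approximation of ReLU: log(1 + exp(x)), written stably.
        return np.logaddexp(0.0, x)

    def swish(x, beta=1.0):
        # x * sigmoid(beta * x); beta = 1 recovers SiLU.
        return x * sigmoid(beta * x)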

A Methodology for Automatic Selection of Activation Functions to Design Hybrid Deep Neural Networks

This paper proposes a novel methodology to automatically select the best-possible activation function for each layer of a given DNN such that the overall DNN accuracy is improved, addressing the scientific challenge of exploring all the different configurations of activation functions.
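
The selection methodology itself is not reproduced here; purely as an illustration of what per-layer selection means, the sketch below greedily picks an activation for each hidden layer by validation accuracy on a small stand-in dataset. The candidate set, search strategy and model sizes are assumptions, not the paper's method.

    import torch
    import torch.nn as nn
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    CANDIDATES = {"relu": nn.ReLU, "tanh": nn.Tanh, "elu": nn.ELU, "sigmoid": nn.Sigmoid}

    # Small stand-in dataset so the sketch runs quickly (8x8 digits, not MNIST).
    X, y = load_digits(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    X_train = torch.tensor(X_train, dtype=torch.float32)
    X_val = torch.tensor(X_val, dtype=torch.float32)
    y_train, y_val = torch.tensor(y_train), torch.tensor(y_val)

    def evaluate(config):
        """Train a small MLP whose i-th hidden layer uses activation config[i]."""
        layers, in_features = [], X_train.shape[1]
        for name in config:
            layers += [nn.Linear(in_features, 64), CANDIDATES[name]()]
            in_features = 64
        model = nn.Sequential(*layers, nn.Linear(in_features, 10))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(200):  # full-batch training, enough for a rough ranking
            opt.zero_grad()
            loss_fn(model(X_train), y_train).backward()
            opt.step()
        return (model(X_val).argmax(1) == y_val).float().mean().item()

    # Greedy layer-by-layer selection: fix earlier layers, try candidates for the next one.
    config = []
    for layer in range(3):
        best = max(CANDIDATES, key=lambda name: evaluate(config + [name]))
        config.append(best)
    print("selected per-layer activations:", config)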

Activation functions in deep learning: A comprehensive survey and benchmark

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning

A comprehensive overview and survey is presented for AFs in neural networks for deep learning, covering different classes of AFs such as Logistic Sigmoid and Tanh based, ReLU based, ELU based, and Learning based.

Catalysis of neural activation functions: Adaptive feed-forward training for big data applications

The proposed catalysis function works over the Rectified Linear Unit, sigmoid, tanh and all other activation functions to provide adaptive feed-forward training, and uses vector components of the activation function to provide a variational flow of the input.

Effects of the Nonlinearity in Activation Functions on the Performance of Deep Learning Models

This work investigates model performance when using ReLU or L-ReLU as activation functions in different model architectures and data domains, and finds that L-ReLU is mostly effective when the number of trainable parameters in a model is relatively small.
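
For context, ReLU and L-ReLU differ only on the negative half-line; a minimal sketch with the commonly used slope of 0.01 (the slope value is an assumption, not taken from the study):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def leaky_relu(x, alpha=0.01):
        # Identical to ReLU for x >= 0; lets a small signal alpha * x through for x < 0.
        return np.where(x >= 0, x, alpha * x)

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(x))        # -> [0., 0., 0., 0.5, 2.]
    print(leaky_relu(x))  # -> [-0.02, -0.005, 0., 0.5, 2.]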

On the Impact of the Activation Function on Deep Neural Networks Training

A comprehensive theoretical analysis of the Edge of Chaos is given and it is shown that one can indeed tune the initialization parameters and the activation function in order to accelerate the training and improve the performance.
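
A hedged numerical sketch of the central quantity in this line of analysis, following the usual mean-field definitions (the activation, the σ_b² value and the grid of σ_w² values below are illustrative choices): the length map is iterated to its fixed point q*, and χ = σ_w² E[φ'(√q* Z)²] is estimated; χ = 1 marks the Edge of Chaos.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.normal(size=1_000_000)  # Monte Carlo samples of a standard Gaussian

    def chi(sigma_w2, sigma_b2, phi=np.tanh, dphi=lambda x: 1.0 - np.tanh(x) ** 2):
        # Fixed point of the length map q = sigma_b2 + sigma_w2 * E[phi(sqrt(q) Z)^2].
        q = 1.0
        for _ in range(100):
            q = sigma_b2 + sigma_w2 * np.mean(phi(np.sqrt(q) * z) ** 2)
        # chi = sigma_w2 * E[phi'(sqrt(q) Z)^2]; chi = 1 is the Edge of Chaos.
        return sigma_w2 * np.mean(dphi(np.sqrt(q) * z) ** 2)

    for sigma_w2 in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(sigma_w2, round(chi(sigma_w2, sigma_b2=0.05), 3))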

Effects of Nonlinearity and Network Architecture on the Performance of Supervised Neural Networks

This work investigates the performance of neural network models as a function of nonlinearity, using ReLU and L-ReLU activation functions in different model architectures and data domains, and examines the entropy profile of shallow neural networks as a way of representing their hidden-layer dynamics.

Investigative Study of the Effect of Various Activation Functions with Stacked Autoencoder for Dimension Reduction of NIDS using SVM

To investigate the aforementioned issues, linear and non-linear activation functions are compared for dimension reduction using a Stacked Autoencoder (SAE) applied to Network Intrusion Detection Systems (NIDS), and it is concluded that ELU performs with low computational overhead and a negligible difference in accuracy compared to other activation functions.
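
A rough sketch of the pipeline that summary describes: an autoencoder with ELU in the encoder reduces the feature dimension, and an SVM classifies in the reduced space. The dataset below is a small stand-in (scikit-learn digits), and the layer sizes and training schedule are assumptions; the study's NIDS data and stacked, layer-wise pre-training are not reproduced.

    import torch
    import torch.nn as nn
    from sklearn.datasets import load_digits
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    # Stand-in data; the study uses network-intrusion traffic features instead.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    X_train_t = torch.tensor(X_train, dtype=torch.float32)
    X_test_t = torch.tensor(X_test, dtype=torch.float32)

    n_features, n_reduced = X_train.shape[1], 16

    encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ELU(),
                            nn.Linear(32, n_reduced), nn.ELU())
    decoder = nn.Sequential(nn.Linear(n_reduced, 32), nn.ELU(),
                            nn.Linear(32, n_features))
    autoencoder = nn.Sequential(encoder, decoder)

    opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(500):  # reconstruction training
        opt.zero_grad()
        loss_fn(autoencoder(X_train_t), X_train_t).backward()
        opt.step()

    # Classify in the reduced space with an SVM, as in the summarised pipeline.
    with torch.no_grad():
        Z_train = encoder(X_train_t).numpy()
        Z_test = encoder(X_test_t).numpy()
    clf = SVC().fit(Z_train, y_train)
    print("test accuracy on reduced features:", clf.score(Z_test, y_test))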
...

References


Self-Normalizing Neural Networks

Self-normalizing neural networks (SNNs) are introduced to enable high-level abstract representations, and it is proved that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance, even in the presence of noise and perturbations.
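
The SELU non-linearity behind this is a scaled ELU with fixed constants λ ≈ 1.0507 and α ≈ 1.6733; a minimal sketch of the function and a rough check of the zero-mean/unit-variance fixed point:

    import numpy as np

    # Constants from Klambauer et al. (2017), rounded.
    SELU_LAMBDA = 1.0507
    SELU_ALPHA = 1.6733

    def selu(x):
        # Scaled ELU: the fixed (lambda, alpha) pair makes zero-mean, unit-variance
        # activations a stable fixed point of the forward pass.
        return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

    # Rough check of the self-normalising property on Gaussian pre-activations.
    z = np.random.default_rng(0).normal(size=1_000_000)
    print(selu(z).mean(), selu(z).var())  # both close to 0 and 1 respectively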

Understanding the difficulty of training deep feedforward neural networks

The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.
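
The same paper proposes the now-standard "Xavier"/Glorot initialisation; a minimal sketch of its normalised-uniform form (the fan sizes below are arbitrary examples):

    import numpy as np

    def glorot_uniform(fan_in, fan_out, rng=None):
        # Normalised initialisation from Glorot & Bengio (2010):
        # W ~ U[-sqrt(6 / (fan_in + fan_out)), +sqrt(6 / (fan_in + fan_out))],
        # chosen so activation and gradient variances stay roughly constant across layers.
        rng = rng or np.random.default_rng(0)
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))

    W = glorot_uniform(784, 256)
    print(W.var(), 2.0 / (784 + 256))  # empirical variance vs. the implied target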

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

The "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.

Rectifier Nonlinearities Improve Neural Network Acoustic Models

This work explores the use of deep rectifier networks as acoustic models for the 300 hour Switchboard conversational speech recognition task, and analyzes hidden layer representations to quantify differences in how ReL units encode inputs as compared to sigmoidal units.
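
One crude proxy for the encoding difference mentioned here (not the paper's own analysis): rectified units output exact zeros for roughly half of Gaussian pre-activations, so their codes are sparse, while sigmoid units never output exact zeros.

    import numpy as np

    rng = np.random.default_rng(0)
    pre_activations = rng.normal(size=100_000)  # stand-in hidden pre-activations

    relu_out = np.maximum(0.0, pre_activations)
    sigmoid_out = 1.0 / (1.0 + np.exp(-pre_activations))

    print("exact zeros, ReLU:   ", np.mean(relu_out == 0.0))     # about 0.5
    print("exact zeros, sigmoid:", np.mean(sigmoid_out == 0.0))  # 0.0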

Rectified Linear Units Improve Restricted Boltzmann Machines

Restricted Boltzmann machines were originally developed using binary stochastic hidden units; replacing these with rectified linear units learns features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.

The MNIST Dataset Of Handwritten Digits (Images), 1999
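
For completeness, two common present-day entry points for loading this dataset (the library choices are suggestions, not part of the original reference):

    # Option 1: torchvision (60,000 training / 10,000 test images).
    from torchvision import datasets, transforms
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())

    # Option 2: scikit-learn via OpenML (flat 784-dimensional vectors, all 70,000 digits).
    from sklearn.datasets import fetch_openml
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

    print(len(train_set), X.shape)  # 60000 (70000, 784)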