Corpus ID: 58014196

LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks

@article{Roy2019LiSHTNL,
  title={LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks},
  author={Swalpa Kumar Roy and Suvojit Manna and Shiv Ram Dubey and Bidyut B. Chaudhuri},
  journal={ArXiv},
  year={2019},
  volume={abs/1901.05894}
}
The activation function in a neural network is one of the important aspects that facilitates deep training by introducing non-linearity into the learning process. The proposed LiSHT activation function is an attempt to scale the non-linear hyperbolic tangent (Tanh) function by a linear function and thereby tackle the dying-gradient problem. Training and classification experiments are performed over the benchmark Car Evaluation, Iris, MNIST, CIFAR10, CIFAR100 and twitter140 datasets to show that…
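As a minimal sketch of the method named in the title and abstract (not the authors' reference code), LiSHT scales Tanh by its linear input, f(x) = x * tanh(x); the NumPy snippet below, assuming that form, also shows the derivative, which approaches +/-1 for large |x| instead of vanishing.

    import numpy as np

    def lisht(x):
        # LiSHT: linearly scaled hyperbolic tangent, f(x) = x * tanh(x).
        # The output is non-negative and the slope tends to +/-1 for large |x|,
        # so gradients do not die out the way saturated Tanh/Sigmoid gradients do.
        return x * np.tanh(x)

    def lisht_grad(x):
        # d/dx [x * tanh(x)] = tanh(x) + x * (1 - tanh(x)**2)
        t = np.tanh(x)
        return t + x * (1.0 - t**2)

    if __name__ == "__main__":
        x = np.linspace(-4.0, 4.0, 9)
        print(lisht(x))
        print(lisht_grad(x))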
RMAF: Relu-Memristor-Like Activation Function for Deep Learning
TLDR
An activation function called the ReLU-Memristor-like Activation Function (RMAF) is proposed to leverage the benefits of negative values in neural networks; owing to its efficiency, scalability and similarity to both ReLU and Swish, it can replace ReLU in any neural network.
PSNet: Parametric Sigmoid Norm Based CNN for Face Recognition
TLDR
This paper proposes the PSNet CNN model, which places a Parametric Sigmoid Norm (PSN) layer just before the final fully-connected layer to force the network to learn the visual features of difficult examples.
ATA: Attentional Non-Linear Activation Function Approximation for VLSI-Based Neural Networks
TLDR
Experimental results demonstrate that the ATA outperforms other state-of-the-art approximation methods in recognition accuracy, power and area.
Symmetrical Gaussian Error Linear Units (SGELUs)
TLDR
A novel neural network activation function, the Symmetrical Gaussian Error Linear Unit (SGELU), is proposed to obtain high performance by effectively combining the stochastic-regularizer property of the GELU with symmetrical characteristics.
Activation Functions: Comparison of trends in Practice and Research for Deep Learning
TLDR
This paper is the first to compile the trends in activation-function (AF) usage in practice against the research results reported in the deep learning literature to date.
Low Curvature Activations Reduce Overfitting in Adversarial Training
TLDR
It is shown that using activation functions with low (exact or approximate) curvature values has a regularization effect that significantly reduces both the standard and robust generalization gaps in adversarial training.
Deep Learning: Current State
TLDR
This paper reviews the current state of deep learning and includes a revision of basic concepts, such as the operations of feed forward and backpropagation, the use of convolution to extract features, the role of the loss function, and the optimization and learning processes.
FuSENet: fused squeeze-and-excitation network for spectral-spatial hyperspectral image classification
TLDR
The authors propose a bilinear fusion mechanism over different types of squeeze operations, such as global pooling and max pooling; experiments confirm the superiority of the proposed FuSENet method with respect to state-of-the-art methods.
A 3D-2D Convolutional Neural Network and Transfer Learning for Hyperspectral Image Classification
TLDR
A 3D-2D convolutional neural network and transfer learning model in which the early layers exploit 3D convolutions to model spectral-spatial information; it outperformed several state-of-the-art (SOTA) deep neural network-based approaches and standard classifiers.
A Review of Activation Function for Artificial Neural Network
TLDR
This work provides a review of the most common and recent activation functions used in the hidden layers of artificial neural networks.
...
...

References

SHOWING 1-10 OF 55 REFERENCES
Swish: a Self-Gated Activation Function
TLDR
The experiments show that Swish tends to work better than ReLU on deeper models across a number of challenging datasets, and its simplicity and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.
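For context, the self-gating this TLDR refers to is commonly written as swish(x) = x * sigmoid(beta * x), with beta = 1 in the simplest case; a minimal NumPy sketch under that assumption, not the paper's own code:

    import numpy as np

    def swish(x, beta=1.0):
        # Swish self-gates the input with a sigmoid: f(x) = x * sigmoid(beta * x).
        # With beta = 1 this is the form most often used as a drop-in ReLU replacement.
        return x / (1.0 + np.exp(-beta * x))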
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
TLDR
The "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
Complex-Valued Neural Networks With Nonparametric Activation Functions
TLDR
This paper proposes the first fully complex, nonparametric activation function for CVNNs, which is based on a kernel expansion with a fixed dictionary that can be implemented efficiently on vectorized hardware.
Learning Activation Functions to Improve Deep Neural Networks
TLDR
A novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent is designed, achieving state-of-the-art performance on CIFAR-10, CIFAR-100, and a benchmark from high-energy physics involving Higgs boson decay modes.
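One common way to parameterize such a per-neuron learned piecewise-linear activation is a sum of hinge terms with trainable coefficients a_s and breakpoints b_s; the exact parameterization in the cited paper may differ, so the sketch below is an illustrative assumption rather than a reproduction of it:

    import numpy as np

    def piecewise_linear_activation(x, a, b):
        # Learned piecewise-linear unit (assumed form):
        #   h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s])
        # a and b are per-neuron parameters trained by gradient descent with the weights.
        out = np.maximum(0.0, x)
        for a_s, b_s in zip(a, b):
            out = out + a_s * np.maximum(0.0, -x + b_s)
        return out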
Empirical Evaluation of Rectified Activations in Convolutional Network
TLDR
The experiments suggest that incorporating a non-zero slope for the negative part in rectified activation units could consistently improve the results, and they challenge the common belief that sparsity is the key to the good performance of ReLU.
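The "non-zero slope for the negative part" mentioned above corresponds to leaky/parametric rectifiers, f(x) = x for x >= 0 and alpha * x otherwise; a minimal sketch with a fixed slope (in the parametric variant, alpha would be a learned parameter):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # Rectifier with a small non-zero slope on the negative side instead of a hard zero,
        # so negative inputs still pass a (scaled) gradient.
        return np.where(x >= 0, x, alpha * x)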
diffGrad: An Optimization Method for Convolutional Neural Networks
TLDR
A novel optimizer, diffGrad, is proposed based on the difference between the present and the immediately preceding gradient; experiments show that diffGrad outperforms other optimizers and performs uniformly well for training CNNs with different activation functions.
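As a sketch of the idea summarized here, diffGrad scales an Adam-style step by a friction coefficient derived from the absolute difference between the current and previous gradient; the sigmoid form of that coefficient below is an assumption based on recollection of the paper, not a verified reproduction of its update rule:

    import numpy as np

    def diffgrad_step(theta, grad, prev_grad, m, v, t, lr=1e-3,
                      beta1=0.9, beta2=0.999, eps=1e-8):
        # Adam-style first- and second-moment estimates with bias correction.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad**2
        m_hat = m / (1.0 - beta1**t)
        v_hat = v / (1.0 - beta2**t)
        # Friction coefficient from the change in gradient (assumed sigmoid form):
        # close to 1 when the gradient changes quickly, close to 0.5 when it is nearly constant.
        xi = 1.0 / (1.0 + np.exp(-np.abs(prev_grad - grad)))
        theta = theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v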
OrthoMaps: an efficient convolutional neural network with orthogonal feature maps for tiny image classification
TLDR
A new model architecture is developed that has a minimal number of parameters and layers and is able to classify tiny images at much lower computation and memory cost, together with a random augmentation of the input data that prevents the model from overfitting.
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
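A minimal sketch of the Adam update described here, maintaining exponentially decaying estimates of the first and second gradient moments with bias correction (default hyperparameters beta1 = 0.9, beta2 = 0.999, eps = 1e-8 as commonly used):

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Update biased first (mean) and second (uncentered variance) moment estimates.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad**2
        # Bias-correct the estimates (t is the 1-based step count).
        m_hat = m / (1.0 - beta1**t)
        v_hat = v / (1.0 - beta2**t)
        # Parameter update scaled elementwise by the adaptive second-moment estimate.
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v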
...
...