APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning

@article{Kumar2022APTxBA,
  title={APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning},
  author={Ravin Kumar},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.06119}
}
  • Ravin Kumar
  • Published 10 September 2022
  • Computer Science
  • ArXiv
Activation functions introduce non-linearity in deep neural networks. This non-linearity helps the neural networks learn faster and more efficiently from the dataset. In deep learning, many activation functions have been developed and used depending on the type of problem statement. ReLU's variants, SWISH, and MISH are the go-to activation functions. The MISH function is considered to have similar or even better performance than SWISH, and much better performance than ReLU. In this paper, we propose an activation function…
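A minimal NumPy sketch of the proposed function, assuming the parameterization APTx(x) = (alpha + tanh(beta*x)) * gamma*x with defaults alpha = 1, beta = 1, gamma = 0.5 as given in the preprint (the abstract above is truncated before the definition appears):

import numpy as np

def aptx(x, alpha=1.0, beta=1.0, gamma=0.5):
    # APTx: (alpha + tanh(beta * x)) * gamma * x
    # (parameterization assumed from the preprint; with these defaults
    # the curve behaves like MISH/SWISH while needing only a single
    # tanh and two multiplies instead of a softplus followed by tanh)
    return (alpha + np.tanh(beta * x)) * gamma * x

print(aptx(np.linspace(-5.0, 5.0, 11)))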

References

Swish: a Self-Gated Activation Function

The experiments show that Swish tends to work better than ReLU on deeper models across a number of challenging datasets, and its simplicity and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.
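For reference, Swish gates its input by a sigmoid of an (optionally scaled) copy of itself; a minimal NumPy sketch of the function:

import numpy as np

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 gives the SiLU form
    return x / (1.0 + np.exp(-beta * x))

print(swish(np.array([-2.0, 0.0, 2.0])))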

Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks

This paper evaluates commonly used activation functions, such as Swish, ReLU, Sigmoid, and so forth, reviewing their properties, their pros and cons, and recommendations for when to apply each.

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

The "exponential linear unit" (ELU) speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
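The ELU itself is the identity for positive inputs and a saturating exponential for negative ones; a minimal sketch:

import numpy as np

def elu(x, alpha=1.0):
    # ELU: x for x > 0, alpha * (exp(x) - 1) otherwise; the negative
    # branch saturates at -alpha and pushes mean activations toward zero
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-3.0, -1.0, 0.0, 2.0])))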

Deep Learning using Rectified Linear Units (ReLU)

The use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN) is introduced: the activation of the penultimate layer is multiplied by weight parameters $\theta$ to obtain the raw class scores.
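A rough sketch of that recipe; the shapes, the random inputs, and the ten-class output here are illustrative assumptions, not taken from the paper:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of penultimate-layer activations h and weight
# parameters theta mapping them to 10 raw class scores.
h = np.maximum(0.0, rng.normal(size=(4, 64)))
theta = rng.normal(size=(64, 10))

scores = np.maximum(0.0, h @ theta)  # ReLU used as the classification function
pred = scores.argmax(axis=1)         # class with the largest score wins
print(pred)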

Deeply learned face representations are sparse, selective, and robust

This paper designs a high-performance deep convolutional network (DeepID2+) for face recognition that is learned with the identification-verification supervisory signal, and finds it is much more robust to occlusions, although occlusion patterns are not included in the training set.

Rectifier Nonlinearities Improve Neural Network Acoustic Models

This work explores the use of deep rectifier networks as acoustic models for the 300 hour Switchboard conversational speech recognition task, and analyzes hidden layer representations to quantify differences in how ReL units encode inputs as compared to sigmoidal units.
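The rectifier nonlinearities studied there are the standard rectified linear (ReL) unit and its leaky variant, which keeps a small slope for negative inputs; a minimal sketch (the 0.01 slope is a common default, assumed here):

import numpy as np

def relu(x):
    # Standard rectified linear unit: max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky variant: lets a small gradient through for negative inputs
    return np.where(x > 0, x, slope * x)

x = np.array([-1.0, 0.0, 2.0])
print(relu(x), leaky_relu(x))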

Rectified Linear Units Improve Restricted Boltzmann Machines

Restricted Boltzmann machines, originally developed with binary stochastic hidden units, are generalized to use rectified linear hidden units, which learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.

Mish: A Self Regularized Non-Monotonic Activation Function

Mish, a novel self-regularized non-monotonic activation function mathematically defined as $f(x) = x \tanh(\mathrm{softplus}(x))$, is proposed and validated experimentally on several well-known benchmarks against the best combinations of architectures and activation functions.
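That definition translates directly into code; a minimal sketch with a numerically stable softplus:

import numpy as np

def softplus(x):
    # log(1 + exp(x)), computed stably for large positive x
    return np.logaddexp(0.0, x)

def mish(x):
    # Mish: x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

print(mish(np.array([-2.0, 0.0, 2.0])))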

Deep sparse rectifier networks

  • In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, JMLR W&CP, 2011