# APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning

```bibtex
@article{Kumar2022APTxBA,
  title={APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning},
  author={Ravin Kumar},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.06119}
}
```
• Ravin Kumar
• Published 10 September 2022
• Computer Science
• ArXiv
Activation functions introduce non-linearity into deep neural networks. This non-linearity helps neural networks learn faster and more efficiently from the dataset. In deep learning, many activation functions have been developed and are used depending on the type of problem statement. ReLU's variants, SWISH, and MISH are go-to activation functions. The MISH function is considered to have similar or even better performance than SWISH, and much better than ReLU. In this paper, we propose an activation function…
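For reference, the baseline activations the abstract compares against can be sketched in a few lines of NumPy. The Mish definition (x · tanh(softplus(x))) is given in the references below; the Swish form x · sigmoid(βx) with β = 1 is its common default. APTx itself is omitted here, since its definition is truncated above.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return np.maximum(0.0, x)

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); beta=1 is the common default."""
    return x / (1.0 + np.exp(-beta * x))

def mish(x):
    """Mish: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    return x * np.tanh(np.log1p(np.exp(x)))

# Compare the three on a small grid of inputs.
x = np.linspace(-3.0, 3.0, 7)
print(relu(x))
print(swish(x))
print(mish(x))
```

Unlike ReLU, both Swish and Mish are smooth and non-monotonic, dipping slightly below zero for negative inputs before saturating toward zero.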

## References


• 2017. The experiments show that Swish tends to work better than ReLU on deeper models across a number of challenging datasets, and its simplicity and similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.
• This paper evaluates commonly used activation functions, such as Swish, ReLU, Sigmoid, and so forth, followed by their properties, pros and cons, and recommendations for applying each formula.
• ICLR 2016. The "exponential linear unit" (ELU) speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
• The use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN) is introduced by taking the activation of the penultimate layer of a neural network and multiplying it by weight parameters $\theta$ to get the raw scores.
• 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). This paper designs a high-performance deep convolutional network (DeepID2+) for face recognition that is learned with an identification-verification supervisory signal, and finds it is much more robust to occlusions, although occlusion patterns are not included in the training set.
• This work explores the use of deep rectifier networks as acoustic models for the 300-hour Switchboard conversational speech recognition task, and analyzes hidden-layer representations to quantify differences in how rectified linear (ReL) units encode inputs compared to sigmoidal units.
• ICML 2010. Restricted Boltzmann machines were developed using binary stochastic hidden units; replacing them with rectified linear units learns features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
• Mish, a novel self-regularized non-monotonic activation function defined as f(x) = x·tanh(softplus(x)), is proposed and validated experimentally on several well-known benchmarks against the best combinations of architectures and activation functions.

### Deep sparse rectifier networks

• In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, JMLR W&CP, 2011.