Corpus ID: 237940810

SAU: Smooth activation function using convolution with approximate identities

@article{Biswas2021SAUSA,
  title={SAU: Smooth activation function using convolution with approximate identities},
  author={Koushik Biswas and Sandeep Kumar and Shilpak Banerjee and Ashish Kumar Pandey},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.13210}
}
  • Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey
  • Published 27 September 2021
  • Computer Science
  • ArXiv
Well-known activation functions like ReLU or Leaky ReLU are non-differentiable at the origin. Over the years, many smooth approximations of ReLU have been proposed using various smoothing techniques. We propose new smooth approximations of a non-differentiable activation function by convolving it with approximate identities. In particular, we present smooth approximations of Leaky ReLU and show that they outperform several well-known activation functions on various datasets and models. We call…
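
The construction described in the abstract (smoothing a non-differentiable activation by convolving it with an approximate identity) can be sketched concretely. The snippet below is a minimal illustration under the assumption that the approximate identity is a zero-mean Gaussian of width sigma; it is not the authors' exact SAU parameterization, and the function names and the choices of alpha and sigma are illustrative. For this choice the smoothed Leaky ReLU has a simple closed form, which the sketch checks against a direct numerical convolution.

```python
# Sketch only: Gaussian smoothing of Leaky ReLU as one instance of
# "convolution with an approximate identity"; not the paper's exact SAU form.
import math
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def gaussian_smoothed_leaky_relu(x, alpha=0.01, sigma=0.5):
    """Closed form of (LeakyReLU * N(0, sigma^2))(x):
    alpha*x + (1 - alpha) * (x * Phi(x/sigma) + sigma * phi(x/sigma))."""
    z = x / sigma
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)            # standard normal pdf
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))  # standard normal cdf
    return alpha * x + (1.0 - alpha) * (x * Phi + sigma * phi)

# Sanity check: a numerical convolution on a grid matches the closed form.
x = np.linspace(-6, 6, 1201)
dx = x[1] - x[0]
kernel = np.exp(-0.5 * (x / 0.5)**2)
kernel /= kernel.sum() * dx                      # normalize so it integrates to 1
numeric = np.convolve(leaky_relu(x), kernel, mode="same") * dx
closed = gaussian_smoothed_leaky_relu(x)
print(np.max(np.abs(numeric[200:-200] - closed[200:-200])))  # small away from the boundary
```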


References

Showing 1-10 of 48 references
ErfAct and PSerf: Non-monotonic smooth trainable Activation Functions
  • Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey
  • Computer Science
  • ArXiv
  • 2021
TLDR: This work proposes two novel non-monotonic smooth trainable activation functions, called ErfAct and PSerf, which significantly improve network performance compared to widely used activations such as ReLU, Swish, and Mish.
Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks
TLDR: This work eliminates the reliance on hand-picked fixed activation functions by using flexible parametric rational functions instead; the resulting Padé Activation Units (PAUs) can both approximate common activation functions and learn new ones while providing compact representations.
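
As a rough illustration of the idea (not the PAU paper's exact safe parameterization or initialization), a rational activation evaluates a learnable polynomial ratio P(x)/Q(x); one common "safe" variant keeps the denominator bounded away from zero, as assumed below.

```python
# Sketch of a rational (Padé-style) activation with an assumed "safe"
# denominator 1 + |b_1 x + ... + b_n x^n|; coefficients a, b would be learned.
import numpy as np

def pade_activation(x, a, b):
    """P(x)/Q(x) with P of degree len(a)-1 and Q(x) = 1 + |sum_k b[k] * x^(k+1)|."""
    num = np.polyval(a[::-1], x)                                   # a0 + a1*x + ... + am*x^m
    den = 1.0 + np.abs(np.polyval(np.concatenate(([0.0], b))[::-1], x))
    return num / den

# Illustrative coefficients only; in a network they are trained end to end.
x = np.linspace(-3, 3, 7)
a = np.array([0.0, 0.5, 0.3, 0.05])   # numerator, degree 3
b = np.array([0.2, 0.1])              # denominator, degree 2
print(pade_activation(x, a, b))
```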
Orthogonal-Padé Activation Functions: Trainable Activation functions for smooth and faster convergence in deep networks
TLDR: Two best candidates out of six orthogonal-Padé activations are identified, the safe Hermite-Padé (HP) activation functions HP-1 and HP-2, which learn faster and improve accuracy on standard deep learning datasets and models.
Searching for Activation Functions
TLDR: The experiments show that the best discovered activation function, $f(x) = x \cdot \text{sigmoid}(\beta x)$, named Swish, tends to work better than ReLU on deeper models across a number of challenging datasets.
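
A minimal sketch of the Swish formula quoted above; `beta` is fixed here, although the paper also treats it as a trainable parameter.

```python
# Swish: f(x) = x * sigmoid(beta * x); beta = 1 gives the SiLU special case.
import numpy as np

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))   # x * sigmoid(beta * x)

print(swish(np.array([-2.0, 0.0, 2.0])))
```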
Identity Mappings in Deep Residual Networks
TLDR: The propagation formulations behind the residual building blocks suggest that forward and backward signals can be directly propagated from one block to any other block when identity mappings are used as the skip connections and after-addition activation.
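
The propagation claim can be illustrated with a toy sketch: with identity skip connections, the output of a stack of residual blocks is the input plus a sum of residual branches, so the input is carried straight through. The residual branches `F` below are arbitrary stand-ins, not the paper's architecture.

```python
# Toy illustration of identity skip connections: x_{i+1} = x_i + F_i(x_i),
# so unrolling gives x_L = x_0 + F_0(x_0) + F_1(x_1) + ... and the identity
# term carries the input directly to the output (and gradients back).
import numpy as np

def residual_stack(x, residual_fns):
    for F in residual_fns:
        x = x + F(x)
    return x

fns = [lambda x, w=w: np.tanh(w * x) for w in (0.5, -0.3, 0.8)]  # stand-in branches
x0 = np.array([1.0, -2.0])
print(residual_stack(x0, fns))
```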
TanhSoft—Dynamic Trainable Activation Functions for Faster Learning and Better Performance
  • Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey
  • Computer Science
  • IEEE Access
  • 2021
TLDR: This work proposes three novel activation functions with learnable parameters, namely TanhSoft-1, TanhSoft-2, and TanhSoft-3, which are shown to outperform several well-known activation functions.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
TLDR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified linear unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
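
A minimal sketch of PReLU as summarized above; here the negative slope `a` is a plain argument (initialized at 0.25 in the paper), whereas in practice it is a learnable, typically per-channel, parameter.

```python
# PReLU: identical to Leaky ReLU except that the negative slope is learned.
import numpy as np

def prelu(x, a=0.25):
    return np.where(x >= 0, x, a * x)   # max(0, x) + a * min(0, x)

print(prelu(np.array([-2.0, 0.0, 2.0])))
```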
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
TLDR: The exponential linear unit (ELU) speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization than ReLUs and Leaky ReLUs on networks with more than five layers.
SGDR: Stochastic Gradient Descent with Warm Restarts
TLDR: This paper proposes a simple warm-restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks, and empirically studies its performance on the CIFAR-10 and CIFAR-100 datasets.
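
A sketch of the cosine-annealing-with-warm-restarts schedule proposed in SGDR, assuming the usual hyperparameter names (eta_min, eta_max, T_0, T_mult); framework implementations differ in bookkeeping details.

```python
# SGDR schedule: eta = eta_min + 0.5*(eta_max - eta_min)*(1 + cos(pi * T_cur / T_i)),
# where T_i is the length of the current restart period (optionally grown by T_mult).
import math

def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, T_0=10, T_mult=2):
    T_i, t = T_0, epoch
    while t >= T_i:          # locate the current restart period
        t -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t / T_i))

print([round(sgdr_lr(e), 4) for e in range(0, 30, 5)])
```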
Adam: A Method for Stochastic Optimization
TLDR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
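
A minimal sketch of the Adam update rule with the paper's default hyperparameters; the toy loop below is only a usage illustration, not a full optimizer implementation.

```python
# One Adam step: bias-corrected moving averages of the gradient (m) and its
# square (v) rescale the parameter update.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2*theta.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)   # driven from 5.0 toward the minimum at 0
```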