Corpus ID: 196158220

Swish: a Self-Gated Activation Function

@article{Ramachandran2017SwishAS,
  title={Swish: a Self-Gated Activation Function},
  author={Prajit Ramachandran and Barret Zoph and Quoc V. Le},
  journal={arXiv: Neural and Evolutionary Computing},
  year={2017}
}
The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance. […] The simplicity of Swish and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.
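For readers who want the drop-in replacement in code, here is a minimal NumPy sketch of the activation described above; the paper defines Swish as f(x) = x · sigmoid(βx), and β = 1 gives the x · sigmoid(x) form quoted by the citing papers below.

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish: f(x) = x * sigmoid(beta * x); beta = 1 recovers x * sigmoid(x) (SiLU).
    return x * (1.0 / (1.0 + np.exp(-beta * x)))
```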
E-swish: Adjusting Activations to Different Network Depths
TLDR
This paper introduces a novel activation function, closely related to the activation $\mathrm{Swish}(x) = x \cdot \mathrm{sigmoid}(x)$ (Ramachandran et al., 2017), which it generalizes; the new function, called E-swish, outperforms many other well-known activations, including both ReLU and Swish.
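As a rough sketch of how E-swish generalizes Swish (to the best of my reading of the cited paper, it scales Swish by a fixed constant β tuned to network depth; the β = 1.5 below is only an illustrative value, not a recommendation from the paper):

```python
import numpy as np

def e_swish(x, beta=1.5):
    # E-swish: f(x) = beta * x * sigmoid(x); beta = 1 recovers Swish.
    # beta = 1.5 is an illustrative choice, not a value prescribed by the paper.
    return beta * x * (1.0 / (1.0 + np.exp(-x)))
```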
Mish: A Self Regularized Non-Monotonic Neural Activation Function
TLDR
A novel neural activation function called Mish is proposed; it is similar to Swish while providing a boost in performance, and its simple implementation makes it easy for researchers and developers to use Mish in their neural network models.
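Mish is commonly written as f(x) = x · tanh(softplus(x)); a minimal NumPy sketch, using logaddexp for a numerically stable softplus:

```python
import numpy as np

def mish(x):
    # Mish: f(x) = x * tanh(softplus(x)); np.logaddexp(0, x) computes log(1 + exp(x)) stably.
    return x * np.tanh(np.logaddexp(0.0, x))
```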
Benchmarking Comparison of Swish vs. Other Activation Functions on CIFAR-10 Imageset
TLDR
An experiment on the CIFAR-10 image set is described in which Swish appears not to outperform ReLU, in contrast to the original claim that simply replacing ReLUs with Swish units improves top-1 classification accuracy on ImageNet by 0.9% for Mobile NASNet-A and 0.6% for Inception-ResNet-v2.
Soft-Root-Sign Activation Function
TLDR
The proposed nonlinearity, namely "Soft-Root-Sign" (SRS), is smooth, non-monotonic, and bounded, making it more compatible with batch normalization (BN) and less sensitive to initialization.
Regularized Flexible Activation Function Combination for Deep Neural Networks
TLDR
A novel family of flexible activation functions that can replace sigmoid or tanh in LSTM cells is implemented, along with a new family formed by combining ReLU and ELUs, and two new regularisation terms based on prior-knowledge assumptions are introduced.
Deeper Learning with CoLU Activation
TLDR
CoLU is an activation function similar in properties to Swish and Mish, and it usually performs better than other functions on deeper neural networks, as shown by training networks on MNIST with an incrementally increasing number of convolutional layers.
Evolutionary optimization of deep learning activation functions
TLDR
This paper shows that evolutionary algorithms can discover novel activation functions that outperform ReLU, and these novel activation functions are shown to generalize, achieving high performance across tasks.
SinP[N]: A Fast Convergence Activation Function for Convolutional Neural Networks
TLDR
A new activation function for classification systems is proposed that exploits the properties of periodic functions (the derivative of a periodic function is itself periodic) and leads to very fast convergence even without a normalization layer.
Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks
TLDR
It is shown that oscillatory activation functions allow neurons to switch classification within the interior of the positive and negative half-spaces of the neuronal hyperplane, enabling complex decisions with fewer neurons, improved gradient flow, and a reduced network size.
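For concreteness, the Growing Cosine Unit is usually given as C(z) = z · cos(z); a minimal sketch of that oscillatory form:

```python
import numpy as np

def gcu(x):
    # Growing Cosine Unit: f(x) = x * cos(x), an oscillatory, non-monotonic activation.
    return x * np.cos(x)
```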
Improving the Performance of Deep Neural Networks Using Two Proposed Activation Functions
TLDR
The statistical study of the overall experiments on both classification categories indicates that the proposed activation functions are robust and superior to all competing activation functions in terms of average accuracy.

References

Showing 1-10 of 41 references
Empirical Evaluation of Rectified Activations in Convolutional Network
TLDR
The experiments suggest that incorporating a non-zero slope for the negative part of rectified activation units can consistently improve results, and they cast doubt on the common belief that sparsity is the key to ReLU's good performance.
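The "non-zero slope for the negative part" evaluated in that paper corresponds to leaky/parameterized ReLU variants; a minimal sketch of the leaky form (the 0.01 slope is just the conventional default, not a value taken from the paper):

```python
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    # Rectified unit with a small non-zero slope on the negative side.
    return np.where(x > 0, x, negative_slope * x)
```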
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
TLDR
The "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost.
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Neural Architecture Search with Reinforcement Learning
TLDR
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
TLDR
This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
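PReLU keeps the leaky-ReLU form but learns the negative slope; a minimal sketch where `a` stands for the learned (typically per-channel) coefficient:

```python
import numpy as np

def prelu(x, a):
    # PReLU: identity for x > 0, a * x otherwise, with `a` learned during training
    # (typically one coefficient per channel) rather than fixed in advance.
    return np.where(x > 0, x, a * x)
```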
Flexible Rectified Linear Units for Improving Convolutional Neural Networks
TLDR
FReLU improves the flexibility of ReLU through a learnable rectification point, which achieves faster convergence and higher performance, and its self-adaptation means it does not rely on strict assumptions.
Rethinking the Inception Architecture for Computer Vision
TLDR
This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
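For context on what the normalization referenced throughout this list does, here is a minimal training-mode sketch of batch normalization over the batch axis (inference uses running statistics instead, which this sketch omits):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Training-mode batch norm: normalize each feature over the batch axis
    # to zero mean / unit variance, then apply the learned scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```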
Self-Normalizing Neural Networks
TLDR
Self-normalizing neural networks (SNNs) are introduced to enable high-level abstract representations, and it is proved that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance, even in the presence of noise and perturbations.
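The self-normalizing property relies on the SELU activation, a scaled ELU with fixed constants derived in the paper (rounded below); a minimal sketch:

```python
import numpy as np

def selu(x, scale=1.0507, alpha=1.6733):
    # SELU: scale * ELU(x) with alpha and scale fixed to the paper's derived
    # constants (rounded here) so that activations converge to zero mean / unit variance.
    return scale * np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))
```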