• Corpus ID: 125617073

Gaussian Error Linear Units (GELUs)

@article{Hendrycks2016GaussianEL,
  title={Gaussian Error Linear Units (GELUs)},
  author={Dan Hendrycks and Kevin Gimpel},
  journal={arXiv: Learning},
  year={2016}
}
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gates inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered… 
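To make the contrast concrete, here is a minimal sketch (in plain Python) of the exact GELU $x\Phi(x)$ next to ReLU's sign gating, together with the tanh-based approximation that is widely used in practice; the sample inputs in the demo are arbitrary.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard Gaussian CDF.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Widely used tanh approximation of the GELU.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def relu(x: float) -> float:
    # ReLU gates inputs by their sign: x * 1_{x > 0}.
    return x if x > 0.0 else 0.0

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  tanh approx={gelu_tanh(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, the GELU is smooth and takes small negative values for moderately negative inputs, since the input is weighted by $\Phi(x)$ rather than hard-gated.
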
Citations

Symmetrical Gaussian Error Linear Units (SGELUs)
TLDR
A novel neural network activation function, called the Symmetrical Gaussian Error Linear Unit (SGELU), is proposed to obtain high performance by effectively integrating the stochastic-regularizer property of the GELU with symmetrical characteristics.
Two-argument activation functions learn soft XOR operations like cortical neurons
TLDR
This work emulates more biologically realistic neurons by learning canonical activation functions with two input arguments, analogous to basal and apical dendrites, in a network-in-network architecture where each neuron is modeled as a multilayer perceptron with two inputs and a single output.
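As a toy illustration of the idea in the entry above, the sketch below models a single "neuron" as a two-input, one-output MLP and hand-sets its weights so that it computes a soft XOR of its two arguments; the layer size and weight values are illustrative assumptions, not the paper's learned architecture.

```python
import numpy as np

def two_arg_unit(a: float, b: float, W1, b1, W2, b2) -> float:
    # A "neuron" with two input arguments (analogous to basal and apical drive),
    # modeled as a tiny MLP with one tanh hidden layer and a single output.
    h = np.tanh(np.array([a, b]) @ W1 + b1)
    return float(h @ W2 + b2)

# Hand-chosen weights realizing a soft XOR: hidden unit 1 acts like OR(a, b),
# hidden unit 2 like AND(a, b), and the output takes their difference.
W1 = np.array([[4.0, 4.0], [4.0, 4.0]])
b1 = np.array([-2.0, -6.0])
W2 = np.array([0.5, -0.5])
b2 = 0.0

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(two_arg_unit(a, b, W1, b1, W2, b2), 3))
```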
SinLU: Sinu-Sigmoidal Linear Unit
TLDR
The proposed SinLU incorporates a sine wave, allowing new functionality over traditional linear-unit activations; two trainable parameters control the participation of the sinusoidal component in the function and help to achieve an easily trainable, fast-converging function.
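One plausible form consistent with this summary is sketched below: a sine term with trainable amplitude a and frequency b, gated by a sigmoid. The exact formula is an assumption made for illustration, not necessarily the paper's definition.

```python
import math

def sinlu(x: float, a: float = 1.0, b: float = 1.0) -> float:
    # Hypothetical sinu-sigmoidal unit: (x + a*sin(b*x)) * sigmoid(x),
    # where a and b stand in for the two trainable parameters mentioned above.
    return (x + a * math.sin(b * x)) / (1.0 + math.exp(-x))
```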
ErfAct and PSerf: Non-monotonic smooth trainable Activation Functions
TLDR
This work proposes two novel non-monotonic smooth trainable activation functions, called ErfAct and PSerf, and suggests that the proposed functions improve network performance compared to widely used activations such as ReLU, Swish, and Mish.
An Efficient Asymmetric Nonlinear Activation Function for Deep Neural Networks
TLDR
To demonstrate the effectiveness of this function in the field of object detection, the proposed activation function is compared with several state-of-the-art activation functions on typical backbone networks such as ResNet and DSPDarkNet.
Learning a Single Neuron for Non-monotonic Activation Functions
TLDR
This work establishes learnability of a single neuron $x \mapsto \sigma(w^\top x)$ trained with gradient descent (GD), without assuming monotonicity of $\sigma$, when the input distribution is the standard Gaussian, and shows that mild conditions on $\sigma$ are enough to guarantee learnability in polynomial time and with polynomially many samples.
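To make the setting concrete, here is a toy sketch of the problem described above: fitting a single neuron $\sigma(w^\top x)$ by plain gradient descent on the squared loss, with standard Gaussian inputs. The Swish-like non-monotonic activation, the teacher weights, and the step size are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

def act(z):
    # Swish, x * sigmoid(x): a simple non-monotonic activation used purely for illustration.
    return z / (1.0 + np.exp(-z))

def act_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s + z * s * (1.0 - s)

rng = np.random.default_rng(0)
d, n, lr, steps = 5, 4096, 0.05, 2000
w_star = rng.normal(size=d)           # illustrative "teacher" weights
X = rng.normal(size=(n, d))           # standard Gaussian inputs, as in the summary above
y = act(X @ w_star)

w = 0.01 * rng.normal(size=d)
for _ in range(steps):
    z = X @ w
    grad = X.T @ ((act(z) - y) * act_grad(z)) / n   # gradient of half the mean squared loss
    w -= lr * grad

print("final mean squared error:", float(np.mean((act(X @ w) - y) ** 2)))
```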
LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks
TLDR
The proposed LiSHT activation function is an attempt to scale the non-linear Hyperbolic Tangent (Tanh) function by a linear function and tackle the dying gradient problem.
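Reading the summary literally, the sketch below scales tanh by the identity (linear) function, i.e. x·tanh(x); the exact form is inferred from the TLDR and should be treated as an assumption.

```python
import math

def lisht(x: float) -> float:
    # Tanh scaled by a linear (identity) function: x * tanh(x); non-negative and symmetric in x.
    return x * math.tanh(x)
```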
Introducing the DOME Activation Functions
TLDR
A novel non-linear activation function is introduced that spontaneously induces class-compactness and regularization in the embedding space of neural networks, and it is shown that models using the function exhibit extra robustness against adversarial attacks.
An Analysis of State-of-the-art Activation Functions For Supervised Deep Neural Network
This paper provides an analysis of state-of-the-art activation functions with respect to supervised classification with deep neural networks. These activation functions include the Rectified Linear…
Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay
TLDR
It is proved that for a family of non-smooth activation functions, including ReLU, approximating any single neuron with random features suffers from the curse of dimensionality, providing an explicit separation of expressiveness between neural networks and random feature models.

References

SHOWING 1-10 OF 26 REFERENCES
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
TLDR
The "exponential linear unit" (ELU) speeds up learning in deep neural networks and leads to higher classification accuracies, with significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
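For reference, a minimal sketch of the ELU with the conventional default α = 1: identity for positive inputs and a smooth exponential saturation toward -α for negative inputs.

```python
import math

def elu(x: float, alpha: float = 1.0) -> float:
    # Identity for x > 0; saturates smoothly to -alpha as x -> -infinity.
    return x if x > 0.0 else alpha * (math.exp(x) - 1.0)
```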
Adaptive dropout for training deep neural networks
TLDR
A method called 'standout' is described in which a binary belief network is overlaid on a neural network and used to regularize its hidden units by selectively setting activities to zero; it achieves lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.
Rectified Linear Units Improve Restricted Boltzmann Machines
TLDR
Restricted Boltzmann machines, originally developed with binary stochastic hidden units, learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset when those binary units are replaced with rectified linear units.
Natural Neural Networks
TLDR
A specific example is presented that employs a simple and efficient reparametrization of the neural network weights, implicitly whitening the representation obtained at each layer while preserving the feed-forward computation of the network.
Residual Networks are Exponential Ensembles of Relatively Shallow Networks
TLDR
This work introduces a novel interpretation of residual networks showing they are exponential ensembles, and suggests that in addition to describing neural networks in terms of width and depth, there is a third dimension: multiplicity, the size of the implicit ensemble.
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
TLDR
This work proposes zoneout, a novel method for regularizing RNNs that uses random noise to train a pseudo-ensemble and thereby improve generalization; an empirical investigation of various RNN regularizers finds that zoneout gives significant performance improvements across tasks.
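A minimal sketch of the per-step update the title describes: a random mask decides, unit by unit, whether the hidden state keeps its previous value or takes the newly computed one. The recurrent cell itself is abstracted away, and the zoneout probability and the test-time expectation shown are illustrative assumptions.

```python
import numpy as np

def zoneout_step(h_prev: np.ndarray, h_new: np.ndarray, zoneout_prob: float = 0.15,
                 training: bool = True, rng=None) -> np.ndarray:
    # Randomly preserve hidden activations: with probability zoneout_prob a unit keeps
    # its previous value h_prev; otherwise it takes the freshly computed value h_new.
    if not training:
        # At test time, use the expected mixture (an assumption in the spirit of dropout rescaling).
        return zoneout_prob * h_prev + (1.0 - zoneout_prob) * h_new
    if rng is None:
        rng = np.random.default_rng()
    mask = (rng.random(h_prev.shape) < zoneout_prob).astype(h_prev.dtype)
    return mask * h_prev + (1.0 - mask) * h_new
```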
Dropout: a simple way to prevent neural networks from overfitting
TLDR
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
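A minimal sketch of a dropout layer in the commonly used "inverted" form, where surviving units are rescaled during training so that the test-time layer is simply the identity (the paper's original formulation instead rescales weights at test time).

```python
import numpy as np

def dropout(activations: np.ndarray, drop_prob: float = 0.5, training: bool = True,
            rng=None) -> np.ndarray:
    # Inverted dropout: zero each unit with probability drop_prob and rescale the
    # survivors by 1/keep_prob so expected activations are unchanged at test time.
    if not training or drop_prob == 0.0:
        return activations
    if rng is None:
        rng = np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    mask = (rng.random(activations.shape) < keep_prob).astype(activations.dtype)
    return activations * mask / keep_prob
```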
Neural networks and physical systems with emergent collective computational abilities. (J. Hopfield, Proceedings of the National Academy of Sciences of the United States of America, 1982)
TLDR
A model of a system having a large number of simple equivalent components, based on aspects of neurobiology but readily adapted to integrated circuits, produces a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size.
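A minimal sketch of the content-addressable recall described above: binary (±1) memories stored with a Hebbian outer-product rule and retrieved from a corrupted cue by repeated sign updates. The network size, number of patterns, corruption level, and synchronous update schedule are illustrative choices (the original model updates units asynchronously).

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_patterns = 64, 3
patterns = rng.choice([-1, 1], size=(n_patterns, n))   # illustrative stored memories

# Hebbian storage rule with zeroed self-connections.
W = (patterns.T @ patterns) / n
np.fill_diagonal(W, 0.0)

# Cue: the first memory with a third of its entries randomized.
cue = patterns[0].copy()
cue[: n // 3] = rng.choice([-1, 1], size=n // 3)

state = cue
for _ in range(10):                                     # synchronous sign updates for brevity
    state = np.where(W @ state >= 0, 1, -1)

print("overlap with stored memory:", int(state @ patterns[0]), "of", n)
```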
Deep Residual Networks with Exponential Linear Unit
TLDR
This paper proposes to replace the combination of ReLU and Batch Normalization with the Exponential Linear Unit (ELU) in Residual Networks, and shows that this not only speeds up learning in Residual Networks, but also improves classification performance as the depth increases.