Corpus ID: 236772707

Piecewise Linear Units Improve Deep Neural Networks

@article{Inturrisi2021PiecewiseLU,
  title={Piecewise Linear Units Improve Deep Neural Networks},
  author={Jordan Inturrisi and Suiyang Khoo and Abbas Z. Kouzani and Riccardo M. Pagliarella},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.00700}
}
The activation function is at the heart of a deep neural network's nonlinearity, and the choice of function has a great impact on the success of training. Currently, many practitioners prefer the Rectified Linear Unit (ReLU) for its simplicity and reliability, despite a few drawbacks. While most previous functions proposed to supplant ReLU have been hand-designed, recent work on learning the function during training has shown promising results. In this paper we propose an adaptive piecewise…
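The abstract is truncated above, so the following is only a generic sketch of a learnable piecewise linear activation, not the authors' exact parameterization: the slopes and breakpoints stand in for the per-unit parameters that would be trained by gradient descent.

import numpy as np

def piecewise_linear_unit(x, breakpoints, slopes, intercept=0.0):
    # Generic piecewise linear activation written as a sum of hinge functions.
    # breakpoints and slopes are illustrative trainable parameters; this is an
    # assumed form only, since the paper's exact definition is truncated above.
    y = intercept + slopes[0] * x
    for b, s in zip(breakpoints, slopes[1:]):
        y = y + s * np.maximum(0.0, x - b)  # each hinge changes the slope at b
    return y

# Example: three linear segments with breakpoints at -1 and 1.
x = np.linspace(-3.0, 3.0, 7)
print(piecewise_linear_unit(x, breakpoints=[-1.0, 1.0], slopes=[0.1, 0.9, -0.5]))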


References

Showing 1-10 of 48 references
PLU: The Piecewise Linear Unit Activation Function
Presents a new activation function, the Piecewise Linear Unit (PLU), a hybrid of tanh and ReLU that is shown to outperform ReLU on a variety of tasks while avoiding the vanishing-gradient problem of tanh.
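A tanh/ReLU-style piecewise linear hybrid in this spirit can be sketched as follows; the constants alpha and c below are illustrative assumptions rather than values taken from the cited paper.

import numpy as np

def plu_like(x, alpha=0.1, c=1.0):
    # Identity on [-c, c] and a small non-zero slope alpha outside that range,
    # giving a tanh-shaped piecewise linear curve whose gradient never vanishes.
    return np.maximum(alpha * (x + c) - c, np.minimum(alpha * (x - c) + c, x))

print(plu_like(np.array([-3.0, -1.0, 0.0, 1.0, 3.0])))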
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
Introduces the exponential linear unit (ELU), which speeds up learning in deep neural networks and yields higher classification accuracies and significantly better generalization than ReLUs and leaky ReLUs (LReLUs) on networks with more than five layers.
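The ELU has a simple closed form; a minimal NumPy sketch with the usual hyperparameter alpha (commonly 1.0):

import numpy as np

def elu(x, alpha=1.0):
    # Identity for positive inputs; smooth exponential saturation towards -alpha
    # for negative inputs, which pushes mean activations closer to zero than ReLU.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))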
Parametric Exponential Linear Unit for Deep Convolutional Neural Networks
Results on the MNIST, CIFAR-10/100, and ImageNet datasets using the NiN, Overfeat, All-CNN, and ResNet architectures indicate that the proposed Parametric ELU (PELU) performs better than the non-parametric ELU.
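A hedged sketch of a parametric ELU along these lines, with a and b as positive trainable parameters (the exact constraints and update rules are those of the cited paper and are not reproduced here):

import numpy as np

def pelu(x, a=1.0, b=1.0):
    # a and b (both > 0) would be learned jointly with the network weights:
    # the positive side is linear with slope a / b, the negative side a scaled exponential.
    return np.where(x >= 0, (a / b) * x, a * (np.exp(x / b) - 1.0))

print(pelu(np.array([-2.0, 0.0, 2.0]), a=2.0, b=1.5))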
Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning
Proposes two activation functions for neural-network function approximation in reinforcement learning, the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU), and suggests that the more traditional approach of on-policy learning with eligibility traces and softmax action selection, instead of experience replay, can be competitive with DQN without the need for a separate target network.
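Both units follow directly from the logistic sigmoid; a minimal sketch:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # Sigmoid-weighted linear unit: the input gated by its own sigmoid.
    return x * sigmoid(x)

def dsilu(x):
    # Derivative of the SiLU, used in the paper as an activation in its own right.
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(silu(x), dsilu(x))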
Learning Activation Functions to Improve Deep Neural Networks
Designs a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent, achieving state-of-the-art performance on CIFAR-10, CIFAR-100, and a benchmark from high-energy physics involving Higgs boson decay modes.
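A sketch of such a per-neuron learned piecewise linear unit, assuming the common hinge-sum form (the coefficients a and hinge locations b would be trained by gradient descent alongside the network weights):

import numpy as np

def adaptive_piecewise_linear(x, a, b):
    # ReLU plus a sum of learned hinges: a[s] sets the slope change and b[s]
    # the location of the s-th hinge; both are per-neuron parameters.
    y = np.maximum(0.0, x)
    for a_s, b_s in zip(a, b):
        y = y + a_s * np.maximum(0.0, -x + b_s)
    return y

x = np.linspace(-2.0, 2.0, 5)
print(adaptive_piecewise_linear(x, a=[0.2, -0.1], b=[0.0, 1.0]))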
Understanding the difficulty of training deep feedforward neural networks
Aims to better understand why standard gradient descent from random initialization performs so poorly on deep neural networks, to explain recent relative successes, and to help design better algorithms in the future.
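The concrete remedy proposed alongside that analysis is the now-standard normalized ("Xavier") initialization; a minimal sketch:

import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Normalized initialization: draws weights so that activation and gradient
    # variances stay roughly constant across layers for symmetric activations.
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(256, 128)
print(W.std())  # approximately sqrt(2 / (fan_in + fan_out))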
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps and beats the original model by a significant margin.
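The training-time transform itself is brief; a per-feature sketch with learned scale gamma and shift beta:

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features). Normalize each feature over the batch,
    # then restore representational power with the learned gamma and beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(3.0, 2.0, size=(8, 4))
print(batch_norm(x, gamma=np.ones(4), beta=np.zeros(4)).mean(axis=0))  # ~0 per feature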
Empirical Evaluation of Rectified Activations in Convolutional Network
The experiments suggest that incorporating a non-zero slope for the negative part of rectified activation units consistently improves results, and cast doubt on the common belief that sparsity is the key to ReLU's good performance.
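The non-zero negative slope referred to here is the leaky/parametric ReLU family; a one-line sketch:

import numpy as np

def leaky_relu(x, negative_slope=0.01):
    # Identity for x > 0; a small non-zero slope otherwise, so negative inputs
    # still pass a gradient instead of being zeroed out.
    return np.where(x > 0, x, negative_slope * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))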
Searching for Activation Functions
Experiments show that the best discovered activation function, f(x) = x · sigmoid(βx), named Swish, tends to work better than ReLU on deeper models across a number of challenging datasets.
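Swish follows directly from the stated formula; beta can be fixed at 1 (recovering the SiLU) or treated as a trainable parameter:

import numpy as np

def swish(x, beta=1.0):
    # f(x) = x * sigmoid(beta * x)
    return x / (1.0 + np.exp(-beta * x))

print(swish(np.array([-2.0, -1.0, 0.0, 1.0, 2.0])))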
A novel activation function for multilayer feed-forward neural networks
Experimental results demonstrate that the proposed activation function can be applied effectively across various datasets, with accuracy that is competitive with the state of the art given the same network topology.