FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks

  • Suo Qiu, Bolun Cai
  • Published 25 June 2017
  • Computer Science
  • 2018 24th International Conference on Pattern Recognition (ICPR)
Rectified linear unit (ReLU) is a widely used activation function for deep convolutional neural networks. However, because of the zero-hard rectification, ReLU networks lose the benefits from negative values. In this paper, we propose a novel activation function called flexible rectified linear unit (FReLU) to further explore the effects of negative values. By redesigning the rectified point of ReLU as a learnable parameter, FReLU expands the states of the activation output. When a network is… 
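The abstract describes FReLU as a ReLU whose rectified point becomes a learnable parameter. The minimal sketch below assumes the form FReLU(x) = max(x, 0) + b with a single scalar bias b. The excerpt does not say whether b is shared per layer or per channel, and the function names are hypothetical. Under this form the output range shifts from [0, ∞) to [b, ∞), so a negative b lets negative values back into the activation. The gradient with respect to b is simply the upstream gradient, which makes b cheap to learn.

```python
def frelu(x: float, b: float) -> float:
    """FReLU sketch (assumed form): a ReLU whose output is shifted by a learnable bias b."""
    return max(x, 0.0) + b


def frelu_grads(x: float, upstream: float = 1.0):
    """Return (dL/dx, dL/db) for one input, given the upstream gradient dL/dy."""
    dx = upstream if x > 0 else 0.0  # ReLU slope: 1 for x > 0, 0 otherwise
    db = upstream                    # b is added to every output, so dL/db = dL/dy
    return dx, db


if __name__ == "__main__":
    b = -0.3  # hypothetical learned value; a negative b lets the unit emit negative outputs
    for x in (-2.0, 0.0, 1.5):
        print(f"frelu({x}) = {frelu(x, b):.2f}")
```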

Rectified Exponential Units for Convolutional Neural Networks

This paper proposes a novel activation function called the Rectified Exponential Unit (REU), inspired by two recently proposed activation functions, the Exponential Linear Unit (ELU) and Swish; REU is designed to combine the advantages of a flexible exponent and a multiplicative function form.
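The sketch below assumes REU is defined as REU(x) = x for x > 0 and x·exp(x) otherwise. That definition is recalled from the REU paper, not stated in this listing, so treat it as an assumption. The negative branch uses the multiplicative, Swish-like form x·exp(x). It dips to a minimum of -1/e at x = -1 and then decays back toward 0.

```python
import math


def reu(x: float) -> float:
    """Rectified Exponential Unit sketch (assumed form)."""
    if x > 0:
        return x              # identity on the positive side, as in ReLU and ELU
    return x * math.exp(x)    # multiplicative x * exp(x) on the negative side


if __name__ == "__main__":
    for x in (-5.0, -1.0, 0.0, 2.0):
        print(f"reu({x}) = {reu(x):.4f}")
```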

DPReLU: Dynamic Parametric Rectified Linear Unit

DPReLU is proposed, which controls the overall functional shape of ReLU with four learnable parameters and builds on the ideas of the Parametric ReLU (PReLU) and Flexible ReLU (FReLU).

DPReLU: Dynamic Parametric Rectified Linear Unit and Its Proper Weight Initialization Method

Dynamic Parametric ReLU is proposed, which can dynamically control the overall functional shape of ReLU with four learnable parameters and provides faster convergence and better accuracy than the original ReLU and previous ReLU variants.
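Both DPReLU entries describe a ReLU whose shape is set by four learnable parameters, combining PReLU's learnable slope with FReLU's movable rectification point. The exact DPReLU formula is not given here. The function below is a hypothetical four-parameter piecewise-linear unit in that spirit, not the paper's definition. It takes a slope `alpha` left of the kink, a slope `beta` right of it, and a kink located at `(x0, y0)`. Particular parameter choices recover ReLU, PReLU, and a FReLU-style shifted ReLU.

```python
def four_param_relu(x: float, alpha: float, beta: float, x0: float, y0: float) -> float:
    """Hypothetical DPReLU-style unit (NOT the paper's exact formula).

    Two linear pieces meet at the learnable point (x0, y0):
      alpha: slope left of x0 (PReLU-like negative slope)
      beta:  slope right of x0
      x0, y0: learnable rectification point (FReLU-like shift)
    """
    if x > x0:
        return y0 + beta * (x - x0)
    return y0 + alpha * (x - x0)


if __name__ == "__main__":
    xs = (-2.0, 0.0, 3.0)
    print("ReLU :", [four_param_relu(x, 0.0, 1.0, 0.0, 0.0) for x in xs])
    print("PReLU:", [four_param_relu(x, 0.25, 1.0, 0.0, 0.0) for x in xs])
    print("shift:", [four_param_relu(x, 0.0, 1.0, 0.0, -0.5) for x in xs])
```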

Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning

The proposed Parametric Flatten-T Swish exhibited higher non-linear approximation power during training, thereby improving the predictive performance of the networks.

Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning

An activation function called Flatten-T Swish (FTS), which leverages the benefit of negative values, is proposed and evaluated; it improves MNIST classification accuracy and converges twice as fast as ReLU.
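Instead of zeroing negative inputs, FTS flattens them to a threshold T. The Parametric FTS entry above makes T learnable. The sketch below assumes the definition FTS(x) = x·sigmoid(x) + T for x ≥ 0 and T otherwise. That definition is recalled from the FTS paper rather than stated here. The threshold is passed explicitly as `t`, so no particular default value is assumed.

```python
import math


def flatten_t_swish(x: float, t: float) -> float:
    """Flatten-T Swish sketch (assumed form).

    For x >= 0 the unit follows Swish, x * sigmoid(x), shifted by the threshold t;
    for x < 0 the output is flattened to the constant t.
    """
    if x >= 0:
        return x / (1.0 + math.exp(-x)) + t  # x * sigmoid(x) + t
    return t


if __name__ == "__main__":
    for x in (-3.0, 0.0, 2.0):
        print(f"fts({x}) = {flatten_t_swish(x, -0.2):.4f}")
```

Both branches depend on t additively, so dFTS/dt = 1 everywhere. Under this form, learning t as the parametric variant does behaves like learning a bias.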

Soft-Root-Sign Activation Function

The proposed nonlinearity, namely "Soft-Root-Sign" (SRS), is smooth, non-monotonic, and bounded, making it more compatible with batch normalization (BN) and less sensitive to initialization.
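The boundedness claim can be illustrated with a sketch. It assumes the SRS definition SRS(x) = x / (x/α + exp(−x/β)) with positive trainable α and β, recalled from the SRS paper and not stated in this listing. Under that form the output tends to α as x → +∞ and to 0 as x → −∞, and it is negative with a dip for moderate negative x, which makes it non-monotonic. For x < 0 the code multiplies the numerator and denominator by exp(x/β) so that `math.exp` never overflows.

```python
import math


def soft_root_sign(x: float, alpha: float, beta: float) -> float:
    """Soft-Root-Sign sketch (assumed form): x / (x / alpha + exp(-x / beta)).

    alpha, beta > 0 are assumed trainable. For x < 0 the denominator stays
    positive as long as beta / alpha < e (true for the values used below).
    """
    if x >= 0:
        return x / (x / alpha + math.exp(-x / beta))
    # Rewrite for x < 0: multiply top and bottom by exp(x / beta) to avoid overflow.
    e = math.exp(x / beta)
    return x * e / (x * e / alpha + 1.0)


if __name__ == "__main__":
    for x in (-1000.0, -1.0, 0.0, 1.0, 100.0):
        print(f"srs({x}) = {soft_root_sign(x, 2.0, 3.0):.4f}")
```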

TanhSoft—Dynamic Trainable Activation Functions for Faster Learning and Better Performance

This work proposes three novel activation functions with learnable parameters, namely TanhSoft-1, TanhSoft-2, and TanhSoft-3, which are shown to outperform several well-known activation functions.
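As the name suggests, the TanhSoft functions combine tanh and softplus. The sketch below covers only two of the three variants, and both forms are assumptions recalled from the TanhSoft paper rather than stated here. TanhSoft-1 is taken as tanh(αx)·ln(1 + eˣ) and TanhSoft-2 as x·tanh(β·exp(γx)), with α, β, γ learnable. Softplus is computed in a numerically stable way.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanhsoft1(x: float, alpha: float) -> float:
    """Assumed TanhSoft-1 form: tanh(alpha * x) * ln(1 + e^x)."""
    return math.tanh(alpha * x) * softplus(x)


def tanhsoft2(x: float, beta: float, gamma: float) -> float:
    """Assumed TanhSoft-2 form: x * tanh(beta * e^(gamma * x)).

    math.exp can overflow for very large gamma * x; keep inputs moderate here.
    """
    return x * math.tanh(beta * math.exp(gamma * x))


if __name__ == "__main__":
    for x in (-3.0, 0.0, 2.0):
        print(f"x={x}: ts1={tanhsoft1(x, 1.0):.4f} ts2={tanhsoft2(x, 1.0, 1.0):.4f}")
```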

Activation functions in deep learning: A comprehensive survey and benchmark

Deep Isometric Learning for Visual Recognition

This paper shows that deep vanilla ConvNets without normalization or skip connections can also be trained to achieve surprisingly good performance on standard image recognition benchmarks.

Adaptively Customizing Activation Functions for Various Layers

A novel methodology is proposed that adaptively customizes activation functions by adding only a few parameters to traditional activation functions such as Sigmoid, Tanh, and ReLU; it surpasses popular methods like ReLU and adaptive functions like Swish in overall performance.



Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
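PReLU generalizes ReLU by making the slope of the negative side learnable: f(x) = x for x > 0 and a·x otherwise. The sketch below follows that definition. When a single slope a is shared across many positions, for example every position of one channel, its gradient is the sum of upstream·x over the non-positive inputs. Here a plain list stands in for one channel's activations, and the function names are illustrative.

```python
def prelu(x: float, a: float) -> float:
    """PReLU: identity for x > 0, learnable slope a for x <= 0."""
    return x if x > 0 else a * x


def prelu_grad_a(xs, upstream):
    """Gradient of the loss w.r.t. a shared slope a, summed over all positions using it.

    Only non-positive inputs contribute, each adding upstream * x.
    """
    return sum(g * x for x, g in zip(xs, upstream) if x <= 0)


if __name__ == "__main__":
    xs = [1.0, -2.0, -3.0]
    print([prelu(x, 0.25) for x in xs])
    print(prelu_grad_a(xs, [1.0, 0.5, 2.0]))
```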

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

The "exponential linear unit" (ELU) is introduced, which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
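ELU is the identity for x > 0 and α·(exp(x) − 1) otherwise, so negative inputs saturate smoothly toward −α rather than being cut to zero. A minimal sketch follows, with `math.expm1` used for accuracy near zero. The default α = 1.0 is just a convenient parameter default.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    if x > 0:
        return x
    return alpha * math.expm1(x)  # expm1 computes exp(x) - 1 accurately near 0


if __name__ == "__main__":
    for x in (-50.0, -1.0, 0.0, 1.5):
        print(f"elu({x}) = {elu(x):.4f}")
```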

Parametric Exponential Linear Unit for Deep Convolutional Neural Networks

The results on the MNIST, CIFAR-10/100 and ImageNet datasets using the NiN, Overfeat, All-CNN and ResNet networks indicate that the proposed Parametric ELU (PELU) performs better than the non-parametric ELU.
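PELU adds two learnable parameters to ELU. The sketch below assumes the PELU definition f(x) = (a/b)·x for x ≥ 0 and a·(exp(x/b) − 1) otherwise, with a, b > 0. That definition is recalled from the PELU paper, not stated here. With this parameterization both pieces have slope a/b at x = 0, so the function stays differentiable there however a and b are learned.

```python
import math


def pelu(x: float, a: float, b: float) -> float:
    """Parametric ELU sketch (assumed form), with learnable a, b > 0."""
    if x >= 0:
        return (a / b) * x
    return a * math.expm1(x / b)  # a * (exp(x / b) - 1); saturates toward -a


if __name__ == "__main__":
    a, b, h = 1.5, 0.5, 1e-6
    # Finite-difference slopes on each side of 0; both should be close to a / b = 3.
    print((pelu(h, a, b) - pelu(0.0, a, b)) / h)
    print((pelu(0.0, a, b) - pelu(-h, a, b)) / h)
```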

P-TELU: Parametric Tan Hyperbolic Linear Unit Activation for Deep Neural Networks

The enhanced performance of the proposed activation function is evaluated on the CIFAR10 and CIFAR100 image datasets using two convolutional neural network architectures: KerasNet, a small 6-layer CNN model, and a 76-layer deep ResNet architecture.
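The sketch below assumes P-TELU is the identity for x ≥ 0 and α·tanh(β·x) for x < 0, with α, β > 0 learnable. That form is recalled from the P-TELU paper and is not given in this listing, so treat it as an assumption. Under it, α sets the level the negative side saturates to (−α), and β controls how quickly it gets there.

```python
import math


def p_telu(x: float, alpha: float, beta: float) -> float:
    """P-TELU sketch (assumed form): identity for x >= 0, alpha * tanh(beta * x) otherwise.

    For alpha, beta > 0 the negative branch saturates toward -alpha.
    """
    if x >= 0:
        return x
    return alpha * math.tanh(beta * x)


if __name__ == "__main__":
    for x in (-100.0, -0.5, 0.0, 3.0):
        print(f"p_telu({x}) = {p_telu(x, 0.5, 2.0):.4f}")
```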

Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)

The “inverse square root linear unit” (ISRLU) is introduced to speed up learning in deep neural networks, and a computationally efficient variant for RNNs, the “ISRU”, is suggested; it has lower computational complexity but still has a curve similar to tanh and sigmoid.
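ISRU computes x / sqrt(1 + α·x²), which is bounded in (−1/√α, 1/√α) and has a sigmoid-like shape. ISRLU keeps the identity for x ≥ 0 and applies ISRU only to negative inputs. Both need only a square root rather than the exponential that ELU, tanh, and sigmoid require, which is the basis of the efficiency claim. A minimal sketch:

```python
import math


def isru(x: float, alpha: float) -> float:
    """Inverse square root unit: x / sqrt(1 + alpha * x^2), bounded by +/- 1 / sqrt(alpha)."""
    return x / math.sqrt(1.0 + alpha * x * x)


def isrlu(x: float, alpha: float) -> float:
    """Inverse square root linear unit: identity for x >= 0, ISRU on the negative side."""
    return x if x >= 0 else isru(x, alpha)


if __name__ == "__main__":
    for x in (-10.0, -1.0, 0.0, 3.0):
        print(f"x={x}: isrlu={isrlu(x, 1.0):.4f} isru={isru(x, 1.0):.4f}")
```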

Systematic evaluation of convolution neural network advances on the Imagenet

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly is given and several new streamlined architectures for both residual and non-residual Inception Networks are presented.

Deep Residual Networks with Exponential Linear Unit

This paper proposes to replace the combination of ReLU and Batch Normalization with the Exponential Linear Unit (ELU) in Residual Networks, and shows that this not only speeds up learning in Residual Networks but also improves classification performance as the depth increases.

Network In Network

With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
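Global average pooling replaces the fully connected classifier in NIN. The last layer emits one feature map per class, each map is averaged to a single score, and the resulting vector goes to softmax. The sketch below uses plain nested lists laid out as [channel][row][column] to show that computation. The function names are illustrative, not from the paper.

```python
import math


def global_average_pool(feature_maps):
    """Average each H x W feature map to one number, giving one value per channel.

    feature_maps: list of channels, each a list of rows (lists of floats).
    """
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Two hypothetical 2 x 2 class maps -> two class scores -> class probabilities.
    maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
    scores = global_average_pool(maps)
    print(scores, softmax(scores))
```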