Corpus ID: 20275898

Empirical analysis of non-linear activation functions for Deep Neural Networks in classification tasks

@article{Alcantara2017EmpiricalAO,
  title={Empirical analysis of non-linear activation functions for Deep Neural Networks in classification tasks},
  author={Giovanni Alcantara},
  journal={ArXiv},
  year={2017},
  volume={abs/1710.11272}
}
We provide an overview of several non-linear activation functions in a neural network architecture that have proven successful in many machine learning applications. We conduct an empirical analysis of the effectiveness of these functions on the MNIST classification task, with the aim of clarifying which functions produce the best results overall. Based on this first set of results, we examine the effects of building deeper architectures with an increasing number of hidden layers. We also…
Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning
TLDR
The proposed Parametric Flatten-T Swish manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
A Performance Analysis of Deep Convolutional Neural Networks using Kuzushiji Character Recognition
The recent increase in computational power and data available to the general public has given rise to the development of a plethora of highly performant deep neural network architectures. A popular…
Predicting Zeros of the Riemann Zeta Function Using Machine Learning: A Comparative Analysis
In this study, we evaluate the predictive performance of Neural Network Regression in locating non-trivial zeros of the Riemann zeta function relative to Support Vector Machine Regression. We…
Word embedding, neural networks and text classification: what is the state-of-the-art?
In this bachelor thesis, I first introduce the machine learning methodology of text classification with the goal of describing the functioning of neural networks. Then, I identify and discuss the…
Email Organization Through Deep Learning Algorithms
Email overload has been an issue for email users all over the world. In this research we aim to address the issue by devising an intelligent classifier that categorizes emails into various user…
A Machine-Learning Approach to Distinguish Passengers and Drivers Reading While Driving
TLDR
This paper models and evaluates seven cutting-edge machine-learning techniques and proposes a non-intrusive technique that uses only data from smartphone sensors and machine learning to automatically distinguish between drivers and passengers reading a message in a vehicle.
Historical Event Ordering with Contextual Information
This project investigates the usage of RNNs in the historical event ordering task, which consists of ordering a set of historical events given short textual descriptions of them and, optionally, some…
Neural models for information retrieval: towards asymmetry sensitive approaches based on attention models
This work is situated in the context of information retrieval (IR) using artificial intelligence (AI) techniques such as deep learning (DL). It addresses tasks…
MEDIALIS AND ERECTOR SPINAE IN MOVEMENT TRANSITIONS FOR LEG ROBOT CONTROL
Loss of some parts of the body and muscle weakness due to injury are factors that interfere with daily human activities. The concept of the exoskeleton is a very positive approach for humans in terms…
SegNema: Nematode segmentation strategy in digital microscopy images using deep learning and shape models
Graduation project (Master's in Computing with emphasis in Computer Science), Instituto Tecnologico de Costa Rica, Escuela de Ingenieria en Computacion, 2019.

References

SHOWING 1-10 OF 11 REFERENCES
Empirical Evaluation of Rectified Activations in Convolutional Network
TLDR
The experiments suggest that incorporating a non-zero slope for the negative part of rectified activation units could consistently improve the results, and cast doubt on the common belief that sparsity is the key to good performance in ReLU networks.
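The non-zero negative slope described above corresponds to the leaky ReLU. A minimal NumPy sketch of both variants; the slope value 0.01 is a common default, not a value taken from this reference:

```python
import numpy as np

def relu(x):
    # Standard rectifier: identity for positive inputs, zero otherwise.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Same as ReLU, but the negative part keeps a small non-zero slope
    # alpha instead of being clamped to zero.
    return np.where(x > 0, x, alpha * x)
```

Because the negative branch still carries gradient, units never become permanently inactive, which is the effect the experiments in this reference attribute the improvement to.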
Understanding the difficulty of training deep feedforward neural networks
TLDR
The objective here is to understand better why standard gradient descent from random initialization does so poorly with deep neural networks, to better understand these recent relative successes, and to help design better algorithms in the future.
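The remedy proposed in this reference is widely known as Glorot (Xavier) initialization. A sketch of the uniform variant for a dense layer, assuming NumPy's `default_rng` as the random source; the layer sizes are illustrative:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    # Draw weights uniformly in [-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), chosen so that activation
    # and gradient variances stay roughly constant across layers.
    rng = np.random.default_rng(0) if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: weights for a 784 -> 256 hidden layer (e.g. MNIST input).
W = glorot_uniform(784, 256)
```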
Self-Normalizing Neural Networks
TLDR
Self-normalizing neural networks (SNNs) are introduced to enable high-level abstract representations, and it is proved that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance, even in the presence of noise and perturbations.
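The activation behind SNNs is the scaled exponential linear unit (SELU). A minimal sketch; the two constants are the (rounded) fixed-point values derived in the paper:

```python
import numpy as np

# Rounded constants from the SNN paper, chosen so that the fixed point
# of the activation's mean/variance map is zero mean and unit variance.
SELU_LAMBDA = 1.0507
SELU_ALPHA = 1.6733

def selu(x):
    # lambda * x for positive inputs,
    # lambda * alpha * (exp(x) - 1) for the negative part.
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))
```

Negative inputs saturate smoothly toward -lambda * alpha, which is what lets activation statistics self-normalize across layers.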
Understanding Deep Neural Networks with Rectified Linear Units
TLDR
The gap theorems hold for smoothly parametrized families of "hard" functions, contrary to the countable, discrete families known in the literature, and a new lower bound on the number of affine pieces is shown, larger than previous constructions in certain regimes of the network architecture.
Revise Saturated Activation Functions
TLDR
It is shown that "penalized tanh" is comparable to and even outperforms state-of-the-art non-saturated functions, including ReLU and leaky ReLU, on deep convolutional neural networks.
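A sketch of the penalized tanh idea: the negative part of tanh is scaled down by a penalty coefficient. The value a = 0.25 is the coefficient commonly associated with this function; treat it as an assumption rather than a quote from the paper:

```python
import numpy as np

def penalized_tanh(x, a=0.25):
    # Plain tanh on the positive part; the negative part is scaled by a,
    # penalizing negative activations while keeping saturation behaviour.
    t = np.tanh(x)
    return np.where(x > 0, t, a * t)
```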
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
TLDR
The "exponential linear unit" (ELU) is proposed, which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.
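The ELU can be sketched in a few lines of NumPy; alpha = 1.0 is the usual default:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Identity for positive inputs; smooth exponential saturation toward
    # -alpha for negative inputs, which pushes mean activations closer
    # to zero than ReLU's hard cutoff does.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```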
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features resembling those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Understanding the exploding gradient problem
TLDR
The analysis is used to justify the simple yet effective solution of norm-clipping exploding gradients, and the comparison between this heuristic and standard SGD provides empirical evidence that such a heuristic is required to reach state-of-the-art results on a character prediction task and a polyphonic music prediction task.
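The norm-clipping heuristic described above can be sketched as follows; the threshold value is illustrative, not taken from the reference:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    # Compute the global L2 norm over all gradient arrays; if it exceeds
    # max_norm, rescale every array by the same factor. This bounds the
    # update size while preserving the gradient direction.
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return grads
```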
The mnist database of handwritten digits
Revise saturated activation functions. CoRR, 2016