RECURRENT NEURAL NETWORKS WITH FLEXIBLE GATES USING KERNEL ACTIVATION FUNCTIONS

@article{Scardapane2018RECURRENTNN,
  title={RECURRENT NEURAL NETWORKS WITH FLEXIBLE GATES USING KERNEL ACTIVATION FUNCTIONS},
  author={Simone Scardapane and Steven Van Vaerenbergh and Danilo Comminiello and Simone Totaro and Aurelio Uncini},
  journal={2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP)},
  year={2018},
  pages={1-6}
}
Gated recurrent neural networks have achieved remarkable results in the analysis of sequential data. Inside these networks, gates are used to control the flow of information, making it possible to model even very long-term dependencies in the data. In this paper, we investigate whether the original gate equation (a linear projection followed by an element-wise sigmoid) can be improved. In particular, we design a more flexible architecture, with a small number of adaptable parameters, which is able to…
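To make the contrast concrete, below is a minimal NumPy sketch of a standard sigmoid gate next to a gate whose nonlinearity is a kernel activation function (KAF), i.e. a one-dimensional Gaussian kernel expansion over a fixed dictionary with a small set of learnable mixing coefficients per unit. This is an illustrative assumption of the general idea, not the authors' implementation: the function names, the kernel bandwidth gamma, the dictionary spacing, and the lack of any constraint keeping the gate output in [0, 1] are all simplifications of whatever the paper actually prescribes.

```python
import numpy as np

def sigmoid(x):
    # Element-wise logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

def standard_gate(x, h, W, U, b):
    """Standard gate: element-wise sigmoid of a linear projection of the
    current input x and the previous hidden state h."""
    return sigmoid(W @ x + U @ h + b)

def kaf_gate(x, h, W, U, b, alpha, dictionary, gamma=1.0):
    """Illustrative KAF-style gate: the fixed sigmoid is replaced by a
    Gaussian kernel expansion over a fixed dictionary, with learnable
    per-unit mixing coefficients alpha (shape: hidden_size x dict_size)."""
    s = W @ x + U @ h + b                                   # pre-activation, shape (H,)
    # Gaussian kernel between each pre-activation and each dictionary element.
    K = np.exp(-gamma * (s[:, None] - dictionary[None, :]) ** 2)   # shape (H, D)
    return np.sum(alpha * K, axis=1)                        # mix per-unit coefficients

# Hypothetical usage with random parameters.
H, I, D = 4, 3, 20                              # hidden size, input size, dictionary size
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(H, I)), rng.normal(size=(H, H)), np.zeros(H)
alpha = rng.normal(scale=0.1, size=(H, D))      # learnable mixing coefficients
dictionary = np.linspace(-3.0, 3.0, D)          # fixed, uniformly spaced dictionary
x, h = rng.normal(size=I), rng.normal(size=H)
print(standard_gate(x, h, W, U, b))
print(kaf_gate(x, h, W, U, b, alpha, dictionary))
```

The point of the sketch is only to show where the flexibility enters: the linear projection is unchanged, and the handful of extra adaptable parameters are the mixing coefficients of the kernel expansion that replaces the fixed sigmoid.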

Citations

On the Stability and Generalization of Learning with Kernel Activation Functions
By indirectly establishing two key smoothness properties of the models under consideration, it is proved that neural networks endowed with KAFs generalize well when trained with SGD for a finite number of steps.

A non-parametric softmax for improving neural attention in time-series forecasting

Flexible Generative Adversarial Networks with Non-parametric Activation Functions
This paper evaluates training a deep convolutional GAN in which all hidden activation functions are replaced with a version of the kernel activation function (KAF), a recently proposed technique for learning non-parametric nonlinearities during the optimization process.
