Corpus ID: 44135919

Breaking the Activation Function Bottleneck through Adaptive Parameterization

@article{Flennerhag2018BreakingTA,
  title={Breaking the Activation Function Bottleneck through Adaptive Parameterization},
  author={Sebastian Flennerhag and Hujun Yin and John A. Keane and Mark James Elliot},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.08574}
}
Standard neural network architectures are non-linear only by virtue of a simple element-wise activation function, making them both brittle and excessively large. In this paper, we consider methods for making the feed-forward layer more flexible while preserving its basic structure. We develop simple drop-in replacements that learn to adapt their parameterization conditional on the input, thereby increasing statistical efficiency significantly. We present an adaptive LSTM that advances the state… 
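To make the idea of input-conditioned parameterization concrete, the following is a minimal PyTorch sketch of a drop-in replacement for a standard linear layer whose effective weights depend on the input. The rank-one rescaling scheme, the class name AdaptiveLinear, and the small adapter network are illustrative assumptions chosen for exposition, not the paper's exact formulation.

import torch
import torch.nn as nn

class AdaptiveLinear(nn.Module):
    """Sketch of an adaptive feed-forward layer: a shared weight matrix W is
    modulated per example by a rank-one, input-conditioned rescaling,
    i.e. the adapted weight is diag(g_out(x)) @ W @ diag(g_in(x))."""

    def __init__(self, in_features: int, out_features: int, adapter_dim: int = 32):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Small "adaptation policy" network mapping the input to per-column
        # and per-row scaling vectors (a hypothetical design choice).
        self.adapter = nn.Sequential(
            nn.Linear(in_features, adapter_dim),
            nn.Tanh(),
            nn.Linear(adapter_dim, in_features + out_features),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scales = self.adapter(x)  # shape: (batch, in_features + out_features)
        g_in, g_out = scales.split(
            [self.linear.in_features, self.linear.out_features], dim=-1
        )
        # Rescale input columns, apply the shared weights, rescale output rows.
        return g_out * self.linear(g_in * x)

if __name__ == "__main__":
    layer = AdaptiveLinear(in_features=16, out_features=8)
    h = layer(torch.randn(4, 16))  # -> tensor of shape (4, 8)
    print(h.shape)

Because the adapter only produces two scaling vectors, the number of extra parameters grows linearly in the layer width, which is one plausible reading of the claim that flexibility is added without abandoning the basic feed-forward structure.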

Citations

Regularized Flexible Activation Function Combination for Deep Neural Networks
TLDR
A novel family of flexible activation functions that can replace sigmoid or tanh in LSTM cells is implemented, along with a new family formed by combining ReLU and ELU, and two new regularisation terms based on prior-knowledge assumptions are introduced.
Front Contribution instead of Back Propagation
TLDR
This work proposes a simple, novel algorithm, the Front-Contribution algorithm, as a compact alternative to BP; it produces exactly the same output as BP, in contrast to several recently proposed algorithms that only approximate BP.
Meta-Learning with Warped Gradient Descent
TLDR
WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution; it is computationally efficient, easy to implement, and scales to arbitrarily large meta-learning problems.
Adaptive Parameterization for Neural Dialogue Generation
TLDR
An Adaptive Neural Dialogue generation model, AdaND, is proposed that handles diverse conversations with conversation-specific parameterization, using two adaptive parameterization mechanisms: a context-aware and a topic-aware mechanism.
On transformative adaptive activation functions in neural networks for gene expression inference
TLDR
This work analyzes the D-GEX method and determines that inference can be improved by using a logistic sigmoid activation function instead of the hyperbolic tangent, and proposes a novel transformative adaptive activation function that improves gene expression inference even further and generalizes several existing adaptive activation functions.
QuantNet: transferring learning across trading strategies
TLDR
This paper introduces QuantNet: an architecture that learns market-agnostic trends and uses them to learn superior market-specific trading strategies; QuantNet is evaluated on historical data across 3103 assets in 58 global equity markets.
Medical image segmentation using customized U-Net with adaptive activation functions
TLDR
The proposed customized U-Net, built on the idea of improving a deep network's performance and speeding up its learning while using fewer parameters, successfully compensates for the accuracy drop caused by parameter reduction and also allows the model to be tuned with a small amount of training data.
Step size self-adaptation for SGD
TLDR
This work proposes the LIGHT function with four configurations that explicitly regulate improvements in convergence and generalization at test time, referred to as step size self-adaptation, which improves both convergence and generalization of neural networks without the need to guarantee their stability.
...

References

SHOWING 1-10 OF 57 REFERENCES
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
TLDR
It is shown that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck, and a simple and effective method is proposed to address this issue.
On the State of the Art of Evaluation in Neural Language Models
TLDR
This work reevaluates several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrives at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models.
Language Modeling with Gated Convolutional Networks
TLDR
A finite-context approach through stacked convolutions is developed, which can be more efficient since it allows parallelization over sequential tokens; this is the first time a non-recurrent approach has been competitive with strong recurrent models on these large-scale language tasks.
Regularizing and Optimizing LSTM Language Models
TLDR
This paper proposes the weight-dropped LSTM which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization and introduces NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user.
Recurrent Batch Normalization
TLDR
It is demonstrated that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps.
Dynamic Evaluation of Neural Sequence Models
TLDR
Dynamic evaluation improves the state-of-the-art word-level perplexities on the Penn Treebank and WikiText-2 datasets to 51.1 and 44.3 respectively, and character-level cross-entropies on the text8 and Hutter Prize datasets to 1.19 bits/char and 1.08 bits/char respectively.
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
TLDR
This work introduces a novel theoretical framework that facilitates better learning in language modeling, and shows that this framework leads to tying together the input embedding and the output projection matrices, greatly reducing the number of trainable variables.
Using the Output Embedding to Improve Language Models
TLDR
The topmost weight matrix of neural network language models is studied, and it is shown that this matrix constitutes a valid word embedding; a new method of regularizing the output embedding is also offered.
Recurrent Highway Networks
TLDR
A novel theoretical analysis of recurrent networks based on Geršgorin's circle theorem is introduced that illuminates several modeling and optimization issues and improves the understanding of the LSTM cell.
On Multiplicative Integration with Recurrent Neural Networks
TLDR
This work introduces a general and simple structural design called Multiplicative Integration, which changes the way in which information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters.
...