CARU: A Content-Adaptive Recurrent Unit for the Transition of Hidden State in NLP

Ka-Hou Chan, W. Ke, Sio Kei Im
This article introduces a novel RNN unit inspired by GRU, namely the Content-Adaptive Recurrent Unit (CARU). The design of CARU contains all the features of GRU but requires fewer training parameters. We make use of the concept of weights in our design to analyze the transition of hidden states. At the same time, we also describe how the content adaptive gate handles the received words and alleviates the long-term dependence problem. As a result, the unit can improve the accuracy of the… 

A Multilayer CARU Framework to Obtain Probability Distribution for Paragraph-Based Sentiment Analysis

This work proposes a Multilayer Content-Adaptive Recurrent Unit (CARU) network for paragraph information extraction and presents a type of CNN-based model as an extractor to explore and capture useful features in the hidden state.

Dynamic SIoT Network Status Prediction

The proposed CARU-EKF improves time-series forecasting for nonlinear SIoT data traffic and outperforms existing prediction methods on mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and the coefficient of determination (R²).
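
For reference, the four metrics named above can be computed as in the minimal sketch below. It uses the standard textbook definitions; the function names and sample values are illustrative assumptions and are not taken from the CARU-EKF paper.

```python
import math

# Standard definitions of the four forecasting metrics.
# Function names and inputs are illustrative assumptions.

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must not contain 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    y_true = [2.0, 4.0, 6.0, 8.0]
    y_pred = [1.0, 5.0, 6.0, 10.0]
    print(mae(y_true, y_pred), mape(y_true, y_pred),
          rmse(y_true, y_pred), r2(y_true, y_pred))
```

Lower MAE, MAPE, and RMSE indicate a better fit, while R² closer to 1 indicates that the predictions explain more of the variance in the observed series.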



A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding

This work proposes to use BLSTM-RNN for a unified tagging solution that can be applied to various tagging tasks including part-of-speech tagging, chunking and named entity recognition, requiring no task specific knowledge or sophisticated feature engineering.

Minimal gated unit for recurrent neural networks

This work proposes a gated unit for RNNs, named the minimal gated unit (MGU) because it contains only one gate, which is the minimal design among all gated hidden units.
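
To make the "one gate" idea concrete, here is a minimal sketch of a single MGU step in plain Python. The update equations follow the commonly cited form of the MGU (one forget gate that both resets and interpolates the hidden state); they are written from general knowledge rather than from this listing, and the helper names (`affine`, `init_params`, `mgu_step`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def init_params(input_size, hidden_size, seed=0):
    """Small random weights; the layout (W_f, b_f, W_h, b_h) is an assumption."""
    rng = random.Random(seed)
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_size)]
                for _ in range(hidden_size)]
    return mat(), [0.0] * hidden_size, mat(), [0.0] * hidden_size

def mgu_step(x, h_prev, params):
    """One MGU step with a single forget gate f.

    f       = sigmoid(W_f [h_prev, x] + b_f)
    h_tilde = tanh(W_h [f * h_prev, x] + b_h)
    h       = (1 - f) * h_prev + f * h_tilde
    """
    W_f, b_f, W_h, b_h = params
    f = [sigmoid(z) for z in affine(W_f, h_prev + x, b_f)]
    gated = [fi * hi for fi, hi in zip(f, h_prev)]
    h_tilde = [math.tanh(z) for z in affine(W_h, gated + x, b_h)]
    return [(1.0 - fi) * hi + fi * ci for fi, hi, ci in zip(f, h_prev, h_tilde)]

if __name__ == "__main__":
    params = init_params(input_size=2, hidden_size=3)
    h = [0.0, 0.0, 0.0]
    for x in ([1.0, 0.5], [0.2, -0.3]):
        h = mgu_step(x, h, params)
    print(h)
```

In this sketch the unit needs two weight matrices (one for the gate and one for the candidate state), compared with three for a GRU and four for an LSTM of the same size.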

Capacity and Trainability in Recurrent Neural Networks

It is found that for several tasks it is the per-task parameter capacity bound that determines performance, and two novel RNN architectures are proposed, one of which is easier to train than the LSTM or GRU for deeply stacked architectures.

Simplified minimal gated unit variations for recurrent neural networks

  • Joel Heck, F. Salem
  • Computer Science
    2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS)
  • 2017
Three variants of the minimal gated unit are introduced that further simplify the design by reducing the number of parameters in the forget-gate dynamic equation; they are shown to achieve accuracy similar to the MGU model while using fewer parameters and thus lower training expense.

Learning to Forget: Continual Prediction with LSTM

This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

Attention is All you Need

A new simple network architecture, the Transformer, is proposed; it is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Extensions of recurrent neural network language model

Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than 15 times speedup for both training and testing phases, as well as ways to reduce the number of parameters in the model.

LSTM recurrent networks learn simple context-free and context-sensitive languages

Long short-term memory (LSTM) variants are also the first RNNs to learn a simple context-sensitive language, namely a^n b^n c^n.
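
As an illustration of the language in question, the sketch below checks membership in a^n b^n c^n with ordinary code. It assumes n ≥ 1, and the function name is hypothetical. It shows only what the target language is, not how the LSTM learns it.

```python
def is_anbncn(s):
    """Return True if s consists of n 'a's, then n 'b's, then n 'c's (n >= 1)."""
    n = len(s) // 3
    return n >= 1 and len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

if __name__ == "__main__":
    for s in ("abc", "aabbcc", "aabbc", "abcabc", ""):
        print(repr(s), is_anbncn(s))
```

Accepting this language requires matching three counts at once, which is why it is context-sensitive rather than context-free.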

Learning representations by back-propagating errors

Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.