Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks

Biao Zhang, Deyi Xiong, Jinsong Su, Qian Lin, Huiji Zhang
In this paper, we propose an addition-subtraction twin-gated recurrent network (ATR) to simplify neural machine translation. The recurrent units of ATR are heavily simplified to have the smallest number of weight matrices among units of all existing gated RNNs. With simple addition and subtraction operations, we introduce a twin-gated mechanism to build input and forget gates which are highly correlated. Despite this simplification, the essential non-linearities and capability of modeling…
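The twin-gated idea can be illustrated with a small sketch. The abstract above only states that the input and forget gates are built by addition and subtraction; the concrete update used here (p = W x, q = U h, i = sigmoid(p + q), f = sigmoid(p - q), h' = i * p + f * h) reflects a common reading of the ATR formulation and should be treated as an assumption, as should every function and variable name. Biases and any output layer are omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a (rows x cols) list-of-lists matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def atr_step(W, U, x_t, h_prev):
    """One recurrent step of a twin-gated unit (illustrative sketch).

    W (hidden x input) and U (hidden x hidden) are the only weight
    matrices; both gates reuse the same two projections.
    """
    p = matvec(W, x_t)     # projected current input
    q = matvec(U, h_prev)  # projected previous hidden state
    # Twin gates: the same two terms combined by addition and by subtraction.
    i_gate = [sigmoid(a + b) for a, b in zip(p, q)]
    f_gate = [sigmoid(a - b) for a, b in zip(p, q)]
    # New state: gated input projection plus gated previous state.
    return [i * a + f * h for i, a, f, h in zip(i_gate, p, f_gate, h_prev)]


def atr_encode(W, U, inputs):
    """Run the unit over a sequence, starting from a zero hidden state."""
    h = [0.0] * len(U)
    states = []
    for x_t in inputs:
        h = atr_step(W, U, x_t, h)
        states.append(h)
    return states


if __name__ == "__main__":
    rng = random.Random(0)
    input_size, hidden_size, steps = 3, 4, 5
    W = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
         for _ in range(hidden_size)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
         for _ in range(hidden_size)]
    inputs = [[rng.uniform(-1.0, 1.0) for _ in range(input_size)]
              for _ in range(steps)]
    for t, h in enumerate(atr_encode(W, U, inputs)):
        print(t, [round(v, 4) for v in h])
```

Because both gates are computed from the same two projections, this sketch needs only two weight matrices per unit, and the gates are coupled: increasing q raises the input gate while lowering the forget gate.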
Neural Machine Translation With GRU-Gated Attention Model
A novel gated recurrent unit (GRU)-gated attention model (GAtt) for NMT that enables translation-sensitive source representations that then contribute to discriminative context vectors and achieves significant improvements over the vanilla attention-based NMT.
Multi-Head Highly Parallelized LSTM Decoder for Neural Machine Translation
This work approximates full LSTM context modelling by computing hidden states and gates with the current input and a simple bag-of-words representation of the preceding tokens' context, and connects the outputs of each parallel step with computationally cheap element-wise computations to enable sequence-level parallelization of LSTMs.
Fast Interleaved Bidirectional Sequence Generation
This work takes inspiration from bidirectional sequence generation and introduces a decoder that generates target words in the left-to-right and right-to-left directions simultaneously, and achieves a decoding speedup of ~2x compared to autoregressive decoding with comparable quality.
Edinburgh’s End-to-End Multilingual Speech Translation System for IWSLT 2021
Edinburgh's submissions to the IWSLT 2021 multilingual speech translation (ST) task are described. The end-to-end multilingual ST model is built on the Transformer and integrates techniques including adaptive speech feature selection, language-specific modeling, multi-task learning, deep and big Transformer, sparsified linear attention and root mean square layer normalization.
A Lightweight Recurrent Network for Sequence Modeling
This paper proposes a lightweight recurrent network, or LRN, which uses input and forget gates to handle long-range dependencies as well as gradient vanishing and explosion, with all parameter-related calculations factored outside the recurrence.
A Lightweight Recurrent Network for Sequence Modeling
Recurrent networks have achieved great success on various sequential tasks with the assistance of complex recurrent units, but suffer from severe computational inefficiency due to weak…
A Survey of Deep Learning Techniques for Neural Machine Translation
This literature survey traces back the origin and principal development timeline of NMT, investigates the important branches, categorizes different research orientations, and discusses some future research trends in this field.
Improving text simplification by corpus expansion with unsupervised learning
A simplification model that does not require a parallel corpus is constructed using an unsupervised translation model, and it is confirmed that the simplification operation can be learned from large-scale pseudo data even when only non-parallel corpora are available for simplification.


Deep Neural Machine Translation with Linear Associative Unit
A novel linear associative unit (LAU) is proposed to reduce the gradient propagation path inside the recurrent unit, achieving results comparable with state-of-the-art neural machine translation.
Deep Neural Machine Translation with Weakly-Recurrent Units
This work proposes a new recurrent NMT architecture, called Simple Recurrent NMT, built on a class of fast and weakly-recurrent units that use layer normalization and multiple attentions, and shows that it can achieve better results at a significantly lower computational cost.
A Hierarchy-to-Sequence Attentional Neural Machine Translation Model
A hierarchy-to-sequence attentional NMT model is proposed that segments a long sentence into short clauses, each of which can be easily translated by NMT; this not only improves parameter learning but also better exploits different scopes of context for translation.
Neural Machine Translation of Rare Words with Subword Units
This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
This work introduces a new type of linear connections, named fast-forward connections, based on deep Long Short-Term Memory (LSTM) networks, and an interleaved bi-directional architecture for stacking the LSTM layers, and achieves state-of-the-art performance and outperforms the best conventional model by 0.7 BLEU points.
A Convolutional Encoder Model for Neural Machine Translation
A faster and simpler architecture based on a succession of convolutional layers is presented, which allows the whole source sentence to be encoded simultaneously, unlike recurrent networks, whose computation is constrained by temporal dependencies.
A GRU-Gated Attention Model for Neural Machine Translation
A novel GRU-gated attention model (GAtt) for NMT is proposed which enhances the degree of discrimination of context vectors by enabling source representations to be sensitive to the partial translation generated by the decoder.
Attention is All you Need
A new simple network architecture, the Transformer, is proposed, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Asynchronous Bidirectional Decoding for Neural Machine Translation
This paper equips the conventional attentional encoder-decoder NMT framework with a backward decoder in order to explore bidirectional decoding for NMT, and achieves substantial improvements over the conventional NMT of 3.14 and 1.38 BLEU points on Chinese-English and English-German translation, respectively.