Corpus ID: 7961699

Sequence to Sequence Learning with Neural Networks

Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set…
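The abstract's core idea can be sketched in a few lines: one recurrent network reads the source sequence into a single fixed-dimensional vector, and a second recurrent network unrolls the target sequence from that vector. This is a minimal toy illustration, not the authors' multilayer LSTM; the `rnn_step` update, the weights, and the dimensions are all illustrative stand-ins.

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=0.8):
    # Elementwise recurrent update standing in for an LSTM cell.
    return [math.tanh(w_h * hi + w_x * xi) for hi, xi in zip(h, x)]

def encode(inputs, dim):
    h = [0.0] * dim
    for x in inputs:              # read the whole source sequence...
        h = rnn_step(h, x)
    return h                      # ...into one fixed-dimensional vector

def decode(h, steps):
    outputs = []
    x = [0.0] * len(h)            # stand-in for a start-of-sequence token
    for _ in range(steps):
        h = rnn_step(h, x)
        outputs.append(h)         # a real model projects h through a softmax
        x = h                     # feed the previous output back in
    return outputs

source = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
context = encode(source, dim=4)   # the fixed-size "thought vector"
target = decode(context, steps=3)
```

The point of the sketch is the bottleneck: everything the decoder knows about the source passes through the single `context` vector, which is exactly the limitation later attention-based work addresses.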


A Hierarchical Neural Network for Sequence-to-Sequences Learning

This paper presents a hierarchical deep neural network architecture that improves the translation quality of long sentences, achieving higher BLEU (Bilingual Evaluation Understudy) scores, lower perplexity, and better imitation of expression style and word usage than traditional networks.

Multi-task Sequence to Sequence Learning

The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.

Unsupervised Pretraining for Sequence to Sequence Learning

This work presents a general unsupervised learning method that improves the accuracy of sequence to sequence (seq2seq) models by initializing the weights of the encoder and decoder with the weights of two pretrained language models, then fine-tuning on labeled data.

Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

This work proposes a general technique for replacing the softmax layer with a continuous embedding layer, introducing a novel probabilistic loss together with a training and inference procedure in which the model generates a probability distribution over pre-trained word embeddings, instead of a multinomial distribution over the vocabulary obtained via softmax.

Sequence-to-Sequence Learning with Latent Neural Grammars

This work develops a neural parameterization of the grammar which enables parameter sharing over the combinatorial space of derivation rules without the need for manual feature engineering, and applies it to a diagnostic language navigation task and to small-scale machine translation.

The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction

It is shown how traditional symbolic statistical machine translation models can still improve neural machine translation while reducing the risk of common pathologies of NMT such as hallucinations and neologisms.

Agreement on Target-Bidirectional LSTMs for Sequence-to-Sequence Learning

The proposed approach relies on agreement between a pair of target-directional LSTMs (left-to-right and right-to-left), which generate more balanced targets, and develops two efficient approximate search methods for agreement that are empirically shown to be almost optimal in terms of sequence-level losses.

Data Generation Using Sequence-to-Sequence

The results obtained support the hypothesis that the generation of clean data can be validated objectively by evaluating the models alongside the system's efficiency at generating data in each iteration.

Semi-supervised Sequence Learning

Two approaches to using unlabeled data to improve sequence learning with recurrent networks are presented, and it is found that long short-term memory recurrent networks, after pretraining with the two approaches, become more stable to train and generalize better.

References
Sequence Transduction with Recurrent Neural Networks

This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.

LSTM Neural Networks for Language Modeling

This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task, gaining considerable improvements in word error rate (WER) on top of a state-of-the-art speech recognition system.

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
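The soft-search described above amounts to computing attention weights over the encoder states and taking a weighted average as the context for each target word. The following is a minimal sketch of that idea, not Bahdanau et al.'s exact model: it scores alignments with a plain dot product, whereas the cited paper uses a small feed-forward network, and all vectors here are toy values.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of alignment scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(decoder_state, encoder_states):
    # Alignment scores: a plain dot product between the decoder state and
    # each encoder state (a stand-in for the paper's learned scoring net).
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

enc = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]   # one state per source word
dec = [0.0, 1.0]                             # current decoder state
ctx, alpha = attend(dec, enc)                # alpha peaks on word 2
```

Because a fresh context vector is computed at every decoding step, the model no longer has to squeeze the whole source sentence into one fixed-length vector.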

A Neural Probabilistic Language Model

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture trains the DNN to produce a distribution over senones (tied triphone states) as its output, and can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.

Joint Language and Translation Modeling with Recurrent Neural Networks

This work presents a joint language and translation model, based on a recurrent neural network, which predicts target words from an unbounded history of both source and target words, showing competitive accuracy compared to traditional channel model features.

Learning long-term dependencies with gradient descent is difficult

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.

Statistical Language Models Based on Neural Networks

Although these models are computationally more expensive than N-gram models, with the presented techniques it is possible to apply them to state-of-the-art systems efficiently, achieving the best published performance on the well-known Penn Treebank setup.

Recurrent neural network based language model

Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.