Corpus ID: 7961699

Sequence to Sequence Learning with Neural Networks

@inproceedings{Sutskever2014SequenceTS,
  title={Sequence to Sequence Learning with Neural Networks},
  author={Ilya Sutskever and Oriol Vinyals and Quoc V. Le},
  booktitle={NIPS},
  year={2014}
}
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. [...] Key Method: Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set.
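A minimal sketch of this encoder-decoder idea, written in PyTorch, is given below. It is an illustration under assumed hyperparameters (embedding size, hidden size, layer count, toy vocabularies) and a made-up Seq2Seq class name, not the paper's original setup, which used a 4-layer LSTM with 1000 cells per layer and reversed source sentences.

```python
# Minimal encoder-decoder sketch in PyTorch (illustrative only; hyperparameters
# and class/variable names are assumptions, not the paper's exact setup).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden=512, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Encoder LSTM maps the source sequence to a fixed-size state (h, c).
        self.encoder = nn.LSTM(emb_dim, hidden, num_layers=layers, batch_first=True)
        # Decoder LSTM starts from the encoder's final state and predicts the
        # target sequence token by token (teacher forcing during training).
        self.decoder = nn.LSTM(emb_dim, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_in_ids):
        _, (h, c) = self.encoder(self.src_emb(src_ids))   # fixed-dimensional summary
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), (h, c))
        return self.out(dec_out)                          # logits over the target vocabulary

# Toy usage: batch of 2 source sentences (length 7) and shifted target inputs (length 5).
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))
tgt_in = torch.randint(0, 1200, (2, 5))
logits = model(src, tgt_in)        # shape: (2, 5, 1200)
```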
A Hierarchical Neural Network for Sequence-to-Sequences Learning
TLDR: This paper presents a hierarchical deep neural network architecture to improve the quality of long-sentence translation, and can achieve superior results with higher BLEU (Bilingual Evaluation Understudy) scores, lower perplexity, and better performance in imitating expression style and word usage than traditional networks.
Multi-task Sequence to Sequence Learning
TLDR: The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.
Unsupervised Pretraining for Sequence to Sequence Learning
TLDR: This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models by initializing the weights of the encoder and decoder with the weights of two pretrained language models and then fine-tuning with labeled data.
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
TLDR: This work proposes a general technique for replacing the softmax layer with a continuous embedding layer, introducing a novel probabilistic loss and a training and inference procedure in which the model generates a probability distribution over pre-trained word embeddings instead of a multinomial distribution over the vocabulary obtained via softmax.
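As a rough sketch of the continuous-output idea (not the von Mises-Fisher loss itself, whose normalizer involves Bessel functions), the snippet below replaces the softmax layer with a projection onto the embedding space and trains it with a simple cosine-distance stand-in against pre-trained embeddings; all names, dimensions, and the random stand-in embedding table are assumptions.

```python
# Sketch of a continuous-output head: predict an embedding vector instead of a
# softmax distribution. The cosine loss is a simplified stand-in for the
# von Mises-Fisher loss; names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, emb_dim, vocab = 512, 300, 5000
pretrained_emb = F.normalize(torch.randn(vocab, emb_dim), dim=-1)  # stand-in for real embeddings

proj = nn.Linear(hidden_dim, emb_dim)          # decoder hidden state -> predicted embedding

def continuous_output_loss(decoder_hidden, target_ids):
    pred = F.normalize(proj(decoder_hidden), dim=-1)        # unit-norm predicted vectors
    gold = pretrained_emb[target_ids]                        # gold word embeddings
    return (1.0 - (pred * gold).sum(dim=-1)).mean()          # mean cosine distance

def decode_nearest(decoder_hidden):
    # Inference: pick the vocabulary word whose embedding is closest to the prediction.
    pred = F.normalize(proj(decoder_hidden), dim=-1)
    return (pred @ pretrained_emb.T).argmax(dim=-1)

h = torch.randn(4, hidden_dim)                 # e.g. 4 decoder time steps
loss = continuous_output_loss(h, torch.randint(0, vocab, (4,)))
tokens = decode_nearest(h)
```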
Sequence-to-Sequence Learning with Latent Neural Grammars
TLDR: This work develops a neural parameterization of the grammar which enables parameter sharing over the combinatorial space of derivation rules without the need for manual feature engineering, and applies it to a diagnostic language navigation task and to small-scale machine translation.
Agreement on Target-Bidirectional LSTMs for Sequence-to-Sequence Learning
TLDR: The proposed approach relies on the agreement between a pair of target-directional LSTMs, which generates more balanced targets, and develops two efficient approximate search methods for agreement that are empirically shown to be almost optimal in terms of sequence-level losses.
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
TLDR: It is shown how traditional symbolic statistical machine translation models can still improve neural machine translation while reducing the risk of common pathologies of NMT such as hallucinations and neologisms.
Data Generation Using Sequence-to-Sequence
TLDR: The results obtained support the hypothesis that the process of generating clean data can be validated objectively by evaluating the models alongside the system's efficiency in generating data at each iteration.
Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning
TLDR: The mean and standard deviation of the continuous semantic space are learned to take advantage of the variational method, and experimental results on three natural language tasks confirm that the proposed RNN-SVAE yields higher performance than two benchmark models.
Semi-supervised Sequence Learning
TLDR: Two approaches to using unlabeled data to improve sequence learning with recurrent networks are presented, and it is found that long short-term memory recurrent networks pretrained with the two approaches become more stable to train and generalize better.

References

Showing 1-10 of 56 references.
Sequence Transduction with Recurrent Neural Networks
A. Graves, ArXiv, 2012
TLDR: This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.
LSTM Neural Networks for Language Modeling
TLDR: This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
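A minimal sketch of the soft-search (additive attention) idea follows; the scoring network, dimensions, and names are illustrative assumptions, not the paper's exact parameterization.

```python
# Additive ("soft-search") attention sketch: score each encoder state against the
# current decoder state, softmax the scores, and take a weighted sum as the context.
# Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim=128):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(self.w_enc(enc_states)
                                   + self.w_dec(dec_state).unsqueeze(1)))  # (batch, src_len, 1)
        weights = F.softmax(scores.squeeze(-1), dim=-1)                    # attention over source
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)   # weighted sum
        return context, weights

attn = AdditiveAttention(enc_dim=512, dec_dim=512)
ctx, w = attn(torch.randn(2, 7, 512), torch.randn(2, 512))  # ctx: (2, 512), w: (2, 7)
```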
A Neural Probabilistic Language Model
TLDR: This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
TLDR: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby removing the need for pre-segmented training data and for post-processing of the outputs.
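For a sense of how such a loss is used in practice, here is a minimal usage sketch with PyTorch's torch.nn.CTCLoss; the shapes and toy sizes are assumptions, and the original paper of course predates this API.

```python
# Minimal CTC usage sketch with PyTorch's built-in loss: the model emits per-frame
# log-probabilities (including a blank symbol) and CTC sums over all alignments.
# Sizes here are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

T, N, C = 50, 4, 20            # input frames, batch size, classes incl. blank (index 0)
log_probs = F.log_softmax(torch.randn(T, N, C), dim=-1)   # (T, N, C), as CTCLoss expects
targets = torch.randint(1, C, (N, 10))                    # label sequences (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```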
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TLDR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture is presented that trains the DNN to produce a distribution over senones (tied triphone states) as its output and can significantly outperform conventional context-dependent Gaussian mixture model (GMM) HMMs.
Joint Language and Translation Modeling with Recurrent Neural Networks
TLDR: This work presents a joint language and translation model, based on a recurrent neural network, which predicts target words from an unbounded history of both source and target words and shows competitive accuracy compared to traditional channel model features.
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
TLDR: Qualitatively, the proposed RNN Encoder-Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.
Learning long-term dependencies with gradient descent is difficult
TLDR: This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching onto information for long periods.
Statistical Language Models Based on Neural Networks
TLDR: Although these models are computationally more expensive than N-gram models, with the presented techniques it is possible to apply them to state-of-the-art systems efficiently, achieving the best published performance on the well-known Penn Treebank setup.