Publications
Neural Machine Translation of Rare Words with Subword Units
TLDR
This paper introduces a simpler and more effective approach that makes the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline on the WMT 15 English→German and English→Russian translation tasks by up to 1.3 BLEU.
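The merge-learning loop behind byte-pair encoding is simple enough to sketch. Below is a minimal illustration, not the paper's reference implementation (that is the subword-nmt package); `corpus` and `num_merges` are placeholder names:

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    # Represent each word as a tuple of symbols, with a word-final marker.
    vocab = Counter()
    for word in corpus.split():
        vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        merged = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

print(learn_bpe("low lower lowest newer newest", 10))
```

Frequent words end up represented as single symbols while rare words decompose into smaller, reusable subword units, which is what makes the open vocabulary possible with a fixed symbol inventory.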
Improving Neural Machine Translation Models with Monolingual Data
TLDR
This work pairs monolingual training data with automatic back-translations so that it can be treated as additional parallel training data, and obtains substantial improvements on the WMT 15 task English↔German and the low-resourced IWSLT 14 task Turkish→English.
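A minimal sketch of the pairing step, assuming a hypothetical `translate_target_to_source` function that stands in for a reverse-direction NMT model (not an API from the paper):

```python
def build_synthetic_parallel(monolingual_target, translate_target_to_source):
    """Pair target-side monolingual sentences with machine back-translations."""
    synthetic = []
    for target_sentence in monolingual_target:
        # The reverse model produces a (possibly noisy) source-side sentence;
        # the target side remains genuine, fluent text.
        source_sentence = translate_target_to_source(target_sentence)
        synthetic.append((source_sentence, target_sentence))
    return synthetic

# The synthetic pairs are then simply mixed with the genuine bitext:
# training_data = real_parallel + build_synthetic_parallel(mono, reverse_model)
```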
Edinburgh Neural Machine Translation Systems for WMT 16
TLDR
This work describes Edinburgh's participation in the WMT 2016 shared news translation task: neural translation systems for four language pairs, each trained in both directions, based on an attentional encoder-decoder and using BPE subword segmentation for open-vocabulary translation with a fixed vocabulary.
Context-Aware Neural Machine Translation Learns Anaphora Resolution
TLDR
A context-aware neural machine translation model is introduced, designed in such a way that the flow of information from the extended context to the translation model can be controlled and analyzed.
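One way such controlled flow can be realized is a learned gate that interpolates between attention over the current source sentence and attention over the extended context. A minimal PyTorch sketch for illustration, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, source_attn, context_attn):
        # g near 1 -> rely on the current sentence; g near 0 -> on the context.
        g = torch.sigmoid(self.gate(torch.cat([source_attn, context_attn], -1)))
        return g * source_attn + (1 - g) * context_attn
```

Because the gate values are explicit, they can be inspected to analyze when the model actually consults the context, e.g. on anaphoric pronouns.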
Nematus: a Toolkit for Neural Machine Translation
TLDR
Nematus is a toolkit for Neural Machine Translation that prioritizes high translation accuracy, usability, and extensibility, and was used to build top-performing submissions to shared translation tasks at WMT and IWSLT.
Controlling Politeness in Neural Machine Translation via Side Constraints
TLDR
A pilot study to control honorifics in neural machine translation (NMT) via side constraints, focusing on English→German, shows that by marking up the (English) source side of the training data with a feature that encodes the use of honorifics on the (German) target side, the honorifics produced at test time can be controlled.
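The side-constraint mechanism amounts to prepending a pseudo-token to the source sentence that encodes the desired target-side politeness. A minimal sketch; the tag names `<polite>` and `<informal>` are illustrative, not the paper's exact markup:

```python
def add_politeness_constraint(source_sentence, polite):
    # The constraint is an ordinary vocabulary item the encoder learns like any word.
    tag = "<polite>" if polite else "<informal>"
    return f"{tag} {source_sentence}"

# Training: tag each sentence pair according to the honorifics observed on the
# German target side. Test time: choose the tag to control the output register.
print(add_politeness_constraint("Where are you going?", polite=True))
# -> "<polite> Where are you going?"
```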
Linguistic Input Features Improve Neural Machine Translation
TLDR
This paper generalizes the embedding layer of the encoder in the attentional encoder-decoder architecture to support the inclusion of arbitrary features, in addition to the baseline word feature, and finds that linguistic input features improve model quality according to three metrics: perplexity, BLEU and CHRF3.
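The generalized input layer can be sketched as one embedding table per feature, concatenated per token. A minimal PyTorch illustration; the feature set and sizes below are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class FeaturedEmbedding(nn.Module):
    def __init__(self, vocab_sizes, embed_sizes):
        super().__init__()
        # One table per input feature (word, lemma, POS, dependency label, ...).
        self.tables = nn.ModuleList(
            nn.Embedding(v, d) for v, d in zip(vocab_sizes, embed_sizes)
        )

    def forward(self, feature_ids):
        # feature_ids: (batch, seq_len, num_features) integer tensor.
        parts = [table(feature_ids[..., i]) for i, table in enumerate(self.tables)]
        return torch.cat(parts, dim=-1)  # (batch, seq_len, sum(embed_sizes))

# e.g. word, POS, and lemma vocabularies with illustrative sizes:
layer = FeaturedEmbedding([50000, 50, 50000], [500, 10, 115])
```

The rest of the encoder is unchanged; it simply consumes the wider concatenated vectors in place of plain word embeddings.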
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
TLDR
The most important and confident heads are found to play consistent and often linguistically interpretable roles, and when heads are pruned using a method based on stochastic gates and a differentiable relaxation of the L0 penalty, the specialized heads are the last to be pruned.
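A minimal sketch of such a stochastic gate, following the hard-concrete L0 relaxation of Louizos et al. that this line of work builds on; parameter names follow that paper, and this is a sketch rather than the authors' code:

```python
import math
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    def __init__(self, num_heads, beta=0.33, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_heads))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        # Sample a relaxed gate in [0, 1] per attention head.
        u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
        s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def l0_penalty(self):
        # Expected number of open gates; added to the training loss.
        shift = self.beta * math.log(-self.gamma / self.zeta)
        return torch.sigmoid(self.log_alpha - shift).sum()
```

Each head's output is multiplied by its gate, the penalty drives gates toward zero, and heads whose gates close can be removed from the model entirely; the specialized heads survive longest.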
Evaluating Discourse Phenomena in Neural Machine Translation
TLDR
This article presents hand-crafted discourse test sets designed to test recently proposed multi-encoder NMT models' ability to exploit previous source and target sentences, and explores a novel way of exploiting context from the previous sentence.
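Such test sets are typically used contrastively: the model scores a correct translation against a variant whose discourse phenomenon (e.g. an anaphoric pronoun) is wrong, and should assign the correct one a higher probability. A minimal sketch, with `score` as a hypothetical stand-in for a model's log-probability function and the dictionary keys as assumed field names:

```python
def contrastive_accuracy(examples, score):
    correct = 0
    for ex in examples:
        good = score(ex["source"], ex["correct"], context=ex["prev_sentences"])
        bad = score(ex["source"], ex["contrastive"], context=ex["prev_sentences"])
        correct += good > bad  # True counts as 1
    return correct / len(examples)
```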