Corpus ID: 17078659

Guided Alignment Training for Topic-Aware Neural Machine Translation

Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter
Conference of the Association for Machine Translation in the Americas
In this paper, we propose an effective way for biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input… 
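The guided alignment idea described above can be illustrated with a minimal NumPy sketch: add a cross-entropy penalty between the decoder's attention weights and a reference alignment produced by a statistical aligner. The function names, the loss weight, and the exact loss form here are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def guided_alignment_loss(attention, reference, eps=1e-12):
    """Cross-entropy between the model's attention weights and a
    reference alignment from a statistical word aligner.

    attention : (target_len, source_len), each row sums to 1
    reference : (target_len, source_len), each row sums to 1
    """
    return -np.mean(np.sum(reference * np.log(attention + eps), axis=-1))

def combined_loss(nmt_loss, attention, reference, weight=0.5):
    """Training objective: standard NMT loss plus a weighted alignment
    penalty that biases attention toward the reference alignment."""
    return nmt_loss + weight * guided_alignment_loss(attention, reference)
```

When the attention matrix matches the reference alignment exactly, the penalty is (near) zero, so the combined objective reduces to the ordinary NMT loss; diffuse or misplaced attention is penalized in proportion to its divergence from the aligner's output.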


Leveraging Neural Machine Translation for Word Alignment

This work summarizes different approaches to extracting word alignment from alignment scores and explores ways in which such scores can be extracted from NMT, focusing on inferring word-alignment scores from output sentence and token probabilities.

Neural Machine Translation on scarce-resource condition: A case-study on Persian-English

This paper studies NMT models on Persian-English language pairs to analyze the model and investigate its suitability for scarce-resource scenarios, the situation that exists for Persian-centered translation systems.

Accurate Word Alignment Induction from Neural Machine Translation

It is shown that attention weights do capture accurate word alignment, which can only be revealed if the correct decoding step and layer for inducing word alignment are chosen; two simple but effective interpretation methods for word alignment induction are presented.

Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

We propose a method to transfer knowledge across neural machine translation (NMT) models by means of a shared dynamic vocabulary. Our approach allows extending an initial model for a given language…

Generating Alignments Using Target Foresight in Attention-Based Neural Machine Translation

This work proposes an extension of the attention-based NMT model that introduces target information into the attention mechanism to produce high-quality alignments, halving the AER with an absolute improvement of 19.1% AER.

Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT

This work looks at the competences related to three core SMT components and finds that during training, NMT first focuses on learning target-side language modeling, then improves translation quality approaching word-by-word translation, and finally learns more complicated reordering patterns.

On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation

This work demonstrates that alignment extraction in transformer models can be improved by augmenting an additional alignment head to the multi-head source-to-target attention component, and proposes alignment pruning to speed up decoding in alignment-based neural machine translation (ANMT).

Paying Attention to Multi-Word Expressions in Neural Machine Translation

Results of experiments on investigating NMT attention allocation to multi-word expressions (MWEs) and on improving automated translation of sentences containing MWEs in English->Latvian and English->Czech NMT systems are presented.

Cross-language Sentence Selection via Data Augmentation and Rationale Training

This paper uses data augmentation and negative sampling techniques on noisy parallel sentence data to directly learn a cross-lingual embedding-based query relevance model that performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval systems trained on the same parallel data.

Domain Control for Neural Machine Translation

A new technique for neural machine translation (NMT), called domain control, is proposed: it is applied at runtime using a single neural network covering multiple domains and shows quality improvements over dedicated per-domain models on any of the covered domains, and even on out-of-domain data.

Addressing the Rare Word Problem in Neural Machine Translation

This paper proposes and implements an effective technique to address the problem of end-to-end neural machine translation's inability to correctly translate very rare words, and is the first to surpass the best result achieved on a WMT’14 contest task.

Effective Approaches to Attention-based Neural Machine Translation

A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
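The soft-search summarized above scores each source annotation against the current decoder state and turns the scores into a weighted context vector. The following is a minimal NumPy sketch of that additive scoring scheme; the parameter names `W_q`, `W_k`, and `v` are illustrative, and real implementations learn them jointly with the rest of the network.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(query, keys, W_q, W_k, v):
    """Soft alignment: score each source annotation (key) against the
    decoder state (query) with a small feed-forward scorer, normalize
    the scores, and return the weighted sum as the context vector."""
    scores = np.array([v @ np.tanh(W_q @ query + W_k @ k) for k in keys])
    weights = softmax(scores)   # (source_len,), sums to 1
    context = weights @ keys    # weighted sum of source annotations
    return weights, context
```

The attention weights can be read as a soft alignment between the target position being generated and the source positions, which is what later work on alignment extraction builds on.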

On Using Very Large Target Vocabulary for Neural Machine Translation

It is shown that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary.

Stanford Neural Machine Translation Systems for Spoken Language Domains

This work further explores the effectiveness of NMT in spoken language domains by participating in the MT track of the IWSLT 2015 and demonstrates that using an existing NMT framework can achieve competitive results in the aforementioned scenarios when translating from English to German and Vietnamese.

Dynamic Topic Adaptation for Phrase-based MT

This work explores topic adaptation on a diverse data set and presents a new bilingual variant of Latent Dirichlet Allocation to compute topic-adapted, probabilistic phrase translation features, and dynamically infer document-specific translation probabilities for test sets of unknown origin.

Topic adaptation for machine translation of e-commerce content

Efforts to improve machine translation of item titles found in a large e-commerce inventory through topic modeling and adaptation are described and novel methods that augment the standard phrase-table models with sparse features and dense features measuring the topic match between each phrase-pair and the input text are proposed.

Supervised Attentions for Neural Machine Translation

In this paper, we improve the attention or alignment accuracy of neural machine translation by utilizing the alignments of training sentence pairs. We simply compute the distance between the machine…

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.


We define a new, intuitive measure for evaluating machine translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments.