Modeling Fluency and Faithfulness for Diverse Neural Machine Translation

Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu
Neural machine translation models usually adopt the teacher forcing strategy for training, which requires that the predicted sequence match the ground truth word by word and forces the probability of each prediction to approach a 0-1 distribution. However, this strategy assigns the entire probability mass to the ground-truth word and ignores all other words in the target vocabulary, even when the ground-truth word cannot dominate the distribution. To address this problem with teacher forcing, we propose a…
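The 0-1 target distribution the abstract criticizes can be made concrete with a minimal sketch of the teacher-forcing cross-entropy loss (this only illustrates the behavior being criticized, not the paper's proposed remedy; the toy vocabulary and probabilities are invented for illustration):

```python
import math

def one_hot(index, vocab_size):
    """Teacher forcing's implicit target: all probability mass on one word."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]

def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and the model's prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# A predicted distribution over a toy 4-word vocabulary.
pred = [0.1, 0.6, 0.2, 0.1]

# Under teacher forcing, only the ground-truth word (index 1) contributes to
# the loss; the 0.4 mass the model spreads over other words is ignored,
# even when those words are reasonable alternatives.
loss = cross_entropy(one_hot(1, 4), pred)  # = -log(0.6)
```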
Towards Enhancing Faithfulness for Neural Machine Translation
A novel training strategy with a multi-task learning paradigm to build a faithfulness-enhanced NMT model (named FEnmt), which improves translation quality by effectively reducing unfaithful translations.
Generating Diverse Translation from Model Distribution with Dropout
This paper proposes to generate diverse translations by deriving a large number of possible models with Bayesian modelling and sampling models from the distribution for inference, and shows that the method achieves a better trade-off between diversity and accuracy.
Prevent the Language Model from being Overconfident in Neural Machine Translation
A Margin-based Token-level Objective (MTO) and a Margin-based Sentence-level Objective (MSO) are proposed to maximize the margin, preventing the LM from becoming overconfident and improving translation adequacy as well as fluency.
Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation
Experimental results show that the method significantly outperforms competitive baselines and achieves greater improvements on larger data sets; compared to adversarial learning and L2 regularization, knowledge distillation proves to be the best way to transfer knowledge from the seer decoder to the conventional decoder.
Filter Pruning Using Expectation Value of Feature Map's Summation
  Hai Wu, Chuanbin Liu, Fanchao Lin, Yizhi Liu · Computer Science · 2021


Bridging the Gap between Training and Inference for Neural Machine Translation
This paper addresses the overcorrection of different but reasonable translations by sampling context words during training not only from the ground-truth sequence but also from the sequence predicted by the model, where the predicted sequence is selected with a sentence-level optimum.
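The word-level mixing of ground-truth and model-predicted context can be sketched as below. This is a simplified scheduled-sampling-style illustration with invented toy sequences; the paper's actual method additionally selects the predicted sequence with a sentence-level oracle, which this sketch omits:

```python
import random

def mixed_context(ground_truth, predicted, p_oracle, rng):
    """Build the decoding context: at each step take the word from the
    ground truth with probability p_oracle, otherwise from the model's
    own prediction, so training gradually resembles inference."""
    return [g if rng.random() < p_oracle else p
            for g, p in zip(ground_truth, predicted)]

rng = random.Random(0)
gt = ["we", "propose", "a", "method"]
pred = ["we", "present", "a", "model"]

# A 50/50 mix of oracle words and the model's own predictions.
ctx = mixed_context(gt, pred, p_oracle=0.5, rng=rng)
```

In typical schedules, `p_oracle` starts near 1 (pure teacher forcing) and decays over training so the model increasingly conditions on its own predictions.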
Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation
This work presents a differentiable sequence-level training objective based on probabilistic n-gram matching, which avoids the reinforcement-learning framework, and performs greedy search during training so that the predicted words serve as context just as at inference, alleviating the problem of exposure bias.
Sentence-Level Agreement for Neural Machine Translation
The proposed sentence-level agreement module can be integrated into NMT as an additional training objective and can also be used to enhance the representation of the source sentences.
Bag-of-Words as Target for Neural Machine Translation
This paper proposes an approach that uses both the sentences and the bag-of-words as targets in the training stage, in order to encourage the model to generate potentially correct sentences that do not appear in the training set.
Neural Machine Translation by Jointly Learning to Align and Translate
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
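The soft-search can be sketched as weighting source positions by their relevance to the current decoder state. This is a minimal illustration using dot-product scoring for brevity; the paper itself uses an additive (MLP) scorer, and the vectors here are toy values:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def soft_search(query, source_states):
    """Score every source position against the decoder query, normalize the
    scores into attention weights, and return the weighted-average context
    vector -- a soft (differentiable) search over the source sentence."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in source_states]
    weights = softmax(scores)
    dim = len(source_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, source_states))
               for d in range(dim)]
    return weights, context

# Toy decoder state and two source annotations: the first source position
# matches the query, so it receives the larger attention weight.
weights, context = soft_search([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Because every source annotation contributes to the context in proportion to its weight, no fixed-length bottleneck vector is needed.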
Achieving Human Parity on Automatic Chinese to English News Translation
It is found that Microsoft's latest neural machine translation system has reached a new state of the art, and that the translation quality is at human parity when compared to professional human translations.
Recurrent Continuous Translation Models
We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases, and sentences.
Neural Machine Translation of Rare Words with Subword Units
This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.
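The core of the subword approach is byte-pair encoding: repeatedly merge the most frequent adjacent symbol pair in the corpus. A minimal sketch of one merge step follows (the toy corpus is invented; the paper's released `subword-nmt` tool is more elaborate):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, where each word is a
    tuple of symbols mapped to its frequency, and return the top pair."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of the pair with its concatenation,
    producing a new, coarser segmentation of every word."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: character-split words with frequencies.
corpus = {("l", "o", "w"): 5, ("l", "o", "t"): 3}
pair = most_frequent_pair(corpus)   # ("l", "o") occurs 8 times
corpus = merge_pair(corpus, pair)   # "lo" becomes a single subword unit
```

Repeating this for a fixed number of merges yields a vocabulary of subword units that can compose rare and unseen words at translation time.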
Sequence to Sequence Mixture Model for Diverse Machine Translation
A novel sequence to sequence mixture (S2SMIX) model that improves both translation diversity and quality by adopting a committee of specialized translation models rather than a single translation model is developed.
Token-level and sequence-level loss smoothing for RNN language models
This work builds upon the recent reward-augmented maximum likelihood approach, which encourages the model to predict sentences that are close to the ground truth according to a given performance metric, and proposes improvements to the sequence-level smoothing approach.
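The simplest form of token-level loss smoothing is uniform label smoothing, which relaxes the 0-1 target distributions discussed above. A minimal baseline sketch (the `epsilon` value is an arbitrary illustrative choice, not taken from the paper):

```python
def smooth_targets(ground_truth_index, vocab_size, epsilon=0.1):
    """Token-level label smoothing: move an epsilon share of the probability
    mass from the ground-truth word and spread it uniformly over the rest
    of the vocabulary, so the target is no longer a hard 0-1 distribution."""
    off = epsilon / (vocab_size - 1)
    return [1.0 - epsilon if i == ground_truth_index else off
            for i in range(vocab_size)]

# Smoothed target over a toy 5-word vocabulary, ground truth at index 2.
targets = smooth_targets(2, 5, epsilon=0.1)
```

The sequence-level variants discussed in the paper instead redistribute mass toward whole sentences that score well under a metric such as BLEU, rather than uniformly.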