Corpus ID: 241032942

Lingua Custodia’s Participation at the WMT 2021 Machine Translation Using Terminologies Shared Task

Melissa Ailem, Jingshu Liu, Raheel Qader
This paper describes Lingua Custodia’s submission to the WMT21 shared task on machine translation using terminologies. We consider three directions, namely English to French, Russian, and Chinese. We rely on a Transformer-based architecture as a building block, and we explore a method which introduces two main changes to the standard procedure to handle terminologies. The first one consists in augmenting the training data in such a way as to encourage the model to learn a copy behavior when it… 


Findings of the WMT Shared Task on Machine Translation Using Terminologies

This work introduces a benchmark for evaluating the quality and consistency of terminology translation, focusing on the medical domain for five language pairs: English to French, Chinese, Russian, and Korean, as well as Czech to German.

TMU NMT System with Automatic Post-Editing by Multi-Source Levenshtein Transformer for the Restricted Translation Task of WAT 2022

The experimental results show that 100% of the restricted target vocabularies (RTVs) can be included in the generated sentences while maintaining the translation quality of the LeCA model on both the English to Japanese (En→Ja) and Japanese to English (Ja→En) tasks.



On the Evaluation of Machine Translation for Terminology Consistency

This work proposes metrics to measure the consistency of MT output with regard to a domain terminology, and reports studies on the COVID-19 domain over five languages, including a terminology-targeted human evaluation.

Encouraging Neural Machine Translation to Satisfy Terminology Constraints

A new approach to encourage neural machine translation to satisfy lexical constraints by augmenting the training data to specify the constraints and modifying the standard cross entropy loss to bias the model towards assigning high probabilities to constraint words.
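
As a rough illustration of what biasing the cross-entropy loss towards constraint words could look like, the sketch below up-weights the loss at target positions that belong to a terminology constraint. The weighting scheme, the function name `weighted_cross_entropy`, and the `constraint_weight` parameter are assumptions made for illustration; the summary above does not specify the exact form of the modified loss.

```python
import math


def weighted_cross_entropy(token_probs, is_constraint, constraint_weight=2.0):
    """Average negative log-likelihood with extra weight on constraint tokens.

    token_probs: model probability assigned to each reference token.
    is_constraint: True where the reference token is part of a terminology constraint.
    constraint_weight: hypothetical multiplier applied at constraint positions.
    """
    total = 0.0
    for prob, flag in zip(token_probs, is_constraint):
        weight = constraint_weight if flag else 1.0
        total += -weight * math.log(prob)
    return total / len(token_probs)


# Two reference tokens, the second one is a constraint word.
plain = weighted_cross_entropy([0.5, 0.5], [False, True], constraint_weight=1.0)
biased = weighted_cross_entropy([0.5, 0.5], [False, True], constraint_weight=2.0)
```

With `constraint_weight=1.0` this reduces to the standard averaged cross-entropy; larger values penalize low probabilities on constraint tokens more heavily.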

Lexically Constrained Neural Machine Translation with Levenshtein Transformer

A simple and effective algorithm for incorporating lexical constraints in neural machine translation that leverages the flexibility and speed of a recently proposed Levenshtein Transformer model and injects terminology constraints at inference time without any impact on decoding speed.

SYSTRAN's Pure Neural Machine Translation Systems

This work presents an approach to production-ready systems, released together with online demonstrators covering a large variety of languages (12 languages, for 32 language pairs); it also discusses evaluation methodology, presents first findings, and outlines further work.

Training Neural Machine Translation to Apply Terminology Constraints

Comparative experiments show that the proposed method is not only more effective than a state-of-the-art implementation of constrained decoding, but is also as fast as constraint-free decoding.

Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation

This work presents an algorithm for lexically constrained decoding with a complexity of O(1) in the number of constraints, demonstrates the algorithm’s remarkable ability to properly place constraints, and uses it to explore the shaky relationship between model and BLEU scores.

Code-Switching for Enhancing NMT with Pre-Specified Translation

This work investigates a data augmentation method, making code-switched training data by replacing source phrases with their target translations, allowing the model to learn lexicon translations by copying source-side target words.
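
The replacement step described above can be sketched as follows. This is a minimal illustration assuming whitespace-tokenized input and a phrase lexicon keyed by tuples of source tokens; the function name `code_switch`, the greedy longest-match strategy, and the toy lexicon are assumptions, not details taken from the cited work.

```python
def code_switch(source_tokens, lexicon):
    """Replace source phrases found in `lexicon` with their target translations.

    lexicon maps tuples of source tokens to lists of target tokens.
    Matching is greedy, left to right, preferring the longest phrase.
    """
    max_len = max((len(key) for key in lexicon), default=0)
    out = []
    i = 0
    while i < len(source_tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(source_tokens) - i), 0, -1):
            phrase = tuple(source_tokens[i:i + n])
            if phrase in lexicon:
                out.extend(lexicon[phrase])
                i += n
                break
        else:
            # No lexicon entry starts here: keep the source token unchanged.
            out.append(source_tokens[i])
            i += 1
    return out


# Hypothetical English-to-French lexicon entry.
lexicon = {("heart", "attack"): ["crise", "cardiaque"]}
switched = code_switch("the patient had a heart attack".split(), lexicon)
```

Pairing `switched` with the original target sentence gives a training example in which the target-side term already appears in the source, which is what lets the model learn to copy it.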

Neural Machine Translation of Rare Words with Subword Units

This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by up to 1.1 and 1.3 BLEU, respectively.
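
A common way to obtain such subword units is byte-pair encoding, which repeatedly merges the most frequent adjacent symbol pair in a word-frequency vocabulary. The sketch below is a minimal version of that merge loop; the helper names and the toy vocabulary are illustrative assumptions, and ties between equally frequent pairs are broken by first occurrence.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Join every standalone occurrence of `pair` into a single symbol."""
    merged = "".join(pair)
    # Lookarounds ensure we only match whole, space-delimited symbols.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    """Return the learned merge operations and the re-segmented vocabulary."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Words are pre-split into characters, with </w> marking the end of a word.
toy_vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}
merges, segmented = learn_bpe(toy_vocab, 3)
```

After three merges the frequent suffix `est</w>` has become a single symbol, so a rare word ending in "est" can be represented with units the model has seen often.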

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

This work describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for neural text processing, and finds that training subword models directly from raw sentences achieves comparable accuracy.

Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search

Experiments show that Grid Beam Search (GBS) can provide large improvements in translation quality in interactive scenarios and that, even without any user input, it can be used to achieve significant gains in performance in domain adaptation scenarios.