CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality

Maria Nădejde, Anna Currey, B. Hsu, Xing Niu, Marcello Federico and Georgiana Dinu
The machine translation (MT) task is typically formulated as that of returning a single translation for an input segment. However, in many cases, multiple different translations are valid and the appropriate translation may depend on the intended target audience, characteristics of the speaker, or even the relationship between speakers. Specific problems arise when dealing with honorifics, particularly when translating from English into languages with formality markers. For example, the sentence ‘Are… 
Improving Machine Translation Formality Control with Weakly-Labelled Data Augmentation and Post Editing Strategies
This paper describes Amazon Alexa AI’s implementation for the IWSLT 2022 shared task on formality control and proposes three simple yet effective post-editing strategies, namely T-V conversion, a verb conjugator, and seq2seq models, to rewrite the translated phrases into formal or informal language.
Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022
This paper describes the SLT-CDT-UoS group’s submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were
Controlling Translation Formality Using Pre-trained Multilingual Language Models
Results show that this strategy can approach the translation quality and formality control achieved by dedicated translation models; however, the nature of the underlying pre-trained language model and of the fine-tuning samples greatly impacts results.
Findings of the IWSLT 2022 Evaluation Campaign
For each shared task of the 19th International Conference on Spoken Language Translation, this paper details the purpose of the task, the data released, the evaluation metrics applied, the submissions received and the results achieved.
Sockeye 3: Fast Neural Machine Translation with PyTorch
Sockeye 3 is the latest version of the Sockeye toolkit for Neural Machine Translation (NMT). Now based on PyTorch, Sockeye 3 provides faster model implementations and more advanced features with a
MTNT: A Testbed for Machine Translation of Noisy Text
This paper proposes a benchmark dataset for Machine Translation of Noisy Text (MTNT), consisting of noisy comments on Reddit and professionally sourced translations, and demonstrates that existing MT models fail badly on a number of noise-related phenomena, even after performing adaptation on a small training set of in-domain data.
Controlling Formality and Style of Machine Translation Output Using AutoML
This work takes a transfer learning approach using Google’s AutoML Translate to train custom neural machine translation models to consistently produce a specific formality in the target language.
Controlling Neural Machine Translation Formality with Synthetic Supervision
A novel training scheme for multi-task models is introduced that automatically generates synthetic training triplets by inferring the missing element on the fly, thus enabling end-to-end training.
Towards Modeling the Style of Translators in Neural Machine Translation
This work investigates methods to augment the state-of-the-art Transformer model with translator information that is available for part of the training data, and shows that style-augmented translation models are able to capture the style variations of translators and to generate translations with different styles on new data.
Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer
This work creates the largest corpus for a particular stylistic transfer (formality) and shows that techniques from the machine translation community can serve as strong baselines for future work.
Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
This work presents the first thorough investigation of gender bias in speech translation, contributing a benchmark for future studies and a comparison of different technologies on two language directions (English-Italian/French).
Controlling Machine Translation for Multiple Attributes with Additive Interventions
Fine-grained control of machine translation (MT) outputs along multiple attributes is critical for many modern MT applications and is a requirement for gaining users’ trust. A standard approach for
Evaluating Gender Bias in Machine Translation
An automatic gender bias evaluation method based on morphological analysis is devised for eight target languages with grammatical gender; it shows that four popular industrial MT systems and two recent state-of-the-art academic MT models are significantly prone to gender-biased translation errors in all tested target languages.
Getting Gender Right in Neural Machine Translation
The experiments show that adding a gender feature to an NMT system significantly improves the translation quality for some language pairs.
Controlling the Reading Level of Machine Translation Output
The task of reading level control for machine translation is introduced, and first results are provided for models that can raise or lower the reading level of output translations.