Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022

@article{Vincent2022ControllingFI,
  title={Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022},
  author={Sebastian T. Vincent and Lo{\"i}c Barrault and Carolina Scarton},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.05990}
}
This paper describes the SLT-CDT-UoS group’s submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the provided corpora; using English as a pivot language, we propagated formality annotations to… 

Findings of the IWSLT 2022 Evaluation Campaign
TLDR
For each shared task of the 19th International Conference on Spoken Language Translation, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved are detailed.

References

SHOWING 1-10 OF 23 REFERENCES
CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality
TLDR
This work shows that it can train formality-controlled MT models by tuning on labeled contrastive data, achieving high accuracy (82% in-domain and 73% out-of-domain) while maintaining overall quality.
Controlling the Output Length of Neural Machine Translation
TLDR
Two methods for biasing the output length with a transformer architecture are investigated: i) conditioning the output to a given target-source length-ratio class and ii) enriching the transformer positional embedding with length information.
MuST-C: a Multilingual Speech Translation Corpus
TLDR
This work creates MuST-C, a multilingual speech translation corpus whose size and quality will facilitate the training of end-to-end systems for SLT from English into 8 languages, and presents an empirical verification of its quality along with SLT results computed with a state-of-the-art approach on each language direction.
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
TLDR
This work presents SentencePiece, a language-independent subword tokenizer and detokenizer designed for neural text processing, and finds that training subword models directly from raw sentences achieves comparable accuracy.
Controlling Politeness in Neural Machine Translation via Side Constraints
TLDR
A pilot study to control honorifics in neural machine translation (NMT) via side constraints, focusing on English → German, shows that by marking up the (English) source side of the training data with a feature that encodes the use of honorifics on the (German) target side, it can control the honorifics produced at test time.
Getting Gender Right in Neural Machine Translation
TLDR
The experiments show that adding a gender feature to an NMT system significantly improves the translation quality for some language pairs.
Findings of the IWSLT 2022 Evaluation Campaign
TLDR
For each shared task of the 19th International Conference on Spoken Language Translation, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved are detailed.
Bifixer and Bicleaner: two open-source tools to clean your parallel data
TLDR
Two open-source tools designed for parallel data cleaning, Bifixer and Bicleaner, are shown to have a positive impact on machine translation training times and quality, particularly for the noisiest corpora.
Findings of the 2020 Conference on Machine Translation (WMT20)
TLDR
This paper presents the results of the news translation task and the similar language translation task, both organised alongside the Conference on Machine Translation (WMT) 2020; in the similar language translation task, participants built machine translation systems for translating between closely related pairs of languages.
Rethinking Text Attribute Transfer: A Lexical Analysis
TLDR
A lexical analysis framework, the Pivot Analysis, is proposed to quantitatively analyze the effects of pivot words in text attribute classification and transfer, and the future requirements and challenges of this task are identified.