Improving Zero-Shot Translation by Disentangling Positional Information

Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li
Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional… 

Adapting to Non-Centered Languages for Zero-shot Multilingual Translation

This work proposes a simple, lightweight yet effective language-specific modeling method that adapts to non-centered languages and combines the shared information with the language-specific information to counteract the instability of zero-shot translation.

Position Information in Transformers: An Overview

This overview provides a theoretical comparison of existing methods for incorporating position information into Transformer models and indicates which characteristics of an application should be taken into account when selecting a position encoding.

Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders

This paper proposes SixT, a simple yet effective model that significantly outperforms mBART, a pretrained multilingual encoder-decoder model explicitly designed for NMT, with an average improvement of 7.1 BLEU on zero-shot any-to-English test sets across 14 source languages.

Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation

SixT+ is presented, a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages, and offers a set of model parameters that can be further fine-tuned to other unsupervised tasks.

Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation

SixT+ is presented, a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages; its ingredients include the multilinguality of the auxiliary parallel data, a positional disentangled encoder, and the cross-lingual transferability of its encoder.

The Impact of Positional Encodings on Multilingual Compression

While sinusoidal positional encodings were designed for monolingual applications, they are particularly useful in multilingual language models, because they were explicitly designed to facilitate compositionality by allowing linear projections over arbitrary time steps.

Rethinking Zero-shot Neural Machine Translation: From a Perspective of Latent Variables

This paper introduces a denoising autoencoder objective based on a pivot language into the traditional training objective to improve translation accuracy on zero-shot directions, and significantly outperforms state-of-the-art methods.

1Cademy at Semeval-2022 Task 1: Investigating the Effectiveness of Multilingual, Multitask, and Language-Agnostic Tricks for the Reverse Dictionary Task

The proposed ELMo-based monolingual model achieves the best results, and its multitask and multilingual variants show competitive results as well.

Understanding and Mitigating the Uncertainty in Zero-Shot Translation

Two lightweight and complementary approaches are proposed to improve the performance of zero-shot translation over strong MNMT baselines: denoising the training data for model training, and masking out the vocabulary of off-target languages during inference.

Tackling Data Scarcity in Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques

  • Tu Anh Dinh, Danni Liu, J. Niehues
  • Computer Science
    ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2022
Data augmentation and an auxiliary loss function were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points compared to direct end-to-end ST and +3.1 BLEU points compared to ST models fine-tuned from an ASR model.



PMIndia - A Collection of Parallel Corpora of Languages of India

A new publicly available corpus consisting of parallel sentences which pair 13 major languages of India with English is described, including an assessment of two different automatic sentence alignment methods and some initial NMT results on the corpus.

Investigating Multilingual NMT Representations at Scale

This work attempts to understand massively multilingual NMT representations using Singular Value Canonical Correlation Analysis (SVCCA), a representation similarity framework that allows us to compare representations across different languages, layers and models.

Attention is All you Need

A new simple network architecture, the Transformer, is proposed; it is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Improving Zero-shot Translation with Language-Independent Constraints

This work intentionally creates an encoder architecture which is independent with respect to the source language, and designs regularization methods into the standard Transformer model, so that the whole architecture becomes more robust in zero-shot conditions.

When Can Self-Attention Be Replaced by Feed Forward Layers?

The experiments offer insights into how self-attention layers process the speech signal, leading to the conclusion that the lower self-attention layers of the encoder encode a sufficiently wide range of inputs, hence learning further contextual information in the upper layers is unnecessary.

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

It is argued that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and this bottleneck is overcome via language-specific components and deeper NMT architectures.

Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

This work addresses the degeneracy problem caused by capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences, and proposes two simple but effective approaches: decoder pre-training and back-translation.

Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation

It is found that language-specific subword segmentation results in less subword copying at training time and leads to better zero-shot performance compared to jointly trained segmentation, and that the bias towards English can be effectively reduced with even a small amount of parallel data in some of the non-English pairs.

Beyond English-Centric Multilingual Machine Translation

This work creates a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages and explores how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models.

Complete Multilingual Neural Machine Translation

This paper reintroduces direct parallel data from multi-way aligned corpora between all source and target languages, calls MNMT with such a connectivity pattern complete Multilingual Neural Machine Translation (cMNMT), and demonstrates its utility and efficacy.