• Corpus ID: 234336726

Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

Authors: Zaixiang Zheng, Hao Zhou, Shujian Huang, Jiajun Chen, Jingjing Xu, Lei Li
Sequence-to-sequence learning naturally has two directions. How to effectively utilize supervision signals from both directions? Existing approaches either require two separate models, or a multitask-learned model but with inferior performance. In this paper, we propose REDER (REversible Duplex TransformER), a parameter-efficient model, and apply it to machine translation. Either end of REDER can simultaneously input and output a distinct language. Thus REDER enables reversible machine…
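To make the idea of a model that runs in either direction concrete, here is a minimal sketch of an additive coupling layer, a common building block of reversible networks. Whether REDER uses exactly this form is not stated in the abstract excerpt above, so treat it as an assumption; the sub-functions `f` and `g` and the helpers `forward` and `reverse` are hypothetical names chosen for illustration.

```python
import math

def f(v):
    # Hypothetical sub-function F; any deterministic function would do.
    return [math.tanh(0.5 * x + 0.1) for x in v]

def g(v):
    # Hypothetical sub-function G; it does not need to be invertible itself.
    return [math.sin(x) for x in v]

def forward(x1, x2):
    # Additive coupling: each half is updated using only the other half.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def reverse(y1, y2):
    # Undo the two updates in the opposite order to recover the inputs.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

x1, x2 = [0.3, -1.2], [2.0, 0.7]
y1, y2 = forward(x1, x2)
r1, r2 = reverse(y1, y2)
max_error = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
```

Because each update is a plain addition of a quantity that can be recomputed from the other half, the layer can be inverted without inverting `f` or `g`, so `max_error` should be at the level of floating-point rounding.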

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision

The DSLP model is proposed: a highly efficient, high-performance approach to machine translation that trains a non-autoregressive Transformer with Deep Supervision and feeds in additional Layer-wise Predictions to improve BLEU scores.

Rephrasing the Reference for Non-Autoregressive Machine Translation

A rephraser is introduced to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output, which can be quantified as reward functions and optimized by reinforcement learning.

One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation

It is argued that one reference is not enough and diverse distillation with reference selection (DDRS) for NAT is proposed, which enables a dataset containing multiple high-quality reference translations for each source sentence to be generated.

Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation

A Viterbi decoding framework for DA-Transformer is presented, which guarantees the joint optimal solution for the translation and decoding path under any length constraint and consistently improves the performance of DA-Transformer while maintaining a similar decoding speedup.

Reversible Vision Transformers

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5× at identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for resource limited training regimes.

Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation

This work extends the alignment space to non-monotonic alignments to allow for global word reordering, and further considers all alignments that overlap with the target sentence, closing the gap between non-autoregressive and autoregressive models.

Enhanced Evaluation Method of Musical Instrument Digital Interface Data based on Random Masking and Seq2Seq Model

An enhanced evaluation method based on random masking and a sequence-to-sequence (Seq2Seq) model is proposed to evaluate MIDI data; the results imply that the method quantifies the gap while accurately distinguishing real from generated MIDI data.



Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

Non-autoregressive Machine Translation with Disentangled Context Transformer

An attention-masking based model, called the Disentangled Context (DisCo) transformer, simultaneously generates all tokens given different contexts and achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.

Mirror-Generative Neural Machine Translation

The proposed mirror-generative NMT (MGNMT), a single unified architecture that simultaneously integrates the source-to-target translation model, the target-to-source translation model, and two language models, consistently outperforms existing approaches in a variety of scenarios and language pairs, including resource-rich and low-resource languages.

Semi-Supervised Learning for Neural Machine Translation

This work proposes a semi-supervised approach for training NMT models on the concatenation of labeled and unlabeled monolingual corpora data, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively.

Non-Autoregressive Machine Translation with Auxiliary Regularization

This paper proposes to address the issues of repeated translations and incomplete translations in NAT models by improving the quality of decoder hidden representations via two auxiliary regularization terms in the training process of an NAT model.

Levenshtein Transformer

Levenshtein Transformer is developed, a new partially autoregressive model devised for more flexible and amenable sequence generation, along with a set of new training techniques dedicated to its insertion and deletion operations, effectively exploiting one as the other's learning signal thanks to their complementary nature.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

It is argued that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and this bottleneck is overcome via language-specific components and deeper NMT architectures.

Improving Neural Machine Translation Models with Monolingual Data

This work pairs monolingual training data with an automatic back-translation, treats it as additional parallel training data, and obtains substantial improvements on the WMT 15 task English<->German and on the low-resourced IWSLT 14 task Turkish->English.

Convolutional Sequence to Sequence Learning

This work introduces an architecture based entirely on convolutional neural networks, which outperforms the deep LSTM setup of Wu et al. (2016) in accuracy on both WMT'14 English-German and WMT'14 English-French translation while running an order of magnitude faster, both on GPU and CPU.