The USYD-JD Speech Translation System for IWSLT2021

Liang Ding, Di Wu, Dacheng Tao
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task. We participated in the Swahili->English direction and achieved the best sacreBLEU score (25.3) among all participants. Our constrained system is based on a pipeline framework, i.e., ASR followed by NMT. We trained our models with the officially provided ASR and MT datasets. The ASR system is based on the open-source toolkit Kaldi, and this work mainly explores how to make the most of…
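A cascaded (pipeline) system chains two independently trained stages. The ASR output becomes the NMT input, so transcription errors carry through into the translation. The sketch below shows only this composition. The function names and the toy Swahili-English lexicon are hypothetical stand-ins, not the paper's Kaldi-based ASR or its NMT models.

```python
from typing import Callable

# Hypothetical stage signatures: ASR maps audio to a source-language transcript,
# MT maps that transcript to target-language text.
AsrFn = Callable[[bytes], str]
MtFn = Callable[[str], str]


def cascade_translate(audio: bytes, asr: AsrFn, mt: MtFn) -> str:
    """Cascaded speech translation: run ASR first, then translate its output."""
    transcript = asr(audio)  # e.g. a Swahili transcript
    return mt(transcript)    # e.g. an English translation


# Toy stand-ins for illustration only (not real Kaldi or NMT models).
def toy_asr(audio: bytes) -> str:
    # Pretend the "audio" bytes already contain the spoken words.
    return audio.decode("utf-8")


def toy_mt(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    lexicon = {"habari": "hello", "dunia": "world"}
    return " ".join(lexicon.get(w, w) for w in text.split())


print(cascade_translate(b"habari dunia", toy_asr, toy_mt))  # hello world
```

Because the stages are decoupled, each one can be trained on its own dataset, such as the separate ASR and MT corpora mentioned above, and replaced independently.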


Improving Neural Machine Translation by Bidirectional Training
  • Liang Ding, Di Wu, D. Tao
  • Computer Science
  • ArXiv
  • 2021
BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs and complements existing data manipulation strategies, i.e., back-translation, data distillation, and data diversification.
This paper describes each shared task, its data and evaluation metrics, and reports the results of the submissions received for the IWSLT 2021 evaluation campaign.


The University of Sydney’s Machine Translation System for WMT19
This paper describes the University of Sydney's submission to the WMT 2019 shared news translation task. Its best result outperforms the baseline (a Transformer ensemble trained on the original parallel corpus) by approximately 5.3 BLEU, achieving state-of-the-art performance.
Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task
This paper proposes a hybrid data selection method to select high-quality, in-domain sentences from out-of-domain data. It also explores transferring general knowledge from four different pre-trained language models to the downstream translation task.
An Empirical Study of Machine Translation for the Shared Task of WMT18
The submitted system focuses on data cleaning and techniques for building a competitive model for this task, and relies mainly on data filtering to obtain the best BLEU score.
Improving Neural Machine Translation Models with Monolingual Data
This work pairs monolingual training data with automatic back-translations and treats the result as additional parallel training data. It obtains substantial improvements on the WMT 15 English->German task and on the low-resource IWSLT 14 Turkish->English task.
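Back-translation can be sketched as a data-augmentation step. A reverse (target->source) model translates target-language monolingual sentences, and each synthetic source is paired with its original target sentence. The result is then appended to the genuine parallel data. This is a minimal sketch: the helper names and the toy German->English lexicon are hypothetical, and in practice the reverse model is a trained NMT system.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source, target)


def back_translate(mono_target: List[str],
                   reverse_model: Callable[[str], str]) -> List[Pair]:
    """Pair each target-side monolingual sentence with a synthetic source
    produced by a target->source model."""
    return [(reverse_model(t), t) for t in mono_target]


def augment(parallel: List[Pair], mono_target: List[str],
            reverse_model: Callable[[str], str]) -> List[Pair]:
    """Treat the synthetic pairs as additional parallel training data."""
    return parallel + back_translate(mono_target, reverse_model)


# Toy reverse "model" for an English->German system: German -> English lookup.
TOY_DE_EN = {"guten": "good", "Morgen": "morning"}


def reverse_de_en(sentence: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DE_EN.get(w, w) for w in sentence.split())


data = augment([("hello", "hallo")], ["guten Morgen"], reverse_de_en)
print(data)  # [('hello', 'hallo'), ('good morning', 'guten Morgen')]
```

Only the source side is synthetic. The target side remains genuine human-written text, which is what the forward model learns to generate.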
Exploiting Monolingual Data at Scale for Neural Machine Translation
This work studies how to use both source-side and target-side monolingual data for NMT and proposes an effective strategy that leverages both.
Tencent Neural Machine Translation Systems for the WMT20 News Translation Task
This paper describes Tencent Neural Machine Translation systems for the WMT 2020 news translation tasks. The systems are built on deep Transformers and several data augmentation methods. The paper also proposes a boosted in-domain fine-tuning method to improve single models.
Progressive Multi-Granularity Training for Non-Autoregressive Translation
This work empirically shows that NAT models are more prone to learning fine-grained, lower-mode knowledge, such as words and phrases, than sentence-level knowledge. It proposes progressive multi-granularity training for NAT, which yields better translation quality than strong NAT baselines.
Multilingual Denoising Pre-training for Neural Machine Translation
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a…
Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation
This work directly exposes the raw data to NAT models, leveraging pretraining to rejuvenate more alignments for low-frequency target words. It demonstrates that the proposed approach significantly and universally improves translation quality by reducing translation errors on low-frequency words.
Understanding and Improving Lexical Choice in Non-Autoregressive Translation
This study empirically shows that, as a side effect of training non-autoregressive translation (NAT) models on distilled data, lexical choice errors on low-frequency words propagate from the teacher model to the NAT model. It proposes exposing the raw data to NAT models to restore the useful information about low-frequency words that is missing from the distilled data.