Corpus ID: 220647507

CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus

@article{Wang2020CoVoST2A,
  title={CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus},
  author={Changhan Wang and Anne Wu and Juan Miguel Pino},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.10310}
}
Speech translation has recently become an increasingly popular topic of research, partly due to the development of benchmark datasets. Nevertheless, current datasets cover a limited number of languages. With the aim of fostering research in massively multilingual speech translation and speech translation for low-resource language pairs, we release CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages…
Textless Speech-to-Speech Translation on Real Data
TLDR
To our knowledge, this work is the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs; it finetunes a pre-trained speech encoder with paired audio from multiple speakers and a single reference speaker to reduce variations due to accents.
Zero-shot Speech Translation
TLDR
This work attempts to build zero-shot speech translation models that are trained only on ASR and MT tasks but can perform the ST task at inference, with promising results in few-shot settings where a limited amount of ST data is available.
Multilingual Speech Translation KIT @ IWSLT2021
TLDR
The main approach is to develop both cascade and end-to-end systems and eventually combine them to achieve the best possible results for this extremely low-resource setting.
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
TLDR
The NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task is described, which translates from the English audio to German text directly without intermediate transcription.
Translatotron 2: Robust direct speech-to-speech translation
TLDR
Experimental results suggest that Translatotron 2 outperforms the original Translatotron by a large margin in translation quality and predicted speech naturalness, and drastically improves the robustness of the predicted speech by mitigating over-generation, such as babbling or long pauses.
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
TLDR
A bilingual E2E-ST model is trained to predict paraphrased transcriptions as an auxiliary task with a single decoder, and bidirectional SeqKD in each direction consistently improves the translation performance, and the effectiveness is complementary regardless of the model capacity.
KIT’s IWSLT 2021 Offline Speech Translation System
TLDR
KIT's submission to the IWSLT 2021 Offline Speech Translation Task is described: a system in both cascaded and end-to-end conditions, in which the Speech Relative Transformer architecture is improved to reach or even surpass the result of the cascade system.
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
TLDR
The CORAA corpora were assembled both to improve ASR models in BP with phenomena from spontaneous speech and to motivate young researchers to start their studies on ASR for Portuguese.
UPC's Speech Translation System for IWSLT 2021
TLDR
This paper describes the UPC Machine Translation group's submission to the IWSLT 2021 offline speech translation task: an end-to-end speech translation system that combines pre-trained models with coupling modules between the encoder and decoder, and uses an efficient fine-tuning technique.
Findings of the Second Workshop on Automatic Simultaneous Translation
TLDR
A metric “Monotonic Optimal Sequence” (MOS) is proposed considering both quality and latency to rank the submissions in the AutoSimTrans shared task, and some important open issues in simultaneous translation are discussed.

References

Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates
TLDR
A novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions, is presented; it was compiled from the debates held in the European Parliament between 2008 and 2012.
MuST-C: a Multilingual Speech Translation Corpus
TLDR
MuST-C is created, a multilingual speech translation corpus whose size and quality will facilitate the training of end-to-end systems for SLT from English into 8 languages and an empirical verification of its quality and SLT results computed with a state-of-the-art approach on each language direction.
Multilingual End-to-End Speech Translation
TLDR
It is experimentally confirmed that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios and the generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair.
One-to-Many Multilingual End-to-End Speech Translation
TLDR
This paper proposes a variant of end-to-end SLT that uses target-language embeddings to shift the input representations in different portions of the space according to the language, so as to better support the production of output in the desired target language.
End-to-End Automatic Speech Translation of Audiobooks
TLDR
Experimental results show that it is possible to train compact and efficient end-to-end speech translation models in this setup; the authors hope that the speech translation baseline on this corpus will be challenged in the future.
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings
TLDR
This paper proposes a new method for this task based on multilingual sentence embeddings, which relies on nearest neighbor retrieval with a hard threshold over cosine similarity, and accounts for the scale inconsistencies of this measure.
Neural Machine Translation of Rare Words with Subword Units
TLDR
This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
TLDR
SentencePiece, a language-independent subword tokenizer and detokenizer designed for neural text processing, is presented; experiments find that it is possible to achieve accuracy comparable to direct subword training from raw sentences.
Facebook FAIR’s WMT19 News Translation Task Submission
TLDR
This paper describes Facebook FAIR’s submission to the WMT19 shared news translation task and achieves the best case-sensitive BLEU score for the translation direction English→Russian.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.