The Multilingual TEDx Corpus for Speech Recognition and Translation
- Elizabeth Salesky, Matthew Wiesner, Matt Post
- Computer Science, Linguistics · Interspeech
- 2 February 2021
The Multilingual TEDx corpus is a collection of audio recordings from TEDx talks in 8 source languages built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages.
Fluent Translations from Disfluent Speech in End-to-End Speech Translation
This work uses a sequence-to-sequence model to translate noisy, disfluent speech into fluent text with disfluencies removed, using the recently collected ‘copy-edited’ references for the Fisher Spanish-English dataset.
FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN
Each track’s goal, data, and evaluation metrics are introduced, and the results of the received submissions are reported.
The IWSLT 2019 Evaluation Campaign
The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of…
Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation
This work shows that a naive method to create compressed phoneme-like speech representations is far more effective and efficient for translation than traditional frame-level speech features.
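A minimal sketch of what such a compression step might look like, assuming frame-level feature vectors paired with per-frame phoneme labels (the function name and data layout here are hypothetical, not from the paper): consecutive frames sharing a phoneme label are averaged into a single vector, shortening the sequence the translation model must consume.

```python
import numpy as np

def compress_to_phoneme_level(frames, phoneme_labels):
    """Average runs of consecutive frames that share a phoneme label,
    yielding one feature vector per phoneme segment.

    frames: (T, d) array of frame-level speech features
    phoneme_labels: length-T sequence of per-frame phoneme labels
    """
    segments = []
    start = 0
    for i in range(1, len(phoneme_labels) + 1):
        # Close the current segment at a label change or at the end.
        if i == len(phoneme_labels) or phoneme_labels[i] != phoneme_labels[start]:
            segments.append(frames[start:i].mean(axis=0))
            start = i
    return np.stack(segments)
```

For example, three frames labeled `["a", "a", "b"]` collapse to two vectors, the first being the mean of the two "a" frames.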
A Language-Independent Approach to Automatic Text Difficulty Assessment for Second-Language Learners
A new baseline for language-independent text difficulty assessment applied to the Interagency Language Roundtable (ILR) proficiency scale is introduced and it is demonstrated that reading level assessment is a discriminative problem that is best-suited for regression.
Towards Fluent Translations From Disfluent Speech
- Elizabeth Salesky, S. Burger, J. Niehues, A. Waibel
- Computer Science · IEEE Spoken Language Technology Workshop (SLT)
- 7 November 2018
A corpus of cleaned target data for the Fisher Spanish-English dataset is introduced to compare how different architectures handle disfluency and provide a baseline for removing disfluencies in end-to-end translation.
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
This survey connects several lines of work from the pre-neural and neural eras, showing how hybrid approaches combining words and characters, as well as subword-based approaches built on learned segmentation, have been proposed and evaluated.
Exploiting Morphological, Grammatical, and Semantic Correlates for Improved Text Difficulty Assessment
This work demonstrates that adding morphological, information-theoretic, and language modeling features to a traditional readability baseline greatly benefits the system's performance.
Relative Positional Encoding for Speech Recognition and Direct Translation
This work adapts the relative position encoding scheme to the Speech Transformer, where the key addition is relative distance between input states in the self-attention network, so the network can better adapt to the variable distributions present in speech data.
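The key addition described above can be sketched as follows, assuming a Shaw-style relative position bias in which attention logits get an extra term computed from learned embeddings of the clipped distance between query and key positions (function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def relative_attention_scores(q, k, rel_emb, max_dist):
    """Self-attention logits with a relative-position term.

    q, k: (T, d) query and key matrices for one head
    rel_emb: (2*max_dist + 1, d) learned embeddings indexed by the
             clipped relative distance (j - i) between positions
    """
    T, d = q.shape
    content = q @ k.T  # standard content-content term
    # Clipped relative distance j - i for every query/key pair,
    # shifted to index into rel_emb.
    idx = np.clip(np.arange(T)[None, :] - np.arange(T)[:, None],
                  -max_dist, max_dist) + max_dist
    # Position term: each query attends to the embedding of its
    # relative distance to each key.
    rel = np.einsum('td,tsd->ts', q, rel_emb[idx])
    return (content + rel) / np.sqrt(d)
```

With zeroed relative embeddings this reduces to standard scaled dot-product attention; nonzero embeddings let the network bias attention by distance, which is the adaptation to the variable distributions in speech that the summary describes.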