Corpus ID: 235458143

Lost in Interpreting: Speech Translation from Source or Interpreter?

Dominik Machácek, Matús Zilinec, Ondrej Bojar
Interpreters facilitate multilingual meetings, but the affordable set of languages is often smaller than what is needed. Automatic simultaneous speech translation can extend the set of provided languages. We investigate whether such an automatic system should follow the original speaker or an interpreter to achieve better translation quality at the cost of increased delay. To answer the question, we release Europarl Simultaneous Interpreting Corpus (ESIC), 10 hours of recordings and…
1 Citation


Operating a Complex SLT System with Speakers and Human Interpreters
We describe our experience with providing automatic simultaneous spoken language translation for an event with human interpreters. We provide a detailed overview of the systems we use, focusing on…


Re-Translation Strategies for Long Form, Simultaneous, Spoken Language Translation
This work investigates the problem of simultaneous machine translation of long-form speech content with a continuous speech-to-text scenario, generating translated captions for a live audio feed, and adopts a re-translation approach to simultaneous translation.
Lecture Translator - Speech translation framework for simultaneous lecture translation
A system that performs simultaneous speech translation of university lectures on a stream of audio in real time and with low latency, and that features several techniques beyond the basic speech translation task which make it fit for real-world use.
Segmentation and punctuation prediction in speech language translation using a monolingual translation system
This paper builds a monolingual German-to-German translation system that implements segmentation and punctuation prediction as a machine translation task, and shows an upper bound on translation quality if the authors had human-generated segmentation and punctuation on the output stream of speech recognition systems.
Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates
A novel multilingual SLT corpus is presented, containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions, compiled from the debates held in the European Parliament between 2008 and 2012.
Multilingual Neural Machine Translation for Low Resource Languages
This work shows how so-called multilingual NMT can help tackle the challenges associated with low-resource language translation, and introduces the recently proposed iterative self-training method, which incrementally improves a multilingual NMT system on a zero-shot direction by relying only on monolingual data.
Interpreting Strategies Annotation in the WAW Corpus
An automatic analysis of a corpus of parallel speeches and their human interpretations, and the results of manually annotating the human interpreting strategies in a sample of the corpus, are provided.
Low-Latency Neural Speech Translation
It is shown that NMT systems can be adapted to scenarios where no task-specific training data is available, and that the number of corrections displayed during incremental output construction is reduced by 45% without a decrease in translation quality.
Tagging a Corpus of Interpreted Speeches: the European Parliament Interpreting Corpus (EPIC)
The performance of three different taggers (Treetagger, Freeling and GRAMPAL) is evaluated on three different languages, i.e. English, Italian and Spanish, to assess the success rate achieved in tagging and lemmatisation.
A real-world system for simultaneous translation of German lectures
A real-time automatic speech translation system for university lectures, which can interpret several lectures in parallel, is now being installed in several lecture halls at KIT and can provide the translation to students in several parallel sessions.
The Chinese/English Political Interpreting Corpus (CEPIC): A New Electronic Resource for Translators and Interpreters
  • Jun Pan
  • Political Science
  • Proceedings of the Second Workshop Human-Informed Translation and Interpreting Technology associated with RANLP 2019
  • 2019
The Chinese/English Political Interpreting Corpus (CEPIC) is a new electronic and open access resource developed for translators and interpreters, especially those working with political text types.