Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed
- Thuong-Hai Pham, Dominik Machácek, Ondrej Bojar
- Computer Science
- Computación y Sistemas
- 25 September 2019
This work tries to promote the knowledge of source-side syntax using multi-task learning either through simple data manipulation techniques or through a dedicated model component, and finds that identical gains are obtained by using trivial "linear trees" instead of true dependencies.
English-Czech Systems in WMT19: Document-Level Transformer
- M. Popel, Dominik Machácek, Michal Auersperger, Ondrej Bojar, Pavel Pecina
- Computer Science
- WMT
- 30 July 2019
These NMT systems are based on the Transformer model implemented in either Tensor2Tensor (T2T) or Marian framework and aimed at improving the adequacy and coherence of translated documents by enlarging the context of the source and target.
Comprehension of Subtitles from Re-Translating Simultaneous Speech Translation
The results show that the subtitling layout and flicker have little effect on comprehension, in contrast to machine translation quality and individual competence, and that users with a limited knowledge of the source language have different preferences for stability and latency than users with zero knowledge.
Morphological and Language-Agnostic Word Segmentation for NMT
A critical difference between BPE and STE is identified and a simple pre-processing step for BPE is shown that considerably increases translation quality as evaluated by automatic measures.
ELITR Multilingual Live Subtitling: Demo and Strategy
An automatic speech translation system aimed at live subtitling of conference presentations that is routinely tested in recognizing English, Czech, and German speech and presenting it translated simultaneously into 42 target languages is presented.
A Speech Test Set of Practice Business Presentations with Additional Relevant Texts
We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises together with their slides and web-pages. The corpus is intended for evaluation of automatic…
Removing European Language Barriers with Innovative Machine Translation Technology
This paper presents the progress towards deploying a versatile communication platform for highly multilingual live speech translation, used for live subtitling of conferences and remote meetings, outlines the architecture of the solution, and briefly compares it with the ELG platform.
ELITR: European Live Translator
The ELITR (European Live Translator) project aims to create a speech translation system for simultaneous subtitling of conferences and online meetings, targeting up to 43 languages. The technology is…
Presenting Simultaneous Translation in Limited Space
A way to estimate the overall usability of the combination of automatic translation and subtitling by measuring quality, latency, and stability on a test set is proposed, together with an improved measure of translation latency.
CUNI Systems for the Unsupervised News Translation Task in WMT 2019
This paper describes the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19): a seed phrase-based system whose phrase table is initialized from cross-lingual embedding mappings, followed by a neural machine translation system trained on synthetic parallel data.