Neural Machine Translation Quality and Post-Editing Performance
- Vilém Zouhar, Aleš Tamchyna, M. Popel, Ondřej Bojar
- Computer Science, Conference on Empirical Methods in Natural…
- 10 September 2021
Better MT systems are found to indeed lead to fewer changes in the sentences in this industry setting. However, the relation between system quality and post-editing time is not straightforward and, contrary to the results on phrase-based MT, BLEU is definitely not a stable predictor of the time or final output quality.
WMT20 Document-Level Markable Error Exploration
- Vilém Zouhar, Tereza Vojtěchová, Ondřej Bojar
- Computer Science, Conference on Machine Translation
- 2020
This paper inspects which specific markables are problematic for MT systems and concludes with an analysis of the effect of markable error types on the MT performance measured by humans and automatic evaluation tools.
Backtranslation Feedback Improves User Confidence in MT, Not Quality
- Vilém Zouhar, Michal Novák, Lisa Yankovskaya
- Computer Science, North American Chapter of the Association for…
- 12 April 2021
It is shown that backward translation feedback has a mixed effect on the whole process: it increases user confidence in the produced translation, but not the objective quality.
Outbound Translation User Interface Ptakopět: A Pilot Study
- Vilém Zouhar, Ondřej Bojar
- Computer Science, International Conference on Language Resources…
- 25 November 2019
This work explores the task of outbound translation by introducing an open-source modular system, Ptakopět. Round-trip translation is known to be unreliable for evaluating MT systems, but the experimental evaluation documents that it works very well for users, at least on MT systems of mid-range quality.
Artefact Retrieval: Overview of NLP Models with Knowledge Base Access
- Vilém Zouhar, Marius Mosbach, Debanjali Biswas, D. Klakow
- Computer Science, ArXiv
- 24 January 2022
This paper systematically describes the typology of artefacts, retrieval mechanisms and the way these artefacts are fused into the model to uncover combinations of design decisions that had not yet been tried in NLP systems.
Extending Ptakopět for Machine Translation User Interaction Experiments
- Vilém Zouhar, M. Novák
- Computer Science, Prague Bulletin of Mathematical Linguistics
- 1 October 2020
It is shown quantitatively that even though backward translation improves machine-translation user experience, it mainly increases users’ confidence and not the translation quality.
Knowledge Base Index Compression via Dimensionality and Precision Reduction
- Vilém Zouhar, Marius Mosbach, Miaoran Zhang, D. Klakow
- Computer Science, SPANLP
- 6 April 2022
This work systematically investigates reducing the size of the KB index by means of dimensionality reduction (sparse random projections, PCA, autoencoders) and numerical precision reduction, and shows that PCA is an easy solution that requires very little data and is only slightly worse than autoencoders, which are less stable.
Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models
- Vilém Zouhar, Marius Mosbach, D. Klakow
- Computer Science, ArXiv
- 4 August 2022
An LSTM-based autoregressive language model that uses sentence embeddings from a pretrained masked language model via fusion (e.g. concatenation) to obtain a richer context representation for language modelling and to improve perplexity.
EMMT: A simultaneous eye-tracking, 4-electrode EEG and audio corpus for multi-modal reading and translation scenarios
- Sunit Bhattacharya, Věra Kloudová, Vilém Zouhar, Ondřej Bojar
- Computer Science, ArXiv
- 6 April 2022
The EMMT corpus, a dataset containing monocular eye movement recordings, audio and 4-electrode electroencephalogram (EEG) data of 43 participants, is well suited for research in Translation Process Studies and Cognitive Sciences, among other disciplines.
Leveraging Neural Machine Translation for Word Alignment
- Vilém Zouhar, Daria Pylypenko
- Computer Science, Prague Bulletin of Mathematical Linguistics
- 31 March 2021
This work summarizes different approaches to extracting word alignment from alignment scores and explores ways in which such scores can be extracted from NMT, focusing on inferring the word-alignment scores based on output sentence and token probabilities.