Neural Machine Translation Quality and Post-Editing Performance
It is found that better MT systems indeed lead to fewer changes to the sentences in this industry setting; however, the relation between system quality and post-editing time is not straightforward, and, contrary to results on phrase-based MT, BLEU is not a stable predictor of the time or final output quality.
WMT20 Document-Level Markable Error Exploration
This paper inspects which specific markables are problematic for MT systems and concludes with an analysis of the effect of markable error types on the MT performance measured by humans and automatic evaluation tools.
Outbound Translation User Interface Ptakopět: A Pilot Study
This work explores the task of outbound translation by introducing Ptakopět, an open-source modular system; although the approach is known to be unreliable for evaluating MT systems, the experimental evaluation documents that it works very well for users, at least on MT systems of mid-range quality.
Backtranslation Feedback Improves User Confidence in MT, Not Quality
It is shown that backward translation feedback has a mixed effect on the whole process: it increases user confidence in the produced translation, but not the objective quality.
Extending Ptakopět for Machine Translation User Interaction Experiments
It is shown quantitatively that even though backward translation improves machine-translation user experience, it mainly increases users’ confidence and not the translation quality.
Artefact Retrieval: Overview of NLP Models with Knowledge Base Access
This paper systematically describes the typology of artefacts, retrieval mechanisms and the way these artefacts are fused into the model to uncover combinations of design decisions that had not yet been tried in NLP systems.
Leveraging Neural Machine Translation for Word Alignment
This work summarizes different approaches to extracting word alignment from alignment scores and explores ways in which such scores can be extracted from NMT models, focusing on inferring word-alignment scores from output sentence and token probabilities.
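A minimal sketch of the extraction step this summary describes: given a matrix of alignment scores between target and source tokens (e.g. derived from attention weights or token probabilities; the matrix here is random, purely for illustration), a hard word alignment can be read off by taking the highest-scoring source position for each target token.

```python
import numpy as np

# Hypothetical score matrix: rows = target tokens, columns = source tokens.
# In practice these scores would come from an NMT model's attention weights
# or token probabilities; here they are random placeholders.
rng = np.random.default_rng(0)
scores = rng.random((4, 5))  # 4 target tokens x 5 source tokens

# Hard alignment: for each target token, pick the best-scoring source token.
alignment = {t: int(scores[t].argmax()) for t in range(scores.shape[0])}
```

This argmax decoding is the simplest possible extraction rule; thresholding or symmetrization over both directions are common refinements.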
Knowledge Base Index Compression via Dimensionality and Precision Reduction
This work systematically investigates reducing the size of the KB index by means of dimensionality reduction (sparse random projections, PCA, autoencoders) and numerical precision reduction, and shows that PCA is an easy solution that requires very little data and is only slightly worse than autoencoders, which are less stable.
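The two axes the summary names, dimensionality and precision reduction, can be sketched together on synthetic data. This is an illustration only, not the paper's implementation: all shapes and data are made up, and PCA is computed directly via SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
index = rng.standard_normal((1000, 256)).astype(np.float32)  # 1000 KB entries, dim 256

# Dimensionality reduction: PCA via SVD on mean-centered vectors.
mean = index.mean(axis=0)
centered = index - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:64]                  # keep top 64 principal directions

compressed = centered @ components.T  # (1000, 64): a 4x smaller index

# Precision reduction: store the projected index in half precision.
compressed = compressed.astype(np.float16)

# Query-time retrieval: project the query identically, search in reduced space.
query = rng.standard_normal(256).astype(np.float32)
q = ((query - mean) @ components.T).astype(np.float16)
best = int((compressed @ q).argmax())
```

Combining both reductions compresses the index 8x here (256 float32 values down to 64 float16 values per entry) while retrieval stays a plain inner-product search.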
Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models
An LSTM-based autoregressive language model is proposed that fuses sentence embeddings from a pretrained masked language model (e.g. via concatenation) into its context representation, obtaining a richer context for language modelling and improving perplexity.
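The concatenation-style fusion mentioned above can be sketched in a few lines. This is a shape-level illustration with made-up dimensions and random values, not the paper's architecture: the LSTM hidden state at one step is concatenated with a fixed sentence embedding before the output projection.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, sent_dim, vocab = 128, 64, 1000

h_t = rng.standard_normal(hidden_dim)  # LSTM hidden state at step t
s = rng.standard_normal(sent_dim)      # sentence embedding from a pretrained MLM

# Fusion by concatenation: the output layer sees both representations.
fused = np.concatenate([h_t, s])
W = rng.standard_normal((vocab, hidden_dim + sent_dim)) * 0.01
logits = W @ fused

# Next-token distribution (numerically stable softmax).
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

The only change relative to a plain LSTM LM is that the projection matrix `W` grows from `(vocab, hidden_dim)` to `(vocab, hidden_dim + sent_dim)`; the sentence embedding stays constant across time steps.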
Sampling and Filtering of Neural Machine Translation Distillation Data
- Vilém Zouhar
- Computer Science, NAACL
- 1 April 2021
This paper explores the sampling-method landscape with English-to-Czech and English-to-German MT models using standard MT evaluation metrics, and shows that careful oversampling combined with the original data leads to better performance than training only on the original data, the synthesized data, or their direct combination.
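The "careful oversampling and combination with the original data" recipe can be sketched as a simple data-preparation step. All data and the scoring rule here are hypothetical placeholders (the paper uses real MT metric scores); the sketch only shows the mechanics of filtering, oversampling, and mixing.

```python
import random

random.seed(0)

# Authentic parallel data and metric-scored synthetic (distilled) translations.
original = [(f"src{i}", f"ref{i}") for i in range(100)]
synthetic = [(f"src{i}", f"hyp{i}", random.random()) for i in range(100)]

# Filter: keep the best-scoring half of the synthetic candidates.
best = sorted(synthetic, key=lambda x: -x[2])[:50]

# Oversample the filtered synthetic pairs, then mix with the original data.
oversampled = [(s, h) for s, h, _ in best] * 2
training_data = original + oversampled
random.shuffle(training_data)
```

The filtering threshold (top half) and oversampling factor (2x) are arbitrary choices for the sketch; in practice they would be tuned against a validation metric.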