It is found that better MT systems indeed lead to fewer changes to the sentences in this industry setting; however, the relation between system quality and post-editing time is not straightforward, and, contrary to the results on phrase-based MT, BLEU is not a stable predictor of post-editing time or final output quality.
This paper inspects which specific markables are problematic for MT systems and concludes with an analysis of the effect of markable error types on MT performance as measured by humans and by automatic evaluation tools.
This work explores the task of outbound translation by introducing Ptakopět, an open-source modular system. Although backward (round-trip) translation is known to be unreliable for evaluating MT systems, the experimental evaluation documents that it works very well for users, at least with MT systems of mid-range quality.
It is shown that backward translation feedback has a mixed effect on the translation process: it increases user confidence in the produced translation, but not its objective quality.
It is shown quantitatively that even though backward translation improves the machine-translation user experience, it mainly increases users' confidence and not the translation quality.
This paper systematically describes a typology of artefacts, retrieval mechanisms, and the ways these artefacts are fused into the model, uncovering combinations of design decisions that have not yet been tried in NLP systems.
This work summarizes different approaches to extracting word alignment from alignment scores and explores ways in which such scores can be extracted from NMT models, focusing on inferring word-alignment scores from output sentence and token probabilities.
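To make the extraction step concrete, the following is a minimal sketch (not the paper's implementation) of turning a soft alignment-score matrix into hard word alignment by linking each target token to its highest-scoring source token; the score matrix here is dummy data standing in for scores inferred from NMT output probabilities.

```python
import numpy as np

def hard_alignment(scores: np.ndarray) -> list[tuple[int, int]]:
    """For each target token t, link it to the source token with the highest score."""
    return [(int(s), t) for t, s in enumerate(scores.argmax(axis=1))]

# Hypothetical score matrix: 3 target tokens x 4 source tokens.
scores = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])
print(hard_alignment(scores))  # [(0, 0), (2, 1), (3, 2)] as (source, target) pairs
```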
This work systematically investigates reducing the size of the KB index by means of dimensionality reduction (sparse random projections, PCA, autoencoders) and numerical precision reduction, and shows that PCA is an easy solution that requires very little data and is only slightly worse than autoencoders, which are less stable.
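As an illustration of the PCA route combined with precision reduction, here is a minimal scikit-learn sketch on random placeholder embeddings; the dimensions, the small fitting sample, and the float16 step are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
index = rng.normal(size=(10_000, 768)).astype(np.float32)  # placeholder KB embeddings

pca = PCA(n_components=128)
pca.fit(index[:1_000])                 # fitting needs only a small sample of the data
compressed = pca.transform(index).astype(np.float16)  # dimension + precision reduction

print(f"compression factor: {index.nbytes / compressed.nbytes:.1f}x")  # 12.0x
```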
This work proposes an LSTM-based autoregressive language model that fuses embeddings from a pretrained masked language model (e.g. by concatenation) into its context representation, obtaining a richer context for language modelling and improving perplexity.
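A minimal PyTorch sketch of the concatenation fusion follows; the FusionLM name, the dimensions, and the way the masked-LM embeddings are precomputed are assumptions for illustration, and the sketch leaves out how leakage of the predicted token through the contextual embeddings is avoided.

```python
import torch
import torch.nn as nn

class FusionLM(nn.Module):
    """LSTM LM whose input is its own embedding concatenated with a frozen MLM embedding."""

    def __init__(self, vocab_size: int, lstm_dim: int = 512, mlm_dim: int = 768):
        super().__init__()
        self.own_emb = nn.Embedding(vocab_size, lstm_dim)
        self.lstm = nn.LSTM(lstm_dim + mlm_dim, lstm_dim, batch_first=True)
        self.out = nn.Linear(lstm_dim, vocab_size)

    def forward(self, tokens: torch.Tensor, mlm_emb: torch.Tensor) -> torch.Tensor:
        # mlm_emb: precomputed contextual embeddings from the masked LM,
        # shape (batch, seq_len, mlm_dim); fusion here is plain concatenation.
        fused = torch.cat([self.own_emb(tokens), mlm_emb], dim=-1)
        hidden, _ = self.lstm(fused)
        return self.out(hidden)  # next-token logits

model = FusionLM(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (2, 16))      # dummy token ids
mlm_emb = torch.randn(2, 16, 768)               # stand-in for MLM embeddings
print(model(tokens, mlm_emb).shape)             # torch.Size([2, 16, 10000])
```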
This paper explores the landscape of sampling methods with English-to-Czech and English-to-German MT models using standard MT evaluation metrics, and shows that careful oversampling and combination with the original data lead to better performance than training only on the original data, only on the synthesized data, or on their direct combination.
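To illustrate the oversampling step, the sketch below repeats the authentic corpus a few times before mixing in the synthetic pairs and shuffling; the mix_corpora helper and the 3:1 ratio are hypothetical choices, not the paper's tuned values.

```python
import random

def mix_corpora(authentic, synthetic, authentic_ratio=3):
    """Repeat the authentic pairs `authentic_ratio` times, add synthetic pairs, shuffle."""
    mixed = authentic * authentic_ratio + synthetic
    random.shuffle(mixed)
    return mixed

authentic = [("src-a1", "tgt-a1"), ("src-a2", "tgt-a2")]
synthetic = [(f"src-s{i}", f"tgt-s{i}") for i in range(6)]
print(len(mix_corpora(authentic, synthetic)))  # 2*3 + 6 = 12 sentence pairs
```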