ParaCrawl: Web-Scale Acquisition of Parallel Corpora
TLDR
This paper reports on methods used to create the largest publicly available parallel corpora by crawling the web with open-source software, and evaluates the corpora's quality and their usefulness for building machine translation systems.
Parallel Sentence Mining by Constrained Decoding
TLDR
This work argues that a neural machine translation system can by itself serve as a sentence similarity scorer, efficiently approximating pairwise comparison with a modified beam search.
Character Mapping and Ad-hoc Adaptation: Edinburgh’s IWSLT 2020 Open Domain Translation System
TLDR
This paper describes the University of Edinburgh’s neural machine translation systems submitted to the IWSLT 2020 open domain Japanese↔Chinese translation task, exploring character mapping and unsupervised decoding-time adaptation.
Approaching Neural Chinese Word Segmentation as a Low-Resource Machine Translation Task.
TLDR
This work applies best practices from low-resource neural machine translation to Chinese word segmentation, building encoder-decoder models with attention and examining a series of techniques including regularization, data augmentation, objective weighting, transfer learning, and ensembling.
The University of Edinburgh’s English-German and English-Hausa Submissions to the WMT21 News Translation Task
This paper presents the University of Edinburgh’s constrained submissions of English-German and English-Hausa systems to the WMT 2021 shared task on news translation. We build En-De systems in three…
The Highs and Lows of Simple Lexical Domain Adaptation Approaches for Neural Machine Translation
TLDR
Machine translation systems are vulnerable to domain mismatch; two approaches to alleviating this are adopted: lexical shortlisting restricted by IBM statistical alignments, and hypothesis reranking based on similarity.
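As a rough illustration of the lexical shortlisting idea (a minimal sketch, not the authors' implementation), the following Python fragment restricts the decoder's candidate vocabulary to target words that IBM-model alignments link to the source sentence. The `alignment_table` structure and `always_allowed` set are assumptions for the example:

```python
def build_shortlist(source_tokens, alignment_table, always_allowed):
    """Build the set of target words the decoder is allowed to predict.

    alignment_table: dict mapping each source word to the set of target
    words it was aligned to in training data (hypothetical structure).
    always_allowed: high-frequency target words kept in every shortlist.
    """
    shortlist = set(always_allowed)
    for tok in source_tokens:
        # Add every target word aligned to this source word, if any.
        shortlist |= alignment_table.get(tok, set())
    return shortlist

# Toy alignment table for a German->English system.
table = {"Haus": {"house", "home"}, "grün": {"green"}}
allowed = build_shortlist(["das", "Haus"], table, {"the", "a", "."})
```

The decoder would then compute its softmax only over `allowed`, which both speeds up decoding and discourages out-of-domain word choices.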
Efficient Machine Translation with Model Pruning and Quantization
We participated in all tracks of the WMT 2021 efficient machine translation task: single-core CPU, multi-core CPU, and GPU hardware with throughput and latency conditions. Our submissions combine…
Decoding Time Lexical Domain Adaptation for Neural Machine Translation
TLDR
Two simple methods for improving translation quality on out-of-domain input are presented: lexical shortlisting, which restricts the neural network's predictions using IBM-model-computed alignments, and n-best list reranking, which reorders candidate translations by how much they overlap with one another.
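The overlap-based reranking described above can be sketched as a small consensus heuristic (an illustrative assumption, not the paper's exact scoring): each hypothesis in the n-best list is scored by its unigram overlap with all the other hypotheses, and the list is reordered by that score.

```python
from collections import Counter

def overlap(a, b):
    """Unigram overlap between two tokenized hypotheses."""
    ca, cb = Counter(a), Counter(b)
    return sum((ca & cb).values())

def rerank(nbest):
    """Reorder an n-best list so that hypotheses sharing the most
    tokens with the rest of the list come first (a minimum-Bayes-risk
    style consensus heuristic)."""
    scored = [(sum(overlap(h, other) for other in nbest if other is not h), h)
              for h in nbest]
    # Stable sort, highest consensus score first.
    return [h for _, h in sorted(scored, key=lambda x: -x[0])]

nbest = [["the", "cat", "sat"], ["a", "cat", "sat"], ["dogs", "run"]]
best = rerank(nbest)[0]
```

The intuition is that an outlier translation (here `["dogs", "run"]`) shares little with the rest of the list and is pushed to the bottom, while hypotheses close to the consensus rise to the top.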
Sentence and Word Weighting for Neural Machine Translation Domain Adaptation
Neural machine translation has achieved state-of-the-art performance for machine translation, yet it faces the problem of domain mismatch due to scarce data. Recently researchers have…
The University of Edinburgh’s Bengali-Hindi Submissions to the WMT21 News Translation Task
We describe the University of Edinburgh’s Bengali↔Hindi constrained systems submitted to the WMT21 News Translation task. We submitted ensembles of Transformer models built with large-scale…