Marian: Cost-effective High-Quality Neural Machine Translation in C++

@inproceedings{JunczysDowmunt2018MarianCH,
  title={Marian: Cost-effective High-Quality Neural Machine Translation in C++},
  author={Marcin Junczys-Dowmunt and Kenneth Heafield and Hieu T. Hoang and Roman Grundkiewicz and Anthony Aue},
  booktitle={NMT@ACL},
  year={2018}
}

This paper describes the submissions of the “Marian” team to the WNMT 2018 shared task. We investigate combinations of teacher-student training, low-precision matrix products, auto-tuning and other methods to optimize the Transformer model on GPU and CPU. By further integrating these methods with the new averaging attention networks, a recently introduced faster Transformer variant, we create a number of high-quality, high-performance models on the GPU and CPU, dominating the Pareto frontier… 
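One ingredient mentioned in the abstract, low-precision matrix products, can be illustrated with a minimal sketch of symmetric 8-bit quantization followed by an integer matrix product and a rescale back to float. This is an assumption-laden NumPy illustration of the general idea, not the optimized integer GEMM kernels Marian actually uses; the function names and the per-tensor scaling scheme below are mine.

    # Minimal sketch of an 8-bit (int8) matrix product with symmetric
    # per-tensor quantization, illustrating the "low-precision matrix
    # products" idea; the real CPU path relies on optimized integer GEMM
    # kernels, not this NumPy code.
    import numpy as np

    def quantize_int8(x):
        """Symmetric per-tensor quantization to int8; returns values and scale."""
        scale = np.max(np.abs(x)) / 127.0 if np.max(np.abs(x)) > 0 else 1.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def int8_matmul(a, b):
        """Approximate the float matmul a @ b via int8 products accumulated in int32."""
        qa, sa = quantize_int8(a)
        qb, sb = quantize_int8(b)
        acc = qa.astype(np.int32) @ qb.astype(np.int32)  # int32 accumulation
        return acc.astype(np.float32) * (sa * sb)        # rescale back to float

    if __name__ == "__main__":
        a = np.random.randn(4, 8).astype(np.float32)
        b = np.random.randn(8, 3).astype(np.float32)
        print(np.max(np.abs(int8_matmul(a, b) - a @ b)))  # quantization error is small relative to the values

The appeal on CPU hardware is that the inner products run over 8-bit integers with 32-bit accumulation, which maps well to SIMD integer instructions.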

Citations

From Research to Production and Back: Ludicrously Fast Neural Machine Translation
TLDR
Teacher-student training is improved via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models, pushing the Pareto frontier established during the 2018 edition towards 24x (CPU) and 14x (GPU) faster models at comparable or higher BLEU values.
Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task
TLDR
In the shared task, most of the submissions were Pareto optimal with respect to the trade-off between time and quality.
Speed-optimized, Compact Student Models that Distill Knowledge from a Larger Teacher Model: the UEDIN-CUNI Submission to the WMT 2020 News Translation Task
TLDR
This work describes the joint submission of the University of Edinburgh and Charles University, Prague, to the Czech/English track in the WMT 2020 Shared Task on News Translation and achieves translation speeds of over 700 whitespace-delimited source words per second on a single CPU thread, thus making neural translation feasible on consumer hardware without a GPU.
Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation
TLDR
The lottery ticket hypothesis is applied to prune heads in the early stages of training and shows that it is possible to remove up to three-quarters of attention heads from transformer-big during early training with an average -0.1 change in BLEU for Turkish→English.
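As a rough, hypothetical illustration of what pruning attention heads amounts to mechanically (not the paper's lottery-ticket procedure, and with a placeholder importance score of my choosing), one can rank heads and zero out the lowest-ranked ones with a binary mask:

    # Hypothetical sketch: prune the lowest-scoring attention heads by masking
    # their outputs. The per-head "importance" here is just the mean absolute
    # output, a stand-in for the criteria studied in the paper.
    import numpy as np

    def prune_heads(head_outputs, keep_fraction=0.25):
        """head_outputs: array of shape (num_heads, seq_len, d_head)."""
        num_heads = head_outputs.shape[0]
        scores = np.abs(head_outputs).mean(axis=(1, 2))   # one score per head
        keep = max(1, int(round(num_heads * keep_fraction)))
        kept = np.argsort(scores)[-keep:]                 # indices of heads to keep
        mask = np.zeros(num_heads)
        mask[kept] = 1.0
        return head_outputs * mask[:, None, None], mask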
The Highs and Lows of Simple Lexical Domain Adaptation Approaches for Neural Machine Translation
TLDR
Two approaches are adopted to alleviate machine translation systems' vulnerability to domain mismatch: lexical shortlisting restricted by IBM statistical alignments, and hypothesis reranking based on similarity.
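The lexical shortlisting idea summarized above can be sketched schematically: collect, from alignment links, the most frequent target words for each source word, and restrict the decoder's output softmax to the union of those candidates plus a few always-allowed tokens. Everything below (function names, toy alignment counts, the special tokens) is an illustrative assumption rather than the paper's implementation.

    # Schematic lexical shortlisting: for each source token, keep the top-k
    # aligned target tokens (from e.g. IBM-model alignment counts) and restrict
    # the decoder's output vocabulary to their union. Counts below are toy data.
    from collections import Counter, defaultdict

    def build_shortlist_table(aligned_pairs, k=3):
        """aligned_pairs: iterable of (source_word, target_word) alignment links."""
        counts = defaultdict(Counter)
        for s, t in aligned_pairs:
            counts[s][t] += 1
        return {s: [t for t, _ in c.most_common(k)] for s, c in counts.items()}

    def shortlist_for_sentence(source_tokens, table, always=("<eos>", "<unk>")):
        allowed = set(always)
        for s in source_tokens:
            allowed.update(table.get(s, []))
        return allowed  # the decoder softmax is computed only over this set

    table = build_shortlist_table([("Haus", "house"), ("Haus", "home"), ("Hund", "dog")])
    print(shortlist_for_sentence(["Haus", "Hund"], table))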
Q8BERT: Quantized 8Bit BERT
TLDR
This work shows how to perform quantization-aware training during the fine-tuning phase of BERT in order to compress BERT by 4x with minimal accuracy loss; the produced quantized model can also accelerate inference when run on hardware with 8-bit integer support.
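Quantization-aware training of the kind Q8BERT describes usually hinges on a "fake quantization" step in the forward pass: values are rounded to an 8-bit grid and mapped back to float, so the network trains against its own quantization error, while gradients are passed straight through. The snippet below is a generic NumPy rendering of that forward-pass operation under those assumptions, not Q8BERT's code.

    # Generic fake-quantization forward step used in quantization-aware training:
    # values are rounded to an 8-bit grid and mapped back to float, so downstream
    # computation sees the quantization error during training. (Gradients are
    # usually passed through this op unchanged, the "straight-through" estimator.)
    import numpy as np

    def fake_quant_int8(x):
        scale = max(np.max(np.abs(x)), 1e-8) / 127.0
        q = np.clip(np.round(x / scale), -127, 127)
        return q * scale  # dequantized values lying on the int8 grid

    w = np.random.randn(3, 3).astype(np.float32)
    print(np.max(np.abs(w - fake_quant_int8(w))))  # error bounded by half a quantization step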
Sockeye 3: Fast Neural Machine Translation with PyTorch
Sockeye 3 is the latest version of the Sockeye toolkit for Neural Machine Translation (NMT). Now based on PyTorch, Sockeye 3 provides faster model implementations and more advanced features with a…
Cheat Codes to Quantify Missing Source Information in Neural Machine Translation
TLDR
A method is described to quantify the amount of information added by the target sentence t that is not present in the source s in a neural machine translation system, where this added information is provided to the system in a highly compressed form (a “cheat code”).
Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
TLDR
Results on WMT and IWSLT translation tasks with five translation directions show that deep Transformers with DS-Init and MAtt can substantially outperform their base counterpart in terms of BLEU, while matching the decoding speed of the baseline model thanks to the efficiency improvements of MAtt.
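Depth-scaled initialization, as the name suggests, shrinks the initial weights of deeper layers; a common formulation scales layer l roughly by 1/sqrt(l). The sketch below is only an illustration of that idea, with a Glorot-style base initializer and constants that are my assumptions, not the paper's exact recipe.

    # Illustrative depth-scaled initialization: draw weights with a
    # Glorot/Xavier-style initializer, then shrink layer l's weights by a
    # factor on the order of 1/sqrt(l), so deeper layers start with smaller
    # outputs and more stable gradients. Exact constants here are assumptions.
    import numpy as np

    def ds_init(fan_in, fan_out, layer_index, alpha=1.0):
        limit = np.sqrt(6.0 / (fan_in + fan_out))          # Glorot/Xavier uniform limit
        w = np.random.uniform(-limit, limit, size=(fan_in, fan_out))
        return w * (alpha / np.sqrt(layer_index))          # depth-dependent shrinking

    for l in (1, 6, 12):
        print(l, np.std(ds_init(512, 512, l)))             # std shrinks with depth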
Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade
TLDR
This work first inspects the fundamental issues of fully NAT models and adopts dependency reduction in the learning space of output tokens as the primary guidance; it then revisits methods in four different aspects that have proven effective for improving NAT models and carefully combines these techniques with necessary modifications.
...
...

References

Showing 1-10 of 13 references
Marian: Fast Neural Machine Translation in C++
TLDR
Marian is an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs that can achieve high training and translation speed.
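The phrase "automatic differentiation engine based on dynamic computation graphs" can be unpacked with a toy example: every operation appends a node (value, parents, local backward rule) to a tape built during the forward pass, and gradients are propagated in reverse creation order. Marian's engine is a C++ GPU/CPU implementation; the Python below is purely a conceptual sketch, and all names are mine.

    # Conceptual sketch of reverse-mode autodiff on a dynamically built graph:
    # each operation records its inputs and a local backward rule, and gradients
    # are propagated in reverse topological (creation) order.
    class Node:
        def __init__(self, value, parents=(), backward=lambda g: ()):
            self.value, self.parents, self.backward, self.grad = value, parents, backward, 0.0

    TAPE = []  # the graph is (re)built dynamically on every forward pass

    def const(v):
        n = Node(v); TAPE.append(n); return n

    def add(a, b):
        n = Node(a.value + b.value, (a, b), lambda g: (g, g)); TAPE.append(n); return n

    def mul(a, b):
        n = Node(a.value * b.value, (a, b), lambda g: (g * b.value, g * a.value)); TAPE.append(n); return n

    def backprop(output):
        output.grad = 1.0
        for n in reversed(TAPE):
            for parent, g in zip(n.parents, n.backward(n.grad)):
                parent.grad += g

    x, y = const(2.0), const(3.0)
    z = add(mul(x, y), x)       # z = x*y + x
    backprop(z)
    print(x.grad, y.grad)       # 4.0 2.0  (dz/dx = y + 1, dz/dy = x)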
Sockeye: A Toolkit for Neural Machine Translation
TLDR
This paper highlights Sockeye's features and benchmarks it against other NMT toolkits on two language arcs from the 2017 Conference on Machine Translation (WMT), English-German and Latvian-English, reporting competitive BLEU scores across all three architectures.
Accelerating Neural Transformer via an Average Attention Network
TLDR
The proposed average attention network is applied to the decoder part of the neural Transformer to replace the original target-side self-attention model and enables the neural Transformer to decode sentences over four times faster than its original version with almost no loss in training time and translation performance.
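The core of the average attention network is a cumulative average over previous decoder positions, y_j = (1/j) * sum_{k<=j} x_k, which avoids per-position dot products at decode time. The sketch below shows only this averaging core and omits the gating and feed-forward layers described in the paper.

    # Core of average attention: each decoder position attends to the running
    # average of all positions up to and including itself. The gating layer and
    # feed-forward network from the paper are omitted in this sketch.
    import numpy as np

    def average_attention(x):
        """x: (seq_len, d_model) -> cumulative averages of the same shape."""
        sums = np.cumsum(x, axis=0)                       # prefix sums over positions
        counts = np.arange(1, x.shape[0] + 1)[:, None]    # 1, 2, ..., seq_len
        return sums / counts

    x = np.random.randn(5, 4)
    print(average_attention(x)[2])        # equals x[:3].mean(axis=0)

During incremental decoding only a running sum and a position counter need to be carried over, which is where the speed-up over ordinary self-attention comes from.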
Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions
TLDR
It is demonstrated that current neural machine translation could already be used for in-production systems when comparing words-per-second ratios; aspects of translation speed are investigated, and AmuNMT, the authors' efficient neural machine translation decoder, is introduced.
The University of Edinburgh’s Neural MT Systems for WMT17
TLDR
The University of Edinburgh’s submissions to the WMT17 shared news translation and biomedical translation tasks are described; novelties this year include the use of deep architectures, layer normalization, and more compact models due to weight tying and improvements in BPE segmentations.
Findings of the Second Workshop on Neural Machine Translation and Generation
TLDR
The results of the workshop’s shared task on efficient neural machine translation are described, where participants were tasked with creating MT systems that are both accurate and efficient.
Nematus: a Toolkit for Neural Machine Translation
TLDR
Nematus is a toolkit for Neural Machine Translation that prioritizes high translation accuracy, usability, and extensibility and was used to build top-performing submissions to shared translation tasks at WMT and IWSLT.
A Simple, Fast, and Effective Reparameterization of IBM Model 2
We present a simple log-linear reparameterization of IBM Model 2 that overcomes problems arising from Model 1’s strong assumptions and Model 2’s overparameterization. Efficient inference, likelihood…
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by its successful application to English constituency parsing with both large and limited training data.
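The Transformer's central building block, scaled dot-product attention, computes softmax(QK^T / sqrt(d_k)) V; a plain NumPy rendering, for illustration only (names and shapes are my choices), looks like this:

    # Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain NumPy.
    import numpy as np

    def scaled_dot_product_attention(q, k, v):
        """q: (n, d_k), k: (m, d_k), v: (m, d_v) -> (n, d_v)."""
        scores = q @ k.T / np.sqrt(q.shape[-1])           # (n, m) similarity scores
        scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
        return weights @ v

    q = np.random.randn(2, 8); k = np.random.randn(5, 8); v = np.random.randn(5, 16)
    print(scaled_dot_product_attention(q, k, v).shape)    # (2, 16)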
Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU
TLDR
This work proposes a simple but powerful network architecture which uses an RNN (GRU/LSTM) layer at the bottom, followed by a series of stacked fully-connected layers applied at every timestep, and achieves similar accuracy to a deep recurrent model at a small fraction of the training and decoding cost.
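The architecture summarized above, a recurrent layer at the bottom followed by stacked fully-connected layers applied at every timestep, can be sketched as follows; a plain tanh (Elman) recurrence stands in for the GRU/LSTM cell, and all sizes and parameters are toy values of my choosing.

    # Schematic of the "recurrent layer at the bottom + stacked per-timestep
    # fully-connected layers" architecture. A simple tanh (Elman) recurrence
    # stands in for GRU/LSTM; sizes and parameters are toy values.
    import numpy as np

    def run_model(inputs, w_ih, w_hh, fc_layers):
        """inputs: (seq_len, d_in); returns per-timestep outputs."""
        h = np.zeros(w_hh.shape[0])
        outputs = []
        for x in inputs:
            h = np.tanh(x @ w_ih + h @ w_hh)              # bottom recurrent layer
            y = h
            for w in fc_layers:                           # stacked FC layers per timestep
                y = np.maximum(0.0, y @ w)                # ReLU fully-connected layer
            outputs.append(y)
        return np.stack(outputs)

    d_in, d_h = 8, 16
    inputs = np.random.randn(5, d_in)
    fc = [np.random.randn(d_h, d_h) * 0.1 for _ in range(3)]
    print(run_model(inputs, np.random.randn(d_in, d_h) * 0.1,
                    np.random.randn(d_h, d_h) * 0.1, fc).shape)   # (5, 16)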
...
...