Scaling Neural Machine Translation

Myle Ott, Sergey Edunov, David Grangier, Michael Auli
Sequence-to-sequence learning models still require several days to reach state-of-the-art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large-batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT’14 English-German translation, we match the accuracy of Vaswani et al. (2017) in under 5 hours when training on 8 GPUs and we obtain a new state of the art of 29.3 BLEU…


Publications referenced by this paper.


http: // • 2018

Attention Is All You Need

NIPS • 2017

Rethinking the Inception Architecture for Computer Vision

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) • 2016

Adam: A Method for Stochastic Optimization

