Corpus ID: 24218611

Sockeye: A Toolkit for Neural Machine Translation

Authors: F. Hieber, Tobias Domhan, Michael J. Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post

We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). Sockeye is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks… 

The Sockeye Neural Machine Translation Toolkit at AMTA 2018

Sockeye is a production-ready framework for training and applying models, as well as an experimental platform for researchers. It offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks.

Incorporating Source Syntax into Transformer-Based Neural Machine Translation

Two methods are introduced: a multi-task machine translation and parsing model with a single encoder and decoder, and a mixed encoder model that learns to translate directly from parsed and unparsed source sentences.

The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020

New features include a simplified code base through the use of MXNet's Gluon API, a focus on state of the art model architectures, distributed mixed precision training, and efficient CPU decoding with 8-bit quantization.
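
To make the idea of 8-bit quantization concrete, here is a minimal sketch of symmetric int8 quantization of a weight vector. It is an illustration under stated assumptions, not Sockeye 2's implementation: the function names `quantize_int8` and `dequantize` are hypothetical, and the single per-vector scale is a simplification chosen for brevity.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (assumption)."""
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero when every value is 0.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


weights = [-1.0, 0.0, 0.25, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from its original by at most about half of `scale`, which is the precision given up in exchange for smaller weights and integer arithmetic.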

Sockeye 2: A Toolkit for Neural Machine Translation

Sockeye 2 is presented, a modernized and streamlined version of the Sockeye neural machine translation (NMT) toolkit that results in faster training and inference, higher automatic metric scores, and a shorter path from research to production.

OpenNMT: Neural Machine Translation Toolkit

The system prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements.

RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition

A layer-wise pretraining scheme for recurrent attention models is shown to give an absolute improvement of over 1% BLEU and to allow training deeper recurrent encoder networks.

Neural machine translation of low-resource languages using SMT phrase pair injection

This paper proposes an effective approach to improving an NMT system in a low-resource scenario without using any additional data, based on gated recurrent unit (GRU) and transformer networks, and finds that the proposed method outperforms SMT (which is known to work better than neural models in low-resource scenarios) for some translation directions.

A comparative study of Neural Machine Translation frameworks for the automatic translation of open data resources

This work introduces conventional theoretical models behind NMT together with the required background to provide a comprehensive view and develops state-of-the-art NMT systems built on top of two well-known frameworks for machine learning, Tensorflow and MXNet.

How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures

This work takes a fine-grained look at the different architectures for NMT and introduces an Architecture Definition Language (ADL) allowing for a flexible combination of common building blocks and shows that self-attention is much more important on the encoder side than on the decoder side.

The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction

It is shown how traditional symbolic statistical machine translation models can still improve neural machine translation while reducing the risk of common pathologies of NMT such as hallucinations and neologisms.



Improving Neural Machine Translation Models with Monolingual Data

This work pairs monolingual training data with an automatic back-translation, treats the result as additional parallel training data, and obtains substantial improvements on the WMT 15 task English<->German and on the low-resourced IWSLT 14 task Turkish->English.

Neural Monkey: An Open-source Tool for Sequence Learning

The design of the Neural Monkey system is described and the reader is introduced to running experiments using Neural Monkey, an open-source neural machine translation (NMT) and general sequence-to-sequence learning system built over the TensorFlow machine learning library.

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
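
The source-reversal trick described above can be sketched as a small preprocessing step. This is an illustrative sketch only: the function name `reverse_source` is hypothetical, and whitespace tokenization is an assumption made for brevity.

```python
from typing import List, Tuple


def reverse_source(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Reverse the token order of each source sentence; targets stay unchanged."""
    reversed_pairs = []
    for source, target in pairs:
        tokens = source.split()  # whitespace tokenization (assumption)
        reversed_pairs.append((" ".join(reversed(tokens)), target))
    return reversed_pairs


corpus = [("a b c", "x y z")]
print(reverse_source(corpus))  # [('c b a', 'x y z')]
```

After reversal, the first source word sits next to the start of the target sentence, which is the kind of short-term dependency the summary refers to.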

Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

This work proposes a simple but powerful network architecture that uses an RNN (GRU/LSTM) layer at the bottom, followed by a series of stacked fully-connected layers applied at every timestep, and achieves accuracy similar to a deep recurrent model at a small fraction of the training and decoding cost.

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
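
The soft search over source positions can be illustrated with a minimal sketch. Note the simplification: a plain dot product is used here as the relevance score, swapped in for the learned alignment model of the paper. The function name `soft_attention` and the list-based vectors are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple


def soft_attention(
    query: List[float], encoder_states: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Return attention weights over source positions and the context vector."""
    # Dot-product relevance score per source position (simplifying assumption).
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    # Numerically stable softmax turns scores into a distribution.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


weights, context = soft_attention([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]])
```

Because the weights form a distribution over all source positions, the context vector changes for every target word, so the model no longer depends on a single fixed-length summary of the source sentence.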

Effective Approaches to Attention-based Neural Machine Translation

A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.

Language Modeling with Gated Convolutional Networks

A finite-context approach based on stacked convolutions is developed, which can be more efficient because it allows parallelization over sequential tokens; this is the first time a non-recurrent approach is competitive with strong recurrent models on these large-scale language tasks.

Stronger Baselines for Trustable Results in Neural Machine Translation

This work recommends three specific methods that are relatively easy to implement and result in much stronger experimental systems, and conducts an in-depth analysis of where improvements originate and what inherent weaknesses of basic NMT models are being addressed.

Convolutional Sequence to Sequence Learning

This work introduces an architecture based entirely on convolutional neural networks, which outperforms the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation while running an order of magnitude faster, on both GPU and CPU.