Sockeye: A Toolkit for Neural Machine Translation
TLDR
We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT).
Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields
TLDR
We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs).
The Sockeye Neural Machine Translation Toolkit at AMTA 2018
TLDR
We describe SOCKEYE, an open-source sequence-to-sequence toolkit for Neural Machine Translation.
Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction
TLDR
This paper extends the training regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build an English-to-Finnish translation system).
Unsupervised Morphological Segmentation for Statistical Machine Translation
TLDR
We investigate various methods of augmenting SMT models to use morphological information to improve the quality of translation into morphologically rich languages, comparing them on an English-Finnish translation task.
100,000 Podcasts: A Spoken English Document Corpus
TLDR
We introduce the Spotify Podcast Dataset, a new corpus of 100,000 podcasts, comprising nearly 60,000 hours of speech.
Kriya - The SFU System for Translation Task at WMT-12
TLDR
This paper describes our submissions for the WMT-12 translation task using Kriya, our hierarchical phrase-based system.
Making the most of a distributed perceptron for NLP
The perceptron algorithm (Rosenblatt, 1958), in particular the global linear model of Collins (2002), has been employed to handle NLP tasks such as part-of-speech tagging, parsing, and segmentation.
The Spotify Podcasts Dataset
TLDR
We present the Spotify Podcasts Dataset, a set of approximately 100K podcast episodes consisting of raw audio files along with accompanying ASR transcripts.