Publications
Pointer Sentinel Mixture Models
tl;dr
We introduce a mixture model, illustrated in Fig. 1, that combines the advantages of standard softmax classifiers with those of a pointer component for effective and efficient language modeling.
  • Citations: 565
  • Influence: 121
  • Open Access
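The mixture above can be sketched numerically: a sentinel score is appended to the pointer scores, and the sentinel's probability mass gates the softmax component. All scores, shapes, and the context-to-vocabulary mapping below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical toy scores (illustrative only): a 4-word vocabulary softmax
# head and attention scores over a 3-word context window.
vocab_logits = np.array([1.0, 0.5, -0.2, 0.3])
ptr_logits = np.array([0.8, 0.1, 2.0])
sentinel_logit = 0.5                       # learned sentinel score

# The sentinel joins the pointer softmax; its probability mass g decides
# how much of the final distribution the vocabulary softmax receives.
ptr_with_sentinel = softmax(np.append(ptr_logits, sentinel_logit))
g = ptr_with_sentinel[-1]                  # gate = sentinel probability
p_ptr = ptr_with_sentinel[:-1]             # pointer mass over context positions

# Assumed mapping from context positions back to vocabulary ids.
context_ids = [2, 0, 2]

p = g * softmax(vocab_logits)              # softmax component, scaled by g
for pos, wid in enumerate(context_ids):
    p[wid] += p_ptr[pos]                   # pointer mass copied onto vocab ids
```

Because the sentinel is normalized jointly with the pointer scores, the two components' masses sum to one by construction.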
Learned in Translation: Contextualized Word Vectors
tl;dr
In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors.
  • Citations: 490
  • Influence: 52
  • Open Access
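The interface of this approach can be sketched as follows: a frozen MT-trained encoder is applied on top of pretrained word vectors, and downstream models consume the concatenation of the two. The encoder here is a crude single-matrix stand-in for the real LSTM, and all names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for pretrained components (hypothetical shapes): a 50-d word
# vector table and an "MT-trained encoder" reduced to one linear map.
glove = rng.normal(size=(100, 50))         # vocab of 100 words, 50-d vectors
W_enc = rng.normal(size=(50, 60))          # encoder stand-in: 50-d -> 60-d

def contextual_vectors(word_ids):
    """Encoder applied to the pretrained embeddings of a sentence."""
    emb = glove[word_ids]                  # (T, 50)
    return np.tanh(emb @ W_enc)            # (T, 60)

word_ids = [3, 17, 42]
emb = glove[word_ids]
ctx = contextual_vectors(word_ids)

# Downstream tasks consume the concatenation [word vector; context vector].
features = np.concatenate([emb, ctx], axis=-1)   # (T, 110)
```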
Non-Autoregressive Neural Machine Translation
tl;dr
We introduce a latent variable model for non-autoregressive machine translation that enables a decoder based on Vaswani et al. (2017) to take full advantage of internal parallelism even at inference time.
  • Citations: 144
  • Influence: 45
  • Open Access
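The contrast with autoregressive decoding can be sketched in a few lines: given latent fertilities (how many target tokens each source token produces), the decoder input is fixed up front and every target position is decoded in one parallel step rather than a left-to-right loop. The fertilities, vocabulary, and "decoder" below are illustrative stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = 8
src = [2, 5, 1, 7]

# Hypothetical fertility predictions: source token i is copied fertility[i]
# times to seed the decoder input.
fertility = [1, 2, 0, 1]
decoder_input = [s for s, f in zip(src, fertility) for _ in range(f)]

# A stand-in "decoder" that scores every target position independently,
# so all positions can be argmax-decoded at once (no sequential loop).
logits = rng.normal(size=(len(decoder_input), vocab))
target = logits.argmax(axis=-1)            # one parallel step, not T steps
```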
Quasi-Recurrent Neural Networks
tl;dr
We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels.
  • Citations: 213
  • Influence: 42
  • Open Access
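The alternation described above can be sketched with NumPy: a masked width-2 convolution produces candidate vectors and gates for every timestep at once, and the only sequential part is an elementwise fo-pooling recurrence, c_t = f_t ⊙ c_{t−1} + (1 − f_t) ⊙ z_t with h_t = o_t ⊙ c_t. Weights and sizes below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
T, d_in, d = 6, 3, 4
x = rng.normal(size=(T, d_in))

# Masked width-2 "convolution" over time: each output sees x_{t-1} and x_t,
# producing candidate z, forget gate f, and output gate o for all timesteps
# in parallel. Weights are random stand-ins.
Wz, Wf, Wo = (rng.normal(size=(2 * d_in, d)) for _ in range(3))
x_pad = np.vstack([np.zeros((1, d_in)), x])
conv_in = np.concatenate([x_pad[:-1], x_pad[1:]], axis=1)   # (T, 2*d_in)
z = np.tanh(conv_in @ Wz)
f = sigmoid(conv_in @ Wf)
o = sigmoid(conv_in @ Wo)

# fo-pooling: the only sequential step, elementwise per channel:
#   c_t = f_t * c_{t-1} + (1 - f_t) * z_t,   h_t = o_t * c_t
c = np.zeros(d)
h = np.empty((T, d))
for t in range(T):
    c = f[t] * c + (1 - f[t]) * z[t]
    h[t] = o[t] * c
```

Because the recurrence contains no matrix multiply, it is cheap and parallelizes across channels, which is the efficiency argument in the tl;dr.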
Towards Neural Machine Translation with Latent Tree Attention
tl;dr
We introduce a new approach to leveraging unsupervised tree structures in NLP tasks like machine translation.
  • Citations: 11
  • Influence: 1
  • Open Access
MetaMind Neural Machine Translation System for WMT 2016
tl;dr
We integrate promising recent developments in NMT, including subword splitting and back-translation for monolingual data augmentation, and introduce Y-LSTM, a novel neural translation architecture.
  • Citations: 16
  • Open Access
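Subword splitting as mentioned in the tl;dr is commonly implemented with byte pair encoding: repeatedly merge the most frequent adjacent symbol pair in the training corpus. A minimal sketch (the toy corpus and merge count are illustrative, and this is one standard technique rather than necessarily the system's exact pipeline):

```python
from collections import Counter

corpus = ["low", "lower", "lowest", "newer", "wider"]
words = [list(w) + ["</w>"] for w in corpus]     # end-of-word marker

def best_pair(words):
    """Most frequent adjacent symbol pair across the corpus."""
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def apply_merge(w, pair, merged):
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out

merges = []
for _ in range(5):                               # learn 5 merges
    pair = best_pair(words)
    if pair is None:
        break
    merges.append(pair)
    merged = pair[0] + pair[1]
    words = [apply_merge(w, pair, merged) for w in words]
```

The learned merge list is then replayed on new text, so rare words decompose into subword units seen during training.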
On Machine Learning and Programming Languages
tl;dr
We argue that machine learning needs a first-class programming language, and describe what such a language might look like.
  • Citations: 9
  • Open Access
Block-diagonal Hessian-free Optimization for Training Neural Networks
tl;dr
We introduce a variant of the Hessian-free method that leverages a block-diagonal approximation of the generalized Gauss-Newton matrix.
  • Citations: 8
  • Open Access
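The payoff of a block-diagonal curvature approximation can be sketched directly: the damped Newton system (G + λI)d = −g decouples into independent per-block solves. The blocks below are random positive-definite stand-ins for per-layer Gauss-Newton blocks; a true Hessian-free method would solve each block with conjugate gradient using curvature-vector products rather than a direct solve.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4]                       # illustrative per-layer parameter counts
lam = 0.1                            # damping

# Hypothetical per-layer curvature blocks (symmetric positive definite).
blocks, grads = [], []
for n in sizes:
    A = rng.normal(size=(n, n))
    blocks.append(A @ A.T + np.eye(n))
    grads.append(rng.normal(size=n))

# Block-diagonal structure => each block's update is solved independently.
steps = [np.linalg.solve(B + lam * np.eye(len(g)), -g)
         for B, g in zip(blocks, grads)]
```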
A Flexible Approach to Automated RNN Architecture Generation
tl;dr
We propose a domain-specific language (DSL) for use in automated architecture search which can produce novel RNNs of arbitrary depth and width.
  • Citations: 9
  • Open Access
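The core idea of such a DSL can be sketched as an expression tree over a small operator set: a candidate RNN cell is a nested term that a search procedure can generate and an interpreter can evaluate. The operator set and the example cell below are illustrative assumptions, not the paper's DSL.

```python
import numpy as np

def evaluate(node, env):
    """Evaluate a DSL term: strings are input symbols, tuples are operators."""
    if isinstance(node, str):              # leaf: an input symbol (x or h)
        return env[node]
    op, *args = node
    vals = [evaluate(a, env) for a in args]
    if op == "add":
        return vals[0] + vals[1]
    if op == "mul":
        return vals[0] * vals[1]
    if op == "tanh":
        return np.tanh(vals[0])
    if op == "sigmoid":
        return 1.0 / (1.0 + np.exp(-vals[0]))
    raise ValueError(f"unknown operator: {op}")

# One candidate cell the DSL can express: h' = tanh(x + sigmoid(h) * h)
cell = ("tanh", ("add", "x", ("mul", ("sigmoid", "h"), "h")))
env = {"x": np.array([0.5, -1.0]), "h": np.array([0.1, 0.2])}
h_next = evaluate(cell, env)
```

Because terms are plain data, a search procedure can sample, mutate, and rank them without any hand-written architecture code.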
A Reverse-Mode Automatic Differentiation in Haskell Using the Accelerate Library
Automatic Differentiation is a method for applying differentiation strategies to source code: taking a computer program and deriving from it a separate program that calculates the derivatives of the original.
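The reverse-mode flavor of this idea can be sketched compactly (in Python rather than Haskell/Accelerate): the forward pass records each operation and its local derivatives, and a reverse sweep in topological order accumulates adjoints, yielding gradients of the output with respect to every input.

```python
import math

class Var:
    """A value plus the local-derivative links recorded during evaluation."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents          # (parent Var, local derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def sin(v):
    return Var(math.sin(v.value), [(v, math.cos(v.value))])

def backward(out):
    """Reverse sweep: visit nodes in reverse topological order."""
    order, seen = [], set()
    def topo(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            topo(parent)
        order.append(v)
    topo(out)
    out.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

x, y = Var(2.0), Var(3.0)
z = x * y + sin(x)      # z = x*y + sin(x)
backward(z)
# Analytically: dz/dx = y + cos(x), dz/dy = x
```

The topological ordering matters: each node's adjoint must be fully accumulated before it is propagated to its parents, or shared subexpressions would be double-counted.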