Publications
Pointer Sentinel Mixture Models
TLDR
The pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank while using far fewer parameters than a standard softmax LSTM, and the freely available WikiText corpus is introduced.
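The mixture itself is compact enough to sketch. Below is a minimal, illustrative PyTorch rendering of the core idea, not the paper's released code; all shapes and names (`pointer_sentinel_mix`, `sentinel`) are assumptions. A single softmax over context scores plus a learned sentinel yields both the pointer distribution and the gate deciding how much to trust the vocabulary softmax.

```python
import torch
import torch.nn.functional as F

# Minimal sketch, assuming these shapes (not the paper's code).
# query: decoder state (batch, d); history: hidden states of the last L
# context words (batch, L, d); history_ids: their vocab ids (batch, L),
# dtype long; vocab_logits: (batch, V); sentinel: learned vector (d,).
def pointer_sentinel_mix(query, history, history_ids, vocab_logits, sentinel):
    scores = torch.bmm(history, query.unsqueeze(2)).squeeze(2)  # (batch, L)
    sent = (query * sentinel).sum(dim=1, keepdim=True)          # (batch, 1)
    attn = F.softmax(torch.cat([scores, sent], dim=1), dim=1)   # (batch, L+1)
    g = attn[:, -1:]                         # gate = probability mass on the sentinel
    p_vocab = F.softmax(vocab_logits, dim=1)
    # Scatter pointer probabilities onto vocabulary ids; they sum to 1 - g.
    p_ptr = torch.zeros_like(p_vocab).scatter_add_(1, history_ids, attn[:, :-1])
    return g * p_vocab + p_ptr               # mixture distribution over the vocabulary
```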
A Deep Reinforced Model for Abstractive Summarization
TLDR
A neural network model with a novel intra-attention that attends over the input and the continuously generated output separately, and a new training method that combines standard supervised word prediction with reinforcement learning (RL) to produce higher-quality summaries.
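The training objective is the part that lends itself to a quick sketch. The snippet below is a hedged rendering of a mixed maximum-likelihood plus self-critical RL loss of the kind the paper describes; the argument names and `gamma` default are assumptions, and the rewards (e.g. ROUGE against the reference) are computed outside this function.

```python
# Hedged sketch of a mixed ML + self-critical RL loss (names assumed).
# logp_sampled: summed log-probability of a sampled summary;
# reward_sampled / reward_greedy: e.g. ROUGE of the sampled and greedy outputs;
# nll_ml: the usual teacher-forced negative log-likelihood.
def mixed_loss(logp_sampled, reward_sampled, reward_greedy, nll_ml, gamma=0.98):
    # Samples that beat the greedy baseline get their log-probability pushed
    # up; samples that fall short get pushed down.
    loss_rl = (reward_greedy - reward_sampled) * logp_sampled
    return gamma * loss_rl + (1.0 - gamma) * nll_ml
```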
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
TLDR
This work proposes Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries, and releases WikiSQL, a dataset of 80,654 hand-annotated examples of questions and SQL queries distributed across 24,241 tables from Wikipedia that is an order of magnitude larger than comparable datasets.
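The task format is easy to illustrate. This sketch shows the augmented pointer-network input Seq2SQL decodes over; the specific column names and question are made up for illustration.

```python
# Illustrative augmented input (token strings are made up): the decoder emits
# a query by pointing into this concatenated sequence, so every output token
# is a SQL keyword, a column name, or a question word.
sql_vocab = ["SELECT", "WHERE", "COUNT", "MIN", "MAX", "AND", "=", ">", "<"]
columns   = ["Player", "Country", "Points"]
question  = "How many players are from Canada ?".split()
augmented_input = sql_vocab + columns + question  # decoder points into this list
```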
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning
TLDR
This paper proposes a novel adaptive attention model with a visual sentinel that sets a new state of the art by a significant margin on image captioning.
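The sentinel trick mirrors the pointer sentinel above, now gating between image regions and the language model. A minimal PyTorch sketch follows, with all shapes and names assumed rather than taken from the paper's code.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (shapes assumed). h: decoder hidden state (batch, d);
# regions: spatial image features (batch, K, d); sentinel: (batch, d)
# "fallback" memory derived from the LSTM.
def adaptive_attention(h, regions, sentinel):
    keys = torch.cat([regions, sentinel.unsqueeze(1)], dim=1)      # (batch, K+1, d)
    scores = torch.bmm(keys, h.unsqueeze(2)).squeeze(2)            # (batch, K+1)
    alpha = F.softmax(scores, dim=1)
    beta = alpha[:, -1:]                                           # weight on the sentinel
    c = torch.bmm(alpha[:, :-1].unsqueeze(1), regions).squeeze(1)  # visual context
    # High beta: the model "looks away" from the image and relies on language
    # statistics; low beta: the word is grounded in visual regions.
    return beta * sentinel + c   # region weights already sum to 1 - beta
```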
Dynamic Coattention Networks For Question Answering
TLDR
The Dynamic Coattention Network (DCN) for question answering first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both; a dynamic pointing decoder then iterates over potential answer spans to recover from initial local maxima corresponding to incorrect answers.
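The fusion step reduces to a few matrix products. Below is a hedged PyTorch sketch of coattention, with shapes assumed and the paper's sentinel positions omitted for brevity.

```python
import torch
import torch.nn.functional as F

# Sketch of the coattention fusion (shapes assumed, sentinels omitted).
# D: document encoding (batch, m, d); Q: question encoding (batch, n, d).
def coattention(D, Q):
    L = torch.bmm(D, Q.transpose(1, 2))      # affinity matrix (batch, m, n)
    A_q = F.softmax(L, dim=1)                # attention over document, per question word
    A_d = F.softmax(L, dim=2)                # attention over question, per document word
    C_q = torch.bmm(A_q.transpose(1, 2), D)  # document summaries per question word (batch, n, d)
    # Second level: carry question features and their document summaries
    # back to every document position.
    C_d = torch.bmm(A_d, torch.cat([Q, C_q], dim=2))  # (batch, m, 2d)
    return torch.cat([D, C_d], dim=2)        # co-dependent representation (batch, m, 3d)
```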
Non-Autoregressive Neural Machine Translation
TLDR
A model is introduced that avoids the autoregressive property of conventional decoders and produces its outputs in parallel, allowing an order of magnitude lower latency during inference, and achieves near-state-of-the-art performance on WMT 2016 English-Romanian.
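The latency argument is easiest to see as a contrast. This sketch uses assumed stand-in modules (`ar_step`, `na_decoder`) rather than the paper's architecture: the autoregressive loop has T sequential dependencies, while the non-autoregressive decoder needs one parallel pass.

```python
# Contrast sketch; ar_step and na_decoder are assumed stand-in modules.
def decode_autoregressive(ar_step, encoder_out, bos, tgt_len):
    tokens = [bos]
    for _ in range(tgt_len):                   # T sequential steps at inference
        tokens.append(ar_step(encoder_out, tokens))
    return tokens[1:]

def decode_non_autoregressive(na_decoder, encoder_out, tgt_len):
    logits = na_decoder(encoder_out, tgt_len)  # one parallel forward pass
    return logits.argmax(dim=-1)               # (batch, tgt_len), all tokens at once
```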
Dynamic Memory Networks for Visual and Textual Question Answering
TLDR
The new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
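The episodic memory loop is the model's distinctive piece. The sketch below is a soft-attention simplification with assumed stand-ins for the attention and update functions (DMN+ itself uses an attention-based GRU for the episode).

```python
# Simplified episodic memory loop; attend and update are assumed stand-ins.
# facts: encoded input facts (batch, n, d); question: (batch, d).
def episodic_memory(facts, question, attend, update, num_passes=3):
    memory = question                             # initialize memory with the question
    for _ in range(num_passes):
        gates = attend(facts, question, memory)   # soft attention over facts (batch, n)
        episode = (gates.unsqueeze(-1) * facts).sum(dim=1)
        memory = update(memory, episode)          # e.g. a GRU or ReLU memory update
    return memory                                 # fed to the answer module
```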
Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems
TLDR
A Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using a copy mechanism, facilitating transfer when predicting (domain, slot, value) triplets not encountered during training.
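The copy step is what enables zero-shot slot values, and it can be sketched in a few lines. Shapes and names below are assumptions; the mixing follows the familiar soft-gated pointer-generator pattern.

```python
import torch
import torch.nn.functional as F

# Sketch of a soft-gated copy step (shapes assumed). vocab_logits: (batch, V);
# attn_over_history: attention over dialogue-history tokens (batch, T);
# history_ids: their vocab ids (batch, T), dtype long; p_gen: gate (batch, 1).
def gated_copy(vocab_logits, attn_over_history, history_ids, p_gen):
    p_vocab = F.softmax(vocab_logits, dim=1)
    # Copying lets the generator emit (domain, slot, value) strings it never
    # saw at training time, as long as they appear in the dialogue history.
    p_copy = torch.zeros_like(p_vocab).scatter_add_(1, history_ids,
                                                    attn_over_history)
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy
```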
CTRL: A Conditional Transformer Language Model for Controllable Generation
TLDR
CTRL, a 1.63-billion-parameter conditional transformer language model, is released; it is trained to condition on control codes that govern style, content, and task-specific behavior, providing more explicit control over text generation.
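Conditioning amounts to prepending a code to the prompt. The sketch below uses an assumed stand-in `model` object (its `tokenize`, `sample_next`, and `detokenize` methods are hypothetical, not CTRL's actual API) to show the shape of the idea.

```python
# Conceptual sketch; the model object and its methods are assumed stand-ins.
# A control code prepended to the prompt steers the continuation's style.
def generate(model, control_code, prompt, max_new_tokens=50):
    tokens = model.tokenize(f"{control_code} {prompt}")
    for _ in range(max_new_tokens):
        tokens.append(model.sample_next(tokens))  # standard LM sampling
    return model.detokenize(tokens)

# The same prompt yields, e.g., review-styled vs. encyclopedia-styled text:
# generate(model, "Reviews", "The camera")
# generate(model, "Wikipedia", "The camera")
```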
Learned in Translation: Contextualized Word Vectors
TLDR
Adding context vectors from a deep LSTM encoder, taken from an attentional sequence-to-sequence model trained for machine translation, improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks.
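Downstream use is simple concatenation. The sketch below treats the pretrained MT-LSTM as an assumed stand-in module; the dimensions shown are illustrative of a 300-dim GloVe input.

```python
import torch

# Sketch of CoVe feature construction; mt_lstm is an assumed stand-in for the
# pretrained MT encoder. glove_embeds: (batch, seq_len, 300) word vectors.
def cove_features(glove_embeds, mt_lstm):
    cove = mt_lstm(glove_embeds)                   # context vectors, e.g. (batch, seq_len, 600)
    # The downstream task model consumes fixed word vectors and their
    # translation-contextualized counterparts side by side.
    return torch.cat([glove_embeds, cove], dim=2)
```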