Publications
Get To The Point: Summarization with Pointer-Generator Networks
TLDR
A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways: a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information while retaining the ability to produce novel words through the generator, and a coverage mechanism that keeps track of what has already been summarized, discouraging repetition.
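A minimal PyTorch sketch of the copy mechanism described above: the final output distribution mixes the generator's vocabulary softmax with attention-weighted copy probabilities over source tokens. Tensor names and shapes are illustrative assumptions, not the authors' code.

```python
import torch

def final_distribution(p_gen, vocab_dist, attention, src_ids):
    """Mix the generator's vocabulary distribution with the copy
    distribution induced by attention over the source tokens.

    p_gen:      (batch,) generation probability in [0, 1]
    vocab_dist: (batch, vocab_size) softmax over the output vocabulary
    attention:  (batch, src_len) attention weights over source positions
    src_ids:    (batch, src_len) LongTensor of source-token vocab ids
    """
    p_gen = p_gen.unsqueeze(1)                     # (batch, 1)
    gen_part = p_gen * vocab_dist                  # generate from vocab
    copy_part = torch.zeros_like(vocab_dist)
    # Scatter-add attention mass onto the ids of the source words, so
    # source tokens that occur multiple times accumulate probability.
    copy_part.scatter_add_(1, src_ids, (1.0 - p_gen) * attention)
    return gen_part + copy_part                    # P(w) for every w
```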
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
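As a concrete illustration of the text-to-text framing (paraphrasing examples from the paper's opening figure), every task is reduced to mapping an input string with a task prefix to an output string:

```python
# Translation, classification, and summarization all become
# (input_text, target_text) pairs; a short prefix names the task.
task_examples = [
    ("translate English to German: That is good.",
     "Das ist gut."),
    ("cola sentence: The course is jumping well.",
     "not acceptable"),
    ("summarize: state authorities dispatched emergency crews tuesday ...",
     "six people hospitalized after a storm in attala county."),
]
# A single encoder-decoder Transformer is trained on all such pairs
# with the same maximum-likelihood objective, regardless of task.
```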
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
TLDR
This work proposes PEGASUS: pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective, gap-sentence generation, and demonstrates state-of-the-art performance on all 12 downstream summarization datasets as measured by ROUGE scores.
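A simplified sketch of the gap-sentence generation (GSG) objective the title refers to: important sentences are removed from the document (replaced by the paper's [MASK1] token) and concatenated into the target pseudo-summary. The `importance` callable stands in for the paper's ROUGE-based selection of principal sentences, and sentence splitting is deliberately naive.

```python
import re

def make_gsg_example(document, importance, gap_ratio=0.3):
    """Build one gap-sentence generation training pair."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    k = max(1, round(len(sentences) * gap_ratio))
    # Rank sentences by importance relative to the rest of the document.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: importance(sentences[i], sentences),
                    reverse=True)
    gaps = set(ranked[:k])
    source = " ".join("[MASK1]" if i in gaps else s
                      for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(gaps))
    return source, target
```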
Generating Wikipedia by Summarizing Long Sequences
TLDR
It is shown that generating English Wikipedia articles can be approached as multi-document summarization of source documents, and a neural abstractive model is introduced that can generate fluent, coherent multi-sentence paragraphs and even whole Wikipedia articles.
Scalable and accurate deep learning with electronic health records
TLDR
A representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format is proposed, and it is demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.
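For readers unfamiliar with FHIR, a toy Python rendering of the idea, with assumed codes and values: each patient is represented as an ordered timeline of standard FHIR resources rather than a site-specific feature table.

```python
# A miniature patient timeline of FHIR resources (real records contain
# many more fields and resource types; values here are invented).
patient_timeline = [
    {"resourceType": "Patient", "id": "example",
     "gender": "female", "birthDate": "1970-01-01"},
    {"resourceType": "Encounter", "status": "in-progress",
     "period": {"start": "2018-01-03T13:30:00Z"}},
    {"resourceType": "Observation", "code": {"text": "heart rate"},
     "valueQuantity": {"value": 88, "unit": "beats/minute"},
     "effectiveDateTime": "2018-01-03T14:00:00Z"},
]
# The paper feeds such timelines, in temporal order, directly to deep
# models, avoiding per-site feature engineering and harmonization.
```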
Likelihood Ratios for Out-of-Distribution Detection
TLDR
This work investigates deep generative model based approaches for OOD detection, observes that the likelihood score is heavily affected by population-level background statistics, and proposes a likelihood ratio method for deep generative models that effectively corrects for these confounding background statistics.
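Once two generative models are trained, the core of the method is a simple score; a minimal sketch, assuming per-example log-likelihoods have already been computed:

```python
import numpy as np

def likelihood_ratio_score(log_p_model, log_p_background):
    """OOD score: LLR(x) = log p_model(x) - log p_background(x).

    p_model is trained on in-distribution data; p_background is a
    second model trained on perturbed inputs so that it captures only
    population-level background statistics. Subtracting its likelihood
    cancels the background contribution, so the score reflects semantic
    content. Higher scores indicate more in-distribution inputs.
    """
    return np.asarray(log_p_model) - np.asarray(log_p_background)
```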
MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization
TLDR
This work considers the setting where only documents, and no summaries, are provided, and proposes an end-to-end neural model architecture for unsupervised abstractive summarization, showing that the generated summaries are highly abstractive, fluent, relevant, and representative of the average sentiment of the input reviews.
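A minimal sketch of the inference step under this setup, with `encode`/`decode` as assumed callables for the trained autoencoder components (training, which adds reconstruction and summary-similarity losses, is omitted):

```python
import torch

@torch.no_grad()
def meansum_inference(encode, decode, reviews):
    """Generate a summary by decoding the mean of the review encodings.

    MeanSum trains an autoencoder over individual reviews together with
    a similarity loss that pulls the encoding of the generated summary
    toward the mean of the review encodings; at inference time the
    summary is simply decoded from that mean.
    """
    codes = torch.stack([encode(r) for r in reviews])  # (n_reviews, d)
    return decode(codes.mean(dim=0))                   # summary tokens
```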
Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs
The driving force behind the recent success of LSTMs has been their ability to learn complex and non-linear relationships. Consequently, our inability to describe these relationships has led to LSTMs being characterized as black boxes.
Online and Linear-Time Attention by Enforcing Monotonic Alignments
TLDR
This work proposes an end-to-end differentiable method for learning monotonic alignments which, at test time, enables computing attention online and in linear time, and validates the approach on sentence summarization, machine translation, and online speech recognition problems.
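A simplified, greedy rendering of the test-time process: with a learned monotonic alignment, each output step resumes scanning the input where the previous step stopped, which is what makes attention online and linear-time. Energies are precomputed here for brevity, which a truly online setting would avoid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_monotonic_attention(energies, threshold=0.5):
    """Greedy hard monotonic attention at test time.

    energies[i, j] is the attention energy between output step i and
    input position j. Because the scan index never moves backwards,
    total work is linear in the input length.
    """
    n_out, n_in = energies.shape
    attended, j = [], 0
    for i in range(n_out):
        # Advance until the "stop" probability exceeds the threshold.
        while j < n_in - 1 and sigmoid(energies[i, j]) < threshold:
            j += 1
        attended.append(j)  # attend to a single input position
    return attended
```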
Unsupervised Pretraining for Sequence to Sequence Learning
TLDR
This work presents a general unsupervised learning method to improve the accuracy of sequence-to-sequence (seq2seq) models: the weights of the encoder and decoder are initialized with the weights of two pretrained language models and then fine-tuned with labeled data.
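A minimal sketch of this warm-start scheme, assuming PyTorch-style modules with compatible shapes (attribute names are illustrative, not the paper's code):

```python
def warm_start_seq2seq(seq2seq, src_lm, tgt_lm):
    """Initialize a seq2seq model from two pretrained language models.

    The encoder is seeded with the weights of an LM trained on
    source-side text and the decoder with an LM trained on target-side
    text; the combined model is then fine-tuned on labeled pairs (the
    paper additionally retains an LM objective during fine-tuning to
    mitigate catastrophic forgetting).
    """
    seq2seq.encoder.load_state_dict(src_lm.state_dict())
    seq2seq.decoder.load_state_dict(tgt_lm.state_dict())
    return seq2seq
```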