BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
BART is a denoising autoencoder for pretraining sequence-to-sequence models. It matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
Multilingual Denoising Pre-training for Neural Machine Translation
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a …
Mask-Predict: Parallel Decoding of Conditional Masked Language Models
This model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average, and is able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster.
A Knowledge-Grounded Neural Conversation Model
A novel, fully data-driven, knowledge-grounded neural conversation model aimed at producing more contentful responses. It generalizes the widely used Sequence-to-Sequence (seq2seq) approach by conditioning responses on both conversation history and external “facts”, allowing the model to be versatile and applicable in an open-domain setting.
Pre-training via Paraphrasing
It is shown that fine-tuning gives strong performance on a range of discriminative and generative tasks in many languages, making MARGE the most generally applicable pre-training method to date.
Hafez: an Interactive Poetry Generation System
Hafez is an automatic poetry generation system that integrates a Recurrent Neural Network (RNN) with a Finite State Acceptor (FSA) and learns to adjust its parameters to improve poetry quality.
Constant-Time Machine Translation with Conditional Masked Language Models
This model improves state-of-the-art performance levels for constant-time translation models by over 3 BLEU on average, and is able to reach 92-95% of the performance of a typical left-to-right transformer model, while decoding significantly faster.
Generating Topical Poetry
Hafez is a program that generates any number of distinct poems on a user-supplied topic while obeying rhythmic and rhyme constraints, and it is shown to generalize across languages and poetic forms.
Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation
This work focuses on translation performance on natural noise, as captured by frequent corrections in Wikipedia edit logs, and shows how robustness to such noise can be achieved using a balanced diet of simple synthetic noises at training time, without access to the natural noise data or distribution.
Aligned Cross Entropy for Non-Autoregressive Machine Translation
Aligned cross entropy (AXE) is proposed as an alternative loss function for training non-autoregressive models. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.