Publications
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR
It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
TLDR
BART, a denoising autoencoder for pretraining sequence-to-sequence models, is presented; it matches the performance of RoBERTa on GLUE and SQuAD and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
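As a rough illustration of the denoising setup, the sketch below shows a text-infilling corruption in plain Python, assuming a whitespace-tokenized input: sampled spans are replaced with a single mask token, with span lengths drawn from a Poisson(λ = 3) distribution as described in the paper. The function names, the 30% corruption ratio, and the clipping of span lengths to at least 1 are illustrative assumptions rather than the released implementation; the sequence-to-sequence model is then trained to reconstruct the original text from the corrupted version.

```python
import math
import random

# Minimal sketch of a BART-style text-infilling corruption (not the released code).
# Sampled spans are replaced by a single mask token; a seq2seq model is then
# trained to reconstruct the original sequence from the corrupted one.

def poisson(lam=3.0):
    """Poisson sample via Knuth's method (enough for a sketch, no numpy needed)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

def text_infill(tokens, mask="<mask>", ratio=0.3):
    tokens = list(tokens)
    n_to_corrupt = int(len(tokens) * ratio)
    corrupted = 0
    while corrupted < n_to_corrupt and tokens:
        span = max(1, poisson())                    # paper allows 0-length spans; clipped here
        start = random.randrange(len(tokens))
        span = min(span, len(tokens) - start)
        tokens[start:start + span] = [mask]         # the whole span becomes one mask token
        corrupted += span
    return tokens

# A training pair would be (corrupted source, original target):
original = "the quick brown fox jumps over the lazy dog".split()
source = text_infill(original)
```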
SpanBERT: Improving Pre-training by Representing and Predicting Spans
TLDR
The approach extends BERT by masking contiguous random spans, rather than random tokens, and by training the span boundary representations to predict the entire content of the masked span without relying on the individual token representations within it.
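The span-masking step described in that summary can be sketched in a few lines of Python. The geometric span-length distribution (p = 0.2, clipped at 10 tokens) and the roughly 15% masking budget follow the paper, but the helper names and whitespace tokenization below are placeholder assumptions, and the span-boundary objective itself (predicting each masked token from the span's boundary representations plus a position embedding) is not implemented here.

```python
import random

# Minimal sketch of SpanBERT-style span masking (not the authors' code).
# Contiguous spans are masked until roughly 15% of the tokens are covered;
# span lengths follow a geometric distribution with p = 0.2, clipped at 10.

def sample_span_length(p=0.2, max_len=10):
    length = 1
    while random.random() > p and length < max_len:
        length += 1
    return length

def mask_spans(tokens, mask_token="[MASK]", budget=0.15):
    tokens = list(tokens)
    n_to_mask = int(round(len(tokens) * budget))
    masked = set()
    while len(masked) < n_to_mask:
        span_len = sample_span_length()
        start = random.randrange(0, max(1, len(tokens) - span_len + 1))
        for i in range(start, min(start + span_len, len(tokens))):
            masked.add(i)
    targets = {i: tokens[i] for i in masked}        # tokens the model must predict
    corrupted = [mask_token if i in masked else t for i, t in enumerate(tokens)]
    return corrupted, targets

corrupted, targets = mask_spans("the quick brown fox jumps over the lazy dog".split())
```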
Multilingual Denoising Pre-training for Neural Machine Translation
Abstract
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective.
Mask-Predict: Parallel Decoding of Conditional Masked Language Models
TLDR
This model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average, and is able to reach within about 1 BLEU point of a typical left-to-right transformer model while decoding significantly faster.
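A minimal sketch of that decoding loop is shown below. It assumes a placeholder `model(src_tokens, tgt_tokens)` that returns, for every target position, a predicted token and its probability; the fully masked initialization, the parallel re-prediction of masked positions, and the linearly decaying re-masking schedule follow the paper's description, while target-length prediction and multiple length candidates are omitted.

```python
# Minimal sketch of mask-predict decoding with a conditional masked LM.
# `model(src_tokens, tgt_tokens)` is a stand-in assumed to return two lists of
# length len(tgt_tokens): the most likely token and its probability per position.

def mask_predict(model, src_tokens, tgt_len, iterations=10, mask="<mask>"):
    tgt = [mask] * tgt_len                        # start from a fully masked target
    probs = [0.0] * tgt_len
    for t in range(iterations):
        pred_tokens, pred_probs = model(src_tokens, tgt)
        for i in range(tgt_len):                  # update only the masked positions
            if tgt[i] == mask:
                tgt[i], probs[i] = pred_tokens[i], pred_probs[i]
        # re-mask the lowest-confidence tokens; the count decays linearly to zero
        n_mask = int(tgt_len * (iterations - t - 1) / iterations)
        if n_mask == 0:
            break
        for i in sorted(range(tgt_len), key=lambda i: probs[i])[:n_mask]:
            tgt[i], probs[i] = mask, 0.0
    return tgt
```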
Recipes for Building an Open-Domain Chatbot
TLDR
Human evaluations show the best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements, and the limitations of this work are discussed by analyzing failure cases of the models.
Cloze-driven Pretraining of Self-attention Networks
TLDR
A new approach for pretraining a bi-directional transformer model on a cloze-style word reconstruction task is presented; it provides significant performance gains across a variety of language understanding problems, together with a detailed analysis of the factors that contribute to effective pretraining.
Constant-Time Machine Translation with Conditional Masked Language Models
TLDR
This model improves state-of-the-art performance levels for constant-time translation models by over 3 BLEU on average, and is able to reach 92-95% of the performance of a typical left-to-right transformer model while decoding significantly faster.
Hierarchical Learning for Generation with Long Source Sequences
TLDR
A new Hierarchical Attention Transformer-based architecture (HAT) is presented that outperforms standard Transformers on several sequence-to-sequence tasks; the paper also investigates what the hierarchical layers learn by visualizing the hierarchical encoder-decoder attention.
Chromatographic peak alignment using derivative dynamic time warping
TLDR
This work discusses the application of dynamic time warping with a derivative weighting function to align chromatograms for process monitoring and fault detection, and demonstrates the utility of the method as a preprocessing step for multivariate model development.
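For readers unfamiliar with the technique, the sketch below shows derivative dynamic time warping in the generic Keogh-Pazzani formulation: estimated derivatives of the two signals are fed into the standard DTW dynamic program, and the warping path is recovered by backtracking. It is an illustration under those assumptions, not the paper's exact derivative weighting function, and the function names are placeholders.

```python
# Minimal sketch of derivative dynamic time warping for two 1-D signals
# (e.g. chromatograms), assuming each signal has at least three points.

def derivative(x):
    """Estimated derivative: average of the left difference and the centred slope."""
    d = [((x[i] - x[i - 1]) + (x[i + 1] - x[i - 1]) / 2.0) / 2.0 for i in range(1, len(x) - 1)]
    return [d[0]] + d + [d[-1]]                   # pad the endpoints

def ddtw(a, b):
    da, db = derivative(a), derivative(b)
    n, m = len(da), len(db)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = (da[i - 1] - db[j - 1]) ** 2   # local cost on derivatives, not raw values
            cost[i][j] = dist + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # backtrack to recover the warping path as pairs of aligned indices
    path, i, j = [], n, m
    while i > 1 or j > 1:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.append((0, 0))
    return cost[n][m], path[::-1]
```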