RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR: We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size.
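As a quick illustration of working with the released weights, here is a minimal feature-extraction sketch. It uses the Hugging Face transformers port of the checkpoint (the paper's own reference implementation ships with fairseq), so the RobertaModel API and the roberta-base name below come from that library rather than from the paper itself.

```python
import torch
from transformers import RobertaModel, RobertaTokenizer

# Load the ported roberta-base checkpoint and extract contextual features.
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")
model.eval()

inputs = tokenizer("RoBERTa drops next-sentence prediction.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```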
Finding Deceptive Opinion Spam by Any Stretch of the Imagination
TLDR: In this work we study deceptive opinion spam: fictitious opinions that have been deliberately written to sound authentic, in order to deceive the reader.
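Detection here is framed as supervised text classification over n-gram features. A minimal sketch of that setup follows; the paper itself reports Naive Bayes and SVM results, so the logistic-regression pipeline and the two-review toy dataset below are invented stand-ins for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration; the real study uses a corpus of
# truthful and crowdsourced deceptive hotel reviews.
reviews = [
    "The room was clean and the staff were friendly.",        # truthful
    "My family and I absolutely loved every single moment.",  # deceptive
]
labels = [0, 1]  # 0 = truthful, 1 = deceptive

# Unigram/bigram TF-IDF features feeding a linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reviews, labels)
print(clf.predict(["An unforgettable experience from start to finish"]))
```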
Unsupervised Cross-lingual Representation Learning at Scale
TLDR: We present XLM-R, a transformer-based multilingual masked language model pre-trained on one hundred languages, which obtains state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.
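A minimal sketch of what "one model for a hundred languages" means in practice: the same tokenizer and encoder handle text in any of the pretraining languages with no language-specific setup. The AutoTokenizer/AutoModel API and the xlm-roberta-base name come from the Hugging Face transformers port, not from the paper's own fairseq release, and the German sentence is invented for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# One shared vocabulary and encoder cover all pretraining languages.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

for sentence in ["A multilingual encoder.", "Ein mehrsprachiger Encoder."]:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    print(sentence, hidden.shape)  # same code path for both languages
```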
Phrase-Based & Neural Unsupervised Machine Translation
TLDR: We propose two model variants: a neural and a phrase-based model.
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
TLDR: We present FAIRSEQ, an open-source sequence modeling toolkit written in PyTorch that is fast, extensible, and useful for both research and production.
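As a usage note, fairseq's README documents loading released checkpoints through torch.hub; the sketch below follows that documented pattern. The WMT'19 model name comes from fairseq's model zoo, the first run downloads the checkpoint, and the moses/fastbpe options require the sacremoses and fastBPE packages to be installed.

```python
import torch

# Load a pre-trained English-to-German translation model from the
# fairseq model zoo via torch.hub (downloads weights on first use).
en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt19.en-de.single_model",
    tokenizer="moses",
    bpe="fastbpe",
)
en2de.eval()
print(en2de.translate("Hello world!"))  # e.g. 'Hallo Welt!'
```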
Understanding Back-Translation at Scale
TLDR: An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target-language sentences.
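A minimal sketch of the augmentation loop the paper studies. The reverse_model object and its translate() method are hypothetical placeholders, not a real API; one of the paper's findings, that synthetic sources produced by sampling or noised beam search outperform pure beam outputs, is what the sampling flag below gestures at.

```python
def back_translate(monolingual_tgt, reverse_model):
    """Translate target-language monolingual sentences back into the
    source language, yielding synthetic (source, target) training pairs.

    reverse_model is a hypothetical target-to-source translation model.
    """
    synthetic_pairs = []
    for tgt_sentence in monolingual_tgt:
        # Sampling (rather than pure beam search) gives noisier but more
        # diverse synthetic sources, which the paper finds works better at scale.
        synthetic_src = reverse_model.translate(tgt_sentence, sampling=True)
        synthetic_pairs.append((synthetic_src, tgt_sentence))
    return synthetic_pairs

# The forward model is then trained on the union of the real parallel
# corpus and the synthetic pairs:
# train_data = parallel_pairs + back_translate(mono_tgt, reverse_model)
```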
Scaling Neural Machine Translation
TLDR: This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation.
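The two ingredients, half-precision arithmetic and very large effective batches via gradient accumulation, can be sketched in a few lines of modern PyTorch. This uses torch.cuda.amp rather than the paper's original fairseq implementation, and model, optimizer, and data_loader are hypothetical placeholders.

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid FP16 underflow
ACCUM_STEPS = 16                      # accumulate 16 mini-batches per parameter update

for step, (src, tgt) in enumerate(data_loader):
    with torch.cuda.amp.autocast():   # run the forward pass in mixed precision
        loss = model(src, tgt) / ACCUM_STEPS
    scaler.scale(loss).backward()     # gradients accumulate across iterations
    if (step + 1) % ACCUM_STEPS == 0:
        scaler.step(optimizer)        # unscale and apply the accumulated update
        scaler.update()
        optimizer.zero_grad()
```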
Negative Deceptive Opinion Spam
TLDR: The rising influence of user-generated online reviews has led to growing incentive for businesses to solicit and manufacture deceptive opinion spam: fictitious reviews that have been deliberately written to sound authentic and deceive the reader.
Towards a General Rule for Identifying Deceptive Opinion Spam
TLDR: We explore generalized approaches for identifying online deceptive opinion spam based on a new gold-standard dataset comprising three domains (Hotel, Restaurant, Doctor), each containing three types of reviews: customer-generated truthful reviews, Turker-generated deceptive reviews, and employee (domain-expert) generated deceptive reviews.
Classical Structured Prediction Losses for Sequence to Sequence Learning
TLDR: We survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence-to-sequence models.
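One of the surveyed objectives, expected risk (minimum risk training), is compact enough to sketch. Hypothesis generation and the cost function (e.g. 1 minus sentence-level BLEU) are assumed to be given; the function name and the toy numbers below are invented for illustration.

```python
import torch
import torch.nn.functional as F

def expected_risk_loss(hyp_logprobs: torch.Tensor, hyp_costs: torch.Tensor):
    """Expected cost under the model's renormalized distribution.

    hyp_logprobs: model log-probabilities of k candidate sequences, shape (k,)
    hyp_costs:    task cost of each candidate (e.g. 1 - BLEU), shape (k,)
    """
    # Renormalize over the candidate set, then weight each cost by its probability.
    probs = F.softmax(hyp_logprobs, dim=-1)
    return (probs * hyp_costs).sum()

# Example: three sampled hypotheses for one source sentence.
logp = torch.tensor([-2.3, -1.1, -4.0], requires_grad=True)
cost = torch.tensor([0.4, 0.1, 0.9])  # lower is better
loss = expected_risk_loss(logp, cost)
loss.backward()  # gradients push probability mass toward low-cost hypotheses
```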