Publications
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
TLDR: We present FAIRSEQ, an open-source sequence modeling toolkit written in PyTorch that is fast, extensible, and useful for both research and production.
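For readers who want to try the toolkit, the snippet below is a minimal usage sketch based on fairseq's published torch.hub examples; the model, checkpoint, tokenizer, and BPE names are taken from those examples and may change between releases, so treat this as illustrative rather than authoritative.

    import torch

    # Load one of the pretrained English->German models that fairseq publishes
    # via torch.hub (weights are downloaded on first use). Argument names follow
    # the project's documented examples.
    en2de = torch.hub.load(
        "pytorch/fairseq",
        "transformer.wmt19.en-de",
        checkpoint_file="model1.pt",
        tokenizer="moses",
        bpe="fastbpe",
    )
    en2de.eval()

    # The hub interface wraps tokenization, BPE, beam search, and detokenization.
    print(en2de.translate("Machine learning is wonderful!"))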
Understanding Back-Translation at Scale
TLDR: An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target-language sentences.
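As a rough illustration of the recipe above, here is a hedged sketch of the back-translation data-augmentation loop. The `ReverseModel` class is a hypothetical stand-in for a trained target-to-source translation model, not a real API; only the overall flow (translate monolingual target text backwards, pair it with the originals, add the pairs to the bitext) reflects the paper.

    # Sketch (not the authors' code) of back-translation: monolingual TARGET-language
    # sentences are translated back into the SOURCE language with a reverse model,
    # and the synthetic pairs are added to the real parallel corpus used to train
    # the forward (source->target) model.

    class ReverseModel:
        """Placeholder for a trained target->source NMT model (hypothetical API)."""

        def translate(self, target_sentence: str) -> str:
            # A real model would decode here; the paper observes that sampled
            # decoding tends to yield more useful synthetic data than pure beam search.
            return f"<synthetic source for: {target_sentence}>"


    def back_translate(monolingual_target, reverse_model):
        """Create synthetic (source, target) pairs from monolingual target text."""
        return [(reverse_model.translate(t), t) for t in monolingual_target]


    real_bitext = [("this is an example", "das ist ein Beispiel")]
    mono_target = ["noch ein Satz", "und noch einer"]

    augmented = real_bitext + back_translate(mono_target, ReverseModel())
    print(augmented)  # train the source->target model on this combined corpus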
Scaling Neural Machine Translation
TLDR: This paper shows that, with careful tuning and implementation, reduced-precision and large-batch training can speed up training by nearly 5x on a single 8-GPU machine.
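The paper's own fp16 code predates the standard PyTorch AMP utilities, so the sketch below only illustrates the two ideas in the abstract (reduced precision plus large, accumulated batches) with stock PyTorch APIs; the tiny linear model and loss are placeholders, and a CUDA GPU is assumed.

    import torch
    from torch import nn

    # Mixed-precision training with gradient accumulation to simulate a large batch.
    # This is a generic PyTorch sketch, not the fairseq implementation from the paper.
    model = nn.Linear(512, 512).cuda()          # stand-in for a Transformer
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()        # loss scaling keeps fp16 gradients stable
    accum_steps = 16                            # simulate a 16x larger batch

    optimizer.zero_grad()
    for step, batch in enumerate(torch.randn(64, 32, 512).cuda()):
        with torch.cuda.amp.autocast():         # run forward in reduced precision
            loss = model(batch).pow(2).mean()   # placeholder loss
        scaler.scale(loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:       # update only every accum_steps batches
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()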
One Trillion Edges: Graph Processing at Facebook-Scale
TLDR: We describe the usability, performance, and scalability improvements we made to Apache Giraph, an open-source graph processing system, in order to use it on Facebook-scale graphs of up to one trillion edges.
Dense Passage Retrieval for Open-Domain Question Answering
TLDR: We show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework.
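To make the dual-encoder idea concrete, here is a small PyTorch sketch of the training objective typically used for dense retrieval: dot-product similarity between question and passage embeddings, trained with in-batch negatives. The toy embedding-bag encoders and random token ids are placeholders for the BERT encoders and real data used in the paper.

    import torch
    from torch import nn
    import torch.nn.functional as F

    # Dual encoder: one network embeds questions, another embeds passages.
    # Each question is trained to score its own passage above the other
    # passages in the batch (in-batch negatives).
    VOCAB, DIM = 1000, 64
    q_enc = nn.EmbeddingBag(VOCAB, DIM)   # question encoder (placeholder for BERT)
    p_enc = nn.EmbeddingBag(VOCAB, DIM)   # passage encoder (placeholder for BERT)
    opt = torch.optim.Adam(list(q_enc.parameters()) + list(p_enc.parameters()), lr=1e-3)

    questions = torch.randint(0, VOCAB, (8, 12))   # a batch of token-id "questions"
    passages = torch.randint(0, VOCAB, (8, 50))    # their positive "passages"

    q = q_enc(questions)                  # (batch, dim)
    p = p_enc(passages)                   # (batch, dim)
    scores = q @ p.t()                    # every question scored against every passage
    labels = torch.arange(q.size(0))      # the diagonal holds the positive pairs
    loss = F.cross_entropy(scores, labels)
    loss.backward()
    opt.step()

    # At search time, passage embeddings are precomputed once and queries are
    # matched against them with (approximate) nearest-neighbor search.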
Classical Structured Prediction Losses for Sequence to Sequence Learning
TLDR: We survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence-to-sequence models.
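One of the classical sequence-level objectives covered in this line of work is expected risk minimization. The sketch below shows that loss for a toy candidate set; the scores and the cost values (e.g. 1 minus sentence-level BLEU) are made up for illustration and do not come from the paper.

    import torch
    import torch.nn.functional as F

    # Toy illustration of sequence-level (expected) risk minimization: minimize the
    # expected task cost of candidate hypotheses under the model's renormalized
    # distribution over a candidate list.

    def expected_risk(candidate_scores: torch.Tensor, candidate_costs: torch.Tensor):
        """candidate_scores: model scores for each hypothesis of one source sentence.
        candidate_costs: task cost per hypothesis, e.g. 1 - sentence-BLEU."""
        probs = F.softmax(candidate_scores, dim=-1)   # renormalize over the candidates
        return (probs * candidate_costs).sum()

    scores = torch.tensor([2.1, 0.3, -1.0], requires_grad=True)  # 3 candidate translations
    costs = torch.tensor([0.2, 0.5, 0.9])                        # lower cost is better
    loss = expected_risk(scores, costs)
    loss.backward()   # gradients push probability mass toward low-cost hypotheses
    print(loss.item(), scores.grad)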
Multilingual Denoising Pre-training for Neural Machine Translation
TLDR: We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019).
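The denoising objective itself is easy to sketch: corrupt the input text and train the sequence-to-sequence model to reconstruct the original. The noising function below is a simplified, hypothetical stand-in; the real pre-training masks whole spans, permutes sentences, adds language-id tokens, and runs over many languages at once.

    import random

    # Simplified sketch of BART/mBART-style denoising pre-training: the encoder sees
    # a corrupted sentence, the decoder is trained to reproduce the original.
    # This toy version only masks individual words.

    MASK = "<mask>"

    def corrupt(tokens, mask_prob=0.35, rng=random.Random(0)):
        """Replace a random subset of tokens with a mask symbol."""
        return [MASK if rng.random() < mask_prob else t for t in tokens]

    original = "Wir stellen mBART vor , einen mehrsprachigen Autoencoder .".split()
    noisy = corrupt(original)

    # One training pair for the seq2seq model: (noisy input, original target).
    print("encoder input :", " ".join(noisy))
    print("decoder target:", " ".join(original))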
Facebook FAIR's WMT19 News Translation Task Submission
TLDR: This paper describes Facebook FAIR's submission to the WMT19 shared news translation task in two language pairs and four language directions, English ↔ German and English ↔ Russian.
Cloze-driven Pretraining of Self-attention Networks
TLDR: We present a new approach for pretraining a bidirectional transformer model that provides significant performance gains on GLUE and new state-of-the-art results on NER as well as constituency parsing benchmarks.
Pre-trained Language Model Representations for Language Generation
TLDR: We examine different strategies to integrate pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization.