BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

@article{Lewis2020BARTDS,
  title={BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension},
  author={M. Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and A. Mohamed and Omer Levy and Ves Stoyanov and Luke Zettlemoyer},
  journal={ArXiv},
  year={2020},
  volume={abs/1910.13461}
}
  • M. Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, A. Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a…
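  The corrupt-then-reconstruct objective described in the abstract can be illustrated with a short sketch. The snippet below is an illustrative assumption, not the authors' implementation: it shows text infilling, one of BART's noising functions, in which spans of tokens (with lengths drawn from a Poisson distribution, lambda = 3 in the paper) are each replaced by a single mask token; a sequence-to-sequence model is then trained to reconstruct the original text from the corrupted input.

    import math
    import random

    MASK = "<mask>"

    def sample_poisson(lam, rng):
        # Draw one sample from Poisson(lam) using Knuth's method.
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            k += 1
            p *= rng.random()
            if p <= threshold:
                return k - 1

    def text_infilling(tokens, mask_ratio=0.3, lam=3.0, seed=0):
        # Replace random spans with a single <mask> token each, corrupting
        # roughly mask_ratio of the tokens. The paper also allows 0-length
        # spans (pure <mask> insertion); this sketch keeps spans >= 1.
        rng = random.Random(seed)
        budget = int(len(tokens) * mask_ratio)  # tokens left to corrupt
        out, i = [], 0
        while i < len(tokens):
            if budget > 0 and rng.random() < mask_ratio:
                span = max(1, min(budget, sample_poisson(lam, rng)))
                out.append(MASK)  # the whole span collapses to one mask token
                i += span
                budget -= span
            else:
                out.append(tokens[i])
                i += 1
        return out

    if __name__ == "__main__":
        original = ("BART is trained by corrupting text with a noising "
                    "function and learning to reconstruct it").split()
        print(text_infilling(original))

  The function and parameter names (text_infilling, mask_ratio) and the per-position masking decision are simplifications chosen here for clarity; the paper's large-scale setup masks roughly 30% of tokens with text infilling and also permutes sentences, but its exact sampling procedure differs from this sketch.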
    482 Citations (selection)
    • BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
    • Multilingual Denoising Pre-training for Neural Machine Translation
    • Pre-training via Paraphrasing
    • CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
    • DynE: Dynamic Ensemble Decoding for Multi-Document Summarization
    • An Investigation of Fine-tuning Pre-trained Model for MR-to-Text Generation (Ting Hu, C. Meinel; 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020)
    • Incorporating BERT into Parallel Sequence Decoding with Adapters
    • Cross-Modal Transfer Learning for Multilingual Speech-to-Text Translation

    References

    Showing a subset of the paper's 34 references:
    • MASS: Masked Sequence to Sequence Pre-training for Language Generation
    • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    • Text Summarization with Pretrained Encoders
    • Attention is All you Need
    • Pre-trained Language Model Representations for Language Generation
    • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
    • Leveraging Pre-trained Checkpoints for Sequence Generation Tasks