Corpus ID: 215540745

Avengers: Achieving Superhuman Performance for Question Answering on SQuAD 2.0 Using Multiple Data Augmentations, Randomized Mini-Batch Training and Architecture Ensembling

@inproceedings{2020AvengersAS,
  title={Avengers: Achieving Superhuman Performance for Question Answering on SQuAD 2.0 Using Multiple Data Augmentations, Randomized Mini-Batch Training and Architecture Ensembling},
  author={},
  year={2020}
}
This project aims to achieve above-human performance (EM/F1: 86.9/89.5) on the SQuAD 2.0 dataset. We first augmented the training data by randomly substituting words with WordNet synonyms, then paraphrasing with the larger PPDB database. Next, in a novel application of GPT-2, we generated new sentences to augment the context paragraphs in a realistic and coherent manner. We further experimented with randomizing the mini-batches, which increased learning difficulty by sampling from different…
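The synonym-substitution step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the tiny `SYNONYMS` dictionary is a hypothetical stand-in for WordNet's full synset lookup (e.g. via NLTK), and the substitution probability `p` is an assumed parameter.

```python
import random

# Hypothetical stand-in for WordNet synonym lookup; the paper uses
# WordNet's full synsets, which this tiny table merely illustrates.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "large": ["big", "huge"],
    "answer": ["response", "reply"],
}

def synonym_substitute(tokens, p=0.1, rng=None):
    """Replace each token that has known synonyms with probability p.

    Tokens without an entry in the synonym table pass through unchanged,
    so sentence length and structure are preserved.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    out = []
    for tok in tokens:
        candidates = SYNONYMS.get(tok.lower())
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out

# Usage: with p=1.0 every token that has synonyms gets substituted.
augmented = synonym_substitute(
    "the quick model gave a large answer".split(), p=1.0
)
```

Applying this per training example yields paraphrase-like variants of the context while keeping the answer span alignment recoverable, which is why the paper follows it with PPDB paraphrasing rather than relying on word-level swaps alone.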


    References

    Showing 7 of 22 references:
    • Ensemble BERT with Data Augmentation and Linguistic Knowledge on SQuAD 2.0
    • SQuAD: 100,000+ Questions for Machine Comprehension of Text
    • SpanBERT: Improving Pre-training by Representing and Predicting Spans
    • Know What You Don't Know: Unanswerable Questions for SQuAD
    • BERT-A: Fine-tuning BERT with Adapters and Data Augmentation
    • XLNet: Generalized Autoregressive Pretraining for Language Understanding
    • Bidirectional Attention Flow for Machine Comprehension