• Corpus ID: 215540745

Avengers: Achieving Superhuman Performance for Question Answering on SQuAD 2.0 Using Multiple Data Augmentations, Randomized Mini-Batch Training and Architecture Ensembling

  • Published 2020
  • Computer Science
This project aims to achieve above-human performance (EM/F1: 86.9/89.5) on the SQuAD 2.0 dataset. We first augmented the training data by randomly substituting words with WordNet synonyms, then paraphrasing with the larger PPDB database. Then, in a novel application of GPT-2, we generated new sentences to augment the context paragraphs in a realistic and coherent manner. We further experimented with randomizing the mini-batches, which increased learning difficulty by sampling from different… 
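The synonym-substitution step described above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: the `SYNONYMS` table is a hypothetical stand-in for WordNet/PPDB lookups, and the substitution probability `p` is an assumed parameter.

```python
import random

# Toy synonym table standing in for WordNet/PPDB lookups; the entries are
# hypothetical, and the paper's actual pipeline queries WordNet and PPDB.
SYNONYMS = {
    "big": ["large", "huge"],
    "fast": ["quick", "rapid"],
}

def augment(sentence, p=0.5, rng=None):
    """Return a copy of `sentence` with known words randomly replaced by a
    synonym with probability p (a minimal sketch of the substitution step)."""
    rng = rng or random.Random(0)
    out = []
    for tok in sentence.split():
        syns = SYNONYMS.get(tok.lower())
        out.append(rng.choice(syns) if syns and rng.random() < p else tok)
    return " ".join(out)
```

Each call yields a paraphrased variant of the input, so repeated calls over the training set multiply the number of distinct question/context pairs.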

Tables from this paper

Improve DistilBERT-based Question Answering model performance on out-of-domain datasets by Mixing Right Experts

This project aims to improve the performance of a DistilBERT-based QA model trained on in-domain datasets when evaluated on out-of-domain datasets, using only the provided datasets.

Question Answering Augmentation System: Conditional Synonym Replacement

In this project, our team implemented data augmentation tools that can distinguish and target different parts of speech and bolster the data with synonym replacements. We also explore… 

Ensemble BERT with Data Augmentation and Linguistic Knowledge on SQuAD 2.0

This project significantly improved on the single-model BERT baseline on SQuAD 2.0, adopting a novel data augmentation approach and integrating linguistic knowledge to build a robust ensemble model.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

SpanBERT: Improving Pre-training by Representing and Predicting Spans

The approach extends BERT by masking contiguous random spans, rather than random tokens, and training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it.
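The contiguous-span masking described above can be illustrated with a short sketch. This is a simplification under stated assumptions: SpanBERT samples span lengths from a geometric distribution, whereas here they are uniform, and `mask_ratio`/`max_span` are illustrative parameters.

```python
import random

def mask_spans(tokens, mask_ratio=0.15, max_span=4, rng=None):
    """Mask contiguous spans of tokens rather than individual tokens, in the
    spirit of SpanBERT. Simplified: real SpanBERT draws span lengths from a
    geometric distribution; here they are uniform in [1, max_span]."""
    rng = rng or random.Random(0)
    n = len(tokens)
    budget = max(1, int(n * mask_ratio))   # total positions to mask
    masked, covered = list(tokens), set()
    attempts = 0
    while budget > 0 and attempts < 100:
        attempts += 1
        length = min(rng.randint(1, max_span), budget)
        start = rng.randrange(n - length + 1)
        span = set(range(start, start + length))
        if span & covered:                 # avoid overlapping spans
            continue
        for i in span:
            masked[i] = "[MASK]"
        covered |= span
        budget -= length
    return masked, sorted(covered)
```

Masking whole spans forces the model to predict a masked token from span boundaries rather than from its immediate (also masked) neighbors.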

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
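One of ALBERT's two parameter-reduction techniques, the factorized embedding, comes down to simple arithmetic, sketched below. The vocabulary/hidden/embedding sizes in the usage note are illustrative values, and the count ignores positional and segment embeddings.

```python
def embedding_params(vocab_size, hidden, embed=None):
    """Parameter count of the token-embedding layer. BERT maps the vocabulary
    straight to the hidden size (V*H); ALBERT factorizes this into a small
    embedding of size E followed by a projection (V*E + E*H, with E << H).
    Rough sketch that ignores positional/segment embeddings."""
    if embed is None:
        return vocab_size * hidden          # BERT-style, unfactorized
    return vocab_size * embed + embed * hidden
```

With a 30k vocabulary and H=768, factorizing through E=128 shrinks the embedding from about 23M to under 4M parameters, which is where much of ALBERT's memory saving comes from.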

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
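The masked-LM pre-training objective behind BERT's bidirectional conditioning can be sketched as a corruption function. The 15%/80%/10%/10% split follows the published recipe; the selection here is per-position Bernoulli for simplicity.

```python
import random

def mlm_corrupt(tokens, vocab, select_p=0.15, rng=None):
    """BERT's masked-LM corruption: select ~15% of positions as prediction
    targets; of those, 80% become [MASK], 10% a random vocab token, and 10%
    stay unchanged (sketch of the published recipe)."""
    rng = rng or random.Random(0)
    out, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() >= select_p:
            continue
        targets[i] = tok            # the model must predict the original token
        r = rng.random()
        if r < 0.8:
            out[i] = "[MASK]"
        elif r < 0.9:
            out[i] = rng.choice(vocab)
        # else: keep the original token at a target position
    return out, targets
```

Because targets are predicted from both left and right context of the corrupted sequence, the learned representations are deeply bidirectional, unlike a left-to-right language model.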

Know What You Don’t Know: Unanswerable Questions for SQuAD

SQuADRUn is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

BERT-A: Fine-tuning BERT with Adapters and Data Augmentation

This work inserts task-specific modules inside the pre-trained BERT model to control the flow of information between transformer blocks and achieves comparable performance to fine-tuning all BERT parameters while only training 0.57% of them.
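The bottleneck adapter module described above can be sketched in plain Python. The shapes and weights are illustrative; real adapters learn `W_down`/`W_up` by gradient descent while the pre-trained BERT weights stay frozen.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def adapter(x, W_down, W_up):
    """Bottleneck adapter inserted between frozen transformer blocks:
    down-project to a small dimension, apply ReLU, up-project back, and add
    a residual connection (sketch with hypothetical weights)."""
    h = [max(0.0, a) for a in matvec(W_down, x)]   # ReLU bottleneck
    up = matvec(W_up, h)
    return [xi + ui for xi, ui in zip(x, up)]
```

Note that a zero up-projection makes the adapter an exact identity map, which is why adapters are typically initialized near zero: fine-tuning then starts from the unmodified pre-trained network.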

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.
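The permutation idea can be made concrete with a small sketch: sample a factorization order and build an attention mask where each position may attend only to positions that precede it in that order. This covers only the content-stream mask; the full model adds a second "query" stream, which is omitted here.

```python
import random

def permutation_attention_mask(n, rng=None):
    """Sample a factorization order over n positions and build the mask that
    lets position i attend only to positions appearing earlier in that order,
    the core of XLNet's permutation-LM objective (content-stream sketch)."""
    rng = rng or random.Random(0)
    order = list(range(n))
    rng.shuffle(order)
    rank = {pos: k for k, pos in enumerate(order)}
    # mask[i][j] = 1 iff position j comes before position i in the order
    mask = [[1 if rank[j] < rank[i] else 0 for j in range(n)] for i in range(n)]
    return mask, order
```

Averaged over many sampled orders, every position eventually conditions on every other position, which is how an autoregressive objective learns bidirectional context.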

Bidirectional Attention Flow for Machine Comprehension

The BiDAF network is introduced: a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
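The two attention directions can be sketched from a precomputed similarity matrix. This is a simplified illustration of the attention-flow step only; in the real model the similarity scores come from a trainable function of context and query encodings.

```python
import math

def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def bidaf_attention(S):
    """Given a similarity matrix S[t][j] between context word t and query
    word j, compute BiDAF's two attention directions (sketch):
    context-to-query attends each context word over all query words;
    query-to-context weights context words by their best match to the query."""
    c2q = [softmax(row) for row in S]       # one distribution per context word
    q2c = softmax([max(row) for row in S])  # one distribution over context words
    return c2q, q2c
```

Both attention outputs are concatenated with the context encodings and flow onward, so the query-aware representation is kept per context word rather than being summarized into a single vector early.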

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.
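The replaced-token-detection setup can be sketched as a data-building step. Here `sampler` is a toy stand-in for ELECTRA's small masked-LM generator, and `replace_p` is an illustrative rate; the real generator proposes context-dependent tokens.

```python
import random

def make_rtd_example(tokens, sampler, replace_p=0.15, rng=None):
    """Build a replaced-token-detection example in the spirit of ELECTRA:
    corrupt some positions with a sampled token and emit the 0/1 labels the
    discriminator is trained to predict (1 = replaced). `sampler` stands in
    for the small generator network."""
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_p:
            new = sampler(rng)
            corrupted.append(new)
            # ELECTRA labels a position 'original' when the generator happens
            # to sample the true token; mirrored here.
            labels.append(0 if new == tok else 1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

Because the discriminator receives a learning signal at every position rather than only at masked ones, pre-training is far more sample-efficient than masked-LM objectives at the same compute.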