Question Answering Infused Pre-training of General-Purpose Contextualized Representations

  • Robin Jia, Mike Lewis, Luke Zettlemoyer
We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations, motivated by the intuition that the representation of a phrase in a passage should encode all questions that the phrase can answer in context. To this end, we train a bi-encoder QA model, which independently encodes passages and questions, to match the predictions of a more accurate cross-encoder model on 80 million synthesized QA pairs. By encoding QA-relevant… 
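The training objective described above can be illustrated with a minimal numeric sketch: a bi-encoder student scores candidate answer phrases by dot product against an independently encoded question, and is trained to minimize the KL divergence from a cross-encoder teacher's distribution. The function names and toy vectors here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distillation_loss(phrase_vecs, question_vec, teacher_probs):
    # Student (bi-encoder) scores each candidate phrase by a dot product
    # with the independently encoded question; we then compute the KL
    # divergence KL(teacher || student) against the cross-encoder teacher.
    student_probs = softmax(phrase_vecs @ question_vec)
    return float(np.sum(teacher_probs * np.log(teacher_probs / student_probs)))
```

Because passages and questions are encoded independently, the phrase vectors can be precomputed once and reused across many questions, which is what makes the bi-encoder cheap at scale.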

Simple Questions Generate Named Entity Recognition Datasets

An ask-to-generate approach is introduced which automatically generates NER datasets by asking simple natural language questions to an open-domain question answering system, and largely outperforms strong low-resource models on six popular NER benchmarks by 20.8 F1 score.
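The ask-to-generate idea can be sketched as a small loop: ask one natural-language question per entity type and treat every answer span as a labeled mention. The `answer_all` function below is a hypothetical stand-in for the open-domain QA system, and the question phrasings are illustrative.

```python
def qa_to_ner(passage, entity_questions, answer_all):
    """Generate NER labels by asking one type-specific question per
    entity type to a QA system; every returned answer span becomes a
    mention of that type. `answer_all` is a hypothetical stand-in for
    an open-domain QA model returning all answer spans."""
    labels = []
    for entity_type, question in entity_questions.items():
        for span in answer_all(passage, question):
            labels.append((span, entity_type))
    return labels
```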

QA Is the New KR: Question-Answer Pairs as Knowledge Bases

It is argued that the proposed type of KB has many of the key advantages of a traditional symbolic KB: in particular, it consists of small modular components, which can be combined compositionally to answer complex queries, including relational queries and queries involving “multi-hop” inferences.

Improving In-Context Few-Shot Learning via Self-Supervised Training

This paper proposes using self-supervision in an intermediate training stage between pre-training and downstream few-shot usage, with the goal of teaching the model to perform in-context few-shot learning.

Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering

A new QA system that augments a text-to-text model with a large memory of question-answer pairs, along with a new pre-training task for the latent step of question retrieval, greatly improves performance on smaller QA benchmarks.

Few-shot QA using DNN

  • 2022
A domain adaptation technique for BERT using a meta-learning-based pre-training approach improves generalization across multiple domains and introduces a new special [QUESTION] token to learn relationships between question and context passages during training.

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

A new framework, AMOS, that pretrains text encoders with an adversarial learning curriculum via a mixture of signals from multiple auxiliary generators, outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models.

QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization

This work proposes an optimized metric, QAFactEval, that leads to a 14% average improvement over previous QA-based metrics on the SummaC factual consistency benchmark and also outperforms the best-performing entailment-based metric.

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

A novel open-domain question-answering dataset based on the Common Crawl project is proposed, achieving promising results in zero-shot, low-resource, and fine-tuned settings across multiple tasks, models, and benchmarks.

Domain-matched Pre-training Tasks for Dense Retrieval

This work demonstrates that, with the right pre-training setup, the limitations of large bi-encoder models can be overcome by pre-training on a recently released set of 65 million synthetically generated questions and 200 million post-comment pairs from a preexisting dataset of Reddit conversations.

Cooperative Self-training of Machine Reading Comprehension

A cooperative self-training framework, RGX, automatically generates more non-trivial question-answer pairs to improve model performance; RGX outperforms state-of-the-art (SOTA) pretrained language models and transfer learning approaches on standard question-answering benchmarks, and yields new SOTA performance under given model size and transfer learning settings.

Learning Dense Representations of Phrases at Scale

This work shows for the first time that dense representations of phrases alone can achieve much stronger performance in open-domain QA, and proposes a query-side fine-tuning strategy that can support transfer learning and reduce the discrepancy between training and inference.

Making Pre-trained Language Models Better Few-shot Learners

The LM-BFF approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

BERTScore: Evaluating Text Generation with BERT

This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
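The core of BERTScore is greedy matching of contextual token embeddings by cosine similarity. The sketch below assumes the embeddings are already computed and L2-normalized; the real metric uses BERT embeddings and optional IDF weighting, which are omitted here.

```python
import numpy as np

def greedy_match_f1(cand_emb, ref_emb):
    # Rows are L2-normalized contextual token embeddings, so the dot
    # product is cosine similarity. Each token is greedily matched to
    # its most similar counterpart in the other sequence.
    sim = cand_emb @ ref_emb.T
    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)
```

Matching in embedding space rather than on surface tokens is what lets the metric credit paraphrases that n-gram metrics like BLEU miss.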

PAWS: Paraphrase Adversaries from Word Scrambling

PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap, is introduced, providing an effective instrument for driving further progress on models that better exploit structure, context, and pairwise comparisons.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Automatically Constructing a Corpus of Sentential Paraphrases

The creation of the recently released Microsoft Research Paraphrase Corpus, which contains 5,801 sentence pairs, each hand-labeled with a binary judgment as to whether the pair constitutes a paraphrase, is described.

First Quora Dataset Release: Question Pairs

  • 2017

RoBERTa: A Robustly Optimized BERT Pretraining Approach

It is found that BERT was significantly undertrained; with an improved pretraining recipe, the resulting model can match or exceed the performance of every model published after BERT, and the best configuration achieves state-of-the-art results on GLUE, RACE, and SQuAD.

Synthetic QA Corpora Generation with Roundtrip Consistency

A novel method of generating synthetic question answering corpora is introduced by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency, establishing a new state-of-the-art on SQuAD2 and NQ.
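The roundtrip-consistency filter can be sketched as: generate (question, answer) pairs from a passage, then keep a pair only if an independent QA model, asked the generated question, extracts the same answer. Both model functions below are hypothetical stand-ins for the paper's learned generation and extraction models.

```python
def roundtrip_filter(passages, generate_qa, answer_question):
    """Keep a synthetic (question, answer) pair only if the answer
    survives the roundtrip: the question generated for answer `a` must
    be answered with `a` again by an independent QA model."""
    kept = []
    for passage in passages:
        for question, answer in generate_qa(passage):
            if answer_question(passage, question) == answer:
                kept.append((passage, question, answer))
    return kept
```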

The Curious Case of Neural Text Degeneration

By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better matches the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
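Nucleus (top-p) sampling as described above can be sketched in a few lines: keep the smallest prefix of tokens, ordered by descending probability, whose cumulative mass exceeds p, renormalize, and sample only from that prefix. This is a minimal sketch over a raw probability vector, not the paper's implementation.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    # Keep the smallest prefix of tokens (by descending probability)
    # whose cumulative mass exceeds p, renormalize, and sample from it;
    # the unreliable low-probability tail is truncated entirely.
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

With probs [0.5, 0.3, 0.15, 0.05] and p=0.9, the nucleus is the first three tokens, so the 0.05-probability tail token is never sampled.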