Rethinking the Objectives of Extractive Question Answering

Martin Fajcik, Josef Jon, Santosh Kesiraju, Pavel Smrz
This work demonstrates that modelling the span probability under the independence assumption, P(a_s, a_e) = P(a_s)P(a_e), for a span starting at position a_s and ending at position a_e, has adverse effects. We therefore propose multiple approaches to modelling the joint probability P(a_s, a_e) directly. Among these, we propose a compound objective, composed of the joint probability while still keeping the objective with the independence assumption as an auxiliary objective. We…
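The contrast the abstract draws can be sketched numerically. The snippet below uses hypothetical toy logits; summing start and end logits and normalising over valid spans is just one simple way to score spans jointly, not necessarily the paper's exact formulation. It shows how the independent factorisation leaks probability mass onto invalid spans (a_e < a_s), while a direct joint distribution does not.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Toy start/end logits for a 5-token passage (hypothetical values)
start_logits = np.array([2.0, 0.5, -1.0, 0.3, 0.1])
end_logits   = np.array([0.2, 1.5, -0.5, 2.2, 0.0])

# Independence assumption: P(a_s, a_e) = P(a_s) * P(a_e)
p_start = softmax(start_logits)
p_end   = softmax(end_logits)
p_indep = np.outer(p_start, p_end)        # n x n matrix of span probabilities

# Joint modelling: normalise once over all valid spans (a_s <= a_e)
span_scores = start_logits[:, None] + end_logits[None, :]
valid = np.triu(np.ones_like(span_scores, dtype=bool))  # upper triangle: a_s <= a_e
span_scores = np.where(valid, span_scores, -np.inf)     # mask invalid spans
p_joint = np.exp(span_scores - span_scores[valid].max())
p_joint /= p_joint.sum()                  # a proper distribution over valid spans

# Under independence, some mass falls on spans that end before they start
leaked = p_indep[~valid].sum()
```

The joint formulation assigns exactly zero probability to impossible spans by construction, whereas the factorised model can only do so if the marginals happen to cooperate.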


Pruning the Index Contents for Memory Efficient Open-Domain QA
This work presents a simple approach for pruning the contents of a massive index such that the open-domain QA system, together with the index, OS, and library components, fits into a 6GiB docker image while retaining only 8% of the original index contents and losing only 3% EM accuracy.
R2-D2: A Modular Baseline for Open-Domain Question Answering
This work presents a novel four-stage open-domain QA pipeline, R2-D2 (RANK TWICE, READ TWICE). The pipeline is composed of a retriever, a passage reranker, an extractive reader, a generative reader, and a…
Cascaded Span Extraction and Response Generation for Document-Grounded Dialog
This paper summarizes the entries to both subtasks of the first DialDoc shared task which focuses on the agent response prediction task in goal-oriented document-grounded dialogs and uses a cascaded model which grounds the response prediction on the predicted span instead of the full document.
NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned
The motivation and organization of the competition is described, the best submissions are reviewed, and system predictions are analyzed to inform a discussion of evaluation for open-domain QA.


BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.
Bidirectional Attention Flow for Machine Comprehension
The BIDAF network is introduced, a multi-stage hierarchical process that represents the context at different levels of granularity and uses bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
Relevance-guided Supervision for OpenQA with ColBERT
This work proposes a weak supervision strategy that iteratively uses ColBERT to create its own training data, which greatly improves OpenQA retrieval on both Natural Questions and TriviaQA, and the resulting end-to-end Open QA system attains state-of-the-art performance on both of those datasets.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
Natural Questions: A Benchmark for Question Answering Research
The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.
Simple and Effective Multi-Paragraph Reading Comprehension
We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well…
Pointer Networks
A new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence using a recently proposed mechanism of neural attention, called Ptr-Nets, which improves over sequence-to-sequence with input attention, but also allows it to generalize to variable size output dictionaries.
Posterior Differential Regularization with f-divergence for Improving Model Robustness
It is shown that regularizing the posterior difference with f-divergence can result in well-improved model robustness, indicating the great potential of the proposed framework for enhancing NLP model robustness.