Evaluation of Question Answering Systems: Complexity of judging a natural language

Amer Ali Sallam Farea, Z. Yang, Kien Duong, Nadeesha Perera, Frank Emmert-Streib
Question answering (QA) systems are among the most important and rapidly developing research topics in natural language processing (NLP). One reason is that a QA system allows humans to interact more naturally with a machine, e.g., via a virtual assistant or search engine. In recent decades, many QA systems have been proposed to address the requirements of different question-answering tasks. Furthermore, many error scores have been introduced, e.g., based on n-gram matching, word…
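
The abstract mentions error scores based on n-gram matching. As a rough illustration only, and not the specific scores discussed in the paper, the sketch below computes an F1 score over the n-grams shared by a predicted answer and a reference answer. The function names `ngrams` and `ngram_f1`, the lowercasing, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_f1(prediction, reference, n=1):
    """F1 over the multiset overlap of n-grams in two strings.

    Assumption: answers are lowercased and split on whitespace;
    real evaluations often also strip punctuation and articles.
    """
    pred = Counter(ngrams(prediction.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical answers: the prediction contains one extra token.
print(ngram_f1("the Eiffel Tower", "Eiffel Tower"))
```

A score of this kind rewards surface overlap only, so a correct paraphrase with no shared tokens scores zero, which is one reason judging natural language answers is harder than string matching.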



A Review of Question Answering Systems

A short study is presented of the generic QA framework, covering Question Analysis, Passage Retrieval, and Answer Extraction, along with some important issues associated with QA systems.

Arabic question answering system: a survey

The challenges due to the language and how these challenges make the development of new Arabic QAS more difficult are discussed, followed by an in-depth analysis of the techniques and approaches in the three modules of a QAS.

Automatic Question Generation from Sentences

This paper considers an automatic Sentence-to-Question generation task, where given a sentence, the Question Generation (QG) system generates a set of questions for which the sentence contains, implies, or needs answers.

A question-entailment approach to question answering

A novel QA approach based on Recognizing Question Entailment (RQE), which exceeds the best results of the medical task with a 29.8% increase over the best official score, and highlights the effectiveness of combining IR and RQE for future QA efforts.

Core techniques of question answering systems over knowledge bases: a survey

An overview of the techniques used in current QA systems over KBs is given, which were evaluated on a popular series of benchmarks: Question Answering over Linked Data and WebQuestions.

Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering

It is shown that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance by allowing the model to learn more complex context-question relationships.

An Empirical Comparison of Question Classification Methods for Question Answering Systems

This work presents an extensive review of the most recent methods for Question Classification, taking into consideration their applicability in low-resourced languages, and proposes a manual classification of the current state-of-the-art methods into four distinct categories: low, medium, high, and very high level of dependency on external resources.

Evaluating Question Answering Evaluation

This work studies the suitability of existing metrics in QA and explores using BERTScore, a recently proposed metric for evaluating translation, for QA, finding that although it fails to provide stronger correlation with human judgements, future work focused on tailoring a BERT-based metric to QA evaluation may prove fruitful.

QuAC: Question Answering in Context

QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as shown in a detailed qualitative evaluation.

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

A quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora are presented.