• Corpus ID: 244896105

MetaQA: Combining Expert Agents for Multi-Skill Question Answering

Haritz Puerto, Gözde Gül Sahin, Iryna Gurevych
Conference of the European Chapter of the Association for Computational Linguistics

The recent explosion of question-answering (QA) datasets and models has increased the interest in the generalization of models across multiple domains and formats by either training on multiple datasets or combining multiple models. Despite the promising results of multi-dataset models, some domains or QA formats may require specific architectures, and thus the adaptability of these models might be limited. In addition, current approaches for combining models disregard cues such as question… 

Mixture of Prompt Experts for Generalizable and Interpretable Question Answering

A Mixture-of-Prompt-Experts (MOPE) system is proposed that ensembles multiple specialized LLMs and significantly outperforms any single specialized model on a collection of 12 QA datasets from four reasoning types.

UKP-SQuARE v3: A Platform for Multi-Agent QA Research

UKP-SQuARE, an online platform for QA research, is extended to support three families of multi-agent systems, and experiments are conducted to evaluate their inference speed and discuss the performance vs. speed trade-off compared to multi-dataset models.

A Dataset and Multi-task Multi-view Approach for Question-Answering with the Dual Perspectives of Text and Knowledge

This paper proposes a multi-view dataset, MTL-QA, specifically designed for multi-task learning, and presents a novel approach that utilizes the structural information from the Knowledge Graph (KG) and the semantic information from the natural language context.

UKP-SQuARE: An Online Platform for Question Answering Research

UKP-SQuARE is an extensible online QA platform for researchers which allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests.

UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

SQuARE v2, the new version of SQuARE, is introduced to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations, and multiple adversarial attacks to compare the robustness of QA models are provided.

ChatGPT: Jack of all trades, master of none

This work examined ChatGPT's capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection, and evaluated the GPT-4 model on five selected subsets of NLP tasks.

UKP-SQuARE: An Interactive Tool for Teaching Question Answering

  • Haishuo Fang, Haritz Puerto, Iryna Gurevych
  • Computer Science, Education
  • 2023
A learner-centered approach for QA education is proposed in which students proactively learn theoretical concepts and acquire problem-solving skills through interactive exploration, experimentation, and practical assignments, rather than relying solely on traditional lectures.

TWEAC: Transformer with Extendable QA Agent Classifiers

This work addresses the central research question of how to effectively and efficiently identify suitable QA agents for any given question, and shows that TWEAC - Transformer with Extendable Agent Classifiers - achieves the best performance overall with 94% accuracy.

Single-dataset Experts for Multi-dataset Question Answering

This work trains a collection of lightweight, dataset-specific adapter modules that share an underlying Transformer model, and finds that these Multi-Adapter Dataset Experts (MADE) outperform all the authors' baselines in terms of in-distribution accuracy, and simple methods based on parameter-averaging lead to better zero-shot generalization and few-shot transfer performance.

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data

HybridQA is presented, a new large-scale question-answering dataset that requires reasoning on heterogeneous information and can serve as a challenging benchmark to study question answering with heterogeneous information.

A Simple and Effective Model for Answering Multi-span Questions

This work suggests a new approach for tackling multi-span questions, based on sequence tagging, which differs from previous approaches for answering span questions, and shows that this approach leads to an absolute improvement and slightly eclipses the current state-of-the-art results on the entire DROP dataset.

UnifiedQA: Crossing Format Boundaries With a Single QA System

This work uses the latest advances in language modeling to build a single pre-trained QA model, UNIFIEDQA, that performs well across 19 QA datasets spanning 4 diverse formats, and results in a new state of the art on 10 factoid and commonsense question answering datasets.

CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

This work presents CommonsenseQA: a challenging new dataset for commonsense question answering, which extracts from ConceptNet multiple target concepts that have the same semantic relation to a single source concept.

Crowdsourcing Question-Answer Meaning Representations

A crowdsourcing scheme is developed to show that QAMRs can be labeled with very little training, and a qualitative analysis demonstrates that the crowd-generated question-answer pairs cover the vast majority of predicate-argument relationships in existing datasets.

Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering

An interesting new finding is made: the answer confidence scores of state-of-the-art QA systems can be approximated well by models solely using the input question text, which enables preemptive filtering of questions that are not answered by the system due to their answer confidence scores being lower than the system threshold.

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

It is shown that there is a meaningful gap between the human and machine performances, which suggests that the proposed dataset could well serve as a benchmark for question-answering.