Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering

Alexander Hanbo Li, Patrick Ng, Peng Xu, Henghui Zhu, Zhiguo Wang, Bing Xiang
The current state-of-the-art generative models for open-domain question answering (ODQA) have focused on generating direct answers from unstructured textual information. However, a large amount of the world’s knowledge is stored in structured databases and needs to be accessed using query languages such as SQL. Furthermore, query languages can answer questions that require complex reasoning and offer full explainability. In this paper, we propose a hybrid framework that takes both…


Open Domain Question Answering with A Unified Knowledge Interface

This work proposes a verbalizer-retriever-reader framework for ODQA over data and text where verbalized tables from Wikipedia and graphs from Wikidata are used as augmented knowledge sources and shows that the Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.

DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases

This work proposes a novel framework, DecAF, that jointly generates both logical forms and direct answers and then combines the merits of both to obtain the final answers.

Reasoning over Hybrid Chain for Table-and-Text Open Domain Question Answering

This paper proposes a ChAin-centric Reasoning and Pre-training framework (CARP), which uses a hybrid chain to model the explicit intermediate reasoning process across table and text for question answering. It also proposes a novel chain-centric pre-training method that helps the pre-trained model identify the cross-modality reasoning process and alleviates the data sparsity problem.

Integrating question answering and text-to-SQL in Portuguese

This paper implements a complete system for the Portuguese language, using some of the main tools available for the language and translating training and testing datasets, and validates a modular question answering strategy.

Reasoning over Hybrid Chain for Table-and-Text Open Domain QA

A novel chain-centric pretraining method is proposed to help the pre-trained model identify the cross-modality reasoning process and to alleviate the data sparsity problem.

Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database

This work proposes Uni-Parser, a unified semantic parser for question answering (QA) on both KB and DB with a focus on semantic parsing, and introduces the primitive as an essential element in the framework.

Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text

This work uses the data-to-text method as a means of encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (ODQA), and proposes a verbalizer-retriever-reader framework for ODQA over data and text where verbalized tables from Wikipedia and graphs from Wikidata are used as augmented knowledge sources.

Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models

This paper creates a new multi-modal dataset based on text and table datasets from related work and compares the retrieval performance of different encoding schemata to find that dense vector embeddings of transformer models outperform sparse embeddings on four out of six evaluation datasets.

A Survey on Table Question Answering: Recent Advances

An overview of available datasets and representative methods in table QA is provided, which includes semantic-parsing-based, generative, extractive, matching-based, and retriever-reader-based methods.

Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA

This work introduces an optimized OpenQA Table-TExt Retriever (OTTER) to jointly retrieve tabular and textual evidence and proposes to enhance mixed-modality representation learning via two mechanisms: modality-enhanced representation and a mixed-modality negative sampling strategy.



HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data

HybridQA is presented, a new large-scale question-answering dataset that requires reasoning on heterogeneous information and can serve as a challenging benchmark to study question answering with heterogeneous information.

Open Question Answering over Tables and Text

This work considers for the first time open QA over both tabular and textual data and presents a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.

Unified Open-Domain Question Answering with Structured and Unstructured Knowledge

This work homogenizes all sources by reducing them to text and applies recent, powerful retriever-reader models, which have so far been limited to text sources only, to show that knowledge-base QA can be greatly improved when reformulated in this way.

Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering

Two models are proposed that use multiple passages to generate their answers through an answer-reranking approach, which reorders the answer candidates generated by an existing state-of-the-art QA model.

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

Interestingly, it is observed that the performance of this method significantly improves when increasing the number of retrieved passages, evidence that sequence-to-sequence models offer a flexible framework to efficiently aggregate and combine evidence from multiple passages.

Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

The proposed IRNet aims to address two challenges: the mismatch between intents expressed in natural language (NL) and the implementation details in SQL and the challenge in predicting columns caused by the large number of out-of-domain words.

TaPas: Weakly Supervised Table Parsing via Pre-training

TaPas is presented, an approach to question answering over tables without generating logical forms that outperforms or rivals semantic parsing models by improving state-of-the-art accuracy on SQA and performing on par with the state of the art on WikiSQL and WikiTQ, but with a simpler model architecture.

Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions

The interaction history is used by editing the previously predicted query to improve the generation quality of SQL queries, and the benefit of editing is evaluated against state-of-the-art baselines that generate SQL from scratch.

Natural Questions: A Benchmark for Question Answering Research

The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.

A Discrete Hard EM Approach for Weakly Supervised Question Answering

This paper develops a hard EM learning scheme that computes gradients relative to the most likely solution at each update and significantly outperforms previous methods on six QA tasks, including absolute gains of 2–10%, and achieves the state-of-the-art on five of them.