QA-Align: Representing Cross-Text Content Overlap by Aligning Question-Answer Propositions

Daniel Weiss, Paul Roit, Ayal Klein, Ori Ernst, Ido Dagan
Multi-text applications, such as multi-document summarization, are typically required to model redundancies across related texts. Current methods confronting consolidation struggle to fuse overlapping information. In order to explicitly represent content overlap, we propose to align predicate-argument relations across texts, providing a potential scaffold for information consolidation. We go beyond clustering coreferring mentions, and instead model overlap with respect to redundancy at a… 
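As an illustrative sketch only (not the authors' model), aligning QA-based propositions across two texts can be viewed as matching question-answer pairs that express the same content; the example pairs, the token-overlap matcher, and the threshold below are all hypothetical:

```python
from typing import NamedTuple

class QA(NamedTuple):
    question: str
    answer: str

# Hypothetical QA propositions extracted from two related news sentences.
text_a = [QA("Who acquired the company?", "Acme Corp"),
          QA("When was the deal announced?", "Monday")]
text_b = [QA("Who bought the company?", "Acme Corp"),
          QA("Where is the company based?", "Berlin")]

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercased tokens -- a crude stand-in for a learned matcher."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def align(src, tgt, q_threshold=0.5):
    """Pair up QAs that share an answer and have sufficiently similar questions."""
    return [(s, t) for s in src for t in tgt
            if s.answer == t.answer and token_overlap(s.question, t.question) >= q_threshold]

pairs = align(text_a, text_b)
# The two "Who ... the company?" propositions align; the others have no counterpart.
```

The aligned pair is a candidate unit of redundant content that a downstream consolidation step could fuse once rather than twice.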


Question-Based Salient Span Selection for More Controllable Text Summarization

A method for incorporating question-answering (QA) signals into a summarization model that identifies salient noun phrases in the input document by automatically generating wh-questions that are answered by the NPs and automatically determining whether those questions are answered in the gold summaries.

Conditional Generation with a Question-Answering Blueprint

This work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs, enhancing existing datasets with a QA blueprint operating as a proxy for both content selection and planning.

QA Is the New KR: Question-Answer Pairs as Knowledge Bases

It is argued that the proposed type of KB has many of the key advantages of a traditional symbolic KB: in particular, it consists of small modular components, which can be combined compositionally to answer complex queries, including relational queries and queries involving “multi-hop” inferences.
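As a toy illustration of that compositional idea (the facts and lookup logic here are invented for the example, not taken from the paper), small modular QA entries can be chained to answer a multi-hop query without storing the composite fact:

```python
# A miniature QA-pair "knowledge base": each entry answers one simple question.
kb = {
    "Where was Marie Curie born?": "Warsaw",
    "What country is Warsaw in?": "Poland",
}

def lookup(question: str) -> str:
    """Answer a question by exact match against the stored QA pairs."""
    return kb[question]

# Multi-hop query: "What country was Marie Curie born in?"
# Compose two modular lookups instead of storing the composite fact.
birthplace = lookup("Where was Marie Curie born?")
country = lookup(f"What country is {birthplace} in?")
# country == "Poland"
```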

Shortcomings of Question Answering Based Factuality Frameworks for Error Localization

It is found that QA-based frameworks fail to correctly identify error spans in generated summaries and are outperformed by trivial exact match baselines, and there exist fundamental issues with localization using the QA framework which cannot be fixed solely by stronger QA and QG models.

SuperPAL: Supervised Proposition ALignment for Multi-Document Summarization and Derivative Sub-Tasks

An annotation methodology is presented by which to create gold standard development and test sets for summary-source alignment, and its utility for tuning and evaluating effective alignment algorithms, as well as for properly evaluating MDS subtasks is suggested.

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

A novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods; in this task, a model learns to seek and combine evidence, effectively performing multi-hop (i.e., multi-step) inference.

Understanding Points of Correspondence between Sentences for Abstractive Summarization

This paper presents an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence, which are cohesive devices that tie any two sentences together into a coherent text.

Multi-Hop Paragraph Retrieval for Open-Domain Question Answering

A method for retrieving multiple supporting paragraphs, nested amidst a large knowledge base, which contain the necessary evidence to answer a given question, by forming a joint vector representation of both a question and a paragraph.

Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language

The results show that non-expert annotators can produce high-quality QA-SRL data and establish baseline performance levels for future work on this task; simple classifier-based models are also introduced for predicting which questions to ask and what their answers should be.
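To make the scheme concrete, a QA-SRL annotation replaces abstract role labels (ARG0, ARG1, ...) with natural-language questions about a predicate, answered by spans of the sentence; the sentence and questions below are a hypothetical example in that spirit, not drawn from the dataset:

```python
# Sentence: "The committee approved the proposal yesterday."
# QA-SRL annotates the verb "approved" with wh-questions whose answers
# are spans of the sentence, instead of abstract role labels.
qa_srl = {
    "predicate": "approved",
    "qa_pairs": [
        ("Who approved something?", "The committee"),          # ~ agent / ARG0
        ("What was approved?", "the proposal"),                # ~ patient / ARG1
        ("When did someone approve something?", "yesterday"),  # ~ temporal modifier
    ],
}

answers = [answer for _, answer in qa_srl["qa_pairs"]]
```

Because the questions are plain English, non-experts can both write and verify them, which is what makes crowdsourced annotation feasible.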

PARMA: A Predicate Argument Aligner

We introduce PARMA, a system for cross-document semantic predicate and argument alignment. Our system combines a number of linguistic resources familiar to researchers in areas such as recognizing…

DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion

A method for automatically generating fusion examples from raw text and a sequence-to-sequence model trained on DiscoFuse, a large-scale dataset for discourse-based sentence fusion, are proposed and shown to improve performance on WebSplit when viewed as a sentence fusion task.

QANom: Question-Answer driven SRL for Nominalizations

We propose a new semantic scheme for capturing predicate-argument relations for nominalizations, termed QANom. This scheme extends the QA-SRL formalism (He et al., 2015), modeling the relations…

Coreferential Reasoning Learning for Language Representation

The CorefBERT model is presented, a novel language representation model designed to capture the relations between noun phrases that co-refer to each other, and has made significant progress on several downstream NLP tasks that require coreferential reasoning.

A Consolidated Open Knowledge Representation for Multiple Texts

It is suggested that generating OKR structures can be a useful step in the NLP pipeline, to give semantic applications an easy handle on consolidated information across multiple texts.