Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline

Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob Goldberger, Mohit Bansal, Ido Dagan
Aligning sentences in a reference summary with their counterparts in source documents has been shown to be a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its demonstrated utility, the alignment step has mostly been approached with heuristic unsupervised methods, typically ROUGE-based, and has never been independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major…
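To make the heuristic baseline concrete, the following is a minimal sketch of the kind of ROUGE-based unsupervised alignment the abstract refers to: greedily match each summary sentence to the source sentence with the highest ROUGE-1 F1 overlap. This is an illustrative simplification, not the authors' method; the function names, the unigram-only scoring, and the `threshold` parameter are all assumptions made for the example.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between two sentences (whitespace tokenized)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection of token counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def align_greedy(summary_sents, source_sents, threshold=0.1):
    """Align each summary sentence to its best-scoring source sentence.

    Returns (summary_index, source_index, score) triples; pairs scoring
    below `threshold` are left unaligned.
    """
    alignments = []
    for i, s in enumerate(summary_sents):
        scores = [rouge1_f1(s, d) for d in source_sents]
        j = max(range(len(source_sents)), key=scores.__getitem__)
        if scores[j] >= threshold:
            alignments.append((i, j, scores[j]))
    return alignments
```

A supervised aligner, as proposed in the paper, would replace the lexical-overlap scorer with a learned model, but the greedy matching scaffold above captures the typical unsupervised setup.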


Proposition-Level Clustering for Multi-Document Summarization

This work revisits the clustering approach, grouping together sub-sentential propositions to achieve more precise information alignment in multi-document summarization, and improves over the previous state-of-the-art MDS method on the DUC 2004 and TAC 2011 datasets.

A Proposition-Level Clustering Approach for Multi-Document Summarization

This work revisits the clustering approach, grouping together propositions for more precise information alignment, and improves over the previous state-of-the-art MDS method on the DUC 2004 and TAC 2011 datasets, both in automatic ROUGE scores and human preference.

Learning to Revise References for Faithful Summarization

This work extracts a small corpus from a noisy source, the Electronic Health Record, for the task of summarizing a hospital admission from multiple notes, and proposes a new approach: to revise, rather than remove, unsupported reference content.

Claim Extraction and Law Matching for COVID-19-related Legislation

To cope with the COVID-19 pandemic, many jurisdictions have introduced new or altered existing legislation. Even though these new rules are often communicated to the public in news articles, it…

Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence

We present a simple, easy-to-replicate monolingual aligner that demonstrates state-of-the-art performance while relying on almost no supervision and a very small number of external resources. Based…

A Cascade Approach to Neural Abstractive Summarization with Content Selection and Fusion

Empirical results are presented showing that a cascaded pipeline, which separately identifies important content pieces and stitches them together into a coherent text, performs comparably to or better than end-to-end systems, while the pipeline architecture allows for flexible content selection.

Towards Annotating and Creating Summary Highlights at Sub-sentence Level

This paper seeks to generate summary highlights by annotating summary-worthy sub-sentences and teaching classifiers to do the same, and frames the task as jointly selecting important sentences and identifying a single most informative textual unit from each sentence.

Bottom-Up Abstractive Summarization

This work explores the use of data-efficient content selectors to over-determine phrases in a source document that should be part of the summary, and shows that this approach improves the ability to compress text, while still generating fluent summaries.

Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

An accurate and fast summarization model is proposed that first selects salient sentences and then rewrites them abstractively to generate a concise overall summary, achieving a new state of the art on all metrics on the CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores.

PEAK: Pyramid Evaluation via Automated Knowledge Extraction

PEAK is proposed, the first method to automatically assess summary content using the pyramid method that also generates the pyramid content models, and relies on open information extraction and graph algorithms.

Controlled Crowdsourcing for High-Quality QA-SRL Annotation

An improved crowdsourcing protocol for complex semantic annotation is presented, involving worker selection and training and a data consolidation phase, which yielded high-quality annotation with drastically higher coverage, producing a new gold evaluation dataset.

Better Highlighting: Creating Sub-Sentence Summary Highlights

This paper presents a new method to produce self-contained highlights that are understandable on their own, combining determinantal point processes and deep contextualized representations to identify an optimal set of sub-sentence segments that are both important and non-redundant as summary highlights.

Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization

This paper seeks to strengthen a DPP-based method for extractive multi-document summarization by presenting a novel similarity measure inspired by capsule networks, and shows that the DPP system with improved similarity measure performs competitively, outperforming strong summarization baselines on benchmark datasets.

Scoring Sentence Singletons and Pairs for Abstractive Summarization

The proposed framework models human summarization methodology by selecting either a single sentence or a pair of sentences, then compressing or fusing the sentence(s) to produce a summary sentence.