Corpus ID: 225067629

AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization

Sayali Kulkarni, Sheide Chammas, Wan Zhu, Fei Sha, Eugene Ie
Summarization is the task of compressing source document(s) into coherent and succinct passages. This is a valuable tool to present users with a concise and accurate sketch of the top-ranked documents related to their queries. Query-based multi-document summarization (qMDS) addresses this pervasive need, but research is severely limited by the lack of training and evaluation datasets, as existing single-document and multi-document summarization datasets are inadequate in form and scale. We…

WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization
This paper trains on datasets similar to the target dataset, leveraging pre-trained sentence similarity models to generate a weak reference summary for each individual document in a document set from the multi-document gold reference summaries.
QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization
This work defines a new query-based multi-domain meeting summarization task, where models have to select and summarize relevant spans of meetings in response to a query, and introduces QMSum, a new benchmark for this task.
HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles
HOWSUMM is a novel large-scale dataset for the task of query-focused multi-document summarization (qMDS), which targets the use case of generating actionable instructions from a set of sources and can be leveraged to advance summarization research.
SummerTime: Text Summarization Toolkit for Non-experts
SummerTime is a complete toolkit for text summarization, including various models, datasets and evaluation metrics, for a full spectrum of summarization-related tasks; it integrates with libraries designed for NLP researchers and provides users with easy-to-use APIs.


Query-Based Abstractive Summarization Using Neural Networks
It is shown that a neural network summarization model, similar to existing neural network models for abstractive summarization, can be constructed to make use of queries for more targeted summaries.
Diversity driven attention model for query-based abstractive summarization
This work proposes a model for the query-based summarization task based on the encode-attend-decode paradigm with two key additions: a query attention model which learns to focus on different portions of the query at different time steps, and a new diversity-based attention model which aims to alleviate the problem of repeating phrases in the summary.
Query Focused Abstractive Summarization: Incorporating Query Relevance, Multi-Document Coverage, and Summary Length Constraints into seq2seq Models
The method (Relevance Sensitive Attention for QFS) is compared to extractive baselines and to various ways of combining abstractive models on the DUC QFS datasets, with solid improvements in ROUGE performance.
Query-based summarization using MDL principle
This work presents a new unsupervised approach for query-based extractive summarization, based on the minimum description length (MDL) principle, that employs the Krimp compression algorithm and competes with the best results.
Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model
This work introduces Multi-News, the first large-scale MDS news dataset, and proposes an end-to-end model which incorporates a traditional extractive summarization model with a standard SDS model and achieves competitive results on MDS datasets.
A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization. We present a sentence-compression-based framework for the task, and design a…
Cross-Task Knowledge Transfer for Query-Based Text Summarization
The viability of knowledge transfer between two related tasks, machine reading comprehension (MRC) and query-based text summarization, is demonstrated, and these models achieve state-of-the-art results on the publicly available CNN/Daily Mail and Debatepedia datasets.
WikiHow: A Large Scale Text Summarization Dataset
This paper presents WikiHow, a dataset of more than 230,000 article and summary pairs extracted and constructed from an online knowledge base written by different human authors, representing a high diversity of styles.
FastSum: Fast and Accurate Query-based Multi-document Summarization
FastSum is a fast query-based multi-document summarizer based solely on word-frequency features of clusters, documents and topics; it relies on a minimal set of features, leading to fast processing times: 1250 news documents in 60 seconds.
DUC 2005: Evaluation of Question-Focused Summarization Systems
The evaluation shows that the best summarization systems have difficulty extracting relevant sentences in response to complex questions (as opposed to representative sentences that might be appropriate to a generic summary).