A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal

@inproceedings{Ghalandari2020ALM,
  title={A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal},
  author={Demian Gholipour Ghalandari and Chris Hokamp and N. Pham and John Glover and Georgiana Ifrim},
  booktitle={ACL},
  year={2020}
}
Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries and has important applications in story clustering for newsfeeds, presentation of search results, and timeline generation. However, there is a lack of datasets that realistically address such use cases at a scale large enough for training supervised models for this task. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the…
AgreeSum: Agreement-Oriented Multi-Document Summarization
DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions
SuperPAL: Supervised Proposition ALignment for Multi-Document Summarization and Derivative Sub-Tasks
Generating Wikipedia Article Sections from Diverse Data Sources
DynE: Dynamic Ensemble Decoding for Multi-Document Summarization
MS2: Multi-Document Summarization of Medical Studies
Multi-Perspective Abstractive Answer Summarization
