Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies

@inproceedings{Grusky2018NewsroomAD,
  title={Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies},
  author={Max Grusky and M. Naaman and Yoav Artzi},
  booktitle={NAACL},
  year={2018}
}
We present NEWSROOM, a summarization dataset of 1.3 million articles and summaries written by authors and editors in newsrooms of 38 major news publications. [...] We analyze the extraction strategies used in NEWSROOM summaries against other datasets to quantify the diversity and difficulty of our new data, and train existing methods on the data to evaluate its utility and challenges. The dataset is available online at summari.es.

BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization
WikiHow: A Large Scale Text Summarization Dataset
Abstractive Summarization of Reddit Posts with Multi-level Memory Networks
The Summary Loop: Learning to Write Abstractive Summaries Without Examples
SHEG: summarization and headline generation of news articles using deep learning

References

Classify or Select: Neural Architectures for Extractive Document Summarization
A Neural Attention Model for Abstractive Sentence Summarization
Abstractive Document Summarization with a Graph-Based Attentional Neural Model
The Effects of Human Variation in DUC Summarization Evaluation
Improving the Estimation of Word Importance for News Multi-Document Summarization
DUC 2005: Evaluation of Question-Focused Summarization Systems