Corpus ID: 53046555

WikiHow: A Large Scale Text Summarization Dataset

@article{Koupaee2018WikiHowAL,
  title={WikiHow: A Large Scale Text Summarization Dataset},
  author={Mahnaz Koupaee and William Yang Wang},
  journal={ArXiv},
  year={2018},
  volume={abs/1810.09305}
}
Sequence-to-sequence models have recently achieved state-of-the-art performance in summarization. [...] We evaluate the performance of existing methods on WikiHow to present its challenges and set some baselines for further improvement.

Citations
Topic Augmented Generator for Abstractive Summarization
This paper proposes a new decoder where the output summary is generated by conditioning on both the input text and the latent topics of the document, and achieves strongly improved ROUGE scores compared to state-of-the-art models.
HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles
HowSumm is a novel large-scale dataset for query-focused multi-document summarization (qMDS), which targets the use case of generating actionable instructions from a set of sources and can be leveraged to advance summarization research.
WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
A method for direct cross-lingual summarization without requiring translation at inference time is proposed by leveraging synthetic data and Neural Machine Translation as a pre-training step; it significantly outperforms the baseline approaches while being more cost-efficient at inference.
Neural Abstractive Text Summarization with Sequence-to-Sequence Models
This article provides a comprehensive literature survey of different seq2seq models for abstractive text summarization from the viewpoint of network structures, training strategies, and summary generation algorithms.
Neural Text Summarization: A Critical Evaluation
This work critically evaluates key ingredients of the current research setup (datasets, evaluation metrics, and models) and highlights three primary shortcomings, among them that automatically collected datasets leave the task underconstrained and may contain noise detrimental to training and evaluation.
Meta-Transfer Learning for Low-Resource Abstractive Summarization
The results demonstrate that the proposed approach achieves the state of the art on 6 corpora in low-resource scenarios, with only 0.7% of the trainable parameters of previous work.
CLTS: A New Chinese Long Text Summarization Dataset
The results show that the proposed corpus is useful for setting baselines that can contribute to further research on automatic text summarization.
AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization
This work proposes a scalable approach called AQuaMuSe to automatically mine qMDS examples from question answering datasets and large document corpora, and it can generate a dual dataset with both extractive and abstractive summaries.
Improving Unsupervised Extractive Summarization with Facet-Aware Modeling
Xinnian Liang, Shuangzhi Wu, Mu Li, Zhoujun Li · Findings, 2021
Experimental results show that the novel facet-aware centrality-based ranking model consistently outperforms strong baselines, especially in long- and multi-document scenarios, and even performs comparably to some supervised models.

References

Showing 1–10 of 17 references
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
This work proposes several novel models that address critical problems in summarization not adequately handled by the basic architecture, such as modeling keywords, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies
The NEWSROOM dataset is presented: a summarization dataset of 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major news publications between 1998 and 2017; the summaries combine abstractive and extractive strategies.
A Neural Attention Model for Abstractive Sentence Summarization
This work proposes a fully data-driven approach to abstractive sentence summarization, utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.
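To make the attention idea concrete, here is a toy sketch (illustrative only, not the paper's exact parameterization; decoder_state and src_embeds are hypothetical names): each summary word is predicted from a context vector obtained by softmax-weighting the input representations against the current decoder state.

import numpy as np

def softmax(z):
    z = z - z.max()          # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def attention_context(decoder_state, src_embeds):
    """Return the attention-weighted context vector over the input.

    decoder_state : (embed_dim,) current decoder representation
    src_embeds    : (src_len, embed_dim) input token representations
    """
    scores = src_embeds @ decoder_state   # similarity to each input position
    weights = softmax(scores)             # attention distribution over input
    return weights @ src_embeds           # (embed_dim,) context vector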
SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents
We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents, and show that it achieves performance better than or comparable to state-of-the-art models.
Get To The Point: Summarization with Pointer-Generator Networks
A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways: a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information while retaining the ability to produce novel words through the generator, and a coverage mechanism that discourages repetition.
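As a rough sketch of the copy mechanism just described (not the authors' code; p_gen, vocab_dist, and attn_dist are illustrative names), the final output distribution mixes the generator's vocabulary distribution with the attention distribution over source tokens:

import numpy as np

def final_distribution(p_gen, vocab_dist, attn_dist, src_ids, ext_vocab_size):
    """Pointer-generator-style mixture over an extended vocabulary.

    p_gen          : scalar in [0, 1], probability of generating vs. copying
    vocab_dist     : (vocab_size,) softmax over the fixed vocabulary
    attn_dist      : (src_len,) attention weights over source positions
    src_ids        : (src_len,) int ids of source tokens; ids >= vocab_size
                     denote source-specific out-of-vocabulary words
    ext_vocab_size : fixed vocabulary size plus number of source OOV words
    """
    dist = np.zeros(ext_vocab_size)
    dist[: len(vocab_dist)] = p_gen * vocab_dist           # generate from vocab
    np.add.at(dist, src_ids, (1.0 - p_gen) * attn_dist)    # copy via pointing
    return dist

Because probability mass can land on source-token ids outside the fixed vocabulary, the model can emit rare or unseen words directly from the source text.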
Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints
A discriminative model for single-document summarization that integrally combines compression and anaphoricity constraints, outperforming prior work both on ROUGE and on human judgments of linguistic quality.
A Deep Reinforced Model for Abstractive Summarization
A neural network model with a novel intra-attention that attends over the input and the continuously generated output separately, combined with a new training method that mixes standard supervised word prediction and reinforcement learning (RL), produces higher-quality summaries.
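A minimal sketch of such a mixed training objective, assuming a self-critical policy-gradient term with a ROUGE-based reward (the function and variable names, and the interpolation weight, are illustrative rather than the paper's exact values):

def mixed_loss(ml_loss, sampled_reward, greedy_reward, sampled_log_prob, gamma=0.99):
    """Interpolate RL and maximum-likelihood objectives.

    ml_loss          : negative log-likelihood of the ground-truth summary
    sampled_reward   : ROUGE score of a summary sampled from the model
    greedy_reward    : ROUGE score of the greedily decoded summary (baseline)
    sampled_log_prob : total log-probability of the sampled summary
    gamma            : weight on the RL term (illustrative default)
    """
    rl_loss = -(sampled_reward - greedy_reward) * sampled_log_prob
    return gamma * rl_loss + (1.0 - gamma) * ml_loss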
The Effects of Human Variation in DUC Summarization Evaluation
This work examines how variation in human judgments does and does not affect the results, and the interpretation of those results, when evaluating the output of automatic text summarization systems.
Annotated Gigaword
This work has created layers of annotation on the English Gigaword v.5 corpus to render it useful as a standardized corpus for knowledge extraction and distributional semantics, and provides the community with a public reference set based on current state-of-the-art syntactic analysis and coreference resolution.
ROUGE: A Package for Automatic Evaluation of Summaries
Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, together with evaluations of them.
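For orientation, ROUGE-N is essentially n-gram recall of a candidate summary against a reference; below is a minimal single-reference sketch, without the stemming and multi-reference options of the full package:

from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=2):
    """ROUGE-N recall: clipped n-gram overlap / reference n-gram count."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

# rouge_n("the cat sat on the mat".split(),
#         "the cat lay on the mat".split(), n=2) -> 0.6 (3 of 5 bigrams)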