Corpus ID: 219708634

DynE: Dynamic Ensemble Decoding for Multi-Document Summarization

Authors: Chris Hokamp, Demian Gholipour Ghalandari, N. Pham, John Glover
Sequence-to-sequence (s2s) models are the basis for extensive work in natural language processing. However, some applications, such as multi-document summarization, multi-modal machine translation, and the automatic post-editing of machine translation, require mapping a set of multiple distinct inputs into a single output sequence. Recent work has introduced bespoke architectures for these multi-input settings, and developed models which can handle increasingly longer inputs; however, the…
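The multi-input decoding idea can be sketched as averaging the per-input next-token distributions at each decoding step. The vocabulary, scoring function, and inputs below are toy stand-ins for real s2s model outputs, not the paper's implementation:

```python
import math

VOCAB = ["the", "cat", "sat", "<eos>"]

def toy_model_logprobs(input_id, prefix):
    """Stand-in for one s2s model conditioned on a single input document.
    Returns a log-probability for each vocabulary item given the prefix."""
    # Deterministic toy scores: each input biases toward different tokens.
    scores = [1.0 + ((input_id + i + len(prefix)) % 4) for i in range(len(VOCAB))]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return [s - log_z for s in scores]

def ensemble_step(input_ids, prefix):
    """Average the per-input log-probabilities (the ensemble step) and
    return the highest-scoring next token."""
    per_input = [toy_model_logprobs(i, prefix) for i in input_ids]
    avg = [sum(col) / len(per_input) for col in zip(*per_input)]
    best = max(range(len(VOCAB)), key=lambda k: avg[k])
    return VOCAB[best]

def greedy_decode(input_ids, max_len=5):
    """Greedy decoding of a single output shared across all inputs."""
    prefix = []
    for _ in range(max_len):
        tok = ensemble_step(input_ids, prefix)
        if tok == "<eos>":
            break
        prefix.append(tok)
    return prefix

print(greedy_decode([0, 1, 2]))
```

In practice the combination would run inside beam search over real model logits; the greedy loop above only illustrates how one output sequence can be conditioned on several inputs at once.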
Error Analysis of using BART for Multi-Document Summarization: A Study for English and German Language
An in-depth error analysis of the followed approach is performed for both languages, which leads to identifying the most notable errors, from made-up facts to topic delimitation, and quantifying the amount of extractiveness.
An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings
An unsupervised method for generic extractive multi-document summarization based on sentence embedding representations and the centroid approach, which outperforms several state-of-the-art methods and achieves promising results compared to the best-performing methods, including supervised deep-learning-based ones.
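A minimal sketch of the centroid idea, with bag-of-words counts as a crude stand-in for sentence embeddings (the sentences and helper names below are illustrative, not from the paper):

```python
import math
from collections import Counter

def embed(sentence):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(sentence.lower().split())

def add_all(vecs):
    """Sum sparse vectors into the corpus centroid."""
    total = Counter()
    for v in vecs:
        total.update(v)
    return total

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid_summary(sentences, k=2):
    """Score each sentence by cosine similarity to the corpus centroid
    and return the top-k sentences in document order."""
    vecs = [embed(s) for s in sentences]
    centroid = add_all(vecs)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vecs[i], centroid), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

docs = [
    "the storm hit the coast on monday",
    "officials said the storm caused flooding",
    "a cat show took place downtown",
]
print(centroid_summary(docs, k=2))
```

With real sentence embeddings the scoring step is identical; only `embed` changes.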


Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization
An initial investigation into a novel adaptation method that exploits the maximal marginal relevance method to select representative sentences from the multi-document input, and leverages an abstractive encoder-decoder model to fuse disparate sentences into an abstractive summary.
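The maximal marginal relevance (MMR) selection step can be sketched greedily: each pick maximizes relevance to a query minus redundancy against sentences already selected. The sentences, lambda value, and bag-of-words similarity here are illustrative stand-ins:

```python
import math
from collections import Counter

def vec(s):
    """Toy bag-of-words vector standing in for a sentence representation."""
    return Counter(s.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedy MMR: at each step pick the sentence maximizing
    lam * sim(sentence, query) - (1 - lam) * max sim to already-selected."""
    vecs = [vec(s) for s in sentences]
    qv = vec(query)
    selected = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = None, -float("inf")
        for i, v in enumerate(vecs):
            if i in selected:
                continue
            redundancy = max((cosine(v, vecs[j]) for j in selected), default=0.0)
            score = lam * cosine(v, qv) - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in selected]

sents = [
    "the storm hit the coast",
    "the storm hit the coast hard",
    "schools were closed after flooding",
]
print(mmr_select(sents, query="storm coast flooding", k=2))
```

Note how the near-duplicate second sentence is skipped in favor of the less redundant third one; that redundancy penalty is the point of MMR.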
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
This work proposes several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
Auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus
A large heterogeneous multilingual multi-document summarization corpus with 7,316 topics in English and German is created, which has varying summary lengths and varying numbers of source documents.
Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion
A paraphrastic sentence fusion model is designed which jointly performs sentence fusion and paraphrasing using a skip-gram word embedding model at the sentence level, improving both the information coverage and the abstractiveness of the generated sentences.
Jointly Learning to Extract and Compress
A joint model of sentence extraction and compression for multi-document summarization; its jointly extracted and compressed summaries outperform both unlearned baselines and the authors' learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality.
Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model
This work introduces Multi-News, the first large-scale multi-document summarization (MDS) news dataset, and proposes an end-to-end model which combines a traditional extractive summarization model with a standard single-document summarization (SDS) model and achieves competitive results on MDS datasets.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
Exploring Content Models for Multi-Document Summarization
The final model, HierSum, utilizes a hierarchical LDA-style model (Blei et al., 2004) to represent content specificity as a hierarchy of topic vocabulary distributions, yields state-of-the-art ROUGE performance, and in pairwise user evaluation strongly outperforms Toutanova et al. (2007)'s state-of-the-art discriminative system.
Multi-Document Abstractive Summarization Using ILP Based Multi-Sentence Compression
The proposed approach identifies the most important document in the multi-document set, generates K-shortest paths from the sentences in each cluster using a word-graph structure, and selects sentences from the set of shortest paths generated from all the clusters, employing a novel integer linear programming model.
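The word-graph construction and path search can be sketched in miniature. This toy merges identical surface words into shared nodes and uses plain BFS with a minimum-length filter instead of true K-shortest paths and an ILP; all sentences and function names are illustrative:

```python
from collections import deque

def build_word_graph(sentences):
    """Merge sentences into a word adjacency graph: identical surface
    words share a node (a simplification of the word-graph idea)."""
    graph = {}
    for s in sentences:
        tokens = ["<start>"] + s.lower().split() + ["<end>"]
        for a, b in zip(tokens, tokens[1:]):
            graph.setdefault(a, set()).add(b)
    return graph

def shortest_fusion(sentences, min_words=4):
    """Breadth-first search for the shortest <start> -> <end> path with
    at least min_words words, standing in for the K-shortest-paths
    enumeration and ILP-based re-ranking of the full method."""
    graph = build_word_graph(sentences)
    queue = deque([["<start>"]])
    seen = {"<start>"}
    while queue:
        path = queue.popleft()
        if path[-1] == "<end>":
            if len(path) - 2 >= min_words:   # drop too-short fusions
                return " ".join(path[1:-1])
            continue
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt == "<end>":
                queue.append(path + [nxt])   # allow multiple paths to end
            elif nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return ""

cluster = [
    "the storm hit the coast",
    "the storm caused major flooding",
]
print(shortest_fusion(cluster))
```

The full method additionally weights graph edges, keeps K candidate paths per cluster, and selects among them with an integer linear program; the BFS here only shows how fusing shared words yields new sentences.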
Hierarchical Transformers for Multi-Document Summarization
A neural summarization model is developed which can effectively process multiple input documents, extending the Transformer architecture with the ability to encode documents in a hierarchical manner.
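The hierarchical encoding idea can be sketched as two levels of pooling: token vectors are pooled into one vector per document, then attention across the document vectors yields a single collection representation. The deterministic toy embeddings, dimensions, and names below are illustrative assumptions, not the model itself:

```python
import math

def token_vec(tok, dim=8):
    """Toy deterministic token embedding (stand-in for a learned one)."""
    seed = sum(ord(c) for c in tok)
    return [math.sin(seed * (d + 1)) for d in range(dim)]

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def doc_vec(doc):
    """Lower level of the hierarchy: pool token vectors into a doc vector."""
    return mean_pool([token_vec(t) for t in doc.lower().split()])

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def hierarchical_encode(docs, query_doc=0):
    """Upper level: attend over document vectors, using one document's
    vector as the attention query, to get a collection representation."""
    vecs = [doc_vec(d) for d in docs]
    q = vecs[query_doc]
    scores = [sum(a * b for a, b in zip(q, v)) for v in vecs]
    weights = softmax(scores)
    return [sum(w * v[d] for w, v in zip(weights, vecs))
            for d in range(len(q))]

docs = ["the storm hit the coast",
        "flooding closed schools",
        "officials warned residents"]
rep = hierarchical_encode(docs)
print(len(rep))  # one fixed-size vector for the whole document set
```

The real model works with full token-level Transformer layers and learned cross-document attention at every layer; this sketch only shows the two-level structure that lets the encoder scale to many input documents.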