Exploring Content Models for Multi-Document Summarization

@inproceedings{Haghighi2009ExploringCM,
  title={Exploring Content Models for Multi-Document Summarization},
  author={Aria Haghighi and Lucy Vanderwende},
  booktitle={NAACL},
  year={2009}
}
We present an exploration of generative probabilistic models for multi-document summarization. Beginning with a simple word-frequency-based model (Nenkova and Vanderwende, 2005), we construct a sequence of models, each injecting more structure into the representation of document set content and exhibiting ROUGE gains along the way. Our final model, HierSum, utilizes a hierarchical LDA-style model (Blei et al., 2004) to represent content specificity as a hierarchy of topic vocabulary…
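The word-frequency baseline the abstract builds on (SumBasic; Nenkova and Vanderwende, 2005) can be sketched roughly as follows. This is a minimal illustration, not the authors' exact system: the whitespace tokenization and the fixed two-sentence budget are simplifying assumptions; the sentence scoring (average word probability) and the squaring update on covered words follow the published description.

```python
from collections import Counter

def sumbasic(sentences, max_sentences=2):
    """Greedy frequency-based extractive summarization in the spirit
    of SumBasic. Toy tokenization: lowercase + whitespace split."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    summary = []
    remaining = list(range(len(sentences)))
    while remaining and len(summary) < max_sentences:
        # Score each candidate by the average probability of its words.
        best = max(remaining,
                   key=lambda i: sum(prob[w] for w in tokenized[i])
                                 / max(len(tokenized[i]), 1))
        summary.append(sentences[best])
        remaining.remove(best)
        # Squash probabilities of covered words to discourage redundancy.
        for w in tokenized[best]:
            prob[w] = prob[w] ** 2
    return summary
```

The squaring step is what distinguishes SumBasic from naive frequency ranking: once a word appears in the summary, its probability drops sharply, so later picks favor not-yet-covered content.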
Citations

A Hybrid Hierarchical Model for Multi-Document Summarization
This paper formulates extractive summarization as a two-step learning problem: building a generative model for pattern discovery and a regression model for inference based on the lexical and structural characteristics of the sentences.
Personalized Multi-Document Summarization using N-Gram Topic Model Fusion
We consider the problem of probabilistic topic modeling for query-focused multi-document summarization. Rather than modeling topics as distributions over a vocabulary of terms, we extend the…
A Generative Approach for Multi-Document Summarization using the Noisy Channel Model
This work formulates the multi-document summarization task using a Noisy-Channel model and models these factors using Cross-document Structure Theory, which is novel for multi-document summarization.
Mixture of topic model for multi-document summarization
A generative model for multi-document summarization, Titled-LDA, which simultaneously models the content and the titles of documents, is proposed; it achieved better performance than other state-of-the-art algorithms on the DUC2002 corpus.
Extractive Multi-Document Summaries Should Explicitly Not Contain Document Specific Content
A sentence selection objective for extractive summarization is presented in which sentences are penalized for containing content that is specific to the documents they were extracted from.
Global and Local Models for Multi-Document Summarization
This paper studies the effectiveness of combining corpus-level (global) tag-topic models with target-document-set-level (local) models for multi-document summarization, and empirically shows that the standard ROUGE SU4 scores of the resulting summaries are comparable to those of human-generated counterparts.
Dual pattern-enhanced representations model for query-focused multi-document summarisation
This work presents a novel unsupervised pattern-enhanced approach for representing coherent topics across documents, as well as query relevance, in order to generate topically coherent summaries that meet the information needs of users.
Extractive summarization using a latent variable model
A generative approach is presented that explicitly identifies summary and non-summary topic distributions in the sentences of a given set of documents, using approximate summary topic probabilities as latent output variables to build a discriminative classifier model.
DualSum: a Topic-Model based approach for update summarization
An unsupervised probabilistic approach to modeling novelty in a document collection is presented and applied to the generation of update summaries, yielding a model that ranks second or third on the ROUGE metrics when tuned on previous TAC competitions and tested on TAC 2011.
Discovery of Topically Coherent Sentences for Extractive Summarization
This work presents an unsupervised probabilistic approach to model the hidden abstract concepts across documents, as well as the correlations between these concepts, in order to generate topically coherent and non-redundant summaries.

References

Showing 1–10 of 25 references
Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization
An effective knowledge-lean method for learning content models from unannotated documents is presented, utilizing a novel adaptation of algorithms for Hidden Markov Models, and applied to two complementary tasks: information ordering and extractive summarization.
The Impact of Frequency on Summarization
SumBasic, a summarization system that exploits frequency exclusively to create summaries, is described; it is demonstrated that a frequency-based summarizer can incorporate context adjustment in a natural way, that this adjustment contributes to the summarizer's good performance, and that it is a sufficient means of duplicate removal in multi-document summarization.
Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion
This paper details the design of a generic extractive summarization system that ranked first out of 22 systems in overall mean Pyramid score and third out of 35 systems in the human evaluation of summary responsiveness to the topic.
Topic-Driven Multi-Document Summarization with Encyclopedic Knowledge and Spreading Activation
The hypothesis that encyclopedic knowledge is a useful addition to a summarization system is confirmed by the implemented system, which ranks high among the participating systems in the DUC competitions.
Towards Multidocument Summarization by Reformulation: Progress and Prospects
The evaluation of system components shows that learning over multiple extracted linguistic features is more effective than information retrieval approaches at identifying similar text units for summarization, and that it is possible to generate a fluent summary that conveys similarities among documents even when full semantic interpretations of the input text are not available.
The PYTHY Summarization System: Microsoft Research at DUC 2007
PYTHY is a trainable extractive summarization engine that learns a log-linear sentence ranking model by maximizing three metrics of sentence goodness: two of the metrics are based on ROUGE scores…
Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval
Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining…
Improved Affinity Graph Based Multi-Document Summarization
This paper describes an affinity graph based approach to multi-document summarization. We incorporate a diffusion process to acquire semantic relationships between sentences, and then compute…
Latent Dirichlet Allocation
We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and…
LexRank: Graph-based Centrality as Salience in Text Summarization
We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS).