• Corpus ID: 218500482

Summarization by Latent Dirichlet Allocation: Superior Sentence Extraction through Topic Modeling

Kenton W. Murray
Latent Dirichlet allocation, or LDA, is a successful generative probabilistic model of text corpora that has performed well on many tasks in many areas of Natural Language Processing. Despite being perfectly suited for Automatic Summarization tasks, it has never been applied to them. In this paper, I introduce Summarization by LDA, or SLDA, which better models the subtopics of a document, leading to more pertinent, relevant, and concise summaries than other summarization methods. This new…
2 Citations

Topic-based Multi-document Summarization using Differential Evolution for Combinatorial Optimization of Sentences
This paper applies evolutionary computation to the combinatorial optimization of important sentences, specifically differential evolution, which is regarded as computationally cheap enough to reach a reasonable quasi-optimal solution in real time.
Context-Based Similarity Analysis for Document Summarization
Context-Based similarity analysis for document summarization extracts a condensed version of the original document in the information retrieval task, using the similarity between sentences in the document to extract the most salient sentences.


Book Reviews: Advances in Automatic Text Summarization
It has been said for decades (if not centuries) that more and more information is becoming available and that tools are needed to handle it. Only recently, however, does it seem that a sufficient
Using Random Walks for Question-focused Sentence Retrieval
A stochastic, graph-based method for comparing the relative importance of textual units, previously used successfully for generic summarization, is applied, and it is hypothesized that it can outperform a competitive baseline.
LexRank: Graph-based Centrality as Salience in Text Summarization
We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS).
Automatic Summarization
Experimental results show that the proposed automatic speech summarization technique for English effectively extracts relatively important information and removes redundant and irrelevant information from English news speech.
Latent Dirichlet Allocation