Topics over time: a non-Markov continuous-time model of topical trends

Abstract

This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.
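
The generative story summarized above can be illustrated with a minimal sketch. The abstract says only that each topic has a continuous distribution over timestamps. The choices below are assumptions made for illustration: a Beta distribution over timestamps normalized to [0, 1], one timestamp draw per word token, and the hypothetical names `sample_dirichlet`, `generate_document`, `topic_word` and `topic_time`. This is not the authors' implementation, and it covers only generation, not inference.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_tokens, alpha, topic_word, topic_time, rng):
    """Generate one document as (word_id, topic_id, timestamp) triples.

    topic_word[k] : word probabilities for topic k
    topic_time[k] : (a, b) Beta parameters for topic k over time in [0, 1]
                    (the Beta form is an assumption of this sketch)
    """
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    topic_ids = list(range(len(theta)))
    word_ids = list(range(len(topic_word[0])))
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(topic_ids, weights=theta)[0]         # pick a topic
        w = rng.choices(word_ids, weights=topic_word[z])[0]  # word given topic
        a, b = topic_time[z]
        t = rng.betavariate(a, b)                            # timestamp given topic
        tokens.append((w, z, t))
    return tokens


rng = random.Random(0)
topic_word = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]  # toy 3-word vocabulary
topic_time = [(2.0, 8.0), (8.0, 2.0)]            # topic 0 peaks early, topic 1 late
doc = generate_document(50, [0.5, 0.5], topic_word, topic_time, rng)
```

Because the timestamp distribution belongs to the topic rather than to a time slice, no Markov chain over discretized epochs is needed: a topic's word distribution stays fixed while its prevalence rises and falls with its timestamp density.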

DOI: 10.1145/1150402.1150450



Semantic Scholar estimates that this publication has 992 citations based on the available data.


Cite this paper

@inproceedings{Wang2006TopicsOT,
  title={Topics over time: a non-Markov continuous-time model of topical trends},
  author={Xuerui Wang and Andrew McCallum},
  booktitle={KDD},
  year={2006}
}