A biterm topic model for short texts

@inproceedings{Yan2013ABT,
  title={A biterm topic model for short texts},
  author={Xiaohui Yan and J. Guo and Yanyan Lan and Xueqi Cheng},
  booktitle={Proceedings of the 22nd international conference on World Wide Web},
  year={2013}
}
  • Xiaohui Yan, J. Guo, Yanyan Lan, Xueqi Cheng
  • Published 13 May 2013
  • Computer Science
  • Proceedings of the 22nd international conference on World Wide Web
Uncovering the topics within short texts, such as tweets and instant messages, has become an important task for many content analysis applications. […] Specifically, in BTM we learn the topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the whole corpus.
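The key step described above, extracting unordered word pairs (biterms) from each short document, can be sketched as follows. This is a minimal illustration; the function and variable names are my own, not from the paper:

```python
# Sketch of biterm extraction, the preprocessing step BTM builds on:
# a "biterm" is an unordered pair of words co-occurring in the same
# short document, pooled over the whole corpus.
from itertools import combinations

def extract_biterms(doc_tokens):
    """Return all unordered word pairs (biterms) from one short document."""
    return [tuple(sorted(pair)) for pair in combinations(doc_tokens, 2)]

corpus = [["apple", "fruit", "pie"], ["fruit", "juice"]]
biterms = [b for doc in corpus for b in extract_biterms(doc)]
# yields pairs such as ("apple", "fruit") and ("fruit", "juice")
```

Pooling biterms across the corpus, rather than modeling each sparse document independently, is what gives BTM its corpus-level co-occurrence statistics.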
BTM: Topic Modeling over Short Texts
TLDR
This paper proposes a novel way for short text topic modeling, referred to as the biterm topic model (BTM), which learns topics by directly modeling the generation of word co-occurrence patterns in the corpus, making the inference effective with the rich corpus-level information.
A Non-Parametric Topic Model for Short Texts Incorporating Word Coherence Knowledge
TLDR
A non-parametric topic model, npCTM, is proposed that incorporates the Chinese restaurant process (CRP) into BTM to determine the number of topics automatically; it distinguishes general words from topical words by jointly considering each word's distribution over the two word types together with word-coherence information as prior knowledge.
Word Co-occurrence Augmented Topic Model in Short Text
TLDR
This paper proposes an improved word co-occurrence method to enhance topic models and shows that the PMI-β-BTM performs well on both regular short news-title text and noisy tweet text.
Topic Discovery for Streaming Short Texts with CTM
TLDR
This paper proposes a joint topic model for Chinese streaming short texts (CTM), based on online algorithms for LDA and BTM, and uses a combined-word method to extend the length of short texts and reduce errors in extracting key phrases.
Biterm Pseudo Document Topic Model for Short Text
TLDR
This paper proposes a novel word co-occurrence network based method, referred to as the biterm pseudo-document topic model (BPDTM), which extends the previous biterm topic model (BTM) for short text and utilizes the word co-occurrence network to construct biterm pseudo-documents.
Short Text Topic Modeling with Flexible Word Patterns
  • Xiaobao Wu, Chunping Li
  • Computer Science
    2019 International Joint Conference on Neural Networks (IJCNN)
  • 2019
TLDR
A Multiterm Topic Model (MTM), which directly models the generative process of multiterms, can infer the word distribution of each topic and the topic distribution of each short text to alleviate the sparsity problem in short-text modeling.
Topic Modeling for Short Texts via Word Embedding and Document Correlation
TLDR
A regularized non-negative matrix factorization topic model for short texts, named TRNMF, is proposed, which leverages pre-trained distributional vector representations of words to overcome the data sparsity problem of short texts.
A CWTM Model of Topic Extraction for Short Text
TLDR
A simple, fast, and effective topic model for short texts, named the couple-word topic model (CWTM), is proposed based on the Dirichlet Multinomial Mixture (DMM) model; it leverages couple-word co-occurrence to help distill better topics over short texts instead of the traditional word co-occurrence approach.
A Biterm-based Dirichlet Process Topic Model for Short Texts
TLDR
This paper proposes a Dirichlet process based on word co-occurrence to make topic mining from short texts more automatic, and designs a Markov chain Monte Carlo sampling scheme for posterior inference in the model, which extends the sampling algorithm based on the Chinese restaurant process.

References

Showing 1–10 of 38 references
Transferring topical knowledge from auxiliary long texts for short text clustering
TLDR
This article presents a novel approach to clustering short text messages via transfer learning from auxiliary long text data, using a novel topic model, Dual Latent Dirichlet Allocation (DLDA), which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with potential inconsistency between the data sets.
Learning Topics in Short Texts by Non-negative Matrix Factorization on Term Correlation Matrix
TLDR
This paper introduces a novel way to compute term correlation in short texts by representing each term with its co-occurring terms, and formulates the topic learning problem as symmetric non-negative matrix factorization of the term correlation matrix.
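The factorization step mentioned above can be sketched with a standard damped multiplicative update for symmetric NMF. This is a generic sketch of the technique, not the paper's exact algorithm; the regularization and the construction of the term correlation matrix are omitted:

```python
import numpy as np

def symmetric_nmf(S, k, iters=200, seed=0):
    """Approximate a symmetric nonnegative matrix S by H @ H.T with H >= 0,
    using a damped multiplicative update: H <- H * (1/2 + 1/2 * (S H) / (H H^T H))."""
    rng = np.random.default_rng(seed)
    H = rng.random((S.shape[0], k))
    for _ in range(iters):
        # Small epsilon in the denominator guards against division by zero.
        H *= 0.5 + 0.5 * (S @ H) / (H @ (H.T @ H) + 1e-12)
    return H
```

Here each row of H would be a term's loading over the k topics; the update keeps H nonnegative because every factor on the right-hand side is nonnegative.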
Hidden Topic Markov Models
TLDR
This paper proposes modeling the topics of words in the document as a Markov chain, and shows that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics.
TM-LDA: efficient online modeling of latent topic transitions in social media
TLDR
Temporal-LDA significantly outperforms state-of-the-art static LDA models for estimating the topic distribution of new documents over time and is able to highlight interesting variations of common topic transitions, such as the differences in the work-life rhythm of cities, and factors associated with area-specific problems and complaints.
Improving Topic Coherence with Regularized Topic Models
TLDR
This work proposes two methods to regularize the learning of topic models by creating a structured prior over words that reflects broad patterns in external data, making topic models more useful across a broader range of text data.
Empirical study of topic modeling in Twitter
TLDR
It is shown that training a topic model on aggregated messages yields a higher-quality learned model, which results in significantly better performance on two real-world classification problems.
The Author-Topic Model for Authors and Documents
TLDR
The author-topic model is introduced, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, and applications to computing similarity between authors and entropy of author output are demonstrated.
Finding scientific topics
  • T. Griffiths, M. Steyvers
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2004
TLDR
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model; the approach is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
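The MCMC inference described above is the well-known collapsed Gibbs sampler for LDA. A minimal toy sketch of that sampler, not the authors' original implementation (documents are lists of integer word ids, V is the vocabulary size, K the number of topics):

```python
import numpy as np

def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Minimal collapsed Gibbs sampler for LDA; returns topic-word distributions."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # document-topic counts
    nkw = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # total words per topic
    # Random topic assignment for every token, then initialize counts.
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token, sample its topic from the full conditional,
                # then add it back under the new assignment.
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # Smoothed estimate of each topic's word distribution.
    return (nkw + beta) / (nk[:, None] + V * beta)
```

Each returned row is a probability distribution over the vocabulary for one topic; BTM's own sampler follows the same remove-sample-restore pattern, but over biterms rather than single tokens.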
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
TLDR
A general framework for building classifiers that deal with short and sparse text and Web segments by making the most of hidden topics discovered from large-scale data collections; the framework is general enough to be applied to different data domains and genres, ranging from Web search results to medical text.
Reading Tea Leaves: How Humans Interpret Topic Models
TLDR
New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.