Corpus ID: 231846780

Exclusive Topic Modeling

@article{Lei2021ExclusiveTM,
  title={Exclusive Topic Modeling},
  author={Hao Lei and Ying Chen},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.03525}
}
We propose Exclusive Topic Modeling (ETM) for unsupervised text classification, which is able to 1) identify field-specific keywords even though they appear less frequently and 2) deliver well-structured topics with exclusive words. In particular, a weighted Lasso penalty is imposed to automatically reduce the dominance of frequently appearing yet less relevant words, and a pairwise Kullback-Leibler divergence penalty is used to enforce topic separation. Simulation studies demonstrate… 
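As a rough sketch in our own notation (not the paper's exact formulation), the penalized objective suggested by this description might take the form

\min_{\beta,\theta}\; -\sum_{d=1}^{D}\log p(w_d \mid \theta_d,\beta) \;+\; \lambda_1 \sum_{k=1}^{K}\sum_{v=1}^{V}\omega_v\,\lvert\beta_{kv}\rvert \;-\; \lambda_2 \sum_{k<k'} \mathrm{KL}\!\left(\beta_k \,\|\, \beta_{k'}\right),

where \beta_k is the word distribution of topic k, the weights \omega_v grow with the corpus frequency of word v so that common but uninformative words are shrunk harder, and the subtracted pairwise KL term rewards topics for diverging from one another.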


References

Showing 1-10 of 44 references

The Inverse Regression Topic Model

This paper introduces the inverse regression topic model (IRTM), a mixed-membership extension of MNIR that combines the strengths of both methodologies, and presents two inference algorithms for the IRTM: an efficient batch estimation algorithm and an online variant, which is suitable for large corpora.

Latent Dirichlet Allocation

Optimizing Semantic Coherence in Topic Models

An automated evaluation metric for topic coherence is introduced, along with a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).

How Many Topics? Stability Analysis for Topic Models

Evaluations performed on a range of corpora, using a topic modeling approach based on matrix factorization, show that this stability-based strategy can successfully guide the model selection process.

Automatic Evaluation of Topic Coherence

A simple co-occurrence measure based on pointwise mutual information over Wikipedia data achieves results for the task at or near the level of inter-annotator correlation, and other Wikipedia-based lexical relatedness methods also achieve strong results.
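A minimal sketch of such a PMI-based coherence score, with hypothetical names (doc_freq, co_doc_freq) and document-level co-occurrence counts assumed to come from a reference corpus such as Wikipedia:

import itertools
import math

def pmi_coherence(top_words, doc_freq, co_doc_freq, n_docs, eps=1e-12):
    # Mean pairwise PMI over a topic's top words.
    # doc_freq[w]: number of reference documents containing word w
    # co_doc_freq[(a, b)]: documents containing both a and b (keys with a < b)
    scores = []
    for a, b in itertools.combinations(sorted(top_words), 2):
        p_a = doc_freq.get(a, 0) / n_docs
        p_b = doc_freq.get(b, 0) / n_docs
        p_ab = co_doc_freq.get((a, b), 0) / n_docs
        # PMI(a, b) = log p(a, b) / (p(a) p(b)); eps guards zero counts
        scores.append(math.log((p_ab + eps) / (p_a * p_b + eps)))
    return sum(scores) / len(scores) if scores else 0.0

In practice the score is computed over a topic's top 10 or so words; sliding-window co-occurrence counts are a common alternative to document-level counts.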

Gaussian LDA for Topic Models with Word Embeddings

In Gaussian LDA, the categorical distributions over word types in LDA are replaced with multivariate Gaussian distributions on the embedding space, which encourages the model to group words that are a priori known to be semantically related into topics.
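In our notation, the generative step this summary describes would look roughly like

z_{d,n} \sim \mathrm{Categorical}(\theta_d), \qquad v_{d,n} \mid z_{d,n}=k \;\sim\; \mathcal{N}(\mu_k, \Sigma_k),

i.e. each observed word embedding v_{d,n} is drawn from the Gaussian of its assigned topic, so embeddings of semantically related words naturally fall under the same topic.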

Hierarchical Topic Models and the Nested Chinese Restaurant Process

A Bayesian approach is taken to generate an appropriate prior via a distribution on partitions that allows arbitrarily large branching factors and readily accommodates growing data collections.

Rethinking LDA: Why Priors Matter

The prior structure advocated substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language.

Distilled Wasserstein Learning for Word Embedding and Topic Modeling

We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model.

A correlated topic model of Science

The correlated topic model (CTM) is developed, where the topic proportions exhibit correlation via the logistic normal distribution, and its use as an exploratory tool for large document collections is demonstrated.
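Concretely, the logistic-normal construction (sketched here in our own notation) draws

\eta_d \sim \mathcal{N}(\mu, \Sigma), \qquad \theta_{dk} = \frac{\exp(\eta_{dk})}{\sum_{k'} \exp(\eta_{dk'})},

so off-diagonal entries of \Sigma let the proportions of different topics rise and fall together, a dependence a Dirichlet prior cannot express.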