stm: An R Package for Structural Topic Models

Margaret E. Roberts, Brandon M. Stewart, and Dustin Tingley
Journal of Statistical Software
This paper demonstrates how to use the R package stm for structural topic modeling. The structural topic model allows researchers to flexibly estimate a topic model that includes document-level metadata. Estimation is accomplished through a fast variational approximation. The stm package provides many useful features, including rich ways to explore topics, estimate uncertainty, and visualize quantities of interest. 
Exploring Topic-Metadata Relationships with the STM: A Bayesian Approach
Two improvements are proposed: first, replacing OLS with the more appropriate Beta regression, and second, adopting a fully Bayesian approach instead of the current blend of frequentist and Bayesian methods.
Landscape of Academic Finance with the Structural Topic Model
Using the structural topic model, this work identifies the research topics in academic finance and explores their relationships and prevalence over time and across journals. It finds that most journals have covered more topics over time and have thus become more generalist.
Inferring Concepts from Topics: Towards Procedures for Validating Topics as Measures
Prior work evaluates whether the word sets learned by a topic model appear semantically related, but does not validate that the model captures the substantive quantity implied by the researchers' topic label. This work provides general tools for validating topics as measures.
Evaluating latent content within unstructured text: an analytical methodology based on a temporal network of associated topics
The methodology is presented as a step-by-step process that builds a temporal network of associated topics, making it easier to evaluate the latent topics in unstructured text as well as the domain from which the documents are sourced.
Gender distribution across topics in the top five economics journals: a machine learning approach
An unsupervised machine learning algorithm is implemented that incorporates author gender as document-level metadata into a probabilistic text model, and female authors are found to be unevenly distributed across the estimated latent topics.
Topic Modeling Russian History
  • Mila Oiva
  • The Palgrave Handbook of Digital Russia Studies
  • 2020
This chapter demonstrates how topic modeling can be applied in studies of Russian and East European history, and illustrates the choices a researcher will face and the steps needed to prepare a data set for topic modeling.
Twitmo: A Twitter Data Topic Modeling and Visualization Package for R
This paper highlights the main functions of the Twitmo package and demonstrates, with an example, how the package reduces the effort needed to produce coherent topic models from Twitter data, while also offering remedies for problems that arise when modeling sparse and noisy text such as Tweets.
Conformance Evaluation of Topic Modeling Approaches on Web-Based Short Text Dynamic Graph Databases
Applying various topic modeling methods to the article-title nodes of the DBLP database, and evaluating the results with the selected topic evaluation criteria, shows that the Biterm method is stable and well suited to this database.
Human-In-The-Loop Topic Modelling: Assessing topic labelling and genre-topic relations with a movie plot summary corpus
  • P. Matthews
  • The Human Position in an Artificial World: Creativity, Ethics and AI in Knowledge Organization
  • 2019
This study applies topic modelling to a corpus of Wikipedia movie summaries to illustrate its challenges and potential, and suggests that unsupervised models may work better for creativity and discovery than semi-supervised ones.
Trellis is a visual tool for topic model curation and dataset exploration that enables an iterative process of adjusting a working hierarchical topic model and examining the corresponding dataset based on that working model.


topicmodels: An R Package for Fitting Topic Models
The R package topicmodels provides basic infrastructure for fitting topic models based on data structures from the text mining package tm. Topic models use an additional layer of latent variables to estimate the similarity between documents, as well as between a set of specified keywords.
On Estimation and Selection for Topic Models
It is shown that fitted parameters can serve as the basis for a novel approach to marginal likelihood estimation, via a block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics.
Sprite: Generalizing Topic Models with Structured Priors
A Sprite-based model is constructed to jointly infer topic hierarchies and author perspective. Applied to corpora of political debates and online reviews, the model learns intuitive topics and outperforms several other topic models at predictive tasks.
Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models
Latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research, are introduced, and their broad applicability is demonstrated with case studies on modeling influence in citation networks and U.S. Presidential State of the Union addresses.
Improving and Evaluating Topic Models and Other Models of Text
It is shown that words that are both frequent and exclusive to a theme are more effective at characterizing topical content, and a regularization scheme is proposed that leads to better estimates of these quantities.
Evaluation methods for topic models
It is demonstrated experimentally that commonly-used methods are unlikely to accurately estimate the probability of held-out documents, and two alternative methods that are both accurate and efficient are proposed.
The structural topic model and applied social science
The Structural Topic Model (STM) is developed as a general way to incorporate corpus structure or document metadata into the standard topic model. It accommodates corpus structure through document-level covariates that affect topical prevalence and/or topical content.
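How document-level covariates shift topical prevalence in the STM can be sketched under the model's logistic-normal prevalence prior. Each document's logits for the first K-1 topics are drawn as eta_d ~ N(Gamma^T x_d, Sigma), the K-th logit is fixed at 0, and the topic proportions theta_d are the softmax of these logits. The function names and coefficient values are hypothetical. The diagonal covariance in `sample_theta` is a simplification, since the STM itself estimates a full covariance matrix Sigma.

```python
import math
import random
from typing import List, Sequence


def prevalence_mean(x: Sequence[float], gamma: Sequence[Sequence[float]]) -> List[float]:
    """Prior mean of eta_d, i.e. Gamma^T x_d, for the K-1 non-reference topics.

    x is one document's covariate vector (length P); gamma is P x (K-1).
    """
    n_free = len(gamma[0])
    return [sum(x[p] * gamma[p][k] for p in range(len(x))) for k in range(n_free)]


def eta_to_theta(eta: Sequence[float]) -> List[float]:
    """Map K-1 free logits to K topic proportions, fixing the K-th logit at 0."""
    logits = list(eta) + [0.0]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_theta(x, gamma, sigma_diag, rng: random.Random) -> List[float]:
    """Draw eta_d ~ N(Gamma^T x_d, diag(sigma_diag)) and return theta_d.

    The diagonal covariance is a simplification; the STM uses a full Sigma.
    """
    mu = prevalence_mean(x, gamma)
    eta = [rng.gauss(mean, math.sqrt(var)) for mean, var in zip(mu, sigma_diag)]
    return eta_to_theta(eta)


# K = 3 topics; covariates are [intercept, treatment] (illustrative values)
gamma = [[0.0, 0.0],    # intercept row
         [1.0, -1.0]]   # treatment row
control = eta_to_theta(prevalence_mean([1.0, 0.0], gamma))
treated = eta_to_theta(prevalence_mean([1.0, 1.0], gamma))
draw = sample_theta([1.0, 1.0], gamma, [0.5, 0.5], random.Random(0))
```

With these coefficients, an untreated document has a uniform prior mean of 1/3 per topic, while the treatment covariate moves prior mass toward the first topic and away from the second.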
Optimizing Semantic Coherence in Topic Models
An automated evaluation metric for topic quality is introduced, together with a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
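The metric, semantic coherence, scores a topic's M most probable words v_1, ..., v_M by how often they co-occur in documents: C = sum_{m=2..M} sum_{l=1..m-1} log((D(v_m, v_l) + 1) / D(v_l)). Here D(v) counts the documents containing word v, and D(v, v') counts those containing both words. The minimal pure-Python version below follows that definition. stm exposes a comparable score through its `semanticCoherence` function, but the name `semantic_coherence`, the set-of-words document representation, and the toy corpus are assumptions for illustration.

```python
import math
from typing import Sequence, Set


def semantic_coherence(top_words: Sequence[str], documents: Sequence[Set[str]]) -> float:
    """Co-document-frequency coherence of one topic's top words.

    top_words: the topic's M most probable words, most probable first.
    documents: each document represented as the set of word types it contains.
    Higher scores mean the top words co-occur more often.
    """
    def doc_freq(*words: str) -> int:
        # number of documents containing every word in `words`
        return sum(1 for doc in documents if all(w in doc for w in words))

    score = 0.0
    # m and l index v_m and v_l in the formula (0-based here)
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                raise ValueError(f"{top_words[l]!r} does not occur in any document")
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score


docs = [{"tax", "budget", "deficit"},
        {"tax", "budget"},
        {"tax", "war"},
        {"war", "troops"}]
print(semantic_coherence(["tax", "budget"], docs))  # log(3/3) = 0.0
print(semantic_coherence(["tax", "troops"], docs))  # log(1/3), about -1.0986
```

The +1 in the numerator keeps the logarithm finite for word pairs that never co-occur, which is exactly the case the metric is meant to penalize.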
Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
A Dirichlet-multinomial regression topic model is proposed that places a log-linear prior on document-topic distributions, expressed as a function of observed document features such as author, publication venue, references, and dates.
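The log-linear prior can be sketched as follows, assuming the form alpha_{d,k} = exp(x_d · lambda_k) with theta_d ~ Dirichlet(alpha_d). The function names, features, and coefficients are hypothetical. The sampler uses the standard construction of a Dirichlet draw as independent Gamma(alpha_k, 1) variates normalized to sum to one.

```python
import math
import random
from typing import List, Sequence


def dmr_alpha(x: Sequence[float], lambdas: Sequence[Sequence[float]]) -> List[float]:
    """Document-specific Dirichlet parameters alpha_{d,k} = exp(x_d . lambda_k)."""
    return [math.exp(sum(xi * li for xi, li in zip(x, lam))) for lam in lambdas]


def prior_mean_theta(alpha: Sequence[float]) -> List[float]:
    """Mean of Dirichlet(alpha): alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw theta_d ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]


# Features: [intercept, written_by_author_A]; K = 2 topics (illustrative values)
lambdas = [[0.0, math.log(3.0)],  # topic 0 is favored when the feature is on
           [0.0, 0.0]]
print(prior_mean_theta(dmr_alpha([1.0, 0.0], lambdas)))  # [0.5, 0.5]
print(prior_mean_theta(dmr_alpha([1.0, 1.0], lambdas)))  # about [0.75, 0.25]
```

Because the exponential keeps every alpha_{d,k} positive, the regression coefficients lambda_k can take any real value while still yielding a valid Dirichlet prior.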
On Smoothing and Inference for Topic Models
Building on the insights from this comparative study of inference methods, the paper shows how accurate topic models can be learned in several seconds on text corpora with thousands of documents.