• Corpus ID: 59893873

The structural topic model and applied social science

Authors: Margaret E. Roberts, Brandon M. Stewart, Dustin Tingley, Edoardo M. Airoldi
Published in: ICONIP 2013
We develop the Structural Topic Model, which provides a general way to incorporate corpus structure or document metadata into the standard topic model. Document-level covariates enter the model through a simple generalized linear model framework in the prior distributions controlling either topical prevalence or topical content. We demonstrate the model’s use in two applied problems: the analysis of open-ended responses in a survey experiment about immigration policy, and understanding differing…
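To make the idea of covariates entering the prior on topical prevalence concrete, here is a minimal sketch. It assumes a logistic-normal style prior whose mean is a linear function of the document's covariates, with independent Gaussian noise of a single scale; the abstract does not spell out these details, so treat them as simplifying assumptions. The function names, the coefficient matrix `gamma`, and the example covariates are all hypothetical and are not taken from the paper or from any package.

```python
import math
import random


def softmax(scores):
    """Map real-valued scores to proportions that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_predictor(covariates, gamma):
    """Compute one score per topic as covariates times coefficients.

    covariates: list of P document-level values (hypothetical).
    gamma: P x K nested list of coefficients (hypothetical).
    """
    num_topics = len(gamma[0])
    return [sum(x * row[k] for x, row in zip(covariates, gamma))
            for k in range(num_topics)]


def baseline_prevalence(covariates, gamma):
    """Topic proportions implied by the linear predictor with no noise."""
    return softmax(linear_predictor(covariates, gamma))


def sample_prevalence(covariates, gamma, sigma, rng):
    """Draw topic proportions: Gaussian noise around the predictor, then softmax.

    Assumption: independent noise with one shared scale `sigma`.
    """
    noisy = [rng.gauss(mu, sigma) for mu in linear_predictor(covariates, gamma)]
    return softmax(noisy)


# Hypothetical setup: an intercept plus one binary treatment indicator, 3 topics.
gamma = [[0.0, 0.0, 0.0],    # intercept row: no topic favored at baseline
         [1.5, 0.0, -1.5]]   # treatment row: shifts mass toward topic 0
control = baseline_prevalence([1.0, 0.0], gamma)
treated = baseline_prevalence([1.0, 1.0], gamma)
draw = sample_prevalence([1.0, 1.0], gamma, sigma=0.5, rng=random.Random(0))
```

In this toy setup the control document has equal expected weight on all three topics, while the treated document puts more weight on topic 0. That is the kind of covariate effect on topical prevalence that a survey-experiment analysis would aim to estimate.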

Topic Modeling of Constructed-Response Answers on Social Study Assessments
Topic models were used to detect the latent thematic structure of examinees’ answers to constructed-response items. Results for two different topic models, latent Dirichlet allocation (LDA) and
Author Clustering and Topic Estimation for Short Texts
A novel model is proposed that extends Latent Dirichlet Allocation by modeling strong dependence among the words in the same document, with user-level topic distributions, and a novel measure of echo chambers among these politicians is developed.
Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models
Latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research, are introduced, and the broad applicability of the models is demonstrated with case studies on modeling influence in citation networks and U.S. Presidential State of the Union addresses.
Topic Modeling with Structured Priors for Text-Driven Science
This thesis will introduce topic models that can encode additional structures such as factorizations, hierarchies, and correlations of topics, and can incorporate supervision and domain knowledge.
Multi-word Structural Topic Modelling of ToR Drug Marketplaces
This paper proposes the first iterative algorithm to extend STM with n-grams, and tests the solution on textual data collected from four well-known ToR drug marketplaces, employing an STM-guided n-gram selection process so that topic-specific phrasemes can be identified regardless of their global relevance in the corpus.
TopicCheck: Interactive Alignment for Assessing Topic Model Stability
This work introduces TopicCheck, an interactive tool for assessing topic model stability, devises an interactive alignment algorithm for matching latent topics from multiple models, and enables sensitivity evaluation across a large number of models.
Clust-LDA: Joint Model for Text Mining and Author Group Inference
A novel and statistically principled method, clust-LDA, is proposed, which incorporates authorship structure into the topical modeling, thus accomplishing the task of the topical inferences across documents on the basis of authorship and, simultaneously, the identification of groupings between authors.
Evaluating latent content within unstructured text: an analytical methodology based on a temporal network of associated topics
This solution is presented as a step-by-step process to facilitate the evaluation of latent topics from unstructured text, as well as the domain area that textual documents are sourced from to provision a temporal network of associated topics.
Diagnosing and Improving Topic Models by Analyzing Posterior Variability
This work proposes a metric called topic stability that measures the variability of the topic parameters under the posterior and shows that this metric is correlated with human judgments of topic quality as well as with the consistency of topics appearing across multiple models.
Inferring Concepts from Topics: Towards Procedures for Validating Topics as Measures
Prior work evaluates whether word sets learned by a topic model appear semantically related, but does not validate that the model captures the substantive quantity implied by the researchers’ topic label, so general tools to validate topics as measures are provided.


A correlated topic model of Science
The correlated topic model (CTM) is developed, where the topic proportions exhibit correlation via the logistic normal distribution, and its use as an exploratory tool for large document collections is demonstrated.
The Author-Topic Model for Authors and Documents
The author-topic model is introduced, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, and applications to computing similarity between authors and entropy of author output are demonstrated.
Partially labeled topic models for interpretable text mining
Two new partially supervised generative models of labeled text make use of the unsupervised learning machinery of topic models to discover the hidden topics within each label, as well as unlabeled, corpus-wide latent topics.
Probabilistic Topic Models
D. Blei. IEEE Signal Processing Magazine, 2010. Computer Science.
Surveying a suite of algorithms that offer a solution to managing large document archives suggests they are well-suited to handle large amounts of data.
A Latent Variable Model for Geographic Lexical Variation
A multi-level generative model that reasons jointly about latent topics and geographical regions is presented, which recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency.
Factorial LDA: Sparse Multi-Dimensional Text Models
Factorial LDA is introduced, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables, which incorporates structured word priors and learns a sparse product of factors.
Sparse Additive Generative Models of Text
This approach has two key advantages: it can enforce sparsity to prevent overfitting, and it can combine generative facets through simple addition in log space, avoiding the need for latent switching variables.
Multinomial Inverse Regression for Text Analysis
A straightforward framework of sentiment-sufficient dimension reduction for text data is introduced and it is shown that logistic regression of phrase counts onto document annotations can be used to obtain low-dimensional document representations that are rich in sentiment information.
Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
A Dirichlet-multinomial regression topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates is proposed.
Supervised Topic Models
The supervised latent Dirichlet allocation (sLDA) model, a statistical model of labelled documents, is introduced, along with a maximum-likelihood procedure for parameter estimation that relies on variational approximations to handle intractable posterior expectations.