# The structural topic model and applied social science

@inproceedings{Roberts2013TheST, title={The structural topic model and applied social science}, author={Margaret E. Roberts and Brandon M Stewart and Dustin Tingley and Edoardo M. Airoldi}, booktitle={ICONIP 2013}, year={2013} }

We develop the Structural Topic Model, which provides a general way to incorporate corpus structure or document metadata into the standard topic model. Document-level covariates enter the model through a simple generalized linear model framework in the prior distributions controlling either topical prevalence or topical content. We demonstrate the model’s use in two applied problems: the analysis of open-ended responses in a survey experiment about immigration policy, and understanding differing…
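To make the idea of covariates entering the topical-prevalence prior concrete, here is a minimal sketch in Python. It assumes a logistic-normal prior whose mean is a linear function of the document covariates; the abstract above only says "generalized linear model framework", so that specific form, along with the names `X`, `Gamma`, `Sigma`, and `sample_topic_proportions`, is an illustrative assumption and not the authors' implementation.

```python
import numpy as np

def softmax(eta):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = eta - eta.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sample_topic_proportions(X, Gamma, Sigma, rng):
    """Draw per-document topic proportions from a covariate-dependent prior.

    Assumed form (hypothetical, for illustration only):
        eta_d ~ Normal(X_d @ Gamma, Sigma),  theta_d = softmax([eta_d, 0])
    X:     (D, P) document covariates
    Gamma: (P, K-1) regression coefficients
    Sigma: (K-1, K-1) covariance matrix
    """
    mu = X @ Gamma                                   # linear predictor, shape (D, K-1)
    n_docs, k_minus_1 = mu.shape
    noise = rng.multivariate_normal(np.zeros(k_minus_1), Sigma, size=n_docs)
    # The last topic is the reference category, with its eta fixed at 0.
    eta = np.hstack([mu + noise, np.zeros((n_docs, 1))])
    return softmax(eta)

rng = np.random.default_rng(0)
X = np.array([[1.0, 0.0],     # control document: intercept only
              [1.0, 1.0]])    # treated document: intercept + treatment indicator
Gamma = np.array([[0.0, 0.0],
                  [2.0, -1.0]])  # treatment shifts topic 0 up and topic 1 down
theta = sample_topic_proportions(X, Gamma, 0.1 * np.eye(2), rng)
```

Each row of `theta` is a valid probability vector over three topics, and the second row of `Gamma` controls how the treatment indicator shifts expected prevalence.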

## 187 Citations

Topic Modeling of Constructed-Response Answers on Social Study Assessments

- Psychology
- 2019

Topic models were used to detect the latent thematic structure of examinees’ answers to constructed-response items. Results for two different topic models, latent Dirichlet allocation (LDA) and…

Author Clustering and Topic Estimation for Short Texts

- Computer Science
- ArXiv
- 2021

A novel model is proposed that extends Latent Dirichlet Allocation by modeling strong dependence among the words in the same document and by adding user-level topic distributions; a novel measure of echo chambers among these politicians is also developed.

Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models

- Computer Science
- ICML
- 2015

This work introduces latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research, and demonstrates the broad applicability of the models with case studies on modeling influence in citation networks and U.S. Presidential State of the Union addresses.

Topic Modeling with Structured Priors for Text-Driven Science

- Computer Science
- 2015

This thesis will introduce topic models that can encode additional structures such as factorizations, hierarchies, and correlations of topics, and can incorporate supervision and domain knowledge.

Multi-word Structural Topic Modelling of ToR Drug Marketplaces

- Computer Science
- 2018 IEEE 12th International Conference on Semantic Computing (ICSC)
- 2018

This paper proposes the first iterative algorithm to extend STM with n-grams and tests the solution on textual data collected from four well-known ToR drug marketplaces, employing an STM-guided n-gram selection process so that topic-specific phrasemes can be identified regardless of their global relevance in the corpus.

TopicCheck: Interactive Alignment for Assessing Topic Model Stability

- Computer Science
- NAACL
- 2015

This work introduces TopicCheck, an interactive tool for assessing topic model stability, devises an interactive alignment algorithm for matching latent topics from multiple models, and enables sensitivity evaluation across a large number of models.

Clust-LDA: Joint Model for Text Mining and Author Group Inference

- Computer Science
- ArXiv
- 2018

A novel and statistically principled method, clust-LDA, is proposed, which incorporates authorship structure into the topical modeling, thus accomplishing the task of the topical inferences across documents on the basis of authorship and, simultaneously, the identification of groupings between authors.

Evaluating latent content within unstructured text: an analytical methodology based on a temporal network of associated topics

- Computer Science
- J. Big Data
- 2021

This solution is presented as a step-by-step process to facilitate the evaluation of latent topics from unstructured text, as well as the domain area that textual documents are sourced from to provision a temporal network of associated topics.

Diagnosing and Improving Topic Models by Analyzing Posterior Variability

- Computer Science
- AAAI
- 2018

This work proposes a metric called topic stability that measures the variability of the topic parameters under the posterior and shows that this metric is correlated with human judgments of topic quality as well as with the consistency of topics appearing across multiple models.

Inferring Concepts from Topics: Towards Procedures for Validating Topics as Measures

- Computer Science
- 2019

Prior work evaluates whether word sets learned by a topic model appear semantically related, but does not validate that the model captures the substantive quantity implied by the researchers’ topic label, so general tools to validate topics as measures are provided.

## References


A correlated topic model of Science

- Computer Science
- 2007

The correlated topic model (CTM) is developed, where the topic proportions exhibit correlation via the logistic normal distribution, and its use as an exploratory tool for large document collections is demonstrated.

The Author-Topic Model for Authors and Documents

- Computer Science
- UAI
- 2004

The author-topic model is introduced, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, and applications to computing similarity between authors and entropy of author output are demonstrated.

Partially labeled topic models for interpretable text mining

- Computer Science
- KDD
- 2011

Two new partially supervised generative models of labeled text make use of the unsupervised learning machinery of topic models to discover the hidden topics within each label, as well as unlabeled, corpus-wide latent topics.

Probabilistic Topic Models

- Computer Science
- IEEE Signal Processing Magazine
- 2010

Surveying a suite of algorithms that offer a solution to managing large document archives suggests they are well-suited to handle large amounts of data.

A Latent Variable Model for Geographic Lexical Variation

- Computer Science
- EMNLP
- 2010

A multi-level generative model that reasons jointly about latent topics and geographical regions is presented, which recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency.

Factorial LDA: Sparse Multi-Dimensional Text Models

- Computer Science
- NIPS
- 2012

Factorial LDA is introduced, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables, which incorporates structured word priors and learns a sparse product of factors.

Sparse Additive Generative Models of Text

- Computer Science
- ICML
- 2011

This approach has two key advantages: it can enforce sparsity to prevent overfitting, and it can combine generative facets through simple addition in log space, avoiding the need for latent switching variables.
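The "simple addition in log space" mentioned above can be illustrated with a short Python sketch. It assumes a background log-frequency vector plus one mostly-zero deviation vector per facet; the function name `additive_word_distribution` and all numbers are hypothetical. The sparsity here is set by hand, whereas a real model would induce it through a sparsity-promoting prior.

```python
import numpy as np

def additive_word_distribution(background, *deviations):
    """Combine facets by adding log-space deviations to a background vector.

    background: (V,) background log-frequencies over the vocabulary
    deviations: any number of (V,) deviation vectors (mostly zeros when sparse)
    Returns a (V,) probability vector over words.
    """
    log_weights = background + sum(deviations)
    log_weights = log_weights - log_weights.max()   # shift for numerical stability
    weights = np.exp(log_weights)
    return weights / weights.sum()

# Hypothetical three-word vocabulary.
background = np.log(np.array([0.5, 0.3, 0.2]))
topic_dev = np.array([0.0, 1.5, 0.0])   # the topic boosts only word 1
facet_dev = np.array([0.0, 0.0, 0.8])   # a second facet boosts only word 2

p = additive_word_distribution(background, topic_dev, facet_dev)
```

Because the facets are summed before normalization, no per-token switching variable is needed to decide which facet generated a word.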

Multinomial Inverse Regression for Text Analysis

- Computer Science
- 2010

A straightforward framework of sentiment-sufficient dimension reduction for text data is introduced and it is shown that logistic regression of phrase counts onto document annotations can be used to obtain low-dimensional document representations that are rich in sentiment information.

Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression

- Computer Science
- UAI
- 2008

A Dirichlet-multinomial regression topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates is proposed.

Supervised Topic Models

- Computer Science
- NIPS
- 2007

The supervised latent Dirichlet allocation (sLDA) model, a statistical model of labelled documents, is introduced, along with a maximum-likelihood procedure for parameter estimation that relies on variational approximations to handle intractable posterior expectations.