Corpus ID: 64493763

Historizing topic models: A distant reading of topic modeling texts within historical studies

M. Fridlund and René Brauer
Topic modelling discourse dynamics in historical newspapers
A combined sampling, training and inference procedure for applying topic models to huge and imbalanced diachronic text collections and a discussion of the role of humanistic interpretation with regard to analysing discourse dynamics through topic models.
The Many Themes of Humanism: Topic Modelling Humanism Discourse in Early 19th-Century German-Language Press
Topic modelling is often described as a text-mining tool for conducting a study of hidden semantic structures of a text or a text corpus by extracting topics from a document or a collection of …
Modeling the Hebrew Bible: Potential of Topic Modeling Techniques for Semantic Annotation and Historical Analysis
This article proposes Topic Modeling as an important first step to gather semantic information beyond the lexicon which can be added as annotations in the SHEBANQ, and lays out a case study of this approach to study diachronic variety in the Bible.
Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA
The potential of the approach is illustrated by applying it in order to define the most relevant topics addressed by each party in the European Parliament's fifth mandate by identifying in an ontology a series of descriptive labels for each document in a corpus.
Entities as topic labels : combining entity linking and labeled LDA to improve topic interpretability and evaluability
A combination of two techniques, called Entity Linking and Labeled LDA, is proposed, which identifies in an ontology a series of descriptive labels for each document in a corpus, and generates a specific topic for each label.
‘Workers of the World’? A Digital Approach to Classify the International Scope of Belgian Socialist Newspapers, 1885–1940
Socialism has always been strongly related to internationalism, yet the attitude towards and expression of internationalism has likely changed throughout the years. Events such as the First World …


Computational historiography: Data mining in a century of classics journals
Computational methods for identifying patterns and testing hypotheses about Classics as a field can help organize large collections, introduce younger scholars to the history of the field, and act as a “survey,” identifying anomalies that can be explored using more traditional methods.
Polylingual Topic Models
This work introduces a polylingual topic model that discovers topics aligned across multiple languages and demonstrates its usefulness in supporting machine translation and tracking topic trends across languages.
Mining the Dispatch under Supervision : Using Casualty Counts to Guide Topics from the Richmond Daily Dispatch Corpus
Large digitized text collections are of immense potential value to historians but are notoriously difficult to digest, given the near-impossibility of reading the entirety of their content within a …
Studying the History of Ideas Using Topic Models
Unsupervised topic modeling is applied to the ACL Anthology to analyze historical trends in the field of Computational Linguistics from 1978 to 2006, finding trends including the rise of probabilistic methods starting in 1988, a steady increase in applications, and a sharp decline of research in semantics and understanding between 1978 and 2001.
Topic Modeling on Historical Newspapers
The task of automatic text processing applied to collections of historical newspapers is explored with the use of topical models as a means to identify potential issues of interest for historians.
Reading Tea Leaves: How Humans Interpret Topic Models
New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.
Finding scientific topics
  • T. Griffiths, M. Steyvers
  • Computer Science, Medicine
  • Proceedings of the National Academy of Sciences of the United States of America
  • 2004
A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model. The algorithm is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.
Probabilistic topic decomposition of an eighteenth-century American newspaper
Latent Semantic Analysis (LSA) allows one to compute whether two documents are topically similar, even if the two documents do not have any words in common, and moves us closer to the notion of topics.
Detecting topic evolution in scientific literature: how can citations help?
An iterative topic evolution learning framework is proposed by adapting the Latent Dirichlet Allocation model to the citation network, and a novel inheritance topic model is developed, which clearly shows that citations can help to understand topic evolution better.
A correlated topic model of Science
The correlated topic model (CTM) is developed, where the topic proportions exhibit correlation via the logistic normal distribution, and its use as an exploratory tool for large document collections is demonstrated.