Corpus ID: 232335501

Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama

Christof Schoch
The concept of literary genre is a highly complex one: not only are different genres frequently defined on several, but not necessarily the same, levels of description, but the consideration of genres as cognitive, social, or scholarly constructs with a rich history further complicates the matter. This contribution focuses on thematic aspects of genre with a quantitative approach, namely Topic Modeling. Topic Modeling has proven to be useful for discovering thematic patterns and trends in large…


Revisiting Style, a Key Concept in Literary Studies
Language and literary studies have studied style for centuries, and even since the advent of ›stylistics‹ as a discipline at the beginning of the twentieth century, definitions of ›style‹
Deeper Delta across genres and languages: do we really need the most frequent words?
In 2007, John Burrows identified three regions in word frequency lists of corpora in authorship attribution and stylometry: the most frequent words, targeted by Delta; words of moderate frequency, targeted by Zeta; and the lowest-frequency words, targeted by Iota.
A survey of modern authorship attribution methods
A survey of recent advances in automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification.
Exploring the Space of Topic Coherence Measures
This work is the first to propose a framework that makes it possible to construct existing word-based coherence measures, as well as new ones, by combining elementary components, and shows that new combinations of components outperform existing measures with respect to correlation with human ratings.
Reading Tea Leaves: How Humans Interpret Topic Models
New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.
Cross-Genre Authorship Verification Using Unmasking
In this paper we will stress-test a recently proposed technique for computational authorship verification, "unmasking", which has been well received in the literature. The technique envisages an
Stylometry with R: A Package for Computational Text Analysis
The possibilities of stylo for computational text analysis are introduced, via a number of dummy case studies from English and French literature, to demonstrate how the package is particularly useful in the exploratory statistical analysis of texts, e.g. with respect to authorial writing style.
Software Framework for Topic Modelling with Large Corpora
This work describes a Natural Language Processing software framework based on the idea of document streaming, i.e. processing corpora document after document in a memory-independent fashion, and implements several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size.
Probabilistic topic models
D. Blei, Computer Science, Commun. ACM, 2010
Surveying a suite of algorithms that offer a solution to managing large document archives suggests they are well suited to handling large amounts of data.
Sequential Latent Dirichlet Allocation: Discover Underlying Topic Structures within a Document
By taking into account the sequential structure within a document, the SeqLDA model has a higher fidelity over LDA in terms of perplexity (a standard measure of dictionary-based compressibility) and yields a nicer sequential topic structure than LDA.