Understanding and explaining Delta measures for authorship attribution
It is shown that feature vector normalization, that is, the transformation of the feature vectors to a uniform length of 1 (implicit in the cosine measure), is the decisive factor behind the recently proposed improvements to Delta.
Big? Smart? Clean? Messy? Data in the Humanities
This paper is about data in the humanities, and how the digital humanities aim to rise to the challenge and realize the potential of this data for humanistic inquiry.
Straight Talk! Automatic Recognition of Direct Speech in Nineteenth-Century French Novels
The work presented here addresses both the question of how to identify direct speech in French prose fiction and that of how prevalent direct speech is in different subgenres of the nineteenth-century French novel.
Revisiting Style, a Key Concept in Literary Studies
Language and literary studies have studied style for centuries, and even since the advent of ›stylistics‹ as a discipline at the beginning of the twentieth century, definitions of ›style‹
Towards a better understanding of Burrows’s Delta in literary authorship attribution
The effects of standardization and vector normalization on the statistical distributions of features and on the resulting text clustering quality are evaluated, and supervised selection of discriminant words is explored as a procedure for further improving authorship attribution.
In search of comity: TEI for distant reading
The focus of the ELTeC encoding scheme is not to represent texts in all their original complexity, nor to duplicate the work of scholarly editors, but to facilitate a richer and better-informed distant reading than a transcription of lexical content alone would permit.
Corneille, Molière et les autres. Stilometrische Analysen zu Autorschaft und Gattungszugehörigkeit im französischen Theater der Klassik
The digital age, by making large amounts of text available to us, prompts us to develop new and additional reading strategies supported by the use of computers and enabling us to deal with such