Corpus ID: 224814121

Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set

@article{Jacobs2020QuasiET,
  title={Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set},
  author={Arthur M. Jacobs and Annette Kinder},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.10801}
}
The Gutenberg Literary English Corpus (GLEC) provides a rich source of textual data for research in digital humanities, computational linguistics or neurocognitive poetics. However, so far only a small subcorpus, the Gutenberg English Poetry Corpus, has been submitted to quantitative text analyses providing predictions for scientific studies of literature. Here we show that in the entire GLEC quasi error-free text classification and authorship recognition is possible with a method using the… 

Recognizing Literary Merit with Deep Learning

Results demonstrate that end-to-end literary analysis with deep learning can accurately determine whether a text excerpt is of high literary merit.

It’s not what you said, it’s how you said it: An analysis of therapist vocal features during psychotherapy

Psychotherapy is an effective mental health treatment whereby the majority of interventions are provided verbally (Lambert & Bergen, 2004) through a therapeutic conversation (Frank, 1961; Norcross,

References

Does size matter? Authorship attribution, small samples, big problem

Despite significant differences in overall attributive success rate between particular methods and/or style markers, the minimal amount of textual data needed for reliable authorship attribution turned out to be method-independent.

A comparative study of machine learning methods for authorship attribution

Each of the methods tested performed well, but nearest shrunken centroids and regularized discriminant analysis had the best overall performances with 0/70 cross-validation errors.

Sentiment Analysis for Words and Fiction Characters From the Perspective of Computational (Neuro-)Poetics

  • A. Jacobs
  • Computer Science
  • Front. Robot. AI
  • 2019

SentiArt produces plausible predictions regarding the emotional and personality profile of fiction characters which are correctly identified on the basis of eight character features, and it achieves a good cross-validation accuracy in classifying 100 figures into “good” vs. “bad” ones.

What makes a metaphor literary? Answers from two computational studies

In this article we investigate structural differences between “literary” metaphors created by renowned poets and “nonliterary” ones imagined by non-professional authors from Katz et al.’s

Large-scale quantitative profiling of the Old English verse tradition

Analysis of the style of all surviving Old English poetry finds quantitative evidence that a single author composed Beowulf and that the poem Andreas was written by Cynewulf—two longstanding questions of English literary history.

(Neuro-)Cognitive poetics and computational stylistics

  • A. Jacobs
  • Linguistics
  • Empirical Studies of Literariness
  • 2018

In six explorative computational stylistics studies, this perspective paper introduces a number of tools that provide QNA indices of the foregrounding potential at the sublexical, lexical, inter- and supralexical levels for poems by Shakespeare, Blake, or Dickens.

Investigating the Relationship between Literary Genres and Emotional Plot Development

This paper investigates the hypothesis that emotion-related information correlates with particular genres, using genre classification as a testbed, and finds that a model that computes lexicon-based emotion scores globally for complete stories is competitive with a large-vocabulary bag-of-words genre classifier.

Computing the Affective-Aesthetic Potential of Literary Texts

The SentiArt tool is established as a promising candidate for lexical sentiment analyses at both the micro- and macrolevels, i.e., short and long literary materials.

Natural Language Processing with Python

This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic

Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama

The concept of literary genre is a highly complex one: not only are different genres frequently defined on several, but not necessarily the same levels of description, but consideration of genres as