Multiple authors Detection: a Quantitative Analysis of Dream of the Red Chamber

@article{Hu2014MultipleAD,
  title={Multiple authors Detection: a Quantitative Analysis of Dream of the Red Chamber},
  author={Xianfeng Hu and Yang Wang and Qiang Wu},
  journal={Adv. Data Sci. Adapt. Anal.},
  year={2014},
  volume={6}
}
Inspired by the authorship controversy of Dream of the Red Chamber and the application of machine learning in the study of literary stylometry, we develop a rigorous new method for the mathematical analysis of authorship by testing for a so-called chrono-divide in writing styles. Our method incorporates some of the latest advances in the study of authorship attribution, particularly techniques from support vector machines. By introducing the notion of relative frequency as a feature ranking… 

Stylometry and Mathematical Study of Authorship

This work develops a rigorous new method for the mathematical analysis of authorship that tests for a so-called chrono-divide in writing styles and introduces the notion of relative frequency as a feature ranking metric.

Authorship of Dream of the Red Chamber: A Topic Modeling Approach

Three hypotheses are proposed as the cause for this situation: the presence of the four chapters in the second group certainly deserves further investigation, and the first two are within expectations.

P-leader multifractal analysis for text type identification

The recently introduced p-leader multifractal formalism is used to analyze a corpus of novels written for adults and young adults to assess whether a difference in style can be found. The results agree with the interpretation that novels written for young adults largely follow the conventions of the genre, whereas novels written for adults are less homogeneous.

Robust stylometric analysis and author attribution based on tones and rimes

An innovative and robust approach to stylometric analysis is proposed that requires no annotation and leverages lexical and sub-lexical information automatically extracted from unannotated Mandarin Chinese texts; it can in principle be applied to other languages with an established phonological inventory of onsets and rimes.

Domain-based Latent Personal Analysis and its use for impersonation detection in social media

It is stipulated that within a domain an author's signature can be derived from the author's missing popular words and frequently used infrequent-words, and a method is devised, termed Latent Personal Analysis (LPA), for finding such domain-based personal signatures.

Linguistic Features as Evidence for Historical Context Interpretation

The initial results show that some selected grammatical constructions are effective in extracting descriptive evidence for construing historical context and have contributed to exploring an effective avenue for innovative history studies by means of examining linguistic evidence.

Machine learning-based microarray analyses indicate low-expression genes might collectively influence PAH disease

Accurately predicting and testing the types of Pulmonary arterial hypertension (PAH) of each patient using cost-effective microarray-based expression data and machine learning algorithms could

A Comparative Study of Feature Extraction Methods for Authorship Attribution in the Text of Traditional East Asian Medicine with a Focus on Function Words

References

A survey of modern authorship attribution methods

A survey of recent advances of the automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification.

Authorship Attribution

This review shows that the authorship attribution discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less studied languages; it further analyzes the types of analysis and features used and tries to determine characteristics of well-performing systems, finally formulating these in a set of recommendations for best practices.

Study Based on Statistics of word Frequency——Research on Only Author of the "Dream of the Red Chamber"

This paper applies an objective, accurate statistical approach, using computer analysis to address the question of authorship in literary works, and concludes that the whole of "Dream of the Red Chamber" was written by the same author.

Who Was the Author? An Introduction to Stylometry

Stylometry, the statistical analysis of literary style, does not seek to overturn traditional scholarship by literary experts and historians; rather, it seeks to complement their work by providing an alternative means of investigating works of doubtful provenance.

The State of Authorship Attribution Studies: Some Problems and Solutions

The statement, "Results of most non-traditional authorship attribution studies are not universally accepted as definitive," is explicated. A variety of problems in these studies are listed and

One Piece of Evidence that Chapters 64 and 67 Are Not the Original Version

From the perspective of the actual text of A Dream of Red Mansions, through a survey of the use of "business" and "hurriedness" in the previous eighty chapters, Chapters 61 to 67 in particular, it is

Authorship of The Dream of the Red Chamber: A Computerized Statistical Study of Its Vocabulary

  • Dissertation Abstracts International Part A: Humanities and Social Sciences [DISS. ABST. INT. PT. A-HUM. & SOC. SCI.]
  • 1981

Gene Selection for Cancer Classification using Support Vector Machines

VI. Wincenty Lutoslawski - a forgotten father of stylometry

E-mail address: qwu@mtsu
