• Corpus ID: 16681767

Authorship Identification of Romanian Texts with Controversial Paternity

Liviu P. Dinu, Marius Claudiu Popescu, Anca Dinu
International Conference on Language Resources and Evaluation

In this work we propose a new strategy for the authorship identification problem and test it on an example from Romanian literature: did Radu Albala find the continuation of Mateiu Caragiale’s novel Sub pecetea tainei, or did he write the continuation himself? The proposed strategy is based on the similarity of rankings of function words; we compare its results with those obtained by a learning method, namely Support Vector Machines (SVM) with a string kernel.
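As a rough illustration of what comparing rankings of function words can look like, the sketch below ranks a fixed word list by frequency in each text and sums the absolute rank differences. The abstract does not spell out the exact measure, so the footrule-style sum, the tie handling, the tiny word list, and the names `FUNCTION_WORDS`, `rank_function_words` and `ranking_distance` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
import re

# Hypothetical, tiny function-word list; a real study would use a much longer one.
FUNCTION_WORDS = ["si", "de", "in", "la", "cu", "pe"]


def rank_function_words(text, words=FUNCTION_WORDS):
    """Map each word in `words` to its frequency rank in `text` (1 = most frequent).

    Words with equal counts share the average of the positions they occupy
    (an assumed tie-handling convention).
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    ordered = sorted(words, key=lambda w: -counts[w])
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to the end of the current group of tied words.
        j = i
        while j + 1 < len(ordered) and counts[ordered[j + 1]] == counts[ordered[i]]:
            j += 1
        average_position = (i + 1 + j + 1) / 2
        for w in ordered[i:j + 1]:
            ranks[w] = average_position
        i = j + 1
    return ranks


def ranking_distance(ranks_a, ranks_b):
    """Sum of absolute rank differences over the shared word list."""
    return sum(abs(ranks_a[w] - ranks_b[w]) for w in ranks_a)


if __name__ == "__main__":
    words = ["si", "de", "in"]
    a = rank_function_words("si si si de de in", words)
    b = rank_function_words("in in in de de si", words)
    print(ranking_distance(a, b))
```

A smaller distance would then be read as greater stylistic similarity between the two texts; in an attribution setting, the disputed text would be compared against texts of known authorship.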

Authorship identification from unstructured texts

Ordinal measures in authorship identification

A set of distance/similarity measures is compared with regard to their ability to reflect stylistic similarity between authors and texts, and tested in one of the most frequently employed multivariate statistical analysis settings: cluster analysis.

Finding a Character’s Voice: Stylome Classification on Literary Characters

The results of some initial experiments developed on the novel “Liaisons Dangereuses” are presented, showing that a simple bag of words model can be used to classify the characters.

Stylometric analysis of E-mail content for author identification

This paper explains how to optimize and extend the existing procedures of author identification to identify the author of an arbitrary e-mail by using customized writing style features.


The corpus design and the quality measures are described, the detection approaches developed by the participants are surveyed, the achieved performance results of the competitors are compiled, and an evaluation framework for plagiarism detection is described.

On the stylistic evolution from communism to democracy: Solomon Marcus study case

A stylistic analysis of Solomon Marcus’ non-scientific published texts, gathered in six volumes, aims to uncover some of his quantitative and qualitative fingerprints; it reveals that the transition from the communist regime to democracy is sharply marked by two complementary changes in Marcus’ writing.

Analyzing Stylistic Variation across Different Political Regimes

Analysis of texts written in two periods that differ not only temporally but also politically and culturally (communism and democracy in Romania) shows that texts from the two periods can indeed be distinguished, both in style and in semantic content (topic).

Authorial Studies using Ranked Lexical Features

The case of Vladimir Nabokov, a bilingual Russian-English author, is investigated and a tool for measuring distances between different styles of one or more authors is proposed.

Local Rank Distance

Radu Tudor Ionescu. 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2013.

Researchers have developed a wide variety of methods for string data that can be applied with success in different fields such as computational biology, natural language processing and so on. Such

A Fast Algorithm for Local Rank Distance: Application to Arabic Native Language Identification

The proposed algorithm is more than two orders of magnitude faster than the original algorithm, and state-of-the-art results are presented for Arabic native language identification from text documents.



A Tool for Literary Studies: Intertextual Distance and Tree Classification

The method presented provides an accurate tool for literary studies, as is demonstrated by applying it to two areas of French literature: Racine's tragedies and an authorship attribution experiment.

Short Text Authorship Attribution via Sequence Kernels, Markov Chains and Author Unmasking: An Investigation

An investigation of recently proposed character and word sequence kernels for the task of authorship attribution based on relatively short texts suggests that, in a realistic setup that allows for texts not written by any of the hypothesised authors, the amount of training material has more influence on discrimination performance than the amount of test material.

Stylogenetics: clustering-based stylistic analysis of literary corpora

On the basis of the stylistic genome of authors, a new methodology for the automatic analysis of literary texts can be developed using more complex features than the simple lexical features suggested by traditional approaches.

On the Syllabic Similarities of Romance Languages

The syllabic similarity between Romance languages is studied via rank distance; the results confirm the linguistic theories, adding quantification and rigor.

Rank Distance as a Stylistic Similarity

This paper proposes a new distance function (rank distance) designed to reflect stylistic similarity between texts and tests it in two different machine learning settings: clustering and binary classification.

New Machine Learning Methods Demonstrate the Existence of a Human Stylome

Traits referring to syntactic patterns prove less distinctive than traits referring to vocabulary, but much more distinctive than expected on the basis of current generativist theories of language learning.

A Widow and her Soldier: Stylometry and the American Civil War

This investigation strongly suggests that Pickett's widow, LaSalle Corbell Pickett, did compose the published letters, which have been questioned, at least in part, by writers and historians of the Civil War.

On the Classification and Aggregation of Hierarchies with Different Constitutive Elements

An aggregation method is presented that can be applied to classifications with different vocabularies; it uses the rank distance (Dinu, 2003), a metric that measures the similarity between two hierarchies based on the ranks of objects.

Spearman's Footrule as a Measure of Disarray

Spearman's measure of disarray D is the sum of the absolute values of the difference between the ranks. We treat D as a metric on the set of permutations. The limiting mean, variance and
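The definition in this summary can be written out as a formula. The symbols below (two permutations π and σ of n items, with π(i) the rank of item i) are notation assumed here for illustration and are not taken from the paper itself.

```latex
D(\pi, \sigma) = \sum_{i=1}^{n} \left| \pi(i) - \sigma(i) \right|
```

For example, with n = 3, π = (1, 2, 3) and σ = (3, 2, 1), the rank differences are 2, 0 and 2, so D = 4.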

Kernel Methods for Pattern Analysis

This book provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.