Authorship verification as a one-class classification problem

  • Moshe Koppel, Jonathan Schler
  • Published 4 July 2004
  • Mathematics
  • Proceedings of the twenty-first international conference on Machine learning
In the authorship verification problem, we are given examples of the writing of a single author and are asked to determine if given long texts were or were not written by this author. We present a new learning-based method for adducing the "depth of difference" between two example sets and offer evidence that this method solves the authorship verification problem with very high accuracy. The underlying idea is to test the rate of degradation of the accuracy of learned models as the best… 
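
The abstract is cut off mid-sentence, so the following is only a rough sketch under a stated assumption: that model accuracy is re-measured while the most discriminating features are removed round by round. A leave-one-out nearest-centroid classifier stands in for whatever learner the paper actually uses, and the names loo_accuracy and degradation_curve are hypothetical.

```python
def centroid(rows, feats):
    """Mean value of each active feature over a set of rows."""
    return [sum(r[f] for r in rows) / len(rows) for f in feats]


def dist2(row, cen, feats):
    """Squared Euclidean distance from a row to a centroid on active features."""
    return sum((row[f] - c) ** 2 for f, c in zip(feats, cen))


def loo_accuracy(set_a, set_b, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier (a stand-in learner)."""
    correct = 0
    for own, other in ((set_a, set_b), (set_b, set_a)):
        for i, row in enumerate(own):
            rest = own[:i] + own[i + 1:]          # hold out the row being tested
            d_own = dist2(row, centroid(rest, feats), feats)
            d_other = dist2(row, centroid(other, feats), feats)
            correct += d_own < d_other            # ties count as errors
    return correct / (len(set_a) + len(set_b))


def degradation_curve(set_a, set_b, rounds, k):
    """Accuracy per round while the k most discriminating features are dropped.

    Assumption: "most discriminating" is taken to mean the largest absolute
    gap between the two class means, which is not confirmed by the abstract.
    """
    feats = list(range(len(set_a[0])))
    curve = []
    for _ in range(rounds):
        if not feats:
            break
        curve.append(loo_accuracy(set_a, set_b, feats))
        ca, cb = centroid(set_a, feats), centroid(set_b, feats)
        gap = {f: abs(x - y) for f, x, y in zip(feats, ca, cb)}
        drop = set(sorted(feats, key=lambda f: gap[f], reverse=True)[:k])
        feats = [f for f in feats if f not in drop]
    return curve
```

Under this reading, two example sets that differ in only a handful of features should show a steep drop in the curve, while sets that differ throughout should stay separable for more rounds.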


Measuring Differentiability: Unmasking Pseudonymous Authors

A new learning-based method for adducing the "depth of difference" between two example sets is presented and evidence that this method solves the authorship verification problem with very high accuracy is offered.

Authorship Verification based on Syntax Features

An algorithm that uses the syntactic analysis system SET to verify the authorship of documents is presented; the results indicate that syntactic features provide enough information to improve the accuracy of authorship verification algorithms.

Probabilistic Anomaly Detection Method for Authorship Verification

Preliminary results show that the probabilistic method can achieve high verification performance, reaching an F1 score of 85%, and can be very valuable for authorship verification.

Distractorless Authorship Verification

Using only training data from the candidate author, this work is able to perform authorship verification with high confidence (greater than 90% accuracy rates across a large corpus).

Robust Authorship Verification with Transfer Learning

This work presents an end-to-end model-building process that is universally applicable to a wide variety of corpora, with little to no modification or fine-tuning, and relies on transfer learning of a deep language model and a number of text augmentation techniques to improve the model's generalization ability.

Meta Analysis within Authorship Verification

This paper introduces authorship verification problems as decision problems, discusses possibilities for the use of meta knowledge, and applies meta analysis to post-process unreliable style analysis results.

Improving Authorship Verification using Linguistic Divergence

This paper is the first one to introduce a method designed with non-comparability in mind from the ground up, rather than indirectly, and it is also one of the first to use Deep Language Models in this setting.

Application of BERT in author verification task

A long-text encoding method based on BERT, a pre-trained language model, is proposed for the authorship verification task of the PAN@CLEF 2022 competition, in which two texts belonging to different discourse types are compared to determine whether they were written by the same author.

Experiments with Neural Networks for Small and Large Scale Authorship Verification

Two models are presented for a special case of the authorship verification problem; they compare the language models of the two documents and generate a loss that is used as a feature to verify whether the authors of the pair are identical.

Determining if two documents are written by the same author

This article offers an (almost) unsupervised method for solving the authorship attribution problem by using repeated feature subsampling methods to determine if one document of the pair allows us to select the other from among a background set of “impostors” in a sufficiently robust manner.
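
The impostors idea summarized in the last entry can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the Manhattan distance, the 50% feature fraction, the number of trials, the 0.8 threshold, and the names impostor_score and same_author are all assumptions made for the example.

```python
import random


def subset_distance(u, v, feats):
    """Manhattan distance restricted to a subset of feature indices (assumed measure)."""
    return sum(abs(u[f] - v[f]) for f in feats)


def impostor_score(x, y, impostors, trials=100, fraction=0.5, seed=0):
    """Fraction of random feature subsets on which y is closer to x than every impostor."""
    rng = random.Random(seed)
    n_feats = len(x)
    size = max(1, int(n_feats * fraction))
    wins = 0
    for _ in range(trials):
        feats = rng.sample(range(n_feats), size)   # one random feature subsample
        d_y = subset_distance(x, y, feats)
        if all(d_y < subset_distance(x, imp, feats) for imp in impostors):
            wins += 1
    return wins / trials


def same_author(x, y, impostors, threshold=0.8, **kwargs):
    """Declare a match only if y beats the impostors robustly (threshold is hypothetical)."""
    return impostor_score(x, y, impostors, **kwargs) >= threshold


# Toy feature vectors, for example relative frequencies of four function words.
x = [0.30, 0.10, 0.25, 0.35]
y_close = [0.31, 0.11, 0.24, 0.34]
y_far = [0.90, 0.90, 0.90, 0.90]
impostors = [[0.10, 0.40, 0.40, 0.10], [0.50, 0.30, 0.05, 0.15]]

print(same_author(x, y_close, impostors))
print(same_author(x, y_far, impostors))
```

Repeating the comparison over many feature subsamples is what makes the decision robust in this sketch: a document that wins only on a few particular features gets a low score.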



E-Mail Authorship Attribution for Computer Forensics

This chapter describes an investigation of forensic authorship attribution or identification undertaken on a corpus of multi-author and multi-topic e-mail documents, using an extended set of e-mail document features such as structural characteristics and linguistic patterns together with a Support Vector Machine as the learning algorithm.

Authorship Attribution with Support Vector Machines

The support vector machine (SVM) is applied to text-mining methods for identifying the author of a text; because it can cope with half a million inputs, it requires no feature selection and can process the frequency vector of all words of a text.

Automatically Categorizing Written Texts by Author Gender

It is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy.

Exploiting Stylistic Idiosyncrasies for Authorship Attribution

Early researchers in authorship attribution used a variety of statistical methods to identify stylistic discriminators – characteristics which remain approximately invariant within the

Inference and Disputed Authorship: The Federalist

The 1964 publication of "Inference and Disputed Authorship" made the cover of "Time" magazine and drew the attention of academics and the public alike for its use of statistical methodology to solve

Mistake-Driven Learning in Text Categorization

This work studies three mistake-driven learning algorithms for a typical task of this nature -- text categorization and presents an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.


One element of style which seems to be characteristic of an author, in so far as can be judged from general impressions, is the length of his sentences. This author develops his thought in long,

Machine learning in automated text categorization

This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

An Evaluation of Statistical Approaches to Text Categorization

Analysis and empirical evidence suggest that the evaluation results on some versions of Reuters were significantly affected by the inclusion of a large portion of unlabelled documents, making those results difficult to interpret and leading to considerable confusion in the literature.

One-Class SVMs for Document Classification

The SVM approach as represented by Schoelkopf was superior to all the methods except the neural network one, where it was, although occasionally worse, essentially comparable.