Authorship verification as a one-class classification problem
@article{Koppel2004AuthorshipVA, title={Authorship verification as a one-class classification problem}, author={Moshe Koppel and Jonathan Schler}, journal={Proceedings of the twenty-first international conference on Machine learning}, year={2004} }
In the authorship verification problem, we are given examples of the writing of a single author and are asked to determine if given long texts were or were not written by this author. We present a new learning-based method for adducing the "depth of difference" between two example sets and offer evidence that this method solves the authorship verification problem with very high accuracy. The underlying idea is to test the rate of degradation of the accuracy of learned models as the best…
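The abstract above is cut off, so the following is only a minimal sketch of the general idea it names: measure how quickly a learned model's accuracy degrades as its most useful features are taken away. It makes several assumptions that are not in the text: the feature-removal schedule (`drop_per_round`), a simple centroid-based linear classifier standing in for whatever learner the authors used, leave-one-out evaluation, and synthetic numeric features in place of real text statistics. All function names are hypothetical.

```python
import random


def fit(X, y):
    """Centroid-based linear classifier (an assumption, not the paper's learner)."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    n_feat = len(X[0])
    mu_pos = [sum(x[j] for x in pos) / len(pos) for j in range(n_feat)]
    mu_neg = [sum(x[j] for x in neg) / len(neg) for j in range(n_feat)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    # Bias puts the decision boundary midway between the two centroids.
    b = -sum(wj * (p + n) / 2 for wj, p, n in zip(w, mu_pos, mu_neg))
    return w, b


def loo_accuracy(X, y, active):
    """Leave-one-out accuracy using only the feature indices in `active`."""
    correct = 0
    for i in range(len(X)):
        train_X = [[x[j] for j in active] for k, x in enumerate(X) if k != i]
        train_y = [label for k, label in enumerate(y) if k != i]
        w, b = fit(train_X, train_y)
        score = sum(wj * X[i][j] for wj, j in zip(w, active)) + b
        correct += int((score > 0) == (y[i] == 1))
    return correct / len(X)


def unmasking_curve(X, y, rounds, drop_per_round):
    """Record accuracy per round, dropping the strongest features each time."""
    active = list(range(len(X[0])))
    curve = []
    for _ in range(rounds):
        if not active:
            break
        curve.append(loo_accuracy(X, y, active))
        w, _ = fit([[x[j] for j in active] for x in X], y)
        ranked = sorted(range(len(active)), key=lambda p: abs(w[p]), reverse=True)
        dropped = {active[p] for p in ranked[:drop_per_round]}
        active = [j for j in active if j not in dropped]
    return curve


def make_data(rng, n_per_class, n_features, n_shifted, shift):
    """Synthetic two-class data: the first `n_shifted` features differ by `shift`."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 0.5) for _ in range(n_features)]
            if label == 1:
                for j in range(n_shifted):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    # "Shallow" difference: only two features separate the example sets.
    shallow = make_data(rng, 10, 20, 2, 3.0)
    # "Deep" difference: every feature separates the example sets a little.
    deep = make_data(rng, 10, 20, 20, 1.0)
    print("shallow:", unmasking_curve(*shallow, rounds=4, drop_per_round=2))
    print("deep:   ", unmasking_curve(*deep, rounds=4, drop_per_round=2))
```

In this toy setup, a set pair that differs in only a handful of features should lose accuracy quickly once those features are removed, while a pair that differs across many features should stay separable for longer; the shape of the resulting curve is what such a method would inspect.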
315 Citations
Measuring Differentiability: Unmasking Pseudonymous Authors
- Mathematics
- J. Mach. Learn. Res.
- 2007
A new learning-based method for adducing the "depth of difference" between two example sets is presented and evidence that this method solves the authorship verification problem with very high accuracy is offered.
Authorship Verification based on Syntax Features
- Computer Science
- RASLAN
- 2012
An algorithm that uses the syntactic analysis system SET to verify the authorship of documents is presented; the results indicate that syntactic features provide enough information to improve the accuracy of authorship verification algorithms.
Probabilistic Anomaly Detection Method for Authorship Verification
- Computer Science
- SLSP
- 2014
Preliminary results show that the probabilistic method can achieve high verification performance, reaching an F1 score of 85%, and can be very valuable for authorship verification.
Distractorless Authorship Verification
- Psychology
- LREC
- 2012
Using only training data from the candidate author, this work is able to perform authorship verification with high confidence (greater than 90% accuracy rates across a large corpus).
Robust Authorship Verification with Transfer Learning
- Computer Science
- EasyChair Preprints
- 2019
This work presents an end-to-end model-building process that is universally applicable to a wide variety of corpora, with little to no modification or fine-tuning, and relies on transfer learning of a deep language model and a number of text augmentation techniques to improve the model's generalization ability.
Meta Analysis within Authorship Verification
- Computer Science
- 2008 19th International Workshop on Database and Expert Systems Applications
- 2008
This paper introduces authorship verification problems as decision problems, discusses possibilities for the use of meta knowledge, and applies meta analysis to post-process unreliable style analysis results.
Improving Authorship Verification using Linguistic Divergence
- Computer Science
- ROMCIR@ECIR
- 2021
This paper is the first one to introduce a method designed with non-comparability in mind from the ground up, rather than indirectly, and it is also one of the first to use Deep Language Models in this setting.
Application of BERT in author verification task
- Computer Science
- CLEF
- 2022
A long-text encoding method based on BERT, a pre-trained language model, for the authorship verification task of the PAN@CLEF 2022 competition, in which two texts belonging to different discourse types are compared to determine whether they were written by the same author.
Experiments with Neural Networks for Small and Large Scale Authorship Verification
- Computer Science
- ArXiv
- 2018
Two models for a special case of the authorship verification problem that compare the language models of the two documents and generate a loss, which is used as a recognizable feature to verify whether the authors of the pair are identical.
Determining if two documents are written by the same author
- Mathematics
- J. Assoc. Inf. Sci. Technol.
- 2014
This article offers an (almost) unsupervised method for solving the authorship attribution problem by using repeated feature subsampling methods to determine if one document of the pair allows us to select the other from among a background set of “impostors” in a sufficiently robust manner.
References
E-Mail Authorship Attribution for Computer Forensics
- Computer Science
- Applications of Data Mining in Computer Security
- 2002
This chapter describes an investigation of forensic authorship attribution or identification undertaken on a corpus of multi-author and multi-topic e-mail documents, using an extended set of e-mail document features such as structural characteristics and linguistic patterns together with a Support Vector Machine as the learning algorithm.
Authorship Attribution with Support Vector Machines
- Computer Science
- Applied Intelligence
- 2004
The support vector machine (SVM) is applied to text-mining methods for identifying the author of a text; because it can cope with half a million inputs, it requires no feature selection and can process the frequency vector of all words of a text.
Automatically Categorizing Written Texts by Author Gender
- Computer Science
- Lit. Linguistic Comput.
- 2002
It is shown that automated text categorization techniques can exploit combinations of simple lexical and syntactic features to infer the gender of the author of an unseen formal written document with approximately 80 per cent accuracy.
Exploiting Stylistic Idiosyncrasies for Authorship Attribution
- Linguistics
- 2003
Early researchers in authorship attribution used a variety of statistical methods to identify stylistic discriminators – characteristics which remain approximately invariant within the…
Inference and Disputed Authorship: The Federalist
- Sociology, Computer Science
- 1966
The 1964 publication of "Inference and Disputed Authorship" made the cover of "Time" magazine and drew the attention of academics and the public alike for its use of statistical methodology to solve…
Mistake-Driven Learning in Text Categorization
- Computer Science
- EMNLP
- 1997
This work studies three mistake-driven learning algorithms for a typical task of this nature -- text categorization and presents an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.
On Sentence-Length as a Statistical Characteristic of Style in Prose: With Application to Two Cases of Disputed Authorship
- Education
- 1939
One element of style which seems to be characteristic of an author, in so far as can be judged from general impressions, is the length of his sentences. This author develops his thought in long,…
Machine learning in automated text categorization
- Computer Science
- CSUR
- 2002
This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
An Evaluation of Statistical Approaches to Text Categorization
- Computer Science
- Information Retrieval
- 2004
Analysis and empirical evidence suggest that the evaluation results on some versions of Reuters were significantly affected by the inclusion of a large portion of unlabelled documents, making those results difficult to interpret and leading to considerable confusion in the literature.
One-Class SVMs for Document Classification
- Computer Science
- J. Mach. Learn. Res.
- 2001
The SVM approach as represented by Schoelkopf was superior to all the methods except the neural network one, where it was, although occasionally worse, essentially comparable.