Detecting authorship deception: a supervised machine learning approach using author writeprints

Abstract

We describe a new supervised machine learning approach for detecting authorship deception, a specific type of authorship attribution task particularly relevant for cybercrime forensic investigations, and demonstrate its validity on two case studies drawn from realistic online data sets. The core of our approach involves identifying uncharacteristic behavior for an author, based on a writeprint extracted from unstructured text samples of the author’s writing. The writeprints used here involve stylometric features and content features derived from topic models, an unsupervised approach for identifying relevant keywords that relate to the content areas of a document. One innovation of our approach is to transform the writeprint feature values into a representation that individually balances characteristic and uncharacteristic traits of an author, and we subsequently apply a Sparse Multinomial Logistic Regression classifier to this novel representation. Our method yields high accuracy for authorship deception detection on the two case studies, confirming its utility.
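The abstract does not spell out how feature values are transformed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes three toy stylometric features (mean word length, type-token ratio, and the rate of a small hypothetical FUNCTION_WORDS list), omits the topic-model content features entirely, and uses hypothetical helpers `stylometric_features`, `author_profile`, and `balanced_representation`. Each feature of a questioned text is compared with the author's known samples and split into a pair of scores: a "characteristic" score (assumed here to be a Gaussian-shaped similarity to the author's mean) and its complement as the "uncharacteristic" score.

```python
import math
import re
import statistics

# Hypothetical function-word list; a real writeprint would use a much larger
# stylometric feature set plus topic-model content features (omitted here).
FUNCTION_WORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it", "i"}


def stylometric_features(text):
    """Return a small, illustrative vector of stylometric features."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0, 0.0, 0.0]
    mean_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    function_word_rate = sum(w in FUNCTION_WORDS for w in words) / len(words)
    return [mean_word_len, type_token_ratio, function_word_rate]


def author_profile(samples):
    """Per-feature mean and population std over an author's known texts."""
    columns = list(zip(*(stylometric_features(s) for s in samples)))
    means = [statistics.mean(c) for c in columns]
    # Floor the std so a constant feature does not cause division by zero.
    stds = [max(statistics.pstdev(c), 1e-6) for c in columns]
    return means, stds


def balanced_representation(text, profile):
    """Map each feature to a (characteristic, uncharacteristic) pair in [0, 1].

    Assumption: "characteristic" is exp(-z^2 / 2), where z is the feature's
    z-score against the author's profile; "uncharacteristic" is 1 minus that.
    """
    means, stds = profile
    out = []
    for value, mu, sigma in zip(stylometric_features(text), means, stds):
        z = (value - mu) / sigma
        characteristic = math.exp(-0.5 * z * z)
        out.extend([characteristic, 1.0 - characteristic])
    return out


known = [
    "I think the weather is fine and it is a good day to walk in the park.",
    "It is a nice idea to go to the market in the morning and buy bread.",
    "I like the sound of rain and the smell of the air after it stops.",
]
profile = author_profile(known)
questioned = ("Notwithstanding considerable uncertainty, quarterly "
              "projections remain extraordinarily optimistic.")
vec = balanced_representation(questioned, profile)
print(len(vec))  # 6: two scores for each of the three features
```

In this toy example the questioned text scores near zero on every "characteristic" entry, while the author's own samples score high. The paired vectors would then be fed to the classifier; the paper uses Sparse Multinomial Logistic Regression, and an L1-penalized multinomial logistic regression (for example scikit-learn's `LogisticRegression(penalty="l1", solver="saga")`) is one readily available stand-in, not necessarily equivalent to the authors' setup.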

DOI: 10.1093/llc/fqs003

9 Figures and Tables

Cite this paper

@article{Pearl2012DetectingAD,
  title={Detecting authorship deception: a supervised machine learning approach using author writeprints},
  author={Lisa Pearl and Mark Steyvers},
  journal={LLC},
  year={2012},
  volume={27},
  pages={183--196}
}