A Visualizable Evidence-Driven Approach for Authorship Attribution

Abstract

The Internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, because the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, and chat logs) can easily be repudiated. Authorship attribution is the study of identifying the actual author of anonymous documents from the text itself, and for decades many stylometric and computational techniques have been studied extensively for this purpose. However, most previous research focuses on improving attribution accuracy, and little work has addressed constructing and visualizing evidential traits. Moreover, these sophisticated techniques are difficult for cyber investigators and linguistic experts to interpret. In this article, building on the End-to-End Digital Investigation (EEDI) framework, we propose a visualizable evidence-driven approach, named VEA, that aims to facilitate cyber investigation. A comprehensive controlled experiment and a stratified experiment on the real-life Enron email dataset demonstrate that our approach can achieve higher accuracy than traditional methods, while its output can be easily visualized and interpreted as evidential traits. Beyond identifying the most plausible author of a given text, our approach also estimates the confidence of the predicted result within a given identification context and presents visualizable linguistic evidence for each candidate.
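The abstract names three outputs of an attribution system: the most plausible author, a confidence estimate for that prediction, and per-candidate linguistic evidence. The sketch below illustrates those three outputs with a generic character n-gram baseline. It is NOT the VEA method described in the paper, whose internals are not given here. The helper names (`char_ngrams`, `cosine`, `attribute`), the trigram size, cosine similarity, and the "share of total similarity" confidence are all illustrative assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping lowercase character n-grams (assumed feature set)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(anonymous, candidates, n=3, top_k=5):
    """Hypothetical baseline: rank candidate authors for an anonymous text.

    candidates maps author name -> list of that author's known texts.
    Returns (best_author, confidence, evidence), where evidence maps each
    author (best first) to the shared n-grams contributing most to the match.
    """
    query = char_ngrams(anonymous, n)
    scores, evidence = {}, {}
    for author, texts in candidates.items():
        profile = Counter()
        for t in texts:
            profile.update(char_ngrams(t, n))
        scores[author] = cosine(query, profile)
        shared = query.keys() & profile.keys()
        # Rank shared n-grams by their contribution to the dot product;
        # ties broken alphabetically so the output is deterministic.
        evidence[author] = sorted(
            shared, key=lambda g: (-query[g] * profile[g], g)
        )[:top_k]
    ranking = sorted(scores, key=scores.get, reverse=True)
    best = ranking[0]
    total = sum(scores.values())
    # Simple heuristic confidence: the winner's share of total similarity.
    confidence = scores[best] / total if total else 0.0
    return best, confidence, {a: evidence[a] for a in ranking}


if __name__ == "__main__":
    known = {
        "alice": ["I shall be there by noon, shall I not?"],
        "bob": ["gonna b there l8r lol"],
    }
    print(attribute("Shall I be there by noon?", known))
```

Returning the shared n-grams alongside the score is what makes such output inspectable: an investigator can see which recurring character sequences drove each candidate's similarity, rather than receiving only a label.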

DOI: 10.1145/2699910

15 Figures and Tables


@article{Ding2015AVE, title={A Visualizable Evidence-Driven Approach for Authorship Attribution}, author={Steven H. H. Ding and Benjamin C. M. Fung and Mourad Debbabi}, journal={ACM Trans. Inf. Syst. Secur.}, year={2015}, volume={17}, pages={12:1-12:30} }