Writer Identification Using Microblogging Texts for Social Media Forensics

Authors: Fernando Alonso-Fernandez, Nicole Mariah Sharon Belvisi, Kevin Hernandez-Diaz, Naveed Muhammad, Josef Bigun
Journal: IEEE Transactions on Biometrics, Behavior, and Identity Science

Establishing the authorship of online texts is fundamental to combating cybercrime. Unfortunately, text length is limited on some platforms, which makes the task harder. We aim to identify the authorship of Twitter messages limited to 140 characters. We evaluate popular stylometric features, widely used in literary analysis, and Twitter-specific features such as URLs, hashtags, replies, or quotes. We use two databases with 93 and 3957 authors, respectively. We test varying sized author sets and…
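
As a rough illustration of the Twitter-specific features mentioned in the abstract (URLs, hashtags, replies), the following Python sketch counts them in a single message alongside a few basic length features. The function name `tweet_features`, the regular expressions, and the feature names are assumptions made for illustration; they are not taken from the paper, whose actual feature definitions may differ.

```python
import re

# Hypothetical patterns for illustration; not the paper's definitions.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def tweet_features(text: str) -> dict:
    """Count a few simple stylometric and Twitter-specific features."""
    # Remove URLs first so that '#' or '@' inside a URL is not miscounted.
    no_urls = URL_RE.sub(" ", text)
    words = no_urls.split()
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_urls": len(URL_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(no_urls)),
        "n_mentions": len(MENTION_RE.findall(no_urls)),
        # Assumption: a message that starts with a mention is a reply.
        "is_reply": text.lstrip().startswith("@"),
        "n_uppercase": sum(ch.isupper() for ch in text),
    }
```

A feature dictionary of this kind could be computed per message and passed to any standard classifier; which features are actually discriminative is an empirical question that the sketch does not address.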

Author Identification with Machine Learning Algorithms

An experiment on identifying the author of a Turkish-language text using classical machine learning methods, including Support Vector Machines (SVM), Gaussian Naive Bayes (GaussianNB), Multi-Layer Perceptron (MLP), Logistic Regression (LR), and Stochastic Gradient Descent (SGD), and ensemble learning methods, including Extremely Randomized Trees (ExtraTrees) and eXtreme Gradient Boosting (XGBoost).

Machine Training for Intelligent Analysis of Text for the Identification of the Author

The continuous development of information technology has led to an increasing danger and critical cyberattacks, which have recently developed and penetrated unimpeded in various institutions that

Template Aging in Multi-Modal Social Behavioral Biometrics

The experimental results on permanence evaluation demonstrate that the developed system performs remarkably well despite the template aging effect, achieving a recognition accuracy of 99.25% and outperforming all prior research on SBB.

Cyber Crime Investigation: Landscape, Challenges, and Future Research Directions

Of all the methods used in mobile digital forensics, logical extraction and hex dumps are the most effective and the least likely to damage the data, and natural language processing has more applications and uses than any of the other options.



Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features

This work aims to identify the authors of tweets, which are limited to 280 characters, and evaluates popular features traditionally employed in authorship attribution, which capture properties of the writing style at different levels.

Authorship Attribution for Social Media Forensics

It is argued that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time.

Mining online diaries for blogger identification

This investigation of authorship identification on personal blogs or diaries, which differ in their text properties from other types of text such as essays, emails, or articles, uses a couple of intuitive feature sets and studies various parameters that affect identification performance.

Authorship Attribution for Forensic Investigation with Thousands of Authors

A novel authorship attribution model is proposed that combines profile-based and instance-based approaches to reduce the set of candidate authors to a small number and narrow the scope of investigation with a high level of accuracy.

Author gender identification from text

Authorship verification for short messages using stylometry

A supervised learning technique combined with n-gram analysis for authorship verification in short texts, with very promising results on the Enron email dataset involving 87 authors.
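
To make the n-gram idea concrete, here is a minimal Python sketch that builds character n-gram profiles for two short texts and compares them with cosine similarity. It is a deliberately simplified stand-in: the summary above describes a supervised learning technique, whereas this sketch uses a fixed similarity threshold. The function names, the choice of character trigrams, and the threshold value are all assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text is shorter than n characters
    return dot / (norm_a * norm_b)


def same_author(known: str, questioned: str, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: accept if the profiles are similar enough.

    The threshold is an arbitrary placeholder, not a tuned value.
    """
    return cosine_similarity(char_ngrams(known), char_ngrams(questioned)) >= threshold
```

In practice the threshold would have to be calibrated on held-out data for each author, and a trained classifier over the n-gram features would replace the fixed rule.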

Time-Aware Authorship Attribution for Short Text Streams

This paper analyses the temporal changes of word usage by authors of tweets and emails and proposes an approach to estimate the dynamicity of authors' word usage that is inspired by time-aware language models and can be employed in any time-unaware authorship attribution method.

Author identification: Using text sampling to handle the class imbalance problem

Identifying idiolect in forensic authorship attribution: an n-gram textbite approach

It is argued that textbites, small textual segments that characterise an author’s writing and provide DNA-like chunks of identifying material, are able to identify authors by reducing a mass of data to key segments that move us closer to the elusive concept of idiolect.

A Needle in a Haystack? Harnessing Onomatopoeia and User-specific Stylometrics for Authorship Attribution of Micro-messages

By viewing the short texts typical of social media as unidimensional signals, this work devises modern deep-learning techniques tailored to this kind of data to find the author of these posts, with promising results.