PART: Pre-trained Authorship Representation Transformer

Javier Huertas-Tato, Álvaro Huertas-García, Alejandro Martín, David Camacho
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage. Identifying these details is highly relevant for profiling authors, revealing their gender, occupation, age, and so on. Most importantly, recurring writing patterns can help attribute authorship to a text. Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain…



Stacked authorship attribution of digital texts

Cross-Domain Authorship Attribution Using Pre-trained Language Models

This paper modifies a successful authorship verification approach based on a multi-headed neural network language model, combines it with pre-trained language models, and demonstrates the crucial effect of the normalization corpus in cross-domain attribution.

BertAA: BERT fine-tuning for Authorship Attribution

BertAA is introduced: a fine-tuning of a pre-trained BERT language model with an additional dense layer and a softmax activation that performs authorship classification, reaching competitive performance on the Enron Email, Blog Authorship, and IMDb datasets.
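The head BertAA adds on top of the pre-trained encoder — a single dense (linear) layer followed by a softmax over candidate authors — can be sketched in pure Python. The embedding, weight values, and author count below are illustrative placeholders, not BertAA's actual parameters:

```python
import math

def softmax(logits):
    """Convert raw author scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense_softmax_head(pooled_embedding, weights, biases):
    """One dense layer over the encoder's pooled output, then softmax.
    weights holds one row of coefficients per candidate author."""
    logits = [
        sum(w * x for w, x in zip(row, pooled_embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)

# Toy example: a 4-dimensional "embedding" scored against 3 authors.
emb = [0.2, -0.1, 0.5, 0.0]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
b = [0.0, 0.0, 0.0]
probs = dense_softmax_head(emb, W, b)
predicted_author = probs.index(max(probs))
```

In the full model this head is trained jointly with (or on top of) BERT's own parameters; the sketch only isolates the classification step the summary describes.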

A survey of modern authorship attribution methods

A survey of recent advances of the automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification.

Stylometric Analysis for Authorship Attribution on Twitter

This analysis targets the micro-blogging site Twitter, where people share their interests and thoughts in the form of short messages called "tweets", and presents machine learning techniques and stylometric features of the authors that enable authorship to be determined at rates significantly better than chance for texts of 140 characters or less.
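Stylometric features of the kind used for short texts are typically simple surface statistics. A minimal sketch of such an extractor follows; the specific feature set here is an illustrative assumption, not the paper's exact list:

```python
def stylometric_features(tweet: str) -> dict:
    """Extract a few surface-level stylometric features commonly used
    for short-text authorship attribution (illustrative selection)."""
    words = tweet.split()
    n_chars = len(tweet)
    return {
        "char_count": n_chars,
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "uppercase_ratio": sum(c.isupper() for c in tweet) / max(n_chars, 1),
        "punctuation_count": sum(c in "!?.,;:'\"" for c in tweet),
        "hashtag_count": sum(w.startswith("#") for w in words),
        "mention_count": sum(w.startswith("@") for w in words),
    }

feats = stylometric_features("Loving this! #nlp @friend :)")
```

Vectors like these are then fed to a standard classifier (e.g. an SVM or decision tree) trained per candidate author.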

Authorship Attribution of Social Media and Literary Russian-Language Texts Using Machine Learning Methods and Feature Selection

Authorship attribution is an important field of natural language processing (NLP). Its popularity is due to the relevance of implementing solutions for information security, as well as…

Authorship Attribution for Twitter in 140 Characters or Less

It is shown that the SCAP methodology performs extremely well on Twitter messages and, even with restrictions on the types of information allowed, such as the recipient of directed messages, still performs significantly better than chance.
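SCAP (Source Code Author Profiles) builds, for each author, a profile consisting of their L most frequent character n-grams, and attributes an unknown text to the author whose profile overlaps it most. A minimal sketch, with n-gram length and profile size chosen only for illustration:

```python
from collections import Counter

def scap_profile(text: str, n: int = 3, L: int = 50) -> set:
    """An author profile: the set of the L most frequent character n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {g for g, _ in grams.most_common(L)}

def attribute(unknown: str, author_texts: dict, n: int = 3, L: int = 50) -> str:
    """Assign the unknown text to the author whose profile shares the
    most n-grams with the unknown text's profile."""
    unknown_profile = scap_profile(unknown, n, L)
    return max(
        author_texts,
        key=lambda a: len(scap_profile(author_texts[a], n, L) & unknown_profile),
    )

# Toy corpora with clearly distinct writing habits.
authors = {
    "alice": "omg soooo happy today!!! lol lol lol",
    "bob": "Per our discussion, please find the report attached.",
}
guess = attribute("lol omg soooo funny!!!", authors)
```

Character n-grams need no tokenization, which is part of why the method transfers well to very short, informally spelled tweets.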

What represents “style” in authorship attribution?

Different linguistic aspects that may help represent style are analyzed and it is shown that syntax may be helpful for cross-genre attribution while cross-topic attribution and single-domain may benefit from additional lexical information.

Topic or Style? Exploring the Most Useful Features for Authorship Attribution

An analysis of four widely used datasets explores how different types of features affect authorship attribution accuracy under varying conditions, yielding results that outperform the prior state of the art on two of the four datasets used.

Automatically profiling the author of an anonymous text

Imagine that you have been given an important text of unknown authorship, and wish to know as much as possible about the unknown author (demographics, personality, cultural background, among others)…