Overview of PAN 2018 - Author Identification, Author Profiling, and Author Obfuscation

Efstathios Stamatatos, Francisco Manuel Rangel Pardo, Michael Tschuggnall, Benno Stein, Mike Kestemont, Paolo Rosso, Martin Potthast

PAN 2018 explores several authorship analysis tasks enabling a systematic comparison of competitive approaches and advancing research in digital text forensics. In addition, a shared task in multimodal author profiling examines, for the first time, a combination of information from both texts and images posted by social media users to estimate their gender.

Overview of the Cross-domain Authorship Attribution Task at PAN 2019

This edition of PAN focuses on authorship attribution, where the task is to attribute an unknown text to a previously seen candidate author, and again sets the task in the context of transformative literature, more colloquially known as ‘fanfiction’.

A transfer learning approach to cross-domain authorship attribution

This paper proposes the use of transfer learning based on pre-trained neural network language models and a multi-headed classifier for cross-domain attribution and demonstrates the crucial effect of the normalization corpus in cross-domain attribution and the usefulness of shallower layers of pre-trained models.

Custom Document Embeddings Via the Centroids Method: Gender Classification in an Author Profiling Task: Notebook for PAN at CLEF 2018

This report describes a method for the author profiling (AP) problem, one of the three shared tasks evaluated as an exercise in digital text forensics at PAN 2018 within the CLEF conference (Conference and Labs of the Evaluation Forum). The method blends Word Embeddings and the Centroids Method to produce Document Embeddings (DE).
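
The summary above does not spell out the Centroids Method, so the following is only a minimal sketch of one common reading: a document embedding taken as the mean of the word vectors of its known tokens. The function name `centroid_embedding` and the toy two-dimensional vectors are assumptions for illustration, not taken from the report.

```python
from typing import Dict, List


def centroid_embedding(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all known tokens; return zeros if none are known."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over the vector components column by column.
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical toy word vectors; a real system would load pre-trained embeddings.
toy_vectors = {"good": [1.0, 0.0], "day": [0.0, 1.0], "bad": [-1.0, 0.0]}
embedding = centroid_embedding("good day unknown".split(), toy_vectors)
```

Out-of-vocabulary tokens are simply skipped here; other choices (a shared unknown-word vector, weighting by term frequency) are equally plausible.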

Author Profiling: Gender Prediction from Tweets and Images: Notebook for PAN at CLEF 2018

This notebook describes participation in the PAN 2018 shared task on author profiling (identifying authors’ gender), including the pre-processing, feature sets, machine learning methods, and accuracy results.

Reduce & Attribute: Two-Step Authorship Attribution for Large-Scale Problems

By utilizing document embeddings, this study shows on a novel, comprehensive dataset collection that the set of candidate authors can be reduced with high accuracy, and that common authorship attribution methods benefit substantially from a preliminary reduction when thousands of authors are involved.

Complexity Measures and POS n-grams for Author Identification in Several Languages: SINAI at PAN@CLEF 2018

The approach and results for the 2018 PAN Author Identification Task are presented. The approach uses several measures of the complexity of each candidate's fanfiction texts, obtained by applying a Part-Of-Speech tagger and an n-gram based vector space model.

CIC-GIL Approach to Cross-domain Authorship Attribution: Notebook for PAN at CLEF 2018

This year’s evaluation lab focuses on the closed-set attribution task applied to a fanfiction corpus in five languages: English, French, Italian, Polish, and Spanish. The approach described uses the log-entropy weighting scheme and an SVM classifier.
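
As an illustration of log-entropy weighting in its standard textbook form (not necessarily the exact variant used in the notebook), the sketch below weights each term by log(1 + tf) times a global factor 1 + Σ p·log(p) / log(N), where p is the share of the term's occurrences that fall in a given document and N is the number of documents. The function name `log_entropy_weights` is an assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def log_entropy_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    if n_docs < 2:
        raise ValueError("log-entropy weighting needs at least two documents")
    counts = [Counter(doc) for doc in docs]
    global_freq: Counter = Counter()
    for doc_counts in counts:
        global_freq.update(doc_counts)

    # Global weight: 1 for a term confined to one document,
    # 0 for a term spread evenly over all documents.
    global_weight = {}
    for term, gf in global_freq.items():
        entropy = 0.0
        for doc_counts in counts:
            if term in doc_counts:
                p = doc_counts[term] / gf
                entropy += p * math.log(p)
        global_weight[term] = 1.0 + entropy / math.log(n_docs)

    return [
        {term: math.log(1 + tf) * global_weight[term] for term, tf in doc_counts.items()}
        for doc_counts in counts
    ]


weights = log_entropy_weights([["the", "cat"], ["the", "dog"]])
```

In this toy corpus "the" is spread evenly over both documents and receives a weight of zero, while "cat" and "dog" keep their full local weight. The resulting vectors could then be fed to any SVM implementation.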

Gender Identification in Twitter using N-grams and LSA: Notebook for PAN at CLEF 2018

An approach to gender identification on Twitter is described and evaluated on the tweet corpus provided by CLEF for the task. A linear Support Vector Machine (SVM) classifier is proposed, with different types of word and character n-grams as features.

Author Profiling based on Text and Images: Notebook for PAN at CLEF 2018

This paper identifies the gender of authors based on written text and shared images and proposes a way to combine multiple predictions on shared content into a single prediction at the user level.



Overview of the Author Identification Task at PAN 2014

The author identification task at PAN-2014 focuses on author verification and adopts the c@1 measure, originally proposed for the question answering task, and continues the successful practice of the PAN labs to examine meta-models based on the combination of all submitted systems.
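
For reference, the c@1 measure rewards leaving a problem unanswered over answering it wrongly: c@1 = (n_c + n_u · n_c / n) / n, where n is the number of problems, n_c the number answered correctly, and n_u the number left unanswered. A minimal sketch follows; the function and parameter names are illustrative.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """Compute c@1 = (n_c + n_u * n_c / n) / n."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


# 10 problems: 6 answered correctly, 2 answered wrongly, 2 left unanswered.
score = c_at_1(n_correct=6, n_unanswered=2, n_total=10)
```

With no unanswered problems the measure reduces to plain accuracy.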

Blogs, Twitter Feeds, and Reddit Comments: Cross-domain Authorship Attribution

It is determined that state-of-the-art methods in stylometry do not perform as well in cross-domain situations as they do in in-domain situations. Methods are proposed that improve performance in the cross-domain setting with both feature-level and classification-level techniques, which can raise accuracy to as much as 70%.

An Overview of the Traditional Authorship Attribution Subtask

This paper describes the Traditional Authorship Attribution subtask of the PAN/CLEF 2012 workshop, which established a new corpus for analysis for 2012 (Rome). The corpus consisted of eight problems, including three closed-class authorship attribution problems, three open-class problems (the set of correct answers included “none of the above”), and two clustering problems.

Overview of the Author Obfuscation Task at PAN 2017: Safety Evaluation Revisited

There is still a way to go to “perfect” automatic obfuscation that (1) tricks verification approaches, (2) keeps the meaning of the original, and (3) is, regarding its obfuscation, unsuspicious to a human eye.

Authorship Attribution Using Text Distortion

A novel method is presented that enhances authorship attribution effectiveness by introducing a text distortion step before extracting stylometric measures to mask topic-specific information that is not related to the personal style of authors.
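
A simplified sketch of such a distortion step, under the assumption that tokens outside a list of frequent words are masked character by character (letters become `*`, digits become `#`) while frequent words and punctuation stay intact. The function name `distort` and the tiny frequent-word set are illustrative; the paper's own variants and word lists may differ.

```python
import re
from typing import Set


def distort(text: str, frequent_words: Set[str]) -> str:
    """Mask every token that is not in frequent_words, keeping its length."""

    def mask(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token.lower() in frequent_words:
            return token
        letters_masked = re.sub(r"[^\W\d_]", "*", token)  # letters -> *
        return re.sub(r"\d", "#", letters_masked)          # digits  -> #

    return re.sub(r"\w+", mask, text)


masked = distort("The cat sat on 2 mats.", {"the", "on"})
```

Stylometric features such as character n-grams would then be extracted from the masked text rather than from the original.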

On the Robustness of Authorship Attribution Based on Character N-gram Features

Comparative results with another competitive text representation approach based on very frequent words show that character n-grams are better able to capture stylistic properties of text when there are significant differences among the training and test corpora.
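
As a minimal illustration of the representation being compared, character n-grams can be counted with a sliding window over the raw text. The function name `char_ngrams` and the default n = 3 are assumptions for this sketch, not settings taken from the paper.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count all overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


counts = char_ngrams("banana", n=3)
```

Texts shorter than n yield an empty counter. In practice the counts are usually restricted to the most frequent n-grams in the training corpus and normalized before classification.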

Cross-Genre Authorship Verification Using Unmasking

In this paper we will stress-test a recently proposed technique for computational authorship verification, “unmasking”, which has been well received in the literature. The technique envisages an

Not All Character N-grams Are Created Equal: A Study in Authorship Attribution

It is demonstrated that character n-grams that capture information about affixes and punctuation account for almost all of the power of character n-grams as features.

Cross-Language Authorship Attribution

A number of cross-language stylometric features, such as those based on sentiment and emotional markers, are proposed for the task of cross-language authorship attribution (CLAA), and an approach based on machine translation (MT) with both lexical and cross-language features is explored.

Overview of the 3rd Author Profiling Task at PAN 2015

This paper overviews the framework and results of the Author Profiling Shared Task organised at PAN 2015, which aims at identifying age, gender, and personality traits of Twitter users.