This paper describes an approach to the author profiling task of the PAN 2013 challenge. The work is based on the idea of linguistic modality, which has been used successfully in other classification tasks such as authorship attribution. We consider three different modalities (syntactic, stylistic, and semantic), each representing a different aspect of…
Character n-grams have been identified as the most successful feature in both single-domain and cross-domain Authorship Attribution (AA), but the reasons for their discriminative value were not fully understood. We identify subgroups of character n-grams that correspond to linguistic aspects commonly claimed to be covered by these features: morpho-syntax,…
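To make the idea of "subgroups of character n-grams" concrete, here is a minimal sketch that extracts overlapping character trigrams and buckets them into groups. The three groups used here (`punct`, `boundary`, `in-word`) and the helper names are hypothetical and chosen only for illustration. They are not the taxonomy from the paper, whose list of subgroups is truncated above.

```python
from collections import Counter
import string


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def coarse_category(gram: str) -> str:
    """Hypothetical coarse grouping, for illustration only."""
    if any(ch in string.punctuation for ch in gram):
        return "punct"      # contains at least one punctuation mark
    if " " in gram:
        return "boundary"   # touches or spans a word boundary
    return "in-word"        # lies entirely inside a single word


def grouped_counts(text: str, n: int = 3) -> dict[str, Counter]:
    """Count n-grams separately within each coarse group."""
    groups: dict[str, Counter] = {}
    for gram in char_ngrams(text, n):
        groups.setdefault(coarse_category(gram), Counter())[gram] += 1
    return groups


if __name__ == "__main__":
    for group, counts in sorted(grouped_counts("the cat, the hat").items()):
        print(group, dict(counts))
```

Each group can then be used as its own feature set. This makes it possible to ask which kind of n-gram carries most of the discriminative signal for a given attribution task.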
Recent work on Authorship Attribution (AA) proposes the use of meta characteristics to train author models. The meta characteristics are orthogonal sets of similarity relations between the features from the different candidate authors. In that approach, the features are grouped and processed separately according to the type of information they encode, the…
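The abstract is truncated, so the exact definition of a meta characteristic is not given here. The following is only one plausible reading, sketched under that assumption. Features are kept in separate groups (for example lexical and syntactic). For each group, a test document is compared with each candidate author's profile using cosine similarity, and the resulting similarities form the meta-feature vector. The group names, `meta_features`, and the choice of cosine similarity are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def meta_features(doc_groups: dict[str, Counter],
                  author_groups: dict[str, dict[str, Counter]]) -> list[float]:
    """Hypothetical meta-feature vector: one similarity per
    (feature group, candidate author) pair, in sorted order."""
    feats = []
    for group in sorted(doc_groups):
        for author in sorted(author_groups):
            profile = author_groups[author].get(group, Counter())
            feats.append(cosine(doc_groups[group], profile))
    return feats
```

Keeping the groups separate means that each block of the meta-feature vector reflects only one type of information. This matches the grouping described in the abstract. The similarity measure itself is an illustrative choice.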
Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. In real scenarios, however, this assumption is often too strong. The goal of this study is to improve prediction results in cross-topic AA (CTAA), where the training data come from one topic and the test data come from another. Our…
We present the first domain adaptation model for authorship attribution that leverages unlabeled data. The model includes the extensions to structural correspondence learning needed to make it appropriate for the task. For example, we propose a median-based classification instead of the standard binary classification used in previous work. Our results show that…
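In structural correspondence learning, one predictor is trained per pivot feature on unlabeled data. In the standard formulation, its target is whether the pivot occurs in a document at all. The sketch below shows one plausible reading of "median-based" classification: a document is labeled positive when the pivot's count exceeds that pivot's median count across the unlabeled documents. This is an assumption, since the abstract does not spell out the details, and both function names are hypothetical.

```python
from statistics import median


def presence_labels(counts: list[int]) -> list[int]:
    """Standard binary target: does the pivot occur in the document?"""
    return [1 if c > 0 else 0 for c in counts]


def median_labels(counts: list[int]) -> list[int]:
    """Hypothetical median-based target: is the pivot's count
    above its median count over the unlabeled documents?"""
    m = median(counts)
    return [1 if c > m else 0 for c in counts]


if __name__ == "__main__":
    # Counts of one frequent pivot n-gram in five unlabeled documents.
    counts = [3, 5, 2, 8, 4]
    print(presence_labels(counts))
    print(median_labels(counts))
```

For frequent pivots such as common character n-grams, the presence target is 1 for almost every document. This leaves the pivot predictor little to learn. Splitting at the median yields a more balanced target, which may be one motivation for such a change.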