Upendra Sapkota

Character n-grams have been identified as the most successful feature in both single-domain and cross-domain Authorship Attribution (AA), but the reasons for their discriminative value were not fully understood. We identify subgroups of character n-grams that correspond to linguistic aspects commonly claimed to be covered by these features: morphosyntax, …
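The following is a minimal Python sketch, for illustration only. It extracts overlapping character 3-grams and sorts them into coarse subgroups: punctuation, word-boundary, multi-word, and word-internal. These group names and rules are hypothetical stand-ins, because the abstract is cut off before it lists the paper's actual categories.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_group(gram):
    """Assign an n-gram to a coarse, hypothetical subgroup (illustrative only)."""
    # Any character that is neither alphanumeric nor whitespace counts as punctuation.
    if any(not (c.isalnum() or c.isspace()) for c in gram):
        return "punctuation"
    # Starts or ends at a space: touches a word boundary (prefix/suffix-like).
    if gram.startswith(" ") or gram.endswith(" "):
        return "word-boundary"
    # Space strictly inside the n-gram: spans two words.
    if " " in gram:
        return "multi-word"
    return "word-internal"

def grouped_profile(text, n=3):
    """Count n-grams separately for each subgroup."""
    groups = {}
    for gram in char_ngrams(text, n):
        groups.setdefault(ngram_group(gram), Counter())[gram] += 1
    return groups
```

For "the cat.", this yields "the" and "cat" as word-internal, "he " and " ca" as word-boundary, "e c" as multi-word, and "at." as punctuation. Each subgroup's counts could then be used as a separate feature set.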
Most previous research on authorship attribution (AA) assumes that the training and test data are drawn from the same distribution. In real scenarios, however, this assumption is too strong. The goal of this study is to improve the prediction results in cross-topic AA (CTAA), where the training data comes from one topic but the test data comes from another. Our …
Recent work on Authorship Attribution (AA) proposes the use of meta characteristics to train author models. The meta characteristics are orthogonal sets of similarity relations between the features from the different candidate authors. In that approach, the features are grouped and processed separately according to the type of information they encode, the …
This paper describes an approach for the author profiling task of the PAN 2013 challenge. This work is based on the idea of linguistic modality, which has been successfully used in other classification tasks such as authorship attribution. We consider three different modalities: syntactic, stylistic, and semantic, each representing a different aspect of the text. …
In this paper, we describe a modified version of the profile-based approach for the Authorship Attribution (AA) task of the PAN 2012 challenge. Our PAN system for AA applies the concept of linguistic modalities to profile-based (PB) approaches. We concatenate all the training documents from the same author and build author-specific sub-profiles, one per …
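The following rough Python sketch shows a profile-based setup with one sub-profile per modality. It assumes two hypothetical modalities, character 3-grams and lowercase words, and scores each candidate author by the summed size of the profile intersections, in the spirit of simplified profile intersection. The abstract does not state which modalities or similarity measure the system uses, so `MODALITIES`, the profile size, and the scoring rule are all assumptions.

```python
from collections import Counter

def char_trigrams(text):
    return [text[i:i + 3] for i in range(len(text) - 2)]

def words(text):
    return text.lower().split()

# Hypothetical modalities: each maps a text to a list of features.
MODALITIES = {"char3": char_trigrams, "word": words}

def profile(text, extract, size):
    """Set of the `size` most frequent features of one modality."""
    return {f for f, _ in Counter(extract(text)).most_common(size)}

def build_author_profiles(train_docs, size=500):
    """train_docs: dict mapping author -> list of documents.
    Concatenate each author's documents, then build one sub-profile per modality."""
    profiles = {}
    for author, docs in train_docs.items():
        text = " ".join(docs)
        profiles[author] = {m: profile(text, ext, size) for m, ext in MODALITIES.items()}
    return profiles

def attribute(doc, profiles, size=500):
    """Return the author whose sub-profiles share the most features with the
    test document's sub-profiles (intersection sizes summed over modalities)."""
    test = {m: profile(doc, ext, size) for m, ext in MODALITIES.items()}

    def score(author):
        return sum(len(test[m] & profiles[author][m]) for m in MODALITIES)

    return max(profiles, key=score)
```

Summing raw intersection sizes weights every modality equally. A real system might instead normalize or weight the per-modality scores.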
We present the first domain adaptation model for authorship attribution to leverage unlabeled data. The model includes extensions to structural correspondence learning needed to make it appropriate for the task. For example, we propose a median-based classification instead of the standard binary classification used in previous work. Our results show that …
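The following Python sketch shows one plausible reading of the median-based idea. In structural correspondence learning, each pivot feature gets an auxiliary classifier, and its training labels usually record whether the pivot occurs in a document. Frequent character n-grams occur in nearly every document, so those labels become almost constant. Thresholding at the feature's median count gives a more balanced split instead. The function names and the strict "greater than the median" rule are assumptions, not the paper's code.

```python
from statistics import median

def binary_labels(counts):
    """Presence-based pivot labels: 1 if the pivot feature occurs in the document."""
    return [1 if c > 0 else 0 for c in counts]

def median_labels(counts):
    """Hypothetical median-based pivot labels: 1 if the pivot feature occurs
    more often than its median count across the unlabeled documents."""
    m = median(counts)
    return [1 if c > m else 0 for c in counts]
```

Suppose a pivot n-gram has counts [12, 3, 7, 9, 5, 15] across six documents. `binary_labels` gives all ones, which leaves nothing to learn. The median is 8, so `median_labels` gives [1, 0, 0, 1, 0, 1].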