Michael Tschuggnall

The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised…
Several authorship analysis tasks require the decomposition of a multiauthored text into its authorial components. In this regard two basic prerequisites need to be addressed: (1) style breach detection, i.e., the segmenting of a text into stylistically homogeneous parts, and (2) author clustering, i.e., the grouping of paragraph-length texts by authorship.…
Knowledge is structured - until it is stored in a wiki-like information system. In this paper we present the multi-user system SnoopyDB, which preserves the structure of knowledge without restricting the type or schema of inserted information. A self-learning schema system and recommendation engine support the user during the process of inserting…
The aim of modern authorship attribution approaches is to analyze known authors and to assign authorships to previously unseen and unlabeled text documents based on various features. In this paper we present a novel feature to enhance current attribution methods by analyzing the grammar of authors. To extract the feature, a syntax tree of each sentence of a…
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and…
The task of intrinsic plagiarism detection is to find plagiarized sections within text documents without using a reference corpus. In this paper, the intrinsic detection approach Plag-Inn is presented, which is based on the assumption that authors use a recognizable and distinguishable grammar to construct sentences. The main idea is to analyze the grammar…
Unauthorized copying or stealing of the intellectual property of others is a serious problem in modern society. In the case of textual plagiarism, it is becoming increasingly easy to find appropriate sources using the huge amount of data available through online databases. To counter this problem, the two main approaches are categorized as external and intrinsic…