Michael Tschuggnall

The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised ...
Knowledge is structured - until it is stored in a wiki-like information system. In this paper we present the multi-user system SnoopyDB, which preserves the structure of knowledge without restricting the type or schema of inserted information. A self-learning schema system and recommendation engine support the user during the process of inserting ...
The aim of modern authorship attribution approaches is to analyze known authors and to assign authorships to previously unseen and unlabeled text documents based on various features. In this paper we present a novel feature to enhance current attribution methods by analyzing the grammar of authors. To extract the feature, a syntax tree of each sentence of ...
The task of intrinsic plagiarism detection is to find plagiarized sections within text documents without using a reference corpus. In this paper, the intrinsic detection approach Plag-Inn is presented, which is based on the assumption that authors use a recognizable and distinguishable grammar to construct sentences. The main idea is to analyze the grammar ...
Unauthorized copying or stealing of the intellectual property of others is a serious problem in modern society. In the case of textual plagiarism, it is becoming increasingly easy to find appropriate sources using the huge amount of data available through online databases. To counter this problem, the two main approaches are categorized as external and intrinsic ...
The task of text segmentation is to automatically split a text document into individual subparts, which differ according to specific measures. In this paper, an approach is presented that attempts to separate text sections of a collaboratively written document based on the grammar syntax of authors. The main idea is to quantify differences of the ...