Many plain text information hiding techniques demand deep semantic processing, and so suffer in reliability. In contrast, syntactic processing is a more mature and reliable technology. Assuming a perfect parser, this paper evaluates a set of automated and reversible syntactic transforms that can hide information in plain text without changing the meaning or…
Error-Correcting Output Coding (ECOC) is a general framework for multiclass text classification with a set of binary classifiers. It can not only help a binary classifier solve multi-class classification problems, but also boost the performance of a multi-class classifier. When building each individual binary classifier in ECOC, multiple classes are…
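The decoding side of ECOC can be illustrated with a minimal sketch. It is not taken from the paper: the three class labels, the five-column code matrix, and the function names are hypothetical, and the binary classifiers are stood in for by a tuple of already-predicted bits. Each class is assigned a codeword, each column corresponds to one binary classifier, and a document is labelled with the class whose codeword is closest in Hamming distance to the predicted bits.

```python
# Hypothetical code matrix: one codeword per class, one column per binary classifier.
# Every pair of codewords differs in at least 3 positions, so a single
# wrong binary prediction can still be decoded to the right class.
CODE_MATRIX = {
    "sports":   (1, 1, 0, 0, 1),
    "politics": (0, 1, 1, 0, 0),
    "science":  (1, 0, 1, 1, 0),
}


def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], predicted_bits))


# Outputs of the five (assumed) binary classifiers for one document.
# The fourth bit disagrees with the "sports" codeword, simulating one classifier error.
noisy_prediction = (1, 1, 0, 1, 1)
label = decode(noisy_prediction)
```

In this sketch the error-correcting behaviour comes only from the spacing between codewords; how the classes are grouped when training each binary classifier is a separate design choice that the sketch does not address.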
We present three natural language marking strategies based on fast and reliable shallow parsing techniques, and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of…
In this paper we present the approach we took in our participation in the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author's style.
In this paper we present the approach we took in our participation in the PAN 2013 Author Identification task. It relies on a complex process to select the features which represent the author's writing, using potentially multiple statistics and distance measures computed from the training set.
We compare two statistical methods for identifying spam or junk electronic mail. Spam filters are classifiers which determine whether an email is junk or not. The proliferation of spam email has made electronic filtering vitally important. The magnitude of the problem is discussed. We examine the Naive Bayesian method in relation to the 'Chi by degrees of…
Experiments on the detection of the source language of literary translations are described. Two feature types are exploited, n-gram based features and document-level statistics. Cross-validation results on a corpus of twenty 19th-century texts including translations from Russian, French, German and texts written in English are promising: single feature…
Declaration I hereby declare that this thesis is entirely my own work and that it has not been submitted as an exercise for a degree at any other university. Acknowledgements Thanks to Dr. Carl Vogel, my project supervisor, for being so supportive and helpful, to my friends for diverting me, and to these women especially, for all their love and supportive,…
We introduce the formal underpinnings of our theory of non-classical feature structures. The resulting expanded universe of feature structures has direct implications for robust parsing for linguistic theories founded upon feature theory. We present an implementation of a robust chart parser for Head-driven Phrase Structure Grammar (HPSG). The problem of…