Jonathon Read

Learn More
Sentiment Classification seeks to identify a piece of text according to its author’s general feeling toward their subject, be it positive or negative. Traditional machine learning techniques have been applied to this problem with reasonable success, but they have been shown to work well only when there is a good match between the training and test data with(More)
This article explores a combination of deep and shallow approaches to the problem of resolving the scope of speculation and negation within a sentence, specifically in the domain of biomedical research literature. The first part of the article focuses on speculation. After first showing how speculation cues can be accurately identified using a very simple(More)
An important sub-task of sentiment analysis is polarity classification, in which text is classified as being positive or negative. Supervised machine learning techniques can perform this task very effectively. However, they require a large corpus of training data, and a number of studies have demonstrated that the good performance of supervised models is(More)
The Appraisal framework is a theory of the language of evaluation, developed within the tradition of systemic functional linguistics. The framework describes a taxonomy of the types of language used to convey evaluation and position oneself with respect to the evaluations of other people. Accurate automatic recognition of these types of language can inform(More)
We review the state of the art in automated sentence boundary detection (SBD) for English and call for a renewed research interest in this foundational first step in natural language processing. We observe severe limitations in comparability and reproducibility of earlier work and a general lack of knowledge about genreand domain-specific variations. To(More)
We present the WeSearch Data Collection (WDC)—a freely redistributable, partly annotated, comprehensive sample of User-Generated Content. The WDC contains data extracted from a range of genres of varying formality (user forums, product review sites, blogs and Wikipedia) and covers two different domains (NLP and Linux). In this article, we describe the data(More)
Proper treatment of negation is an important characteristic of methods for sentiment analysis. However, while there is a growing body of research on the automatic resolution of negation, it is not yet clear as to how negation is best represented for different applications. To begin to address this issue, we review representation alternatives and present a(More)
In this work, we revisit Shared Task 1 from the 2012 *SEM Conference: the automated analysis of negation. Unlike the vast majority of participating systems in 2012, our approach works over explicit and formal representations of propositional semantics, i.e. derives the notion of negation scope assumed in this task from the structure of logical-form meaning(More)
Extracting textual content and document structure from PDF presents a surprisingly (depressingly, to some, in fact) difficult challenge, owing to the purely display-oriented design of the PDF document standard. While a variety of lower-level PDF extraction toolkits exist, none fully support the recovery of original text (in reading order) and relevant(More)