Julian Brooke

We present a lexicon-based approach to extracting sentiment from text. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. SO-CAL is applied to the polarity classification task, the process of assigning a positive or negative …
We explore the adaptation of English resources and techniques for text sentiment analysis to a new language, Spanish. Our main focus is the modification of an existing English semantic orientation calculator and the building of dictionaries; however we also compare alternate approaches, including machine translation and Support Vector Machine …
We present a syntactic and lexically based discourse segmenter (SLSeg) that is designed to avoid the common problem of over-segmenting text. Segmentation is the first step in a discourse parser, a system that constructs discourse trees from elementary discourse units. We compare SLSeg to a probabilistic segmenter, showing that a conservative approach …
We present an approach to extracting sentiment from texts that makes use of contextual information. Using two different approaches, we extract the most relevant sentences of a text, and calculate semantic orientation weighing those more heavily. The first approach makes use of discourse structure via Rhetorical Structure Theory, and extracts nuclei as the …
Previous approaches to the task of native language identification (Koppel et al., 2005) have been limited to small, within-corpus evaluations. Because these are restrictive and unreliable, we apply cross-corpus evaluation to the task. We demonstrate the efficacy of lexical features, which had previously been avoided due to the within-corpus topic confounds, …
We begin by showing that the best publicly available, multiple-L1 learner corpus, the International Corpus of Learner English (Granger et al. 2009), has serious issues when used for the task of native language detection (NLD). The topic biases in the corpus are a confounding factor that results in cross-validated performance that is misleading, for all the …
The task of native language (L1) identification suffers from a relative paucity of useful training corpora, and standard within-corpus evaluation is often problematic due to topic bias. In this paper, we introduce a method for L1 identification in second language (L2) texts that relies only on much more plentiful L1 data, rather than the L2 texts that are …
The transmembrane precursor of the monkey (Mk) heparin-binding, epidermal growth factor-like growth factor (proHB-EGF) functions as a diphtheria toxin (DT) receptor, whereas the mouse (Ms) precursor does not. Previously, using chimeric Ms/Mk precursors, we have shown that DT resistance of cells bearing Ms proHB-EGF may be accounted for by several amino acid …