Brendan T. O’Connor

We present improvements to a Twitter part-of-speech tagger, making use of several new features and large-scale word clustering. With these changes, the tagging accuracy increased from 89.2% to 92.8% and the tagging speed is 40 times faster. In addition, we expanded our Twitter tokenizer to support a broader range of Unicode characters, emoticons, and URLs.
Recent advances in research tools for the systematic analysis of textual data are enabling exciting new research throughout the social sciences. For comparative politics scholars, who are often interested in non-English and possibly multilingual textual datasets, these advances may be difficult to access. This article discusses practical issues that arise in ...
BACKGROUND: Mucohaemorrhagic diarrhea caused by Brachyspira hyodysenteriae, swine dysentery, is a severe production-limiting disease of swine. Recently, pigs in western Canada with clinical signs indistinguishable from swine dysentery were observed. Despite the presence of spirochetes on fecal smears, recognized Brachyspira spp. including B. hyodysenteriae ...
Across many disciplines, interest is increasing in the use of computational text analysis in the service of social science questions. We survey the spectrum of current methods, which lie on two dimensions: (1) computational and statistical model complexity; and (2) domain assumptions. This comparative perspective suggests directions of research to better ...
In this paper I describe a preliminary experimental system, MITEXTEXPLORER, for textual linked brushing, which allows an analyst to interactively explore statistical relationships between (1) terms, and (2) document metadata (covariates). An analyst can graphically select documents embedded in a temporal, spatial, or other continuous space, and the tool ...
Statistical models of text have become increasingly popular in statistics and computer science as a method of exploring large document collections. Social scientists often want to move beyond exploration, to measurement and experimentation, and make inference about social and political processes that drive discourse and content. In this paper, we develop a ...
The field of Music Information Retrieval (MIR) draws from musicology, signal processing, and artificial intelligence. A long line of work addresses problems including: music understanding (extract the musically-meaningful information from audio waveforms), automatic music annotation (measuring song and artist similarity), and other problems. However, very ...
We describe the cloning, expression and purification of the bovine XM866409 form of pyroglutamyl peptidase type-1 (PAP1). The cloned nucleotide sequence has an ORF coding for a primary sequence of 209 amino acid residues, which displays 98% identity with the human AJ278828 form of the enzyme. Three amino acid residues at positions 81, 205 and 208 were found ...