Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, ...
We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging ...
We address the problem of part-of-speech tagging for English data from the popular micro-blogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related ...
We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the ...
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic ...
With Twitter and Facebook blocked in China, the stream of information from Chinese domestic social media provides a case study of social media behavior under the influence of active censorship. While much work has looked at efforts to prevent access to information in China (including IP blocking of foreign websites or search engine filtering), we present ...
We present TweetMotif, an exploratory search application for Twitter. Unlike traditional approaches to information retrieval, which present a simple list of messages, TweetMotif groups messages by frequent significant terms (a result set's subtopics), which facilitate navigation and drilldown through a faceted search interface. The topic extraction system ...
Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level ...
One hundred consecutive patients over the age of 60 years with unstable fractures of the ankle were retrospectively reviewed. Fifty were treated operatively and 50 nonoperatively. The mean follow-up was 7 years (2-16 years). Satisfactory reduction was a prerequisite in both groups. Patient satisfaction with regard to pain, deformity, and stability was ...
We measured fracture stiffness in 212 patients with tibial fractures treated by external fixation. In the first 117 patients (group 1) the decision to remove the fixator and allow independent weight-bearing was made on clinical grounds. In the other 95 patients (group 2) the frames were removed when the fracture stiffness had reached 15 Nm/degree. In group ...