Exploring the utilization of places through a scalable “Activities in Places” analysis mechanism
Social media offers a wealth of insight into how significant events -- such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing -- affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection by leveraging a fast feature selection algorithm called CWC. While we begin with word count vectors of authors and words for each time slot (in our case, every hour), we extract discriminative words from each slot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance between each time slot and the initial point remains high while an event is happening, yet declines sharply when the event ends, offering an accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrate our method's effectiveness, we extract events from a dataset of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. With CWC, we can identify events from this dataset with great speed and accuracy.
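The distance-from-the-initial-point idea can be illustrated with a minimal sketch. It does not implement CWC: the `vocabulary` argument simply stands in for whatever discriminative words the feature selection step returns. The use of cosine distance, the fixed `threshold`, and all function names are assumptions made for illustration, since the text above does not specify them.

```python
import math
from collections import Counter


def slot_vector(tokens, vocabulary):
    """Count only the selected words in one time slot (e.g. one hour of tweets)."""
    selected = set(vocabulary)
    counts = Counter(t for t in tokens if t in selected)
    return [counts[w] for w in vocabulary]


def cosine_distance(u, v):
    """Cosine distance; an assumed metric, not necessarily the one used in the paper."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 and norm_v == 0:
        return 0.0  # two empty slots are treated as identical
    if norm_u == 0 or norm_v == 0:
        return 1.0  # one empty slot shares nothing with a non-empty one
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_series(slots, vocabulary):
    """Distance of every slot's vector from the first slot (the initial point)."""
    vectors = [slot_vector(tokens, vocabulary) for tokens in slots]
    origin = vectors[0]
    return [cosine_distance(origin, v) for v in vectors]


def event_spans(series, threshold):
    """Return (start, end) slot indices where the distance stays above threshold."""
    spans, start = [], None
    for i, d in enumerate(series):
        if d > threshold and start is None:
            start = i  # distance rises: a candidate event begins
        elif d <= threshold and start is not None:
            spans.append((start, i - 1))  # distance falls back: the event ends
            start = None
    if start is not None:
        spans.append((start, len(series) - 1))
    return spans
```

Given a list of tokenized hourly slots, `event_spans(distance_series(slots, vocabulary), threshold)` returns the runs of consecutive slots whose word usage departs from the first slot. In practice the threshold would need tuning against the data.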