Social media provides a rich environment for understanding social connections, interactions, and information sharing across many aspects of society. Because social media companies provide relatively easy access to their data through application programming interfaces (APIs), a significant number of studies have attempted to understand how social media fits into society and how the public uses it for discourse and information sharing. One gap in these studies is the lack of an extensive description of their data collection and processing methods. This gap results from word limits in existing publication venues and from a shortage of venues suited to sharing this kind of foundational research. This paper describes in detail how a 52-million-tweet corpus of Twitter data on the 2012 United States Presidential Election was collected, parsed, and analyzed. This level of detail is imperative in studies of social media, because small choices about which data to collect can have a material effect on the findings. In addition to describing these methods, the paper contributes to knowledge by reporting basic characteristics of one of the largest research datasets of social media activity compiled to study political discourse.