Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter

@article{Alshaabi2021StorywranglerAM,
  title={Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter},
  author={T. Alshaabi and Jane Lydia Adams and Michael Vincent Arnold and Joshua R. Minot and David Rushing Dewhurst and Andrew J. Reagan and Christopher M. Danforth and Peter Sheridan Dodds},
  journal={Science Advances},
  year={2021},
  volume={7}
}
We present Storywrangler, an interactive cultural exploratorium of phrase popularity using 100 billion tweets in 100 languages. In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing curation of over 100 billion tweets… 
Computational timeline reconstruction of the stories surrounding Trump: Story turbulence, narrative control, and collective chronopathy
TLDR
It is shown that 2017 was the most turbulent overall year for Trump, and story turbulence and collective chronopathy (the rate at which a population’s stories for a subject seem to change over time) are quantified.
The growing amplification of social media: measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020
TLDR
It is found that for the most common languages on Twitter there is a growing tendency, though not universal, to retweet rather than share new content, and it is shown that over time, the contagion ratios for the most common languages are growing more strongly than those of rare languages.
Long-term word frequency dynamics derived from Twitter are corrupted: A bespoke approach to detecting and removing pathologies in ensembles of time series
TLDR
It is shown that around 10% of day-scale word usage frequency time series for Twitter collected in real time for a set of roughly 10,000 frequently used words for over 10 years come from tweets with, in effect, corrupted language labels.
Hurricanes and hashtags: Characterizing online collective attention for natural disasters
TLDR
It is found that a hurricane’s Saffir-Simpson wind scale category assignment is strongly associated with the amount of attention it receives, and higher-category storms receive higher proportional increases of attention per proportional increase in number of deaths or dollars of damage than lower-category storms.
How the world’s collective attention is being paid to a pandemic: COVID-19 related n-gram time series for 24 languages on Twitter
TLDR
A set of 2000 day-scale time series of 1- and 2-grams across 24 languages on Twitter, those most ‘important’ for April 2020 with respect to April 2019, is curated; the language-specific word for ‘virus’ is found to peak in January 2020, decline through February, and then surge through March and April.
Computational Paremiology: Charting the temporal, ecological dynamics of proverb use in books, news articles, and tweets
Ethan Davis, Christopher M. Danforth, Wolfgang Mieder, and Peter Sheridan Dodds. Computational Story Lab, Vermont Complex Systems Center, MassMutual Center of Excellence for Complex
Blending search queries with social media data to improve forecasts of economic indicators
TLDR
This paper presents a meta-modelling framework that automates the labor-intensive, time-consuming, and expensive process of modeling in complex systems and data science.
Quantifying language changes surrounding mental health on Twitter
Anne Marie Stupinski, Thayer Alshaabi, Michael V. Arnold, Jane Lydia Adams, Joshua R. Minot, Matthew Price, Peter Sheridan Dodds, and Christopher M. Danforth. Computational Story Lab,
The incel lexicon: Deciphering the emergent cryptolect of a global misogynistic community
Kelly Gothard, David Rushing Dewhurst, Joshua R. Minot, Jane Lydia Adams, Christopher M. Danforth, and Peter Sheridan Dodds. Computational Story Lab, Vermont Complex Systems Center,
Doomscrolling during COVID-19: The negative association between daily social and traditional media consumption and mental health symptoms during the COVID-19 pandemic
Consumption of traditional and social media markedly increased at the start of the COVID-19 pandemic as new information about the virus and safety guidelines evolved. Much of the information

References

Showing 1–10 of 124 references
Computational timeline reconstruction of the stories surrounding Trump: Story turbulence, narrative control, and collective chronopathy
TLDR
It is shown that 2017 was the most turbulent overall year for Trump, and story turbulence and collective chronopathy (the rate at which a population’s stories for a subject seem to change over time) are quantified.
The growing amplification of social media: measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020
TLDR
It is found that for the most common languages on Twitter there is a growing tendency, though not universal, to retweet rather than share new content, and it is shown that over time, the contagion ratios for the most common languages are growing more strongly than those of rare languages.
A Survey of Location Prediction on Twitter
TLDR
An overall picture of location prediction on Twitter is offered, concentrating on the prediction of user home locations, tweet locations, and mentioned locations, which defines the three tasks and reviews the evaluation metrics.
Long-term word frequency dynamics derived from Twitter are corrupted: A bespoke approach to detecting and removing pathologies in ensembles of time series
TLDR
It is shown that around 10% of day-scale word usage frequency time series for Twitter collected in real time for a set of roughly 10,000 frequently used words for over 10 years come from tweets with, in effect, corrupted language labels.
Scaling in words on Twitter
TLDR
This work investigates the scaling relations in citywise Twitter corpora coming from the metropolitan and micropolitan statistical areas of the United States and finds that a certain core vocabulary follows the scaling relationship of that of the bulk text, but most words are sensitive to city size, exhibiting a super- or a sublinear urban scaling.
Online social networks and offline protest
TLDR
It is shown that increased coordination of messages on Twitter using specific hashtags is associated with increased protests the following day, and that traditional actors like the media and elites are not driving the results.
Proceedings of the First Workshop on Social Media Analytics
TLDR
Though there is a vast quantity of information available, the consequent challenge is to be able to analyze the large volumes of user-generated content and the implicit (or explicit) links between users, in order to glean meaningful insights therein.
Tampering with Twitter’s Sample API
TLDR
It is demonstrated that, due to the nature of Twitter’s sampling mechanism, it is possible to deliberately influence these samples, the extent and content of any topic, and consequently to manipulate the analyses of researchers, journalists, as well as market and political analysts trusting these data sources.
A systematic identification and analysis of scientists on Twitter
TLDR
This work provides new methods for disambiguating and identifying particular actors on social media and describing the behaviors of scientists, thus providing foundational information for the construction and use of indicators on the basis of social media metrics.
Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution
TLDR
Overall, the findings call into question the vast majority of existing claims drawn from the Google Books corpus, and point to the need to fully characterize the dynamics of the corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.