Estimating county health statistics with Twitter

Abstract

Understanding the relationships among environment, behavior, and health is a core concern of public health researchers. While a number of recent studies have investigated the use of social media to track infectious diseases such as influenza, little work has been done to determine if other health concerns can be inferred. In this paper, we present a large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates. We perform a linguistic analysis of the Twitter activity in the top 100 most populous counties in the U.S., and find a significant correlation with 6 of the 27 health statistics. When compared to traditional models based on demographic variables alone, we find that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.
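The correlation analysis described above can be illustrated with a minimal sketch. It assumes Pearson's r as the correlation measure, which the abstract does not specify. The function name `pearson_r`, the variable names, and all data values are hypothetical and are not taken from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, one entry per county (NOT results from the paper):
# relative frequency of some word category in a county's tweets, and a
# county-level health statistic such as obesity rate in percent.
word_rate = [0.012, 0.008, 0.015, 0.005, 0.010]
obesity_pct = [31.0, 27.5, 33.2, 24.1, 29.0]

r = pearson_r(word_rate, obesity_pct)
print(f"r = {r:.3f}")
```

In a study of this kind, such a coefficient would be computed for each pairing of linguistic feature and health statistic, followed by a significance test. This sketch leaves that test out.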

DOI: 10.1145/2556288.2557139

Cite this paper

@inproceedings{Culotta2014EstimatingCH,
  title={Estimating county health statistics with Twitter},
  author={Aron Culotta},
  booktitle={CHI},
  year={2014}
}