Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features

  title={Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features},
  author={Thin Nguyen and Duc Thanh Nguyen and Mark Erik Larsen and Bridianne O’Dea and John Yearwood and Dinh Q. Phung and Svetha Venkatesh and Helen Christensen},
  journal={Proceedings of the 26th International Conference on World Wide Web Companion},
Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. [...] The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features.

SPDF: Set Probabilistic Distance Features for Prediction of Population Health Outcomes via Social Media
Experimental results show that the proposed approach achieves state-of-the-art performance on linguistic style features in prediction of all health indices and in both case studies.
Estimating County Health Indices Using Graph Neural Networks
This paper introduces a graph modeling method that represents each county as a graph of interactions between health-related features in the community, and adopts a graph neural network, followed by a regression layer, to learn the population health representation and estimate the health indices.
Using Social Media for Mental Health Surveillance
Big data research of social media data may also support standard surveillance approaches and provide decision-makers with usable information about users' habits and activities.
Using Twitter Social Media for Depression Detection in the Canadian Population
This research uses personal narratives collected from users with self-reported depression to build a model suitable for predicting depression in a sample of Twitter users that is representative of the Canadian population.
Population-level Indicators of Physical Activity, Sedentary Behaviour and Sleep in Canada based on Twitter
This thesis addresses challenges in building systems that collect high volumes of data from social media platforms and shows how machine learning can be used to complement public health data and better inform health policy makers to improve the lives of Canadians.
Twitter-based Influenza Surveillance: An Analysis of the 2016-2017 and 2017-2018 Seasons in Italy
There is a strict correlation between the reports published on the InfluNet system and the content posted by Twitter users about their symptoms and health state; the sentiment people express about treatment, in terms of the medicines taken to recover, appears rather negative.
How people talk about health?: Detecting Health Topics from Twitter Streams
The paper proposes an online clustering algorithm for detecting health-related topics that is capable of grouping tweets addressing common health issues into the pertinent topic, outperforming traditional topic model approaches, like Doc-p and LDA.
Exploring the digital footprint of depression: a PRISMA systematic literature review of the empirical evidence
Background: This PRISMA systematic literature review examined the use of digital data collection methods (including ecological momentary assessment [EMA], experience sampling method [ESM], digital
On the Use of Textual and Visual Data from Online Social Networks for Predicting Community Health
  • Hang Le, Hung Nguyen
  • Computer Science
    2020 International Conference on Advanced Computing and Applications (ACOMP)
  • 2020
This paper proposes two types of population health representations extracted from social media photos, in particular color histograms - a hand-crafted visual feature set, and automatic features learned from a deep convolutional neural network.
A graph-based approach for population health analysis using Geo-tagged tweets
Experimental results verified the robustness of the proposed approach and its superiority over existing ones in both case studies, confirming the potential of graph-based approaches for modeling interactions in social networks for population health analysis.


Estimating county health statistics with twitter
A large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates, finds that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.
Could behavioral medicine lead the web data revolution?
Web data are potentially the only source for real-time insights into behavioral medicine, where web data can be available almost immediately compared to a 365-day lag time between annual surveys, and can be an important source for identifying new hypotheses.
Predicting Depression via Social Media
It is found that social media contains useful signals for characterizing the onset of depression in individuals, as measured through decrease in social activity, raised negative affect, highly clustered egonetworks, heightened relational and medicinal concerns, and greater expression of religious involvement.
Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset
The study finds that regional prevalence estimates for non-communicable diseases can be reasonably predicted, and it stresses the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.
Future-oriented tweets predict lower county-level HIV prevalence in the United States.
OBJECTIVE: Future orientation promotes health and well-being at the individual level. Computerized text analysis of a dataset encompassing billions of words used across the United States on Twitter
Psychological Language on Twitter Predicts County-Level Heart Disease Mortality
Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level.
Characterizing Geographic Variation in Well-Being Using Tweets
The language used in tweets from 1,300 different US counties was found to be predictive of the subjective well-being of people living in those counties as measured by representative surveys. Topics,
Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak.
This work assessed the correlation of the volume of cholera-related HealthMap news media reports, Twitter postings, and government cholera cases reported in the first 100 days of the 2010 Haitian cholera outbreak, finding that trends in the volume of informal sources were significantly correlated in time with official case data and were available up to 2 weeks earlier.
The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic
The use of information embedded in the Twitter stream is examined to (1) track rapidly-evolving public sentiment with respect to H1N1 or swine flu, and (2) track and measure actual disease activity.
Characterizing Sleep Issues Using Twitter
A novel method for studying sleep issues is demonstrated that allows for fast, cost-effective, and customizable data to be gathered and indicates a possible relationship between sleep and psychosocial issues.