Construction of a Personal Experience Tweet Corpus for Health Surveillance

@inproceedings{Jiang2016ConstructionOA,
  title={Construction of a Personal Experience Tweet Corpus for Health Surveillance},
  author={Keyuan Jiang and Ricardo A. Calix and Matrika Gupta},
  booktitle={BioNLP@ACL},
  year={2016}
}
Studies have shown that Twitter can be used for health surveillance, and personal experience tweets (PETs) are an important source of information for it. Mining Twitter data requires a relatively balanced corpus, and constructing such a corpus is challenging because annotating large data sets is labor-intensive. We developed a bootstrap method that finds PETs using a machine learning-based filter. Through a few iterations, our approach can efficiently…
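The abstract describes the bootstrap method only at a high level. What follows is a minimal sketch of such a loop under assumptions that do not come from the paper:

- The machine learning-based filter is replaced by a from-scratch multinomial Naive Bayes classifier over whitespace tokens.
- `annotate` is a hypothetical callback standing in for human annotators.
- `bootstrap`, `threshold`, `batch` and the example tweets are illustrative names and data only.

Each iteration retrains the filter on all labeled tweets, ranks the unlabeled pool by predicted PET probability, and sends only the top candidates to annotation.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase whitespace tokenization.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (1 = PET, 0 = non-PET).

    A stand-in for the paper's machine learning-based filter, not its actual model.
    """

    def fit(self, texts, labels):
        self.docs = Counter(labels)
        self.counts = {0: Counter(), 1: Counter()}
        for text, y in zip(texts, labels):
            self.counts[y].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def prob_pet(self, text):
        """Posterior probability that a tweet is a personal experience tweet."""
        n, v = sum(self.docs.values()), len(self.vocab)
        log_post = {}
        for y in (0, 1):
            lp = math.log((self.docs[y] + 1) / (n + 2))  # smoothed class prior
            for tok in tokenize(text):
                lp += math.log((self.counts[y][tok] + 1) / (self.totals[y] + v))
            log_post[y] = lp
        # Normalize in log space to avoid underflow on long tweets.
        m = max(log_post.values())
        p0, p1 = (math.exp(log_post[y] - m) for y in (0, 1))
        return p1 / (p0 + p1)


def bootstrap(seed_texts, seed_labels, pool, annotate, iterations=3, threshold=0.5, batch=2):
    """Grow a labeled corpus by filtering an unlabeled pool for likely PETs.

    annotate(text) -> 0 or 1 is a hypothetical stand-in for human annotators.
    """
    texts, labels, pool = list(seed_texts), list(seed_labels), list(pool)
    for _ in range(iterations):
        if not pool:
            break
        clf = NaiveBayes().fit(texts, labels)  # retrain on everything labeled so far
        scored = sorted(((clf.prob_pet(t), t) for t in pool), reverse=True)
        # Only likely PETs go to annotators, enriching the (presumably rarer) PET class.
        picked = [t for p, t in scored[:batch] if p >= threshold]
        if not picked:
            break
        for t in picked:
            texts.append(t)
            labels.append(annotate(t))
            pool.remove(t)
    return texts, labels


# Illustrative usage with made-up tweets and hypothetical gold labels.
seed_texts = ["i took ibuprofen and my headache went away",
              "ibuprofen price drop at the pharmacy"]
seed_labels = [1, 0]
gold = {
    "i took aspirin and my headache is gone": 1,
    "aspirin stock news today": 0,
    "my stomach hurts since i took the pill": 1,
    "new study on aspirin published": 0,
}
texts, labels = bootstrap(seed_texts, seed_labels, list(gold), gold.__getitem__,
                          iterations=2, batch=1)
for t, y in zip(texts, labels):
    print(y, t)
```

This design annotates only the highest-ranked candidates. That is one way to keep the manual effort small while raising the share of PETs in the corpus. A real pipeline would likely use a stronger classifier and larger batches.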


Identifying personal health experience tweets with deep neural networks
TLDR
This study designed deep neural networks with three different architectural configurations. It trained them on a corpus of 8,770 annotated tweets and used them to predict personal experience tweets in a set of 821 annotated tweets. Deep neural networks predicted personal health experience tweets markedly better than conventional classifiers.
Semi-Supervised Language Models for Identification of Personal Health Experiential from Twitter Data: A Case for Medication Effects
TLDR
This work investigated two semi-supervised learning methods, using different mixes of labeled and unlabeled data in the training set, to understand their impact on classification performance. Both methods produced a noticeable improvement in F1 score when the labeled set was small. Consistency regularization still provided a small gain even when a larger labeled set was used.
Identifying tweets of personal health experience through word embedding and LSTM neural network
TLDR
This study presented an efficient and effective method of identifying health-related personal experience tweets by combining word embedding with an LSTM neural network. The method outperforms conventional methods in identifying PETs.
Deep gramulator: Improving precision in the classification of personal health-experience tweets with deep learning
TLDR
Several machine learning algorithms, including deep neural networks, are used to build classifiers that help detect Personal Experience Tweets (PETs). A method called the Deep Gramulator is proposed that improves classification precision.
Prediction of Personal Experience Tweets of Medication Use via Contextual Word Representations
TLDR
This study investigated a method of predicting personal experience tweets using Google's Bidirectional Encoder Representations from Transformers (BERT) and neural networks, with BERT models providing contextual representations of the tweet text. The results showed that the trained BERT model performs better than Google's pre-trained models.
Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating
TLDR
This study used three methods based on Facebook's Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use. These methods outperformed the published methods (word embedding + LSTM) in classification performance.
Classifying Patient and Professional Voice in Social Media Health Posts
TLDR
The main conclusion of this work is that training a patient and professional voice classifier on combined data from social media platforms with different characteristics does not yield the best performance. It is better to train separate models per data source (Reddit and Twitter) than a single model on the combined data from both sources.
Classifying patient and professional voice in social media health posts
TLDR
It is better to train separate models per data source (Reddit and Twitter) than a single model on the combined training data from both sources. Training separate models per domain (cardiovascular and skin) is also preferable, although the difference from the combined model is only minor.
Assessment of Word Embedding Techniques for Identification of Personal Experience Tweets Pertaining to Medication Uses
TLDR
Word embedding-based classification methods consistently outperformed engineered feature-based classification methods, with statistical significance (p < 0.01). However, there were no statistically significant differences among the four word embedding methods studied.

References

Showing 1-10 of 43 references
Mining Twitter for Adverse Drug Reaction Mentions: A Corpus and Classification Benchmark
TLDR
A freely available, manually annotated corpus of 10,822 tweets is presented, which can be used to train automated tools to mine Twitter for adverse drug reactions (ADRs), and the utility of the corpus is evaluated by training two classes of machine learning algorithms: Naïve Bayes and Support Vector Machines.
Influenza-Like Illness Surveillance on Twitter through Automated Learning of Naïve Language
TLDR
A minimally trained algorithm is developed that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. It then translates an influenza case definition into a Boolean query in which each symptom is described by a technical term.
OMG U got flu? Analysis of shared health messages for bio-surveillance
TLDR
The study adds to evidence supporting a high degree of correlation between pre-diagnostic social media signals and diagnostic influenza case data, pointing the way towards low cost sensor networks.
Mining Twitter Data to Improve Detection of Schizophrenia
TLDR
This work leveraged the large corpus of Twitter posts and machine-learning methodologies to detect individuals with schizophrenia, using features from tweets such as emoticon use, posting time of day, and dictionary terms.
Mining Twitter Data for Potential Drug Effects
TLDR
This work developed a computational approach that collects, processes and analyzes Twitter data for drug effects, and uses NLM’s MetaMap software to recognize and extract word phrases that belong to drug effects.
National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic
TLDR
The authors recently developed an influenza infection detection algorithm that automatically distinguishes relevant tweets from other chatter. It detects the weekly change in direction (increasing or decreasing) of influenza prevalence with 85% accuracy, a nearly twofold increase over a simpler model. This demonstrates the utility of explicitly distinguishing infection tweets from other chatter.
Utilizing social media data for pharmacovigilance: A review
Towards large-scale twitter mining for drug-related adverse events
TLDR
An approach is described that finds drug users and potential adverse events by analyzing the content of Twitter messages with Natural Language Processing (NLP) and building Support Vector Machine (SVM) classifiers. The results suggest that daily-life social networking data could help in the early detection of important patient safety issues.
The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis Using Tweets
TLDR
This study demonstrates that not only does keyword choice play an important role in how well tweets correlate with disease occurrence, but that the subgroup of tweets used for analysis is also important.
Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak
TLDR
Twitter can be used for real-time content analysis and knowledge translation research, allowing health authorities to respond to public concerns, and illustrates the potential of using social media to conduct “infodemiology” studies for public health.