Construction of a Personal Experience Tweet Corpus for Health Surveillance
@inproceedings{Jiang2016ConstructionOA, title={Construction of a Personal Experience Tweet Corpus for Health Surveillance}, author={Keyuan Jiang and Ricardo A. Calix and Matrika Gupta}, booktitle={BioNLP@ACL}, year={2016} }
Studies have shown that Twitter can be used for health surveillance, and personal experience tweets (PETs) are an important source of information for health surveillance. Mining Twitter data requires a relatively balanced corpus, and constructing such a corpus is challenging because annotating large data sets is labor-intensive. We developed a bootstrap method of finding PETs with the use of a machine learning-based filter. Through a few iterations, our approach can efficiently…
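The bootstrap idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated and does not say which classifier serves as the filter, so this example substitutes a tiny Naive Bayes word model; the function names (`train_filter`, `pet_score`, `bootstrap`), the `annotate` callback standing in for a human annotator, and the toy tweets are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def train_filter(labeled):
    """Fit a tiny Naive Bayes word model on (text, is_pet) pairs (assumed filter)."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_pet in labeled:
        counts[is_pet].update(text.lower().split())
        docs[is_pet] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def pet_score(model, text):
    """Log-odds that a tweet is a PET, using add-one smoothing."""
    counts, docs, vocab = model
    total_pet = sum(counts[True].values()) + len(vocab) + 1
    total_non = sum(counts[False].values()) + len(vocab) + 1
    score = math.log((docs[True] + 1) / (docs[False] + 1))
    for word in text.lower().split():
        p_pet = (counts[True][word] + 1) / total_pet
        p_non = (counts[False][word] + 1) / total_non
        score += math.log(p_pet / p_non)
    return score


def bootstrap(seed, unlabeled, annotate, rounds=3, batch=2):
    """Grow a labeled corpus: score the pool, annotate top candidates, retrain."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train_filter(labeled)
        # Highest-scoring tweets are the most PET-like under the current filter.
        pool.sort(key=lambda t: pet_score(model, t), reverse=True)
        candidates, pool = pool[:batch], pool[batch:]
        # annotate() stands in for manual labeling of the filtered candidates.
        labeled.extend((t, annotate(t)) for t in candidates)
    return labeled


# Toy data, invented for illustration only.
seed = [
    ("i took ibuprofen and my headache is gone", True),
    ("new study on ibuprofen released today", False),
]
unlabeled = [
    "my headache is gone after i took aspirin",
    "study released on aspirin sales",
    "i took advil and i feel sick",
    "pharmacy opens new store today",
]
corpus = bootstrap(seed, unlabeled,
                   annotate=lambda t: t.startswith(("i ", "my ")),
                   rounds=1, batch=2)
```

Sending only the top-scoring tweets to annotation is what keeps the labeling effort small and the resulting corpus less skewed toward non-PETs; a real setup would replace the lambda with human judgments and the word model with whatever filter is preferred.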
22 Citations
Identifying personal health experience tweets with deep neural networks
- Computer Science
- 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
- 2017
This study designed deep neural networks with 3 different architectural configurations and, after training them with a corpus of 8,770 annotated tweets, used them to predict personal experience tweets from a set of 821 annotated tweets, demonstrating a significant improvement in predicting personal health experience tweets by deep neural networks over conventional classifiers.
Semi-Supervised Language Models for Identification of Personal Health Experiential from Twitter Data: A Case for Medication Effects
- Computer Science
- BIONLP
- 2021
This work investigated two semi-supervised learning methods, with different mixes of labeled and unlabeled data in the training set, to understand the impact on classification performance. It found that both methods generated a noticeable improvement in F1 score when the labeled set was small, and that consistency regularization could still provide a small gain even when a larger labeled set was used.
Identifying tweets of personal health experience through word embedding and LSTM neural network
- Computer Science
- BMC Bioinformatics
- 2018
This study presented an efficient and effective method of identifying health-related personal experience tweets by combining word embedding and an LSTM neural network that outperforms the conventional methods in identifying PETs.
Deep gramulator: Improving precision in the classification of personal health-experience tweets with deep learning
- Computer Science
- 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
- 2017
Several machine learning algorithms including deep neural nets are used to build classifiers that can help to detect Personal Experience Tweets (PETs) and a method called the Deep Gramulator is proposed that improves results.
COVID-19 personal health mention detection from tweets using dual convolutional neural network
- Computer Science
- Expert Systems with Applications
- 2022
Prediction of Personal Experience Tweets of Medication Use via Contextual Word Representations
- Computer Science, Business
- 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
- 2019
This study investigated a method of predicting personal experience tweets using Google’s Bidirectional Encoder Representations from Transformers (BERT) and neural networks, in which BERT models contextually represented the tweet text, and showed that the trained BERT model performs better than Google’s pre-trained models.
Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating
- Computer Science
- LOUHI
- 2020
This study used three methods based on Facebook’s Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use, outperforming the published methods (Word Embedding + LSTM) in classification performance.
Classifying Patient and Professional Voice in Social Media Health Posts
- Computer Science
- 2021
The main conclusion of this work is that combining social media data from platforms with different characteristics to train a patient and professional voice classifier does not yield the best possible performance; it is better to train separate models per data source (Reddit and Twitter) than a single model on the combined data from both sources.
Classifying patient and professional voice in social media health posts
- Computer Science
- BMC Medical Informatics and Decision Making
- 2021
It is shown that it is best to train separate models per data source (Reddit and Twitter) instead of a model using the combined training data from both sources and it is preferable to train different models per domain (cardiovascular and skin) while showing that the difference to the combined model is only minor.
Assessment of Word Embedding Techniques for Identification of Personal Experience Tweets Pertaining to Medication Uses
- Computer Science
- Precision Health and Medicine
- 2020
It is discovered that word embedding-based classification methods consistently outperform the engineered feature-based classification methods with statistical significance of p < 0.01, but there are no statistically significant differences among the 4 word embedding methods studied.
References
Mining Twitter for Adverse Drug Reaction Mentions: A Corpus and Classification Benchmark
- Computer Science
- 2014
A freely available, manually annotated corpus of 10,822 tweets is presented, which can be used to train automated tools to mine Twitter for adverse drug reactions (ADRs), and the utility of the corpus is evaluated by training two classes of machine learning algorithms: Naïve Bayes and Support Vector Machines.
Influenza-Like Illness Surveillance on Twitter through Automated Learning of Naïve Language
- Computer Science
- PloS one
- 2013
A minimally trained algorithm is developed that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term and translates an influenza case definition into a Boolean query, resulting in each symptom being described by a technical term.
OMG U got flu? Analysis of shared health messages for bio-surveillance
- Biology
- Semantic Mining in Biomedicine
- 2010
The study adds to evidence supporting a high degree of correlation between pre-diagnostic social media signals and diagnostic influenza case data, pointing the way towards low cost sensor networks.
Mining Twitter Data to Improve Detection of Schizophrenia
- Psychology, Computer Science
- AMIA Joint Summits on Translational Science proceedings
- 2015
This work leveraged the large corpus of Twitter posts and machine-learning methodologies to detect individuals with schizophrenia, using features from tweets such as emoticon use, posting time of day, and dictionary terms.
Mining Twitter Data for Potential Drug Effects
- Computer Science
- ADMA
- 2013
This work developed a computational approach that collects, processes and analyzes Twitter data for drug effects, and uses NLM’s MetaMap software to recognize and extract word phrases that belong to drug effects.
National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic
- Medicine
- PloS one
- 2013
The authors' recently developed influenza infection detection algorithm, which automatically distinguishes relevant tweets from other chatter, detects the weekly change in direction (increasing or decreasing) of influenza prevalence with 85% accuracy, a nearly twofold increase over a simpler model, demonstrating the utility of explicitly distinguishing infection tweets from other chatter.
Utilizing social media data for pharmacovigilance: A review
- Computer Science, Medicine
- J. Biomed. Informatics
- 2015
Towards large-scale twitter mining for drug-related adverse events
- Computer Science
- SHB '12
- 2012
An approach to find drug users and potential adverse events by analyzing the content of twitter messages utilizing Natural Language Processing (NLP) and to build Support Vector Machine (SVM) classifiers is described, suggesting that daily-life social networking data could help early detection of important patient safety issues.
The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis Using Tweets
- Medicine
- Journal of medical Internet research
- 2013
This study demonstrates that not only does keyword choice play an important role in how well tweets correlate with disease occurrence, but that the subgroup of tweets used for analysis is also important.
Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak
- Medicine
- PloS one
- 2010
This study shows that Twitter can be used for real-time content analysis and knowledge translation research, allowing health authorities to respond to public concerns, and illustrates the potential of using social media to conduct “infodemiology” studies for public health.