Weibo-COV: A Large-Scale COVID-19 Social Media Dataset from Weibo

@article{Hu2020WeiboCOVAL,
  title={Weibo-COV: A Large-Scale COVID-19 Social Media Dataset from Weibo},
  author={Yong Hu and Heyan Huang and Anfan Chen and Xianling Mao},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.09174}
}
With the rapid development of COVID-19, people are asked to maintain "social distance" and "stay at home". In this scenario, more and more social interactions move online, especially on social media like Twitter and Weibo. People post tweets to share information, express opinions and seek help during the pandemic, and these tweets on social media are valuable for studies against COVID-19, such as early warning and outbreak detection. Therefore, in this paper, we release a novel large-scale…

Characterizing Weibo Social Media Posts From Wuhan, China During the Early Stages of the COVID-19 Pandemic: Qualitative Content Analysis
TLDR
Between the announcement of pneumonia and respiratory illness of unknown origin in late December 2019 and the discovery of human-to-human transmission on January 20, 2020, a high volume of public anxiety and confusion about COVID-19 was observed, including different reactions to the news by users, negative sentiment after being exposed to information, and public reaction that translated to self-reported behavior.
Exploring Public Response to COVID-19 on Weibo with LDA Topic Modeling and Sentiment Analysis
TLDR
Analysis of sentiments and semantic networks reveals that country media, as well as influential individuals and “self-media,” together contribute to the information spread of positive sentiment.
Using Reports of Symptoms and Diagnoses on Social Media to Predict COVID-19 Case Counts in Mainland China: Observational Infoveillance Study
TLDR
This infoveillance study employs the largest, most comprehensive, and most fine-grained social media data to date to predict COVID-19 case counts in mainland China and finds that reports of symptoms and diagnosis of COVID-19 significantly predicted daily case counts up to 14 days ahead of official statistics.
Local COVID-19 Severity and Social Media Responses: Evidence From China
TLDR
This study reveals how pandemics affect local sentiment and provides an easy-to-implement and explanatory pipeline to classify sentiments and resolve related socioeconomic issues.
ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks
TLDR
Preliminary analysis shows that ArCOV-19 captures rising discussions associated with the first reported cases of the disease as they appeared in the Arab world.
What triggers online help-seeking retransmission during the COVID-19 period? Empirical evidence from Chinese social media
TLDR
This study explores the driving forces behind the retransmission of online help-seeking posts in the COVID-19 period and builds an analytical framework that emphasizes content characteristics, including information completeness, proximity, support seeking type, disease severity, and emotion of help-seeking messages.
COVID-19 UK Social Media Dataset for Public Health Research: Methodology for Collection and Processing
We present a benchmark database of public social media postings from the United Kingdom related to the Covid-19 pandemic for academic research purposes, along with some initial analysis, including a…
Conspiracy and debunking narratives about COVID-19 origination on Chinese social media: How it started and who is to blame
TLDR
It is suggested that conspiracy narratives about COVID-19 origination can carry highly cultural and political orientations and correction efforts should consider political motives and identify important stakeholders to reconstruct international dialogues toward intercultural understanding.
CHECKED: Chinese COVID-19 fake news dataset
TLDR
CHECKED is the first Chinese dataset on COVID-19 misinformation and can facilitate studies that target misinformation on coronavirus, and contains a rich set of multimedia information for each microblog including ground-truth label, textual, visual, temporal, and network information.
IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages
  • D. Uniyal, Amit Agarwal
  • Computer Science
  • ArXiv
  • 2021
TLDR
A COVID-19 dataset collected between February 2020 and July 2020 specifically for regional languages in India is studied to help the Government of India, various state governments, NGOs, researchers, and policymakers in studying different issues related to the pandemic.

References

SHOWING 1-10 OF 19 REFERENCES
NAIST COVID: Multilingual COVID-19 Twitter and Weibo Dataset
TLDR
This paper releases a multilingual dataset of social media posts related to COVID-19, consisting of microblogs in English and Japanese from Twitter and those in Chinese from Weibo, and provides a quantitative as well as qualitative analysis of these datasets by creating daily word clouds as an example of text-mining analysis.
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
Background: At the time of this writing, the coronavirus disease (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources, and economies around the world.
Reports of Own and Others' Symptoms and Diagnosis on Social Media Predict COVID-19 Case Counts in Mainland China
TLDR
It is found that reports of symptoms and diagnosis of COVID-19 significantly predicted daily case counts, up to seven days ahead of official statistics, and the predictive pattern held true for both Hubei province and the rest of mainland China, regardless of unequal distribution of healthcare resources and outbreak timeline.
Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset
TLDR
Using Natural Language Processing, Text Mining, and Network Analysis to analyze a corpus of tweets that relate to the COVID-19 pandemic, the study identifies common responses to the pandemic and how these responses differ across time.
Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19
TLDR
A human annotation study reveals the utility of the models on a subset of Mega-COV, a billion-scale dataset from Twitter for studying COVID-19 and develops two powerful models for identifying whether or not a tweet is related to the pandemic.
Tracking the flu pandemic by monitoring the social web
TLDR
A monitoring tool to measure the prevalence of disease in a population by analysing the contents of social networking tools, such as Twitter, and turning statistical information into a flu-score, which can be used at close time intervals to provide inexpensive and timely information about the state of an epidemic.
#Swineflu: Twitter Predicts Swine Flu Outbreak in 2009
TLDR
An investigation into Twitter is presented, using around 3 million tweets gathered between May and December 2009, as a possible source of surveillance data and its feasibility to serve as an early warning system, to demonstrate that Twitter can serve as a self-reporting tool, and hence, provide indications of increased infection spreading.
Early Warning and Outbreak Detection Using Social Networking Websites: The Potential of Twitter
TLDR
A method for extracting messages, called “tweets,” from the Twitter website and the results of a pilot study which collected over 135,000 tweets in a week during the current Swine Flu pandemic are described.
Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance
TLDR
The results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowdsourced data, into influenza predictions in all time horizons.
Detecting influenza epidemics using search engine query data
TLDR
A method of analysing large numbers of Google search queries to track influenza-like illness in a population and accurately estimate the current level of weekly influenza activity in each region of the United States with a reporting lag of about one day is presented.