Corpus ID: 215744908

ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks

@article{Haouari2020ArCOV19TF,
  title={ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks},
  author={Fatima Haouari and Maram Hasanain and Reem Suwaileh and T. Elsayed},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.05861}
}
In this paper, we present ArCOV-19, an Arabic COVID-19 Twitter dataset that spans one year, covering the period from 27th of January 2020 till 31st of January 2021. ArCOV-19 is the first publicly-available Arabic Twitter dataset covering the COVID-19 pandemic that includes about 2.7M tweets alongside the propagation networks of the most-popular subset of them (i.e., most-retweeted and -liked). The propagation networks include both retweets and conversational threads (i.e., threads of replies). ArCOV…

ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection

The dataset covers, in addition to health, claims related to other topical categories that were influenced by COVID-19, namely social, political, sports, entertainment, and religious topics. It also reports experiments with SOTA models of varied approaches that exploit content, user profile features, temporal features, or the propagation structure of the conversational threads for tweet verification.

COVID-19 and Arabic Twitter: How can Arab World Governments and Public Health Organizations Learn from Social Media?

This study collects approximately 1 million Arabic tweets related to COVID-19 from the Twitter streaming API and applies three machine learning algorithms, Logistic Regression, Support Vector Classification, and Naïve Bayes, to identify rumour-related tweets.

Design and analysis of a large-scale COVID-19 tweets dataset

A large-scale Twitter dataset with more than 310 million COVID-19 specific English language tweets and their sentiment scores is presented, anticipating that they would contribute to a better understanding of spatial and temporal dimensions of the public discourse related to the ongoing pandemic.

TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic

Metadata about the tweets as well as extracted entities, hashtags, user mentions, sentiments, and URLs are exposed using established RDF/S vocabularies, providing an unprecedented knowledge base for a range of knowledge discovery tasks.

ArCorona: Analyzing Arabic Tweets in the Early Days of Coronavirus (COVID-19) Pandemic

This work presents the largest manually annotated dataset of Arabic tweets related to COVID-19, describes annotation guidelines, analyzes the dataset and builds effective machine learning and transformer based models for classification.

Kawarith: an Arabic Twitter Corpus for Crisis Events

Kawarith is introduced, a multi-dialect Arabic Twitter corpus for crisis events, comprising more than a million Arabic tweets collected during 22 crises that occurred between 2018 and 2020 and involved several types of hazard.

Detection of Hate Speech in COVID-19–Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach (Preprint)

This study identified Saudi Arabia as the Arab country from which the most COVID-19 hate tweets originated during the pandemic, and showed that the largest number of hate tweets appeared during the time period of March 1-30, 2020, representing 51.9% of all hate tweets.

GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information

GeoCoV19, a large-scale Twitter dataset containing more than 524 million multilingual tweets posted over a period of 90 days since February 1, 2020, is presented, and it is postulated that this large-scale, multilingual, geolocated social media data can empower the research communities to evaluate how societies are collectively coping with this unprecedented global crisis.

Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset

The analyses show that COVID-19 misinformed communities are denser, and more organized than informed communities, with a possibility of a high volume of the misinformation being part of disinformation campaigns.

Psychometric Analysis and Coupling of Emotions Between State Bulletins and Twitter in India During COVID-19 Infodemic

This study analyzes the psychometric impact and coupling of the COVID-19 infodemic with the official bulletins related to COVID-19 at the national and state level in India and presents the findings as COVibes, an interactive web application that captures psychometric insights from the CoronaIndiaDataset.

References

ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection

The dataset covers, in addition to health, claims related to other topical categories that were influenced by COVID-19, namely social, political, sports, entertainment, and religious topics. It also reports experiments with SOTA models of varied approaches that exploit content, user profile features, temporal features, or the propagation structure of the conversational threads for tweet verification.

NAIST COVID: Multilingual COVID-19 Twitter and Weibo Dataset

This paper releases a multilingual dataset of social media posts related to COVID-19, consisting of microblogs in English and Japanese from Twitter and those in Chinese from Weibo, and provides a quantitative as well as qualitative analysis of these datasets by creating daily word clouds as an example of text-mining analysis.

Large Arabic Twitter Dataset on COVID-19

This work describes the first Arabic tweets dataset on COVID-19, which has been collected since January 1st, 2020 and would help researchers and policy makers in studying different societal issues related to the pandemic.

COVID-19: The First Public Coronavirus Twitter Dataset

A multilingual coronavirus (COVID-19) Twitter dataset that has been continuously collected since January 22, 2020 is described; it may contribute towards enabling informed solutions and prescribing targeted policy interventions to fight this global crisis.

Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set

Basic statistics that show that Twitter activity responds and reacts to COVID-19-related events are presented, to enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications.

Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset

Using Natural Language Processing, Text Mining, and Network Analysis to analyze a corpus of tweets that relate to the COVID-19 pandemic, this work identifies common responses to the pandemic and how these responses differ across time.

A First Instagram Dataset on COVID-19

A multilingual coronavirus (COVID-19) Instagram dataset that has been continuously collected since March 30, 2020 is provided to help the community better understand the dynamics behind this phenomenon on Instagram, one of the major social media platforms.

Weibo-COV: A Large-Scale COVID-19 Social Media Dataset from Weibo

This paper releases a novel large-scale COVID-19 social media dataset from Weibo called Weibo-COV, covering more than 40 million tweets from 1 December 2019 to 30 April 2020, and hopes this dataset can promote studies of COVID-19 from multiple perspectives and enable better and faster research to suppress the spread of this disease.

CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics, Sentiment and Location Information

CML-COVID, a COVID-19 Twitter data set of 19,298,967 tweets from 5,977,653 unique individuals, is presented and some of the attributes of these data are summarized.

A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration

This paper presents a large-scale curated dataset of over 152 million tweets, growing daily, related to COVID-19 chatter generated from January 1st to April 4th at the time of writing, which will allow researchers to conduct a number of research projects relating to the emotional and mental responses to social distancing measures.