Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas
@article{Palakodety2020VoiceFT, title={Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas}, author={Shriphani Palakodety and Ashiqur R. KhudaBukhsh and Jaime G. Carbonell}, journal={ArXiv}, year={2020}, volume={abs/1910.03206} }
The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times, with more than 700,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive research has been performed on social media pertaining to this large, evolving crisis. In this work, we construct a substantial corpus of YouTube video comments (263,482 comments from 113,250 users in 5,153 relevant videos…
13 Citations
HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion
- Computer Science, PEOPLES
- 2020
A Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) is constructed from user-generated comments on the social media platform YouTube: 28,451 comments in English, 20,198 in Tamil and 10,705 in Malayalam, each manually labelled as containing hope speech or not.
Hope Speech detection in under-resourced Kannada language
- Computer Science, ArXiv
- 2021
An English-Kannada hope speech dataset, KanHope, is proposed, and DC-BERT4HOPE, a dual-channel model that outperforms other models, is introduced.
NLP-CUET@LT-EDI-EACL2021: Multilingual Code-Mixed Hope Speech Detection using Cross-lingual Representation Learner
- Computer Science, LTEDI
- 2021
This work proposes three distinct models to identify hope speech in English, Tamil and Malayalam, and the results indicate that XLM-R outperforms all other techniques in weighted F1-score.
Social Media Attributions in the Context of Water Crisis
- Computer Science, EMNLP
- 2020
This paper presents a novel prediction task, attribution tie detection, which identifies the factors held responsible for the Chennai water crisis, along with a neural classifier for extracting attribution ties that achieves reasonable performance.
Annotation Efficient Language Identification from Weak Labels
- Computer Science, WNUT
- 2020
A minimally supervised NLP technique is leveraged to obtain weak language labels from a large-scale Indian social media corpus leading to a robust and annotation-efficient language-identification technique spanning nine Romanized Indian languages.
Low Resource Social Media Text Mining
- Computer Science, SpringerBriefs in Computer Science
- 2021
This chapter helps NLP practitioners understand why the low-resource components of corpora from various societies should be analyzed, how ignoring them can skew results, and how to address them, and it provides a broad set of examples and statistics that reinforce the importance of low-resource social media text mining.
The Non-native Speaker Aspect: Indian English in Social Media
- Linguistics, WNUT
- 2020
This paper conducts a comprehensive analysis of web expressions of Indian English in noisy social media, compares them with English generated by a social media user base of predominantly native speakers, and proposes a novel application of language models to perform automatic linguistic quality assessment.
Empathy and Hope: Resource Transfer to Model Inter-country Social Media Dynamics
- Political Science, NLP4POSIMPACT
- 2021
A new task of detecting supportive content is defined, and it is demonstrated that existing NLP-for-social-impact tools can be effectively harnessed for such tasks within a quick turnaround time. The first publicly available data set at the intersection of geopolitical relations and a raging pandemic, in the context of India and Pakistan, is also released.
Harnessing Code Switching to Transcend the Linguistic Barrier
- Computer Science, ArXiv
- 2020
This paper provides a systematic approach to sampling code-mixed documents using a polyglot-embedding-based method that requires minimal supervision and holds promise for substantially reducing web moderation efforts.
Multilingual Detection of Personal Employment Status on Twitter
- Business, ACL
- 2022
Detecting disclosures of individuals’ employment status on social media can provide valuable information to match job seekers with suitable vacancies, offer social protection, or measure labor market…
References
SHOWING 1-10 OF 55 REFERENCES
Hate Me, Hate Me Not: Hate Speech Detection on Facebook
- Computer Science, ITASEC
- 2017
This work proposes a variety of hate categories and designs and implements two classifiers for the Italian language based on different learning algorithms: the first on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network, Long Short-Term Memory (LSTM).
Kashmir: A Computational Analysis of the Voice of Peace
- Computer Science, ArXiv
- 2019
This work argues for the importance of automatically identifying user-generated web content that can diffuse hostility, and it addresses this prediction task, dubbed hope-speech detection, in the context of heated discussions during a politically tense situation in which two nations are on the brink of a full-fledged war.
Automated Hate Speech Detection and the Problem of Offensive Language
- Computer Science, ICWSM
- 2017
This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those containing neither.
Sentiment Analysis of Comments on Rohingya Movement with Support Vector Machine
- Computer Science, ArXiv
- 2018
To analyse comments on Rohingya-related posts, a classifier was built on the Support Vector Machine algorithm, specifically a support vector machine with a linear kernel.
Deep Learning for Hate Speech Detection in Tweets
- Computer Science, WWW
- 2017
Experiments on a benchmark dataset of 16K annotated tweets show that deep learning methods outperform state-of-the-art character/word n-gram methods by ~18 F1 points.
The Effect of Extremist Violence on Hateful Speech Online
- Political Science, ICWSM
- 2018
This research models the effect of extremist attacks on the volume and type of hateful speech on two social media platforms, Twitter and Reddit, and observes that extremist violence tends to increase online hate speech, particularly messages directly advocating violence.
Submodularity-inspired Data Selection for Goal-oriented Chatbot Training based on Sentence Embeddings
- Computer Science, IJCAI
- 2018
A submodularity-inspired data ranking function, the ratio-penalty marginal gain, is proposed for selecting data points to label based only on information extracted from the textual embedding space, and it is shown that distances in the embedding space are a viable source of information for data selection.
Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings
- Computer Science, NAACL
- 2019
This work provides an NLP framework to uncover four linguistic dimensions of political polarization in social media (topic choice, framing, affect and illocutionary force) and proposes clustering tweet embeddings as a means to identify salient topics for analysis across events.
Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying
- Computer Science, TIIS
- 2012
An “air traffic control”-like dashboard is proposed, which alerts moderators to large-scale outbreaks that appear to be escalating or spreading and helps them prioritize the current deluge of user complaints.
Online social media in the Syria conflict: Encompassing the extremes and the in-betweens
- Sociology, 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014)
- 2014
The findings indicate that social media activity in Syria is considerably more convoluted than reported in many other studies of online political activism, suggesting that alternative analytical approaches can play an important role in this type of scenario.