Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas

@article{Palakodety2020VoiceFT,
  title={Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas},
  author={Shriphani Palakodety and Ashiqur R. KhudaBukhsh and Jaime G. Carbonell},
  journal={ArXiv},
  year={2020},
  volume={abs/1910.03206}
}
The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times with more than 700,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive research has been performed on social media pertaining to this large evolving crisis. In this work, we construct a substantial corpus of YouTube video comments (263,482 comments from 113,250 users in 5,153 relevant videos… 
HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion
TLDR
A Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) containing user-generated comments from the social media platform YouTube with 28,451, 20,198 and 10,705 comments in English, Tamil and Malayalam, respectively, manually labelled as containing hope speech or not is constructed.
Hope Speech detection in under-resourced Kannada language
TLDR
An English-Kannada hope speech dataset, KanHope, is proposed, and DC-BERT4HOPE, a dual-channel model, is introduced, bettering other models.
NLP-CUET@LT-EDI-EACL2021: Multilingual Code-Mixed Hope Speech Detection using Cross-lingual Representation Learner
TLDR
This work proposes three distinct models to identify hope speech in English, Tamil and Malayalam, and indicates that XLM-R outperforms all other techniques in weighted F1-score.
Social Media Attributions in the Context of Water Crisis
TLDR
This paper presents a novel prediction task of attribution tie detection which identifies the factors held responsible for the Chennai water crisis and presents a neural classifier to extract attribution ties that achieved a reasonable performance.
Annotation Efficient Language Identification from Weak Labels
TLDR
A minimally supervised NLP technique is leveraged to obtain weak language labels from a large-scale Indian social media corpus leading to a robust and annotation-efficient language-identification technique spanning nine Romanized Indian languages.
Low Resource Social Media Text Mining
TLDR
This chapter will help NLP practitioners understand the importance of analyzing the low-resource components of corpora from various societies, how ignoring them can skew results, and how to go about addressing them, and provides a broad set of examples and statistics to reinforce the importance of low-resource social media text mining.
The Non-native Speaker Aspect: Indian English in Social Media
TLDR
This paper conducts a comprehensive analysis of web-expressions of Indian English in noisy social media with English generated by a social media user base that is predominantly native speakers and proposes a novel application of language models to perform automatic linguistic quality assessment.
Empathy and Hope: Resource Transfer to Model Inter-country Social Media Dynamics
TLDR
A new task of detecting supportive content is defined and it is demonstrated that existing NLP for social impact tools can be effectively harnessed for such tasks within a quick turnaround time and the first publicly available data set at the intersection of geopolitical relations and a raging pandemic in the context of India and Pakistan is released.
Harnessing Code Switching to Transcend the Linguistic Barrier
TLDR
This paper provides a systematic approach to sample code mixed documents leveraging a polyglot embedding based method that requires minimal supervision and holds promise in substantially reducing web moderation efforts.
Multilingual Detection of Personal Employment Status on Twitter
Detecting disclosures of individuals’ employment status on social media can provide valuable information to match job seekers with suitable vacancies, offer social protection, or measure labor market

References

Hate Me, Hate Me Not: Hate Speech Detection on Facebook
TLDR
This work proposes a variety of hate categories and designs and implements two classifiers for the Italian language, based on different learning algorithms: the first based on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short Term Memory (LSTM).
Kashmir: A Computational Analysis of the Voice of Peace
TLDR
This work argues for the importance of automatically identifying user-generated web content that can diffuse hostility and addresses this prediction task, dubbed hope-speech detection, in the context of heated discussions in a politically tense situation where two nations are on the brink of a full-fledged war.
Automated Hate Speech Detection and the Problem of Offensive Language
TLDR
This work uses a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords and labels a sample of these tweets into three categories: those containing hate speech, those with only offensive language, and those with neither.
Sentiment Analysis of Comments on Rohingya Movement with Support Vector Machine
TLDR
To analyse the comments on all Rohingya-related posts, a classifier was created and modified based on the Support Vector Machine algorithm, specifically a support vector machine with a linear kernel.
Deep Learning for Hate Speech Detection in Tweets
TLDR
These experiments on a benchmark dataset of 16K annotated tweets show that such deep learning methods outperform state-of-the-art char/word n-gram methods by ~18 F1 points.
The Effect of Extremist Violence on Hateful Speech Online
TLDR
The focus of the research is to model the effect of the attacks on the volume and type of hateful speech on two social media platforms, Twitter and Reddit, and observe that extremist violence tends to lead to an increase in online hate speech, particularly on messages directly advocating violence.
Submodularity-inspired Data Selection for Goal-oriented Chatbot Training based on Sentence Embeddings
TLDR
A submodularity-inspired data ranking function, the ratio-penalty marginal gain, is proposed for selecting data points to label based only on the information extracted from the textual embedding space, and it is shown that the distances in the embedding space are a viable source of information that can be used for data selection.
Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings
TLDR
An NLP framework is provided to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force. Clustering of tweet embeddings is proposed as a means to identify salient topics for analysis across events.
Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying
TLDR
An “air traffic control”-like dashboard is proposed, which alerts moderators to large-scale outbreaks that appear to be escalating or spreading and helps them prioritize the current deluge of user complaints.
Online social media in the Syria conflict: Encompassing the extremes and the in-betweens
TLDR
The findings indicate that social media activity in Syria is considerably more convoluted than reported in many other studies of online political activism, suggesting that alternative analytical approaches can play an important role in this type of scenario.