Corpus ID: 227230585

HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion

@inproceedings{Chakravarthi2020HopeEDIAM,
  title={HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion},
  author={Bharathi Raja Chakravarthi},
  booktitle={PEOPLES},
  year={2020}
}
Over the past few years, systems have been developed to control online content and eliminate abusive, offensive or hate speech content. However, people in power sometimes misuse this form of censorship to obstruct the democratic right of freedom of speech. Therefore, it is imperative that research take a positive reinforcement approach towards online content that is encouraging, positive and supportive. Until now, most studies have focused on solving this problem of negativity…

Overview of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion
TLDR
An overview of the findings and results from the shared task on hope speech detection for Tamil, Malayalam, Kannada, English and Spanish languages conducted in the second workshop on Language Technology for Equality, Diversity and Inclusion organised as a part of ACL 2022 is reported.
The Best of both Worlds: Dual Channel Language modeling for Hope Speech Detection in low-resourced Kannada
TLDR
DC-LM, a dual-channel language model that sees hope speech by using the English translations of the code-mixed dataset for additional training is presented, jointly modelled on both English and code-mixed Kannada to enable effective cross-lingual transfer between the languages.
IDIAP Submission@LT-EDI-ACL2022 : Hope Speech Detection for Equality, Diversity and Inclusion
TLDR
This paper classifies a given social media post as hope speech or not hope speech, using ensemble voting of BERT, ERNIE 2.0 and RoBERTa for English and non-English languages.
SSN_ARMM@LT-EDI-ACL2022: Hope Speech Detection for Equality, Diversity, and Inclusion Using ALBERT model
TLDR
This paper developed a system using the pre-trained language model BERT to classify sentences into ‘Hope speech’ and ‘Non-hope speech’.
SOA_NLP@LT-EDI-ACL2022: An Ensemble Model for Hope Speech Detection from YouTube Comments
TLDR
To recognize hope speech from YouTube comments, the current study offers an ensemble approach that combines a support vector machine, logistic regression, and random forest classifiers that performed significantly well among English, Spanish, Tamil, Malayalam, and Kannada YouTube comments.
Findings of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion
TLDR
The shared task of hope speech detection for Tamil, English, and Malayalam languages was conducted as a part of the EACL 2021 workshop on Language Technology for Equality, Diversity, and Inclusion.
MUCIC@LT-EDI-ACL2022: Hope Speech Detection using Data Re-Sampling and 1D Conv-LSTM
TLDR
The proposed methodology uses the re-sampling technique to deal with imbalanced data in the corpus and obtained 1st rank for English language with a macro-averaged F1-score of 0.550.
Hope Speech Detection for Dravidian Languages Using Cross-Lingual Embeddings with Stacked Encoder Architecture
TLDR
This paper proposes a multilingual model, with main emphasis on Dravidian languages, to automatically detect hope speech, and employs a stacked encoder architecture which makes use of language agnostic cross-lingual word embeddings as the dataset consists of code-mixed YouTube comments.
CFILT IIT Bombay@LT-EDI-EACL2021: Hope Speech Detection for Equality, Diversity, and Inclusion using Multilingual Representation from Transformers
TLDR
A system that employs multilingual transformer models to obtain the representation of text and classifies it into one of three classes (hope speech, not hope speech, or not in intended language), which was ranked 2nd for English, 2nd for Malayalam, and 7th for Tamil in the final leaderboard published by the organizers.
NLP-CUET@LT-EDI-EACL2021: Multilingual Code-Mixed Hope Speech Detection using Cross-lingual Representation Learner
TLDR
This work proposes three distinct models to identify hope speech in the English, Tamil and Malayalam languages, and indicates that XLM-R outperforms all other techniques in terms of weighted F1-score.

References

CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech
TLDR
This paper describes the creation of the first large-scale, multilingual, expert-based dataset of hate-speech/counter-narrative pairs, built with the effort of more than 100 operators from three different NGOs that applied their training and expertise to the task.
Offensive Language and Hate Speech Detection for Danish
TLDR
This work constructs DKhate, a Danish dataset of user-generated comments from various social media platforms that is, to the authors' knowledge, the first of its kind, annotated for various types and targets of offensive language, and develops four automatic classification systems designed to work for both the English and the Danish language.
Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas
TLDR
This work constructs a classifier that can detect comments defending the Rohingyas among larger numbers of disparaging and neutral ones and advocates that beyond the burgeoning field of hate speech detection, automatic detection of help speech can lend voice to the voiceless people and make the internet safer for marginalized communities.
Hope Speech Detection: A Computational Analysis of the Voice of Peace
TLDR
This paper argues for the importance of automatically identifying user-generated web content that can diffuse hostility, and addresses this prediction task, dubbed hope-speech detection, in the context of heated discussions in a politically tense situation where two nations are at the brink of a full-fledged war.
Generating Counter Narratives against Online Hate Speech: Data and Strategies
TLDR
A study on how to collect responses to hate effectively is presented, employing large scale unsupervised language models such as GPT-2 for the generation of silver data, and the best annotation strategies/neural architectures that can be used for data filtering before expert validation/post-editing.
Comparative Studies of Detecting Abusive Language on Twitter
TLDR
This paper conducts the first comparative study of various learning models on Hate and Abusive Speech on Twitter, and shows that bidirectional GRU networks trained on word-level features, with Latent Topic Clustering modules, are the most accurate model.
Thou shalt not hate: Countering Online Hate Speech
TLDR
This paper creates and releases the first ever dataset for counterspeech using comments from YouTube, and performs a rigorous measurement study characterizing the linguistic structure of counterspeeches for the first time.
Predicting the Type and Target of Offensive Posts in Social Media
TLDR
The Offensive Language Identification Dataset (OLID), a new dataset with tweets annotated for offensive content using a fine-grained three-layer annotation scheme, is compiled and made publicly available.
Comparison of Pretrained Embeddings to Identify Hate Speech in Indian Code-Mixed Text
TLDR
This paper compares pretrained models and creates an ensemble model for code-mixed data of hate speech classification task on Hindi-English data to show that XLNet performs better for hate speech detection in code-mixed text.
Assessing gender bias in machine translation: a case study with Google Translate
TLDR
Experimental evidence is provided that even if one does not expect in principle a 50:50 pronominal gender distribution, Google Translate yields male defaults much more frequently than what would be expected from demographic data alone.