• Corpus ID: 236965782

Hope Speech detection in under-resourced Kannada language

Adeep Hande, Ruba Priyadharshini, Anbukkarasi Sampath, Kingston Pal Thamburaj, Prabakaran Chandran, Bharathi Raja Chakravarthi
Numerous methods have been developed in recent years to monitor the spread of negativity by eliminating vulgar, offensive, and fierce comments from social media platforms. However, relatively few studies focus on embracing positivity and reinforcing supportive and reassuring content in online forums. Consequently, we propose an English-Kannada hope speech dataset, KanHope, and compare several experiments to benchmark the dataset. The dataset consists of 6… 
The Best of both Worlds: Dual Channel Language modeling for Hope Speech Detection in low-resourced Kannada
DC-LM, a dual-channel language model that detects hope speech by using the English translations of the code-mixed dataset for additional training, is presented; it is jointly modelled on both English and code-mixed Kannada to enable effective cross-lingual transfer between the languages.
IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages
The work for the shared task conducted by Dravidian-CodeMix at FIRE 2021 is described; it employs pre-trained models such as ULMFiT and multilingual BERT fine-tuned on the code-mixed dataset, its transliteration (TRAI), English translations (TRAA) of the TRAI data, and the combination of all three.
Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling
This work intends to classify code-mixed social media comments/posts in the Dravidian languages of Tamil, Kannada, and Malayalam to improve offensive language identification by generating pseudo-labels on the dataset.
UMUTeam@TamilNLP-ACL2022: Abusive Detection in Tamil using Linguistic Features and Transformers
The TamilNLP shared task has proposed a multi-classification challenge for Tamil written in Tamil script and code-mixed to detect abusive comments and hope-speech.
BpHigh@TamilNLP-ACL2022: Effects of Data Augmentation on Indic-Transformer based classifier for Abusive Comments Detection in Tamil
An exploration of data augmentation techniques used to increase the accuracy of models for abusive comment detection in Tamil (DravidianLangTech-ACL 2022).
DLRG@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil using Multilingual Transformer Models
A system developed for the Shared Task on Abusive Comment Detection in Tamil at DravidianLangTech@ACL 2022 to detect the abusive category of each comment.
Overview of the DravidianCodeMix 2021 Shared Task on Sentiment Detection in Tamil, Malayalam, and Kannada
The quality and quantity of the submissions show that there is great interest in Dravidian languages in a code-mixed setting and that the state of the art in this domain still needs improvement.
COMBATANT@TamilNLP-ACL2022: Fine-grained Categorization of Abusive Comments using Logistic Regression
This work presents the system developed as part of the shared task to detect the abusive language in Tamil, and shows that Logistic regression and CNN+BiLSTM models outperformed the others.
IIITSurat@LT-EDI-ACL2022: Hope Speech Detection using Machine Learning
This study uses and compares the experimental outcomes of the different oversampling techniques and proposes a robust model that helps in predicting the target class with higher accuracy.
Overview of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion
An overview of the findings and results from the shared task on hope speech detection for Tamil, Malayalam, Kannada, English and Spanish languages conducted in the second workshop on Language Technology for Equality, Diversity and Inclusion organised as a part of ACL 2022 is reported.


Detection of Hate Speech Text in Hindi-English Code-mixed Data
HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion
A Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) containing user-generated comments from the social media platform YouTube with 28,451, 20,198 and 10,705 comments in English, Tamil and Malayalam, respectively, manually labelled as containing hope speech or not is constructed.
MC-BERT4HATE: Hate Speech Detection using Multi-channel BERT for Different Languages and Translations
  • Hajung Sohn, Hyunju Lee
  • Computer Science
    2019 International Conference on Data Mining Workshops (ICDMW)
  • 2019
A multi-channel model with three versions of BERT (MC-BERT), namely the English, Chinese, and multilingual BERTs, is proposed for hate speech detection, along with the use of translations as additional input: training and test sentences are translated into the languages required by the different BERT models.
NLP-CUET@LT-EDI-EACL2021: Multilingual Code-Mixed Hope Speech Detection using Cross-lingual Representation Learner
This work proposes three distinct models to identify hope speech in English, Tamil and Malayalam and indicates that XLM-R outperforms all other techniques in terms of weighted F1-score.
KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection
The KanCMD dataset contains actual comments in code-mixed text posted by users on YouTube, rather than monolingual text from textbooks, and has been annotated for two tasks, namely sentiment analysis and offensive language detection, for the under-resourced Kannada language.
Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas
This work constructs a classifier that can detect comments defending the Rohingyas among larger numbers of disparaging and neutral ones and advocates that beyond the burgeoning field of hate speech detection, automatic detection of help speech can lend voice to the voiceless people and make the internet safer for marginalized communities.
SSNCSE_NLP@DravidianLangTech-EACL2021: Offensive Language Identification on Multilingual Code Mixing Text
This paper describes automatic offensive language identification in Dravidian languages with various machine learning algorithms and explains the submissions made by SSNCSE_NLP in the DravidianLangTech-EACL2021 code-mix tasks for offensive language detection.
Detecting stance in kannada social media code-mixed text using sentence embedding
For the first time, a stance detection system is presented for code-mixed Kannada text comments extracted from the popular social media site Facebook, with an emphasis on trending local and national current issues in the Karnataka region.
A Dataset of Hindi-English Code-Mixed Social Media Text for Hate Speech Detection
This work presents a Hindi-English code-mixed dataset consisting of tweets posted online on Twitter and proposes a supervised classification system for detecting hate speech in the text using various character level, word level, and lexicon based features.