Corpus ID: 237346806

Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling

Adeep Hande, Karthik Puranik, Konthala Yasaswini, Ruba Priyadharshini, Sajeetha Thavareesan, Anbukkarasi Sampath, Kogilavani Shanmugavadivel, Durairaj Thenmozhi, Bharathi Raja Chakravarthi
Social media has effectively become the prime hub of communication and digital marketing. As these platforms enable the free manifestation of thoughts and facts in text, images and video, there is an extensive need to screen them to protect individuals and groups from offensive content targeted at them. Our work intends to classify code-mixed social media comments/posts in the Dravidian languages of Tamil, Kannada, and Malayalam. We intend to improve offensive language identification by…
IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages
This work describes a submission to the shared task conducted by Dravidian-CodeMix at FIRE 2021, employing pre-trained models such as ULMFiT and multilingual BERT fine-tuned on the code-mixed dataset, its transliteration (TRAI), English translations (TRAA) of the TRAI data, and the combination of all three.
Multilingual Text Classification for Dravidian Languages
This work proposed a multilingual text classification framework for the Dravidian languages using the LaBSE pre-trained model as the base model, and proposed a language-specific representation module to enrich semantic information for the model.
SSNCSE NLP@TamilNLP-ACL2022: Transformer based approach for detection of abusive comment for Tamil language
The task was to automate the process of identifying abusive comments and classify them into appropriate categories using pre-trained transformer models such as BERT, m-BERT, and XLNet.
Sentiment Analysis on Multilingual Code-Mixed Kannada Language
A model is presented that aids in sentiment analysis of Dravidian code-mixed Kannada comments; it achieved a promising weighted F1-score of 0.66 using the BERT model on the validation dataset, whereas the F1-score on the test dataset was 0.619.
Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text
The quality and quantity of the submissions show that there is great interest in Dravidian languages in a code-mixed setting and that the state of the art in this domain still needs more improvement.
Overview of the DravidianCodeMix 2021 Shared Task on Sentiment Detection in Tamil, Malayalam, and Kannada
The quality and quantity of the submissions show that there is great interest in Dravidian languages in a code-mixed setting and that the state of the art in this domain still needs improvement.
PSG@HASOC-Dravidian CodeMixFIRE2021: Pretrained Transformers for Offensive Language Identification in Tanglish
This task aims to identify offensive content in code-mixed comments/posts in Dravidian languages collected from social media. The system pools the last layers of the pretrained multilingual BERT transformer, which helped it achieve rank nine on the leaderboard.
NAYEL @LT-EDI-ACL2022: Homophobia/Transphobia Detection for Equality, Diversity, and Inclusion using SVM
This paper illustrates the system submitted by the team for the homophobia/transphobia detection in social media comments shared task with a machine learning-based model designed and various classification algorithms implemented.
Hypers at ComMA@ICON: Modelling Aggressiveness, Gender Bias and Communal Bias Identification
This paper presents an approach which utilizes different pretrained models with attention and mean pooling methods on the shared task ComMA@ICON, where each sentence must be classified by how aggressive it is and whether it is gender-biased or communally biased.
TOLD: Tamil Offensive Language Detection in Code-Mixed Social Media Comments using MBERT with Features based Selection
The immense growth in social media forums does increase the spread of offensive language. We detect and examine the challenges faced by automatic approaches for offensive language detection in the


DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text
A multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media comments, which contains all types of code-mixing phenomena since it comprises user-generated content from a multilingual country.
HUB@DravidianLangTech-EACL2021: Identify and Classify Offensive Text in Multilingual Code Mixing in Social Media
This is the first task to detect offensive comments posted on social media in the Dravidian languages; the system uses the multilingual BERT model to complete this task.
Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada
A shared task on offensive language detection in Dravidian languages is created and an overview of the methods and the results of the competing systems are presented.
SSNCSE_NLP@DravidianLangTech-EACL2021: Offensive Language Identification on Multilingual Code Mixing Text
This paper describes automatic offensive language identification for Dravidian languages with various machine learning algorithms and explains the submissions made by SSNCSE_NLP to the DravidianLangTech-EACL2021 code-mix tasks for offensive language detection.
IRNLP_DAIICT@DravidianLangTech-EACL2021:Offensive Language identification in Dravidian Languages using TF-IDF Char N-grams and MuRIL
The participation of the IRNLP_DAIICT team from the Information Retrieval and Natural Language Processing lab at DA-IICT, India, in the DravidianLangTech-EACL2021 task on Offensive Language Identification in Dravidian Languages is presented.
NLP-CUET@DravidianLangTech-EACL2021: Offensive Language Detection from Multilingual Code-Mixed Text using Transformers
An automated system that can identify offensive text from multilingual code-mixed data is presented and results show that XLM-R outperforms other techniques in Tamil and Malayalam languages while m-BERT achieves the highest score in the Kannada language.
JUNLP@DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Langauges
Offensive language identification has been an active area of research in natural language processing. With the emergence of multiple social media platforms offensive language identification has
Bitions@DravidianLangTech-EACL2021: Ensemble of Multilingual Language Models with Pseudo Labeling for offence Detection in Dravidian Languages
A multilingual ensemble-based model is proposed that can identify offensive content targeted against an individual (or group) in low resource Dravidian language and is able to handle code-mixed data as well as instances where the script used is mixed.
Simon @ DravidianLangTech-EACL2021: Detecting Offensive Content in Kannada Language
  • Qinyu Que
  • Computer Science
  • 2021
This paper describes the system for the shared task of Offensive Language Identification in Dravidian Languages at EACL 2021, in which the XLM-RoBERTa model is used for pre-training and some tweaks are made to the output of this model.
Benchmarking Multi-Task Learning for Sentiment Analysis and Offensive Language Identification in Under-Resourced Dravidian Languages
Analysis of fine-tuned models indicates the preference of multi-task learning over single-task learning, resulting in a higher weighted F1-score on all three languages: Kannada, Malayalam and Tamil.