• Corpus ID: 244345885

Findings of the Sentiment Analysis of Dravidian Languages in Code-Mixed Text

Bharathi Raja Chakravarthi, Ruba Priyadharshini, Sajeetha Thavareesan, Dhivya Chinnappa, D. Thenmozhi, Elizabeth Sherly, John P. McCrae, Adeep Hande, Rahul Ponnusamy, Shubhanker Banerjee, Charangan Vasantharajan
We present the results of the Dravidian-CodeMix shared task held at FIRE 2021, a track on sentiment analysis for Dravidian languages in code-mixed text. We describe the task, its organization, and the submitted systems. This shared task is the continuation of last year’s Dravidian-CodeMix shared task held at FIRE 2020. This year’s tasks included code-mixing at the intra-token and inter-token levels. Additionally, apart from Tamil and Malayalam, Kannada was also introduced. We received 22… 
1 Citation

SSNCSE_NLP@TamilNLP-ACL2022: Transformer based approach for Emotion analysis in Tamil language
This work adopts a transformer-based approach to identify the emotions present in a text sequence, evaluating pretrained transformer models on the datasets provided by the LT-EDI organizers for two tasks in the Tamil language.


A Sentiment Analysis Dataset for Code-Mixed Malayalam-English
A new gold standard corpus for sentiment analysis of code-mixed Malayalam-English text, annotated by voluntary annotators, is presented; the dataset obtained a Krippendorff’s alpha above 0.8.
SSN_NLP_MLRG@Dravidian-CodeMix-FIRE2020: Sentiment Code-Mixed Text Classification in Tamil and Malayalam using ULMFiT
An AWD-LSTM model, trained within the ULMFiT framework using the fastai library, is employed to identify the message-level sentiment polarity of social media comments in the Dravidian-CodeMix-FIRE2020 dataset.
Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling
This work classifies code-mixed social media comments and posts in the Dravidian languages Tamil, Kannada, and Malayalam, and aims to improve offensive language identification by generating pseudo-labels on the dataset.
Benchmarking Multi-Task Learning for Sentiment Analysis and Offensive Language Identification in Under-Resourced Dravidian Languages
Analysis of the fine-tuned models indicates that multi-task learning is preferable to single-task learning, yielding a higher weighted F1-score on all three languages: Kannada, Malayalam, and Tamil.
Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text
A gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube is created and inter-annotator agreement is presented, and the results of sentiment analysis trained on this corpus are shown.
CIA_NITT@Dravidian-CodeMix-FIRE2020: Malayalam-English Code Mixed Sentiment Analysis Using Sentence BERT And Sentiment Features
The Dravidian-CodeMix FIRE 2020 task is to classify comments into positive, negative, unknown_state, mixed_feelings, and not-malayalam categories based on message-level polarity; this system addresses the Malayalam-English dataset using sentence-level BERT.
KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection
The KanCMD dataset contains actual code-mixed comments posted by users on YouTube, rather than monolingual textbook text, and has been annotated for two tasks, sentiment analysis and offensive language detection, for the under-resourced Kannada language.
KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for Detection of Hate Speech and Offensive Codemix Social Media text
This paper describes the system submitted by our team, KBCNMUJAL, for Task 2 of the shared task Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC), at Forum for
Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments
A new hierarchical taxonomy for online homophobia and transphobia is provided, as well as an expert-labelled dataset that will allow homophobic/transphobic content to be automatically identified, which is the first such dataset created.
Hope Speech detection in under-resourced Kannada language
An English-Kannada hope speech dataset, KanHope, is proposed, and DC-BERT4HOPE, a dual-channel model that outperforms other models, is introduced.