Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text
TLDR
A gold-standard, sentiment-annotated corpus of Tamil-English code-switched text containing 15,744 YouTube comment posts is created; inter-annotator agreement and the results of sentiment analysis trained on this corpus are presented.
A Sentiment Analysis Dataset for Code-Mixed Malayalam-English
TLDR
A new gold-standard corpus for sentiment analysis of code-mixed Malayalam-English text, annotated by voluntary annotators, is presented; the dataset obtained a Krippendorff’s alpha above 0.8.
Overview of the track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text
TLDR
The participants were given a dataset of YouTube comments, and the goal of the shared task submissions was to recognise the sentiment of each comment by classifying it as positive, negative, neutral, or mixed-feeling, or by recognising that the comment is not in the intended language.
HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion
TLDR
A Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) is constructed from user-generated comments on the social media platform YouTube, with 28,451, 20,198 and 10,705 comments in English, Tamil and Malayalam, respectively, manually labelled as containing hope speech or not.
Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German
TLDR
This paper presents the HASOC track and its two parts, creating test collections for languages with few resources and English for comparison, and presents the tasks, the data and the main results.
Overview of the track on HASOC-Offensive Language Identification-DravidianCodeMix
TLDR
The results and main findings of the HASOC-Offensive Language Identification track on code-mixed Dravidian languages are presented, along with the system submissions and methods used by participants.
Findings of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion
TLDR
The shared task of hope speech detection for Tamil, English, and Malayalam languages was conducted as a part of the EACL 2021 workshop on Language Technology for Equality, Diversity, and Inclusion.
KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection
TLDR
The KanCMD dataset contains actual code-mixed comments posted by users on YouTube, rather than monolingual textbook text, and has been annotated for two tasks, namely sentiment analysis and offensive language detection, for the under-resourced Kannada language.
Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada
TLDR
A shared task on offensive language detection in Dravidian languages is created, and an overview of the methods and results of the competing systems is presented.
Findings of the Shared Task on Troll Meme Classification in Tamil
TLDR
A resource (TamilMemes) is provided that could be used to train a system capable of identifying troll memes in the Tamil language; the 10 system submissions from participants were evaluated using the weighted average F1-score.