• Publications
Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text
TLDR
A gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube is created and inter-annotator agreement is presented, and the results of sentiment analysis trained on this corpus are shown.
Overview of the track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text
TLDR
The participants were given a dataset of YouTube comments, and the goal of the shared task submissions was to recognise the sentiment of each comment by classifying it as positive, negative, neutral, or mixed-feeling, or by recognising that the comment is not in the intended language.
KanCMD: Kannada CodeMixed Dataset for Sentiment Analysis and Offensive Language Detection
TLDR
The KanCMD dataset contains actual code-mixed comments posted by users on YouTube, rather than monolingual textbook text, and has been annotated for two tasks, sentiment analysis and offensive language detection, for the under-resourced Kannada language.
Findings of the Shared Task on Offensive Language Identification in Tamil, Malayalam, and Kannada
TLDR
A shared task on offensive language detection in Dravidian languages is created, and an overview of the methods and results of the competing systems is presented.
Hope Speech detection in under-resourced Kannada language
TLDR
An English-Kannada hope speech dataset, KanHope, is proposed, and DC-BERT4HOPE, a dual-channel model, is introduced, bettering other models.
Overview of the DravidianCodeMix 2021 Shared Task on Sentiment Detection in Tamil, Malayalam, and Kannada
TLDR
The quality and quantity of the submissions show that there is great interest in Dravidian languages in the code-mixed setting and that the state of the art in this domain still needs improvement.
Multilingual Multimodal Machine Translation for Dravidian Languages utilizing Phonetic Transcription
This work is supported by a research grant from Science Foundation Ireland, co-funded by the European Regional Development Fund, for the Insight Centre under Grant Number SFI/12/RC/2289 and the
UVCE-IIITT@DravidianLangTech-EACL2021: Tamil Troll Meme Classification: You need to Pay more Attention
TLDR
This work presents a model with a transformer-transformer architecture that uses attention as its main component and aims for state-of-the-art classification of troll and non-troll Tamil memes.
Attentive fine-tuning of Transformers for Translation of low-resourced languages @LoResMT 2021
TLDR
This paper reports the machine translation systems submitted by the IIITT team for the English→Marathi and English⇔Irish language pairs of the LoResMT 2021 shared task; the team fine-tunes IndicTrans, a pretrained multilingual NMT model, for English→Marathi, using an external parallel corpus as input for additional training.
Named Entity Recognition for Code-Mixed Indian Corpus using Meta Embedding
TLDR
This paper uses pre-trained embeddings, sub-word embeddings, and languages closely related to those in the code-mixed corpus to create a meta-embedding for predicting named entities in code-mixed Indian text written in native and Roman scripts on social media.