Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text

Bharathi Raja Chakravarthi, Vigneshwaran Muralidaran, Ruba Priyadharshini, John P. McCrae
Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to analyse the popular sentiments of videos on social media based on viewer comments. However, comments from social media do not follow strict rules of grammar, and they contain mixing of more than one language, often written in non-native scripts. Non-availability of annotated code… 

A Sentiment Analysis Dataset for Code-Mixed Malayalam-English

A new gold standard corpus for sentiment analysis of code-mixed text in Malayalam-English annotated by voluntary annotators is presented, which obtained a Krippendorff’s alpha above 0.8 for the dataset.

Sentiment Analysis of Persian-English Code-mixed Texts

This study collects and labels a dataset of Persian-English code-mixed tweets, and introduces a model that uses BERT pretrained embeddings as well as translation models to automatically learn the polarity scores of these tweets; it outperforms baseline models that use Naïve Bayes and Random Forest methods.

Sentiment Analysis Model For Code-Mixed Tamil Language

A model is described that encodes the input data by term frequency and then categorizes it using a multiclass classification algorithm, which produces better results in classifying the data based on the terms available in the input sequence.

Sentiment Analysis on Multilingual Code-Mixed Kannada Language

A model is presented that aids in sentiment analysis of Dravidian code-mixed Kannada comments, which achieved a promising weighted F1-score of 0.66 using the BERT model on the validation dataset, whereas the F1-score on the test dataset was 0.619.

BiLSTM-Sentiments Analysis in Code Mixed Dravidian Languages

The proposed Bidirectional Long Short Term Memory (BiLSTM) model is submitted to “Sentiment Analysis of Dravidian Languages in Code-Mixed Text” - a shared task at Forum for Information Retrieval Evaluation (FIRE) 2021 to analyze the sentiments in Kannada-English, Malayalam-English (Ma-En), and Tamil-English (Ta-En) code-mixed texts.

LA-SACo: A Study of Learning Approaches for Sentiments Analysis in Code-Mixing Texts

Three proposed models, namely SACo-Ensemble, SACo-Keras, and SACo-ULMFiT, which use Machine Learning (ML), Deep Learning (DL), and Transfer Learning (TL) approaches respectively, are described for the task of sentiment analysis in Tamil-English and Malayalam-English code-mixed texts.

Sentiment Analysis on Tamil Code-Mixed Text using Bi-LSTM

To identify the user sentiment from the code-mixed language, this research suggested a deep learning-based framework that automatically extracts the features from input sentences and predicts their sentiment with a 0.552 F1-score for the best case.

Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags

This work takes up a similar challenge of developing a sentiment analysis model that can work with English-Tamil code-mixed data by using bi-directional LSTMs along with language tagging using a Neural Network based model.

Sentiment Analysis using Cross Lingual Word Embedding Model

This paper shows how multi-label classification of a given text can be implemented by considering the sentiment associated with the text, using models that are applied for monolingual sentiment analysis.

IndicBERT Based Approach for Sentiment Analysis on Code-Mixed Tamil Tweets

An experimental study is conducted on handling the challenges in code-mixed Tamil tweets with a transformer-based Indic-BERT approach, and it is shown that an F1 score of 61.73% can be achieved, which is a significant improvement over the other traditional methods.



Sentiment analysis of mixed language employing Hindi-English code switching

This paper proposes a strategy to determine the sentiment of sentences written in a mixed language comprising Hindi and English lexicons and demonstrates the effectiveness of the proposed approach via case studies on social media data sets.

Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text

This paper introduces a Hindi-English (Hi-En) code-mixed dataset for sentiment analysis and performs empirical analysis comparing the suitability and performance of various state-of-the-art SA methods in social media.

Towards Building a SentiWordNet for Tamil

A generic approach followed for the development of a Tamil SentiWordNet using currently available resources in English is discussed, which would serve as a baseline for future improvements in the task of sentiment analysis specific to Tamil.

Cross-Lingual Sentiment Analysis for Indian Languages using Linked WordNets

The crux of the idea is to use the linked WordNets of two languages to bridge the language gap by using WordNet senses as features for supervised sentiment classification in Hindi and Marathi.

EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis

The aim of this paper is to provide a resource to the research community to evaluate the performance of sentiment classification techniques on this complex multilingual environment, proposing an English-Spanish corpus of tweets with code-switching (EN-ES-CS CORPUS).

SentiWordNet for Indian Languages

Multiple computational techniques, such as WordNet-based, dictionary-based, corpus-based, or generative approaches, are proposed for generating SentiWordNet(s) for Indian languages.

Emotion in Code-switching Texts: Corpus Construction and Analysis

This paper proposes a general framework to construct and analyze code-switching emotional posts in social media, and proposes a multiple-classifier-based automatic detection approach to detect emotion in the code-switching corpus for evaluating the effectiveness of both Chinese and English texts.

Overcoming Language Variation in Sentiment Analysis with Social Attention

This paper shows how to exploit social networks to make sentiment analysis more robust to social language variation, and formalizes the key idea of linguistic homophily: the tendency of socially linked individuals to use language in similar ways in a novel attention-based neural network architecture.

A Survey of Current Datasets for Code-Switching Research

A set of quality metrics to evaluate the datasets and categorize them accordingly is proposed; it will assist users in various natural language processing tasks such as part-of-speech tagging, named entity recognition, sentiment analysis, conversational systems, and machine translation.