Sentiment Analysis on Monolingual, Multilingual and Code-Switching Twitter Corpora

Abstract

We address polarity classification of tweets written in different languages, focusing on English and Spanish, and compare three techniques: (1) a monolingual model that knows the language in which the opinion is written, (2) a monolingual model selected according to the decision of a language identification tool, and (3) a multilingual model trained on a multilingual dataset that requires no language recognition step. Results show that the multilingual model can even outperform the monolingual models on some monolingual sets. We also introduce the first code-switching corpus with sentiment labels, which shows the robustness of a multilingual approach.
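
The three techniques differ only in how a tweet is routed to a classifier, which the following minimal sketch tries to illustrate. It is not the authors' implementation: the toy lexicons, the word-counting "classifiers", the `identify_language` heuristic, and every function name here are hypothetical stand-ins for trained models and a real language identification tool.

```python
from typing import Callable, Dict

# Hypothetical toy lexicons standing in for trained sentiment models.
LEXICONS: Dict[str, Dict[str, int]] = {
    "en": {"good": 1, "great": 1, "bad": -1, "awful": -1},
    "es": {"bueno": 1, "genial": 1, "malo": -1, "horrible": -1},
}


def make_classifier(lexicon: Dict[str, int]) -> Callable[[str], str]:
    """Build a toy polarity classifier that sums lexicon scores over tokens."""
    def classify(text: str) -> str:
        score = sum(lexicon.get(tok, 0) for tok in text.lower().split())
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"
    return classify


# One monolingual model per language, plus one model built from both lexicons.
MONOLINGUAL = {lang: make_classifier(lex) for lang, lex in LEXICONS.items()}
MULTILINGUAL = make_classifier({**LEXICONS["en"], **LEXICONS["es"]})


def identify_language(text: str) -> str:
    """Toy language identifier (assumption): pick the language whose lexicon
    matches the most tokens; ties fall back to the first language, "en"."""
    tokens = text.lower().split()
    hits = {lang: sum(tok in lex for tok in tokens) for lang, lex in LEXICONS.items()}
    return max(hits, key=hits.get)


def predict_known_language(text: str, lang: str) -> str:
    """Technique (1): the language is given, so use its monolingual model."""
    return MONOLINGUAL[lang](text)


def predict_with_langid(text: str) -> str:
    """Technique (2): choose the monolingual model via language identification."""
    return MONOLINGUAL[identify_language(text)](text)


def predict_multilingual(text: str) -> str:
    """Technique (3): a single multilingual model, no language recognition step."""
    return MULTILINGUAL(text)
```

On a code-switching input such as "good pero malo malo", the English-only model in this sketch sees just one sentiment word, while the multilingual model scores the tokens of both languages. This is only meant to suggest why a multilingual model might be more robust on such text.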
