Sentiment analysis in tweets: an assessment study from classical to modern word representation models

Sérgio Barreto, Ricardo Moura, Jonnathan Carvalho, A. Paes, Alexandre Plastino. Data Mining and Knowledge Discovery, pages 318–380.
With the exponential growth of social media networks such as Twitter, vast amounts of user-generated data emerge daily. The short texts published on Twitter – the tweets – have earned significant attention as a rich source of information to guide many decision-making processes. However, their inherent characteristics, such as an informal and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks, including sentiment analysis. Sentiment classification is tackled…

Semantic relational machine learning model for sentiment analysis using cascade feature selection and heterogeneous classifier ensemble

A Semantic Relational Machine Learning (SRML) model is proposed that automatically classifies the sentiment of tweets using a classifier ensemble and optimal features; it achieves higher accuracy and outperforms more established models, including quantum-inspired sentiment representation (QSR), transformer-based methods such as BERT, BERTweet, and RoBERTa, and ensemble techniques.

Enriching datasets for sentiment analysis in tweets with instance selection

Different strategies are proposed for selecting instances from a set of labeled source datasets to improve the performance of classifiers trained only on the target dataset, including similarity metrics and variations in the number of selected instances.
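The paper does not specify its similarity metrics here, but the core idea — ranking labeled source instances by how similar they are to the target dataset — can be sketched as follows. This is a minimal illustration assuming cosine similarity to the target centroid as the (hypothetical) selection criterion, with already-embedded tweets as input:

```python
import numpy as np

def select_instances(source_X, target_X, k):
    """Pick the k source instances most similar to the target data.

    Similarity here is cosine similarity between each source vector and
    the centroid of the target dataset (one of several possible metrics).
    """
    centroid = target_X.mean(axis=0)
    # Cosine similarity of every source row against the target centroid.
    norms = np.linalg.norm(source_X, axis=1) * np.linalg.norm(centroid)
    sims = source_X @ centroid / np.where(norms == 0, 1.0, norms)
    # Indices of the k most similar source instances, best first.
    return np.argsort(sims)[::-1][:k]

# Toy example: each row stands in for an embedded tweet.
source = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
target = np.array([[0.9, 0.1], [1.0, 0.0]])
print(select_instances(source, target, 2))  # → [0 2]
```

Varying `k` corresponds to the paper's "variations in the number of selected instances"; the selected rows would then be added to the target training set.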

BERTweet: A pre-trained language model for English Tweets

BERTweet is presented, the first public large-scale pre-trained language model for English Tweets, trained using the RoBERTa pre-training procedure, producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Advances in Pre-Training Distributed Word Representations

This paper shows how to train high-quality word vector representations by combining known tricks that are, however, rarely used together, outperforming the current state of the art by a large margin on a number of tasks.

Distributed Representations of Words and Phrases and their Compositionality

This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
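The negative-sampling objective mentioned above replaces the full-vocabulary softmax with a small binary discrimination task: score the observed context word against a handful of randomly drawn "negative" words. A minimal numpy sketch of the per-pair loss (random vectors stand in for learned embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    The true pair's score is pushed up; scores of k sampled negative
    words are pushed down. No normalization over the vocabulary needed.
    """
    pos = np.log(sigmoid(center @ context))             # observed pair
    neg = np.sum(np.log(sigmoid(-negatives @ center)))  # k sampled pairs
    return -(pos + neg)                                 # negative log-likelihood

dim, k = 8, 5
center = rng.normal(size=dim)
context = rng.normal(size=dim)
negatives = rng.normal(size=(k, dim))
loss = negative_sampling_loss(center, context, negatives)
print(round(float(loss), 4))
```

In word2vec the negatives are drawn from a smoothed unigram distribution and the gradients update both embedding tables; here the vectors are fixed random draws purely to show the objective's shape.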

From Frequency to Meaning: Vector Space Models of Semantics

The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
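One of ALBERT's two parameter-reduction techniques is factorizing the embedding table: instead of one V×H matrix tied to the hidden size, use a small V×E table projected up by an E×H matrix. The saving is simple arithmetic (illustrative BERT-base-like sizes, not figures from the paper):

```python
# Embedding parameter count, with and without ALBERT-style factorization.
V = 30_000   # vocabulary size
H = 768      # hidden size
E = 128      # factorized embedding size (E << H)

unfactorized = V * H        # one V x H embedding table
factorized = V * E + E * H  # V x E table plus an E x H up-projection

print(unfactorized, factorized)  # → 23040000 3938304
```

With these sizes the embedding parameters shrink by roughly a factor of six; the second technique, cross-layer parameter sharing, reduces the encoder side analogously by reusing one layer's weights across all layers.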

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
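The "one additional output layer" in the abstract is, for classification, a single linear layer over the encoder's pooled [CLS] vector. A minimal sketch of that head alone, with a random vector standing in for the encoder output (the real [CLS] vector comes from the pre-trained model):

```python
import numpy as np

rng = np.random.default_rng(42)

hidden_size, num_classes = 768, 3  # e.g. negative / neutral / positive

cls_vector = rng.normal(size=hidden_size)  # stand-in for BERT's [CLS] output
# The single new layer added for fine-tuning: weights and bias.
W = rng.normal(scale=0.02, size=(num_classes, hidden_size))
b = np.zeros(num_classes)

logits = W @ cls_vector + b
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes
print(int(np.argmax(probs)))
```

During fine-tuning, `W` and `b` are trained jointly with all of the pre-trained encoder weights, which is what distinguishes BERT-style fine-tuning from feature-based approaches such as ELMo.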

Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training

Emo2Vec is proposed which encodes emotional semantics into vectors and outperforms existing affect-related representations, such as Sentiment-Specific Word Embedding and DeepMoji embeddings with much smaller training corpora.

Learning Emotion-enriched Word Representations

This work proposes a novel method of obtaining emotion-enriched word representations, which projects emotionally similar words into neighboring spaces and emotionally dissimilar ones far apart, and demonstrates that the proposed representations outperform several competitive general-purpose and affective word representations.