Corpus ID: 49221615

RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian

@inproceedings{Rogers2018RuSentimentAE,
  title={RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian},
  author={Anna Rogers and Alexey Romanov and Anna Rumshisky and Svitlana Volkova and Mikhail Gronas and Alex Gribov},
  booktitle={COLING},
  year={2018}
}
This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing word embeddings trained on a 3.2B corpus of Russian social media posts.


FinnSentiment - A Finnish Social Media Corpus for Sentiment Polarity Annotation
TLDR
This publication introduces a 27,000-sentence Finnish dataset annotated independently for sentiment polarity by three native annotators. It analyses their inter-annotator agreement and provides two baselines to validate the usefulness of the dataset.
Extracting sentiments towards COVID-19 aspects
TLDR
A specialized Russian dataset for aspect-based sentiment analysis of Russian users’ comments about COVID-19 is introduced. Various machine learning methods are applied to it, including fine-tuning the pre-trained RuBERT model.
Language Model Embeddings Improve Sentiment Analysis in Russian
TLDR
Pre-trained Russian language models (ELMo) are introduced and used to extract embeddings that improve classification accuracy on short conversational texts. They establish state-of-the-art results on the RuSentiment dataset.
Multi-Level Sentiment Analysis of PolEmo 2.0: Extended Corpus of Multi-Domain Consumer Reviews
TLDR
An extended version of PolEmo is presented: a corpus of consumer reviews from four domains (medicine, hotels, products, and school). The work explores recent deep learning approaches to sentiment recognition, such as Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT).
Some Features of Sentiment Analysis for Russian Language Posts and Comments from Social Networks
TLDR
The paper considers the task of building a sentiment analysis tool for Russian posts and comments from the most popular social networks, without any domain restriction. It presents quality and performance metrics for the proposed neural network system.
Improving Results on Russian Sentiment Datasets
TLDR
Standard neural network architectures and recent BERT architectures are tested on existing Russian sentiment evaluation datasets. For all sentiment tasks in the study, the conversational variant of Russian BERT is shown to perform better than the other models.
Transfer Learning for Improving Results on Russian Sentiment Datasets
TLDR
A transfer learning approach is tested on Russian sentiment benchmark datasets. The BERT-NLI model, which treats sentiment classification as a natural language inference task, reaches human-level sentiment analysis performance on one of the datasets.
An Ensemble Based Classification Approach for Persian Sentiment Analysis
TLDR
This work introduces an ensemble classifier for Persian sentiment analysis that combines shallow and deep learning algorithms to improve on the performance of state-of-the-art approaches.
Sentiment Analysis of Posts and Comments in the Accounts of Russian Politicians on the Social Network
TLDR
The sentiment analysis algorithm was implemented with a bidirectional recurrent neural network, using two text corpora: Rubtsova’s corpus and the RuSentiment corpus.
L3CubeMahaSent: A Marathi Tweet-based Sentiment Analysis Dataset
TLDR
This paper presents L3CubeMahaSent, the first major publicly available Marathi sentiment analysis dataset. It was curated from tweets extracted from the Twitter accounts of various Maharashtrian personalities. The paper also presents dataset statistics and baseline classification results using CNN, LSTM, ULMFiT, and BERT-based models.
...

References

Showing 1–10 of 36 references
Creating a General Russian Sentiment Lexicon
TLDR
The paper describes RuSentiLex, a new Russian sentiment lexicon. Participants in the SentiRuEval-2016 Twitter reputation monitoring shared task used it, and it helped them achieve high results.
SemEval-2016 Task 4: Sentiment Analysis in Twitter
TLDR
The fourth year of the Sentiment Analysis in Twitter task (SemEval-2016 Task 4) comprises five subtasks, three of which represent a significant departure from previous editions. The task continues to be very popular, attracting a total of 43 teams.
Enhanced Sentiment Learning Using Twitter Hashtags and Smileys
TLDR
A supervised sentiment classification framework based on data from Twitter, a popular microblogging service, is proposed. It uses 50 Twitter hashtags and 15 smileys as sentiment labels, which allows diverse sentiment types of short texts to be identified and classified.
Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
TLDR
This work builds Twitter-specific sentiment lexicons by bootstrapping sentiment-bearing terms from multilingual Twitter streams, using a small amount of labeled data to guide the process.
Debunking Sentiment Lexicons: A Case of Domain-Specific Sentiment Classification for Croatian
TLDR
The results indicate that, even with as few as 500 labeled instances, a supervised model substantially outperforms a word-counting model and that adding lexicon-based features does not significantly improve supervised sentiment classification.
Efficient Twitter sentiment classification using subjective distant supervision
TLDR
The paper introduces the Effective Word Score (EFWS) of a tweet, which is derived from the polarity scores of frequently used words. EFWS serves as an additional heuristic that can speed up sentiment classification with standard machine learning algorithms.
SenticNet 5: Discovering Conceptual Primitives for Sentiment Analysis by Means of Context Embeddings
TLDR
This work couples sub-symbolic and symbolic AI to automatically discover conceptual primitives from text. It links these primitives to commonsense concepts and named entities in a new three-level knowledge representation for sentiment analysis.
How Noisy Social Media Text, How Diffrnt Social Media Sources?
TLDR
This work investigates how linguistically noisy social media text is across a range of sources: YouTube comments, Twitter posts, web user forum posts, blog posts, and Wikipedia. Each source is compared to a reference corpus of edited English text.
Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
TLDR
A corpus of consumer reviews from the rateitall and eopinions websites, annotated with opinion-related information, is introduced. A two-level annotation scheme is also presented for pinpointing the properties and functional components of the evaluations.
Topic-based sentiment analysis for the social web: The role of mood and issue-related words
TLDR
Two new methods, mood setting and lexicon extension, are introduced to improve the accuracy of topic-specific lexical sentiment strength detection for the social web.
...